AI Infrastructure Alliance
Livestream
AI at Scale

Welcome to the age of industrialized AI!

With bigger and bigger models on the rise, how can you be sure you'll scale to meet the incredible demands of tomorrow?

Join us and learn how to run massively distributed workloads, serve inference to millions or billions of users, and deploy thousands of models to production, now.

Speakers
Daniel Jeffries
Managing Director @ AI Infrastructure Alliance
Fabiana Clemente
CDO @ YData
Davit Buniatyan
CEO @ Activeloop
Daniel Wibowo
Sr. Product Marketing Manager @ TruEra
Seth Clark
Head of Product & Co-founder @ Modzy
Miriam Santos
Data Advocate @ YData
Amber Roberts
Machine Learning Engineer @ Arize
Greg Throne
Technical Product Manager @ FeatureBase
Niccolò Zanichelli
Community Lead @ OpenBioML
Alex Havrilla
PhD Fellow @ StabilityAI
Herbie Bradley
Research Scientist @ EleutherAI; PhD student at University of Cambridge
Gilad Shaham
Director of Product Management @ Iguazio
Jennifer Prendki
Founder and CEO @ Alectio
Eric Landau
Co-founder and CEO @ Encord
Victor Sonck
Evangelist @ ClearML
Luca Antiga
CTO @ Lightning.ai
Alejandro (Alex) Muller
Founder @ Savvi AI
Bernease Herman
Senior Data Scientist @ WhyLabs
Hoon Han
ML Engineer @ Superb AI
Ed Shee
Head of Developer Relations @ Seldon
Daniel Langkilde
CEO and Co-Founder @ Kognic
Amit Phadke
Chief Product Officer @ Bosch AIShield
Agenda
Day 1
4:00 PM – 4:40 PM
Presentation

The Age of Industrialized AI and What It Means for the Future

Daniel Jeffries
4:40 PM – 5:00 PM
Presentation

Hyperparameter Optimizing a Transformer on an Autoscaling Cluster. In less than 20 minutes!

Victor Sonck
5:00 PM – 5:20 PM
1:1 networking

Networking

5:20 PM – 5:40 PM
Presentation

The 4 Types of ML Model Testing that Drive AI Performance

Too often, data scientists and ML engineers rush to develop their ML models and either skip over testing or don't have the proper tools to do systematic testing. The irony is that insufficient testing often slows down their ability to get to an effective, approved ML model.

So what are the tests and processes that make for faster development and better ML performance? Daniel Wibowo will cover how testing makes a big difference in scaling high-performance AI, including:

  • The top 4 types of ML model testing
  • At what points in ML model development you should be testing
  • The key metrics you should be paying attention to
  • A demonstration of a test harness in action
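The list above ends with a test-harness demo; as a flavor of what such a harness does, here is a toy sketch in plain Python (all names and checks are hypothetical illustrations, not TruEra's API):

```python
# Toy ML model test harness (hypothetical illustration, not TruEra's API).
# Each test is a named check that runs against a model's predictions.

def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def run_tests(preds, labels, tests):
    """Run each (name, check) pair; a check returns True on pass."""
    return {name: check(preds, labels) for name, check in tests}

# Toy predictions from some trained classifier vs. ground truth.
preds = [1, 0, 1, 1, 0, 1]
labels = [1, 0, 1, 0, 0, 1]

tests = [
    ("accuracy >= 0.8", lambda p, y: accuracy(p, y) >= 0.8),
    ("no constant output", lambda p, y: len(set(p)) > 1),
]

print(run_tests(preds, labels, tests))  # both checks pass here
```

A real harness would add the other kinds of tests the talk covers and run them at fixed points in the development cycle.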
Daniel Wibowo
5:20 PM – 5:50 PM
Presentation

Maintaining Up-to-the-Millisecond Freshness Across 20B Devices

Tremor Video is a global advertising platform that monitors data from video views across 20B devices every day. With their 1000-node Hadoop infrastructure, it was taking 24 hours to make device data available within their platform for their users. Learn how they adjusted their architecture, implementing FeatureBase as the real-time linchpin, increasing their ability to handle updates to 120B/day, decreasing their cost by 70%, and enabling buyers and sellers on their platform to deliver personalized experiences based on fresh, accurate data.

Greg Throne
5:40 PM – 6:10 PM
Presentation

Labeling and Managing Massive Datasets In The Age of Industrialized AI

Powerful new large language and foundation models like DALL-E 2, Midjourney, Stable Diffusion, ChatGPT, Galactica, and more have taken the AI space by storm thanks to incredible capabilities in text generation, image synthesis, and more. Notably, these models are trained on vast amounts of data from the Internet. Properly labeling and managing these massive datasets is crucial for the successful training and deployment of foundation models. However, this task can be daunting and time-consuming, particularly for companies with limited resources.
This talk will explore best practices for labeling and managing massive datasets in the age of industrialized AI, such as:

  • The role of human annotators versus automated labeling methods
  • Strategies for efficiently and accurately labeling large datasets
  • Tools and technologies for managing and versioning data
  • Tips for training and evaluating AI models with massive datasets
Hoon Han
5:50 PM – 6:10 PM
Presentation

Open and Efficient Reinforcement Learning from Human Feedback

Over the past couple of months, CarperAI has built trlX, one of the first open-source RLHF implementations capable of fine-tuning large language models at scale. We test offline reinforcement learning algorithms to reduce compute requirements and explore the practicality of synthetic preference data, finding that the two can be combined to significantly reduce expensive RLHF costs.

Alex Havrilla
6:10 PM – 6:40 PM
Presentation

Deploy once, run anywhere: A new approach for scaling production ML to any environment

As teams turn to machine learning (ML) to drive innovation and transform their operations, they often face the challenge of scaling ML models to a variety of environments. These environments can include on-premises data centers, private clouds, public clouds, hybrid clouds, air gapped systems, and edge devices. Each of these environments brings its own set of challenges and considerations, from infrastructure and security to data management and regulatory compliance. In this talk, we will delve into these challenges and explore how to overcome them in order to successfully deploy and scale ML models across a wide range of environments.

Seth Clark
6:10 PM – 6:40 PM
Presentation

Fairness Metrics and Bias Tracing in Production

Algorithmic bias in machine learning is both incredibly important and deeply challenging for any organization that uses ML, because bias can occur in all stages of the ML lifecycle: in the generation of the dataset, in the training and evaluation phases of a model, and after shipping the model. This presentation will dive into bias tracing on a multidimensional level by surfacing the features and cohorts likely contributing to algorithmic bias in production. Teams working on image or speech recognition, product recommendations, or automation can all unwittingly encode biases from exploration of the data or from historical biases present in the data itself. Understanding whether metrics are impacted by a lack of adequate representation of a sensitive class (or overrepresentation by a base class) is critical for ML teams who want to be able to trace biases before they are integrated into systems deployed worldwide. We'll review some common challenges to building, troubleshooting, and evaluating systems with equity as a top-of-mind issue, and the path forward. This talk is relevant to data managers, scientists, and engineers who want to get to the root of a fairness issue, or who are looking to build products with fairness in mind.

Amber Roberts
6:40 PM – 7:00 PM
1:1 networking

Networking

7:00 PM – 7:30 PM
Presentation

Truly scalable monitoring and observability for massive NLP and CV datasets

Bernease Herman
7:00 PM – 7:40 PM
Presentation

Scaling Up Foundation Models with Lightning

Lightning powers many of the large-scale models in the wild today. We will start by looking at a few examples, then introduce Lightning Fabric, a new package specifically designed to provide the extra flexibility needed when training large models.

Luca Antiga
7:30 PM – 7:50 PM
Presentation

Scaling DataPrepOps

Data preparation: how do you automate it and scale it using Fabric?

Fabiana Clemente
7:40 PM – 8:00 PM
Presentation

Production-First Approach: How to Scale up your ML Operations

Many organizations encounter barriers when deploying their machine-learning and deep-learning models to production. In this talk, Gilad will share how you can optimize your process, allowing your data science team to focus on developing the model. This approach enables you to support batch and real-time deployments while minimizing operational complexity. We will also discuss ways to help data scientists be more effective without having to deal with the complexities of the underlying technologies.

Gilad Shaham
Day 2

4:00 PM – 4:30 PM
Presentation

Optimal Representations for Human Feedback

With the emergence of machine learning, we moved to a paradigm of "programming by example". While that unlocks new and amazing capabilities, it also demands a lot of manual work to produce the examples from which our models can learn. In this talk, we will learn how data can be represented so that human feedback has maximum impact on model performance.

Daniel Langkilde
4:30 PM – 4:40 PM
Presentation

Speeding up Inference of Large Language Models With Triton and FasterTransformer

Triton Inference Server and FasterTransformer are solutions from Nvidia for deploying Transformer language models for fast inference at scale. I will talk about my experience successfully deploying these libraries to speed up inference of our code generation models in research by up to 10x.

Herbie Bradley
4:40 PM – 5:00 PM
Presentation

Mind the GAP

You build great models, but that's not enough. Understand the expectations GAP between data science and the product, business, and engineering teams.

Alejandro (Alex) Muller
5:00 PM – 5:20 PM
1:1 networking

Networking

5:20 PM – 5:50 PM
Presentation

Optimizing Inference For State Of The Art Python Models

Machine learning models are often created with an emphasis on how they run during training, but with little regard for how they'll perform in production, which leads to issues at serving time. In this talk, you'll learn what those issues are and how to address them, using some state-of-the-art models as examples. We'll introduce the open-source project MLServer and look at how features such as multi-model serving and adaptive batching can optimize performance for your models. Finally, you'll learn how using an inference server locally can speed up the time to deployment when moving to production.
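As background for the serving features mentioned above, MLServer configures each model through a `model-settings.json` file. A minimal sketch (the model name, runtime module, and artifact path are illustrative assumptions; the field names follow MLServer's model-settings schema, with `max_batch_size`/`max_batch_time` turning on adaptive batching):

```json
{
  "name": "sentiment-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  },
  "max_batch_size": 8,
  "max_batch_time": 0.5
}
```

Placing several such files under one model repository is what enables multi-model serving from a single `mlserver start` process.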

Ed Shee
5:50 PM – 6:10 PM
Presentation

Active Learning & the ML Team of the Future

In this session, Eric Landau, CEO & co-founder of Encord, will provide an overview of the current state of active learning as well as its applications in the future. He will dive into how leading ML and data teams across industries are starting to embed active learning into their ML pipelines, what a best-in-class team will look like in 2030, and what teams leading the way in the next decade will be doing differently to stay ahead.

Eric Landau
6:10 PM – 6:35 PM
Presentation

Securing AI Systems at Scale

AI security at scale is a crucial concern for organizations as they increasingly adopt AI-based systems and devices. In this talk, we will delve into the various challenges and threats facing AI security and present effective measures for protection. Using relevant examples, we will demonstrate how vulnerabilities in AI models can have severe impacts when not addressed during scaling for multi-model systems and deployments across millions of devices. We will also explore the importance of a security-by-design approach, automation tools, cloud-based solutions, threat intelligence feeds, and the adoption of security standards and best practices in achieving AI security at scale. By the end of the talk, attendees will have a comprehensive understanding of the key considerations and steps necessary for implementing AI security at scale to ensure the security and integrity of their AI systems.

Amit Phadke
6:35 PM – 6:50 PM
1:1 networking

Networking

6:50 PM – 7:10 PM
Presentation

Training Large Visual Models with Deep Lake

We will explore the use of Deep Lake, a data lake for deep learning, to train Large Visual Models (LVMs) such as CLIP and Stable Diffusion. We will cover the challenges of scaling visual models and the benefits of using Deep Lake to overcome them. The talk will cover a range of topics, including data scaling laws, data-feeding strategies, how to use Deep Lake to train and deploy large visual models, and best practices for managing and monitoring the training process. It will be valuable for anyone interested in training large visual models and in using Deep Lake to overcome the challenges of scaling.

Davit Buniatyan
7:10 PM – 7:30 PM
Presentation

Auditing Data Quality with Pandas Profiling

Data quality can make or break the success of any data science project, and data profiling is an indispensable process for monitoring it. Pandas Profiling is currently the top data-profiling package available as open source. In this lightning talk, I'll go over the importance of data quality and data profiling, and the remarkable features of Pandas Profiling that made the data science community fall in love with it.

Miriam Santos
7:30 PM – 8:00 PM
Presentation

AI at Scale in Computational Biology

Niccolò Zanichelli
8:00 PM – 8:30 PM
Presentation

What the *%& is DataPrepOps?

Today, there is more and more talk of MLOps and DataOps, yet the nascent space of DataPrepOps is still misunderstood by even top experts. Where does DataPrepOps fit in the ML tooling market, and how is it revolutionizing the way we do machine learning? To discover the answers to those questions, join Jennifer, founder and CEO of Alectio, for this session.

Jennifer Prendki
Event has finished
February 23, 4:00 PM, GMT
Online
Organized by
AI Infrastructure Alliance