Welcome to the age of industrialized AI!
With bigger and bigger models on the rise, how can you be sure you will scale to meet the incredible demands of tomorrow?
Join us and learn how to run massively distributed workloads, serve inference to millions or even billions of users, and deploy thousands of models to production now.
The agenda is not final yet; stay tuned!
Data preparation: automation and scale using Fabric?
We will explore the use of Deep Lake, a data lake for deep learning, to train Large Visual Models (LVMs) such as CLIP and Stable Diffusion. We will cover the challenges of scaling visual models and how Deep Lake helps overcome them, touching on data scaling laws, data-feeding strategies, training and deploying large visual models with Deep Lake, and best practices for managing and monitoring the training process. The talk will be valuable for anyone interested in training large visual models at scale.
Powerful new large language and foundation models like DALL·E 2, Midjourney, Stable Diffusion, ChatGPT, Galactica, and more have taken the AI space by storm, thanks to incredible capabilities in text generation, image synthesis, and more. Notably, these models are trained on vast amounts of data from the Internet. Properly labeling and managing these massive datasets is crucial for successfully training and deploying foundation models. However, this task can be daunting and time-consuming, particularly for companies with limited resources. This talk will explore best practices for labeling and managing massive datasets in the age of industrialized AI.
Too often, data scientists and ML engineers rush to develop their ML models and either skip testing or lack the proper tools for systematic testing. The irony is that insufficient testing often slows down their ability to reach an effective, approved ML model.
So what are the tests and processes that make for faster development and better ML performance? Daniel Wibowo will cover how testing makes a big difference in scaling high-performance AI.
As teams turn to machine learning (ML) to drive innovation and transform their operations, they often face the challenge of scaling ML models across a variety of environments. These environments can include on-premises data centers, private clouds, public clouds, hybrid clouds, air-gapped systems, and edge devices. Each brings its own set of challenges and considerations, from infrastructure and security to data management and regulatory compliance. In this talk, we will delve into these challenges and explore how to overcome them in order to successfully deploy and scale ML models across a wide range of environments.
Data quality can make or break the success of any data science project, and data profiling is an indispensable process for monitoring it. Pandas Profiling is currently the most popular open-source data profiling package. In this lightning talk, I'll go over the importance of data quality and data profiling, and the remarkable features of Pandas Profiling that made the data science community fall in love with it.
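To make the idea concrete, here is a minimal sketch of a few of the data-quality checks that a profiler like Pandas Profiling automates, hand-rolled in plain pandas; the tiny DataFrame is hypothetical example data:

```python
import pandas as pd

# Hand-rolled versions of a few checks a data profiler automates:
# missingness, duplicate rows, and column cardinality. The DataFrame
# below is hypothetical example data.
df = pd.DataFrame({
    "age":  [34, 45, None, 23, 45],
    "city": ["NYC", "NYC", "LA", None, "NYC"],
})

missing_ratio = df.isna().mean()             # share of missing values per column
duplicate_rows = int(df.duplicated().sum())  # count of exact duplicate rows
cardinality = df.nunique()                   # distinct non-null values per column
```

A profiling report bundles these (and many more: distributions, correlations, alerts) into one generated HTML document, which is why it has become a standard first step in exploratory analysis.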
Algorithmic bias in machine learning is both incredibly important and deeply challenging for any organization that uses ML, because bias can occur in all stages of the ML lifecycle: in the generation of the dataset, in the training and evaluation phases of a model, and after shipping the model. This presentation will dive into bias tracing on a multidimensional level by identifying the features and cohorts likely contributing to algorithmic bias in production. Teams working on image or speech recognition, product recommendations, or automation can all unwittingly encode biases from exploration of the data or from historical biases present in the data itself. Understanding whether metrics are impacted by a lack of adequate representation of a sensitive class (or overrepresentation by a base class) is critical for ML teams who want to trace biases before they are integrated into systems deployed worldwide. We'll review some common challenges to building, troubleshooting, and evaluating systems with equity as a top-of-mind issue, and the path forward. This talk is relevant to data managers, scientists, and engineers who want to get to the root of a fairness issue, or who are looking to build products with fairness in mind.
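One concrete starting point for tracing bias across cohorts is comparing a metric, such as the positive-prediction rate, per sensitive group. A minimal sketch of that check (the cohort labels and predictions are hypothetical, and real bias tracing involves many more metrics and slices):

```python
from collections import defaultdict

def positive_rate_by_group(predictions, groups):
    """Positive-prediction rate per cohort (a demographic parity check)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical binary model outputs and a sensitive attribute per example.
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
cohort = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(preds, cohort)
gap = max(rates.values()) - min(rates.values())  # demographic parity gap
```

A large gap flags a cohort for investigation; slicing the same way by individual features helps trace which inputs contribute to the disparity.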