Motivation and aim
Machine learning (ML) models, despite their prowess, often run into problems when deployed in the real world. Model failures are especially consequential in high-stakes domains such as healthcare. Although it is well known that the quality of the data used to train ML models is crucial to their success or failure, data quality is often undervalued. The richness and diversity of the training set become particularly important in dynamic scenarios, where the data distribution shifts over time. When this happens, models must incorporate new developments without forgetting previous knowledge. Data-Centric AI puts the data used in ML, and its quality, center stage, and seeks to develop tools for the systematic characterization, evaluation, and monitoring of the data used to train and evaluate ML models.
Learning objectives
We will address the following topics:
- Motivate and introduce Data-Centric AI, a topic of growing importance in AI.
- Familiarize attendees with the challenges encountered when deploying ML models in dynamic, changing scenarios, and present solutions from the fields of federated and continual learning.
- Build hands-on coding experience through demonstrations of recent Data-Centric AI and continual learning tools and methods.
Schedule
The tutorial will consist of two parts. The first part will give participants a comprehensive introduction to recent advances in Data-Centric AI, with a focus on medical imaging and the unique challenges of healthcare. We will frame the data-centric lens in terms of (i) model performance and robustness and (ii) bias and fairness. The tutorial will show how these ideas apply across the entire ML pipeline, with practical use cases from medical imaging. This end-to-end approach will enable participants to apply Data-Centric AI to their own problems, from both a researcher's and a practitioner's perspective.
The second part of the tutorial will focus on adapting the learning process to an ever-changing world. Medical data is, rightly, subject to spatial and temporal availability constraints designed to protect patient privacy. Federated learning handles spatial restrictions by training models in a distributed fashion, so samples never need to leave their place of acquisition. Continual learning addresses the situation where data collected at a later time point reflects new acquisition conditions or demographics. We will introduce methods from federated and continual learning and explore the challenges associated with actually building, approving, and deploying dynamic learning solutions in medicine.
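To make the federated idea concrete, the following is a minimal sketch of federated averaging (FedAvg), one common federated learning method and not necessarily the one covered in the tutorial. It assumes a toy linear regression model in place of a medical imaging network, and all function names (`local_update`, `federated_average`, `train_federated`) are hypothetical. Each site trains on its own data, and only model weights are sent for aggregation.

```python
from typing import List, Tuple

Weights = List[float]
# One site's private dataset: (features, target) pairs. Illustrative only.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few epochs of gradient descent on one site's private data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: List[Weights],
                      site_sizes: List[int]) -> Weights:
    """Average the site models, weighted by each site's sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites: List[Dataset], dim: int,
                    rounds: int = 20) -> Weights:
    """Alternate local training and aggregation; raw samples stay on site."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        # Only weights cross site boundaries, never the samples themselves.
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w
```

In this sketch the raw samples are read only inside `local_update`, which is the property that lets each dataset remain at its place of acquisition. A real deployment would add secure communication and typically further privacy safeguards, which are omitted here.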
Both parts of the tutorial will contain hands-on sessions: we will link each tutorial component to practical case studies, including coding demonstrations and software tools. This will ensure an immersive learning experience for our participants.
We will explore our learning objectives through expert presentations and practical group work. Participants will be divided into small groups. Each group will be assigned a medical imaging task, such as lung nodule detection or cardiac MR segmentation, along with an existing architecture that achieves good performance in a static setting.