Data Science Colloquium of the ENS

Welcome to the Data Science Colloquium of the ENS.

This colloquium is organized around data science in a broad sense. Its goal is to bring together researchers with diverse backgrounds (including, for instance, mathematics, computer science, physics, chemistry and neuroscience) who share a common interest in dealing with large-scale or high-dimensional data.

The colloquium is followed by an open buffet around which participants can meet and discuss collaborations.

These seminars are made possible by the support of the CFM-ENS Chair “Modèles et Sciences des Données”.

Next seminars

27 June 2024, 11h30-12h30 (Paris time), room Amphi Jaurès (29 Rue d'Ulm).
Andrea Montanari (Stanford University)
Title: Statistical phenomena in data selection and data enrichment
Abstract: Building and training powerful machine learning models has become increasingly feasible thanks to new architectures, software infrastructure, and the prevalence of foundation models. Nowadays, developing a high-quality dataset for a specific use case is often the key bottleneck to successful machine learning applications. I will discuss two approaches to alleviating this problem: selecting highly informative samples from a large dataset, and merging a small dataset with surrogate data from a different source. I will give an overview of some of the ideas in the literature on this problem, and present some findings arising from the analysis of simple statistical models. [Based on joint work with Germain Kolossov, Ayush Jain, Eren Sasoglu, Pulkit Tandon]