Below is the list of past seminars. Videos of some of the past seminars are available online or via the link below.
14 November 2024, Luca Biferale (Università degli Studi di Roma Tor Vergata)
Title: Data driven tools for Lagrangian Turbulence
Abstract: We present a stochastic method for generating and reconstructing complex signals along the trajectories of small objects passively advected by turbulent flows [1]. Our approach makes use of generative diffusion models, a recently proposed data-driven machine learning technique. We show applications to 3D tracers and inertial particles in highly turbulent flows, 2D trajectories from NOAA’s Global Drifter Program, and the dynamics of charged particles in astrophysics. Superiority over linear decomposition methods and Gaussian process regression is analyzed in terms of statistical and point-wise metrics concerning intermittency and multi-scale properties.
[1] Li, T., Biferale, L., Bonaccorso, F. et al. Synthetic Lagrangian turbulence by generative diffusion models. Nat Mach Intell 6, 393–403 (2024).
27 June 2024, Andrea Montanari (Stanford University)
Title: Statistical phenomena in data selection and data enrichment
Abstract: Building powerful machine learning models and training them has become increasingly possible thanks to new architectures, software infrastructure, and the prevalence of foundation models.
Nowadays, developing a high-quality dataset for a specific use case of interest is often the key bottleneck to successful machine learning applications. I will discuss two approaches towards alleviating this problem: selecting highly informative samples from a large dataset, and merging a small dataset with surrogate data from a different source. I will overview some of the ideas in the literature on this problem and present some findings arising from the analysis of simple statistical models. [Based on joint work with Germain Kolossov, Ayush Jain, Eren Sasoglu, Pulkit Tandon]
02 May 2024, Michael Jordan (UC Berkeley and INRIA Paris)
Title: Collaborative Learning, Information Asymmetries, and Incentives
Abstract:
04 April 2024, Lénaïc Chizat (EPFL)
Title: A Formula for Feature Learning in Large Neural Networks
Abstract: Deep learning succeeds by doing hierarchical feature learning, but tuning hyperparameters such as initialization scales, learning rates, etc., gives only indirect control over this behavior. This calls for theoretical tools to predict, measure and control feature learning. In this talk, we will first review various theoretical advances (signal propagation, infinite width dynamics, etc.) that have led to a better understanding of the subtle impact of hyperparameters and architectural choices on the training dynamics. We will then introduce a formula which, in any architecture, quantifies feature learning in terms of more tractable quantities: statistics of the forward and backward passes, and a notion of alignment between the feature updates and the backward pass which captures an important aspect of the nature of feature learning. This formula suggests normalization rules for the forward and backward passes and for the layer-wise learning rates. To illustrate these ideas, I will discuss the feature learning behavior of ReLU MLPs and ResNets in the infinite width and depth limit.
21 March 2024, Antoine Georges (Collège de France, Paris and Flatiron Institute, New York)
Title: Applications of Machine Learning and Neural Networks to Quantum Systems
Abstract: Applications of learning algorithms using deep neural networks have developed considerably recently, often with spectacular results. The physics of complex quantum systems is no exception, with multiple applications that constitute a new field of research. Examples include the representation and optimization of wave functions of quantum systems with large numbers of degrees of freedom (neural quantum states), the determination of wave functions from measurements (quantum tomography), and applications to the electronic structure of materials, such as the determination of more precise density functionals or the learning of force fields to accelerate molecular dynamics simulations. I will survey some of these applications, with an emphasis on neural quantum states.
29 February 2024, Noah A. Smith (University of Washington)
Title: Breaking Down Language Models
Abstract: “Language models are the only thing we have in natural language processing that could be considered scientific.” A collaborator of mine said this more than a decade ago, long before LMs emerged as the single most important technology to come out of our field. In these exciting times, I seek both to make the study of LMs more scientific, and to make LMs more practically beneficial. In this talk, I’ll first draw from recent work from my UW group that starts to tackle questions about LMs that could help “break them down” for a deeper scientific understanding. Then I’ll turn to some developments that try to broaden the usefulness of language models by literally “breaking them down” into more modular components. Finally, I’ll shamelessly advertise some newly delivered artifacts that I believe will help the research community make progress on both of these directions and more.
04 October 2023, Julia Kempe (NYU Center for Data Science and Courant Institute)
Title: Towards Understanding Adversarial Robustness
Abstract: Adversarial vulnerability of neural nets, their failure under small, imperceptible perturbations, and subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. In this talk I will introduce the problem and current defenses and then explore how tools and insights coming from statistical physics, in particular certain infinite-width limits of neural nets, help shed more light on the origins of the interplay between models and adversarial perturbations, and how these tools can help us devise strategies to circumvent them.
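For readers new to the topic, a standard construction of such a perturbation is the fast gradient sign method (Goodfellow et al., 2015); this is a generic illustration, not necessarily the attack discussed in the talk:
    x_{\mathrm{adv}} = x + \varepsilon \,\mathrm{sign}\!\big(\nabla_{x}\,\ell(f_{\theta}(x),\, y)\big)
where f_θ is the trained classifier, ℓ the training loss, and ε a small perturbation budget; even a tiny ε can flip the predicted label.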
25 May 2023, Valentin De Bortoli (CNRS and ENS)
Title: Generative modelling with diffusion: theory and practice
Abstract: Generative modeling is the task of drawing new samples from an underlying distribution known only via an empirical measure. There exists a myriad of models to tackle this problem with applications in image and speech processing, medical imaging, forecasting and protein modeling, to name a few. Among these methods, score-based generative models (or diffusion models) are a new powerful class of generative models that exhibit remarkable empirical performance. They consist of a 'noising' stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a 'denoising' process defined by approximating the time-reversal of the diffusion. In this talk I discuss three aspects of diffusion models. First, I will present some of their theoretical guarantees with an emphasis on their behavior under the so-called manifold hypothesis. Such theoretical guarantees are non-vacuous and provide insight on the empirical behavior of these models. Then, I will turn to the extension of diffusion models to non-Euclidean data. Indeed, classical generative models assume that data is supported on a Euclidean space, i.e. a manifold with flat geometry. In many domains such as robotics, geoscience or protein modeling, data is often naturally described by distributions living on Riemannian manifolds which require new methodologies to be appropriately handled. Finally, I will turn to constraints on the generative process itself. A well-known limitation of diffusion models is that the forward-time stochastic process must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem, i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. I will present Diffusion Schrödinger Bridge, an original approximation of the Iterative Proportional Fitting procedure to solve the Schrödinger Bridge problem.
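As a minimal sketch of the noising/denoising pair described above (written here in the standard variance-preserving formulation, which may differ in detail from the talk):
    dX_t = -\tfrac{1}{2}\beta(t)\, X_t \, dt + \sqrt{\beta(t)}\, dW_t   (forward, noising)
    dX_t = \Big[-\tfrac{1}{2}\beta(t)\, X_t - \beta(t)\, \nabla_x \log p_t(X_t)\Big] dt + \sqrt{\beta(t)}\, d\bar{W}_t   (time reversal, generative)
where the score \nabla_x \log p_t is approximated by a neural network trained with denoising score matching.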
29 March 2023, Thomas Serre (Brown University)
Title: Feedforward and feedback processes in visual processing
Abstract: Progress in deep learning has spawned great successes in many engineering applications. As a prime example, convolutional neural networks, a type of feedforward neural networks, are now approaching - and sometimes even surpassing - human accuracy on a variety of visual recognition tasks. In this talk, however, I will show that these neural networks (and recent extensions) exhibit a limited ability to solve seemingly simple visual reasoning problems. Our group has developed a computational neuroscience model of the feedback circuitry found in the visual cortex. The model was constrained by the anatomy and physiology of the visual cortex and shown to account for diverse visual illusions - providing computational evidence for a novel canonical circuit that is shared across visual modalities. I will show that this computational neuroscience model can be turned into a modern end-to-end trainable deep recurrent network architecture that addresses some of the shortcomings exhibited by state-of-the-art feedforward networks for visual reasoning. This suggests that neuroscience may contribute powerful new ideas and approaches to computer science and artificial intelligence.
Thursday 2nd of February 2023, Rémi Gribonval (INRIA)
Title: Rapture of the deep: highs and lows of sparsity in a world of depths
Abstract: Promoting sparse connections in neural networks is natural to control their complexity. Besides, given its thoroughly documented role in inverse problems and variable selection, sparsity also has the potential to give rise to learning mechanisms endowed with certain interpretability guarantees. Through an overview of recent explorations around this theme, I will compare and contrast classical sparse regularization for inverse problems with multilayer sparse regularization. During our journey, I will notably highlight the role of rescaling-invariances in deep parameterizations. In the process we will also be reminded that there is life beyond gradient descent, as illustrated by an algorithm that brings speedups of up to two orders of magnitude when learning certain fast transforms via multilayer sparse factorization.
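A minimal example of the rescaling invariance mentioned above, assuming ReLU activations: by positive homogeneity of σ(z) = max(z, 0), for any α > 0 a two-layer network satisfies
    W_2 \, \sigma(W_1 x) = (\alpha W_2) \, \sigma\!\left(\tfrac{1}{\alpha} W_1 x\right),
so infinitely many parameterizations encode the same function, which matters both for regularization and for optimization.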
Thursday December 15 2022, Wolfram Pernice (University of Münster)
Title: Computing beyond Moore's law with photonic hardware
Abstract: Conventional computers are organized around a centralized processing architecture, which is well suited to running sequential, procedure-based programs. Such an architecture is inefficient for computational models that are distributed, massively parallel and adaptive, most notably those used for neural networks in artificial intelligence. In these application domains, demand for high throughput, low latency and low energy consumption is driving the development of not only new architectures, but also new platforms for information processing. Photonic circuits are emerging as one promising candidate platform and allow for realizing the underlying computing architectures, which process optical signals in analogy to electronic integrated circuits. Therein, electrical connections are replaced with photonic waveguides, which guide light to desired locations on chip. Through heterogeneous integration, photonic circuits, which are normally passive in their response, are able to display active functionality and thus provide the means to build neuromorphic systems capable of learning and adaptation. In reconfigurable photonic architectures, in-memory computing allows for overcoming the separation between memory and central processing unit, providing a route for designing artificial neural networks that operate entirely in the optical domain.
Thursday 10th of November 2022, Andrea Liu (University of Pennsylvania)
Title: Machine Learning Glassy Dynamics
Abstract: The three-dimensional glass transition is an infamous example of an emergent collective phenomenon in many-body systems that is stubbornly resistant to microscopic understanding using traditional statistical physics approaches. Establishing the connection between microscopic properties and the glass transition requires reducing vast quantities of microscopic information to a few relevant microscopic variables and their distributions. I will demonstrate how machine learning, designed for dimensional reduction, can provide a natural way forward when standard statistical physics tools fail. We have harnessed machine learning to identify a useful microscopic structural quantity for the glass transition, have applied it to simulation and experimental data, and have used it to build a new model for glassy dynamics.
Tuesday October 4 2022, Freddy Bouchet (ENS Lyon)
Title: Probabilistic forecast of extreme heat waves using convolutional neural networks and rare event simulations
Abstract: Understanding extreme events and their probability is key for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Extreme heatwaves are, and likely will be in the future, among the deadliest weather events. Forecasting their occurrence probability a few days, weeks, or months in advance is a primary challenge for risk assessment and attribution, but also for fundamental studies about processes, dataset and model validation, and climate change studies. We will demonstrate that deep neural networks can predict the probability of occurrence of long-lasting 14-day heatwaves over France, up to 15 days ahead of time for fast dynamical drivers (500 hPa geopotential height fields), and at much longer lead times for slow physical drivers (soil moisture). This forecast is made seamlessly in time and space, for fast hemispheric and slow local drivers. A key scientific message is that training deep neural networks to predict extreme heatwaves takes place in a regime of drastic lack of data. We suggest that this is likely the case for most other applications of machine learning to large-scale atmosphere and climate phenomena. We discuss perspectives for dealing with this lack of data, for instance using rare event simulations. Rare event simulations are a very efficient tool to drastically oversample the statistics of rare events. We will discuss the coupling of machine learning approaches, for instance the analogue method, with rare event simulations, and discuss their efficiency and their future interest for climate simulations.
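As an illustration of the forecasting setup, a generic probabilistic classifier of this kind (not the architecture used in the work; all names and sizes below are placeholders) could look like:
    # Sketch only: a CNN mapping a gridded 500 hPa geopotential height anomaly
    # field to a probability of heatwave occurrence (placeholder sizes and names).
    import torch
    import torch.nn as nn

    class HeatwaveCNN(nn.Module):
        def __init__(self, in_channels=1):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),     # global average pooling over the grid
            )
            self.head = nn.Linear(32, 1)

        def forward(self, x):
            z = self.features(x).flatten(1)
            return torch.sigmoid(self.head(z)).squeeze(1)   # P(heatwave | field)

    model = HeatwaveCNN()
    fields = torch.randn(8, 1, 64, 128)            # batch of lat x lon anomaly maps
    labels = torch.randint(0, 2, (8,)).float()     # heatwave occurred or not
    loss = nn.functional.binary_cross_entropy(model(fields), labels)
The scarcity of observed extremes discussed in the abstract is precisely why such a classifier, trained naively, struggles, motivating the rare event simulations.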
Thursday June 30th, 2022, Chris Wiggins (New York Times, Columbia)
Title: DS@NYT
Abstract: The Data Science group at The New York Times develops and deploys machine learning solutions to newsroom and business problems. Re-framing real-world questions as machine learning tasks requires not only adapting and extending models and algorithms to new or special cases but also sufficient breadth to know the right method for the right challenge. I'll first outline how
- unsupervised,
- supervised, and
- reinforcement learning methods
are increasingly used in human applications for
- description,
- prediction, and
- prescription,
respectively. I'll then focus on the 'prescriptive' cases, showing how methods from the reinforcement learning and causal inference literatures can be of direct impact in
- engineering,
- business, and
- decision-making more generally.
Thursday May 12th, 2022, Gerard Ben Arous (New York University)
Title: Effective dynamics and critical scaling for Stochastic Gradient Descent in high dimensions
Abstract: SGD in high dimensions is a workhorse of high-dimensional statistics and machine learning, but understanding its behavior is not yet a simple task. We study here the limiting 'effective' dynamics of some summary statistics for SGD in high dimensions, and find interesting new regimes, i.e. not the expected one given by the population gradient flow. We find that a new corrector term is needed and that the phase portrait of these dynamics is substantially different from what would be predicted using the classical approach, including for simple tasks. (joint work with Reza Gheissari (UC Berkeley) and Aukosh Jagannath (Waterloo))
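To fix notation (a generic setting; details differ in the talk): online SGD with step size δ on i.i.d. samples Y^k updates
    \theta^{k+1} = \theta^{k} - \delta \, \nabla_\theta L(\theta^{k}; Y^{k+1}), \qquad u^{k} = u(\theta^{k}),
and the object of study is the limiting trajectory of a low-dimensional summary statistic u as the dimension grows; in the regimes identified by the authors, the naive ODE obtained from the population gradient flow must be supplemented with a corrector term.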
Friday April 8th, 2022, Florence d'Alché-Buc (Telecom ParisTech)
Title: Learning to predict complex outputs: a kernel view
Abstract: Motivated by prediction tasks such as molecule identification or functional regression, we propose to leverage the notion of kernel to take into account the nature of output variables, whether they be discrete structures or functions. This approach boils down to encoding output data as vectors of the reproducing kernel Hilbert space associated with the so-called output kernel. We present vector-valued kernel machines to implement it and discuss different learning problems linked with the chosen loss function. Finally, large-scale approaches can be developed using low-rank approximations of the outputs. We illustrate the framework on graph prediction and infinite task learning.
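A compact sketch of the output-kernel idea in its standard form (which may differ in detail from the talk): embed outputs with the feature map ψ of the output kernel k_Y, regress into that space, then decode by a pre-image step:
    \min_{h} \; \sum_{i=1}^{n} \big\| \psi(y_i) - h(x_i) \big\|^{2} + \lambda \|h\|^{2}, \qquad \hat{y}(x) = \arg\max_{y \in \mathcal{Y}} \; \sum_{i=1}^{n} \alpha_i(x) \, k_Y(y_i, y),
where, for a kernel ridge solution, the weights α(x) = (K_X + λ I)^{-1} k_X(x) depend only on the input Gram matrix K_X and the vector of input-kernel evaluations k_X(x).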
Thurs. Dec. 9th, 2021, Brice Ménard (Johns Hopkins University)
Title: Data science and science with data
Abstract: The young field of machine learning has changed the ways we interact with data, and neural networks have made us appreciate the potential of working with millions of parameters. Interestingly, the vast majority of scientific discoveries today are not based on these new techniques. I will discuss the contrast between these two regimes and show how an intermediate approach, i.e. neural-network-inspired but mathematically defined statistics (scattering and phase harmonic transforms), can provide the long-awaited tools in scientific research. I will illustrate these points using astrophysics as an example.
Thurs. Oct. 21st, 2021, Eric Vanden-Eijnden (NYU)
[video]
Title: Machine learning and applied mathematics
Abstract: The recent success of machine learning suggests that neural networks may be capable of approximating high-dimensional functions with controllably small errors. As a result, they could outperform standard function interpolation methods that have been the workhorses of scientific computing but do not scale well with dimension. In support of this prospect, here I will review what is known about the trainability and accuracy of shallow neural networks, which offer the simplest instance of nonlinear learning in functional spaces that are fundamentally different from classic approximation spaces. The dynamics of training in these spaces can be analyzed using tools from optimal transport and statistical mechanics, which reveal when and how shallow neural networks can overcome the curse of dimensionality. I will also discuss how scientific computing problems in high dimension once thought intractable can be revisited through the lens of these results, focusing on applications related to (i) solving Fokker-Planck equations associated with high-dimensional systems displaying metastability and (ii) sampling Boltzmann-Gibbs distributions using generative models to assist MCMC methods.
Thurs. April 29th, 2021, Giuseppe Carleo (EPFL)
[Slides]
Title: Learning Solutions to the Schrödinger equation with Neural-Network Quantum States
Abstract: The theoretical description of several complex quantum phenomena fundamentally relies on many-particle wave functions and our ability to represent and manipulate them. Variational methods in quantum mechanics aim at compact descriptions of many-body wave functions in terms of parameterised ansatz states, and are at present undergoing exciting transformative developments informed by ideas developed in machine learning. In this presentation I will discuss variational representations of quantum states based on artificial neural networks [1] and their use in approximately solving the Schrödinger equation. I will further highlight the general representation properties of such states, the crucial role of physical symmetries, as well as the connection with other known representations based on tensor networks [2]. Finally, I will discuss how some classic ideas in machine learning, such as the Natural Gradient, are being used and re-purposed in quantum computing applications [3].
[1] Carleo and Troyer, Science 355, 602 (2017)
[2] Sharir, Shashua, and Carleo, arXiv:2103.10293 (2021)
[3] Stokes, Izaac, Killoran, and Carleo, Quantum 4, 269 (2020)
Thurs. March 25th, 2021, Caroline Uhler (MIT)
Title: Causality and Autoencoders in the Light of Drug Repurposing for COVID-19
Abstract: Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (genomics, advertisement, education, etc.). In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (video, audio, interventional, observational, etc.). Using genomics as an example, I will first discuss our recent work on coupling autoencoders to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We then provide a theoretical analysis of autoencoders linking overparameterization to memorization. In particular, I will characterize the implicit bias of overparameterized autoencoders and show that such networks trained using standard optimization methods implement associative memory. We end by demonstrating how these ideas can be applied for drug repurposing in the current COVID-19 crisis.
Thurs. Feb. 18th, 2021, Josh McDermott (MIT)
[Slides]
Title: New Models of Human Hearing via Machine Learning
Abstract: Humans derive an enormous amount of information about the world from sound. This talk will describe our recent efforts to leverage contemporary machine learning to build neural network models of our auditory abilities and their instantiation in the brain. Such models have enabled a qualitative step forward in our ability to account for real-world auditory behavior and illuminate function within auditory cortex. But they also exhibit substantial discrepancies with human perceptual systems that we are currently trying to understand and eliminate.
Jan. 14th, 2021, Eero Simoncelli (New York University)
[video]
Title: Sampling and Solving Linear Inverse Problems Using the Prior Implicit in a Denoiser
Abstract: Prior probability models are a central component of many image processing problems, but density estimation is a notoriously difficult problem for high dimensional signals such as photographic images. Deep neural networks have provided impressive solutions for problems such as denoising, which implicitly rely on a prior probability model of natural images. I’ll describe our progress in understanding and using this implicit prior. We rely on a little-known statistical result due to Miyasawa (1961), who showed that the least-squares solution for removing additive Gaussian noise can be written directly in terms of the gradient of the log of the noisy signal density. We use this fact to develop a stochastic coarse-to-fine gradient ascent procedure for drawing high-probability samples from the implicit prior embedded within a CNN trained to perform blind (i.e., unknown noise level) least-squares denoising. A generalization of this algorithm to constrained sampling provides a method for using the implicit prior to solve any linear inverse problem, with no additional training. We demonstrate this general form of transfer learning in multiple applications, using the same algorithm to produce high-quality solutions for deblurring, super-resolution, inpainting, and compressive sensing. Joint work with Zahra Kadkhodaie, Sreyas Mohan, and Carlos Fernandez-Granda
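For reference, the 1961 result referred to in the abstract (also known as Tweedie's formula): if y = x + n with Gaussian noise n ~ N(0, σ²I), the least-squares (MMSE) denoiser can be written as
    \hat{x}(y) = \mathbb{E}[x \mid y] = y + \sigma^{2} \, \nabla_{y} \log p(y),
so a network trained for blind least-squares denoising implicitly provides an approximation of the score ∇ log p, the quantity ascended by the sampling procedure described above.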
Nov. 12th, 2020, Alice Guionnet (ENS Lyon)
Title: Rare events in Random Matrices and Applications
Abstract: In this talk, I will discuss recent developments in the theory of large deviations in random matrix theory and their applications in statistical learning.
Oct. 8th, 2020, Andrew Saxe (Oxford)
[video]
Title: The Neural Race Reduction: Dynamics of learning in ReLU networks
Abstract: What is the relationship between task geometry, network architecture, and emergent learning dynamics in nonlinear deep networks? I will describe the neural race reduction, which describes gradient descent learning dynamics in ReLU networks in the feature learning regime for a subset of nonlinear tasks. The reduction reveals a bias in gradient descent dynamics toward exploiting shared structure and abstraction where possible. I will then turn to an fMRI experiment testing predicted representational geometry in a nonlinear context-dependent task. These results provide a new window into learning dynamics in nonlinear neural networks.
Feb. 27th, 2020, Michael Biehl (University of Groningen)
[video]
Title: Prototype-Based Classifiers and Their Application in the Life Sciences
Abstract: This talk briefly reviews important aspects of prototype-based systems in the context of supervised learning. A key issue is the choice of an appropriate distance or similarity measure for the task at hand. The powerful framework of relevance learning will be discussed, in which parameterized distance measures are adapted together with the prototypes in the same data-driven training process. Example applications in the bio-medical domain are presented in order to illustrate the concept: (I) the classification of adrenocortical tumors using steroid metabolomics data, (II) the early diagnosis of rheumatoid arthritis based on cytokine expression and (III) the detection and discrimination of neurodegenerative diseases in 3D brain images.
Feb. 6th, 2020, Martin Weigt (Sorbonne Université)
[video]
Title: From generative models of protein sequences to evolution-guided protein design
Abstract: Thanks to the sequencing revolution in biology, protein sequence databases have been growing exponentially over the last years. Data-driven computational approaches are becoming more and more popular in exploring this increasing data richness. In my talk, I will show that global statistical modeling approaches, like (Restricted) Boltzmann Machines, are able to accurately capture the natural variability of amino-acid sequences across entire families of evolutionarily related but distantly diverged proteins. We show that these models are biologically interpretable; they make it possible to extract information about the three-dimensional protein structure and about protein-protein interactions from sequence data, and they unveil distributed sequence motifs. These models can be seen as highly performant generative models - they capture the natural sequence variability far beyond fitted quantities, and they make it possible to design novel, fully functional proteins by simple MCMC sampling approaches.
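A toy illustration of the MCMC sequence design mentioned at the end of the abstract (a generic Metropolis sampler over a Potts-like energy; the fields and couplings below are random placeholders, whereas in practice they would be inferred from an alignment of a protein family):
    # Toy sketch: Metropolis sampling of sequences from a Potts-like model.
    import numpy as np

    rng = np.random.default_rng(0)
    L, q = 50, 21                              # sequence length, alphabet size (20 aa + gap)
    h = rng.normal(0.0, 0.1, (L, q))           # placeholder fields
    J = rng.normal(0.0, 0.05, (L, L, q, q))    # placeholder couplings
    J = (J + J.transpose(1, 0, 3, 2)) / 2      # symmetrize

    def energy(s):
        e = -h[np.arange(L), s].sum()
        for i in range(L):
            for j in range(i + 1, L):
                e -= J[i, j, s[i], s[j]]
        return e

    def metropolis(steps=20000, beta=1.0):
        s = rng.integers(q, size=L)            # random initial sequence
        E = energy(s)
        for _ in range(steps):
            i, a = rng.integers(L), rng.integers(q)   # propose a single mutation
            s_new = s.copy(); s_new[i] = a
            E_new = energy(s_new)
            if rng.random() < np.exp(-beta * (E_new - E)):
                s, E = s_new, E_new
        return s

    designed = metropolis()                    # one sampled ("designed") sequence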
Bio: Martin Weigt is Professor for Computational Biology at Sorbonne Université, Paris, where he heads the research team 'Statistical Genomics and Biological Physics' within the Laboratory of Computational and Quantitative Biology (LCQB). Combining his original scientific background in theoretical and statistical physics with the exploding data richness in genomics and biology, he is particularly interested in the development of data-driven modeling approaches for biological sequences, their evolution and de novo design.
Nov. 26th, 2019, Yue M. Lu (John A. Paulson School of Engineering and Applied Sciences, Harvard University)
Title: Exploiting the Blessings of Dimensionality in Big Data
Abstract: The massive datasets being compiled by our society present new challenges and opportunities to the field of signal and information processing. The increasing dimensionality of modern datasets offers many benefits. In particular, the very high-dimensional settings allow one to develop and use powerful asymptotic methods in probability theory and statistical physics to obtain precise characterizations that would otherwise be intractable in moderate dimensions. In this talk, I will present recent work where such blessings of dimensionality are exploited. In particular, I will show (1) the exact characterization of a widely-used spectral method for nonconvex statistical estimation; (2) the fundamental limits of solving the phase retrieval problem via linear programming; and (3) how to use scaling and mean-field limits to analyze nonconvex optimization algorithms for high-dimensional inference and learning. In these problems, asymptotic methods not only clarify some of the fascinating phenomena that emerge with high-dimensional data, they also lead to optimal designs that significantly outperform heuristic choices commonly used in practice.
June 11th, 2019, Jean-Remi King (ENS)
[video]
Title: From brains to algorithms: parsing neuroimaging data to infer the computational architecture of human cognition.
Abstract: While machine learning is an autonomous research field, a number of historical (e.g. artificial neural networks) as well as more recent computational strategies (e.g. attentional gating) have been influenced by cognitive and neuroscientific findings. To what extent can cognitive neuroscience continue to guide and intersect with the development of machine learning? To highlight potential directions for this major question, I will present three studies that investigate the computational organization of brain processing. For each of them, I will show that we can use non-invasive neuroimaging techniques with high temporal precision to parse the computational stages of visual processing in the healthy human brain. Our results show that the raw visual input that bombards our retina is progressively transformed into meaningful representations by a hierarchical algorithm, distributed both over time and space. Finally, I will briefly show how these methods can now be applied to understand language processing in humans, and thus help us tackle the modern challenges of machine learning.
May 9th, 2019, Matthieu Husson (Observatoire de Paris)
Title: Artificial Intelligence and data sets from the history of astronomy: new opportunities?
Abstract: The recent development of Digital Humanities and the requirement to publish research data alongside research results are transforming the availability of historical sources, along with the means to analyse, edit, and relate them. In this context, DISHAS relies on a network of international projects in Chinese, Sanskrit, Arabic, Latin and Hebrew sources in the history of the astral sciences and aims at providing tools to edit and analyse the different types of sources usually treated in the field, namely prose and versified texts, iconography and technical/geometrical diagrams, and astronomical tables. This leads to the progressive constitution of precisely described datasets that are a promising field of experimentation for data sciences in general and artificial intelligence in particular. In this presentation we want to introduce our datasets, their characteristics and their historical meaning. We will discuss different lines of research that could converge toward data science topics, with a specific focus on understanding historical actors' computations from the analysis of the numerical tables they produced.
March 25th, 2019, Béatrice Prunel and Gregory Chatonsky (ENS)
Title: Art and artificial imagination
Abstract: Contemporary media are fascinated by the applications of neural networks in creation. They regularly highlight moments when artificial artistic productions have "deceived" humans and "replaced" artists. All this seems to confirm that AI has conquered even the last ramparts of humanity: interiority and creativity. The dialogue between an art historian invested in the digital humanities and an artist who is himself familiar with deep learning invites us to change our perspective. A historical and materialistic approach makes it possible to better distinguish what is new in the apparent emergence of AI in the arts and to better grasp the implicit conception of art that develops there: the repurposing of a new technique, which generates surprising results, is also a way of thwarting the assumptions of the contemporary economic system. It suggests a critique of that system as much as it opens up new possibilities.
October 17th, 2018, Emmanuel Dupoux (EHESS)
[video]
Title: Towards developmental AI
Abstract: Even though current machine learning techniques yield systems that achieve parity with humans on several high-level tasks, the learning algorithms themselves are orders of magnitude less data efficient than those used by humans, as evidenced by the speed and resilience with which infants learn language and common sense. I review some of our recent attempts to reverse engineer such abilities in the areas of unsupervised or weakly supervised learning of speech representations, the segmentation of speech terms, and the learning of the laws of intuitive physics by observation of videos. I argue that a triple effort in data collection, algorithm development and fine-grained human/machine comparisons is needed to uncover these developmental algorithms.
Bio: E. Dupoux is a full professor at the Ecole des Hautes Etudes en Sciences Sociales (EHESS), and directs the Cognitive Machine Learning team at the Ecole Normale Supérieure (ENS) in Paris and INRIA (www.syntheticlearner.com). His education includes a PhD in Cognitive Science (EHESS), an MA in Computer Science (Orsay University) and a BA in Applied Mathematics (Pierre & Marie Curie University, ENS). His research mixes developmental science, cognitive neuroscience, and machine learning, with a focus on the reverse engineering of infant language and cognitive development using unsupervised or weakly supervised learning. He is the recipient of an Advanced ERC grant, the organizer of the Zero Resource Speech Challenge (2015, 2017) and the Intuitive Physics Benchmark (2017), and led in 2017 a Jelinek Summer Workshop at CMU on multimodal speech learning. He has authored 150 articles in various peer-reviewed outlets.
June 12th, 2018, Joan Bruna (New York University)
[video]
Title: Learning Graph Inverse Problems with Neural Networks
Abstract: Inverse problems on graphs encompass many areas of physics, algorithms and statistics, and are a confluence of powerful methods, ranging from computational harmonic analysis and high-dimensional statistics to statistical physics. As with inverse problems in signal processing, learning has emerged as an intriguing alternative to regularization and other computationally tractable relaxations, opening up new questions in which high-dimensional optimization, neural networks and data play a prominent role. In this talk, I will argue that several tasks that are 'geometrically stable' can be well approximated with Graph Neural Networks, a natural extension of Convolutional Neural Networks to graphs. I will present recent work on supervised community detection, quadratic assignment, neutrino detection and beyond, showing the flexibility of GNNs to extend classic algorithms such as Belief Propagation.
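For concreteness, one generic graph-convolution step (a simple GCN-style layer; the architectures in the talk are richer, e.g. with learned operator families and edge features):
    # One generic graph-convolution step: normalize the adjacency, aggregate
    # neighbour features, apply a shared linear map and a ReLU.
    import numpy as np

    def gcn_layer(A, H, W):
        A_hat = A + np.eye(A.shape[0])              # add self-loops
        d = A_hat.sum(axis=1)
        A_norm = A_hat / np.sqrt(np.outer(d, d))    # symmetric degree normalization
        return np.maximum(A_norm @ H @ W, 0.0)

    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # toy 3-node graph
    H = np.eye(3)                                                  # one-hot node features
    W = np.random.default_rng(0).normal(size=(3, 4))               # learnable weights
    H1 = gcn_layer(A, H, W)                                        # updated node embeddings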
Bio: Joan Bruna is an Assistant Professor of Computer Science, Data Science and Mathematics (affiliated) at the Courant Institute of Mathematical Sciences, New York University, and at the Center for Data Science. His research interests touch several areas of Machine Learning, Signal Processing and High-Dimensional Statistics. In particular, in the past few years he has been working on Convolutional Neural Networks, studying some of their theoretical properties, extensions to more general geometries, and applications to physical sciences and statistics. Before that, he worked at FAIR (Facebook AI Research) in New York. Prior to that, he was a postdoctoral researcher at Courant Institute, NYU. He completed his PhD in 2013 at Ecole Polytechnique, France. He is the recipient of an Alfred P. Sloan Fellowship (2018), and he has organized multiple tutorials and workshops on geometric deep learning, including NIPS and CVPR in 2017.
May 15th, 2018, Balázs Kégl (Université Paris Saclay)
[video]
Title: Machine learning in scientific workflows
Abstract: I will describe our contributions to scientific ML workflow building and optimization, which we have carried out within the Paris-Saclay Center for Data Science. I will start by mapping out the different use cases of machine learning in sciences (data collection, inference, simulation, hypothesis generation). Then I will detail some of the particular challenges of ML/science collaborations and the solutions we built to solve these challenges. I will briefly describe the open code submission RAMP tool that we built for collaborative prototyping, detail some of the workflows (e.g., the Higgs boson discovery pipeline, El Nino forecasting, detecting Mars craters on satellite images), and present results on rapidly optimizing machine learning solutions.
Bio: Balázs Kégl received the Ph.D. degree in computer science from Concordia University, Montreal, in 1999. From January to December 2000 he was a Postdoctoral Fellow at the Department of Mathematics and Statistics at Queen's University, Kingston, Canada, holding an NSERC Postdoctoral Fellowship. He was in the Department of Computer Science and Operations Research at the University of Montreal as an Assistant Professor from 2001 to 2006. Since 2006 he has been a research scientist in the Linear Accelerator Laboratory of the CNRS. He has published more than a hundred papers on unsupervised and supervised learning (principal curves, intrinsic dimensionality estimation, boosting), large-scale Bayesian inference and optimization, and on various applications ranging from music and image processing to systems biology and experimental physics. At his current position he has been the head of the AppStat team, working on machine learning and statistical inference problems motivated by applications in high-energy particle and astroparticle physics. Since 2014, he has been the head of the Center for Data Science of the University of Paris-Saclay. In 2016 he co-created the RAMP platform (www.ramp.studio).
March 13th, 2018, Elizabeth Purdom (Berkeley)
[video]
Title: Statistical challenges in analyzing high-dimensional experiments in molecular biology
Abstract: Molecular biology experiments frequently collect tens of thousands of measurements on a cell in order to obtain a full snapshot of the activity of the cell. Analysis of these experiments requires the integration of statistical techniques with the unique biological aspects of these experiments. In this talk I will give an overview of the data challenges faced in these settings, as well as highlight how the solutions to these problems compare to those used in the wider data science community. I will illustrate these points with examples from my research in developing methods for the analysis of measurements of mRNA abundance of cells.
Feb. 6th, 2018, Maureen Clerc (INRIA)
[video]
Title: Brain-computer interfaces: two concurrent learning problems
Abstract: Brain-Computer Interfaces (BCI) are systems which provide real-time interaction through brain activity, bypassing traditional interfaces such as keyboard or mouse. A target application of BCI is to restore mobility or autonomy to severely disabled patients. In BCI, new modes of perception and interaction come into play, which users must learn, just as infants learn to explore their sensorimotor system. Feedback is central to this learning. From the point of view of the system, features must be extracted from the brain activity and translated into commands. Feature extraction and classification are thus important components of a BCI, and adaptive learning strategies are required because of the high variability of the brain signals. Moreover, additional markers may also be extracted to modulate the system's behavior. It is for instance possible to monitor the brain's reaction to the BCI outcome. In this talk I will present some of the current machine learning methods which are used in BCI, and the adaptation of BCI to users' needs.
Nov. 14th, 2017, Rémi Monasson (ENS)
[video]
Title: Searching for interaction networks in proteins: from statistical physics to machine learning, and back
Abstract: Over the last century, statistical physics has been extremely successful in predicting the collective behaviour of many physical systems from detailed knowledge of their microscopic components. However, complex systems, whose properties result from the delicate interplay of many strong and heterogeneous interactions, are notoriously difficult to tackle with first-principle approaches. It is therefore tempting to use data to infer adequate microscopic models. I will present some efforts made along this direction for proteins, based on the well-known Potts model of statistical mechanics, with an emphasis on computational and theoretical aspects. I will then show how machine learning, whose unsupervised models encompass the Potts model, can be an inspiring source of new questions for statistical mechanics.
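For reference, the Potts model referred to here assigns to each amino-acid sequence (a_1, ..., a_L) an energy of the form
    E(a_1, \dots, a_L) = -\sum_{i} h_i(a_i) \; - \sum_{i<j} J_{ij}(a_i, a_j),
with local fields h_i and pairwise couplings J_ij inferred from sequence data; large inferred couplings are commonly used to predict contacts in the three-dimensional structure.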
Oct. 3rd, 2017, Jean-Luc Starck (CEA)
[video]
Title: Cosmostatistics: Tackling Big Data from the Sky
Abstract: Since the dawn of time, humans have been wondering about their place in the Universe. Over the past century, advances in modern physics, technology and engineering, along with the unique possibilities offered by space missions, have opened new windows to explore the cosmos. All-sky surveys, with observations across the entire electromagnetic spectrum, are the best strategy to fully understand and model the Universe in detail. Major upcoming research facilities, such as the Large Synoptic Survey Telescope (LSST), the Square Kilometre Array (SKA) and the Euclid space telescope, will provide key elements to addressing this challenge by producing high-quality data of petabyte volumes. These surveys constitute a major 'big data' challenge, which requires the development of innovative statistical methods essential both for the data analysis and for their physical interpretation. I will present some highlights of this methodology and more specifically show how novel techniques of sparsity and compressed sensing open new perspectives in analysing cosmological data. These enable us to answer fundamental questions about the nature of our Universe with impressive accuracy.
June 27th, 2017, Guillermo Sapiro (Duke University)
Title: Learning to Succeed while Teaching to Fail: Privacy in Closed Machine Learning Systems
Abstract: Security, privacy, and fairness have become critical in the era of data science and machine learning. More and more we see that achieving universally secure, private, and fair systems is practically impossible. We have seen for example how generative adversarial networks can be used to learn about the expected private training data; how the exploitation of additional data can reveal private information in the original one; and how seemingly unrelated features can teach us about each other. Confronted with this challenge, in this work we open a new line of research, where security, privacy, and fairness are addressed in a closed environment. The goal is to ensure that a given entity, trusted to infer certain information with our data, is blocked from inferring protected information from it. For example, a hospital might be allowed to produce a diagnosis for the patient (the positive task), without being able to infer the irrelevant gender of the subject (negative task). Similarly, a company can guarantee that internally it is not using the provided data for any undesired task, an important goal that does not contradict the virtually impossible challenge of blocking everybody from the undesired task. We design a system that learns to perform the positive task while simultaneously being trained to fail at the negative one, and illustrate this with challenging cases where the positive task is actually harder than the negative one. The particular framework and examples open the door to security, privacy, and fairness in very important closed scenarios. Joint work with Jure Sokolic and Miguel Rodrigues.
Tuesday, May 16th, 2017, Sophie Deneve (ENS)
[video]
Title: The brain as an optimal efficient adaptive learner
Abstract: Understanding how neural networks can learn to predict and represent time-varying variables is a fundamental challenge in neuroscience. A key complication is the error credit assignment problem: how to determine the local contribution of each synapse to the network’s global output error. Previous work on solving this problem in spiking networks has either been restricted to linear systems (Boerlin, Machens, Deneve 2013; Bourdoukan, Deneve 2015), or to non-local learning rules (FORCE learning; Sussillo, Abbott 2009; Thalmeier et al 2016). Here we show how to learn arbitrary non-linear dynamical systems with local learning rules. Our approach uses tools from adaptive control theory, and applies them to a spiking network with nonlinear dendrites. The spiking network receives its own tracking error through feedback and learns to approximate a nonlinear dynamical system using a purely local learning rule. The local credit assignment problem is solved because each neuron effectively contains partial information of the error made by the entire network. This error is captured by the tightly balanced voltage of each neuron. Here, a balanced network effectively acts as a predictive auto-encoder that learns to cancel its own error and feedback. The resulting network is extremely efficient in terms of the number of spikes fired, and it is highly robust to noise and neural elimination. It produces asynchronous, irregular spiking activity matching the Poisson-like neural variability observed experimentally. Our framework has several important implications. It suggests that a global learning problem, like learning to implement complex nonlinear dynamics from examples, can be solved with local rules, as long as output errors are fed back as driving input signals. Our approach inherits the analytical tools of control theory such as convergence and stability theorems that can now be applied to learning in spiking networks.
March. 7th, 2017, Francis Bach (INRIA)
[video]
Title: Beyond stochastic gradient descent for large-scale machine learning
Abstract: Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ('large n') and each of these is large ('large p'). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt).
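As a minimal illustration of the single-pass versus multi-pass distinction (a toy least-squares example, not the specific algorithms analyzed in the talk):
    # Toy comparison: one pass of averaged SGD versus full-batch gradient descent
    # on least-squares regression (illustration only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 10000, 20
    X = rng.normal(size=(n, p))
    w_star = rng.normal(size=p)
    y = X @ w_star + 0.5 * rng.normal(size=n)

    # Single pass of stochastic gradient descent with iterate averaging
    w, w_avg, step = np.zeros(p), np.zeros(p), 0.01
    for i in range(n):
        grad = (X[i] @ w - y[i]) * X[i]      # gradient on one observation
        w -= step * grad
        w_avg += (w - w_avg) / (i + 1)       # running (Polyak-Ruppert) average

    # Full-batch gradient descent, many passes over the data
    w_batch = np.zeros(p)
    for _ in range(200):
        w_batch -= 0.01 * X.T @ (X @ w_batch - y) / n

    print(np.linalg.norm(w_avg - w_star), np.linalg.norm(w_batch - w_star))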
Feb. 21st, 2017, Yann Ollivier (CNRS and Paris-Sud)
Title: Artificial intelligence and inductive reasoning: from information theory to artificial neural networks
Abstract: Problems of inductive reasoning or extrapolation, such as "guessing the next term of a sequence of numbers" or, more generally, "understanding the structure hidden in observations", are fundamental if we ever want to build an artificial intelligence. One sometimes gets the impression that these problems are not mathematically well defined. Yet there exists a rigorous mathematical theory of inductive reasoning and extrapolation, based on information theory. This theory is very elegant, but difficult to apply.
In practice today, it is neural networks that give the best results on a whole range of concrete induction and learning problems (vision, speech recognition, and recently the game of Go or self-driving cars...). I will review some of the underlying mathematical principles and their link with information theory.
Short bio: Yann Ollivier is a researcher in computer science and mathematics at the CNRS, LRI, Université Paris-Saclay. After starting his career in pure mathematics, with topics ranging from probability to group theory, he decided to move to artificial intelligence, with an emphasis on artificial neural network training, deep learning, and their links with information theory. In 2011 he was awarded the bronze medal of the CNRS for his work.
Jan. 10th, 2017, Bertrand Thirion (INRIA and Neurospin)
Title: A big data approach towards functional brain mapping
Abstract: Functional neuroimaging offers a unique view on brain functional organization, which is broadly characterized by two features: the segregation of brain territories into functionally specialized regions, and the integration of these regions into networks of coherent activity. Functional Magnetic Resonance Imaging yields a spatially resolved, yet noisy view of this organization. It also yields useful measurements of brain integrity to compare populations and characterize brain diseases. To extract information from these data, a popular strategy is to rely on supervised classification settings, where signal patterns are used to predict the experimental task performed by the subject during a given experiment, which is a proxy for the cognitive or mental state of this subject. In this talk we will describe how the reliance on large data corpora changes the picture: it boosts the generalizability of the results and provides meaningful priors to analyze novel datasets. We will discuss the challenges posed by these analytic approaches, with an emphasis on computational aspects, and how non-labelled data can further be used to improve the models learned from brain activity data.
Dec. 15th, 2016, Special Inauguration Conference (-)
Title: Inauguration of the Chair CFM-ENS
Abstract: The ENS and CFM organize the inauguration conference of the Chair 'Modèles et Sciences des Données'.
Nov. 8th, 2016, Cristopher Moore (Santa Fe Institute)
[video]
Title: What physics can tell us about inference?
Abstract: There is a deep analogy between statistical inference and statistical physics; I will give a friendly introduction to both of these fields. I will then discuss phase transitions in two problems of interest to a broad range of data sciences: community detection in social and biological networks, and clustering of sparse high-dimensional data. In both cases, if our data becomes too sparse or too noisy, it suddenly becomes impossible to find the underlying pattern, or even to tell if there is one. Physics helps us both locate these phase transitions and design optimal algorithms that succeed all the way up to this point. Along the way, I will visit ideas from computational complexity, random graphs, random matrices, and spin glass theory.
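One concrete example of such a phase transition, for the symmetric stochastic block model with two equal-size groups where same-group pairs of nodes are connected with probability c_in/N and different-group pairs with probability c_out/N: community detection better than chance is possible if and only if
    (c_{\mathrm{in}} - c_{\mathrm{out}})^{2} > 2\,(c_{\mathrm{in}} + c_{\mathrm{out}}),
the detectability threshold conjectured from statistical physics arguments and later proved rigorously.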
Oct. 11th, 2016, Jean-Philippe Vert (Mines ParisTech, Institut Curie and ENS)
[video]
Title: Can Big Data cure Cancer?
Abstract: As the cost and throughput of genomic technologies reach a point where DNA sequencing is close to becoming a routine exam at the clinics, there is a lot of hope that treatments of diseases like cancer can dramatically improve by a digital revolution in medicine, where smart algorithms analyze « big medical data » to help doctors take the best decisions for each patient or to suggest new directions for drug development. While artificial intelligence and machine learning-based algorithms have indeed had a great impact on many data-rich fields, their application to genomic data raises numerous computational and mathematical challenges that I will illustrate with a few examples of patient stratification and drug response prediction from genomic data.