Past reading groups

You can find below the list of past Laplace reading groups.

Oct. 23rd, 2019, Carlo Lucibello (Bocconi University)

**Title:** *Approximate Survey Propagation for High-Dimensional Estimation*

**Abstract:** In Generalized Linear Estimation (GLE) problems, we seek to reconstruct a signal that is observed through a linear transform, possibly followed by a component-wise, nonlinear and noisy operation. In the Bayesian optimal setting, where the statistical properties of the generative process are perfectly known, Approximate Message Passing (AMP) is known to achieve optimal performance for GLE in terms of reconstruction error. However, its performance can significantly degrade whenever the model assumed for inference differs from the true posterior distribution, a situation frequently encountered in practice. Approximate Survey Propagation (ASP) is a message passing algorithm rooted in the statistical physics of glassy systems that shows the potential to address some of these shortcomings while adding little computational complexity. We will substantiate this claim by analyzing the performance of ASP in two archetypical settings: maximum likelihood estimation in the phase retrieval problem and L0 based reconstruction in compressed sensing.

May 10th, 2019, Matthew Colbrook (Cambridge University)

**Title:** *Can neural networks always be trained? On the boundaries of deep learning *

**Abstract:** Deep learning has emerged as a competitive new tool in image reconstruction. However, recent results demonstrate such methods are typically highly unstable – tiny, almost undetectable perturbations cause severe artefacts in the reconstruction, a major concern in practice. This is paradoxical given the existence of stable state-of-the-art methods for these problems. Thus, approximation theoretical results non-constructively imply the existence of stable and accurate neural networks. Hence the fundamental question: Can we explicitly construct/train stable and accurate neural networks for image reconstruction? I will discuss two results in this direction. The first is a negative result, saying such constructions are in general impossible, even given access to the solutions of common optimisation algorithms such as basis pursuit. The second is a positive result, saying that under sparsity assumptions, such neural networks can be constructed. These neural networks are stable and theoretically competitive with state-of-the-art results from other methods. Numerical examples of competitive performance are also provided.

Nov. 29th, 2018, Gui-Song Xia (Wuhan University)

**Title:** *Texture Synthesis with Conditional Generative CNNs*

**Abstract:** Example-based texture synthesis (EBTS) is a well-studied yet challenging problem in computer vision and graphics, it aims to produce textures that are visually similar to given exemplars. Although some breakthroughs have been achieved by recently proposed methods using deep neural networks, they still have some difficulties in synthesizing textures with complex structures, e.g. non-local and near-/regular geometric patterns. Moreover, most of them rely on large-scale deep networks pre-trained for recognition tasks. Aiming at better generation of highly structured textures with lighter deep neural networks, we present a conditional generative convolutional neural net-work (cgCNN) model for texture synthesis in this paper. Given an exemplar, our model defines a conditional distribution of synthesized textures using a light-weight CNN. Our model is then trained by iteratively learning and sampling from the conditional distribution. Instead of using a pre-trained CNN, our model learns the weights of CNN in the training process. Furthermore, our model can be easily extended to synthesize dynamic textures by directly adding a temporal dimension to the convolution kernels. Experiments demonstrate that our model achieve state-of-the-art performances, especially on highly structured textures.

Nov. 8th, 2018, Arthur Mensch (ENS)

**Title:** *Learning representations from functional MRI data*

**Abstract:** Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural activity responses to specific tasks or stimuli, or of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), that we study in a machine learning setting: we learn a model of brain activity that should generalize on unseen data. After reviewing the standard fMRI data analysis techniques, we propose new methods and models to benefit from the recently released large fMRI data repositories. Our goal is to learn richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI). We perform this analysis using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach relies on introducing randomness in stochastic optimization loops and provides speed-up of an order of magnitude on a variety of settings and datasets. We provide an extended empirical validation of our stochastic subsampling approach, for datasets from fMRI, hyperspectral imaging and collaborative filtering. We derive convergence properties for our algorithm, in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to work with fMRI data acquired on subject undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models, that predicts stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images, simultaneously to learning to decode these images from their reduced representation. This fosters transfer learning in between studies, as we learn the undocumented cognitive common aspects that the many fMRI studies share. As a consequence, our multi-study model performs better than single-study decoding. Our approach identifies universally relevant representation of brain activity, supported by a few task-optimized networks learned during model fitting. Finally, on a related topic, we show how to use dynamic programming within end-to-end trained deep networks, with applications in natural language processing.

July 11th, 2018, Juan Peypouquet (Universidad Técnica Federico Santa María (Chile))

**Title:** *Inertial proximal algorithms for maximally monotone operators*

**Abstract:** We present a Regularized Inertial Proximal Algorithm to solve convex optimization problems and variational inequalities. It is obtained by means of a convenient finite-difference discretization of a second-order differential equation with vanishing damping, governed by the Yosida regularization of a maximally monotone operator with time-varying index. These systems are the counterpart to accelerated forward–backward algorithms in the context of maximally monotone operators. A simple example illustrates the behavior of these systems compared with some of their close relatives.

June 8th, 2018, Mathieu Blondel (NTT (Kyoto))

**Title:** *Smoothing / Regularization Techniques for Probabilistic and Structured Classification*

**Abstract:** In this talk, I will present how smoothing / regularization techniques can be used to turn a prediction function that outputs a hard decision (e.g top-scoring class or top-scoring sequence) into a differentiable "soft" decision with a probabilistic perspective. I will present a novel principled loss family to learn such "soft" prediction functions from training data. In addition, I will show how to use them as a layer in a neural network trained end-to-end by backpropagation. Finally, I will present experimental results on natural language processing applications such as named entity recognition, dependency parsing and machine translation.

Joint work with Vlad Niculae, André Martins, Claire Cardie, Arthur Mensch

The talk will be a condensed overview of the following papers:

[1] https://arxiv.org/abs/1802.04223 (ICML 2018)

[2] https://arxiv.org/abs/1802.03676 (ICML 2018)

[3] https://arxiv.org/abs/1805.09717 (arXiv preprint)

June 8th, 2018, Alberto Bietti (Inria)

**Title:** *Group Invariance, Stability to Deformations, and Complexity of Deep Convolutional Representations*

**Abstract:** The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and how they affect learning guarantees is still missing. In this work, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning, and to provide a canonical measure of model complexity, the RKHS norm, which controls both stability and generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms.

https://arxiv.org/abs/1706.03078

May 18th, 2018, Jonathan Dong (LKB)

**Title:** *Scaling up Random Projections with multiple light scattering*

**Abstract:** Random Projections have proven extremely useful in many signal processing and machine learning applications. However, they often require either to store a very large random matrix, or to use a different, structured matrix to reduce the computational and memory costs. We overcome this difficulty with an analog, optical device, that performs the random projections literally at the speed of light without having to store any matrix in memory. This is achieved using the physical properties of multiple coherent scattering of light in random media. These efficient optical random projections are used in two different settings: to generate Random Features for kernel approximation and to iterate an Echo-State Network (a Recurrent Neural Network with fixed internal weights). This new method is fast, power efficient and easily scalable to very large networks: we reach sizes that exceed the RAM memory limit.

May 4th, 2018, Jean Feydy (ENS, DMA)

**Title:** *Riemannian Geometry for Computational Anatomy*

**Abstract:** How can one do statistics on datasets deprived of a suitable (0,+,*) algebraic structure? Can we compute the mean and PCA of a population of 3D meshes while guaranteeing the preservation of each shape's topology?

A Riemannian structure on a dataset (i.e. a collection of local euclidean metrics) is the most general tool allowing us to speak of "angles" and "continuous straight paths" between samples. Focusing on the case of 3D meshes endowed with a tearing-adverse structure, we will show that Riemannian priors can be enforced through the use of a generic 5-line program: the Hamiltonian shooting routine.

As automatic differentiation libraries now relieve us from the hassle of computing our metrics' derivatives, this talk may help you to consider using Riemannian manifolds in situations where linear models are too limited.

April 13th, 2018, Jerome Tubiana (ENS, LPT)

**Title:** *Compositional Representations in Restricted Boltzmann Machines: theory and applications.*

**Abstract:** Restricted Boltzmann Machines (RBM) form a family of probability distributions simple yet powerful for modeling high-dimensional, complex data sets. Besides learning a generative model, they also extract features, producing a graded and distributed representation of data. However, not all variants of RBM perform equally well, and little theoretical arguments exist for these empirical observations. By analyzing an ensemble of RBMs with random weights using statistical physics tools, we characterize the structural conditions (statistics of weights, choice of non-linearity…) allowing the emergence of such efficient representations.
Lastly, we present a new application of RBMs: the analysis of protein sequences alignments. We show that RBMs extract high-order patterns of coevolution that arise from the structural and functional constraints of the protein family. These patterns can be recombined to generate artificial protein sequences with prescribed chemical properties.

March 30th, 2018, Aaron Tranter (Australian National University)

**Title:** *Deep learning cold atomic systems for quantum memories*

**Abstract:** Quantum memories are integral to the realization of quantum information networks and quantum information processing. A promising platform is gradient echo memory (GEM) in cold atomic systems with demonstrated efficiencies of ~=87%. We demonstrate the first application of a deep learning algorithm to a cold atomic system in order to increase the optical depth (OD) of our atomic trap and thus increase memory efficiency. We demonstrate an improvement in OD from 530 +- 8 to 970 +- 20 by performing an optimization over 63 experimental parameters. We also observe a physical change in the atomic cloud corresponding to the spatial distribution of the atom cloud and apply the optimization to the GEM protocol.

Mar. 16th, 2018, Marine Le Morvan (Mines-Paristech)

**Title:** *Scaling up the LASSO with interaction features*

**Abstract:** Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1- regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound to efficiently identify working sets while avoiding to scan all features, and on fast computations inspired from solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.

Feb. 16th, 2018, Tomas Angles, Aude Genevay (ENS (DI, DMA))

**Title:** *Learning generative models beyond GANs*

**Abstract:** Generative Models (i.e. high dimensional probabilistic models that are supported on low dimensional manifolds) have become a popular topic in machine learning after the now famous paper on Generative Adversarial Networks (Goodfellow et al. ‘14) that introduced an intuitive algorithm to learn those types of models to generate images resembling those of a given dataset. This two part tutorial will start by an introduction to generative models, and focus on some of their theoretical aspects such as structure in the latent space, and an auto-encoding perspective. The second part will focus on learning such models with algorithms that minimize a certain divergence between the model distribution and the ‘true’ distribution of the data. This state-of-the-art approach has a strong connection to the original GAN algorithm but is both more stable and better formulated from a mathematical point of view.

Jan. 12th, 2018, Augustin Cosse (ENS, DMA)

**Title:** *Semidefinite programming in the era of big data*

**Abstract:** Semidefinite programming (SDP) has now emerged as a powerful tool to derive approximations to hard problems (such as maxCut), or as a stable algorithm to compute exact solutions to some particular (computationally tractable) instances of such problems (such as in phase retrieval and matrix completion).
The work of Shor, Nesterov, Parrilo and Lasserre brought semidefinite programming to another level by introducing hierarchies of semidefinite programs. In those hierarchies, an original polynomial optimization problem is approximated by a sequence of SDPs on variables of increasing size. Despite their theoretical interest in terms of approximability and stability, the use of such hierarchies is hindered by the size of the variables involved.
In this talk I will address applications of semidefinite programming to some large scale problems from engineering and information theory. In particular, I will start by discussing how SDP hierarchies can be used to improve the current convex relaxation for rank one matrix completion which is based on the minimization of the nuclear norm. Going against the conventional wisdom, I will also discuss possible scalable numerical schemes for those hierarchies. As additional applications, I will briefly address inverse scattering as well as deconvolution and super-resolution.

Dec. 18th, 2017, Michael Elad (Technion)

**Title:** *Sparse Modeling in Image Processing and Deep Learning*

**Abstract:** Sparse approximation is a well-established theory, with a profound impact on the fields of signal and image processing. In this talk we start by presenting this model and its features, and then turn to describe two special cases of it – the convolutional sparse coding (CSC) and its multi-layered version (ML-CSC). Amazingly, as we will carefully show, ML-CSC provides a solid theoretical foundation to … deep-learning. Alongside this main message of bringing a theoretical backbone to deep-learning, another central message that will accompany us throughout the talk: Generative models for describing data sources enable a systematic way to design algorithms, while also providing a complete mechanism for a theoretical analysis of these algorithms' performance. This talk is meant for newcomers to this field - no prior knowledge on sparse approximation is assumed.

Nov. 24th, 2017, Paul Hand (Rice University)

**Title:** *Deep Compressed Sensing*

**Abstract:** Combining principles of compressed sensing with deep neural network-based generative image priors has recently been empirically shown to require 10X fewer measurements than traditional compressed sensing in certain scenarios. As deep generative priors (such as those obtained via generative adversarial training) improve, analogous improvements in the performance of compressed sensing and other inverse problems may be realized across the imaging sciences. In joint work with Vladislav Voroninski, we provide a theoretical framework for studying inverse problems subject to deep generative priors. In particular, we prove that with high probability, the non-convex empirical risk objective for enforcing random deepgenerative priors subject to compressive random linear observations of the last layer of the generator has no spurious local minima, and that for a fixed network depth, these guarantees hold at order-optimal sample complexity.

Oct. 20th, 2017, Elizabeth Purdom (Berkeley)

**Title:** *Robust strategies for analysis of single cell mRNA data*

**Abstract:** mRNA sequencing of single cells is a relatively recent biological technology that allows researchers to query what genes are active in a single cell. This allows for many detailed biological queries that were not possible previously. However, because of the complexities of the experimental process, single cell mRNA-Seq results in quite noisy measurements. In this talk we will introduce for a general audience the data that is being produced and discussed the data challenges that are present in this data in the area of clustering and other forms of estimation. We will then introduce some of our strategies for the analysis of single cell mRNA-Seq.