# Analytic and Geometric Approaches to Machine Learning

#### 11/07/2022 - 15/07/2022

Organisers:

Dr Bamdad Hosseini (University of Washington), Dr Lisa Kreusser (University of Bath), Dr Matthew Thorpe (The University of Manchester), Dr Patricia Vitoria Carrera (Huawei Zurich)

Funding: International Centre for Mathematical Sciences (ICMS)

#### EVENT OVERVIEW

Whilst the impact of Machine Learning methods continue to grow, with applications in medicine, computer vision, climatology and many more the theory has often lagged behind. This is the second workshop, following from the first online workshop held on the 26th-30th July 2021, that aims to bridge the gap between theory and applications. In particular, the workshop will bring together researchers that apply mathematical methodology to machine learning. We particularly want to emphasise how mathematical theory can inform applications and vice versa.

If you are interested in attending this event please complete the application form that can be found here. The deadline for completing the form is 20th June 2022, and to apply for funding you should apply before the 15th May 2022.

summary

## Confirmed Participants

#### Bayesian Imaging with Plug & Play Priors: Implicit, Explicit & Unrolled Cases.

We consider inverse problems in imaging where the likelihood is known and the prior is encoded by a neural network, and we discuss different ways to maximise the posterior distribution and to take samples from it. The first part of the talk concentrates in the case where the plug & play prior is implicit in a pretrained neural network denoiser, we present the plug & play stochastic gradient descent (PnP-SGD) algorithm for posterior maximisation and the plug & play adjusted Langevin algorithm for posterior sampling. The second part of the talk shows how to repurpose a pretrained hierarchical VAE model as a prior for posterior sampling in the particular case of single image super-resolution. The third part of the talk considers the case of posterior maximisation for super-resolution and deblurring with a spatially varying blur kernel. The plug and play ADMM algorithm would require a computationally expensive inversion of the spatially varying blur operator. In order to avoid that we propose a linearized plug and play algorithm and an unrolled version thereof, which produces competitive results with respect to state of the art algorithms.

Joint work with: Rémi Laumont, Jean Prost, Charles Laroche, Valentin De Bortoli, Julie Delon, Antoine Houdard, Nicolas Papadakis, Marcelo Pereyra, Matias Tassano.

Related preprints:
https://arxiv.org/abs/2103.04715
https://arxiv.org/abs/2201.06133
https://arxiv.org/abs/2205.10347
https://arxiv.org/abs/2204.10109

#### Learning Low Bending and Low Distortion Manifold Embeddings.

Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parameterization of the data manifold by the manifold in latent space. A good regularity and structure of the embedded manifold may substantially simplify further data processing such as cluster analyses or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows to train the encoder on its own. To perform the training it is assumed that for pairs of nearby points on the input manifold their local Riemannian distance and their local Fréchet mean can be evaluated. The loss functional is computed via Monte Carlo integration. We further identify the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Due to the promotion of extrinsic flatness, these embeddings are regular enough such that interpolation between not too distant points on the manifold is well approximated by linear interpolation in latent space as one possible postprocessing application.

#### Image Processing with "PDEs on Graphs" and their Continuum Limits.

Over the last decade, an emerging technique for solving image processing tasks has been that of "PDEs on Graphs", stemming from work such as (Bertozzi, Flenner, 2012). This talk will discuss some applications of this technique. First, I will discuss how these techniques can be applied to image segmentation, the task of identifying the key features of an image. Next, I will discuss how these techniques can be extended to (joint) reconstruction-segmentation, whereby the image to be segmented must also be reconstructed from incomplete/noisy measurements. Finally, there has been significant recent work examining the continuum limits of these graph PDEs, and I will discuss how these continuum limits might be useful in applications.

#### Regularising Inverse Problems with Generative Machine Learning Models.

Deep neural network approaches to inverse imaging problems have produced impressive results in the last few years. Here we consider the use of generative models in a variational regularisation approach to inverse problems. Generative models learn, from observations, approximations to high-dimensional data distributions. The considered regularisers penalise images that are far from the range of a generative model that has learned to produce images similar to a training dataset. We name this family generative regularisers.

In contrast to other data-driven approaches, generative regularisers do not require paired training data and are learned independently of the forward model. This makes the method very flexible in real-world scenarios where noise levels and forward model parameters may change. The success of generative regularisers depends on the quality of the generative model and so we propose a set of desired criteria to assess models and highlight avenues for future research.

#### Practical Connection Between Sparse Gaussian Process and Neural Networks via Inducing Variable Design.

Gaussian Process (GP) models and Neural Networks (NN) are two well established frameworks for tackling supervised learning problems. Although they are very different approaches with their own advantages and drawbacks, it is well known that a connection can be drawn between infinitely wide networks and GP priors. In this work, we explore a more practical connection that links trained NN and Deep GP posteriors. This can be done by establishing an interplay between GPs defined on the sphere and on classic Euclidean space, and by using specifically designed inducing variables such that propagating through the mean of the Deep GP corresponds to a forward pass in a NN. This connection can benefit both worlds: initialising a Deep GP with a NN can make the former more accurate while using the GP loss function to further train a NN improves the uncertainty quantification of the latter.

#### Bilevel Learning for Inverse Problems.

Variational regularization techniques are dominant in the field of inverse problems. A drawback of these techniques is that they are dependent on a number of parameters which have to be set by the user. This issue can be approached by machine learning where we estimate these parameters from data. This is known as "Bilevel Learning" and has been successfully applied to many tasks, some as small-dimensional as learning a regularization parameter, others as high-dimensional as learning a sampling pattern in MRI. While mathematically appealing this strategy leads to a nested optimization problem which is computationally difficult to handle. In this talk we discuss several applications of bilevel learning for imaging as well as new computational approaches.

#### Kernel Embedding of Measures and Nystroem Approximation: It's All About the Square.

Integral operators with positive-semidefinite (PSD) kernels play a central role in numerous problems involving reproducing kernel Hilbert spaces (RKHSs); as an important instance, this class of operators encompasses the PSD matrices, emphasising its prevalence in Mathematics and Data Science. Sparse and low-rank approximation schemes for this type of operators are essential for the implementation of numerical strategies involving PSD kernels or large-scale PSD matrices.

In this talk, we will give an overview of the connections between the approximation of integral operators with PSD kernels, the low-rank approximation of PSD matrices, and the kernel embedding of measures. We will in particular describe how trace-class integral operators defined from a given PSD kernel can be isometrically represented as potentials in the RKHS associated with the absolute square of this kernel. We will then discuss various sampling-based approximation strategies that take advantage of the considered isometric representation.

#### Deep Learning and Mean Field Optimal Control from a Stochastic Optimal Control perspective.

We provide a concrete proposal for the mathematical formulation of Deep Learning (DL) based approaches within the contexts of both Stochastic Optimal Control (SOC) and Mean-Field Control (MFC). Following a dynamical system perspective, we conduct an in-depth analysis of the Supervised Learning (SL) procedures characterizing Neural Networks (NNs) based models, also showing how to translate a SL task into an optimal (stochastic) MFC one. Moreover, we derive two methods within such a Mean Field setting: the first is obtained considering the Hamilton–Jacobi–Bellman (HJB) approach in the Wasserstein space of measures, while the second is based on the MF-Pontryagin maximum principle, providing a necessary, weaker, condition the solution must satisfy. We conclude sketching future research directions we are considering. This talk is based on a joint work with Prof. Luca Di Persio (University of Verona).

#### Unsupervised Learning of the Total Variation Flow.

The total variation (TV) flow generates a scale-space representation of an image based on the TV functional. This gradient flow observes desirable features for images such as sharp edges and enables spectral, scale, and texture analysis. The standard numerical approach for TV flow requires solving multiple non-smooth optimisation problems. Even with state-of-the-art convex optimisation techniques, this is often prohibitively expensive and strongly motivates the use of alternative, faster approaches. Inspired by and extending the framework of physics-informed neural networks (PINNs), we propose the TVflowNET, a neural network approach to compute the solution of the TV flow given an initial image and a time instance. We significantly speed up the computation time by more than one order of magnitude and show that the TVflowNET approximates the TV flow solution with high fidelity.

#### Gradient Step Denoiser for Convergent Plug-and-Play Image Restoration with Nonconvex Regularization.

Plug-and-Play methods constitute a class of iterative algorithms for imaging problems where regularization is performed by an off-the-shelf denoiser. Although Plug-and-Play methods can lead to tremendous visual performance for various image problems, the few existing convergence guarantees are based on unrealistic (or suboptimal) hypotheses on the denoiser, or limited to strongly convex data terms. We propose a new type of Plug-and-Play methods, based on half-quadratic splitting, for which the denoiser is realized as a gradient descent step on a functional parameterized by a deep neural network. Exploiting convergence results for proximal gradient descent algorithms in the non-convex setting, we show that the proposed Plug-and-Play algorithm is a convergent iterative scheme that targets stationary points of an explicit global functional. Besides, experiments show that it is possible to learn such a deep denoiser while not compromising the performance in comparison to other state-of-the-art deep denoisers used in Plug-and-Play schemes. We apply our proximal gradient algorithm to various ill-posed inverse problems, e.g. deblurring, super-resolution and inpainting. For all these applications, numerical results empirically confirm the convergence results. Experiments also show that this new algorithm reaches state-of-the-art performance, both quantitatively and qualitatively.

#### Laplace Learning in Hilbert Spaces

Graph-based approaches, such as Laplace learning, have been successfully used in semi-supervised learning for finite dimensional data. In this talk we formulate the Laplace learning problem in a functional setting, that is where the feature vectors live in an infinite-dimensional space. We show that the method is consistent in the large data limit (as the number of feature vectors goes to infinity) under appropriate conditions on the graph connectivity.

#### Posterior Sampling With Auto-Encoding Prior.

We consider Variational AutoEncoders (VAE) as priors for solving inverse problems in imaging. Markov Chain Monte Carlo (MCMC) methods for sampling from the posterior distribution permit to explore the solutions space and to compute point estimates as well as other statistics about the solutions such as uncertainty estimates. However, the performance of widely used methods like Metropolis-Hastings depends on having precise proposal distributions which can be challenging to define on high-dimensional spaces. Using data augmentation techniques, we develop a Gibbs-like posterior sampling algorithm that exploits the bidirectional nature of VAE networks. To accelerate the burn-in period we explore the adaptation of the annealed importance sampling with resampling method.

Joint work with Mario González (Universidad de la República) and Andrés Almansa (Université Paris Descartes).

#### Coverage and Connectivity in Stochastic Geometry.

Consider a random uniform sample of size $n$ over a bounded region $A$ in $R^d$, $d \geq 2$, having a smooth boundary (or more generally, a smooth $d$-dimensional manifold with boundary, embedded in a higher-dimensional Euclidean space). The coverage threshold $T_n$ is the smallest $r$ such that the union $Z$ of Euclidean balls of radius $r$ centred on the sample points covers $A$. The connectivity threshold $K_n$ is twice the smallest $r$ required for $Z$ to be connected. These thresholds are random variables determined by the sample, and are of interest, for example, in wireless communications, set estimation, and topological data analysis.

We discuss results on the large-$n$ limiting distributions of $T_n$ and $K_n$. For example, when $d=2$, and $A$ has unit area, $\mathbb{P}[ n \pi K_n^2 - \log n] \leq \beta$ tends to $\exp (-e^{-\beta})$. The corresponding result in higher dimensions is more complicated, essentially because the `most isolated' point of the sample is likely to lie near the boundary of $A$. Similarly, the limiting behaviour of $T_n$ is determined by boundary effects when $d \geq 3$.

Some of the work described here is joint work with Xiaochuan Yang.

#### Motion Correction for Cardiac MRI Images using Generative Neural Networks with Disentangled Latent Space Variables.

Patient motion poses a great challenge in magnetic resonance imaging (MRI) due to its effects on the image reconstruction as well as for its subsequent image interpretation. Although dynamic MRI aims to image particular physiological dynamics, for example heart motion, dynamic images might also be distorted by additional motion artifacts, such as those arising from breathing motion. The focus of this work is to remove such distortions while maintaining the desired dynamics. To this aim, a method based on generative neural networks is proposed, that represents an image sequence as the forward mapping via a generative neural network of latent space variables such that different sources of motion are characterized independently via latent space disentanglement using one dimensional prior information. This learned representation allows to freeze one (unwanted) type of motion while maintaining the relevant dynamics. The proposed learning-free method is fully unsupervised as all the network parameters are learned directly from the given data. Subsequently, this method is not susceptible to bias coming from a training dataset because it does not require any prior training. The performance of the algorithm is shown both on synthetic and real dynamic MRI data that display two physically different motion types, breathing and heart-beat motion, where we can successfully disentangle the different motions in the latent space and use that to freeze the breathing motion while maintaining an accurate representation of the heart-beat motion.

This is joint work with Abdullah, Martin Holler and Karl Kunisch.

#### The Linearized Hellinger-Kantorovich Distance.

Optimal transport provides a geometrically intuitive Lagrangian way of comparing distributions by mass rearrangement. The metric can be approximated by representing each sample as deformation of a reference distribution. Formally this corresponds to a local linearization of the underlying Riemannian structure. When combined with subsequent data analysis and machine learning methods this new embedding usually outperforms the standard Eulerian representation. We show how the framework can be extended to the unbalanced Hellinger--Kantorovich distance to improve robustness to noise and mass fluctuations. Joint work with Tianji Cai, Junyi Cheng, Matthew Thorpe.

#### Functional Umbrella Sampling.

We investigate applications of the Eigenvector Method for Umbrella Sampling (EMUS) to the exploration of the marginal likelihood in Bayesian models. This method is more general, hence more efficient than popular algorithms in this context, such as bridge sampling. We formulate inference for the marginal likelihood as a function estimation problem and appropriately combine EMUS with a Gaussian process such that EMUS provides a data model with a structured error distribution and the Gaussian process prior imposes smoothness. Inference for the marginal likelihood proceeds by building on Gaussian process regression ideas. Importantly, our framework allows for a dynamic experimental design and the refinement of the lattice on which EMUS is applied.

#### Which Networks Can Be Learned By An algorithm? --- Expressivity Meets Turing In Deep Learning.

Deep neural networks show great performance in a variety of applications and are now also used in security and safety-sensitive areas like self-driving cars and medical health care. However, besides all the success stories of the last decade, there is overwhelming empirical evidence suggesting that modern AI is often non-robust (unstable), may generate hallucinations, and thus can produce nonsensical output with high levels of prediction confidence.

When we look at the collection of approximation results, we see that the problem does not lie in the expressivity of the networks. However, recent results demonstrate that stable networks cannot be computed by an algorithm from point samples. In this talk, we discuss deep learning in the light of Turing computability and present a first step to extend the results from expressivity theory to computability. The main result is in a similar spirit to the universal approximation theorem and demonstrates that there are no limitations for computable neural networks to approximate computable functions.

Joint work with Anders C. Hansen.

#### Some of the Many Faces of Graph-Based MBO.

In 1992 Merriman, Bence, and Osher introduced a threshold dynamics scheme (later named the MBO scheme) for the approximation of flow by mean curvature. This scheme has proven to be very versatile and in the last decade its graph-based variants have found usage in many applications. In this talk we will take a look at some of those applications, such as graph clustering, maximum cut computations, signed graph clustering, and modularity optimisation. We will also discuss the relationship between graph-based MBO and the graph-based Allen--Cahn equation.

#### Radially Symmetric Risk Minimizing Shallow Neural Networks.

We construct (infinitely wide) two-layer neural networks minimizing a weight decay regularizer among those that take the value 1 at the origin and 0 outside the unit ball. We address the geometry and regularizer scaling of the minimizers, as well as the growth of the Lipschitz constant in dimension and the use of these functions as mollifiers.

## ORGANISING COMMITTEE

#### DR BAMDAD HOSSEINI (CARLIFORNIA INSTITUTE OF TECHNOLOGY)

Bamdad Hosseini is a Senior Postdoctoral Fellow in Computing and Mathematical Sciences at California Institute of Technology (Caltech). His research interests lie at the intersection of applied mathematics, probability theory, and statistics focusing on the analysis and development of methods for extracting meaningful information from data. Hosseini has previously served as a von Karman instructor at Caltech and received a PhD in Mathematics from Simon Fraser University.

#### DR LISA KREUSSER (UNIVERSITY OF BATH)

Lisa Kreusser is a lecturer in Applied Mathematics at the University of Bath. Previously, she was a Junior Research Fellow at Magdalene College, Cambridge, and completed her PhD at the University of Cambridge in 2019. Her recent research focusses on the synergies between differential equations, numerical analysis and data analysis/machine learning.

#### DR MATTHEW THORPE (UNIVERSITY OF MANCHESTER)

Matthew Thorpe is a lecturer in Applied Mathematics at the University of Manchester and works on optimal transport and large data limits of graph based variational problems that arise in semi-supervised and unsupervised learning. Previously he was a research fellow at the University of Cambridge and a Postdoctoral Research Associate at Carnegie Mellon University. He completed his PhD in 2015 at the University of Warwick.

#### DR PATRICIA VITORIA CARRERA (UNIVERSITAT POMPEU FABRA, BARCELONA)

Patricia Vitoria submitted her PhD thesis at the Image Processing Group of the Universitat Pompeu Fabra in Barcelona under the supervision of Coloma Ballester. During her PhD studies, she actively collaborated with the University of Cambridge and Universidad de la Republica. She is currently working at Huawei Zurich Research Centre. She obtained a bachelor degree in Audiovisual Systems Engineering from the University Pompeu Fabra, after which she completed an MSc. degree in Informatics at the Technical University of Munich.
Her research interests are in the field of image restoration and computer vision, more specifically camera artefacts, image and video in-painting, colorization and de-blurring using both model-based and machine learning approaches.

Mon
Tue
Wed
Thu
Fri
Sat
Sun