Connection Between Optimization and Sampling Algorithms
Dr Matthew Thorpe (The University of Manchester)
Dr Patricia Vitoria Carrera (Universitat Pompeu Fabra, Barcelona)
Dr Bamdad Hosseini (California Institute of Technology)
Funding: International Centre for Mathematical Sciences (ICMS)
Sponsor: Inistitute for Mathematial Innovation (IMI)
The aim of the workshop is to bring together researchers that apply mathematical methodology to machine learning. We particularly want to emphasise how mathematical theory can inform applications and vice versa.
This workshop is the first of two workshops on this topic. The second will be an in-person workshop to be held at the University of Bath, in December 2021. In this first workshop, invited speakers are encouraged to present open problems and explore interesting directions for potential research as part of their talk. The schedule allows participants time to initiate conversations and collaborations that can be developed at the winter workshop.
The two workshops in this series follow on from the <a href=”https://mathml2020.github.io/index”>LMS-Bath Symposium on the Mathematics of Machine Learning</a> held 3-7 August 2020.
Connection Between Optimization and Sampling Algorithms
Optimization and Sampling problems lie in the heart of Bayesian inverse problems. The ability to solve such inverse problems depends crucially on the eﬃcient calculation of quantities relating to the posterior distribution, giving thus rise to computationally challenging high dimensional optimization and sampling problems. In this talk, we will connect the corresponding optimization and sampling problems to the large time behaviour of solutions to (stochastic) differential equations. Establishing such a connection allows to utilise existing knowledge from the ﬁeld of numerical analysis of diﬀerential equations. In particular, two very important concepts are numerical stability and numerical contractivity. In the case of linear differential equations these two concepts coincide, but with the exception of some very simple Runge-Kutta methods such the Euler method in the non-linear case numerical stability doesn’t imply numerical contractivity . However, the recently introduced framework of integral quadratic constraints and Lyapunov functions [2, 3] allows for bridging this gap between linearity and non-linearity in the case of (strongly) convex functions. We will use this framework, to study a large class of strongly convex optimization methods and give an alternative explanation for the good properties of Nesterov method, as well as highlight the reasons behind the failure of the heavy ball method . In addition, using similar ideas , we will present a general framework for the non-asymptotic study of the 2-Wasserstein distance between the invariant distribution of an ergodic stochastic diﬀerential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a uniﬁed way a number of diﬀerent integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. If times allows us we will also talk about the application of these ideas to imaging inverse problems [5, 6].
In (strongly log-concave) sampling should you go explicit or implicit? I will describe some very recent (and perhaps incomplete) work that aims to study the computational eﬃciency of explicit methods for sampling problems vs implicit integrators. In particular, the class of strongly convex potentials is ideal as it allows for the explicit computation of the number of function evaluations that one needs to do in order to solve the optimization problem related to performing an implicit numerical step.
Gradient Flows and Thresholding Schemes on Graphs
We present unconstrained and constrained gradient ﬂows and thresholding schemes on graphs, which can be used in applications such as image segmentation and data clustering. We study connections between the ﬂows and schemes and investigate discrete-to-continuum limit properties.
Besides clustering and the maximum-cut problem, which other combinatorial problems on graphs are amenable to relaxation via differential equations?
Linearization of Balanced and Unbalanced Optimal Transport
Optimal transport provides a geometrically intuitive Lagrangian way of comparing distributions by mass rearrangement. The metric can be approximated by representing each sample as deformation of a reference distribution. Formally this corresponds to a local linearization of the underlying Riemannian structure. When combined with subsequent data analysis and machine learning methods this new embedding usually outperforms the standard Eulerian representation. We show how the framework can be extended to the unbalanced Hellinger–Kantorovich distance to improve robustness to noise.
For the balanced case the properties of the linearized optimal transport em-bedding are already understood quite well. In the unbalanaced case we seek to conﬁrm the well-posedness of linearized interpolations and to understand the range in which we can expect the approximation to be accurate. In both cases we look for ways to go beyond a simple one-point-linearization.
Density Estimation and Conditional Simulation Using Triangular Transport
Triangular transformations of measures, such as the Knothe–Rosenblatt rearrangement, underlie many new computational approaches for density estimation and conditional simulation. This talk will discuss several aspects of such constructions.
First we present new approximation results for triangular maps, focusing on analytic densities with bounded support. We show that there exist sparse polynomial approximations and deep ReLU network approximations of such maps that converge exponentially, where error is assessed on the pushforward distributions of such maps in a variety of distances and divergences. We also give an explicit a priori description of the polynomial ansatz space guaranteeing a given error bound.
Second, we discuss the problem of estimating a triangular transformation given a sample from a distribution of interest—and hence, transport-driven density estimation. We present a general functional framework for representing monotone triangular maps between distributions on unbounded domains, and analyze properties of maximum likelihood estimation in this framework. We demonstrate that the associated optimization problem is smooth and, under appropriate conditions, has no spurious local minima. This result provides a foundation for a greedy semi-parametric estimation procedure.
Time permitting, we may also discuss a conditional simulation method that employs a speciﬁc composition of maps, derived from the Knothe–Rosenblatt rearrangement, to push forward a joint distribution to any desired conditional. We show that this composed-map approach reduces variability in conditional density estimates and reduces the bias associated with any approximate map representation. For context, and as a pointer to an interesting application domain, we elucidate links between conditional simulation with composed maps and the ensemble Kalman ﬁlter used in geophysical data assimilation.
This is joint work with Ricardo Baptista (MIT), Olivier Zahm (INRIA), and Jakob Zech (Hei-delberg).
1. Removal of dependence, i.e., estimating and minimizing mutual information over a given space of transformations. In other words, how to ﬁnd a transformation T such that Z = T (Y, X) has minimal dependence with Y ? This turns out to be the essential underlying problem of conditional simulation, in my view.
2. Transport-based methods for (approximate) inference are a natural complement to ensemble methods for data assimilation. Yet embedding an approximate inference step into a dynamical system presents many new challenges and considerations. In what sense should inference be made accurate, given the subsequent evolution of the approximate posterior under the dynamics? How do errors in inference aﬀect stability of the associated interacting particle system? How can we use this understanding to design better algorithms?
Generative Methods for Some Inverse Problems in Imaging
Generative approaches aim to model the probability distribution of certain data so that we can sample new samples out of the distribution. In this talk, we will discuss them together with its use to tackle several imaging problems. They include a method for out-of-distribution that leverages the learning of the probability distribution of normal data through generative adversarial networks while simultaneously keeping track of the states of the learning to ﬁnally estimate an eﬃcient anomaly detector. Also, a method for image colorization will be discussed. It is based on an adversarial strategy that captures geometric, perceptual and semantic information. Finally, the use of generative methods for inpainting will be discussed.
An open/interesting problem related to this work is the suitability of generative methods to obtain multiple solutions of problems that do not have a unique solution. Another open problem is how the learning of the underlying distribution of color images is aﬀected by the color space in which those images are given.
Bayesian Estimators for Inverse Imaging Problems with Decoupled Learned Priors
Decoupled methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that was previously trained on a large natural image database. In this talk we present recent developments on such decoupled methods for two kinds of learned priors.
Part I: Concentrates on so called Plug & Play (PnP) methods, where the prior is implicitly deﬁned by an image denoising algorithm. In  we introduce theory, methods, and a provably convergent algorithm for performing Bayesian inference with PnP priors. We introduce two algo-rithms: 1) PnP-ULA (Plug & Play Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Plug & Play Stochastic Gradient Descent) for MAP infer-ence. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising op-erators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantiﬁcation.
Part II: Concentrates on generative models where the prior in the image domain is expressed as the pushforward measure of a Gaussian distribution in latent space. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, the JPMAP approach proposed in  computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisﬁes a weak bi-convexity property which is suﬃcient to guarantee that our optimization scheme converges to a stationary point. Experimental results also show the higher quality of the solutions obtained by our joint posterior maximization approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima. Current work also explores how the quasi-bi-convexity property of the joint posterior could be exploited to provide eﬃcient posterior sampling schemes that lead to faster MMSE and uncertainty estimators.
Mario González, Rémi Laumont, Jean Prost, Valentin de Bortoli, Julie Delon, Alain Durmus, Antoine Houdard, Pablo Musé, Nicolas Papadakis, Marcelo Pereyra, Pauline Tan.
We are currently exploring theoretical, methodological and applied generalizations of the methods and algorithms described in the abstract.
Resizeable generative priors: PnP priors which are usually implemented as convolutional neural network denoisers that can be trained on image patches of a certain size, and then used on images of any size. Generative priors, in contrast, are usually trained and used on images of a ﬁxed size. This issue signiﬁcantly limits the applicability of generative priors for general inverse problems in imaging. A few solutions to this issue have been explored: (a) In  a generative prior is trained on natural image patches and then the regularization term is applied patch-wise, but this is computationally ineﬃcient and results are not competitive with other PnP approaches. (b) In  the architecture of the generative model is fully convolutional which allows the generative model to be resized. The authors show a few relatively convincing applications of this resizability feature but no in-depth theoretical explanation was provided to justify that this operation is well-founded.
Faster posterior sampling schemes: The PnP-ULA scheme only provides stability and convergence guarantees as long as the step-size satisﬁes certain limits which makes it quite slow. In order to allow for larger step-sizes and faster schemes we can consider semi-implicit or higher-order schemes. These introduce, however, the need for denoisers D such that the operator (D+ a Id) can be eﬃciently inverted. For generative priors, the pCN scheme  can perform posterior sampling based on the generator network only. We are exploring whether this scheme can be accelerated when an encoder network is also available like in the JPMAP approach .
Applications: On the applications side, we are exploring the use of PnP posterior sampling techniques for semi-blind restoration in the context of camera shake, as well as for uncertainty
Overparametrization and the Bias-Variance Dilemma
For several machine learning methods such as neural networks, good generalisation performance has been reported in the overparametrized regime. In view of the classical bias-variance trade-oﬀ, this behaviour is highly counterintuitive. The talk summarizes recent theoretical results on overparametrization and the bias-variance trade-oﬀ. This is joint work with Alexis Derumigny.
In the talk we present universal lower bounds on the bias-variance trade-oﬀ. These bounds have also to be obeyed in the overparametrized regime. It is an open problem to derive rate optimal bounds on bias and variance in the overparametrized regime for standard nonparametric and high-dimensional models. Another challenging problem is to extend the results to the trade-oﬀ between bias and mean absolute deviation, see the paper arXiv:2006.00278 for more details and a ﬁrst result in this direction.
Scalable Computation for Bayesian Hierarchical Models
I will provide an overview of very recent theory that establishes conditions under which coordinate-wise updating algorithms, in particular Gibbs samplers for posterior sampling, coordinate ascent variational inference for posterior approximation, and backﬁtting algorithms for MAP estimation, are scalable. Scalability here means that their computational complexity in order to achieve a certain approximation precision scales linearly in the size of the data and the size of the model. The results refer to a very important family of statistical models, known as crossed-eﬀect hierarchical models, which are the canonical predictive/inference models for regressing against categorical predictors. The theory brings together multigrid decompositions, concentration inequalities and random graphs.
I will describe some very recent (and perhaps incomplete) work that aims to study the computational eﬃciency of explicit methods for sampling problems vs implicit integrators. In particular, the class of strongly convex potentials is ideal as it allows for the explicit computation of the number of function evaluations that one needs to do in order to solve the optimization problem related to performing an implicit numerical step.
Optimal Cuts of Random Geometric Graphs
Given a “cloud” of n points sampled independently uniformly at random from a Eu-clidean domain D, one may form a geometric graph by connecting nearby points using a distance parameter r(n). We consider the problem of partitioning the cloud into two pieces to minimise the number of “cut edges” of this graph, subject to a penalty for an unbalanced partition. The optimal score is known as the Cheeger constant of the graph. We discuss convergence of the Cheeger constant (suitably rescaled) for large n with suitably chosen r(n), towards an analogous quantity deﬁned for the original domain D, and the related problem of optimal bisection into two pieces of equal size.
These results are for when the mean degree nr(n)d grows faster than logn; open problems include ﬁnding analogous results (i) when the mean degree grows more slowly than this, or (ii) for the k-nearest neighbour graph with k = k(n) tending to inﬁnity, or (iii) extending the results to manifolds.
Partitioning problems are are of interest in themselves when dealing with multidimensional data, and moreover the Cheeger constants provide bounds on Laplacian eigenvalues, both for graphs and in the continuum.
Discrete-Continuum Interplay: Formulations for Supervised and Semi-Supervised Learning
Graph Laplacians encode geometric information contained in data, via the eigenfunctions associated with their small eigenvalues. These spectral properties provide powerful tools in data clustering and data classiﬁcation. When a large number of data points are available one may consider continuum limits of the graph Laplacian, both to give insight and, potentially, as the basis for numerical methods. We summarize recent insights into the properties of a family of weighted elliptic operators arising in the large data limit of different graph Laplacian normalizations, and the role these operators play, both in the discrete and in the continuum, for clustering and classiﬁcation algorithms. We show consistency of optimization-based techniques for graph-based semi-supervised learning (SSL), in the limit where the labels have small noise and the underlying unlabelled data is well clustered. This formalism suggests continuous versions of SSL algorithms, making use of these differential operators.
How to leverage continuum formulations for algorithm design and implementations? How to evaluate and compare these implementations?
PDE Continuum Limits for Prediction with Expert Advice
Prediction with expert advice refers to a class of machine learning problems that is concerned with how to optimally combine advice from multiple experts whose prediction qualities may vary greatly. We study a stock prediction problem with history-dependent experts and an adversarial (worst-case) market, posing the problem as a repeated two-player game. The game is a generalization of the Kohn-Serfaty two-player game for curvature motion. We prove that when the game is played for a long time, the discrete value functions converge in the continuum to the solution of a nonlinear parabolic partial differential equation (PDE) and the optimal strategies are characterized by the solution of a Poisson equation over a De Bruijn graph, arising from the history-dependence of the experts. Joint work with Nadejda Drenska (UMN).
PDE continuum limits for adversarial multi-armed bandits. Multi-armed bandits is a machine learning problem centered around exploration vs exploitation in an online learning environment, and is used to evaluate strategies for managing large research projects or allocating funds to researchers, among other problems. The user has many possible “arms” to query (the language relates to pulling an arm of a slot machine), each of which gives the player a randomized reward (following a diﬀerent distribution for each arm). The user wants to maximize their rewards over the long run, which requires trading oﬀ exploration (pulling new arms that may end up giving poor rewards) and exploitation (repeatedly pulling the same arm that is known to give a good reward). The adversarial version of the mulit-armed bandit problem, where the rewards are controlled by an adversary whose goal is to minimize the player’s rewards, is very closely related to the prediction from expert advice problem covered in my talk, except that the player has only partial knowledge in the multi-armed bandit setting. An interesting problem would center around determining a PDE continuum limit for the value function of an adversarial multi-armed bandit problem, and using this to construct asymptotically optimal strategies for the player and adversary.
Stochastic Gradient Descent in Continuous Time: Discrete and Continuous Data
Optimisation problems with discrete and continuous data appear in statistical estimation, machine learning, functional data science, robust optimal control, and variational inference. The ‘full’ target function in such an optimisation problems is given by the integral over a family of parameterised target functions with respect to a discrete or continuous probability measure. Such problems can often be solved by stochastic optimisation methods: performing optimisation steps with respect to the parameterised target function with randomly switched parameter values. In this talk, we discuss a continuous-time variant of the stochastic gradient descent algorithm. This so-called stochastic gradient process couples a gradient ﬂow minimising a parameterised target function and a continuous-time ‘index’ process which determines the parameter.
We ﬁrst brieﬂy introduce the stochastic gradient processes for ﬁnite, discrete data which uses pure jump index processes. Then, we move on to continuous data. Here, we allow for very general index processes: reﬂected diffusions, pure jump processes, as well as other L´evy processes on compact spaces. Thus, we study multiple sampling patterns for the continuous data space. We show that the stochastic gradient process can approximate the gradient ﬂow minimising the full target function at any accuracy. Moreover, we give convexity assumptions under which the stochastic gradient process with constant learning rate is geometrically ergodic. In the same setting, we also obtain ergodicity and convergence to the minimiser of the full target function when the learning rate decreases over time suﬃciently slowly. We illustrate the applicability of the stochastic gradient process in a simple polynomial regression problem with noisy functional data, as well as in physics-informed neural networks approximating the solution to certain partial differential equations.
We propose a continuous-time optimisation algorithm that needs to be discretised to be used in practice. Hence, we need to ﬁnd a suitable time-stepping method, for which we have two constraints: (1) a suitable time-stepping method should retain the same/similar ergodic behaviour as the stochastic gradient process; (2) the method should be computationally cheap to be applicable in large scale estimation and learning problems.
Approximation Properties of Two-Layer Neural Networks with Values in a Banach Space
Approximation properties of inﬁnitely wide neural networks have been studied by several authors in the last few years. New function spaces have been introduced that consist of functions that can be eﬃciently (i.e., with dimension-independent rates) approximated by neural networks of ﬁnite width. Typically, these functions are supposed to act between Euclidean spaces, typically with a high-dimensional input space and a lower-dimensional output space. As neural networks gain popularity in inherently inﬁnite-dimensional settings such as imaging, it becomes necessary to analyse the properties of neural networks as nonlinear operators acting between inﬁnite-dimensional spaces. In this talk, I will present dimension-independent Monte-Carlo rates for neural networks acting between Banach spaces with a partial order (vector lattices), where the ReLU no linearity will be interpreted as the lattice operation of taking the positive part.
1. Studying inﬁnitely wide/deep vector-valued networks with a more complex architecture: multilayer networks, ResNets, CNNs. Perhaps also other types of networks such as recurrent neural networks (I know very little about them);
2. Neural networks with a ‘regularisation parameter’ that could be used to solve inverse problems for diﬀerent noise levels without retraining (e.g., variable depth, scaling of the bias terms).
Equivariant Neural Networks for Inverse Problems
In recent years the use of convolutional layers to encode an inductive bias (translational equivariance) in neural networks has proven to be a very fruitful idea. The successes of this approach have motivated a line of research into incorporating other symmetries into deep learning methods, in the form of group equivariant convolutional neural networks. Much of this work has been focused on rototranslational symmetry of the Rd, but other examples are the scaling symmetry of Rd and rotational symmetry of the sphere. In this work, we demonstrate that group equivariant convolutional operations can naturally be incorporated into learned reconstruction methods for inverse problems that are motivated by the variational regularisation approach. Indeed, if the regularisation functional is invariant under a group symmetry, the corresponding proximal operator will satisfy an equivariance property with respect to the same group symmetry. As a result of this observation, we design learned iterative methods in which the proximal operators are modelled as group equivariant convolutional neural networks. We use rototranslationally equivariant operations in the proposed methodology and apply it to the problems of low-dose computerised tomography reconstruction and subsampled magnetic resonance imaging reconstruction. The proposed methodology is demonstrated to improve the reconstruction quality of a learned re-construction method with a little extra computational cost at training time but without any extra cost at test time.
In this talk we will make use of the learned proximal gradient method where the “proximal operator” is replaced by a neural network which can be trained either before or after it is being inserted into the algorithm. If these neural networks are indeed proximal operators with the correct scaling, then classical convergence results can be utilized. More interesting in applications is to relax these conditions as much as possible in order to allow for more ﬂexibility and adaptation to data. Some results for averaged operators exist in this direction but these are not yet fully satisfactory. Beyond the convergence of the algorithm it is of interest to mathematically analyze its limit and its regularization properties in an inverse problems context.
Structured Singularities in Deep Learning
We will discuss various instances of structured singularities appearing in modern machine learning applications. A structured singularity is a discontinuity or a region where a function is non-smooth, which is itself a (not necessarily smooth) manifold. Such structured singularities appear naturally in classiﬁcation problems as class boundaries. It can be shown that in this case, the regularity of the singularities is the decisive factor in describing the hardness of the underlying learning problem. In addition, learning classiﬁcation problems with smooth boundaries is one of the few examples where deep neural network architectures provably outperform classical methods in learning applications. We will describe some partial results quantifying the eﬀect of singularities and discuss the regularity assumption. The second example of structured singularities in learning will be discussed in the framework of inverse problems appearing in imaging. In many inverse problems, based on pseudodifferential operators, one has a precise characterisation of how the for-ward and inverse operators transform singularities. Due to this, it seems promising to understand how deep neural network architectures that attempt to approximate the inverse operator transform singularities through the layers. Here, however, one runs into the fundamental problem that singularities are a phenomenon of the continuum, whereas neural networks acting on digital images are inherently discrete. We will discuss this question in detail for the inverse tomography problem.
1. How appropriate is the smoothness assumption for classiﬁcation problems? Is it possible to estimate the regularity of classiﬁcation problems in real-world scenarios? Is it possible to set up a meaningful numerical example? Can additional implicit assumptions on the decision boundary, such as locally minimal perimeter, be used to obtain reasonable regularity estimates?
2. Assuming a CNN is used to approximate an operator. What is the natural continuum analogue to a CNN when singularity propagation is concerned? What is the correct notion of a singularity in a digital image in a learning-theoretical context? How is this notion of a digital singularity affected by an application of a neural network?
Geometric DL - From Learning ODE Dynamics towards Graph Neural Diffusion
In recent years the area of geometric deep learning has attracted significant attention in machine learning theory as well as in data science applications.
Two reasons for that development are that graph neural networks can effectively address different non-Euclidean geometric data domains, and that geometric priors can induce appealing analytical properties like symmetry and scale separation for representing physically structured data.
Although deep learning has been effectively analyzed and used for learning and discovering dynamics from data, e.g., in the form of neural ODEs, physics-informed NNs or mean field control models, the combination of PDEs with graph neural networks to explain spatio-temporal dynamics has been hardly explored.
This talk presents new methods for learning normal forms directly from dynamic data and new mesh graph neural networks useful for physics-informed 4d medical imaging raising an outlook towards graph neural diffusion and beyond.
Model-Informed Graph Neural Networks - Are graph neural networks more generalizable for model-informed machine learning problems compared to classical neural networks?
1. Modeling: How to combine graph neural networks properly with PDEs to achieve model-informed GNN?
2.Expressiveness: Are flow-induced function spaces more expressive for GNNs with model-informed constraints than Barron spaces?
3. Generalization: How does the double descent phenomena show up in GNNs when we know more about the dynamics underlying?
The Nonlocal p-Laplacian Evolution Problem on Sparse Graphs: The Continuum Limit
The non-local p-Laplacian evolution equation, governed by a given kernel, has applications invarious areas of science and engineering. In particular, it is modern tools for massive data processing (including signals, images, geometry), and machine learning tasks such as classiﬁcation. In practice, however, this model is implemented in, time and space, discrete form as a numerical approximation to a continuous problem, where the kernel is replaced by an adjacency matrix of graph. In this work, we propose a far-reaching generalization of the results in   to a much more general class of kernels and initial data and the case p = 1 which was not handled in  . Combining tools from graph theory, convex analysis, non-linear semigroup theory and evolution equations, we give a rigorous interpretation to the continuous limit of the discrete non-local p-Laplacian evolution on sparse graphs. More specifically, we consider a sequence of graphs converging to a so-called limit object known as the Lq-graphon  [3, 4]. If the continuous p-Laplacian evolution is properly discretized on this graph sequence, we prove that the solutions of the sequence of discrete problems converge to the solution of the continuous problem governed by Lq-graphon, when the number of graph vertices grows to inﬁnity. Along the way, we provide a consistency/error estimate between the solutions of two evolution problems governed by two kernels and two initial data. For random graph sequences, using sharp deviation inequalities,we deliver nonasymptotic convergence rates in probability and exhibit the diﬀerent regimes depending on p and the regularity of the Lq-graphon and the initial condition.
By iterating the p-Laplacian operator (for p = 2), we get so-called the nonlocal bilaplacian operator  , which can be extended to a general operators called the nonlocal p-bilaplacian, for p ∈ (1, +∞). It will be very interesting to study the well-posedness of diﬀerent problems governed by this family of operators e.g. evolution, variational and boundary value problems associated to this operators. It will be also interesting to study their continuum limits
Self-supervised Learning for 3D Shape Analysis
While neural networks have swept the ﬁeld of computer vision and replaced classical methods in most areas of image analysis and beyond, extending their power to the domain of 3D shape analysis remains an important open challenge. In my presentation, I will focus on the problems of shape matching, correspondence estimation and shape interpolation and develop suitable deep learning approaches to tackle these challenges. In particular, I will focus on the difficult problem of computing correspondence and interpolation for pairs of shapes from different classes – say a human and a horse – where traditional isometry assumptions no longer hold.
1. How can we extend the power of deep networks to the ﬁeld of 3D shape analysis?
2. How can deep learning help to perform camera-based 3D reconstruction of static and dynamic scene?
3. How can deep learning be applied to analyzing 3D shapes – for problems such as estimating correspondence, computing interpolations between shapes and measuring shape similarity?
4. What are the strengths and weaknesses of different representations of shape in the age of deep learning?
Non-Uniform Motion Blur Kernel Estimation via Adaptive Decomposition
Motion blur estimation remains an important task for scene analysis and image restoration. In recent years, the removal of motion blur in photographs has seen impressive progress in the hands of deep learning-based methods, trained to map directly from blurry to sharp images. Characterization of the motion blur, on the other hand, has received less attention, and progress in model-based methods for deblurring lags behind that of data-driven end-to-end approaches.
In this work we revisit the problem of characterizing dense, non-uniform motion blur in a single image and propose a general non-parametric model for this task. Given a blurry image, a neural network is trained to estimate a set of image-adaptive basis motion kernels as well as the mixing coefficients at the pixel level, producing a per-pixel motion blur ﬁeld.
When applied to real motion-blurred images, a variational non-uniform blur removal method fed with the estimated blur kernels produces high-quality restored images. Qualitative and quantitative evaluation shows that these results are competitive or superior to results obtained with existing end-to-end deep learning (DL) based methods, thus bridging the gap between model-based and data-driven approaches.
We are interested in deriving image restoration methods that combine model-driven and data-driven approaches. In this application, the image formation model is used to learn the local blur kernels from data. Deblurring is performed in a second stage, using a classic non-blind deblurring method. The next step would be to include kernel estimation and deblurring into the same pipeline.
Matthew Thorpe is a lecturer in Applied Mathematics at the University of Manchester and works on optimal transport and large data limits of graph based variational problems that arise in semi-supervised and unsupervised learning. Previously he was a research fellow at the University of Cambridge and a Postdoctoral Research Associate at Carnegie Mellon University. He completed his PhD in 2015 at the University of Warwick.
Patricia Vitoria submitted her PhD thesis at the Image Processing Group of the Universitat Pompeu Fabra in Barcelona under the supervision of Coloma Ballester. During her PhD studies, she actively collaborated with the University of Cambridge and Universidad de la Republica. She is currently working at Huawei Zurich Research Centre. She obtained a bachelor degree in Audiovisual Systems Engineering from the University Pompeu Fabra, after which she completed an MSc. degree in Informatics at the Technical University of Munich.
Her research interests are in the field of image restoration and computer vision, more specifically camera artefacts, image and video in-painting, colorization and de-blurring using both model-based and machine learning approaches.
Bamdad Hosseini is a Senior Postdoctoral Fellow in Computing and Mathematical Sciences at California Institute of Technology (Caltech). His research interests lie at the intersection of applied mathematics, probability theory, and statistics focusing on the analysis and development of methods for extracting meaningful information from data. Hosseini has previously served as a von Karman instructor at Caltech and received a PhD in Mathematics from Simon Fraser University.