arxivst · stuff from arXiv that you should probably bookmark

Deep Relaxation: partial differential equations for optimizing deep neural networks

Abstract · Apr 17, 2017 11:21

equation differential hamilton stochastic jacobi entropy pde viscous local cs-lg math-ap math-oc

arXiv Abstract

  • Pratik Chaudhari
  • Adam Oberman
  • Stanley Osher
  • Stefano Soatto
  • Guillaume Carlier

We establish connections between non-convex optimization methods for training deep neural networks (DNNs) and the theory of partial differential equations (PDEs). In particular, we focus on relaxation techniques initially developed in statistical physics, which we show to be solutions of a nonlinear Hamilton-Jacobi-Bellman equation. We employ the underlying stochastic control problem to analyze the geometry of the relaxed energy landscape and its convergence properties, thereby confirming empirical evidence. This paper opens non-convex optimization problems arising in deep learning to ideas from the PDE literature. In particular, we show that the non-viscous Hamilton-Jacobi equation leads to an elegant algorithm based on the Hopf-Lax formula that outperforms state-of-the-art methods. Furthermore, we show that these algorithms scale well in practice and can effectively tackle the high dimensionality of modern neural networks.
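To make the abstract's "algorithm based on the Hopf-Lax formula" concrete: for the Hamiltonian H(p) = |p|²/2, the Hopf-Lax formula gives the smoothed loss u(x, t) = min_y f(y) + |x − y|²/(2t), which is exactly the Moreau envelope of f, and taking steps to the minimizer is the proximal-point iteration. The following is a minimal 1-D sketch of that idea, not the paper's actual (high-dimensional, stochastic) algorithm; the toy loss, grid search, and parameter values are illustrative choices.

```python
import numpy as np

def make_loss():
    # Hypothetical 1-D stand-in for a non-convex training loss,
    # with spurious local minima from the sine term.
    return lambda y: y ** 2 + np.sin(5.0 * y)

def hopf_lax(f, x, t, ys):
    # Hopf-Lax formula for H(p) = |p|^2 / 2:
    #   u(x, t) = min_y f(y) + |x - y|^2 / (2 t)
    # i.e. the Moreau envelope of f; the minimizer is the proximal point.
    # Brute-force minimization over a grid -- only feasible in 1-D.
    vals = f(ys) + (x - ys) ** 2 / (2.0 * t)
    i = int(np.argmin(vals))
    return vals[i], ys[i]

f = make_loss()
ys = np.linspace(-3.0, 3.0, 2001)
t = 5.0  # larger t = more smoothing of the landscape

# The relaxed loss never exceeds the original (take y = x in the min).
u0, _ = hopf_lax(f, 0.0, t, ys)
print(u0, f(0.0))

# Proximal-point descent on the relaxed landscape: starting in a poor
# region, the iterates escape shallow local minima that plain gradient
# descent on f would get trapped in.
x = 2.5
for _ in range(5):
    _, x = hopf_lax(f, x, t, ys)
print(x, f(x))
```

With a large smoothing time t, the very first proximal step can jump across the sine ripples toward the global basin, which is the intuition behind relaxing the energy landscape before descending on it; a small t recovers behavior close to ordinary gradient descent.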

Read the paper (pdf) »