NIPS 2017: Themes and Takeaways

Ideas that caught my eye, some that went over my head, and a few that permeated straight through to the brain. All while at NIPS ’17.

Disclaimer: This list is in no way exhaustive. There was a LOT of information all around me, and I could only make note of a limited number of themes that I either found interesting or relevant.

  1. Phase retrieval/solving random quadratic systems of equations.

    I’ll jump straight to this topic, because it was the most relevant one to me, and I found a couple of interesting papers on this, namely:
    i. Solving Most Systems of Random Quadratic Equations [Poster]
    (uses an iteratively reweighted gradient descent approach to achieve the information-theoretically optimal Gaussian sample complexity).
    ii. Convolutional Phase Retrieval [Poster]
    (proposes a new sensing procedure based on convolution of the signal with a Gaussian-distributed filter).
    iii. A Local Analysis of Block Coordinate Descent for Gaussian Phase Retrieval [Workshop]
    (uses block coordinate descent for an alternating-minimization-based recovery procedure from phaseless Gaussian measurements).
    iv. Fast, Sample-efficient Algorithms for Structured Phase Retrieval [Poster]
    (uses alternating minimization to recover structured sparse signals from phaseless Gaussian measurements). (this was mine, so I can’t not publicize it 😀 )

    The main takeaways, in my opinion, would be
    – experimenting with structure of the signals to be recovered,
    – experimenting with the measurement setup, and
    – modifying the two standard optimization approaches:
    Wirtinger flow based gradient descent and Alternating minimization.
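The Wirtinger flow recipe, spectral initialization followed by gradient descent on a quartic loss, can be sketched in a few lines of NumPy. This is a real-valued toy instance; the dimensions, step size, and iteration count are my own choices, not taken from any of the papers above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 400                       # signal dimension, number of measurements
x = rng.standard_normal(n)           # ground-truth signal (toy example)
A = rng.standard_normal((m, n))      # Gaussian sensing vectors as rows
y = (A @ x) ** 2                     # phaseless (squared-magnitude) measurements

# Spectral initialization: leading eigenvector of (1/m) * sum_i y_i a_i a_i^T,
# scaled to the estimated signal energy.
Y = (A.T * y) @ A / m
_, V = np.linalg.eigh(Y)             # eigh returns eigenvalues in ascending order
z = V[:, -1] * np.sqrt(y.mean())

# Gradient descent on f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2,
# with the step size scaled by the initial energy, as in Wirtinger flow.
mu = 0.1 / np.linalg.norm(z) ** 2
for _ in range(500):
    Az = A @ z
    z = z - mu * (A.T @ ((Az ** 2 - y) * Az) / m)

# The signal is only recoverable up to a global sign flip.
err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
```

Alternating minimization, the other standard approach, instead alternates between estimating the missing signs/phases of the measurements and solving the resulting least-squares problem.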

  2. Looking beyond gradient descent for training neural networks.

    i. Gradient Descent Can Take Exponential Time to Escape Saddle Points [Spotlight]
    (discusses advantages of perturbed gradient descent over standard gradient descent, under random initialization schemes).
    ii. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent [Workshop]

    Takeaway:
    -Gradient descent in itself is ill-equipped to escape saddle points. Perturbed/accelerated versions perform better.
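To make the saddle-point issue concrete, here is a toy sketch (my own construction, not code from either paper) of plain versus perturbed gradient descent on a function with a strict saddle:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy non-convex objective with a strict saddle at the origin and
# minima at (+-1, 0):  f(x, y) = x^4/4 - x^2/2 + y^2/2.
def grad(p):
    x, y = p
    return np.array([x ** 3 - x, y])

def descend(p0, steps=2000, lr=0.1, perturb=False, radius=1e-2):
    p = p0.copy()
    for _ in range(steps):
        g = grad(p)
        # Perturbed GD: near a critical point (tiny gradient), add a small
        # random kick; at a strict saddle this finds an escape direction.
        if perturb and np.linalg.norm(g) < 1e-6:
            p = p + radius * rng.standard_normal(2)
        p = p - lr * grad(p)
    return p

start = np.zeros(2)                    # initialized exactly at the saddle
p_plain = descend(start)               # plain GD never moves: the gradient is zero
p_pert = descend(start, perturb=True)  # perturbed GD ends up near a minimum
```

Starting exactly at the saddle is of course the worst case; the papers' point is the more subtle one that even from random initialization, plain gradient descent can spend exponentially long near saddles.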

  3. Sparse Bayesian learning.

    i. From Bayesian Sparsity to Gated Recurrent Nets [Orals]
    (the authors connect the sparse Bayesian learning problem to RNNs and present an LSTM model that can be used for sparse signal estimation).
    ii. DNNs for sparse coding and dictionary learning [Workshop]
    (learning sparse regularizers for sparse signal estimation, via deep networks).

    Takeaway:
    -Interesting connections between Bayesian sparse signal estimation and deep nets.
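The connection is plausible once you notice that classical iterative sparse estimators already look like recurrent nets. Below is plain ISTA on a toy compressed sensing problem (my own setup, not code from the papers): each iteration is a linear map followed by a fixed nonlinearity, which is exactly the shape of a recurrent cell that these works learn and generalize.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 100, 40, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)     # sensing matrix
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x                                        # compressed measurements

def soft(v, t):
    """Soft thresholding: the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the data-fit gradient
lam = 0.01                      # sparsity penalty weight
z = np.zeros(n)
for _ in range(1000):
    # One ISTA iteration: a linear step followed by a fixed nonlinearity --
    # structurally the same as one cell of a simple recurrent network.
    z = soft(z - A.T @ (A @ z - y) / L, lam / L)
```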

  4. Optimization techniques.

    i. A Conservation Law Method in Optimization [Workshop]
    (an interesting parallel between non-convex optimization and Newton’s second law)
    ii. Faster Non-Convex Optimization than SGD [Workshop]
    (using ε-approximations of local minima of smooth nonconvex functions)
    iii. The marginal value of adaptive gradient methods in machine learning [Orals]
    (SGD, with an adequately chosen learning rate, can outperform adaptive optimizers like Adam; one needs to rethink the optimizers used for training deep networks) (this paper seems to have sparked a debate and even has a dedicated Reddit thread)
    iv. Implicit Regularization in Matrix Factorization [Spotlight]
    (theoretical guarantees for convergence of gradient descent to the minimum nuclear norm solution of the matrix factorization problem, under suitable initialization and step-size constraints).
    v. Generalized Linear Model Regression under Distance-to-set Penalties [Spotlight]
    (introduces a new penalty method to overcome the shrinkage drawback of the Lasso).
    vi. Unbiased estimates for linear regression via volume sampling [Spotlight]
    (an interesting technique to obtain the pseudo-inverse of a (fat) matrix by picking only a subset of its columns, hence speeding up the pseudo-inverse operation).

    Takeaway:
    -New techniques with refined theoretical guarantees for the convergence of optimization procedures.
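For context on the adaptive-optimizer debate in (iii), here is Adam's update rule next to plain gradient descent on an ill-conditioned toy quadratic. This is my own toy, not an experiment from the paper; on a convex toy both methods converge, and the paper's claims concern generalization of deep networks, which a sketch like this cannot show.

```python
import numpy as np

def adam_step(theta, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: per-coordinate steps scaled by gradient moments."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g            # running mean of gradients
    v = b2 * v + (1 - b2) * g ** 2       # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)            # bias corrections
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Badly scaled quadratic f(w) = 0.5 * sum_i d_i * w_i^2 (toy problem):
# plain gradient descent must keep its global step below 2/max(d),
# while Adam rescales each coordinate individually.
d = np.array([100.0, 1.0])
w_sgd = np.array([1.0, 1.0])
w_adam = np.array([1.0, 1.0])
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(2000):
    w_sgd = w_sgd - 0.009 * (d * w_sgd)               # tuned global step size
    w_adam, state = adam_step(w_adam, d * w_adam, state, lr=0.05)
```

Note that Adam's very first step is approximately lr * sign(gradient), which is the per-coordinate rescaling behavior the paper argues can hurt generalization.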

  5. New directions/miscellaneous

    i. Deep Sets [Orals] (design objective functions defined on sets that are invariant to permutations).
    ii. Unsupervised object learning from dense equivariant image labelling [Orals]
    (using a large number of images of an object, and no other supervision, to extract a dense object-centric coordinate frame for 3D modelling).
    iii. Geometric deep learning on graphs and manifolds [Tutorial]
    iv. A Unified Approach to Interpreting Model Predictions [Orals]
    v. Diving into the shallows: a computational perspective on large-scale shallow learning. [Spotlight] (demonstrates that only a vanishingly small fraction of the function space is reachable after a polynomial number of gradient descent iterations when used in conjunction with smooth kernels/shallow methods, hence exposing the limitations of shallow methods on large-scale data).
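The Deep Sets construction (item i) is easy to sketch: embed each element independently, pool by summation, then apply a readout. The sum pooling is what buys permutation invariance. The weights below are random placeholders of my own, standing in for learned networks:

```python
import numpy as np

rng = np.random.default_rng(3)

# Deep-Sets-style function f(X) = rho(sum_i phi(x_i)): embedding each element
# independently and sum-pooling makes the output order-independent.
W_phi = rng.standard_normal((3, 8))    # toy weights for the element embedding phi
W_rho = rng.standard_normal((8, 1))    # toy weights for the readout rho

def f(X):
    phi = np.tanh(X @ W_phi)           # per-element embedding
    pooled = phi.sum(axis=0)           # permutation-invariant aggregation
    return np.tanh(pooled) @ W_rho     # readout on the pooled representation

X = rng.standard_normal((5, 3))        # a "set" of 5 elements of R^3, as rows
X_perm = X[rng.permutation(5)]         # same set, different ordering
```

Evaluating f on X and X_perm gives identical outputs, since summation ignores the row order.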

  6. Generative adversarial networks

    i. Gradient Descent GANs are Locally Stable [Orals] (utilizes non-linear systems theory to show local exponential stability of GAN optimization)
    ii. Unsupervised image-to-image translation networks [Spotlight]
    iii. Dual discriminator GANs [Spotlight] (theoretical analysis showing that, given the maximal discriminators, optimizing the generator of a 2-discriminator GAN helps avoid the mode collapse problem).
    iv. Bridging the gap between theory and practice of GANs [Workshop]
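For context, the stability and mode-collapse results above all concern (variants of) the original two-player GAN objective:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Mode collapse refers to the generator mapping most of the latent space onto only a few modes of the data distribution.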

    Takeaways:
    -New applications
    -New breakthroughs in terms of theoretical results for convergence and solving the “mode collapse” problem.

Overall, I was exposed to a lot of interesting ideas, and I hope to make time to go through each of these papers in more detail.