The Chaos of Deep Learning

[xkcd comic: chaos. Source: xkcd]

But maybe something that deepens our understanding of deep neural networks? And transports us to an uber-cool science fantasy? Well, maybe not the latter.

I have a background in physics, and I’ve been pursuing problems in machine learning for quite some time now. So my brain often tries to make connections (eh? eh? neural network puns, anyone?) between the glorious physics literature and what often looks like engineers (myself included) struggling to wade through a math dump to explain, theoretically, why deep networks work.

My first step towards this was borrowing a book from my campus library: Nonlinear Dynamics and Chaos by Strogatz, something I’ve been meaning to read for the longest time. The next step (though this should have been the first one) was to see if there were other people who had been making these connections before me. And there were! So here are some interesting articles that I came across:

  1. How to Explain Deep Learning using Chaos and Complexity
  2. Understanding the depth in deep learning through the lens of chaos theory

Now, I don’t know if I can do as good a job as these guys in simplifying the text, but I’ll surely be posting something on this shortly. Till then, do check these articles out!


Optimization in Life

I’ve been doing a lot of optimization-related work and coursework for my PhD, most notably in convex optimization and nonlinear programming. They say that the best way to learn theory is to apply it in real life, and so I thought that it wouldn’t hurt to find ways to optimize… life… eh? On that optimistic enough note, here we go:

  1. The steepest descent is not necessarily the fastest. A common thing that people do when they are in an unwanted situation is to do the exact opposite of what they were doing, i.e. step along -\nabla f(x). This seems to be a go-to solution for minimizing conflict. However, it is well known that to reach the point of minimum (conflict), steepest descent can take far more iterations than other gradient-based methods (there’s a small numerical sketch after this list). So take it easier, guys. Extremeness is not a smart option.
  2. When bogged down by multiple issues, solve one problem at a time. Coordinate descent is an approach in which the objective (life’s problems) is minimized with respect to one coordinate at a time, holding the rest fixed. It’s known for its simplicity of implementation (see the second sketch after the list).
  3. The apple does not fall far from the tree. So when Newton came up with his method for optimizing functions, the successive estimates did not fall far from the optimum, most notably in the case of quadratic functions. Turns out, it helps to approximate the function at each point with a quadratic model, and then to minimize that model. Basically, take a problem and convert it into an easier sub-problem that has a known minimum. Move on to the next sub-problem. This fetches you the optimum much faster; for an exactly quadratic function, it lands on it in a single step.
  4. While positive definiteness is ideal, positive semi-definiteness is good too. If the Hessian of the function to be minimized is positive semi-definite everywhere, then the function is convex and can be minimized reliably (any local minimum is also a global minimum). So keep calm (and kinda positive) and minimize issues. A tiny eigenvalue check appears after the list.
  5. Often, when there are too many parameters to handle, we tend to overfit a fairly complicated model to our life. In such cases, it is a good idea to penalize over-complication by adding a regularization term. Regularization also helps in solving ill-posed problems. If we focus only on a specific set of problems, we forget other facets of life, which leads us to make poorer choices. The key is to find the right balance, or trade-off (the last sketch after the list shows this on a toy regression).
  6. Some problems actually have closed-form or unique solutions. There’s just one possible answer, and it’s apparent enough. In that scenario, the optimal strategy is to stop optimizing. Stop contemplating, just go get it!
  7. On a closing note, heuristically speaking, one would need to try out a bunch of optimization techniques to find the optimal optimization technique.
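
To make points 1 and 3 a bit more concrete, here’s a minimal sketch (my own toy example, not something from the post or the references above) that minimizes the quadratic f(x) = 0.5·xᵀAx − bᵀx with plain steepest descent and with Newton’s method. The matrix A, the step size and the tolerance are all arbitrary choices made for illustration.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x - b^T x, with minimizer x* = A^{-1} b.
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])      # mildly ill-conditioned: steepest descent will crawl
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def grad(x):
    return A @ x - b            # gradient of the quadratic

# Steepest descent: always step along the negative gradient.
x = np.zeros(2)
step = 0.1                      # fixed step size, small enough to converge here
iters_sd = 0
while np.linalg.norm(grad(x)) > 1e-8 and iters_sd < 10_000:
    x = x - step * grad(x)
    iters_sd += 1
print(f"steepest descent: {iters_sd} iterations, error {np.linalg.norm(x - x_star):.1e}")

# Newton's method: minimize the local quadratic model (here the Hessian is just A),
# so it lands on the minimizer of an exactly quadratic f in one step.
x = np.zeros(2)
iters_nt = 0
while np.linalg.norm(grad(x)) > 1e-8 and iters_nt < 100:
    x = x - np.linalg.solve(A, grad(x))
    iters_nt += 1
print(f"Newton's method:  {iters_nt} iterations, error {np.linalg.norm(x - x_star):.1e}")
```

On this toy problem, steepest descent needs a couple of hundred iterations while Newton’s method finishes in one, which is exactly the “extremeness is not a smart option” moral of point 1.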
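Point 2, in the same spirit: a hedged sketch of coordinate descent on a similar quadratic, minimizing exactly over one coordinate at a time. The matrix and the number of sweeps are again made up purely for illustration.

```python
import numpy as np

# Same flavour of quadratic, f(x) = 0.5 * x^T A x - b^T x, with a coupling
# between the coordinates so that a single sweep is not enough.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # symmetric positive definite, so f is convex
b = np.array([1.0, 1.0])
x = np.zeros(2)

for sweep in range(25):
    for i in range(len(x)):
        # Exact one-dimensional minimization over coordinate i,
        # holding every other coordinate fixed.
        x[i] = (b[i] - (A[i] @ x - A[i, i] * x[i])) / A[i, i]

print("coordinate descent:", x)
print("direct solve:      ", np.linalg.solve(A, b))
```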
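And the eigenvalue check promised in point 4: for a twice-differentiable function, convexity comes down to the Hessian being positive semi-definite everywhere, and for a quadratic the Hessian is constant, so the check is a one-liner (again a toy function of my own choosing).

```python
import numpy as np

# f(x1, x2) = x1**2 + x1*x2 + x2**2 has the constant Hessian below.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues = np.linalg.eigvalsh(H)     # H is symmetric, so eigvalsh applies
print("Hessian eigenvalues:", eigenvalues)
print("positive semi-definite (hence f is convex)?", bool(np.all(eigenvalues >= 0)))
```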
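Finally, for point 5, a small sketch of regularization in action: ordinary least squares versus ridge regression on a deliberately ill-posed toy problem with more features than samples. The data, the penalty lam = 1.0 and the random seed are all assumptions made just for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# An ill-posed toy problem: 5 samples, 10 features, so plain least squares
# has infinitely many exact fits and happily over-complicates the model.
X = rng.normal(size=(5, 10))
w_true = np.zeros(10)
w_true[:2] = 1.0                          # only two features actually matter
y = X @ w_true + 0.1 * rng.normal(size=5)

# Unregularized least squares (the minimum-norm solution via the pseudo-inverse).
w_ls = np.linalg.pinv(X) @ y

# Ridge regression: minimize ||X w - y||^2 + lam * ||w||^2.
lam = 1.0                                 # regularization strength, chosen arbitrarily
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

print("least squares ||w|| =", np.linalg.norm(w_ls))
print("ridge         ||w|| =", np.linalg.norm(w_ridge))
```

The penalty shrinks the coefficients toward something simpler, which is the “don’t overfit a complicated model to your life” trade-off of point 5.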
[xkcd comic: optimization. Source: XKCD]

To make this post even more meta, how optimal would it be if the moral of this post converged to this statement?