The NIPS experience: Newbie edition

Over the past week, I attended (and presented at) one of the biggest conferences in Machine Learning: Neural Information Processing Systems (NIPS) 2017 in Long Beach, California, and the experience was nothing short of exhilarating. There were a number of themes that I made note of, and one blog post is not enough to illustrate them all. So I’ll try to enforce some structural sparsity here to reduce the complexity of this text.

1. NIPS 2017 was humongous.

About 8,000 people from academia and industry thronged to the Long Beach Convention (epi)Center to talk about groundbreaking research. It was chaotic. It took me an hour of standing in line just to get my registration badge!

2. Star-studded. Both in terms of people and sponsor companies.

3. GANs were an audience favorite. You know something is the new buzzword when companies turn it into a catchphrase and print it on a t-shirt (yes, I did manage to get one for myself!).

You can’t find a better endorsement! There was an entire track of talks dedicated to recent advances in GANs.

4. Bridging the gap between Theory and Practice.

Ali Rahimi’s talk before accepting the Test of Time award was recommended to me by multiple people, for multiple viewings. The entire focus of the talk was on making today’s brittle algorithmic frameworks more robust by theoretically analyzing the underlying optimization problems, rather than treating the field like alchemy. There was also an entire workshop dedicated to this theme.

5. “Where’s the party tonight?”

At least five different people asked me whether I was attending some sponsor after-party. I had actually received invites to most of them and RSVP-ed as well, but I found myself extremely exhausted (and running out of mingling-with-random-strangers stamina). In fact, there were people who were particularly interested in the parties and had no clue what the next talk was about. I guess, beyond a point, a certain level of sponsor involvement could get worrisome.

6. “Do you want some swag?”

With so many sponsor booths around, companies had to try different strategies to attract the best minds, which meant flashy sponsor swag (translation: goodies). You could collect enough t-shirts to get through two weeks without laundry. These companies certainly know their target audience, deprived grad students, well.

7. Orals, spotlights and posters.

So much information to gather! NIPS this year had a record 678 accepted papers, with the main themes being Algorithms, Theory, Optimization, Reinforcement Learning, Applications, and GANs.

8. Even more orals and posters, in the form of numerous workshops. Also, guest appearances from the Women in Machine Learning (WiML) community.

9. Debates and panels.

An interesting debate on the relevance of studying the interpretability problem sparked a conversation about the various interpretations of the term itself, and about whether the problem was well motivated to begin with.

10. A free flow of ideas from every corner.

Some highlights were the talks (that I managed to attend) by Bertsekas, Goodfellow, and others. I couldn’t attend some of the morning ones, though! And of course, there was a lot to take away from several of the poster sessions. I think I also learned how to sell an idea better, through my own poster presentation.

I think overall, it was a great learning experience and incredible exposure for a first-timer like me. Hopefully, I will get a chance to visit again! I’m also going to write a part 2 of my experience, which will focus on some of the more technical ideas that caught my attention at NIPS 2017. It should make a good follow-up read after this one! Watch out!

Optimization in Life

I’ve been doing a lot of optimization-related work and courses for my PhD, most notably in convex optimization and nonlinear programming. They say that the best way to learn theory is to apply it in real life, so I thought it wouldn’t hurt to find ways to optimize… life… eh? On that optimistic enough thought, here we go:


  1. The steepest descent is not necessarily the fastest. A common thing that people do when they are in an unwanted situation is the stark opposite of what they were doing before, i.e., they move along -\nabla f(x). This seems to be a go-to solution for minimizing conflict. However, it is well known that to reach the point of minimum (conflict), steepest descent can take far more iterations than other gradient-based methods. So take it easier, guys. Extremeness is not a smart option. (See the first sketch after this list for just how slowly steepest descent can crawl.)
  2. When bogged down by multiple issues, solve one problem at a time. Coordinate descent is an approach in which the objective (life’s problems) is minimized with respect to one coordinate at a time, holding the others fixed. It’s known for its simplicity of implementation (second sketch below).
  3. The apple does not fall far from the tree. So when Newton came up with his method for optimizing functions, the initial estimates did not fall far from the optimum, most notably in the case of quadratic functions. It turns out that it helps to approximate the function at each point with a quadratic estimate, and then to minimize that quadratic estimate. Basically, take a problem and convert it into an easier sub-problem that has a known minimum, then move on to the next sub-problem. This fetches you the optimum much faster; on a quadratic, a single Newton step does it (the first sketch below shows this too).
  4. While positive definiteness is ideal, positive semi-definiteness is good too. If the Hessian of the function to be minimized is positive semi-definite everywhere, then the function is convex and can be minimized easily (any local minimum is a global minimum). So keep calm (and kinda positive) and minimize issues. (The third sketch below checks this via eigenvalues.)
  5. Often, when there are too many parameters to handle, we tend to overfit a fairly complicated model to our life. In such cases, it is a good idea to penalize over-complication by adding a regularization term. Regularization also helps in solving ill-posed problems. If we focus on only a specific set of problems, we forget other facets of life, which leads us into making poorer choices. The key is to find the right balance or trade-off (fourth sketch below).
  6. Some problems actually have closed-form or unique solutions. There’s just one possible answer, and it is apparent enough. In that scenario, the optimal strategy is to stop optimizing. Stop contemplating, just go get it. (The last sketch below takes this literally.)
  7. On a closing note, heuristically speaking, one would need to try out a bunch of optimization techniques to find the optimal optimization technique.
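
For the curious, here’s a minimal Python sketch of points 1 and 3, using NumPy and toy values I picked purely for illustration: on an ill-conditioned quadratic, steepest descent crawls, while a single Newton step lands on the minimum.

```python
import numpy as np

# f(x) = 0.5 * x^T A x, minimized at the origin; A is the (constant) Hessian.
A = np.diag([1.0, 100.0])            # condition number 100: a "stretched" bowl
x = np.array([1.0, 1.0])

# Steepest descent with a safe fixed step (1 / largest eigenvalue).
step, iters = 1.0 / 100.0, 0
while np.linalg.norm(A @ x) > 1e-6:  # the gradient of f is A @ x
    x = x - step * (A @ x)           # move along -grad f(x)
    iters += 1
print(f"steepest descent: {iters} iterations")   # on the order of a thousand

# Newton's method: the quadratic model is exact here, so one step suffices.
x = np.array([1.0, 1.0])
x = x - np.linalg.solve(A, A @ x)    # x - H^{-1} grad f(x)
print("Newton: done in 1 step, x =", x)
```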
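
Second, a tiny coordinate descent sketch (again, toy numbers of my own): each inner step exactly minimizes the objective over one coordinate while the others stay fixed.

```python
import numpy as np

# f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive definite (toy data).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x = np.zeros(2)

for sweep in range(50):              # a few sweeps are plenty for this toy
    for i in range(len(x)):
        # Exactly minimize f over coordinate i, holding the others fixed.
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

print("coordinate descent:", x)                  # ~ [0.2, 0.4]
print("direct solve:      ", np.linalg.solve(A, b))
```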
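
Third, the convexity check from point 4, assuming a quadratic whose Hessian I simply made up: all eigenvalues non-negative means positive semi-definite, which means convex.

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # Hessian of some quadratic function
eigs = np.linalg.eigvalsh(H)         # eigvalsh: eigenvalues of a symmetric matrix
print("eigenvalues:", eigs)          # [1.0, 3.0], all non-negative
print("convex (Hessian PSD):", bool(np.all(eigs >= 0)))
```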
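
Fourth, a ridge-regression-style sketch of point 5 on synthetic data: adding a penalty lam * ||w||^2 shrinks the fitted weights toward zero, trading a little fit for a simpler model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                     # synthetic features
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=20)       # noisy observations

lam = 1.0                                        # regularization strength
# Ridge: minimize ||X w - y||^2 + lam * ||w||^2; the penalty shrinks w.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
w_plain = np.linalg.solve(X.T @ X, X.T @ y)      # plain least squares
print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with regularization:   ", np.linalg.norm(w_ridge))  # smaller
```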
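
And last, point 6 taken literally: some problems have a closed-form answer, so there is nothing to iterate on.

```python
# A 1-D quadratic a*x^2 + b*x + c with a > 0 is minimized at x* = -b / (2a).
a, b, c = 2.0, -8.0, 3.0
x_star = -b / (2 * a)
print("minimizer:", x_star)                               # 2.0
print("minimum value:", a * x_star**2 + b * x_star + c)   # -5.0
```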
[xkcd comic: optimization]

To make this post even more meta, how optimal would it be if the moral of this post converged to this statement?