The Chaos of Deep Learning


Source: xkcd

But maybe something that deepens our understanding of deep neural networks? And transports us to an uber-cool science fantasy? Well, maybe not the latter.

I have a background in physics, and I’ve been pursuing problems in machine learning for quite some time now. So my brain often tries to make connections (eh? eh? neural network puns, anyone?) between the glorious physics literature and what looks like engineers (myself included) wading through a math dump, struggling to explain theoretically why deep networks work.

My first step towards this was borrowing Strogatz’s Nonlinear Dynamics and Chaos from my campus library, a book I’ve been meaning to read for the longest time. The next step (though it should have been the first) was to see whether other people had made these connections before me. And they had! So here are some interesting articles that I came across:

  1. How to Explain Deep Learning using Chaos and Complexity
  2. Understanding the depth in deep learning through the lense of chaos theory

Now, I don’t know if I can do as good a job as these guys in simplifying the text, but I’ll surely be posting something on this shortly. Till then, do check these articles out!

Why Murphy was probably right

So, there’s this law by Murphy that most of you must be aware of.

If anything can go wrong, it will.

Now, the origin of Murphy’s law is quite well explained here.

And so goes the original Murphy’s law:

If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it.

Now the situation that gave rise to this quote is something like this.

Edward A. Murphy, Jr. was one of the engineers on the rocket-sled experiments that were done by the U.S. Air Force in 1949 to test human acceleration tolerances (USAF project MX981). One experiment involved a set of 16 accelerometers mounted to different parts of the subject’s body. There were two ways each sensor could be glued to its mount, and somebody methodically installed all 16 the wrong way around. Murphy then made the original form of his pronouncement, which the test subject (Major John Paul Stapp) quoted at a news conference a few days later. (Source)

I’d think the odds of failure were quite high. How?

The person in charge of installing the accelerometers can be called Mike. Why? It’s a standard enough name. Now, Mike probably wasn’t a smart enough guy.

  1. He did not know which side of the sensor went where and installed all the accelerometers randomly, using no common sense, and failed to hit the right combination = 0.5 \times (1 - 0.5^{16}) [FAIL]
  2. He did not know which side went where and installed all the accelerometers randomly, using no common sense, but luckily hit the right combination = 0.5 \times 0.5^{16} [SUCCESS]
  3. He had the sides interchanged but installed them all the same way. Well, at least he had the common sense to install them all the same way = 0.5 \times 0.5 [FAIL]
  4. Mike was smart. He got the sides right and had the sense to install them all the right way = 0.5 \times 0.5 [SUCCESS]

Let’s give Mike the benefit of the doubt. Maybe he was smart. Let’s assign that a probability of 0.5, so the probability that Mike was dumb is also 0.5.

Adding up the two failure cases, the probability of failure is 0.5 \times (1 - 0.5^{16}) + 0.5 \times 0.5, which is approximately 0.75. Even in an idealized situation, success usually needs a whole chain of events to go right, and the product of all those probabilities is quite low.
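To double-check the arithmetic, here is a minimal Python sketch of the four cases. It assumes, as above, that Mike is smart with probability 0.5, that a smart Mike installs all 16 sensors the same way (right or wrong with equal odds), and that a careless Mike orients each sensor independently at random. The function names are my own, for illustration only; the Monte Carlo function is just a sanity check on the closed-form sum.

```python
import random

N_SENSORS = 16   # accelerometers in the MX981 experiment
P_SMART = 0.5    # assumed probability that Mike was careful


def failure_probability(n_sensors: int = N_SENSORS, p_smart: float = P_SMART) -> float:
    """Closed form: the sum of the two FAIL cases in the list above."""
    p_careless = 1.0 - p_smart
    # Case 1: careless Mike misses the single all-correct orientation.
    case_1 = p_careless * (1.0 - 0.5 ** n_sensors)
    # Case 3: careful Mike is consistent, but consistently wrong.
    case_3 = p_smart * 0.5
    return case_1 + case_3


def simulate_failure(trials: int, n_sensors: int = N_SENSORS,
                     p_smart: float = P_SMART, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        if rng.random() < p_smart:
            # Careful Mike: one coin flip decides the orientation of every sensor.
            all_right = rng.random() < 0.5
        else:
            # Careless Mike: every sensor is its own coin flip.
            all_right = all(rng.random() < 0.5 for _ in range(n_sensors))
        if not all_right:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    print(f"closed form: {failure_probability():.6f}")  # closed form: 0.749992
    print(f"simulated:   {simulate_failure(100_000):.4f}")
```

The closed form comes out just under 0.75, because the careless branch has a tiny 0.5^{16} chance of getting every sensor right.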

Let’s look at it this way. The event: me getting a sound night’s sleep. Shouldn’t be hard, right?

Why it doesn’t work: I have a roommate who keeps talking loudly on the phone till the wee hours. Why would I have a roommate? I am a research assistant, and we don’t get paid well enough for me to afford a better room. Why am I a research assistant? I want to do a PhD. Why do I want to do a PhD? You get the drift.

Turns out, I was almost destined to have a painful right ear from being subjected to continuous, loud, mindless rants in the middle of the night. The consequences of a lot of our actions aren’t really predictable until events unfold over time. But when they do happen, it’s not that hard to chart out the trajectory of what might have caused them. And so, if anything can go wrong, perhaps your brain is able to trace that trajectory in advance and forecast what will go wrong.

Here’s the catch, though. When things are expected to go wrong and they don’t, we are so happy with the outcome that we barely recognize it as a failure of the law. So Murphy was a genius in framing a law whose exceptions would easily go unnoticed. Whoever thought that something so iconic would come out of so much pessimism?

Then again, as Phil Dunphy from Modern Family would say…


The Quantum Key to Understanding

I had taken a course on Quantum Information and Computation during my undergrad, and I learnt a lot of cool encryption strategies. For those new to cryptography, the objective is to encode information using a key shared between an encoder (Alice) and a decoder (Bob). Anyone who doesn’t have the shared key cannot decode this information. Of course, an eavesdropper (Eve) may iteratively try out several different keys to decode the message. How easily a third party can decode it determines the robustness (or lack thereof) of the encryption strategy.

Now, in quantum information theory, bits of information can be encoded in the spin of a particle (for instance, an electron can have a spin quantum number of +\frac{1}{2} or -\frac{1}{2}).

These two states are orthogonal to each other, as if the electron suffers from a split personality disorder. When measured, it shows either a positive spin (0) or a negative spin (1), but not both at the same time. The glass is either full or empty. Each outcome has an associated probability. Since the two outcomes form a partition, their probabilities add up to 1, and for a randomly chosen bit each equals \frac{1}{2}.

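Now suppose the electron tries to get a better perspective: measured in the standard basis, it shows a positive spin half the time and a negative spin the remaining half. Two such states, called 0H and 1H, differ only in the relative sign between their two parts: 0H = \frac{1}{\sqrt{2}}(\left|0\right\rangle + \left|1\right\rangle) and 1H = \frac{1}{\sqrt{2}}(\left|0\right\rangle - \left|1\right\rangle). A glass half full or half empty kinda situation.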

These two states form the Hadamard basis. I suppose Hadamard was a rational guy*.
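For readers who like to see the arithmetic, here is a small Python sketch of the two bases. It uses the usual textbook representation of the states as real 2-component vectors (an assumption of convenience; spin states in general can carry complex amplitudes), and the names are mine. Measuring a Hadamard state in the standard basis gives each outcome half the time, the "glass half full" picture above, while measuring it in its own basis gives a definite answer.

```python
import math

S = 1 / math.sqrt(2)

# Standard basis: "positive spin" (0) and "negative spin" (1).
ZERO = (1.0, 0.0)
ONE = (0.0, 1.0)

# Hadamard basis: equal-weight superpositions that differ only in relative sign.
ZERO_H = (S, S)    # 0H = (|0> + |1>) / sqrt(2)
ONE_H = (S, -S)    # 1H = (|0> - |1>) / sqrt(2)


def inner(a, b):
    """Inner product of two real 2-component state vectors."""
    return a[0] * b[0] + a[1] * b[1]


def outcome_probabilities(state, basis):
    """Born rule: P(outcome) = (overlap with that basis state) squared, rounded."""
    return [round(inner(state, b) ** 2, 10) for b in basis]


print(outcome_probabilities(ZERO_H, (ZERO, ONE)))      # [0.5, 0.5]
print(outcome_probabilities(ZERO_H, (ZERO_H, ONE_H)))  # [1.0, 0.0]
print(inner(ZERO_H, ONE_H))                            # 0.0 (orthogonal)
```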


Let’s figure out the whole encoding and decoding strategy.


Now, the probability that the eavesdropper Eve picks the same measurement basis as the encoder Alice is \frac{1}{2}, because there are only two possible bases: the standard basis and the Hadamard basis. If she does, she correctly decodes the bit encoded by Alice. Eve picks the wrong basis with the remaining probability of \frac{1}{2}; her measurement outcome is then a coin toss, so she guesses the right bit with probability \frac{1}{2}.

So, probability ( Eve guesses correctly ) = \frac{1}{2} \times 1 + \frac{1}{2} \times \frac{1}{2} = \frac{3}{4}.

Probability ( failure of encryption ) = Probability ( Eve guessing correctly ) = \frac{3}{4}.
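A quick Monte Carlo check of the \frac{3}{4}, written in Python under the simplified model of this post: Alice picks a random bit and a random basis, Eve picks a random basis, a matching basis reveals the bit, and a mismatched basis gives Eve a fair coin flip. This covers only the guessing argument above, not the full BB84 protocol (there is no basis reconciliation or error checking between Alice and Bob here), and the function names are hypothetical.

```python
import random

BASES = ("standard", "hadamard")


def eve_reads_bit(rng: random.Random) -> bool:
    """One round: does Eve end up with Alice's bit?"""
    alice_bit = rng.randint(0, 1)
    alice_basis = rng.choice(BASES)
    eve_basis = rng.choice(BASES)
    if eve_basis == alice_basis:
        eve_bit = alice_bit          # right basis: the measurement reveals the bit
    else:
        eve_bit = rng.randint(0, 1)  # wrong basis: the outcome is a fair coin flip
    return eve_bit == alice_bit


def estimate_eve_success(trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated rounds in which Eve recovers Alice's bit."""
    rng = random.Random(seed)
    hits = sum(eve_reads_bit(rng) for _ in range(trials))
    return hits / trials


exact = 0.5 * 1 + 0.5 * 0.5   # the 3/4 derived above
print(exact, estimate_eve_success())
```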

Pretty high, huh? Well, luckily math is on our side. One would rarely encode information in a single bit, right? Most messages are tens or hundreds of bits long. Maybe even more! Let’s see how the problem works out then.

For the encryption to fail, it must fail for every single bit. Treating the bits as independent:

Probability ( failure of encryption )
= Probability ( bit 1 fails ) \times \cdots \times Probability ( bit n fails )
= (\frac{3}{4})^n.

Probability ( success of encryption )
= 1 - Probability ( failure of encryption )
= 1 - (\frac{3}{4})^n.

This value approaches 1 as n approaches \infty. Even for n = 10, the probability of successful encryption is 1 - (\frac{3}{4})^{10} \approx 0.944. See how the tables turned? What are the odds of that happening? (Well, you know the answer now.) Classic quantum mess-around. What I discussed here is a simplified take on the BB84 quantum key distribution protocol. You could read more about it here: BB84.
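The same numbers in a few lines of Python, using exact fractions. It keeps this post’s simplifying assumption that the bits are independent, each with the \frac{3}{4} per-bit failure probability derived above; p_encryption_succeeds is an illustrative name, not a library function.

```python
from fractions import Fraction

P_BIT_FAILS = Fraction(3, 4)   # Eve guesses a single bit correctly


def p_encryption_succeeds(n_bits: int) -> Fraction:
    """1 - (3/4)^n: at least one of n independent bits defeats Eve."""
    return 1 - P_BIT_FAILS ** n_bits


for n in (1, 5, 10, 20, 50):
    print(f"n = {n:>2}: {float(p_encryption_succeeds(n)):.4f}")
```

For n = 10 this prints 0.9437, the 0.944 quoted above; exact fractions avoid any rounding until the final print.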

Now this is a brilliant way to look at situations in life, in general, isn’t it? Aren’t the events in life also probabilistic? I follow this up in my next post: Why Murphy was probably right.

*Side note: There’s a disturbing lack of women in applied mathematics. I’d most naturally tend to assume that a mathematician is a guy. Here’s an article on how, even though women exist in science, they generally take up positions in biology and healthcare-related fields instead of more mathematically grueling areas: women in science.

The Physics of Interpersonal Relationships

1. Frictional force is directly proportional to the normal reaction force between two people. If the two entities are close to each other and in opposition, the more reaction either party supplies, the more friction. Also, friction is a non-conservative force: one can keep expending time and energy on bickering with the other party without getting any equivalent valuable output. Much is lost as heat.

2. Gravitational force is inversely proportional to the square of the physical distance between two people. If you find yourself gravitating towards someone when you shouldn’t be, maintain a physical separation of at least 15 ft. That minimizes the chance of physical contact and of any form of non-awkward conversation.

3. If you find yourself rapidly accelerating into an unwanted situation with someone, the only way to change the course of events is to apply a jerk (a rate of change of acceleration). This change in acceleration can come from either one party or, preferably, both being jerks.

4. The first law of thermodynamics states that the energy of an isolated system is constant. One may decide to channel their energies into productive work and recreational hobbies, or into interpersonal warfare, but not both. I’d pick the first.

5. Neural pathways get trained on the associations between an input stimulus and the output reaction. The weights on these pathways can be adjusted by deliberately varying your reactions to specific stimuli. Say you dislike someone, for example a woman at your workplace who completely cold-shoulders you even when you are sweet to her, offer sage advice on where to get the best food in the locality, and give her a heads-up in times of need. Your most natural reaction is to detest that woman. You feel insulted. Now replace this feeling with one of pity for the poor woman who can’t see right from wrong.

Some associations might be tagged as positive compulsions. For example, when you feel disappointed by the lack of people to talk to, start writing a blog post instead. Wink wink.

6. Newton’s law of cooling states that the rate of cooling is proportional to the difference between the temperature of a body and that of its surroundings. If you find yourself constantly agitated by your surroundings, walk out of the room. Staying in the room will only maintain or increase your agitation.

7. If two people have a charged discussion and are speeding through an argument, then in the presence of a magnetic field they’re likely to experience a perpendicular force, called the Lorentz force, most likely in the form of a slap or a punch, and not delivered by Lorentz himself.

8. If two interacting people are on the same wavelength, a slight change in one of the wavelengths produces beats: a pattern of alternating constructive and destructive interference. In that case, just listen to the beat of your own music; some sound logic will follow.

Time Dilation

If you had a choice, would you speed through life or would you rather have time stand still?

Of course, any change in the perception of time would involve one of three things:

1. Traveling at speeds close to the speed of light. Relativistic effects.

2. Changing how your neurons perceive light, sound and other stimuli that model time. Zeitgebers.

3. Changing external stimuli in ways that trick your brain into perceiving time differently.

Now, I’ll rule out 1, though my inner physicist might love to get back to it at some point in the future. Points 2 and 3 are equally interesting.

Point 2 has much to do with the rate at which your brain takes in new information, and the weight it gives to each new piece of information. I watched a video on Stoner Sloth today that sparked this thought process. Smoking cannabis has the effect of slowing down the perception of time. Smokers tend to react slowly to external stimuli, and hence behave like a sloth. Now, I haven’t ever smoked weed, nor do I wish to, but I do wonder about the cognitive process behind this phenomenon. I suppose the neural pathways linked to visual and auditory stimuli are activated a lot more than those of the other senses, which possibly leads to an information overload on them. There’s also a sense of hyperfocusing: the intensity with which you sense your surroundings is high, much like during meditation. There’s also an associated thought jumping. The mind makes associations quickly and abruptly, which leads the user to believe that a huge stream of thoughts has passed through the mind in a short time, giving the perception of an elongated time frame.

Point 3 is something I read about two days back, on virtual reality using the Oculus Rift (source: IEEE Xplore). The researchers set up a cool experiment in which they divided people into three sets, each shown a different version of the visual stimulus, under three different test conditions. For all three sets, the basic scene was the same: a beach with a setting sun. In the second and third sets, they introduced verbal and spatial tasks by flashing letters and moving objects, respectively. The three test conditions were: a stationary sun, a realistically setting sun, and a sun setting twice as fast.

With no other stimulus to attend to, the people in set 1 overestimated the passage of time, and their estimates of the time spent were strongly affected by how fast the sun set.

Meanwhile, people in sets 2 and 3 underestimated the passage of time, since their brains were busy with several cognitive tasks. They were also relatively less affected by the setting time of the sun.

While this experiment serves as a good lesson in virtual reality, it does make me wonder whether we will be able to simulate conditions that, say, make us more productive by giving the illusion of time flying, or slow time down when we feel overwhelmed by the cascade of things happening in our lives. I guess only time will tell?


Quantum Superpositions of Life. Or not.


For everyone familiar with the quantum superposition principle (or who has heard the folklore of a cat owned by a certain Schrödinger), here is the idea: the state of a quantum system is a superposition of all the states it can take, each weighted by an amplitude whose squared magnitude is the probability of that state. In the simplest case, the amplitudes are just the square roots of the probabilities. Practically speaking, your future is

\sqrt{\alpha}*\left|bright\right\rangle + \sqrt{1-\alpha}*\left|bleak\right\rangle

where 0 < \alpha < 1 and \left|bright\right\rangle \:\:and\:\: \left|bleak\right\rangle are orthonormal states.

While there is no inherent ‘quantumness’ to this scenario, it still obeys a superposition principle, except that \alpha is undetermined. It is, however, not impossible to fit a probabilistic model into this framework, based on past experience.

The brightness or bleakness of the future gets determined when one gets to the future (in other words, performs a direct measurement of the system). It’s the same with a quantum system: the superposition collapses once a measurement is performed.
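A toy Python sketch of that measurement step, assuming the real amplitudes \sqrt{\alpha} and \sqrt{1-\alpha} from the equation above. Measurement picks "bright" with probability equal to its squared amplitude, then collapses the state onto the observed outcome. The dictionary representation and the function names are my own illustration, not a quantum computing library.

```python
import math
import random


def future_state(alpha: float) -> dict:
    """Amplitudes of sqrt(alpha)|bright> + sqrt(1 - alpha)|bleak>."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return {"bright": math.sqrt(alpha), "bleak": math.sqrt(1 - alpha)}


def measure(state: dict, rng: random.Random):
    """Born rule followed by collapse onto the observed outcome."""
    p_bright = state["bright"] ** 2
    outcome = "bright" if rng.random() < p_bright else "bleak"
    collapsed = {"bright": 0.0, "bleak": 0.0}
    collapsed[outcome] = 1.0
    return outcome, collapsed


state = future_state(0.3)
norm = sum(a ** 2 for a in state.values())   # should be 1 (orthonormal states)
rng = random.Random(42)
outcomes = [measure(state, rng)[0] for _ in range(10_000)]
print(round(norm, 12), outcomes.count("bright") / len(outcomes))
```

Each call to measure starts from the same uncollapsed state, which is why repeating it gives a frequency near \alpha; a single real measurement would leave you with only the collapsed dictionary.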

What I’m trying to say is that the superposition principle isn’t unique to the quantum world. What physicists take advantage of in quantum systems, however, is the ability to keep the system in superposition. The idea is to perform a series of operations on a system in superposition that never destroys the superposition, and to take advantage of the fact that the value of \alpha is known.

For instance, one may seek to turn the tables. In that case, one applies a NOT gate, which is the Pauli X matrix, X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, swapping the two basis states.

X*(\sqrt{\alpha}*\left|bright\right\rangle + \sqrt{1-\alpha}*\left|bleak\right\rangle ) = \sqrt{\alpha}*\left|bleak\right\rangle + \sqrt{1-\alpha}*\left|bright\right\rangle .
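The NOT gate in a few lines of Python, acting on the (bright, bleak) amplitude pair. Writing Pauli X as a plain nested tuple and multiplying by hand is just for self-containment (a linear algebra library would do the same job), and the names are illustrative. Applying X twice returns the original state, and the squared amplitudes still sum to 1, so the superposition survives the operation.

```python
import math

# Pauli X (the quantum NOT gate) in the (|bright>, |bleak>) basis.
PAULI_X = ((0.0, 1.0),
           (1.0, 0.0))


def apply_gate(gate, amplitudes):
    """Matrix-vector product for a 2x2 gate and a 2-component state."""
    return tuple(sum(g * a for g, a in zip(row, amplitudes)) for row in gate)


alpha = 0.8
state = (math.sqrt(alpha), math.sqrt(1 - alpha))   # (bright, bleak)
flipped = apply_gate(PAULI_X, state)

print(flipped == (state[1], state[0]))        # True: amplitudes swapped
print(apply_gate(PAULI_X, flipped) == state)  # True: X undoes itself
```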

The idea of teleportation has forever fascinated mankind, and somehow people believe that if quantum teleportation is possible, human teleportation will exist some day. However, teleportation, even in the quantum world, is about transferring information (the state of a qubit), not the particle itself. So it does not involve the qubit disappearing in one place and reappearing in another. Nor does it allow faster-than-light communication: the receiver can only reconstruct the state after two classical bits arrive from the sender over an ordinary channel. This is where the ‘quantumness’ comes into play, through a concept called quantum ‘entanglement’, which I shall discuss in the next post. Probably.