The best kind of problems to solve are linear measurements

where is a square matrix, and essentially requires linear programming (or simple matrix calculations).

minimize

subject to

which equivalently, has the closed form solution given by

.

If is full rank and square, we have a unique solution that satisfies a one-one mapping between inputs and outputs . If we go slightly out of comfort zone, we come to the easiest class of nonlinear programming, for linear measurements, which is convex optimization. Why? Because in convex functions, local optimum is global optimum. They are nice that way: as long as you can ensure that with each iteration, you are reducing your objective value, at an appropriate enough rate, you will converge to the correct solution.

An example of this type of problem is still , with being a tall matrix, which can be formulated as a least-squares problem. If is full rank, it has a unique solution that can be computed using a matrix pseudo-inverse. (Note: In this model, there is no one-one mapping between ‘s and ‘s. There are more equations than variables, which means we may not be able to find a set of ‘s which satisfy all of the equations. We can however find an which comes *close enough* to the measurements, by minimizing the deviation of the measurement model from the actual measurements).

minimize

which equivalently, has the closed form solution given by

.

What about linear-measurement models that are inherently non-convex? Consider to be a fat matrix, we can have infinitely many solutions ( is rank deficient). We further impose a restriction on to be k-sparse (of course, this imposition is natural enough. We are collecting less information about our signal than we’re supposed to. If we want to uniquely recover the signal, we need to utilize certain properties about the structure of the signal). The problem is called the *compressed sensing* problem, and can be stated as

minimize

subject to .

This problem is np-hard to solve. However, if we choose such that it satisfies *restricted isometry property *depending on parameter , then we can still uniquely recover , using convex optimization. Of course there are non-convex approaches as well, but a common convex approximation is the lasso problem

minimize .

This is harder than our vanilla least-squares formulation, but still, very much in the realms of the well established convex optimization algorithms. Now let’s get even more out of our comfort zone. Non-linear problems that are non-convex. So we come to phase retrieval

which is equivalent to

for

or

.

Insane! Phase contains most of the information! As you might have noticed the trend, we’re trying to recover more using lesser and lesser information. A common analysis suggests that most of the information about a signal is contained in the phase of the measurements, not the magnitudes.

Now this function is highly non convex. Why? For convex functions all sub-level sets are convex. What that means is, take a convex curve. Think parabola symmetric about y axis. And now cut it with , for any , and look at the part of the parabola that lies below ( y-axis is ). We want to look at all such that , for any . It just forms a line segment on the x-axis. Now a line segment is a convex set. This parabola hence has a unique minimum.

Now think of a highly irregular curve with say 5-6 local minima, 5-6 global maxima and 1 global minimum. Such curves are not convex, because when you project them on to the x-axis, they don’t form convex sets.

Of course, we then resort to using the first trick from our new signal processing handbook. Make things *linear* and *convex*. A clever way to look at

.

is to “linearize” it .

then

where were looking at an element-wise product, on the left side. This can be better written as

where is a diagonal matrix . And so the problem turns into

minimize over .

It’s easy to see that this problem is not convex. Because the entries of diagonal of are restricted to be phase values. They have to have unit magnitude. This isn’t a convex set. So now what do we do ?

What AltMinPhase does is that it looks at it as 2 alternating minimizations with and being variables. For constant , this is convex and is a least-square problem if is full rank. For constant , it is easy to see that the optimal is given by

.

They then alternatively, minimise over these 2 variables. Now think back to the highly non-convex function with multiple local minima and maxima. If you initialize wrongly, you’re very likely to hit a local minimum. And that’s the end of it. Wrong solution!

Initialize correctly, close enough to the global minimum (which should ideally be very close to the true , for a noiseless case), and bam! It’s a very fast convergence algorithm. If is iid Gaussian, we can design an initial vector which has an expected value equal to the true value . Beauty of random matrix theory (the same thing that helped us solve the compressed sensing problem).

*******

In the next few posts, I will write about the intuition and big picture of some of the current topics in signal processing, like structured sparsity, phase retrieval, random matrix theory and some applications in machine learning. In doing so, I hope to create a good database of ideas related to my research.