Signal Measurements: From linear to non-linear

The best kind of problems to solve are linear measurement problems of the form

y = Ax

where A is a square matrix; solving it essentially requires linear programming (or simple matrix calculations):

minimize  c   (the objective here is just a constant, so this is really a feasibility problem)

subject to  Ax = y

which, equivalently, has the closed-form solution given by

x = A^{-1} y.

If A is square and full rank, we have a unique solution, and there is a one-one mapping between the inputs x_i and the outputs y_i. If we go slightly out of our comfort zone, we come to the easiest class of nonlinear programming for linear measurements, which is convex optimization. Why is it easy? Because for convex functions, every local optimum is a global optimum. They are nice that way: as long as you can ensure that each iteration reduces your objective value at a sufficient rate, you will converge to the correct solution.
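As a quick, minimal sketch (a toy example of my own, with a randomly generated A), this is what the square, full-rank case looks like in numpy; note that np.linalg.solve is preferable to forming A^{-1} explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))    # square, full rank with probability 1
x_true = rng.standard_normal(n)
y = A @ x_true                     # linear measurements

x_hat = np.linalg.solve(A, y)      # solves Ax = y without explicitly inverting A
print(np.allclose(x_hat, x_true))  # True: exact, unique recovery
```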

An example of this type of problem is still y = Ax, now with A being a tall matrix, which can be formulated as a least-squares problem. If A is full rank, it has a unique solution that can be computed using the matrix pseudo-inverse. (Note: in this model, there is no one-one mapping between the x_i's and the y_i's. There are more equations than variables, which means we may not be able to find a set of x_i's that satisfies all of the equations. We can, however, find an x which comes close enough to the measurements, by minimizing the deviation of the measurement model from the actual measurements.)

minimize  \|Ax-y\|^2_2

which, equivalently, has the closed-form solution given by

x = (A^TA)^{-1} A^Ty.
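A minimal numpy sketch of this overdetermined case (again a toy example of my own); np.linalg.lstsq returns the same answer as the pseudo-inverse formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 5                                     # more equations than unknowns
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(m)   # slightly noisy measurements

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares solution
x_pinv = np.linalg.inv(A.T @ A) @ A.T @ y        # the closed form above
print(np.allclose(x_ls, x_pinv))                 # True
```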

What about linear-measurement models that are inherently non-convex? Consider A to be a fat (wide) matrix: the system is underdetermined, so we can have infinitely many solutions (A has a nontrivial null space). We further impose a restriction on x to be k-sparse. (Of course, this imposition is natural enough. We are collecting less information about our signal than we are supposed to, so if we want to uniquely recover the signal, we need to exploit certain properties of the signal's structure.) This is called the compressed sensing problem, and it can be stated as

minimize \|y-Ax\|_2

subject to \|x\|_0 \leq k.

This problem is NP-hard to solve. However, if we choose A such that it satisfies the restricted isometry property (with the isometry constant depending on k), then we can still uniquely recover x using convex optimization. Of course there are non-convex approaches as well, but a common convex surrogate is the lasso problem

minimize  \|Ax-y\|^2_2 + \lambda \|x\|_1.
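As a hedged sketch of how the lasso can actually be solved, here is a bare-bones ISTA (iterative soft-thresholding) loop in numpy; the problem sizes and the regularization weight are arbitrary illustrative choices, not tuned values:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iters=500):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_1 by iterative soft thresholding."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * A.T @ (A @ x - y)         # gradient of ||Ax - y||_2^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(2)
m, n, k = 40, 100, 5                           # fat A, k-sparse signal
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

x_hat = lasso_ista(A, y, lam=0.01)             # approximately recovers the sparse x_true
```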

The lasso is harder than our vanilla least-squares formulation, but still very much within the realm of well-established convex optimization algorithms. Now let's get even further out of our comfort zone: non-linear problems that are non-convex. This brings us to phase retrieval,

y = |Ax|

which is equivalent to

y_j = |a_j^T x| for j=1,2\dots m

or, working with squared magnitudes instead,

y_j = (a_j^T x)^2.
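To see the non-linearity at work, here is a tiny sketch (my own illustration): with magnitude-only measurements, x and -x (or, in the complex case, any global phase rotation of x) produce exactly the same y.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 20, 5
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

y_from_x = np.abs(A @ x)
y_from_minus_x = np.abs(A @ (-x))
print(np.allclose(y_from_x, y_from_minus_x))   # True: the global sign/phase is lost
```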

Insane! As you might have noticed, the trend is that we are trying to recover more and more using less and less information. And here the loss is severe: a common analysis suggests that most of the information about a signal is contained in the phase of the measurements, not in the magnitudes, and the phase is precisely what we have thrown away.

Now this problem is highly non-convex. Why? For convex functions, all sub-level sets \{x : f(x) \leq t\} are convex. What that means is: take a convex curve, say a parabola symmetric about the y-axis (with the y-axis representing f(x)). Now cut it with the line y = t, for any t, and look at the part of the parabola that lies below it, i.e. all x such that f(x) \leq t. That set is just a line segment on the x-axis, and a line segment is a convex set. This parabola hence has a unique minimum.

Now think of a highly irregular curve with, say, 5-6 local minima, 5-6 local maxima, and 1 global minimum. Such curves are not convex, because when you cut them with y = t and look at the part that lies below, the corresponding set on the x-axis is a union of disjoint segments, which is not a convex set.
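To make the sub-level-set picture concrete, here is a quick worked example. For the parabola f(x) = x^2 and any t > 0, the sub-level set is

\{x : x^2 \leq t\} = [-\sqrt{t}, \sqrt{t}],

a single interval, hence convex. For the double-well curve f(x) = (x^2 - 1)^2 and any t \in (0, 1), the sub-level set is

\{x : (x^2 - 1)^2 \leq t\} = [-\sqrt{1 + \sqrt{t}}, -\sqrt{1 - \sqrt{t}}] \cup [\sqrt{1 - \sqrt{t}}, \sqrt{1 + \sqrt{t}}],

two disjoint intervals, not a convex set. The double well has two separate minima, and the broken sub-level set is exactly what gives that away.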

Of course, we then resort to the first trick from our new signal processing handbook: make things linear and convex. A clever way to look at

y = |Ax|.

is to “linearize” it. Introduce the (unknown) phases of the measurements,

y_{ph} = phase(Ax)

then

y \circ y_{ph} = Ax

where we are looking at an element-wise product on the left-hand side. This can be written more compactly as

Cy = Ax

where C is a diagonal matrix C = diag(y_{ph}). And so the problem turns into

minimize over x,C\quad\|Ax-Cy\|_2.

It’s easy to see that this problem is not convex: the entries on the diagonal of C are restricted to be phase values, i.e. they must have unit magnitude, and the set of such values is not convex. So now what do we do?

What AltMinPhase does is treat this as two alternating minimizations, with C and x as the variables. For fixed C, the problem is convex in x; it is a least-squares problem, with a unique solution if A is full rank. For fixed x, it is easy to see that the optimal C is given by

C = diag(phase(y_{apparent})) = diag(phase(Ax)).

They then alternately minimize over these two variables. Now think back to the highly non-convex function with multiple local minima and maxima. If you initialize badly, you are very likely to land in a local minimum, and that's the end of it: wrong solution!

Initialize correctly, close enough to the global minimum (which, in the noiseless case, should ideally be very close to the true x = x^*), and bam! The algorithm converges very fast. If A is i.i.d. Gaussian, we can design an initial vector whose expected value is equal to the true value x^*. That is the beauty of random matrix theory (the same machinery that helped us solve the compressed sensing problem).
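Here is a hedged, real-valued sketch of the alternating scheme in numpy (for real signals the "phase" is just the sign). The spectral-style initialization below is a simplified stand-in for the one in the AltMinPhase paper, included only to illustrate the idea:

```python
import numpy as np

def altmin_phase(A, y, n_iters=50):
    """Alternating minimization for y = |Ax| (real case: phase == sign).

    Sketch only: the initialization is a simplified spectral-style guess,
    not the exact procedure from the AltMinPhase paper.
    """
    m, n = A.shape
    # Spectral-style initialization: top eigenvector of (1/m) * sum_j y_j^2 a_j a_j^T,
    # which (for Gaussian a_j) is aligned with the true signal in expectation.
    M = (A.T * (y ** 2)) @ A / m
    _, eigvecs = np.linalg.eigh(M)
    x = eigvecs[:, -1] * np.sqrt(np.mean(y ** 2))      # rough norm estimate

    for _ in range(n_iters):
        c = np.sign(A @ x)                              # fix x, pick the best "phase"
        x, *_ = np.linalg.lstsq(A, c * y, rcond=None)   # fix the phase, solve least squares
    return x

rng = np.random.default_rng(4)
m, n = 200, 20
A = rng.standard_normal((m, n))        # i.i.d. Gaussian measurement matrix
x_true = rng.standard_normal(n)
y = np.abs(A @ x_true)                 # magnitude-only measurements

x_hat = altmin_phase(A, y)
# Recovery is only defined up to a global sign, so compare both candidates.
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))
print(err / np.linalg.norm(x_true))    # should be small
```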

*******

In the next few posts, I will write about the intuition and big picture of some of the current topics in signal processing, like structured sparsity, phase retrieval, random matrix theory and some applications in machine learning. In doing so, I hope to create a good database of ideas related to my research.