# Epsilon Net Argumentative Essays

Now we turn attention to another important spectral statistic, the least singular value σ_n(M) of an n × n matrix M (or, more generally, the least non-trivial singular value σ_n(M) of a p × n matrix M with p ≥ n). This quantity controls the invertibility of M. Indeed, M is invertible precisely when σ_n(M) is non-zero, and the operator norm of M^{-1} is given by 1/σ_n(M). This quantity is also related to the condition number σ_1(M)/σ_n(M) of M, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of M (and more generally, of the shifts M - zI for complex z) will be of importance in rigorously establishing the circular law for iid random matrices M, as it plays a key role in computing the Stieltjes transform of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.

The least singular value

σ_n(M) := inf_{||x||=1} ||Mx||,

which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm

||M||_op = σ_1(M) = sup_{||x||=1} ||Mx||

at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of x, but it is not able to control the “generic” or “incompressible” choices of x, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for p × n rectangular matrices with p = ⌊γn⌋ for some fixed γ > 1, it gives an upper bound of (1 + o(1))(p^{1/2} - n^{1/2}) for the least singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as tr (MM*)^{-1}, though these are more difficult to compute than positive moments tr (MM*)^k.
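As a quick numerical sanity check of the hard-edge prediction (an illustrative sketch, not from the notes; the dimensions and the ±1 ensemble are our choices):

```python
import numpy as np

# For a p x n random sign matrix with p = gamma*n, gamma > 1, the least
# singular value should be of order sqrt(p) - sqrt(n), matching the
# Marchenko-Pastur prediction at the left (hard) edge of the spectrum.
rng = np.random.default_rng(0)
n, gamma = 200, 4
p = gamma * n
M = rng.choice([-1.0, 1.0], size=(p, n))
smallest = np.linalg.svd(M, compute_uv=False)[-1]  # least singular value
prediction = np.sqrt(p) - np.sqrt(n)
ratio = smallest / prediction                      # should be close to 1
```

For p well separated from n the empirical ratio stays near 1, which is exactly the rectangular regime where the epsilon-net and moment methods remain effective.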

So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance dist(X_i, V_i) between one of the rows X_i of the matrix M, and the hyperplane V_i spanned by the other n - 1 rows. The reason for this is as follows. First suppose that σ_n(M) = 0, so that M is non-invertible, and there is a linear dependence between the rows X_1, ..., X_n. Thus, one of the X_i will lie in the hyperplane V_i spanned by the other rows, and so one of the distances dist(X_i, V_i) mentioned above will vanish; in fact, one expects many of the distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so σ_n(M) = 0.

More generally, if the least singular value σ_n(M) is small, one generically expects many of these distances dist(X_i, V_i) to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row X_i and the hyperplane V_i spanned by the other rows. This latter quantity is basically the magnitude of the dot product X_i · n_i of X_i with a unit normal n_i of this hyperplane.

When working with random matrices M with jointly independent coefficients, we have the crucial property that the unit normal n_i (which depends on all the rows other than X_i) is independent of X_i, so even after conditioning n_i to be fixed, the entries of X_i remain independent. As such, the dot product X_i · n_i is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal n_i is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)
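The identity behind this reduction is easy to check numerically (a sketch with an arbitrary 50 × 50 sign matrix of our choosing): the distance from a row to the span of the other rows equals the magnitude of its dot product with a unit normal, and this distance always dominates the least singular value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
M = rng.choice([-1.0, 1.0], size=(n, n))   # random sign matrix
X_i = M[0]                                 # one row
others = np.delete(M, 0, axis=0)           # the other n-1 rows
# A unit normal of the hyperplane spanned by the other rows: the right
# singular vector of `others` with zero singular value.
normal = np.linalg.svd(others)[2][-1]
dot = abs(X_i @ normal)
# The same distance, computed as a least-squares residual.
coef, *_ = np.linalg.lstsq(others.T, X_i, rcond=None)
dist = np.linalg.norm(X_i - others.T @ coef)
sigma_min = np.linalg.svd(M, compute_uv=False)[-1]
```

Indeed, M applied to the unit normal vanishes on every other row, so ||M · normal|| = dist, giving σ_n(M) ≤ dist(X_i, V_i) directly.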

These methods rely quite strongly on the joint independence of all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).

To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble M = (ξ_{ij})_{1≤i,j≤n} of random sign matrices, where the entries ξ_{ij} = ±1 are independent Bernoulli signs. However, the results can be extended to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.


## R. Vershynin, On the role of sparsity in Compressed Sensing and Random Matrix Theory

In this short note, we discuss applications of some concepts of Compressed Sensing (sparsity, compressibility) in the recent work on invertibility of random matrices due to Rudelson and the author. We sketch an argument leading to the optimal bound N^{-1/2} on the median of the smallest singular value of an N × N matrix with random independent entries. We highlight the parts of the argument where sparsity ideas played a key role.

## T. Strohmer and R. Vershynin, Comments on the randomized Kaczmarz method

In this short note, we respond to some concerns raised by Y. Censor, G. Herman, and M. Jiang about the randomized Kaczmarz method, which we proposed earlier. Basically, we emphasize that the convergence rate of this algorithm does depend on a condition number of the system of linear equations (which in turn depends on the normalization of rows). For example, multiplying a single equation through by a constant obviously does not change the solution of the system, but it can change its condition number (and, as a result, the convergence rate of the randomized Kaczmarz algorithm).
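The point is easy to see on a toy system (a hypothetical 3 × 2 example of ours, not from the papers): rescaling one equation preserves the solution but can blow up the condition number that governs the randomized Kaczmarz rate.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])              # consistent system, solution (1, 2)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
A2, b2 = A.copy(), b.copy()
A2[2] *= 100.0                             # multiply one equation through
b2[2] *= 100.0                             # by a constant
x2, *_ = np.linalg.lstsq(A2, b2, rcond=None)
cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(A2)            # much larger condition number
```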

This point was probably not very clear in our original paper. To clarify it, JFAA decided to publish the concerns of Y. Censor, G. Herman, and M. Jiang along with our response, as two notes in the same journal issue.

## R. Vershynin, Spectral norm of products of random and deterministic matrices

This paper is an attempt to understand random matrices with non-independent entries, but which can be factorized through random matrices with independent entries. In the language suitable to statistical applications, we are interested in sample covariance matrices of a wide class of random vectors -- the linear transformations of vectors with independent entries.

Here we study the spectral norm of such matrices. Suppose a matrix M can be factored as M = BA, where A is a random matrix with independent mean zero entries and B is a fixed matrix. Under the (4+epsilon)-th moment assumption on the entries of A, we show that the spectral norm of such an m × n matrix M is bounded by m^{1/2} + n^{1/2}, which is sharp. In other words, in regard to the spectral norm, products of random and deterministic matrices behave similarly to random matrices with independent entries. Note that this bound is independent of the intermediate dimension through which M factors (i.e. the number of rows of A and the number of columns of B).
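A numerical illustration of the claim (our parameters; for simplicity we take B to be a random, rather than worst-case, orthogonal projection): the spectral norm of M = BA stays of order m^{1/2} + n^{1/2} whether the intermediate dimension N is 200 or 2000.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 100
norms = []
for N in (200, 2000):                      # two intermediate dimensions
    A = rng.choice([-1.0, 1.0], size=(N, n))          # independent entries
    Q, _ = np.linalg.qr(rng.standard_normal((N, m)))
    B = Q.T                                # m x N, an orthogonal projection
    norms.append(np.linalg.norm(B @ A, 2)) # spectral norm of M = BA
bound = 2 * (np.sqrt(m) + np.sqrt(n))      # generous multiple of m^1/2 + n^1/2
```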

The main motivation to prove this result was an application in the recent work of M. Rudelson and the author on the smallest singular value of random rectangular matrices. We showed there that the smallest singular value of a random m × n matrix with i.i.d. mean zero entries of unit variance is bounded below by m^{1/2} - (n-1)^{1/2} with high probability. This was proved under the assumption that the entries have subgaussian tails. We left it open in that work whether the same holds under some finite moment assumption. Using the main result of the present paper, this bound automatically extends to the case when the entries have bounded (4+epsilon)-th moment.

It would be interesting to know if the main result of this paper can be stated as an "isoperimetric" result. Namely, we may ask -- what matrices B maximize the expected spectral norm of M=BA, subject to natural constraints? A natural constraint, for example, is that B is an orthogonal projection onto R^m. Is it true then that the coordinate projections B are the maximizers? If so, the result of this paper would quickly follow from known estimates by Y. Seginer and R. Latala.

In the absence of the isoperimetric picture, we prove our main result as follows. Let us assume m=n for simplicity. The columns of M are independent random vectors in R^n (even though the rows of M are not necessarily independent). So we can write M*M as the sum of independent rank one linear operators X_j ⊗ X_j (the tensor products of the columns of M with themselves). A sharp estimate for sums of independent random operators is given by the well known theorem of M. Rudelson. This approach leads to the bound O(n log n) on the spectral norm, i.e. we are off by a logarithmic factor from the correct answer.
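The rank-one decomposition used here is elementary to verify (a sketch; we take the convention M = [X_1 ... X_n] with columns X_j, so the sum of outer products is M Mᵀ, whose spectral norm coincides with that of M*M):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
M = rng.standard_normal((n, n))
# Sum of the rank-one operators X_j (tensor) X_j over the columns X_j of M.
rank_one_sum = sum(np.outer(M[:, j], M[:, j]) for j in range(n))
# This sum equals M M^T, and ||M M^T|| = ||M^T M|| = ||M||^2.
```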

The loss of a logarithmic factor is unfortunately inherent to the use of M. Rudelson's theorem, where it is needed in full generality. It would be interesting to understand the situations where the logarithmic term can be removed from M. Rudelson's theorem. So far, only one such situation is known from the work of G. Aubrun -- when the independent random vectors Xj are uniformly distributed in an unconditional convex body.

In the absence of a suitable lossless form of M. Rudelson's theorem, the next stage of the argument generally resembles R. Latala's approach, where one first reduces the problem to Gaussian matrices A (by a standard symmetrization), and then uses fairly standard concentration of measure techniques in Gauss space, coupled with delicate constructions of nets. An interesting feature of this part of the argument is the use of a technique one might call the entropy-concentration tradeoff. This method has roots in some earlier arguments, including the one by R. Latala, and it has been developed in the work of M. Rudelson and the author on invertibility of random matrices.

To illustrate the entropy-concentration tradeoff, recall that we need to bound the spectral norm ||M|| = sup_x ||Mx||, where the supremum is taken over all vectors x in the unit Euclidean sphere S^{n-1}. The well known "epsilon-net" method proceeds by first fixing x and bounding ||Mx|| with high probability (using standard concentration inequalities), then finishes by taking the union bound over a sufficiently fine epsilon-net of the unit sphere. Unfortunately, a single probability bound for every fixed vector x is not strong enough to make this method work. Sparse vectors -- those which have few but large nonzero coordinates -- produce worse concentration bounds than spread vectors, which have many but small nonzero coordinates. (The reason is that sparse vectors produce worse Lipschitz constants.) What helps us is that there are fewer sparse vectors on the sphere than there are spread vectors. This leads to a tradeoff between concentration and entropy, i.e. between the probability with which ||Mx|| is nicely bounded, and the size of an epsilon-net for the vectors x which achieve this probability bound. One then divides the unit sphere into classes of vectors according to their "sparsity", and uses the entropy-concentration tradeoff for each class separately.
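The basic net approximation underlying all of this can be checked directly in a toy dimension (our 2 × 2 example): the maximum of ||Mx|| over an epsilon-net of the sphere determines the spectral norm up to a 1/(1 - epsilon) factor.

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((2, 2))
eps = 0.1
angles = np.arange(0.0, 2 * np.pi, eps / 2)   # angular gaps below eps, so
net = np.column_stack([np.cos(angles), np.sin(angles)])  # an eps-net of S^1
net_max = max(np.linalg.norm(M @ x) for x in net)
true_norm = np.linalg.norm(M, 2)
# Standard lemma: net_max <= ||M|| <= net_max / (1 - eps).
```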

## M. Rudelson, R. Vershynin, The least singular value of a random square matrix is O(n^{-1/2})

Here we settle the other direction of the conjectural bound on the least singular value of a random square matrix. Let A be an n × n matrix whose entries are real i.i.d. centered random variables with unit variance and suitable moment assumptions. Then the smallest singular value of A is of order n^{-1/2} with high probability. We have proved with Mark Rudelson the lower estimate of this type; in this note we establish the matching upper estimate.

## M. Rudelson, R. Vershynin, The smallest singular value of a random rectangular matrix

Here we further develop our invertibility method from our first paper on the topic. We consider a rectangular N × n random matrix, and we show that with high probability its smallest singular value is at least of order N^{1/2} - (n-1)^{1/2}. Known results of random matrix theory show that in the limit, as the dimension n increases to infinity while the aspect ratio n/N converges to a constant, the smallest singular value converges almost surely to the quantity above. In contrast, our estimate holds for all fixed dimensions (up to a constant factor), i.e. without taking the limit, and with very high probability. Estimates in fixed dimensions are crucial for various problems in geometric functional analysis, numerical analysis and other fields.

Our estimate above bridges the known results for tall matrices and square matrices. For square matrices, our estimate is of order n^{-1/2}, which is our result from the first paper. For tall matrices, where n is any fixed proportion of N, our estimate is of order N^{1/2}, which yields the result of Litvak, Pajor, Rudelson and Tomczak-Jaegermann, now with the optimal dependence on the proportion.

Our argument is a development of our method from the first paper for square matrices. However, dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which are of independent interest.

One new key ingredient is a small ball probability bound for sums of independent random vectors in R^d. This is the probability that such a sum falls into a given small Euclidean ball in R^d. Our method is a development of the Littlewood-Offord theory, which connects the small ball probabilities to the additive structure of the sum. The less additive structure there is, the smaller the small ball probability. We include a version of a new and simple argument due to Sodin and Friedland, which allows one to avoid any use of the Halasz regularity lemma.

We use the small ball probability estimate to prove an optimal bound for the distance problem: how close is a random vector X to an independent random subspace H? We show that the distance is at least of order m^{1/2} with high probability, where m is the codimension of H. To prove this, we first show the intuitive (albeit nontrivial) fact that random subspaces have no additive structure. As said above, this fact makes our small ball probability estimates powerful enough.

Finally, we use the distance estimate to bound below the smallest singular value of a random matrix A. Our random vector X is a column of A, and our random subspace H is the span of the other columns. A simple rank argument shows that the smallest singular value is zero if and only if some column X lies in the span H of the others. A simple quantitative version of this argument is that a lower estimate on the smallest singular value yields a lower bound on the distance dist(X,H). We show how to reverse the rank argument, and thus derive bounds for the smallest singular value from the distance problem.
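The distance prediction is easy to test empirically (an illustrative sketch; the dimensions and the Gaussian model for the random subspace are our choices): the distance from a random sign vector in R^N to an independent random subspace of codimension m concentrates around m^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(4)
N, m = 400, 100                            # ambient dimension, codimension
dists = []
for _ in range(20):
    X = rng.choice([-1.0, 1.0], size=N)
    basis, _ = np.linalg.qr(rng.standard_normal((N, N - m)))  # basis of H
    proj = basis @ (basis.T @ X)           # orthogonal projection of X onto H
    dists.append(np.linalg.norm(X - proj))
avg_ratio = np.mean(dists) / np.sqrt(m)    # should be close to 1
```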

## D. Needell, R. Vershynin, Signal recovery from incomplete and inaccurate measurements via Regularized Orthogonal Matching Pursuit

This is our second paper with my student Deanna Needell, where we continue to study alternatives to linear programming methods in Compressed Sensing. In our first paper, we developed a greedy algorithm called ROMP, which can perfectly recover any sparse signal from a small number of linear non-adaptive measurements. In the present paper, we prove stability of ROMP.

ROMP seems to be the first greedy algorithm in Compressed Sensing that is provably stable in the sense that the recovery error is proportional to the level of the noise. In particular, the recovery is exact in the absence of noise. Also, one can use ROMP to accurately recover arbitrary signals, not necessarily sparse. This kind of stability was known for the linear programming methods from the work of Candes, Romberg and Tao, but has been previously unknown for any greedy method (like OMP).

## D. Needell, R. Vershynin, Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit

This is the first paper we wrote with my student Deanna Needell. There we settle one problem in the area of Compressed Sensing: how to construct a greedy algorithm for sparse recovery with uniform guarantees.

Compressed Sensing is focused on the sparse recovery problem: how to reconstruct a vector X that has very few non-zero coordinates from a few linear measurements of X. Such measurements can be described as a measurement vector AX, the product of some measurement matrix A with X. There are two major algorithmic approaches to the sparse recovery problem: methods based on Linear Programming (a.k.a. L1-minimization, Basis Pursuit) and greedy iterative methods (for example, Orthogonal Matching Pursuit, Thresholding Algorithms). Each of the two approaches has its own advantages and disadvantages.

The L1-minimization method has the strongest known guarantees. Once the measurement matrix A satisfies the abstract form of the Uncertainty Principle (in the form of the Restricted Isometry Condition), a result of Candes and Tao states that L1-minimization yields exact recovery. Specifically, X is the unique vector with minimal L1-norm among all vectors with the same measurements AX. The L1-minimization can be described as a linear program, for which an abundance of solvers is available.

An alternative approach is to use greedy iterative algorithms. The measurement vector AX is a linear combination of very few columns of A (because X has few nonzero coefficients), and we need to know which columns. It would then be natural to select the columns that are most correlated with AX (i.e. whose inner products with AX are biggest in magnitude). However, the set selected this way will not in general recover the nonzeros of X exactly, even for very nice measurement matrices A. Instead, one could select only the one column of A most correlated with AX, subtract its contribution from AX (making the residual orthogonal to that column) and iterate. This simple algorithm is called Orthogonal Matching Pursuit (OMP); see the paper by Tropp and Gilbert.
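A minimal textbook sketch of OMP as just described (our implementation and parameters, not the code of Tropp and Gilbert):

```python
import numpy as np

def omp(A, y, s):
    """Greedily recover an s-sparse x from y = A x."""
    support, residual, coef = [], y.copy(), np.zeros(0)
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef   # orthogonal to chosen columns
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(5)
d, k, s = 200, 100, 4                      # dimension, measurements, sparsity
A = rng.standard_normal((k, d)) / np.sqrt(k)
x = np.zeros(d)
x[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
x_hat = omp(A, A @ x, s)                   # recovers x (with high probability)
```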

The advantage of OMP is its transparency, ease of implementation, and great speed (OMP is currently considerably faster than L1-minimization). The disadvantage of OMP is its weaker guarantees. It is actually known that OMP does not have uniform guarantees (i.e. with high probability, correctly recovering all sparse vectors) for natural random matrices. Furthermore, no analysis at all is known for some important classes of measurements, such as random Fourier measurements (where the measurements are random frequencies of the signal X).

Our paper essentially settles these concerns about OMP. We prove that a simple regularized version of OMP (ROMP) has essentially the same guarantees as L1-minimization. Once the measurement matrix A satisfies the Uncertainty Principle (a.k.a. Restricted Isometry Condition), then ROMP recovers every sparse vector X from its measurements AX exactly. ROMP is therefore the first greedy algorithm for sparse recovery with uniform guarantees.

ROMP is based on the following idea. To recover the signal X from its measurements Y = AX, we can use the "observation vector" U = A*Y as a good local approximation to X. Indeed, U encodes correlations of Y with the columns of A. By the Uniform Uncertainty Principle, every n columns form approximately an orthonormal system. Therefore, every n coordinates of U look like correlations of Y with the orthonormal basis and therefore are close to the coefficients of X.

This suggests making use of the n biggest coordinates of the observation vector U, rather than the single biggest coordinate as OMP does. ROMP thus selects the n biggest coordinates, and performs regularization by further selecting only the coordinates with comparable sizes (to ensure that the information is spread evenly among them).
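The selection rule can be sketched as follows (our reading of the regularization step, with a made-up example; here n plays the role of the sparsity level):

```python
import numpy as np

def romp_select(u, n):
    """Pick the n biggest coordinates of u, then keep a maximal-energy
    subset whose magnitudes are within a factor 2 of each other."""
    order = np.argsort(-np.abs(u))[:n]     # the n biggest coordinates
    vals = np.abs(u[order])
    best, best_energy = order[:1], float(vals[0] ** 2)
    for v in vals:
        mask = (vals <= v) & (vals >= v / 2)   # comparable magnitudes
        energy = float(np.sum(vals[mask] ** 2))
        if energy > best_energy:
            best, best_energy = order[mask], energy
    return best

# Example: the two comparable large coordinates win over three small ones.
chosen = romp_select(np.array([0.0, 10.0, 9.0, 1.0, 1.0, 1.0]), 5)
```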

Added May 4, 2009: Stephane Mallat brought to our attention a reference that we unfortunately omitted in the published version of our paper. It is a reference to his work with Zhifeng Zhang in 1993-94, where they first introduced matching pursuits and specifically OMP. The published version of this work introduces and discusses matching pursuits (among other things), while Zhifeng Zhang's thesis in 1993 introduced OMP.

## M. Rudelson, R. Vershynin, The Littlewood-Offord Problem and invertibility of random matrices

Here we prove two basic conjectures on the invertibility of random matrices. One of them has been a prediction of Von Neumann and his associates, who speculated that the condition number of a random n by n matrix with i.i.d. entries should be linear in n (with high probability). The condition number is the ratio of the largest to the smallest singular values of the matrix, or equivalently the product of the norms of the matrix and its inverse.

The largest singular value of random matrices is well understood -- it is of order n^{1/2} with high probability, and it converges to a Tracy-Widom distribution as n increases. The hard part is the smallest singular value, conjectured to be of order n^{-1/2}. This was only known for Gaussian matrices (Smale's problem, solved by Edelman in 1988). Rudelson, and Tao and Vu, proved weaker estimates in some more general cases. The full conjecture is proved in the present paper. Thus the norms of a random matrix and of its inverse are both of order n^{1/2} with high probability. This confirms Von Neumann's prediction in full generality.

The next question is: what is this high probability? For random Bernoulli matrices (whose entries are 1 and -1 with probability 1/2 each), Spielman and Teng conjectured at ICM 2002 the following asymptotics for the smallest singular value normalized (i.e. divided) by its mean n^{-1/2}: the probability that it is smaller than t should be linear in t for all t > c^n (where 0 < c < 1 is some absolute constant). This asymptotics must break down for smaller t because Bernoulli matrices are singular with nonzero probability. The general Spielman-Teng conjecture is proved in this paper (for all subgaussian matrices).

One immediate consequence of the general Spielman-Teng conjecture (now a theorem) for t=0 is that a subgaussian random matrix is singular with probability exponentially small in n. For Bernoulli matrices, this was proved by Kahn, Komlos and Szemeredi in 1995. Even the fact that this singularity probability converges to zero as n increases is nontrivial; Komlos proved this in 1967. Erdos conjectured that the singularity probability is (1/2 + o(1))^n. The best known bound (3/4 + o(1))^n is due to Tao and Vu.

Our random matrix results come as a consequence of our attempts to develop the Littlewood-Offord theory of small ball probability.

The classical concentration inequalities of probability theory demonstrate that nice random variables, such as the sums of i.i.d.'s, are nicely concentrated about their means. Less studied but often needed is the reverse phenomenon -- that the concentration is not too tight. The Littlewood-Offord Problem asks, for i.i.d. random variables X_k and real numbers a_k, to put an upper bound on the probability P that the sum of a_k X_k lies near some number v.

Unlike the concentration inequalities, which are controlled only by the magnitude of the coefficients a_k (e.g. the norms of the coefficient vectors, as in Bernstein's inequality), the anti-concentration inequalities are less stable; they are controlled by the arithmetic structure of the coefficients a_k. Tao and Vu put forth a program to show that if the concentration probability P is large then there is a strong additive structure in the coefficients a_1, ... , a_n: this sequence should be embedded in a short arithmetic progression (or a generalization thereof). This says that the only reason for a random sum to concentrate too much is (essential) cancellations, which can only be caused by a rich additive structure of the summands.

We give an optimal result for the Littlewood-Offord problem for arbitrary coefficients a_k of the same order of magnitude by showing that these coefficients essentially lie in an arithmetic progression of length 1/P.
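A toy computation makes the dichotomy concrete (our example; the signs eps_k are independent Bernoulli): equal coefficients -- an arithmetic progression of length 1 -- give concentration probability of order n^{-1/2}, while coefficients with no additive structure drive it down to 2^{-n}.

```python
from itertools import product

def concentration_prob(a):
    """Exact value of max_v P(sum_k a_k eps_k = v) over independent signs."""
    counts = {}
    for eps in product((-1, 1), repeat=len(a)):
        s = sum(e * x for e, x in zip(eps, a))
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) / 2 ** len(a)

structured = [1] * 12                      # all equal: maximal concentration
generic = [2 ** k for k in range(12)]      # lacunary: every signed sum distinct
p_structured = concentration_prob(structured)   # C(12, 6) / 2^12, about 0.23
p_generic = concentration_prob(generic)    # exactly 2^-12
```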

This paper is written so that Von Neumann's prediction can be read separately from the rest. The Littlewood-Offord theory is not needed there; it becomes crucial in the proof of the stronger Spielman-Teng's conjecture.

## M. Rudelson, R. Vershynin, Sampling from large matrices: an approach through geometric functional analysis

Here we study random submatrices of large matrices. The main problem is: how can one learn a matrix from a random sample of its entries?

We show how to approximately compute any matrix from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r is the numerical rank of the matrix. The numerical rank is the ratio of the Frobenius (a.k.a. Hilbert-Schmidt) norm squared to the operator (a.k.a. spectral) norm squared. The numerical rank is thus always bounded by the usual rank, and unlike the usual rank it is stable under perturbations. The stability is crucial for numerical applications, where the matrix is blurred by noise (e.g. roundoff errors), so the matrix is of full rank but can be of low numerical rank.
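In code, the definition and its stability read as follows (a sketch with an arbitrary rank-5 example of ours):

```python
import numpy as np

def numerical_rank(M):
    """Frobenius norm squared over spectral norm squared."""
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

rng = np.random.default_rng(6)
n, r = 100, 5
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank r exactly
noisy = A + 1e-6 * rng.standard_normal((n, n))  # blurred by roundoff-like noise
# The usual rank jumps to n, while the numerical rank barely moves
# and stays bounded by r.
```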

Our approximation result gives an asymptotically optimal guarantee for an algorithm to compute low-rank approximations of A. We also give asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems.

We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables. This project started in the summer of 2002 and has undergone several major transformations since then.

## Yu. Liubarskii, R. Vershynin, Uncertainty principles and vector quantization

Here we touch on a computational aspect of a basic problem in asymptotic convex geometry: what matrices realize Euclidean projections of the cube? A sufficient condition for such matrices turns out to be an abstract form of the Uncertainty Principle (UP). This Uncertainty Principle was recently set forth by Candes and Tao in the context of the fast developing applied area of Compressed Sensing. For orthogonal matrices, UP says that all submatrices have norm slightly smaller than 1. For matrices that realize the (discrete) Fourier transform, this leads to variants of the classical Uncertainty Principle in harmonic analysis. Gluskin [unpublished] and Talagrand [Selection of proportion of characters] observed that some form of UP implies Euclidean sections of L_1 (thus Euclidean sections of the cube).

We use this idea to observe a new connection between the Uncertainty Principle and vector quantization theory. We show that for frames in N dimensions that satisfy the Uncertainty Principle, one can quickly convert every frame representation into a more regular Kashin's representation, whose coefficients all have the same magnitude N^{-1/2}. Information tends to spread evenly among these coefficients. As a consequence, Kashin's representations have great power for reducing errors in their coefficients. In particular, scalar quantization of Kashin's representations yields robust vector quantizers.

This project started in the Fall of 2003 and has undergone several major transformations since then.

## T. Strohmer, R. Vershynin, A randomized solver for linear systems with exponential convergence

The Kaczmarz method for solving linear systems of equations Ax=b is a simple iterative algorithm with many applications ranging from computed tomography to digital signal processing, but whose performance is poorly understood. The algorithm starts with an arbitrary guess x_0 of the solution, and iteratively projects the running approximation x_k onto the solution space of each equation, in cyclic order. Despite the simplicity and popularity of this method, there are no useful estimates of its rate of convergence.

It has been observed several times in the tomography literature that the Kaczmarz algorithm becomes faster if one projects onto the solution spaces in a random, rather than cyclic, order. However, no estimates on the convergence of such a randomized version have been known either.

In this paper, we prove that a randomized Kaczmarz algorithm converges with expected exponential rate. This is the first solver whose rate does not depend on the number of equations in the system. The solver does not even need to know the whole system, but only its small random part. It thus outperforms all previously known methods on extremely overdetermined systems. Even for moderately overdetermined systems, numerical simulations reveal that our algorithm can converge faster than the celebrated conjugate gradient algorithm.
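The algorithm itself fits in a few lines (a sketch of the randomized variant with rows sampled proportionally to their squared norms; the test system and iteration count are our choices):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=3000, seed=0):
    """Iteratively project onto the solution space of a random equation,
    chosen with probability proportional to its squared row norm."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.sum(A ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]  # project onto equation i
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((500, 20))         # extremely overdetermined system
x_true = rng.standard_normal(20)
x_hat = randomized_kaczmarz(A, A @ x_true)
err = np.linalg.norm(x_hat - x_true)       # decays exponentially in iters
```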

## R. Vershynin, Beyond Hirsch Conjecture: walks on random polytopes and smoothed complexity of the simplex method

Here we introduce a new effective phase-I in linear programming, and significantly improve the smoothed analysis of the simplex method.

There exist algorithms which perform nicely in practice, but can be very slow in (specifically designed) worst cases. The celebrated simplex method is one of them. In practice, the simplex method is observed to solve linear programs in time linear in the dimension, but there exist linear programs for which it takes exponential time (Klee-Minty counterexample).

To explain the discrepancy between the empirical evidence and theoretical analysis, Spielman and Teng introduced the concept of smoothed analysis of algorithms, where we measure the expected running time of an algorithm under slight random perturbations of arbitrary inputs. The idea is that the worst cases should be so few and isolated that one can rule them out by slightly perturbing the input.

Spielman and Teng proved that the (shadow-vertex) simplex method has polynomial smoothed complexity: on a slight random perturbation of an arbitrary linear program, the simplex method finds the solution after a walk on the vertices of the feasible polytope with expected length polynomial in the number of constraints n, the number of variables d, and the inverse standard deviation of the perturbation 1/sigma.

We show that the length of the walk in the simplex method is actually polylogarithmic in the number of constraints n. Spielman-Teng's bound on the walk was O*(n^{86} d^{55} sigma^{-30}), up to logarithmic factors. We improve this to O*(d^9 sigma^{-4}). This shows that the tight conjectural bound n - d on the length of a walk on polytopes (Hirsch Conjecture) is not a limitation for smoothed Linear Programming. Random perturbations create short paths between vertices.

One ingredient of independent interest that is developed here is a new phase-I for solving arbitrary linear programs. Instead of finding a vertex of the feasible set, we add a vertex at random to the feasible set. With constant probability, this does not affect the solution of the linear program. So, in expectation it takes a constant number of independent trials until a correct solution is found. This overcomes one of the major difficulties of smoothed analysis of the simplex method -- one can now statistically decouple the walk from the smoothed linear program. This yields a much better reduction of the smoothed complexity to a geometric quantity -- the size of planar sections of random polytopes. We also improve upon the known estimates for that size.

## A. C. Gilbert, M. J. Strauss, J. A. Tropp, R. Vershynin, Algorithmic linear dimension reduction in the l_1 norm for sparse vectors

Compressible signals are pervasive in signal processing applications. The essential feature of a compressible signal is that it can be approximated well by a sparse signal. It has recently been observed that it is possible to collect the essential information from a compressible signal by projecting it onto a low-dimensional random subspace. This idea is called sketching in computer science and compressed sensing in signal processing.

Geometric functional analysis has studied the non-algorithmic aspects of projections onto random low-dimensional subspaces since the 1980s. Many tools are now available to check whether such a projection is an almost isometry on a class of functions (a set of vectors in R^d). None of these, however, suggests a reconstruction algorithm. Suppose we know that a certain projection is an almost isometry on a certain set in R^d. Is it possible to reconstruct points of the set from their projections in polynomial time?
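The almost-isometry phenomenon for random projections can be sketched numerically. This is a minimal illustration, not a construction from the paper; the dimensions, the number of points, and the Gaussian projection are illustrative choices:

```python
import numpy as np

# A scaled k x d Gaussian matrix approximately preserves the norms of a
# fixed finite set of points in R^d when k is moderately large.
rng = np.random.default_rng(0)
d, k, num_points = 1000, 200, 50

points = rng.standard_normal((num_points, d))
proj = rng.standard_normal((k, d)) / np.sqrt(k)  # scaling makes E||Px||^2 = ||x||^2

# Ratio of projected norm to original norm for each point; all ratios
# concentrate near 1, i.e. the projection is an almost isometry on this set.
ratios = np.linalg.norm(points @ proj.T, axis=1) / np.linalg.norm(points, axis=1)
print(ratios.min(), ratios.max())
```

Checking the almost-isometry property on a finite set, as here, is easy; the point of the discussion above is that such a check does not by itself give a way to invert the projection.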

Much progress on this problem was made in 2004 by Candes-Tao, Donoho and others, including Rudelson-Vershynin. See our joint FOCS presentation Candes-Tao-Rudelson-Vershynin; see also the papers by Muthukrishnan and the Compressed Sensing Page. The reconstruction problem has been solved via linear programming, a beautiful idea going back to Donoho and his collaborators. Thanks to the development of interior point methods in the 1980s, the complexity of the linear program is polynomial in d. However, this might be too slow in applications, especially when d is very large (in streaming algorithms, d = 2^32 is not uncommon).
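The linear-programming reconstruction mentioned above can be sketched as basis pursuit: to recover a sparse x from measurements b = Ax, minimize ||x||_1 subject to Ax = b, written as an LP by splitting x = u - v with u, v >= 0. The sizes and the seed below are toy choices, and SciPy's general-purpose solver stands in for the interior point methods referred to in the text:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, k, s = 100, 50, 4          # ambient dimension, measurements, sparsity (toy values)

A = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

# minimize ||x||_1 = sum(u) + sum(v) subject to A(u - v) = b, u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
```

With Gaussian measurements and sparsity this small relative to k, the LP typically recovers x exactly, which is the phenomenon the papers above make rigorous. The LP has 2d variables, so its cost is polynomial in d, matching the complexity discussion above.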

This paper develops the first "superfast" reconstruction algorithm whose complexity is polylogarithmic, rather than polynomial, in dimension d (with one specifically designed projection that works for all vectors).

## M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements

This is a journal version of the conference paper below. We added an (exponential) probability estimate for our result on random Fourier measurements, and we included a more accurate calculation of the constants for Gaussian measurements. Recently, Donoho and Tanner have been able to compute precise asymptotic behavior for these constants.

## M. Rudelson, R. Vershynin, Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements

Here we try to understand how to compute a sparse signal from few frequencies, and we prove the best known estimate on the number of frequencies so that one can compute f using linear programming.

So we want to compute a signal f, viewed as a vector in R^n with small support r, from k = k(r,n) frequencies of f -- the values of its discrete Fourier transform. Simple linear algebra shows that k = 2r frequencies suffice. But is there a formula, or an effective (polynomial time) algorithm, to compute f from these frequencies?

Donoho and his collaborators advocated that one should be able to "convexify" the reconstruction problem -- relax it to a convex problem with the same solution, which can then be solved by methods of linear programming. When exactly the convex relaxation is equivalent to the original problem is an open question. Candes and Tao found a sufficient condition for this equivalence -- the Restricted Isometry Condition (R.I.C.). This reduces the reconstruction problem to verifying the R.I.C. We prove the best known bound on the number of measurements for which the R.I.C. holds: k(r,n) = O(r log(n) log^2(r) log(r log n)). The optimal bound should be O(r log n).
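For intuition about what the R.I.C. asserts, one can compute the restricted isometry constant of a tiny matrix exactly by brute force: it is the largest deviation of the spectrum of an r-column Gram submatrix from 1, maximized over all supports of size r. This exhaustive check is exponential in r and feasible only at toy sizes, which is exactly why theorems like the one above are needed; all numbers below are illustrative:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
k, n, r = 100, 12, 3           # measurements, dimension, sparsity (toy values)

A = rng.standard_normal((k, n)) / np.sqrt(k)   # normalized Gaussian measurement matrix

# delta_r is the smallest delta with (1-delta)||x||^2 <= ||Ax||^2 <= (1+delta)||x||^2
# for all r-sparse x; equivalently, the max spectral deviation of r x r Gram blocks.
delta_r = 0.0
for S in combinations(range(n), r):
    eigs = np.linalg.eigvalsh(A[:, list(S)].T @ A[:, list(S)])
    delta_r = max(delta_r, abs(eigs[0] - 1), abs(eigs[-1] - 1))
print(delta_r)
```

A value of delta_r well below 1 means A acts as an almost isometry on all 3-sparse vectors, which is the hypothesis under which convex relaxation provably succeeds.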

Our arguments are based on the techniques of Geometric Functional Analysis and Probability in Banach spaces, in particular on the development of Rudelson's sampling method for random vectors in the isotropic position.

## R. Vershynin, Random sets of isomorphism of linear operators on Hilbert space

This note deals with a problem of "probabilistic Ramsey theory". Given a linear operator T on a Hilbert space with an orthogonal basis, we define the isomorphic structure Sigma(T) as the family of all finite subsets of the basis such that T restricted to their span is a nice isomorphism. We give a dimension-free optimal lower bound on the size of Sigma(T). This improves and extends in several ways the principle of restricted invertibility due to Bourgain and Tzafriri. With an appropriate notion of randomness, we obtain a randomized principle of restricted invertibility.

## M.Rudelson, R.Vershynin, Geometric approach to error correcting codes and reconstruction of signals

We develop an approach through geometric functional analysis to error correcting codes and to the reconstruction of signals from few linear measurements. An error correcting code encodes an n-letter word x into an m-letter word y in such a way that x can be decoded correctly when any r letters of y are corrupted. We prove that most linear orthogonal transformations Q from R^n into R^m form efficient and robust error correcting codes over the reals. The decoder (which corrects the corrupted components of y) is the metric projection onto the range of Q in the L_1 norm.
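The L_1 decoder described above can be sketched as a linear program: encode x as y = Qx, corrupt a few entries of y arbitrarily, and recover x by minimizing ||y - Qx||_1 over x, using slack variables t >= |y - Qx|. A random Gaussian Q stands in here for the orthogonal transformations of the paper, and all sizes are toy values:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, m, r = 10, 50, 5            # message length, codeword length, corruptions

Q = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = Q @ x_true
y[rng.choice(m, r, replace=False)] += 10 * rng.standard_normal(r)  # corrupt r entries

# min_x ||y - Qx||_1 as an LP: variables (x, t), minimize sum(t), with |y - Qx| <= t
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[Q, -np.eye(m)], [-Q, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * m)
x_hat = res.x[:n]
```

When the fraction of corrupted entries is small relative to the redundancy m - n, the minimizer typically reproduces x exactly despite the corruptions being arbitrarily large; the paper quantifies when this happens for most Q.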

An equivalent problem arises in signal processing: how does one reconstruct a signal that belongs to a small class from few linear measurements? We prove that for most sets of Gaussian measurements, all signals of small support can be exactly reconstructed by L_1 norm minimization. This is a substantial improvement of recent results of Donoho and of Candes and Tao.

Yet one more equivalent problem, in combinatorial geometry, is the existence of neighborly polytopes -- polytopes with a fixed number of facets and the maximal number of lower-dimensional facets. We prove that most sections of the cube form such polytopes.

## B.Klartag, R.Vershynin, Small ball probability and Dvoretzky theorem

Large deviation estimates are by now a standard tool in asymptotic convex geometry, in contrast to small deviation results. In this note we present a novel application of a small deviations inequality to a problem related to the diameters of random sections of high-dimensional convex bodies. Our results imply an unexpected distinction between the lower and upper inclusions in Dvoretzky's theorem.

## R.Vershynin, Frame expansions with erasures: an approach through the non-commutative operator theory

In modern communication systems such as the Internet, random losses of information can be mitigated by oversampling the source. This is equivalent to expanding the source using overcomplete systems of vectors (frames), as opposed to the traditional basis expansions.

Dependencies among the coefficients in frame expansions often allow for better performance, compared to bases, under random losses of coefficients. We show that for any n-dimensional frame, any source can be linearly reconstructed from only n log n randomly chosen frame coefficients, with a small error and with high probability. Thus every frame expansion withstands random losses better (for worst case sources) than the orthogonal basis expansion, for which the n log n bound is attained.
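The reconstruction claim can be sketched numerically: take an overcomplete frame of N unit vectors in R^n, keep about n log n randomly chosen frame coefficients of a source x, and reconstruct linearly by least squares. The random frame, the sizes, and the seed are toy choices; in this noiseless sketch the recovery is exact once the sampled rows span R^n, and the paper's content is the quantitative error bound behind this:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 20, 400                 # dimension and frame size (toy values)

F = rng.standard_normal((N, n))
F /= np.linalg.norm(F, axis=1, keepdims=True)   # frame of N unit vectors in R^n

x = rng.standard_normal(n)
coeffs = F @ x                                  # frame coefficients <x, f_i>

# Keep ~ n log n randomly chosen coefficients and reconstruct by least squares.
keep = rng.choice(N, size=int(n * np.log(n)), replace=False)
x_hat, *_ = np.linalg.lstsq(F[keep], coeffs[keep], rcond=None)
```

Least squares is the natural linear reconstruction here; the theorem above says that this many random coefficients suffice for every frame, with high probability over the choice of surviving coefficients.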

The proof reduces the problem to M.Rudelson's selection technique for random vectors in the isotropic position, which in turn is based on the non-commutative Khinchine's inequality.

## R.Vershynin, Isoperimetry of waists and local versus global asymptotic convex geometries (with an appendix by M. Rudelson and R. Vershynin)

Existence of nicely bounded sections of two symmetric convex bodies K and L implies that the intersection of random rotations of K and L is nicely bounded. For L a subspace, this main result immediately yields the following phenomenon: "if K has one nicely bounded section, then most sections of K are nicely bounded". This "existence implies randomness" consequence was proved independently in [Giannopoulos, Milman and Tsolomitis], and even before that in [Mankiewicz-Tomczak]. Our main result represents a new connection between local asymptotic convex geometry (the study of sections of convex bodies) and global asymptotic convex geometry (the study of convex bodies as a whole). The method relies on the new "isoperimetry of waists" on the sphere due to Gromov.

## M. Rudelson, R. Vershynin, Combinatorics of random processes and sections of convex bodies

We find a sharp combinatorial bound for the metric entropy of sets in R^n and of general classes of functions. This solves two basic combinatorial conjectures on empirical processes:

1. A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable.
2. The uniform entropy is equivalent to the combinatorial dimension under minimal regularity.
Our method also constructs a nicely bounded coordinate section of a symmetric convex body in R^n. In operator theory, this essentially proves the restricted invertibility principle of Bourgain and Tzafriri for all normed spaces.

## R. Vershynin, Integer cells in convex sets

Here a version of the classical Minkowski theorem is proved for integer cells rather than integer points. Namely, every convex set K in R^n admits a coordinate projection that contains at least vol(0.1 K) cells of the integer lattice.

Our proof is based on an extension of the combinatorial density theorem of Sauer, Shelah and Vapnik-Chervonenkis from the Boolean cube {0,1}^n to Z^n. This leads to a new approach to coordinate sections of convex bodies. In particular, the Volume Ratio Theorem and Milman's duality of diameters admit natural versions for coordinate sections.