Sparse recovery without incoherence
Rachel Ward
University of Texas at Austin
February 2013
Research supported in part by the ONR and the Alfred P. Sloan Foundation

High-dimensional data with low-dimensional structure is everywhere
[Figure: three examples of low-dimensional structure: discrete images, smooth functions, low-rank matrices]

The sparsity model
- $x \in \mathbb{R}^N$ is $s$-sparse if $|\{j : |x_j| > 0\}| \le s$.
- For general $x \in \mathbb{R}^N$, the best $s$-sparse approximation is $x_s = \arg\min \{\|x - u\| : u \in \mathbb{R}^N \text{ is } s\text{-sparse}\}$.
[Figure: from $x$ to $x_s$]

Compressed sensing
- Instead of observing $x = (x_j)_{j=1}^N$ directly, acquire it through $m \ll N$ general linear measurements
  $y_k = \sum_{j=1}^N a_{j,k} x_j, \quad k = 1, \dots, m.$
- Useful when it is expensive or impossible to acquire and store the entire signal.

Sparsity in images
Image as a two-dimensional array of pixels $(x_{j_1,j_2}) \in \mathbb{R}^{n \times n}$.
[Figure: an image $(x_{j_1,j_2})$ and its discrete gradient magnitude $\sqrt{|x_{j_1+1,j_2} - x_{j_1,j_2}|^2 + |x_{j_1,j_2+1} - x_{j_1,j_2}|^2}$]
Images are approximately sparse in spatial discrete differences, wavelet bases, and local cosine bases.

MRI compressive imaging
In Magnetic Resonance Imaging (MRI), rotating magnetic field measurements are Fourier transform measurements:
$y_{k_1,k_2} = \frac{1}{n} \sum_{j_1,j_2} x_{j_1,j_2} e^{2\pi i (k_1 j_1 + k_2 j_2)/n}, \quad -n/2 + 1 \le k_1, k_2 \le n/2.$
Each measurement takes a certain amount of time, so a reduced number of samples $m \ll n^2$ means a reduced MRI scan time.

Polynomial interpolation
[Figure: a smooth function on $[-1, 1]$]
Smooth functions can be characterized by decay rates in their Fourier series or Legendre polynomial series expansions.
Multivariate polynomial interpolation: generalized Polynomial Chaos expansions for Uncertainty Quantification.

Incoherent sampling
Set-up
1. $N$-dimensional signal of interest $x = (x_j)_{j=1}^N$ with assumed sparsity: $x = \Psi b$ and $b$ is sparse.
2. $m \times N$ measurement matrix $A$ ($m \ll N$).
3. Measurements $y = Ax$.
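As a concrete illustration of the sparsity model and the measurement set-up above, the following NumPy sketch computes a best $s$-sparse approximation by keeping the $s$ largest-magnitude entries and then takes $m \ll N$ linear measurements. The Gaussian matrix `A`, the sizes, the decaying test signal, and the helper name `best_s_sparse` are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np


def best_s_sparse(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    x = np.asarray(x, dtype=float)
    xs = np.zeros_like(x)
    if s > 0:
        keep = np.argsort(-np.abs(x))[:s]  # indices of the s largest |x_j|
        xs[keep] = x[keep]
    return xs


rng = np.random.default_rng(0)
N, m, s = 256, 64, 8                       # illustrative sizes with m << N

# A compressible signal: coefficient magnitudes decay quickly.
x = rng.standard_normal(N) * np.arange(1, N + 1) ** -1.5
x_s = best_s_sparse(x, s)

# m general linear measurements y_k = sum_j a_{j,k} x_j (Gaussian A is an assumption).
A = rng.standard_normal((m, N)) / np.sqrt(m)
y = A @ x

print("nonzeros in x_s:", np.count_nonzero(x_s))
print("relative error of x_s:", np.linalg.norm(x - x_s) / np.linalg.norm(x))
print("measurement vector shape:", y.shape)
```

Hard thresholding is used here because keeping the largest-magnitude entries minimizes $\|x - u\|$ over $s$-sparse $u$ for any $\ell_p$ norm.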
Reconstructing sparse signals
Primer theorem. Assume $A$ is an $m \times N$ matrix with the property that if $x_1$ and $x_2$ are $s$-sparse and not equal, then $Ax_1 \ne Ax_2$. Let $y = Ax$ for some $s$-sparse $x$. Then
$x = \arg\min_{u \in \mathbb{R}^N} |\{j : |u_j| > 0\}| \quad \text{such that } Au = y.$
Sufficient condition for exact recovery: if $Au = 0$, then either $u = 0$ or $u$ is not $2s$-sparse.
A stronger sufficient condition is the null space property: if $Au = 0$, then $\|u_{2s}\|_1 \le \frac{1}{2} \|u - u_{2s}\|_1$.
Still, the support minimization problem is computationally intractable in general.
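To make the support-minimization program concrete, here is a small brute-force sketch. It tries supports of increasing size and solves a least-squares problem on each one, so the number of candidate supports grows combinatorially with $N$. The random Gaussian `A`, the toy sizes, the tolerance, and the function name `l0_min_bruteforce` are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def l0_min_bruteforce(A, y, tol=1e-9):
    """Return a sparsest u with A u = y by exhaustive search over supports."""
    m, N = A.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(N)
    for k in range(1, N + 1):                       # support size, smallest first
        for support in combinations(range(N), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - y) <= tol:
                u = np.zeros(N)
                u[cols] = coef
                return u
    raise ValueError("no solution of A u = y found")


rng = np.random.default_rng(0)
N, m = 10, 6                                        # toy sizes; m >= 2s with s = 2
A = rng.standard_normal((m, N))

x = np.zeros(N)
x[[2, 7]] = [1.5, -2.0]                             # a 2-sparse signal
u = l0_min_bruteforce(A, A @ x)

print("recovered support:", np.flatnonzero(u))
print("recovery error:", np.linalg.norm(u - x))
```

Even at this size the search may visit every support of size 1 and 2; for realistic $N$ and $s$ the count $\binom{N}{s}$ makes this approach impractical.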
Reconstructing sparse signals
Convex relaxation: $\ell_1$-minimization
Reconstruct sparse $x$ from $y = Ax$ by
$x^\# = \arg\min_{u \in \mathbb{R}^N} \sum_{j=1}^N |u_j| \quad \text{such that } Au = y.$
If $A$ satisfies the null space property [1], then:
1. Exact recovery: if $x$ is $s$-sparse, then $x^\# = x$.
2. Stability: if $x$ is almost $s$-sparse, then $x^\#$ is close to $x$.
[1] Candès, Romberg, Tao, Donoho, Cohen, Dahmen, DeVore, ...

Incoherent sampling
$y = Ax$
Let $(\Phi, \Psi)$ be a pair of orthonormal bases of $\mathbb{R}^N$.
1. $\Phi = (\phi_j)$ is used for sensing: $A$ is a subset of rows of $\Phi^*$.
2. $\Psi = (\psi_k)$ is used to sparsely represent $x$: $x = \Psi b$, and $b$ is assumed sparse.
Definition. The coherence between $\Phi$ and $\Psi$ is
$\mu(\Phi, \Psi) = \sqrt{N} \max_{1 \le k, j \le N} |\langle \phi_j, \psi_k \rangle|.$
If $\mu(\Phi, \Psi) = C$ for a constant $C$ independent of $N$, then $\Phi$ and $\Psi$ are called incoherent.

Incoherent sampling
Example:
- $\Psi$ = identity: the signal is sparse in the canonical (Kronecker) basis.
- $\Phi$ is the discrete Fourier basis, $\phi_j = \frac{1}{\sqrt{N}} \left( e^{i 2\pi j k / N} \right)_{k=0}^{N-1}$.
- The Kronecker and Fourier bases are incoherent: $\mu(\Phi, \Psi) := \sqrt{N} \max_{j,k} |\langle \phi_j, \psi_k \rangle| = 1$.

Theorem (Sparse recovery via incoherent sampling [2]). Let $(\Phi, \Psi)$ be a pair of incoherent bases of $\mathbb{R}^N$, with $\mu(\Phi, \Psi) \le K$. Let $s \ge 1$ and $m \ge 100\, s K^2 \log^4(N)$. Select $m$ (possibly not distinct) rows of $\Phi^*$ i.i.d. from the uniform distribution on $\{1, 2, \dots, N\}$ to form $A : \mathbb{R}^N \to \mathbb{R}^m$. The following holds with exceedingly high probability: for all $x \in \mathbb{R}^N$, given measurements $y = Ax$, the approximation
$x^\# = \arg\min_{u \in \mathbb{R}^N} \|\Psi^* u\|_1 \quad \text{subject to } y = Au$
satisfies the error guarantee
$\|x - x^\#\|_2 \lesssim \frac{1}{\sqrt{s}} \|\Psi^* x - (\Psi^* x)_s\|_1.$
[2] Candès, Romberg, Tao '06; Rudelson, Vershynin '08; ...
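The Fourier/canonical pairing and the $\ell_1$ recovery program can be tried numerically. The sketch below checks that the unitary DFT matrix and the identity have coherence 1, draws rows of the DFT matrix uniformly at random, and recovers a sparse real vector by $\ell_1$-minimization written as a linear program. Restricting to real signals (so that real and imaginary parts of the constraints can be stacked), the use of `scipy.optimize.linprog`, and the sizes `N`, `m`, `s` are assumptions of this sketch; the sizes are far smaller than a sufficient condition of the form $m \ge 100\, s K^2 \log^4(N)$ would require.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, m, s = 64, 48, 3

# Unitary DFT matrix; each row is a Fourier sensing vector.
F = np.fft.fft(np.eye(N), norm="ortho")
Psi = np.eye(N)                                    # canonical (Kronecker) basis

# Coherence mu = sqrt(N) * max_{j,k} |<phi_j, psi_k>|.
mu = np.sqrt(N) * np.max(np.abs(F @ Psi))

# Draw m row indices i.i.d. uniformly; repeated rows add no new constraint,
# so only the distinct ones are kept when building A.
rows = np.unique(rng.integers(0, N, size=m))
A = F[rows, :]

# s-sparse real signal and its measurements.
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x

# l1-minimization as an LP: u = p - q with p, q >= 0, minimize sum(p + q)
# subject to A u = y, with real and imaginary parts stacked.
B = np.vstack([A.real, A.imag])
b = np.concatenate([y.real, y.imag])
res = linprog(c=np.ones(2 * N), A_eq=np.hstack([B, -B]), b_eq=b,
              bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]

print("coherence:", mu)
print("distinct rows used:", rows.size)
print("recovery error:", np.linalg.norm(x_hat - x))
```

The split $u = p - q$ is a standard way to turn the $\ell_1$ objective into a linear one; a dedicated convex solver would be an equally reasonable choice.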
In practice, incoherent sampling is not always possible. What then are optimal compressed sampling strategies?

Compressed sensing MRI
- Image as a 2D array of pixels $(x_{j_1,j_2}) \in \mathbb{R}^{n \times n}$.
- Natural images are sparsely represented in 2D wavelet bases.
- Sensing measurements in MRI are 2D Fourier or k-space measurements,
  $\phi_{k_1,k_2} = \frac{1}{n} \left( e^{i 2\pi (j_1 k_1 + j_2 k_2)/n} \right)_{j_1,j_2=1}^{n}, \quad -n/2 + 1 \le k_1, k_2 \le n/2.$
- Wavelet and Fourier bases are maximally coherent: $\mu(\Psi, \Phi) = \sqrt{N}$.

Compressed sensing MRI
The coherence between a frequency $\phi_{k_1,k_2}$ and the entire bivariate Haar wavelet basis $\Psi = (\psi_I)$ can be bounded by [3]
$\mu(\phi_{k_1,k_2}, \Psi) = \sqrt{N} \max_I |\langle \phi_{k_1,k_2}, \psi_I \rangle| \lesssim \frac{\sqrt{N}}{\left( |k_1 + 1|^2 + |k_2 + 1|^2 \right)^{1/2}}.$
When some elements of the sensing basis are more coherent with the sparsity basis than others, is it best just to take the $m$ most coherent measurements?
[3] Krahmer, W., 2012

Compressed sensing MRI
Reconstructions of a $256 \times 256$ MRI image from $m = 0.1 \times 256^2$ frequency measurements using total variation minimization.
[Figure: reconstructions shown in pixel space and frequency space for two sampling patterns: lowest frequencies; uniformly subsampled frequencies]

Compressed sensing MRI
Several papers have proposed [4] to sample k-space according to densities scaling inversely to a power of the distance to the origin.
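A one-dimensional analogue makes both the coherence decay and such sampling densities easy to inspect. The sketch below builds an orthonormal Haar basis, computes the coherence of each Fourier vector with the whole Haar basis, and then draws frequencies from a density that decays like an inverse power of the distance to the origin. The reduction to 1D, the helper `haar_matrix`, the offset of 1 in the density, the exponent `power`, and the sizes are all assumptions of this sketch rather than choices taken from the cited papers.

```python
import numpy as np


def haar_matrix(N):
    """Rows form an orthonormal Haar basis of R^N (N must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        H = np.vstack([np.kron(H, [1.0, 1.0]),
                       np.kron(np.eye(n), [1.0, -1.0])]) / np.sqrt(2.0)
    return H


N = 64
H = haar_matrix(N)
F = np.fft.fft(np.eye(N), norm="ortho")        # row k is a unit-norm frequency-k vector

# Coherence of each frequency with the whole Haar basis:
# mu_k = sqrt(N) * max_j |<phi_k, psi_j>|.
mu = np.sqrt(N) * np.max(np.abs(F @ H.T), axis=1)

# Signed integer frequencies in FFT ordering and their distance to the origin.
k = np.fft.fftfreq(N, d=1.0 / N)
dist = np.abs(k)

# Sampling density decaying like an inverse power of the distance to the origin.
power = 1.0                                     # hypothetical exponent
p = 1.0 / (dist + 1.0) ** power
p /= p.sum()

rng = np.random.default_rng(2)
m = 16
sampled = rng.choice(N, size=m, replace=True, p=p)   # i.i.d. frequency indices

print("largest coherence:", mu.max(), "sqrt(N):", np.sqrt(N))
print("mean squared coherence:", np.mean(mu ** 2))
print("sampled signed frequencies:", np.sort(k[sampled]))
```

In this 1D setting the zero frequency is maximally coherent with the Haar basis (it coincides with the constant Haar vector), while the squared coherence of frequency $k \ne 0$ is at most $N/|k|$, so low frequencies are drawn more often than high ones.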
Such inverse-power sampling densities are reminiscent of the coherence between a frequency $\phi_{k_1,k_2}$ and the bivariate Haar wavelet basis $\Psi$,
$\mu(\phi_{k_1,k_2}, \Psi) \lesssim \frac{\sqrt{N}}{\left( |k_1 + 1|^2 + |k_2 + 1|^2 \right)^{1/2}}.$
Define the local coherence function $\mu_{\mathrm{loc}} = (\mu_j)$ from an orthonormal basis $\Phi$ to an orthonormal basis $\Psi$ as
$\mu_k := \mu(\phi_k, \Psi) = \sqrt{N} \max_j |\langle \phi_k, \psi_j \rangle|.$
Note the inequality $\frac{1}{N} \sum_{j=1}^N \mu_j^2 \le (\mu(\Phi, \Psi))^2$. For Fourier/wavelets, the average squared local coherence is of order $\log(N)$, whereas the squared coherence is $N$.
[4] Lustig, Donoho, Pauly 2007; Puy, Vandergheynst, Wiaux 2011

Theorem (Coherence-based sampling [5]). Consider a pair of orthonormal bases $(\Phi, \Psi)$ with local coherences bounded by $\mu_j = \mu(\phi_j, \Psi) \le \kappa_j$. Let $s \ge 1$, and suppose
$m \gtrsim s \left( \frac{1}{N} \sum_{j=1}^N \kappa_j^2 \right) \log^4(N).$
Select $m$ (possibly not distinct) rows of $\Phi^*$ i.i.d. from the multinomial distribution on $\{1, 2, \dots, N\}$ with weights $c \kappa_j^2$ to form $A : \mathbb{R}^N \to \mathbb{R}^m$. The following holds with exceedingly high probability for all $x \in \mathbb{R}^N$. Given measurements $y = Ax$, the image
$x^\# = \arg\min_{u \in \mathbb{R}^N} \|\Psi^* u\|_1 \quad \text{subject to } y = Au$
satisfies the error guarantee
$\|x - x^\#\|_2 \lesssim \frac{1}{\sqrt{s}} \|\Psi^* x - (\Psi^* x)_s\|_1.$
[5] Krahmer, Rauhut, W. '12, ...

Corollary for MRI compressive imaging
Let $n \in \mathbb{N}$. Let $\Psi$ be the bivariate Haar wavelet basis. Form the sensing matrix $A$ by selecting $m \gtrsim s \cdot \log^5(n)$ frequency measurements $(k_1, k_2)$ i.i.d. from the multinomial distribution with weights
$p_{k_1,k_2} \propto \frac{1}{(|k_1| + 1)^2 + (|k_2| + 1)^2}.$
The following holds with exceedingly high probability for all $x = (x_{j_1,j_2}) \in \mathbb{R}^{n \times n}$. Given measurements $y = Ax$, the image
$x^\# = \arg\min_{u \in \mathbb{R}^N} \|\Psi^* u\|_1 \quad \text{subject to } y = Au$
satisfies the error guarantee
$\|x - x^\#\|_2 \lesssim \frac{1}{\sqrt{s}} \|\Psi^* x - (\Psi^* x)_s\|_1.$

Comparing different sampling schemes
[Figure: the original MRI image and its reconstructions from five frequency sampling schemes: low frequencies only; uniform; equispaced radial lines; $(k_1, k_2) \sim (k_1^2 + k_2^2)^{-1/2}$; $(k_1, k_2) \sim (k_1^2 + k_2^2)^{-1}$]

More examples of coherence-based sampling

Polynomial interpolation
[Figure: the Legendre polynomials, a smooth function, and its Legendre series coefficients]
- The Legendre polynomials $(L_j)_{j \ge 0}$ form an orthonormal basis for $L_2([-1, 1])$ with respect to the uniform measure, $\langle L_j, L_k \rangle = \frac{1}{2} \int_{-1}^{1} L_j(x) L_k(x)\,dx = \delta_{jk}$.
- Smoothness assumption on $f$: $f(x) \approx \sum_{j=0}^N c_j L_j(x)$ and $|c_j| \le j^{-\alpha}$.
- Approximate the unknown $f$ from samples $f(x_1), f(x_2), \dots, f(x_m)$. Sampling strategy?
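The Legendre set-up can be made concrete with NumPy. The sketch below builds normalized Legendre polynomials, checks their orthonormality by Gauss-Legendre quadrature, and assembles the matrix that maps coefficients $c_j$ to samples $f(x_i)$. The normalization $L_j = \sqrt{2j + 1}\, P_j$ (orthonormal with respect to the probability measure $dx/2$ on $[-1, 1]$), the helper name `legendre_matrix`, the decaying test coefficients, and the uniformly drawn sample points are assumptions of this sketch; how the sample points should be distributed is left open here.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_matrix(x, degree):
    """Matrix with entries L_j(x_i), where L_j = sqrt(2j + 1) * P_j, j = 0..degree."""
    V = legendre.legvander(np.asarray(x, dtype=float), degree)  # columns P_0..P_degree
    return V * np.sqrt(2.0 * np.arange(degree + 1) + 1.0)


degree = 20

# Gauss-Legendre quadrature with degree + 1 nodes integrates polynomials of
# degree up to 2 * degree + 1 exactly, which covers every product L_j * L_k.
nodes, weights = legendre.leggauss(degree + 1)
L = legendre_matrix(nodes, degree)
gram = (L * (weights / 2.0)[:, None]).T @ L       # (1/2) * integral of L_j L_k on [-1, 1]

# Sampling matrix for m points: f(x_i) = sum_j c_j L_j(x_i).
rng = np.random.default_rng(3)
m = 12
x_samples = rng.uniform(-1.0, 1.0, size=m)        # uniform points (an assumption)
A = legendre_matrix(x_samples, degree)
c = np.arange(1, degree + 2, dtype=float) ** -2.0  # decaying coefficients, for illustration
f_samples = A @ c

print("max deviation from orthonormality:", np.abs(gram - np.eye(degree + 1)).max())
print("sampling matrix shape:", A.shape)
print("values of L_j at x = 1:", legendre_matrix([1.0], 3)[0])
```

Recovering `c` from `f_samples` when $m$ is smaller than the number of coefficients is the sparse recovery problem that the sampling strategy has to support.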
Polynomial interpolation
- Legendre polynomials are not uniformly bounded: $\|L_j\|_\infty = \sqrt{2j + 1}$.
- However, they satisfy $|L_j(x)| \le \kappa(x) = \frac{2}{\sqrt{\pi}} \left( \frac{1}{1 - x^2} \right)^{1/4}$ on the interval $(-1, 1)$.
- $\frac{1}{2} \int_{-1}^{1} \kappa^2(x)\,dx \le 3$: the infinite-dimensional average local coherence is bounded.
- Coherence-based sampling implies a stable sampling strategy: draw $x_1, x_2, \dots, x_m \sim \frac{dx}{\pi (1 - x^2)^{1/2}}$, the Chebyshev measure.
- Stability of Chebyshev sampling aligns with classical results on Lagrange interpolation [6].
[6] L. Brutman. Lebesgue functions for polynomial interpolation - a survey.

Polynomial interpolation
[Figure: interpolation from uniformly distributed sampling points (left) and Chebyshev-distributed sampling points (right)]

Low-rank matrix approximation
- Just as the convex relaxation of the sparsity of a vector is its $\ell_1$ norm, the convex relaxation of the rank of a matrix is its nuclear norm, the sum of its singular values.
- Low-rank matrix completion guarantees via nuclear norm minimization assume incoherence of the underlying low-rank matrix [7].
- Coherence-based sampling: the incoherence condition can be removed by row/column weighted sampling. This is related to the notion of weighted matrix sampling [8].
[7] Candès, Recht, Plan, Montanari, Keshavan, Oh, ...
[8] Negahban, Wainwright 2012

Summary
Compressed sensing and related optimization problems often assume incoherence between the sensing and sparsity bases to derive sparse recovery guarantees.
Incoherence is restrictive and not achievable in many problems of practical interest.
We introduced the concept of local coherence from one basis to another, and showed that with a bound on the local coherence, one may derive sampling strategies and sparse recovery results for a wide range of new sensing problems.
More can be said: measurement error, instance-optimal results, ...

Extensions
- Compressed sensing video?
- From orthonormal bases to redundant dictionaries
- Implications for superresolution?
- Incorporate structured sparsity constraints
- ...

References
Rauhut, Ward. "Sparse Legendre expansions via $\ell_1$-minimization." Journal of Approximation Theory 164.5 (2012): 517-533.
Rauhut, Ward. "Sparse recovery for spherical harmonic expansions." arXiv preprint arXiv:1102.4097 (2011).
Burq, Dyatlov, Ward, Zworski. "Weighted eigenfunction estimates with applications to compressed sensing." SIAM Journal on Mathematical Analysis 44.5 (2012): 3481-3501.
Krahmer, Ward. "Beyond incoherence: stable and robust sampling strategies for compressive imaging." arXiv preprint arXiv:1210.2380 (2012).
Chen, Bhojanapalli, Sanghavi, Ward. "Coherent Matrix Completion." In Proceedings of the 31st International Conference on Machine Learning (2014): 674-682.