11

For model reduction, I want to compute the left singular vectors associated with the, say, 20 largest singular values of a matrix $A \in \mathbb{R}^{N \times k}$, where $N \approx 10^6$ and $k \approx 10^3$. Unfortunately, my matrix $A$ is dense and has no exploitable structure.

If I just call the svd routine from the numpy.linalg module in Python for a random matrix of this size, I run into a memory error. This is due to the allocation of the full factor $U \in \mathbb{R}^{N \times N}$ for the decomposition $A = U \Sigma V^T$.

Are there algorithms that avoid this pitfall, e.g. by computing only the singular vectors associated with the nonzero singular values?

I am willing to trade computation time and accuracy for memory.
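For concreteness, the thin SVD avoids allocating the full $N \times N$ left factor: with `full_matrices=False`, NumPy returns $U$ with shape $(N, k)$ instead of $(N, N)$. A minimal sketch at small stand-in dimensions:

```python
import numpy as np

# small stand-in dimensions for the N ~ 1e6, k ~ 1e3 case
N, k = 500, 40
A = np.random.default_rng(0).standard_normal((N, k))

# full_matrices=False: U has shape (N, k) instead of (N, N)
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# numpy returns singular values in descending order, so the
# left singular vectors for the 20 largest values are simply:
U20 = U[:, :20]
```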

Bill Barth
  • 10,905
  • 1
  • 21
  • 39
Jan
  • 3,418
  • 22
  • 37
  • 2
    Interesting, it seems Numpy does not know how to do a thin SVD... – J. M. Jun 07 '13 at 12:27
  • Thanks for the hint. Indeed, numpy.linalg.svd has the option full_matrices that can be set to False so that only the 'nonzero' parts are computed. Nevertheless, is there a way to reduce the computation even further? – Jan Jun 07 '13 at 12:42
  • 3
    The NumPy backend uses Fortran code, the LAPACKE_dgesvd routine, for the standard SVD. However, your matrix is typically C_CONTIGUOUS (check with matrix.flags), so the data is copied for Fortran alignment. Additionally, while running the LAPACK routine dgesvd, another copy of your matrix is needed (or at least the memory for it). You can get rid of one copy if you make sure the memory layout is Fortran-style right from the beginning. – Bort Jun 07 '13 at 15:05
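Following Bort's comment, one of the copies can be avoided by holding the matrix in Fortran (column-major) order before calling the SVD. A minimal sketch; the conversion itself copies once, but the LAPACK call then works on the array as-is:

```python
import numpy as np

# convert once up front to Fortran (column-major) order,
# so the LAPACK backend does not need its own alignment copy
A = np.asfortranarray(np.random.default_rng(1).standard_normal((300, 50)))
assert A.flags['F_CONTIGUOUS']

U, S, Vt = np.linalg.svd(A, full_matrices=False)
```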

5 Answers

7

If you only want a few singular values/vectors, ARPACK should do the trick. The SVD docs aren't great, and this distribution is more up to date.

EDIT: If you want to do this in Python, SciPy has a wrapper. Since your matrix is dense, you could try the block sparse row (BSR) format.
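The SciPy wrapper in question is `scipy.sparse.linalg.svds`, which drives ARPACK and, despite living in the sparse module, also accepts a dense ndarray. A minimal sketch for the top-20 triplets (note that svds returns the singular values in ascending order):

```python
import numpy as np
from scipy.sparse.linalg import svds

A = np.random.default_rng(2).standard_normal((1000, 80))

# ARPACK-based truncated SVD: only the 20 largest singular triplets,
# so U has shape (1000, 20) rather than (1000, 1000)
U, S, Vt = svds(A, k=20)

# svds returns singular values in ascending order
S_desc = S[::-1]
```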

Max Hutchinson
  • 3,051
  • 16
  • 29
5

Intel MKL implements the new Jacobi-SVD algorithm. Here are the implementation details: http://www.netlib.org/lapack/lawnspdf/lawn169.pdf and http://www.fernuni-hagen.de/MATHPHYS/veselic/downloads/j02.pdf

And the LAPACK routine: http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-732F9EE1-BCEC-4D9B-9B93-AF5499B21140.htm#DRMAC08-1

The workspace size is of course adjustable. You could call C functions from Python easily using Cython, SWIG, or any other wrapping mechanism.

Tolga Birdal
  • 2,229
  • 15
  • 24
3

Maybe you can try this.

https://github.com/jakevdp/pypropack

This is a Python wrapper for the PROPACK package, which implements efficient partial singular value decompositions of large sparse matrices and linear operators.
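The pypropack wrapper can be awkward to build nowadays, but the matrix-free idea it exposes — a partial SVD that only needs products with $A$ and $A^T$ — can be illustrated with SciPy's `LinearOperator` interface, which `scipy.sparse.linalg.svds` also accepts. A minimal sketch:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

rng = np.random.default_rng(3)
B = rng.standard_normal((400, 60))

# a matrix-free operator: only matvec (A @ x) and rmatvec (A.T @ y)
# are required, never the explicit matrix
A_op = LinearOperator(
    shape=B.shape,
    matvec=lambda x: B @ x,
    rmatvec=lambda y: B.T @ y,
)

# partial SVD: just the 5 largest singular triplets
U, S, Vt = svds(A_op, k=5)
```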

Mass Zhou
  • 131
  • 1
3

Take a look at sklearn.decomposition.TruncatedSVD in scikit-learn 0.14-rc.
(I believe that the scikit-learn people follow stackoverflow.com/questions/tagged/scikit-learn, so I'd ask detailed questions there.)

(How much memory do you have? $10^{6+3}$ doubles is already 8 GB.)
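A minimal sketch of TruncatedSVD on a small stand-in matrix; note that `fit_transform` returns $U \Sigma$ rather than $U$ itself, so the left singular vectors need a rescale by the singular values:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

A = np.random.default_rng(4).standard_normal((2000, 100))

# randomized solver: never forms the full (2000, 2000) left factor
svd = TruncatedSVD(n_components=20, algorithm='randomized', random_state=0)
US = svd.fit_transform(A)        # this is U * Sigma, shape (2000, 20)
U = US / svd.singular_values_    # approximate left singular vectors
```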

denis
  • 932
  • 5
  • 16
  • Thanks for your answer. By now, I do well with the SciPy routines. Also, I haven't gone up to $10^6 \times 10^3$ yet, but to about half of that, which is still feasible on my laptop. If necessary, I can use a workstation with 32 GB RAM. – Jan Aug 10 '13 at 11:17
1

There is a SciPy implementation of the SVD where you can select the number of singular values to compute. It tends to be faster than NumPy even for moderately sized matrices, since typically you don't need to compute that many singular vectors.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html

Beware that scipy.sparse.linalg.svds returns singular values in ascending order while np.linalg.svd returns them in descending order, which seems more natural.

You can fix that unnatural ordering by

# flip the ascending-order output of svds into descending order
U = np.flip(U, axis=1)
S = np.flip(S)
Vh = np.flip(Vh, axis=0)
VojtaK
  • 111
  • 1