Rule of thumb for sparse vs dense matrix storage

Question

Suppose I know the expected sparsity of a matrix (i.e. the number of non-zeros divided by the total number of entries). Is there a rule of thumb (perhaps approximate) for deciding whether to use sparse matrix storage (specifically, compressed row storage) vs. storing it as a dense matrix?

Speed is more important in my application than memory. But out of general curiosity, I'm interested in answers from both a speed and memory perspective.
After generating the matrix, I only apply addition and multiplication operations on it.
I have only been able to find qualitative answers (e.g. this question and this question), but I'm looking for something like

...if the sparsity is more than approximately $x\%$, then use dense storage.

Wolfgang Bangerth · Accepted Answer

All matrix operations are memory bound (and not compute bound) on today's processors. So basically, you have to ask which format stores fewer bytes. This is easy to compute:

For a full matrix, you store 8 bytes (one double) per entry.
For a sparse matrix, you store 12 bytes per nonzero entry (one 8-byte double for the value and one 4-byte integer for the column index of the entry).

In other words, the two formats break even at 8/12 ≈ 67% nonzeros. If your sparsity is below 67% (i.e., for nearly any matrix any reasonable person would call sparse), the sparse matrix format will yield not only better memory use but also better compute time.
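The byte counts above can be turned into a quick estimate. The following is a minimal sketch, assuming 8-byte values and 4-byte column indices as in this answer; the function names are made up for illustration, and the row-pointer array that compressed row storage also keeps (one integer per row, plus one) is included in the count.

```python
def dense_bytes(n_rows, n_cols, value_bytes=8):
    """Bytes needed to store every entry of a dense matrix."""
    return n_rows * n_cols * value_bytes


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Bytes for compressed row storage: values, column indices, row pointers.

    Assumes 4-byte indices; very large matrices may need 8-byte indices.
    """
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def break_even_density(n_rows, n_cols, value_bytes=8, index_bytes=4):
    """Fraction of nonzeros at which both formats use the same memory."""
    budget = dense_bytes(n_rows, n_cols, value_bytes) - (n_rows + 1) * index_bytes
    nnz = budget / (value_bytes + index_bytes)
    return nnz / (n_rows * n_cols)


if __name__ == "__main__":
    n = 10_000
    print(f"break-even density: {break_even_density(n, n):.4f}")
```

For a 10,000 by 10,000 matrix the break-even comes out just under 2/3, because the row-pointer array is negligible next to the values and column indices. This only compares memory footprints; it says nothing about access patterns.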

Brian Borchers · Answer

For what it is worth, for random sparse matrices of size 10,000 by 10,000 vs. dense matrices of the same size, on my Xeon workstation using MATLAB and Intel MKL as the BLAS, the sparse matrix-vector multiply was faster for densities of 15% or less. At 67% (as proposed by another answer), the dense matrix-vector multiplication was about three times faster.

Answered by Brian Borchers on June 30, 2021

Henrik Schumacher · Answer

Even if a matrix is very sparse, its matrix product with itself can be dense. Take for example a diagonal matrix and fill its first row and column with nonzero entries; its product with itself will be completely dense. Such a matrix can arise, for example, as the graph Laplacian of a graph in which there is a vertex that is connected to all other vertices. In practice, it suffices if there are a few vertices with pretty high connectivity to the rest of the network. For matrix-vector multiplication, this phenomenon is less relevant, although it may lead to imbalances when trying to parallelize the matrix-vector multiplication.
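The fill-in described here is easy to check on a small case. This is a minimal sketch in plain Python, with hypothetical helper names and arbitrarily chosen positive entries (2 on the diagonal, 1 in the first row and column) so that no cancellation can occur in the product.

```python
def arrow_matrix(n):
    """Diagonal matrix whose first row and first column are also filled."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 2.0
    for i in range(1, n):
        a[0][i] = 1.0
        a[i][0] = 1.0
    return a


def matmul(a, b):
    """Naive dense matrix product, fine for small illustrative sizes."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def count_nonzeros(a):
    return sum(1 for row in a for v in row if v != 0.0)


if __name__ == "__main__":
    n = 6
    a = arrow_matrix(n)
    print("nonzeros in A:  ", count_nonzeros(a), "of", n * n)
    print("nonzeros in A*A:", count_nonzeros(matmul(a, a)), "of", n * n)
```

The matrix itself has 3n - 2 nonzeros, while every entry of its square is nonzero, because any two rows are coupled through the first row and column.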

What I want to highlight: It really depends on the sparsity pattern and on what you want to do with the matrix. So, the best definition of a sparse matrix that I can come up with (which is pretty useless at the same time) is as follows:

A matrix is sparse if it is advantageous to store only its nonzero values and their positions and to accept the additional overhead of managing the resulting data structure.

The lesson to learn: Whether a given matrix is sparse (read as: whether you should use a sparse or dense matrix data structure) really depends on what you want to do with it, which algorithm you use, and (as others have already pointed out) which hardware and software you use. There cannot be a purely percentage-based rule unless the task is only storing data or matrix-vector multiplication. The best way to find out if your matrices are sparse is just to try it and compare with dense matrix methods.
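A "try it and compare" experiment might be set up along the following lines. This is only a sketch in pure Python with hypothetical helper names; interpreter timings will not match an optimized BLAS or sparse library, so the same comparison should be repeated with the actual library and hardware you plan to use.

```python
import random
import time


def random_matrix(n, density, seed=0):
    """n-by-n list of lists; each entry is nonzero with probability `density`."""
    rng = random.Random(seed)
    return [[rng.random() if rng.random() < density else 0.0 for _ in range(n)]
            for _ in range(n)]


def dense_to_csr(dense):
    """Convert to compressed row storage: (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dense_matvec(dense, x):
    return [sum(a * b for a, b in zip(row, x)) for row in dense]


def csr_matvec(values, col_idx, row_ptr, x):
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        # Only the stored nonzeros of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def best_time(fn, *args, repeats=5):
    """Smallest wall-clock time over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 300
    x = [1.0] * n
    for density in (0.01, 0.15, 0.67):
        a = random_matrix(n, density)
        csr = dense_to_csr(a)
        t_dense = best_time(dense_matvec, a, x)
        t_sparse = best_time(csr_matvec, *csr, x)
        print(f"density {density:.2f}: dense {t_dense:.5f}s, sparse {t_sparse:.5f}s")
```

The matrix size, densities, and repeat count are arbitrary choices for illustration; the point is to measure the crossover on your own setup rather than rely on a fixed percentage.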
