One Idea Behind Data, Images, and Linear Algebra
January 31, 2026
How does your phone compress photos?
Why does Google Photos recognize faces?
How can we simplify messy data without losing too much information?
A grayscale photo loaded into Julia is nothing more than a 921×1620 Matrix{Float64}, each entry a pixel intensity between 0 and 1 (printout truncated):
921×1620 Matrix{Float64}:
 0.498039  0.509804  0.517647  …  0.568627  0.568627  0.568627
 ⋮                             ⋱
 0.45098   0.45098   0.45098   …  0.52549   0.533333  0.556863
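A minimal sketch of how such a matrix can be produced, assuming Images.jl and FileIO.jl are available; the filename is illustrative, not from the original:
using Images, FileIO

img = load("photo.jpg")        # hypothetical filename
A   = Float64.(Gray.(img))     # grayscale intensities in [0, 1]
size(A)                        # e.g. (921, 1620) for the image above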
Call this matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) (here \(m = 921\), \(n = 1620\))
We can decompose this matrix into the product of three matrices \[ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \]
The columns of \(\mathbf{U}\) are the “row-shapes”: how each component spreads down the image (vertical patterns)
\(\boldsymbol{\Sigma}\) is a diagonal matrix that ranks the “importance” of components; bigger values mean more important, and small ones can be dropped
The columns of \(\mathbf{V}\) are the “column-shapes”: how each component spreads across the image (horizontal patterns)
Singular values are ordered from greatest to least – they measure the importance of the patterns in the data or image.
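A minimal sketch of computing the decomposition with Julia's standard library (variable names are illustrative):
using LinearAlgebra

F = svd(A)                 # thin SVD: A ≈ F.U * Diagonal(F.S) * F.Vt
U, S, Vt = F.U, F.S, F.Vt
issorted(S, rev=true)      # singular values come back ordered greatest to least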
Eckart-Young-Mirsky Theorem
Let \(\mathbf{A} \in \mathbb{R}^{m \times n}\) have SVD \(\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top\) and define \(\mathbf{A}_k = \sum_{i=1}^k \boldsymbol{\sigma}_i \mathbf{u}_i \mathbf{v}_i^\top\). Then \(\mathbf{A}_k\) solves the optimization problem \[ \min_{\operatorname{rank}(\mathbf{B}) \leq k} \| \mathbf{A} - \mathbf{B} \|_* \] for \(* \in \{2, F\}\). Furthermore, \[ \| \mathbf{A} - \mathbf{A}_k \|_2 = \boldsymbol{\sigma}_{k+1}, \quad \| \mathbf{A} - \mathbf{A}_k \|_F = \left(\sum_{i=k+1}^{\min(m, n)} \boldsymbol{\sigma}_i^2\right)^{1/2}. \]
where \[ \| \mathbf{X} \|_2 = \boldsymbol{\sigma}_{\max}(\mathbf{X}) \quad \text{and} \quad \| \mathbf{X} \|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} x_{i,j}^2}. \]
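A sketch of the truncation and the two error identities above, reusing A and F = svd(A) from before (k = 68 is the value used later in this section):
k  = 68
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]    # best rank-k approximation

opnorm(A - Ak) ≈ F.S[k+1]                      # spectral-norm error is σ_{k+1}
norm(A - Ak)   ≈ sqrt(sum(F.S[k+1:end] .^ 2))  # Frobenius error is the discarded tail energy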
Alternatively, a cumulative-energy estimator can choose \(k\) in practice
It yields a specific value \(k^*\) without relying on visual judgement \[ k^* = \min \left\{ k : \frac{\sum_{i=1}^k \boldsymbol{\sigma}_i^2}{\sum_{i=1}^{\min(m, n)} \boldsymbol{\sigma}_i^2} \geq \tau \right\} \]
Keep enough singular values to capture a fixed fraction of total variance or “energy”
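A sketch of the cumulative-energy rule, assuming the singular values F.S from above and a threshold τ = 0.999:
τ = 0.999
energy = cumsum(F.S .^ 2) ./ sum(F.S .^ 2)   # fraction of total energy captured by the first k components
kstar  = findfirst(≥(τ), energy)             # smallest k reaching the threshold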
SVD expresses an image as orthonormal matrices weighted by singular values – most visual information is concentrated in the first few components
Eckart-Young-Mirsky guarantees the truncated SVD is the optimal rank-\(k\) approximation (error controlled exactly by the discarded singular values in spectral/Frobenius norms).
Practical rule: pick \(k\) from a scree or cumulative-energy cutoff; e.g., \(99.9\)% energy at \(k=68\) reduced storage to \(\approx 11\)% with minimal visual loss.
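A rough check of that storage figure, counting the entries kept in a truncated SVD (the dimensions and \(k\) are the ones quoted above):
m, n, k = 921, 1620, 68
full_storage  = m * n             # entries in the original matrix
trunc_storage = k * (m + n + 1)   # entries in U[:, 1:k], S[1:k], V[:, 1:k]
trunc_storage / full_storage      # ≈ 0.116, consistent with the ≈11% above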