One Idea Behind Data, Images, and Linear Algebra
January 31, 2026
How does your phone compress photos?
Why does Google Photos recognize faces?
How can we simplify messy data without losing too much information?
A grayscale photo loaded into Julia is nothing more than a 921×1620 Matrix{Float64}, each entry a pixel intensity between 0 and 1 (printout truncated):
921×1620 Matrix{Float64}:
 0.498039  0.509804  0.517647  …  0.568627  0.568627  0.568627
 ⋮                             ⋱
 0.45098   0.45098   0.45098   …  0.52549   0.533333  0.556863
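A minimal sketch of how such a matrix can be produced, assuming Images.jl and FileIO.jl are available; the filename is illustrative, not from the original:
using Images, FileIO

img = load("photo.jpg")        # hypothetical filename
A   = Float64.(Gray.(img))     # grayscale intensities in [0, 1]
size(A)                        # e.g. (921, 1620) for the image above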
Call this matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) (here \(m = 921\), \(n = 1620\))
We can decompose this matrix into the product of three matrices \[ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \]
The columns of \(\mathbf{U}\) are the “row-shapes”: how each component spreads down the image (vertical patterns)
\(\boldsymbol{\Sigma}\) is a diagonal matrix that ranks the “importance” of components; bigger values mean more important, and small ones can be dropped
The columns of \(\mathbf{V}\) are the “column-shapes”: how each component spreads across the image (horizontal patterns)
Singular values are ordered from greatest to least – they measure the importance of the patterns in the data or image.
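A minimal sketch of computing the decomposition with Julia's standard library (variable names are illustrative):
using LinearAlgebra

F = svd(A)                 # thin SVD: A ≈ F.U * Diagonal(F.S) * F.Vt
U, S, Vt = F.U, F.S, F.Vt
issorted(S, rev=true)      # singular values come back ordered greatest to least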
Eckart-Young-Mirsky Theorem
Let \(\mathbf{A} \in \mathbb{R}^{m \times n}\) have SVD \(\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top\) and define \(\mathbf{A}_k = \sum_{i=1}^k \boldsymbol{\sigma}_i \mathbf{u}_i \mathbf{v}_i^\top\). Then \(\mathbf{A}_k\) solves the optimization problem \[ \min_{\operatorname{rank}(\mathbf{B}) \leq k} \| \mathbf{A} - \mathbf{B} \|_* \] for \(* \in \{2, F\}\). Furthermore, \[ \| \mathbf{A} - \mathbf{A}_k \|_2 = \boldsymbol{\sigma}_{k+1}, \quad \| \mathbf{A} - \mathbf{A}_k \|_F = \left(\sum_{i=k+1}^{\min(m, n)} \boldsymbol{\sigma}_i^2\right)^{1/2}. \]
where \[ \| \mathbf{X} \|_2 = \boldsymbol{\sigma}_{\max}(\mathbf{X}) \quad \text{and} \quad \| \mathbf{X} \|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} x_{i,j}^2}. \]
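A sketch of the truncation and the two error identities above, reusing A and F = svd(A) from before (k = 68 is the value used later in this section):
k  = 68
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]    # best rank-k approximation

opnorm(A - Ak) ≈ F.S[k+1]                      # spectral-norm error is σ_{k+1}
norm(A - Ak)   ≈ sqrt(sum(F.S[k+1:end] .^ 2))  # Frobenius error is the discarded tail energy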
Alternatively, a cumulative-energy estimator can choose \(k\) in practice
It yields a specific value \(k^*\) without relying on visual judgement \[ k^* = \min \left\{ k : \frac{\sum_{i=1}^k \boldsymbol{\sigma}_i^2}{\sum_{i=1}^{\min(m, n)} \boldsymbol{\sigma}_i^2} \geq \tau \right\} \]
Keep enough singular values to capture a fixed fraction of total variance or “energy”
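A sketch of the cumulative-energy rule, assuming the singular values F.S from above and a threshold τ = 0.999:
τ = 0.999
energy = cumsum(F.S .^ 2) ./ sum(F.S .^ 2)   # fraction of total energy captured by the first k components
kstar  = findfirst(≥(τ), energy)             # smallest k reaching the threshold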
SVD expresses an image as orthonormal matrices weighted by singular values – most visual information is concentrated in the first few components
Eckart-Young-Mirsky guarantees the truncated SVD is the optimal rank-\(k\) approximation (error controlled exactly by the discarded singular values in spectral/Frobenius norms).
Practical rule: pick \(k\) from a scree or cumulative-energy cutoff; e.g., \(99.9\)% energy at \(k=68\) reduced storage to \(\approx 11\)% with minimal visual loss.
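A rough check of that storage figure, counting the entries kept in a truncated SVD (the dimensions and \(k\) are the ones quoted above):
m, n, k = 921, 1620, 68
full_storage  = m * n             # entries in the original matrix
trunc_storage = k * (m + n + 1)   # entries in U[:, 1:k], S[1:k], V[:, 1:k]
trunc_storage / full_storage      # ≈ 0.116, consistent with the ≈11% above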