![]() |
![]() |
ETNA - Electronic Transactions on Numerical Analysis
|
![]() |
Verlag der Österreichischen Akademie der Wissenschaften Austrian Academy of Sciences Press
A-1011 Wien, Dr. Ignaz Seipel-Platz 2
Tel. +43-1-515 81/DW 3420, Fax +43-1-515 81/DW 3400 https://verlag.oeaw.ac.at, e-mail: verlag@oeaw.ac.at |
![]() |
|
DATUM, UNTERSCHRIFT / DATE, SIGNATURE
BANK AUSTRIA CREDITANSTALT, WIEN (IBAN AT04 1100 0006 2280 0100, BIC BKAUATWW), DEUTSCHE BANK MÜNCHEN (IBAN DE16 7007 0024 0238 8270 00, BIC DEUTDEDBMUC)
|
ETNA - Electronic Transactions on Numerical Analysis, pp. 610-619, 2021/10/05
We study the behavior of the stochastic gradient descent methodapplied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that$ \mathbb{E}\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace of singular vectors corresponding to large singular values is large, then the stochastic gradient descent method leads to a fast regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
Keywords: stochastic gradient descent, Kaczmarz method, least-squares, regularization