On the regularization effect of stochastic gradient descent applied to least-squares

Stefan Steinerberger

doi:10.1553/etna_vol54s610

Ronny Ramlau, Lothar Reichel (Eds.)

ETNA - Electronic Transactions on Numerical Analysis

ISBN 978-3-7001-8258-0
Online Edition

Research Article
Open access

Electronic Transactions on Numerical Analysis (ETNA) is an electronic journal for the publication of significant new developments in numerical analysis and scientific computing. Papers of the highest quality that deal with the analysis of algorithms for the solution of continuous models and numerical linear algebra are appropriate for ETNA, as are papers of similar quality that discuss implementation and performance of such algorithms. New algorithms for current or new computer architectures are appropriate provided that they are numerically sound. However, the focus of the publication should be on the algorithm rather than on the architecture. The journal is published by the Kent State University Library in conjunction with the Institute of Computational Mathematics at Kent State University, and in cooperation with the Johann Radon Institute for Computational and Applied Mathematics of the Austrian Academy of Sciences (RICAM). Reviews of all ETNA papers appear in Mathematical Reviews and Zentralblatt für Mathematik. Reference information for ETNA papers also appears in the expanded Science Citation Index. ETNA is registered with the Library of Congress and has ISSN 1068-9613.

…

Verlag der Österreichischen Akademie der Wissenschaften
Austrian Academy of Sciences Press

A-1011 Wien, Dr. Ignaz Seipel-Platz 2
Tel. +43-1-515 81, ext. 3420, Fax +43-1-515 81, ext. 3400
https://verlag.oeaw.ac.at, e-mail: verlag@oeaw.ac.at

pp. 610-619

Abstract:
We study the behavior of the stochastic gradient descent method applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible matrices $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that

$$ \mathbb{E}\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}, $$

where $x = A^{-1}b$ is the exact solution. This is a curious inequality since the last term involves one additional matrix multiplication applied to the error $x_k - x$ compared to the remaining terms: if the projection of $x_k - x$ onto the subspace spanned by the singular vectors corresponding to large singular values is large, then the stochastic gradient descent method regularizes quickly. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
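The inequality can be checked numerically. The sketch below assumes that the stochastic gradient step is the randomized Kaczmarz update: row $i$ is drawn with probability $\|a_i\|_2^2/\|A\|_F^2$ and then $x_{k+1} = x_k + \frac{b_i - \langle a_i, x_k\rangle}{\|a_i\|_2^2} a_i$. It also assumes the constant $c_A = \max_i \|A a_i\|_2^2/\|a_i\|_2^2$. A direct computation shows this constant is admissible for this update, but it need not coincide with the constant in the paper. Because one of only $n$ rows is chosen, the expectation is computed exactly by summing over rows, so no sampling is involved. The helper names (`kaczmarz_step`, `kaczmarz_constant`, `expected_residual_sq`, `check_inequality`, `mean_error_factors`) are hypothetical.

```python
import numpy as np


def kaczmarz_step(A, b, x, i):
    """Project x onto the hyperplane <a_i, y> = b_i (one randomized Kaczmarz step)."""
    a = A[i]
    return x + (b[i] - a @ x) / (a @ a) * a


def kaczmarz_constant(A):
    """Assumed c_A = max_i ||A a_i||^2 / ||a_i||^2, admissible for the update above."""
    return max((A @ a) @ (A @ a) / (a @ a) for a in A)


def expected_residual_sq(A, b, x):
    """Exact E||A x_{k+1} - b||^2 when row i is drawn with prob ||a_i||^2 / ||A||_F^2."""
    fro_sq = np.sum(A * A)
    total = 0.0
    for i in range(A.shape[0]):
        p = (A[i] @ A[i]) / fro_sq
        r = A @ kaczmarz_step(A, b, x, i) - b
        total += p * (r @ r)
    return total


def check_inequality(A, b, x):
    """Return (lhs, rhs) of the inequality from the abstract for the iterate x."""
    x_true = np.linalg.solve(A, b)           # exact solution x = A^{-1} b
    fro_sq = np.sum(A * A)                   # ||A||_F^2
    r = A @ x - b                            # residual A x_k - b
    g = A.T @ (A @ (x - x_true))             # A^T A (x_k - x)
    lhs = expected_residual_sq(A, b, x)
    rhs = (1 + kaczmarz_constant(A) / fro_sq) * (r @ r) - 2 / fro_sq * (g @ g)
    return lhs, rhs


def mean_error_factors(A):
    """Per-step factor by which E[x_k - x] shrinks along each right singular vector.

    For this update E[x_{k+1} - x] = (I - A^T A / ||A||_F^2)(x_k - x), so the
    factor along v_j is 1 - sigma_j^2 / ||A||_F^2 (singular values in descending order).
    """
    s = np.linalg.svd(A, compute_uv=False)
    return 1 - s**2 / np.sum(A * A)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))   # almost surely invertible
b = rng.standard_normal(6)
x = rng.standard_normal(6)        # current iterate x_k

lhs, rhs = check_inequality(A, b, x)
print(f"E||Ax_(k+1)-b||^2 = {lhs:.6f} <= {rhs:.6f}")
print("mean-error factors (largest singular value first):", mean_error_factors(A))
```

With this sampling, the cross term is an identity rather than a bound, since $\sum_i a_i \langle a_i, e\rangle = A^TAe$; only the $c_A$ term is estimated. The mean-error factors $1 - \sigma_j^2/\|A\|_F^2$ are smallest for the largest singular values. This is a cruder, first-moment view of the cascade described above.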

Keywords: stochastic gradient descent, Kaczmarz method, least-squares, regularization
Published Online: 2021/10/05 08:03:53
Rights: All rights reserved. For questions regarding copyright and copies, please contact us by email.

…
