Ronny Ramlau, Lothar Reichel (Eds.)

ETNA - Electronic Transactions on Numerical Analysis



Electronic Transactions on Numerical Analysis (ETNA) is an electronic journal for the publication of significant new developments in numerical analysis and scientific computing. Papers of the highest quality that deal with the analysis of algorithms for the solution of continuous models and numerical linear algebra are appropriate for ETNA, as are papers of similar quality that discuss implementation and performance of such algorithms. New algorithms for current or new computer architectures are appropriate provided that they are numerically sound. However, the focus of the publication should be on the algorithm rather than on the architecture. The journal is published by the Kent State University Library in conjunction with the Institute of Computational Mathematics at Kent State University, and in cooperation with the Johann Radon Institute for Computational and Applied Mathematics of the Austrian Academy of Sciences (RICAM). Reviews of all ETNA papers appear in Mathematical Reviews and Zentralblatt für Mathematik. Reference information for ETNA papers also appears in the Science Citation Index Expanded. ETNA is registered with the Library of Congress and has ISSN 1068-9613.

Verlag der Österreichischen Akademie der Wissenschaften
Austrian Academy of Sciences Press
A-1011 Wien, Dr. Ignaz Seipel-Platz 2
Tel. +43-1-515 81/DW 3420, Fax +43-1-515 81/DW 3400
https://verlag.oeaw.ac.at, e-mail: verlag@oeaw.ac.at

Order


ETNA - Electronic Transactions on Numerical Analysis



ISBN 978-3-7001-8258-0
Online Edition



Send or fax to your local bookseller or to:

Verlag der Österreichischen Akademie der Wissenschaften
Austrian Academy of Sciences Press
A-1011 Wien, Dr. Ignaz Seipel-Platz 2,
Tel. +43-1-515 81/DW 3420, Fax +43-1-515 81/DW 3400
https://verlag.oeaw.ac.at, e-mail: bestellung.verlag@oeaw.ac.at
UID-Nr.: ATU 16251605, FN 71839x Handelsgericht Wien, DVR: 0096385

Please send me
 
copy(ies) of the publication overleaf


NAME


ADDRESS


CITY


COUNTRY


METHOD OF PAYMENT
    Visa     Euro / Master     American Express


CARD NUMBER

Expiry date:  

    I will send a cheque           Send me a proforma invoice
 
DATE, SIGNATURE

BANK AUSTRIA CREDITANSTALT, WIEN (IBAN AT04 1100 0006 2280 0100, BIC BKAUATWW), DEUTSCHE BANK MÜNCHEN (IBAN DE16 7007 0024 0238 8270 00, BIC DEUTDEDBMUC)

LSEMINK: a modified Newton–Krylov method for Log-Sum-Exp minimization

    Kelvin Kan, James G. Nagy, Lars Ruthotto

ETNA - Electronic Transactions on Numerical Analysis, vol. 60, pp. 618-635, 2024/12/18

doi: 10.1553/etna_vol60s618


Abstract

This paper introduces LSEMINK, an effective modified Newton–Krylov algorithm geared toward minimizing the log-sum-exp function for a linear model. Problems of this kind arise commonly, for example, in geometric programming and multinomial logistic regression. Although the log-sum-exp function is smooth and convex, standard line-search Newton-type methods can become inefficient because the quadratic approximation of the objective function can be unbounded from below. To circumvent this, LSEMINK modifies the Hessian by adding a shift in the row space of the linear model. We show that the shift renders the quadratic approximation bounded from below and that the overall scheme converges to a global minimizer under mild assumptions. Our convergence proof also shows that all iterates lie in the row space of the linear model, which can be attractive when the model parameters do not have an intuitive meaning, as is common in machine learning. Since LSEMINK uses a Krylov subspace method to compute the search direction, it only requires matrix-vector products with the linear model, which is critical for large-scale problems. Our numerical experiments on image classification and geometric programming illustrate that LSEMINK considerably reduces the time-to-solution and increases the scalability compared to geometric programming and natural gradient descent approaches. It has significantly faster initial convergence than standard Newton–Krylov methods, which is particularly attractive in applications like machine learning, and it is more robust to the ill-conditioning arising from the nonsmoothness of the problem. We share our MATLAB implementation in a GitHub repository (https://github.com/KelvinKan/LSEMINK).
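To make the approach concrete, the following is a minimal Python sketch of the idea described in the abstract; it is an illustration under stated assumptions, not the authors' MATLAB implementation. It minimizes f(x) = logsumexp(Ax + b), forms the Newton system with the Hessian A^T (diag(p) - p p^T) A shifted by mu * A^T A (a shift in the row space of A), and solves it matrix-free with the conjugate gradient method, so only products with A and A^T are needed. The fixed shift mu and the Armijo backtracking line search are simplifications assumed for this sketch; the paper's globalization strategy differs.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def lse_value_grad(A, b, x):
        # Value, gradient, and softmax weights of f(x) = logsumexp(A x + b).
        z = A @ x + b
        m = z.max()                           # subtract max for numerical stability
        e = np.exp(z - m)
        s = e.sum()
        p = e / s                             # softmax(z): gradient of logsumexp w.r.t. z
        return m + np.log(s), A.T @ p, p      # chain rule: grad_x f = A^T p

    def modified_newton_krylov_step(A, p, g, mu):
        # Solve (A^T (diag(p) - p p^T) A + mu A^T A) d = -g with CG,
        # using only matrix-vector products with A and A^T.
        n = A.shape[1]
        def hess_vec(v):
            Av = A @ v
            w = p * Av - p * (p @ Av)         # (diag(p) - p p^T) (A v)
            return A.T @ w + mu * (A.T @ Av)  # shifted Hessian times v
        d, _ = cg(LinearOperator((n, n), matvec=hess_vec), -g, maxiter=50)
        return d

    def lse_minimize(A, b, x0, mu=1.0, tol=1e-8, max_iter=100):
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_iter):
            f, g, p = lse_value_grad(A, b, x)
            if np.linalg.norm(g) < tol:
                break
            d = modified_newton_krylov_step(A, p, g, mu)
            t = 1.0                           # Armijo backtracking line search
            while t > 1e-12 and lse_value_grad(A, b, x + t * d)[0] > f + 1e-4 * t * (g @ d):
                t *= 0.5
            x += t * d
        return x

Note that with x0 = 0, every CG direction is assembled from A^T-products, so all iterates stay in the row space of A, consistent with the property stated in the abstract. For the authors' actual method and experiments, see the linked repository.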

Keywords: log-sum-exp minimization, Newton–Krylov method, modified Newton method, machine learning, geometric programming