SMS scnews item created by Linh Nghiem at Fri 5 Aug 2022 1858
Type: Seminar
Distribution: World
Expiry: 19 Aug 2022
Calendar1: 12 Aug 2022 1400
CalLoc1: Zoom webinar
Auth: linhn@10.48.17.84 (hngh7483) in SMS-SAML

Seminar: Ho -- Instability, Computational Efficiency and Statistical Accuracy

Instability, Computational Efficiency and Statistical Accuracy 

Speaker: Nhat Ho - University of Texas at Austin 
Time: 14:00-15:00 Friday 12 August 2022
Location: Zoom at https://uni-sydney.zoom.us/j/89818118106

Many statistical estimators are defined as the fixed point of a data-dependent operator,
with estimators based on minimizing a cost function being an important special case.
The limiting performance of such estimators depends on the properties of the
population-level operator in the idealized limit of infinitely many samples.  We develop
a general framework that yields bounds on statistical accuracy based on the interplay
between the deterministic convergence rate of the algorithm at the population level, and
its degree of (in)stability when applied to an empirical object based on n samples.
Using this framework, we analyze both stable forms of gradient descent and some
higher-order and unstable algorithms, including Newton's method and its
cubic-regularized variant, as well as the EM algorithm.  We provide applications of our
general results to several concrete classes of singular statistical models, including
Gaussian mixture estimation, single-index models, and informative non-response models.
We exhibit cases in which an unstable algorithm can achieve the same statistical
accuracy as a stable algorithm in exponentially fewer steps: namely, with the number of
iterations being reduced from polynomial to logarithmic in sample size n.

Bio: Nhat Ho is currently an Assistant Professor of Statistics and Data Sciences at the
University of Texas at Austin.  He is also a core member of the Machine Learning
Laboratory.  His current research focuses on the interplay of four principles of
statistics and data science: heterogeneity of data, interpretability of models,
stability, and scalability of optimization and sampling algorithms.