Calendar
Posted March 15, 2019
3:30 pm – 4:30 pm Lockett 232
Yichuan Zhao, Georgia State University
Empirical likelihood for the bivariate survival function under univariate censoring
Abstract: The bivariate survival function plays an important role in multivariate survival analysis. Using the idea of influence functions, we develop empirical likelihood confidence intervals for the bivariate survival function in the presence of univariate censoring. It is shown that the empirical log-likelihood ratio has an asymptotic standard chi-squared distribution with one degree of freedom. A comprehensive simulation study shows that the proposed method outperforms both the traditional normal approximation method and the adjusted empirical likelihood method in most cases. The Diabetic Retinopathy Data are analyzed for illustration of the proposed procedure. This is joint work with Haitao Huang.
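As a rough illustration of the chi-squared calibration mentioned in the abstract, the sketch below computes the empirical log-likelihood ratio statistic for a plain mean from fully observed (uncensored) univariate data. This is the textbook construction, not the influence-function procedure for the censored bivariate survival function presented in the talk. The function name `el_statistic_for_mean`, the sample values, and the bisection solver are all assumptions made for the example.

```python
import math

def el_statistic_for_mean(xs, mu, tol=1e-12):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for a mean.

    Assumes fully observed data (no censoring); illustrative only.
    """
    z = [x - mu for x in xs]
    # mu must lie strictly inside the range of the data, otherwise R(mu) = 0.
    if min(z) >= 0.0 or max(z) <= 0.0:
        return math.inf
    # The multiplier lam must keep every weight positive: 1 + lam * z_i > 0.
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score() is strictly decreasing on (lo, hi), so bisection finds its root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)

# Hypothetical sample, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 3.9, 2.5]
chi2_95 = 3.841  # approximate 95% quantile of chi-squared with 1 degree of freedom

stat_at_mean = el_statistic_for_mean(sample, sum(sample) / len(sample))
stat_at_two = el_statistic_for_mean(sample, 2.0)
in_interval = stat_at_two <= chi2_95  # is mu = 2.0 inside the 95% interval?
```

A 95% confidence interval collects every candidate value whose statistic falls at or below the chi-squared quantile, so no variance estimate is needed.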
Posted March 26, 2019
3:30 pm – 4:30 pm Lockett 232
Arnab Ganguly, LSU
Reading Group Talk: An introduction to RKHS
Abstract: I will present introductory material on reproducing kernel Hilbert spaces (RKHS) and their use in supervised learning.
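One standard way an RKHS enters supervised learning is kernel ridge regression. By the representer theorem, the regularized least-squares fit is a finite combination of kernel functions centered at the training points. The sketch below is a minimal pure-Python illustration and is not taken from the talk: the Gaussian kernel, the bandwidth, the penalty `lam`, and the helper names are assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_kernel_ridge(xs, ys, lam=1e-3, bandwidth=1.0):
    """Return the predictor f(x) = sum_i alpha_i k(x_i, x), where
    alpha solves (K + lam I) alpha = y."""
    n = len(xs)
    gram = [[gaussian_kernel(xs[i], xs[j], bandwidth) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve_linear_system(gram, ys)

    def predict(x):
        return sum(a * gaussian_kernel(xi, x, bandwidth) for a, xi in zip(alpha, xs))

    return predict

# Hypothetical training data: noiseless samples of sin(x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]
predict = fit_kernel_ridge(xs, ys, lam=1e-3, bandwidth=0.5)
train_errors = [abs(predict(x) - y) for x, y in zip(xs, ys)]
```

With a small penalty the fit nearly interpolates the training points; a larger `lam` trades that fidelity for a smaller RKHS norm.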
Posted April 9, 2019
3:30 pm – 4:30 pm Lockett 232
Arnab Ganguly, LSU
An Introduction to RKHS - Part II
Posted August 9, 2019
Last modified October 31, 2019
Dejan Slepcev, Carnegie Mellon University
Variational problems on random structures: analysis and applications to data science
Abstract: Modern data-acquisition techniques produce a wealth of data about the world we live in. Extracting the information from the data leads to machine learning/statistics tasks such as clustering, classification, regression, dimensionality reduction, and others. Many of these tasks seek to minimize a functional, defined on the available random sample, which specifies the desired properties of the object sought.
I will present a mathematical framework suitable for studying the asymptotic properties of such variational problems posed on random samples and related random geometries (e.g. proximity graphs). In particular, we will discuss the passage from discrete variational problems on random samples to their continuum limits. Furthermore, we will discuss how tools of applied analysis help shed light on algorithms of machine learning.
Posted March 19, 2026
1:30 pm – 2:30 pm Lockett 276
Aditya Guntuboyina, University of California, Berkeley
What functions does XGBoost learn?
XGBoost is a scalable tree boosting system that is widely used by data scientists for regression. We develop a theoretical framework that explains what kinds of functions XGBoost is able to learn. We introduce an infinite-dimensional function class that extends ensembles of shallow decision trees, along with a natural measure of complexity that generalizes the regularization penalty built into XGBoost. We show that this complexity measure aligns with classical notions of variation—in one dimension it corresponds to total variation, and in higher dimensions it is closely tied to a well-known concept called Hardy–Krause variation. We prove that the best least-squares estimator within this class can always be represented using a finite number of trees, and that it achieves a nearly optimal statistical rate of convergence, avoiding the usual curse of dimensionality. Our work provides the first rigorous description of the function space that underlies XGBoost, clarifies its relationship to classical ideas in nonparametric estimation, and highlights an open question: does the actual XGBoost algorithm itself achieve these optimal guarantees? This is joint work with Dohyeong Ki at UC Berkeley.
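The one-dimensional statement in the abstract, that the complexity measure corresponds to total variation, can be made concrete for a toy ensemble of decision stumps. The sketch below is an illustration under assumptions, not the speakers' construction: the (threshold, coefficient) representation of a stump and the helper names are hypothetical. It only shows how the total variation of a piecewise-constant function is computed and that it never exceeds the sum of absolute stump coefficients.

```python
from collections import defaultdict

def stump_ensemble(stumps):
    """Return f(x) = sum of c * 1[x >= t] over (t, c) pairs."""
    def f(x):
        return sum(c for t, c in stumps if x >= t)
    return f

def total_variation(stumps):
    """Total variation of the ensemble on the real line.

    The function is piecewise constant, so its total variation is the sum of
    the absolute jump sizes; stumps sharing a threshold merge into one jump.
    """
    jumps = defaultdict(float)
    for t, c in stumps:
        jumps[t] += c
    return sum(abs(j) for j in jumps.values())

def sum_abs_coefficients(stumps):
    """Sum of absolute stump coefficients, an upper bound on total variation."""
    return sum(abs(c) for _, c in stumps)

# Hypothetical ensemble: two stumps share the threshold 1.0 and partly cancel.
stumps = [(0.0, 1.5), (1.0, -0.5), (1.0, 0.25), (2.0, -1.0)]
f = stump_ensemble(stumps)
tv = total_variation(stumps)          # 1.5 + 0.25 + 1.0
bound = sum_abs_coefficients(stumps)  # 1.5 + 0.5 + 0.25 + 1.0
```

The gap between `tv` and `bound` comes from the two stumps at threshold 1.0, whose jumps partly cancel. Total variation measures the function itself, not the particular set of trees used to write it down.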