Back in 2018 I published pytorch-hessian-eigenthings, a niche open source package for GPU-accelerated curvature analysis of PyTorch models. Loss landscape curvature metrics like the eigenvalues of the Hessian have been implicated in many claims about neural network generalization (flat-minima hypotheses, low-rank Hessian claims, etc.). But materializing the full Hessian costs memory quadratic in the parameter count, which is usually infeasible. This library combines Hessian-vector products with iterative methods (Lanczos, power iteration) to estimate the top eigenvalues and eigenvectors in linear memory instead. I stepped away from the project for years, but it ended up being used by other researchers doing curvature analysis. The original implementation had aged, so I thought I'd revisit it, and I now have more professional engineering experience to inform the design.
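For intuition: a Hessian-vector product Hv can be computed with two backward passes, without ever forming H, and iterating that matvec recovers the leading eigenpairs. A minimal sketch of the idea in plain PyTorch (the function names here are illustrative, not the library's actual API):

```python
import torch

def hvp(loss_fn, params, v):
    """Hessian-vector product H @ v via double backprop, in O(n) memory."""
    loss = loss_fn(params)
    grad = torch.autograd.grad(loss, params, create_graph=True)[0]
    # Differentiating g(params)^T v w.r.t. params yields H v.
    return torch.autograd.grad(grad @ v, params)[0]

def top_eigenvalue(loss_fn, params, iters=100):
    """Power iteration on the Hessian using only matvecs."""
    v = torch.randn_like(params, requires_grad=False)
    v = v / v.norm()
    for _ in range(iters):
        hv = hvp(loss_fn, params, v)
        v = hv / hv.norm()
    # Rayleigh quotient at the converged direction.
    return (v @ hvp(loss_fn, params, v)).item()
```

On a quadratic loss 0.5·xᵀAx the Hessian is exactly A, so the estimate can be checked against the known top eigenvalue.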
I just shipped a v1.0 rewrite. The new version adds curvature operators (Generalized Gauss-Newton, empirical Fisher) and algorithms (Hutchinson and Hutch++ trace estimation, spectral density via Stochastic Lanczos Quadrature). It also has a fused Triton/torch.compile cross-entropy Hessian-vector-product kernel for foundation-model-scale vocabularies, where standard implementations blow up in memory. More importantly, it adds extensive numerical validation of the operators: closed-form correctness tests on linear/logistic regression, where the Hessian is known analytically, and cross-library tests against curvlinops to catch regressions.
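Hutchinson trace estimation, for example, needs nothing but the same matvec: tr(A) ≈ E[vᵀAv] for random Rademacher (±1) vectors v. A toy sketch of the estimator (names and signature are illustrative, not the package's API):

```python
import torch

def hutchinson_trace(matvec, dim, n_samples=100, seed=0):
    """Estimate tr(A) as the sample mean of v^T (A v) over Rademacher v."""
    g = torch.Generator().manual_seed(seed)
    est = 0.0
    for _ in range(n_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        v = torch.randint(0, 2, (dim,), generator=g).float() * 2 - 1
        est += (v @ matvec(v)).item()
    return est / n_samples
```

Hutch++ improves on this by deflating the top of the spectrum with a low-rank sketch before applying Hutchinson to the remainder, which cuts the estimator's variance when a few eigenvalues dominate.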
I'm hoping to use it for some follow-up analysis. Right now, for example, I'm looking at the agreement between the update directions produced by different optimizers (Muon, K-FAC, Natural Gradient Descent) on Pythia checkpoints.
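One simple notion of agreement between two optimizers, assuming access to their raw update vectors at a given checkpoint, is the cosine similarity of the flattened updates. This is an illustrative sketch, not a claim about the actual methodology:

```python
import torch

def update_agreement(u1, u2):
    """Cosine similarity between two flattened parameter updates.

    Returns 1.0 for parallel updates, 0.0 for orthogonal ones.
    """
    u1, u2 = u1.flatten(), u2.flatten()
    return (u1 @ u2 / (u1.norm() * u2.norm())).item()
```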
Very open to suggestions or requests from anyone who's been working in this space. I've been out of the field for a while, so pointers to recent work I should be aware of are very welcome.
https://github.com/noahgolmant/pytorch-hessian-eigenthings