Simple models of sudden learning

Seminar
Department candidate
Off
Speaker
Yohai Bar-Sinai (Tel Aviv University)
Date
2026-06-22, 10:30-12:00 (Asia/Jerusalem)
Place
Physics (Building 202), Room 301
Abstract

A quantitative understanding of how and when neural networks learn from data is a fundamental problem with far-reaching practical consequences. An intriguing and still poorly understood phenomenon in modern machine learning is that models often learn to perform tasks in a sequence of sudden transitions, a behavior known as "grokking". Notably, in these cases the model generalizes to unseen data only long after it has completely fit the training set. This sharp transition between memorization and generalization has been observed across various synthetic and realistic scenarios. I will present a set of idealized models that exhibit this behavior and help explain its emergence "in the wild". These include linear and near-linear settings in both regression and classification, where the full training dynamics can be solved analytically and where grokking can be understood as a manifestation of critical slowing down. In more complex settings, similar phenomenology may arise from glass-like escape dynamics in high-dimensional loss landscapes. I will conclude by discussing how these simplified models apply to realistic settings.

Last updated: 14/06/2026