guest researcher @ NYU CDS

Hi! I am a Research Fellow (Postdoc) at Flatiron (CCM) and a guest researcher at New York University (CDS). I am currently analyzing exciting models of deep learning that might give insight into representations and feature learning. During my Ph.D. at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, I developed a combinatorial method to quantify the complexity of neural network loss landscapes.

Prior to starting at Flatiron (CCM), I was a Faculty Fellow at New York University (CDS), where I co-instructed the Machine Learning course. I was fortunate to be advised by Clément Hongler and Wulfram Gerstner at EPFL. During my Ph.D., I briefly explored out-of-distribution generalization at Meta AI. I completed a double major in Electrical-Electronics Engineering and Mathematics at Koç University in Istanbul. Before that, I earned two bronze medals at the International Mathematical Olympiad (IMO).

See my Google Scholar page for an up-to-date list of publications.


Deep Linear Networks Dynamics: Low-Rank Biases Induced by Initialization Scale and L2 Regularization

It is important to understand how large models represent knowledge in order to make them efficient and safe. We study a toy model of neural networks that exhibits non-linear dynamics and a phase transition. Although the model is complex, it admits a family of so-called 'copy-average' critical points of the loss. Gradient flow initialized with random weights consistently converges to one such critical point for networks up to a certain width, which we prove to be optimal among all copy-average points. Moreover, we can explain every neuron of a trained network of any width. As the width grows, the network changes its compression strategy and exhibits a phase transition. We close by listing open questions that call for further mathematical analysis and extensions of the model considered here.
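These dynamics can be probed numerically. Below is a minimal sketch, not the paper's exact setup: the dimensions, widths, and sample size are placeholders, and tanh stands in for the activation. It trains a narrow student network on a wider teacher by plain gradient descent and then inspects how each student neuron aligns with the teacher's neurons:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k, n = 3, 4, 2, 2000     # input dim, teacher width, student width, samples

# Teacher: wide two-layer network with unit output weights.
W_t = rng.standard_normal((m, d))
X = rng.standard_normal((n, d))
y = np.tanh(X @ W_t.T).sum(axis=1)          # teacher outputs

# Student: narrower network trained by gradient descent on MSE.
W = 0.1 * rng.standard_normal((k, d))
a = 0.1 * rng.standard_normal(k)
lr = 0.05

def loss_fn(W, a):
    return 0.5 * np.mean((np.tanh(X @ W.T) @ a - y) ** 2)

init_loss = loss_fn(W, a)
for _ in range(3000):
    H = np.tanh(X @ W.T)                    # (n, k) hidden activations
    r = H @ a - y                           # residuals
    grad_a = (H.T @ r) / n
    grad_W = ((r[:, None] * a * (1 - H**2)).T @ X) / n
    a -= lr * grad_a
    W -= lr * grad_W
final_loss = loss_fn(W, a)

# Does each student neuron copy one teacher neuron or average several?
cos = (W @ W_t.T) / (
    np.linalg.norm(W, axis=1, keepdims=True) * np.linalg.norm(W_t, axis=1))
print(init_loss, final_loss)
print(np.round(cos, 2))                     # rows: student, cols: teacher
```

Rows of `cos` close to a single ±1 entry indicate a copied teacher neuron; intermediate values across several columns suggest an averaging neuron.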

We discuss how the effective ridge reveals the implicit regularization effect of finite sampling in random feature models. The derivative of the effective ridge tracks the variance of the optimal predictor, yielding an explanation for the variance explosion at the interpolation threshold for arbitrary datasets.
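The effective ridge can be illustrated numerically. As a hedged sketch (the notation below is an assumption, not taken from the paper): given kernel eigenvalues d_1, …, d_N, P random features, and explicit ridge λ > 0, one common characterization is the fixed point λ̃ = λ + (λ̃/P) Σ_k d_k/(d_k + λ̃), which can be solved by bisection since the residual is negative at λ and positive at λ + Σ_k d_k / P:

```python
import numpy as np

def effective_ridge(eigs, lam, P, tol=1e-12):
    """Solve t = lam + (t/P) * sum(d / (d + t)) for t by bisection.

    Assumed fixed-point form: `eigs` are kernel eigenvalues, `lam` the
    explicit ridge (> 0), `P` the number of random features.
    """
    g = lambda t: t - lam - (t / P) * np.sum(eigs / (eigs + t))
    lo, hi = lam, lam + np.sum(eigs) / P + 1.0   # g(lo) <= 0 <= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eigs = np.array([1.0, 0.5, 0.25, 0.125])
lam = 0.1
print(effective_ridge(eigs, lam, P=4))       # inflated above lam
print(effective_ridge(eigs, lam, P=10**6))   # ~ lam: implicit effect vanishes
```

The two calls illustrate the implicit-regularization reading: with few features the effective ridge sits well above the explicit one, and it decays back to λ as P grows.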