guest researcher @ NYU CDS

Hi! I am a Research Fellow (Postdoc) at Flatiron (CCM) and a guest researcher at New York University (CDS). I am currently analyzing exciting models of deep learning that might give insight into representations and feature learning. During my Ph.D. at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, I developed a combinatorial method to quantify the complexity of neural network loss landscapes.

Prior to starting at Flatiron (CCM), I was a Faculty Fellow at New York University (CDS), where I co-instructed the Machine Learning course. I was fortunate to be advised by Clément Hongler and Wulfram Gerstner at EPFL. During my Ph.D., I briefly explored out-of-distribution generalization at Meta AI. I completed a double major in Electrical-Electronics Engineering and Mathematics at Koç University in Istanbul. Before that, I earned two bronze medals at the International Mathematical Olympiad (IMO).

See my Google Scholar page for an up-to-date list of publications.


Deep Linear Networks Dynamics: Low-Rank Biases Induced by Initialization Scale and L2 Regularization

It is important to understand how large models represent knowledge in order to make them efficient and safe. We study a toy model of neural networks that exhibits non-linear dynamics and a phase transition. Although the model is complex, it admits a family of so-called 'copy-average' critical points of the loss. Gradient flow initialized with random weights consistently converges to one such critical point for networks up to a certain width, which we prove to be optimal among all copy-average points. Moreover, we can explain every neuron of a trained network of any width. As the width grows, the network changes its compression strategy and exhibits a phase transition. We close by listing open questions that call for further mathematical analysis and extensions of the model considered here.
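These dynamics can be probed numerically. Below is a minimal sketch, not the paper's exact setup: the dimensions, widths, and sample size are placeholders, and tanh stands in for the activation. It trains a narrow student network on a wider teacher by plain gradient descent and then inspects how each student neuron aligns with the teacher's neurons:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k, n = 3, 4, 2, 2000     # input dim, teacher width, student width, samples

# Teacher: wide two-layer network with unit output weights.
W_t = rng.standard_normal((m, d))
X = rng.standard_normal((n, d))
y = np.tanh(X @ W_t.T).sum(axis=1)          # teacher outputs

# Student: narrower network trained by gradient descent on MSE.
W = 0.1 * rng.standard_normal((k, d))
a = 0.1 * rng.standard_normal(k)
lr = 0.05

def loss_fn(W, a):
    return 0.5 * np.mean((np.tanh(X @ W.T) @ a - y) ** 2)

init_loss = loss_fn(W, a)
for _ in range(3000):
    H = np.tanh(X @ W.T)                    # (n, k) hidden activations
    r = H @ a - y                           # residuals
    grad_a = (H.T @ r) / n
    grad_W = ((r[:, None] * a * (1 - H**2)).T @ X) / n
    a -= lr * grad_a
    W -= lr * grad_W
final_loss = loss_fn(W, a)

# Does each student neuron copy one teacher neuron or average several?
cos = (W @ W_t.T) / (
    np.linalg.norm(W, axis=1, keepdims=True) * np.linalg.norm(W_t, axis=1))
print(init_loss, final_loss)
print(np.round(cos, 2))                     # rows: student, cols: teacher
```

Rows of `cos` close to a single ±1 entry indicate a copied teacher neuron; intermediate values across several columns suggest an averaging neuron.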

We discuss how the effective ridge reveals the implicit regularization effect of finite sampling in random feature models. The derivative of the effective ridge tracks the variance of the optimal predictor, yielding an explanation for the variance explosion at the interpolation threshold for arbitrary datasets.
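The effective ridge can be illustrated numerically. As a hedged sketch (the notation below is an assumption, not taken from the paper): given kernel eigenvalues d_1, …, d_N, P random features, and explicit ridge λ > 0, one common characterization is the fixed point λ̃ = λ + (λ̃/P) Σ_k d_k/(d_k + λ̃), which can be solved by bisection since the residual is negative at λ and positive at λ + Σ_k d_k / P:

```python
import numpy as np

def effective_ridge(eigs, lam, P, tol=1e-12):
    """Solve t = lam + (t/P) * sum(d / (d + t)) for t by bisection.

    Assumed fixed-point form: `eigs` are kernel eigenvalues, `lam` the
    explicit ridge (> 0), `P` the number of random features.
    """
    g = lambda t: t - lam - (t / P) * np.sum(eigs / (eigs + t))
    lo, hi = lam, lam + np.sum(eigs) / P + 1.0   # g(lo) <= 0 <= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eigs = np.array([1.0, 0.5, 0.25, 0.125])
lam = 0.1
print(effective_ridge(eigs, lam, P=4))       # inflated above lam
print(effective_ridge(eigs, lam, P=10**6))   # ~ lam: implicit effect vanishes
```

The two calls illustrate the implicit-regularization reading: with few features the effective ridge sits well above the explicit one, and it decays back to λ as P grows.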