I have recently started as a post-doc with Dr. Kyunghyun Cho and Dr. Krzysztof Geras at New York University. I am also a machine learning lead at molecule.one. I obtained my PhD from Jagiellonian University, supervised by Prof. Jacek Tabor and co-supervised by Prof. Amos Storkey (University of Edinburgh). I also spent two summers as a visiting student with Prof. Yoshua Bengio.

My email is staszek.jastrzebski (on gmail).


  • Our paper on parameter-efficient training of BERT was accepted to ICML 2019!
  • Papers accepted to ICLR 2019 and AISTATS 2019! Also, our preprint on Neural Architecture Search is online.

My main research goal is to understand and improve how deep networks generalize. My research interests include:

  • Optimization in Deep Learning
  • Representation Learning
  • Natural Language Processing
  • Computer Aided Drug Design

Co-supervised students:

  • [BSc] Michał Zmysłowski (UW) - Gradient structure and generalization
  • [MSc] Sławomir Mucha - Towards the ImageNet of virtual screening
  • [MSc] Tobiasz Ciepliński - Evaluating generative models in chemistry using docking simulators
  • You? I am always looking for promising new students.
  • [Defended, MSc] Tomasz Wesołowski - Relevance of enriching word embeddings in modern deep natural language processing
  • [Defended, MSc] Andrii Krutsylo - Physics aware representation for drug discovery
  • [Defended, BSc] Michał Soboszek - Evaluating word embeddings
  • [Defended, MSc] Jakub Chłędowski - Representation learning for textual entailment
  • [Defended, MSc] Mikołaj Sacha - Meta learning and sharpness of the minima

Selected publications

For a full list please see my Google Scholar profile.

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

S. Jastrzębski, Z. Kenton, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Learning Representations 2019
code .pdf poster

Three factors influencing minima in SGD

S. Jastrzębski*, Z. Kenton*, D. Arpit, N. Ballas, A. Fischer, Y. Bengio, A. Storkey

International Conference on Artificial Neural Networks 2018 (oral), International Conference on Learning Representations 2018 (workshop)

Residual Connections Encourage Iterative Inference

S. Jastrzębski*, D. Arpit*, N. Ballas, V. Verma, T. Che, Y. Bengio

International Conference on Learning Representations 2018
.pdf poster

A Closer Look at Memorization in Deep Networks

D. Arpit*, S. Jastrzębski*, N. Ballas*, D. Krueger*, T. Maharaj, E. Bengio, A. Fischer, A. Courville, S. Lacoste-Julien, Y. Bengio

International Conference on Machine Learning 2017
.pdf poster slides

Quo vadis G Protein-Coupled Receptor ligands? A tool for analysis of the emergence of new groups of compounds over time

D. Lesniak, S. Jastrzębski, W. M. Czarnecki, S. Podlewska, A. Bojarski

Bioorganic & Medicinal Chemistry Letters, 2017

Learning to SMILE(S)

S. Jastrzębski, D. Lesniak, W. M. Czarnecki

International Conference on Learning Representations 2016 (workshop track)
.pdf poster

Short CV

Google AI

Research intern, automatic machine learning

11.2018-, Zurich, Switzerland

Research intern supervised by Prof. Yoshua Bengio

Summer 2016 & 2017, Montreal, Canada

Machine Learning intern, fraud detection models and Deep NLP applications

7-9.2016, London, UK

University of Edinburgh

Research intern, Deep Learning for Go, under the supervision of Prof. Amos Storkey

9-11.2015, Edinburgh, UK

SDE intern, distributed data systems

2015, Palo Alto, USA

SDE intern, API research and design

2014, Redmond, USA

SDE intern, data processing framework development

2013, London, UK


Word Embeddings Benchmarks

Python package for evaluating word embeddings.


C++ implementations of ML algorithms for R, including online clustering, a swappable SVM library interface, and a new clustering algorithm.

github url

KrakRobot Simulator

Simulator for the online qualification round of the 2015 and 2016 editions.

github url


Optimization in Deep Learning

Lecture @ GMUM Deep Learning Workshop

2018, Cracow, Poland

Understanding How Deep Networks Learn

Invited talk @ PL in ML

2017, Warsaw, Poland
slides video