Arsenii (Senya) Ashukha


I'm a Research Scientist at Samsung AI Center, and I'm finishing a PhD focused on the applications and understanding of stochastic deep learning models. Prior to that, I received a master's degree from the Moscow Institute of Physics and Technology and the Yandex School of Data Analysis, where I studied machine learning. I earned a bachelor's degree at Bauman Moscow State Technical University with a major in applied math and computer science.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter

profile photo
Research
🍊Variational Dropout Sparsifies Deep Neural Networks
Dmitry Molchanov*, Arsenii Ashukha*, Dmitry Vetrov
ICML, 2017
retrospective⏳ / talk (15 mins) / arXiv / bibtex / code (theano, tf by GoogleAI, colab pytorch)

Variational dropout secretly trains highly sparsified deep neural networks: the sparsity pattern is learned jointly with the weights during training.
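A minimal PyTorch sketch of the idea (illustrative layer and constant names, not the released code): every weight gets its own dropout rate through log_alpha = log sigma^2 - log theta^2, and weights whose log_alpha grows above a threshold are pruned after training.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVDLinear(nn.Module):
    """Linear layer with per-weight variational dropout (sketch)."""
    def __init__(self, in_features, out_features, threshold=3.0):
        super().__init__()
        self.theta = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.threshold = threshold  # prune weights with log_alpha above this

    def log_alpha(self):
        return self.log_sigma2 - torch.log(self.theta ** 2 + 1e-8)

    def forward(self, x):
        if self.training:
            # local reparameterization: sample pre-activations, not weights
            mean = F.linear(x, self.theta, self.bias)
            var = F.linear(x ** 2, torch.exp(self.log_sigma2)) + 1e-8
            return mean + var.sqrt() * torch.randn_like(mean)
        # at test time the network is deterministic and sparse
        mask = (self.log_alpha() < self.threshold).float()
        return F.linear(x, self.theta * mask, self.bias)

    def kl(self):
        # approximation of the negative KL term (Molchanov et al., 2017)
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha()
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()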

🍊Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Arsenii Ashukha*, Alexander Lyzhov*, Dmitry Molchanov*, Dmitry Vetrov
ICLR, 2020
blog post / poster video (5 mins) / code / arXiv / bibtex

The work shows that i) a simple ensemble of independently trained networks performs significantly better than recent techniques; ii) simple test-time augmentation applied to a conventional network outperforms low-parameter ensembles (e.g., Dropout) and also improves all ensembles for free; iii) comparisons of the uncertainty estimation ability of algorithms are often done incorrectly in the literature.
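A minimal sketch of the two baselines in question (the model list and augmentation transforms are placeholders): both boil down to averaging softmax probabilities, either over independently trained networks or over augmented copies of the input.

import torch

def ensemble_predict(models, x):
    """Average predictive distributions of independently trained networks."""
    probs = [torch.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)

def tta_predict(model, x, augmentations, n_repeats=5):
    """Test-time augmentation: average predictions over augmented inputs."""
    probs = [torch.softmax(model(aug(x)), dim=-1)
             for _ in range(n_repeats) for aug in augmentations]
    return torch.stack(probs).mean(dim=0)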

Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation
Dmitry Molchanov*, Alexander Lyzhov*, Yuliya Molchanova*, Arsenii Ashukha*, Dmitry Vetrov
UAI, 2020
code / arXiv / slides / bibtex

We introduce greedy policy search (GPS), a simple but high-performing method for learning a policy of test-time augmentation.
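A rough sketch of the procedure (helper names are illustrative, not the released code): at every step, add to the policy the candidate augmentation that most improves the log-likelihood of the averaged prediction on validation data.

import torch

def greedy_policy_search(model, candidates, x_val, y_val, policy_len=3):
    """Greedily grow a test-time augmentation policy (sketch)."""
    policy, pool_probs = [], []
    for _ in range(policy_len):
        best = None
        for aug in candidates:
            probs = torch.softmax(model(aug(x_val)), dim=-1)
            avg = torch.stack(pool_probs + [probs]).mean(dim=0)
            ll = torch.log(avg[torch.arange(len(y_val)), y_val] + 1e-12).mean()
            if best is None or ll > best[0]:
                best = (ll, aug, probs)
        policy.append(best[1])
        pool_probs.append(best[2])
    return policy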

The Deep Weight Prior
Andrei Atanov*, Arsenii Ashukha*, Kirill Struminsky, Dmitry Vetrov, Max Welling
ICLR, 2019
code / arXiv / bibtex

The deep weight prior is a generative model over kernels of convolutional neural networks that acts as a prior distribution when training on new datasets.
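A simplified sketch of how such a prior could be plugged in (names are my assumptions; the paper uses the prior inside variational inference rather than as a plain MAP-style penalty): a density model fitted on kernels from source tasks penalizes unlikely kernels on the new task.

import torch

def dwp_penalty(conv_weight, kernel_prior):
    """Penalize conv kernels that are unlikely under a learned prior (sketch).

    conv_weight: tensor of shape (out_ch, in_ch, k, k)
    kernel_prior: any density model with a log_prob method over flattened
                  k*k kernel slices, fitted beforehand on source datasets.
    """
    k = conv_weight.shape[-1]
    slices = conv_weight.reshape(-1, k * k)
    return -kernel_prior.log_prob(slices).sum()

# new-task training objective: loss = task_loss + weight * dwp_penalty(...)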

Variance Networks: When Expectation Does Not Meet Your Expectations
Kirill Neklyudov*, Dmitry Molchanov*, Arsenii Ashukha*, Dmitry Vetrov
ICLR, 2019
code / arXiv / bibtex

It is possible to learn a zero-centered Gaussian distribution over the weights of a neural network by learning only variances, and it works surprisingly well.
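A minimal sketch of a variance-only layer (illustrative names): each weight is a zero-mean Gaussian whose variance is the only trainable parameter, and predictions are averaged over stochastic forward passes.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VarianceLinear(nn.Module):
    """Linear layer with zero-mean Gaussian weights; only variances are learned."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -4.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # the mean of the pre-activation is just the bias; the signal
        # is carried entirely by the activation variance
        var = F.linear(x ** 2, torch.exp(self.log_sigma2)) + 1e-8
        return self.bias + var.sqrt() * torch.randn_like(var)

def predict(model, x, n_samples=20):
    """Average class probabilities over stochastic forward passes."""
    probs = [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    return torch.stack(probs).mean(dim=0)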

Uncertainty Estimation via Stochastic Batch Normalization
Andrei Atanov, Arsenii Ashukha, Dmitry Molchanov, Kirill Neklyudov, Dmitry Vetrov
ICLR Workshop, 2018
code / arXiv

Inference-time stochastic batch normalization improves the uncertainty estimation performance of ensembles.
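A rough sketch of one way to realize this in PyTorch (an assumption on my part, not the paper's exact implementation): at test time, batch-norm statistics are resampled from random training batches instead of the fixed running averages, and predictions are averaged over several such samples.

import torch
import torch.nn as nn

def stochastic_bn_predict(model, x_test, train_loader, n_samples=10):
    """Sample BN statistics from training batches at inference time (sketch)."""
    # with momentum=1.0 the running statistics are replaced by those
    # of the most recent batch seen in training mode
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.momentum = 1.0
    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            x_train, _ = next(iter(train_loader))
            model.train()
            model(x_train)       # overwrite running stats with this batch's stats
            model.eval()
            out = model(x_test)  # normalize test inputs with the sampled stats
            probs.append(torch.softmax(out, dim=-1))
    return torch.stack(probs).mean(dim=0)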

Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov
NeurIPS, 2017
code / arXiv / bibtex / poster

The model makes it possible to sparsify a DNN with an arbitrary structured sparsity pattern, e.g., removing whole neurons or convolutional filters.
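A compact sketch of the mechanism (illustrative names, not the released code): each neuron or filter is multiplied by learned log-normal noise, and units whose signal-to-noise ratio falls below a threshold are removed after training.

import torch
import torch.nn as nn

class LogNormalNoise(nn.Module):
    """Multiplicative log-normal noise over neurons or filters (sketch)."""
    def __init__(self, n_units, snr_threshold=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_units))
        self.log_sigma = nn.Parameter(torch.full((n_units,), -3.0))
        self.snr_threshold = snr_threshold

    def snr(self):
        # signal-to-noise ratio of a log-normal variable: E[x] / std[x]
        sigma2 = torch.exp(2 * self.log_sigma)
        return 1.0 / torch.sqrt(torch.expm1(sigma2))

    def forward(self, x):
        if self.training:
            eps = torch.randn_like(self.mu)
            noise = torch.exp(self.mu + torch.exp(self.log_sigma) * eps)
        else:
            # keep only high signal-to-noise units, scaled by the noise mean
            mask = (self.snr() > self.snr_threshold).float()
            noise = mask * torch.exp(self.mu + 0.5 * torch.exp(2 * self.log_sigma))
        shape = [1, -1] + [1] * (x.dim() - 2)  # broadcast over channels
        return x * noise.view(*shape)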

Code

Check out my very short, simple, and fun-to-make implementations of ML algorithms:


Also, check out more solid implementations (at least they can handle ImageNet):
Talks
The State of Ensembling (and Uncertainty), Scientific seminar of @bayesgroup, 2019

Normalizing flows, Deep|Bayes summer school, 2019

How to generate Ensembles, Uncertainty estimation mini-course, 2020

Grammatical errors are sometimes hard to control; the formula should read 0.5(f1(x) + f2(x))

Variational dropout (in Russian), Data Fest, 2019

The webpage template was borrowed from Jon Barron.