Arsenii Ashukha

I'm a Research Scientist at Samsung AI Center. I (almost) received a PhD in Bayesian deep learning @ bayesgroup so I can make big overcomplicated DNNs work :) My research interests are focused on crafting and training reliable and data-efficient machine learning models.

Prior to that, I was part of the Yandex.Research team (Yandex is a Russian search giant) and did ML engineering internships at Yandex and Rambler. I received a master's degree from MIPT and the Yandex School of Data Analysis. I did a bachelor's degree at BMSTU with a major in applied math and computer science.

Email  /  CV  /  Google Scholar  /  GitHub  /  Twitter

Variational Dropout Sparsifies Deep Neural Networks
Dmitry Molchanov*, Arsenii Ashukha*, Dmitry Vetrov
ICML, 2017
retrospective⏳ / talk (15 mins) / arXiv / bibtex / code (theano, tf by GoogleAI, colab pytorch)

Variational dropout secretly trains highly sparsified deep neural networks, with the sparsity pattern learned jointly with the weights during training.

Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Arsenii Ashukha*, Alexander Lyzhov*, Dmitry Molchanov*, Dmitry Vetrov
ICLR, 2020
blog post / poster video (5 mins) / code / arXiv / bibtex

The work shows that i) a simple ensemble of independently trained networks performs significantly better than recent techniques; ii) simple test-time augmentation applied to a conventional network outperforms low-parameter ensembles (e.g., Dropout) and also improves all ensembles for free; iii) comparisons of the uncertainty estimation ability of algorithms are often done incorrectly in the literature.
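
A minimal sketch of the test-time augmentation idea, not the code from the paper: predictions of one model are averaged over several randomly augmented copies of the input. The names `toy_model`, `augment`, and `tta_predict` are hypothetical, and the Gaussian-noise "augmentation" is an assumption made only to keep the example self-contained.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x):
    """Hypothetical stand-in for a trained two-class classifier."""
    return softmax([x[0] - x[1], x[1] - x[0]])


def augment(x, rng):
    """Assumed augmentation: small Gaussian noise on each feature."""
    return [v + rng.gauss(0.0, 0.1) for v in x]


def tta_predict(model, x, n_aug=8, seed=0):
    """Average class probabilities over n_aug augmented copies of x."""
    rng = random.Random(seed)
    probs = [model(augment(x, rng)) for _ in range(n_aug)]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n_aug for k in range(n_classes)]


print(tta_predict(toy_model, [2.0, 0.0]))
```

Averaging the outputs of several independently trained models instead of several augmented inputs gives the plain ensemble from point i).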

Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation
Dmitry Molchanov*, Alexander Lyzhov*, Yuliya Molchanova*, Arsenii Ashukha*, Dmitry Vetrov
UAI, 2020
code / arXiv / slides / bibtex

We introduce greedy policy search (GPS), a simple but high-performing method for learning a policy of test-time augmentation.

Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Andrei Atanov, Alexandra Volokhova, Arsenii Ashukha, Ivan Sosnovik, Dmitry Vetrov
INNF Workshop at ICML, 2019
code / arXiv / bibtex

We employ a semi-conditional normalizing flow architecture that allows efficient training of normalizing flows when only a few labeled data points are available.

Unsupervised Domain Adaptation with Shared Latent Dynamics for Reinforcement Learning
Evgenii Nikishin, Arsenii Ashukha, Dmitry Vetrov
NeurIPS Workshop Track, 2019
code / poster

Domain adaptation via learning shared dynamics in a latent space with adversarial matching of latent states.

The Deep Weight Prior
Andrei Atanov*, Arsenii Ashukha*, Kirill Struminsky, Dmitry Vetrov, Max Welling
ICLR, 2019
code / arXiv / bibtex

The deep weight prior is a generative model for kernels of convolutional neural networks that acts as a prior distribution while training on new datasets.

Variance Networks: When Expectation Does Not Meet Your Expectations
Kirill Neklyudov*, Dmitry Molchanov*, Arsenii Ashukha*, Dmitry Vetrov
ICLR, 2019
code / arXiv / bibtex

It is possible to learn a zero-centered Gaussian distribution over the weights of a neural network by learning only variances, and it works surprisingly well.
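
A minimal sketch of what "learning only variances" means for a single linear layer, not the implementation from the paper: every weight is drawn from a zero-mean Gaussian whose standard deviation is the only parameter. The function name `sample_variance_layer` and the log-sigma parameterization are assumptions for illustration.

```python
import math
import random


def sample_variance_layer(x, log_sigma, rng):
    """One stochastic forward pass of a linear layer with w_ij ~ N(0, sigma_ij^2).

    log_sigma holds one row of log standard deviations per output unit;
    there are no mean parameters at all.
    """
    return [
        sum(math.exp(ls) * rng.gauss(0.0, 1.0) * xi for ls, xi in zip(row, x))
        for row in log_sigma
    ]


rng = random.Random(0)
x = [1.0, 2.0]
log_sigma = [[0.0, 0.0]]  # sigma = 1 for both weights of the single output unit
samples = [sample_variance_layer(x, log_sigma, rng)[0] for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

The output of such a layer has zero mean, so all information about the input is carried by its variance, here sigma_1^2 * x_1^2 + sigma_2^2 * x_2^2 = 5.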

Uncertainty Estimation via Stochastic Batch Normalization
Andrei Atanov, Arsenii Ashukha, Dmitry Molchanov, Kirill Neklyudov, Dmitry Vetrov
ICLR Workshop, 2018
code / arXiv

Inference-time stochastic batch normalization improves the performance of uncertainty estimation of ensembles.

Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov
NeurIPS, 2017
code / arXiv / bibtex / poster

The model allows sparsifying a DNN with an arbitrary pattern of sparsity, e.g., neurons or convolutional filters.


Check out these very short, simple, and fun-to-make implementations of ML algorithms:

Also, check out more solid implementations (at least they can do ImageNet):

The webpage template was borrowed from Jon Barron.