Survival loss: A neuron death regularizer

Date

November 20, 2020

Source

Workshop of Physical Agents (WAF)

Authors

Emilio Almazan
Javier Tovar
Alejandro de la Calle

Abstract

We found that combining the L2 regularizer with Adam kills up to 60% of the filters in a ResNet-110 trained on CIFAR-100, whereas combining L2 with Momentum does not. This has no significant impact on accuracy, as both configurations reach similar values. However, it can become a serious issue if the impaired model is used as a pre-trained model for a more complex dataset (e.g. one with a larger number of categories). This situation arises naturally in continual learning. In this paper we study the impact of dead filters in continual learning, where the dataset grows more difficult over time and more capacity is required from the network. Furthermore, we propose a new regularization term, referred to as the survival loss, which complements L2 to prevent filters from dying when combined with Adam. We show that the survival loss improves accuracy in a simulated continual-learning set-up, with the prospect of larger gains as more training iterations arrive.
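The abstract does not give the exact formulation of the survival loss. As an illustration only, a penalty of this kind could discourage filter norms from collapsing toward zero; the threshold `eps`, the squared-hinge form, and the function name below are assumptions for this sketch, not the paper's definition:

```python
import numpy as np

def survival_loss(filters, eps=1e-2):
    """Hypothetical sketch of a neuron-death regularizer: penalize
    convolutional filters whose L2 norm falls below a small threshold
    `eps`, discouraging them from dying. `filters` has shape
    (num_filters, ...). The paper's actual formulation may differ."""
    # L2 norm of each filter, flattened over its weights
    flat = filters.reshape(filters.shape[0], -1)
    norms = np.sqrt((flat ** 2).sum(axis=1))
    # Squared-hinge penalty: only filters below the threshold contribute,
    # so healthy filters are unaffected while dead ones are pushed back up.
    return float(np.sum(np.maximum(0.0, eps - norms) ** 2))
```

Added to the usual L2 term during training, a penalty like this would leave healthy filters untouched while applying a restoring gradient to filters whose weights have been driven to zero.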