Survival loss: A neuron death regularizer


November 20, 2020


Workshop of Physical Agents (WAF)


Emilio Almazan
Javier Tovar
Alejandro de la Calle


We found that combining the L2 regularizer with Adam kills up to 60% of the filters in a ResNet-110 trained on CIFAR-100, in contrast to combining L2 with Momentum. This has no significant impact on accuracy, however: both configurations reach similar values. The dead filters can nevertheless become a serious issue if the impaired model is used as a pre-trained model for another, more complex dataset (e.g. one with a larger number of categories). This situation arises in continual learning. In this paper we study the impact of dead filters in continual learning when the dataset becomes more difficult over time and more capacity from the network is required. Furthermore, we propose a new regularization term, referred to as survival loss, that complements L2 to prevent filters from dying when it is combined with Adam. We show that the survival loss improves accuracy in a simulated continual learning set-up, with the prospect of larger improvements as more iterations arrive.
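The abstract does not state how a filter is judged to be dead, so the following is only a minimal sketch of how a dead-filter fraction could be measured. It assumes a filter counts as dead when the L2 norm of its weights falls below a small threshold. The function name `dead_filter_fraction`, the `threshold` value, and the toy layer are hypothetical and are not taken from the paper.

```python
import math
import random


def dead_filter_fraction(layers, threshold=1e-3):
    """Return the fraction of filters whose weight L2 norm is below threshold.

    layers: list of layers; each layer is a list of filters; each filter is
    a flat list of weights. The norm-below-threshold criterion is an
    assumption made for illustration, not the paper's definition.
    """
    dead = 0
    total = 0
    for filters in layers:
        for weights in filters:
            norm = math.sqrt(sum(w * w for w in weights))
            if norm < threshold:
                dead += 1
            total += 1
    return dead / total if total else 0.0


# Toy layer: 10 filters of 27 weights each (3 input channels, 3x3 kernel).
random.seed(0)
layer = [[random.gauss(0.0, 1.0) for _ in range(27)] for _ in range(10)]

# Zero out six filters to mimic filters driven to zero during training.
for i in range(6):
    layer[i] = [0.0] * 27

print(dead_filter_fraction([layer]))  # prints 0.6
```

In a real experiment the same count would be taken over the convolutional weights of the trained network, once for the Adam run and once for the Momentum run, so that the two fractions can be compared.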