Geometric Interpretation of a CNN's Last Layer

Date

July 1, 2019

Source

Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA)

Authors

Alejandro de la Calle
Javier Tovar
Emilio Almazan

Abstract

Training Convolutional Neural Networks (CNNs) remains a non-trivial task that in many cases relies on the skills and experience of the person conducting the training. Choosing hyper-parameters, knowing when the training should be interrupted, or even when to stop trying training strategies are some of the difficult decisions that have to be made. These decisions are difficult partly because we still know very little about the internal behaviour of CNNs, especially during training. In this work we conduct methodical experimentation on the MNIST public database of handwritten digits to better understand the evolution of the last layer from a geometric perspective: namely, the classification vectors and the image embedding vectors. Visual inspection of these vectors during training has revealed misalignment issues that would otherwise not have been obvious to detect. We show that constraining the norms of the classifiers during training mitigates these issues and reduces the time to convergence by 40%. Within this context, we present the problem of variability across identically configured trainings caused by the random component of the initialisation method. We propose a novel approach that guides the initialisation of the parameters in the classification layer. This method reduces the variability across repetitions by 12% and leads to accuracies 18% higher on average.
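
The abstract gives no implementation details, so the following is a minimal PyTorch sketch rather than the authors' code. It assumes the norm constraint is enforced by renormalising each classification vector (a row of the last linear layer's weight matrix) to a fixed norm after every optimiser step; SmallCNN, renormalise_classifier and target_norm are illustrative names. The guided initialisation is likewise not described in the abstract, so a seeded orthogonal initialisation is shown only as one plausible way of removing the random component across repetitions.

import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative MNIST CNN whose last layer holds one classification
    vector per digit class (the rows of self.classifier.weight)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes, bias=False)

    def forward(self, x):
        emb = self.features(x).flatten(1)   # image embedding vectors
        return self.classifier(emb)         # logit = <embedding, class vector>

def renormalise_classifier(layer, target_norm=1.0):
    # Constrain the norm of every classification vector to a fixed value,
    # the kind of constraint the abstract reports mitigates misalignment.
    with torch.no_grad():
        norms = layer.weight.norm(dim=1, keepdim=True).clamp_min(1e-12)
        layer.weight.mul_(target_norm / norms)

model = SmallCNN()
# Purely illustrative stand-in for the paper's guided initialisation:
# a seeded orthogonal init makes the classification vectors mutually
# orthogonal and removes run-to-run randomness in this layer.
torch.manual_seed(0)
nn.init.orthogonal_(model.classifier.weight)

optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimiser.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimiser.step()
    renormalise_classifier(model.classifier)  # enforce the constraint each step
    return loss.item()

Calling train_step inside an ordinary DataLoader loop applies the constraint after each update, so the classification vectors keep a fixed norm throughout training while their directions continue to evolve.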