Short review: Semi-Supervised Learning with Ladder Networks

Paper: Semi-Supervised Learning with Ladder Networks, A. Rasmus et al., 2015

It was nice to read a paper from my alma mater, Aalto University. The proposed model does a job similar to a stack of denoising autoencoders, but with a better-structured objective: it applies denoising autoencoding to all network layers simultaneously, so the reconstruction costs of every layer can be optimized jointly. Conventional stacked autoencoders, in contrast, are trained greedily, reconstructing one layer at a time.
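To make the contrast concrete, here is a tiny numeric sketch of the two objectives. All numbers are dummy values I picked for illustration, and the per-layer weights lambda_l are my assumption about how the layer-wise costs are weighted, not figures from the paper:

```python
# Contrast between the two training objectives, with made-up numbers.
import numpy as np

recon_cost = np.array([0.9, 0.4, 0.2])  # per-layer reconstruction costs (dummy)
lam = np.array([1000.0, 1.0, 0.1])      # per-layer weights lambda_l (assumed)
supervised_cost = 0.35                  # cross-entropy on labeled data (dummy)

# Greedy stacked autoencoder: each layer gets its own objective,
# minimized in isolation while the other layers are frozen.
greedy_objectives = [float(c) for c in recon_cost]

# Ladder Network: one joint objective sums the supervised cost and
# the weighted reconstruction costs of *all* layers at once.
joint_objective = supervised_cost + float(lam @ recon_cost)

print(greedy_objectives)  # [0.9, 0.4, 0.2]
print(joint_objective)    # about 900.77
```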

The model aims to improve the training of feedforward networks for classification tasks. A standard feedforward network acts as the encoder of the Ladder Network, and a corrupted copy of it, with Gaussian noise injected into every layer, runs alongside it. For each layer, an assumed form of the latent distribution (e.g. Gaussian) motivates a parametrized denoising function, which the decoder learns and then uses to reconstruct the clean layer activations from the corrupted ones. Each layer and its reconstruction are connected via lateral skip connections, so the higher layers do not have to carry every detail needed for reconstruction. With this combination of a supervised classifier and an unsupervised “stacked autoencoder”, the network can be trained on labeled and unlabeled data at the same time.
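To see how the pieces fit together, here is a minimal NumPy sketch of one forward pass and the resulting unsupervised cost, under my own simplifying assumptions: a small two-weight-layer MLP, a fixed sigmoid-gated form for the denoising function g, hand-picked layer sizes, weights, and noise level, and no batch normalization (which the actual model uses). It shows the structure of the computation, not the paper’s exact parameterization:

```python
# Minimal Ladder Network sketch: clean encoder, corrupted encoder,
# top-down decoder with lateral skip connections, and the combined
# unsupervised reconstruction cost. All hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 500, 10]  # assumed layer sizes for a small MLP
W = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
V = [rng.normal(0, 0.1, (n, m)) for m, n in zip(sizes[:-1], sizes[1:])]  # decoder

def relu(x):
    return np.maximum(x, 0.0)

def encoder(x, noise_std=0.0):
    """Run the encoder, optionally injecting Gaussian noise at every layer."""
    z = x + noise_std * rng.normal(size=x.shape)
    zs = [z]
    for W_l in W:
        z = relu(z @ W_l) + noise_std * rng.normal(size=(z.shape[0], W_l.shape[1]))
        zs.append(z)
    return zs

def g(z_tilde, u):
    """Toy denoising function: a sigmoid-gated combination of the corrupted
    activation z_tilde (lateral input) and the top-down signal u. In the
    paper, the parameters of g are learned; here they are fixed for brevity."""
    gate = 1.0 / (1.0 + np.exp(-u))
    return gate * z_tilde + (1.0 - gate) * u

x = rng.normal(size=(32, 784))     # a batch of (unlabeled) inputs
clean = encoder(x, noise_std=0.0)  # targets z(l)
corrupted = encoder(x, noise_std=0.3)

# Decoder: top-down pass with lateral skips from the corrupted encoder.
u = corrupted[-1]
lambdas = [1000.0, 1.0, 0.1]  # assumed per-layer cost weights
recon_cost = 0.0
for l in range(len(sizes) - 1, -1, -1):
    z_hat = g(corrupted[l], u)
    recon_cost += lambdas[l] * np.mean((z_hat - clean[l]) ** 2)
    if l > 0:
        u = z_hat @ V[l - 1]  # project down to the next layer's size
print("unsupervised reconstruction cost:", float(recon_cost))
```

In the full model, a supervised cross-entropy cost on the corrupted encoder’s output is added for the labeled examples, and the encoder, decoder, and denoising functions are all trained jointly by backpropagation on the combined objective.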

Even though the authors mention the possibility of applying the method to recurrent neural networks, its main intended application is in feedforward networks. The evaluation shows a significantly lower error rate on the MNIST benchmark when only a small number of labeled samples (e.g. 100) is used for training. However, I was surprised to see that the authors used 10 000 labeled samples for validating the model. This somewhat contradicts the premise of semi-supervised learning, even though the number of labeled training samples was technically small. The authors admit that this is not representative of real-life scenarios and explain that they used 10 000 labeled validation samples because it is customary for that benchmark. It would therefore be interesting to see a further evaluation with much less labeled data for validation.