Short review: One-shot Learning with Memory-Augmented Neural Networks

Paper: One-shot Learning with Memory-Augmented Neural Networks, A. Santoro et al., 2016

The topic of memory-augmented neural networks (MANN) is particularly interesting to me, and I am glad to see it attracting growing attention. Back in 2014 I reviewed the “Neural Turing Machines” (NTM) paper by A. Graves et al., which introduced an artificial neural network (ANN) model capable of Turing-complete computation. Today I will review another interesting paper from DeepMind that presents an updated version of the NTM.

NTM is a neural network architecture that can selectively read information from, and write it to, an external memory. However, training such a network to perform particular operations requires many training iterations and a large set of training samples. This limits the applicability of NTM in real-life scenarios where computational cost matters or large datasets are not available. There has been an opinion in the machine learning community that deep learning models are poorly suited for low-shot learning. This view is supported by research findings; for example, [B. Lake et al., 2015] demonstrated an advantage of Bayesian learning over deep learning methods in low-shot tasks. Therefore, adding Bayesian learning to the NTM model could help alleviate the aforementioned problems. It could also bring the training of the model closer to human learning, where only a few samples of a new concept may be enough for inference.

The modified NTM model follows the concept of meta-learning and learns at two levels: rapid and slow. Rapid learning quickly encodes a few samples of new information in the external memory module; at this level the model performs “one-shot” (few-shot) learning. Unfortunately, the authors provide limited information about the algorithm and about the modifications to the NTM controller’s heads, e.g. the alignment mechanisms in the read and write heads. In addition to rapid learning, the model also learns from data at a slow level, using gradient descent to optimize the expected learning cost. For prediction, the network uses information about previous samples from the sequence that was stored in memory earlier during rapid learning. Thus, by processing examples at two levels, the model is trained to better predict the samples of a sequence.
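The slow, gradient-based level can be summarized as minimizing an expected learning cost over a distribution of tasks. The formula below is my paraphrase of the objective as I read it in the paper, not a verbatim copy: θ denotes the model parameters, p(D) a distribution over episodes D, and 𝓛 the loss accumulated over one episode. As I understand the setup, each input x_t is presented together with the label of the previous sample, so the correct label for x_t only arrives one step later and has to be bound to x_t through memory.

```latex
\begin{align*}
\theta^{*} &= \arg\min_{\theta} \; \mathbb{E}_{D \sim p(D)}\big[\mathcal{L}(D;\theta)\big],
  \qquad D = \{(x_t, y_t)\}_{t=1}^{T} \\
\text{inputs} &= (x_1, \varnothing),\ (x_2, y_1),\ \dots,\ (x_T, y_{T-1})
\end{align*}
```

In this reading, rapid learning happens inside a single episode through the memory, while slow learning is the outer minimization over θ.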

Another difference from the vanilla NTM is the addressing mechanism. The new mechanism uses only content-based addressing, while the original one also had location-based addressing. Location-based addressing lets the model iterate through fixed sequences in memory, which makes it less suitable for processing non-sequential data. The new addressing method prioritizes writing to the least recently used memory locations, so the most recent information can be stored without overwriting recently encoded content. Additionally, the model is supposed to update relevant information in memory, even recently written content, when newer information is detected. Therefore, the model can write either to the least recently used location or to the most recently used one.
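To make the read and write rules concrete, here is a minimal pure-Python sketch of least recently used access (LRUA), the name the paper gives to this scheme. It is written from my reading of the paper's equations, not from any released code. Reads use a softmax over cosine similarities between a key and each memory row. Usage weights decay by a factor `gamma` and accumulate the current read and write weights. Write weights interpolate, through a sigmoid gate on a scalar `alpha`, between the previous read weights (the most recently used location) and the previous least-used weights, and the least-used row is zeroed before each write. The function names, the `state` dictionary, the default `gamma=0.95`, the epsilon in the cosine similarity and the write-then-read ordering within a step are my own assumptions for illustration.

```python
import math

# Minimal sketch of least recently used access (LRUA), following my reading of
# Santoro et al. (2016). All names, the state layout and defaults are hypothetical.


def cosine_similarity(a, b, eps=1e-8):
    """Dot product over the product of norms; eps guards against all-zero rows."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norms + eps)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def content_read(memory, key):
    """Content-based read: softmax of cosine similarities, then weighted sum of rows."""
    weights = softmax([cosine_similarity(key, row) for row in memory])
    width = len(memory[0])
    vector = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, vector


def least_used_weights(usage, n_reads=1):
    """1.0 where usage is at or below the n-th smallest usage value, else 0.0."""
    threshold = sorted(usage)[n_reads - 1]
    return [0.0 if u > threshold else 1.0 for u in usage]


def lrua_step(memory, key, state, alpha, gamma=0.95, n_reads=1):
    """One write-then-read step; returns (new_memory, read_vector, new_state)."""
    prev_read, prev_usage, prev_lu = state["read"], state["usage"], state["least_used"]

    # Gate between the last-read (most recently used) and the least-used location.
    g = sigmoid(alpha)
    write_w = [g * r + (1.0 - g) * lu for r, lu in zip(prev_read, prev_lu)]

    # Zero the least-used row (judged by the previous usage) before writing.
    new_memory = [row[:] for row in memory]
    least_used_row = min(range(len(prev_usage)), key=lambda i: prev_usage[i])
    new_memory[least_used_row] = [0.0] * len(new_memory[least_used_row])

    # Additive write of the key, scaled by each location's write weight.
    new_memory = [[m + w * k for m, k in zip(row, key)]
                  for row, w in zip(new_memory, write_w)]

    # Read with the same key, then update usage: decay plus current read/write weights.
    read_w, read_vec = content_read(new_memory, key)
    usage = [gamma * u + r + w for u, r, w in zip(prev_usage, read_w, write_w)]
    new_state = {"read": read_w, "usage": usage,
                 "least_used": least_used_weights(usage, n_reads)}
    return new_memory, read_vec, new_state
```

In a full model, `key` and `alpha` would be emitted by the controller at every step; here they are passed in by hand. A very negative `alpha` sends the write almost entirely to the least-used row, while a large positive one rewrites the row that was just read, which mirrors the two options described above.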

The experiments indicate an advantage of the new model over the previous NTM architecture and LSTM networks on an image classification task. The model also performed adequately on regression tasks compared to the predictions of a Gaussian process model with the ground-truth kernel. Unfortunately, there was no comparison with the Bayesian learning approach of [B. Lake et al., 2015], which seems to be a direct competitor to the proposed model. A minor issue I personally had was the organization of the experiments section, which I found rather confusing. I assume this is specific to version 1 of the preprint and will be fixed in future versions.

To summarize, this paper offers an interesting view on NTM and shows how it can be adapted to use cases with limited data. The description of the new architecture could certainly be more detailed. Releasing the source code would also help readers better understand the model.


  • Santoro, Adam, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. “One-shot learning with memory-augmented neural networks.” arXiv preprint arXiv:1605.06065 (2016).
  • Graves, Alex, Greg Wayne, and Ivo Danihelka. “Neural Turing machines.” arXiv preprint arXiv:1410.5401 (2014).
  • Lake, Brenden M., Ruslan Salakhutdinov, and Joshua B. Tenenbaum. “Human-level concept learning through probabilistic program induction.” Science 350, no. 6266 (2015): 1332-1338.