The paper introduces a model that aims to learn from natural language sentences. The ability to learn from natural language input is an important goal of Machine Learning that is actively pursued by many researchers. Such an ability would expand the reach of supervised learning by removing the need to label large quantities of data samples. Data labeling can require enormous resources, even though some of the work can be automated with libraries such as Stanford CoreNLP, which parses and annotates text. Another example is Google's "Parsey McParseface" parser, which has shown over 97% accuracy in part-of-speech tagging [D. Andor et al., 2016].
The goal of the paper is to create a model that can learn from conversation by receiving natural language feedback. The model is an End-to-End Memory Network (MemN2N) [S. Sukhbaatar et al., 2015], co-authored by J. Weston in 2015 as a further development of the earlier Memory Networks idea [J. Weston et al., 2014]. This architecture has demonstrated high performance in language understanding tasks on the bAbI datasets and shows an advantage over LSTM-based competitors on some tasks. In short, the network has two components: a Memory Module that contains memory vectors, and a Controller Module that accesses these memory vectors through addressing vectors. The Controller updates its internal state with memory vectors read from the Memory Module, forming the next memory state.
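The address-read-update cycle described above can be sketched roughly as follows. This is a minimal NumPy sketch of a single memory "hop" with randomly chosen embeddings and dimensions; the actual model learns the embedding matrices end-to-end and typically stacks several hops:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memn2n_hop(u, memories_in, memories_out):
    """One MemN2N hop: address the memories with the controller state u,
    read a weighted sum of output memory vectors, and update the state."""
    # Addressing: match the controller state against each input memory
    p = softmax(memories_in @ u)      # attention weights over the n memories
    # Reading: weighted sum of the output memory vectors
    o = memories_out.T @ p            # shape (d,)
    # Update: next controller state combines the old state and the reading
    return u + o

d, n = 4, 3                            # embedding dim, number of memories
rng = np.random.default_rng(0)
u0 = rng.normal(size=d)                # initial controller state (query embedding)
m_in = rng.normal(size=(n, d))         # "input" memory embeddings (for addressing)
m_out = rng.normal(size=(n, d))        # "output" memory embeddings (for reading)
u1 = memn2n_hop(u0, m_in, m_out)       # next memory state, shape (4,)
```

Separate input and output embeddings mirror the paper's distinction between the vectors used to compute attention and the vectors actually read into the state.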
The author evaluates multiple learning scenarios with different types of feedback. The feedback used in the experiments had 12 variations: 6 templates for positive and 6 for negative feedback. Although the author does not explain the reason for selecting this particular number of templates, I assume the intent was to provide enough variety to make the dialog more natural; moreover, only a few of the templates are actually listed in the paper. In some experimental scenarios, the natural language feedback was supplemented by an additional reward signal for the agent. Such rewards were used to reinforce the distinction between positive and negative textual feedback. Setting this paper aside for a moment and thinking about intelligent agents in general, there could be many ways to generate such reward signals. When interacting with humans, an agent could analyze non-verbal cues and use them as reward signals for similar systems; even the tone of a human's voice can be a rich source of such information.
In addition to reward-based learning, the model attempts to predict a possible answer based on previous answers. This mode is only briefly described, but the selection of the most likely answer resembles a classifier over the learning agent's answers that augments the MemN2N. The resulting answer therefore uses information from both the learning agent and the expert.
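One way to picture this combination is as a blend of two scorers: the agent's own policy scores and the answer-predictor's scores. The sketch below is a hypothetical illustration of that idea, not the paper's exact formulation; `alpha` and both score vectors are made-up names for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def combined_answer(policy_scores, predictor_scores, alpha=0.5):
    """Hypothetical combination: blend the agent's policy distribution with
    the answer-predictor's distribution and pick the highest-scoring answer."""
    blend = alpha * softmax(policy_scores) + (1 - alpha) * softmax(predictor_scores)
    return int(np.argmax(blend))

# Both scorers favor answer index 1, so the blend does too.
choice = combined_answer(np.array([1.0, 2.0, 0.5]),
                         np.array([0.2, 3.0, 0.1]))
```

Under this reading, information from the expert (via the predictor trained on observed dialog) and from the learning agent (via its policy) both influence the final answer.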
- Weston, Jason E. “Dialog-based language learning.” In Advances in Neural Information Processing Systems, pp. 829-837. 2016.
- Andor, Daniel, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. “Globally normalized transition-based neural networks.” arXiv preprint arXiv:1603.06042 (2016).
- Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. “End-to-end memory networks.” In Advances in neural information processing systems, pp. 2440-2448. 2015.
- Weston, Jason, Sumit Chopra, and Antoine Bordes. “Memory networks.” arXiv preprint arXiv:1410.3916 (2014).