Short review: Infinite Dimensional Word Embeddings

Paper: Infinite Dimensional Word Embeddings, E. Nalisnick, S. Ravi, 2015

The paper describes the Infinite Skip-Gram (iSG) model, which can learn the number of dimensions for word embeddings rather than using a fixed dimensionality. This approach is inspired by the idea of infinite dimensions in the Infinite Restricted Boltzmann Machine (iRBM), introduced earlier the same year by Côté and Larochelle. The infinite size is possible because the layer size is part of the iRBM's energy function. The iSG model may be capable of representing different word meanings with different dimensions, with the number of dimensions growing when necessary.
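To illustrate how making the layer size part of the energy function lets the model place a distribution over dimensionality, here is a toy sketch. The `energy` function and its linear penalty on `z` are hypothetical simplifications for illustration, not the paper's exact formulation; the idea is only that a per-dimension cost makes larger `z` less probable unless the extra dimensions pay off.

```python
import numpy as np

def energy(w_vec, c_vec, z, penalty=1.0):
    # Toy energy over the first z dimensions: negative inner product of the
    # word and context vectors, plus a linear cost on z so that using more
    # dimensions is discouraged (hypothetical simplification of iSG/iRBM).
    return -np.dot(w_vec[:z], c_vec[:z]) + penalty * z

def dimension_posterior(w_vec, c_vec, penalty=1.0):
    # p(z | w, c) ∝ exp(-E(w, c, z)), truncated at the current vector length.
    max_z = len(w_vec)
    scores = np.array([-energy(w_vec, c_vec, z, penalty)
                       for z in range(1, max_z + 1)])
    scores -= scores.max()          # subtract max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

rng = np.random.default_rng(0)
w, c = rng.normal(size=8), rng.normal(size=8)
p_z = dimension_posterior(w, c)
print(p_z)  # distribution over how many dimensions are "active"
```

A vector can effectively "grow" by assigning non-negligible probability to larger `z` when the inner-product gain outweighs the penalty, which is the intuition behind the dimensionality being learned rather than fixed.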

Representing different word meanings in the model requires information about the context, which is stored in context vectors. To utilize these vectors, the model uses the expected inner product as a similarity measure. This function is expected to perform better than cosine similarity, which is used by competing models based on the Skip-Gram (SG) architecture that do not use context vectors. One such competitor is the popular Word2Vec word embedding model.
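The contrast between the two similarity measures can be sketched as follows. The form of `expected_inner_product` below is a hypothetical illustration: it averages inner products truncated at each dimensionality `z` under a distribution `p_z`, whereas the paper derives the expectation from the model's context vectors.

```python
import numpy as np

def cosine_similarity(u, v):
    # Standard cosine similarity, as used by SG-based models like Word2Vec.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def expected_inner_product(u, v, p_z):
    # E_z[ sum_{i<=z} u_i * v_i ]: the inner product truncated at z active
    # dimensions, averaged over a distribution p(z). p_z[k] is the
    # probability that z = k+1 (hypothetical form for illustration).
    partial = np.cumsum(u * v)      # truncated inner products for z = 1..d
    return float(np.dot(p_z, partial))

u = np.array([1.0, 0.5, -0.2, 0.1])
v = np.array([0.8, 0.4, 0.3, -0.5])
p_z = np.array([0.1, 0.2, 0.4, 0.3])  # toy distribution over z = 1..4

print(cosine_similarity(u, v))
print(expected_inner_product(u, v, p_z))
```

Because `p_z` can shift depending on context, the expected inner product lets the same word vector yield different similarities in different contexts, which cosine similarity on a single fixed-length vector cannot do.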

The experiments are rather limited and indicate that the iSG seems to have an advantage over the regular Skip-Gram in modeling word ambiguity. Additionally, the iSG has an advantage in predicting context words. These improvements can be explained by the model's use of context vectors. It is worth noting that the number of dimensions was close for the iSG and SG models (an exact match is impossible because the iSG decides its dimensionality itself), even though the iSG incorporates context vectors. This may indicate higher efficiency in utilizing the available vector space. Unfortunately, the paper does not evaluate the model against competitors on common tasks, e.g. word similarity, but the authors state plans to add more experimental results in the future. Regardless, the idea may be an interesting alternative to word embedding models such as Word2Vec.