Paper: Training recurrent networks online without backtracking, Y. Ollivier and G. Charpiat, 2015
After working with Recurrent Neural Networks (RNNs), I have experienced the problem of training such networks online. A common workaround is to truncate backpropagation through time to a small number of the most recent steps. This paper proposes an algorithm that does not require backpropagation through time at all, which can be extremely expensive for RNNs. The paper “Gradient-based Hyperparameter Optimization through Reversible Learning” by D. Maclaurin et al., published in February 2015, describes a similar idea: applying gradient descent to the learning rate by unrolling the previous optimization steps. If you are interested, you can check my short review of that paper.
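To make the truncation idea concrete, here is a minimal sketch of truncated backpropagation through time on a hypothetical toy scalar RNN (`h_t = tanh(w*h_{t-1} + u*x_t)` with a loss on the final state); the model, names, and loss are illustrative assumptions, not the paper's setup. Only the last `k` steps are backpropagated, treating the earlier hidden state as a constant:

```python
import numpy as np

def truncated_bptt_grad(w, u, xs, h0, k):
    # Forward pass of a toy scalar RNN (hypothetical model for illustration):
    # h_t = tanh(w * h_{t-1} + u * x_t)
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + u * x))
    T = len(xs)
    # Toy loss on the final state only: L = 0.5 * h_T^2.
    dL_dh = hs[-1]
    gw = 0.0
    # Backpropagate through at most the last k steps, treating
    # h_{T-k} as a constant -- this is the truncation.
    for t in range(T, max(T - k, 0), -1):
        d_pre = dL_dh * (1 - hs[t] ** 2)  # tanh'(pre) = 1 - tanh(pre)^2
        gw += d_pre * hs[t - 1]           # gradient w.r.t. w at step t
        dL_dh = d_pre * w                 # propagate to the previous state
    return gw
```

With `k` equal to the sequence length this reduces to full BPTT; smaller `k` caps the cost per update but biases the gradient, which is exactly the trade-off the paper aims to avoid.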
An important part of the algorithm is a gradient approximation based on the so-called “rank-one trick”, in which a sum of outer products is approximated by a single unbiased rank-one outer product. With that approximation it is no longer necessary to maintain the large and expensive Jacobian matrix used in forward accumulation in automatic differentiation. For better performance the authors also use a Kalman filter for online estimation, applying the same matrix-reduction technique to the covariance matrix estimate. Instead of the rank-one reduction in the gradient approximation, the authors suggest the possibility of using higher-rank reductions, which should give a more accurate approximation at a somewhat higher cost per step. The authors estimate the computational cost of the algorithm to be similar to that of running the RNN itself, which is important because other solutions typically have a higher computational cost.
The paper definitely deserves your attention, especially if your area of interest includes RNNs.