So I heard you fancy trying out ExcelNet? Well, in addition to convolutional nets, you might as well run recurrent nets on it!
I went ahead and made a Google Sheets document for running several kinds of recurrent nets. I used it to debug some code I wrote recently, but then thought it might be fun to share. The current version only supports 2-dimensional input, but adding new dimensions should be easy (update the weights and formulas, and you get the idea…).
Since I still might need it every now and then, I made it read-only. But feel free to make a copy and play with it yourself.
Give it a try here and let me know what you think.
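To make the idea concrete, here is a minimal sketch in NumPy of what each time step of spreadsheet formulas computes for a vanilla RNN over 2-dimensional input. The weights, dimensions, and tanh activation are placeholder assumptions; the sheet's actual formulas and values will differ.

```python
import numpy as np

# Minimal vanilla RNN forward pass over 2-dimensional input.
# Weights are random placeholders, not the sheet's actual values.
np.random.seed(0)
input_dim, hidden_dim = 2, 3
W_xh = np.random.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden
b_h = np.zeros(hidden_dim)

def rnn_step(x, h_prev):
    """One recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_dim)
for x in np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    h = rnn_step(x, h)
print(h.shape)  # each step is just matrix multiplies and a tanh
```

Each row of the recurrence is exactly the kind of thing a spreadsheet can express with MMULT-style formulas, one column of cells per hidden unit.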
On this blog, I previously mentioned Google's breakthrough in Machine Translation, in which an "encoder" LSTM-RNN directly "reads" a source sentence and translates it into a fixed-length feature representation, which is then fed to another "decoder" LSTM-RNN that produces the target sentence.
The idea can be applied to images, too. We can simply use a DNN to "read" an image and translate it into a fixed-length feature representation, which is then fed to the "decoder" RNN. The ability of DNNs to encode images into fixed-length feature vectors is almost indisputable, so this approach is promising.
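A toy sketch of that pipeline, assuming a random vector standing in for the CNN's image features and made-up decoder weights (the real papers use learned weights, embeddings, and a softmax over a large vocabulary):

```python
import numpy as np

# Encoder-decoder sketch: a fixed-length image feature vector (random
# here, standing in for CNN output) initializes an RNN decoder that
# greedily emits word ids. All weights and the vocab are placeholders.
np.random.seed(1)
feat_dim, hidden, vocab = 4, 4, 5
image_features = np.random.randn(feat_dim)    # pretend CNN output
W_hh = np.random.randn(hidden, hidden) * 0.1  # recurrent weights
W_emb = np.random.randn(hidden, vocab) * 0.1  # word embeddings (columns)
W_out = np.random.randn(vocab, hidden) * 0.1  # hidden -> vocab scores

h = np.tanh(image_features)  # decoder state initialized from the image
word = 0                     # hypothetical <start> token id
caption = []
for _ in range(3):           # greedy decoding for 3 steps
    h = np.tanh(W_hh @ h + W_emb[:, word])
    word = int(np.argmax(W_out @ h))
    caption.append(word)
print(caption)               # a list of 3 word ids
```

The point is only the shape of the computation: one fixed vector from the encoder, then a recurrence that conditions every emitted word on it.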
Without further ado: it simply works, as shown in a recent paper, which was also featured in the NYTimes.
Update: here is a similar study on video: http://lanl.arxiv.org/pdf/1411.4389.pdf
LSTM-RNNs have been the state of the art in handwriting recognition for quite a long time. Now they have been shown to outperform ReLU DNNs in speech recognition as well, at least on TIMIT. The nice thing about the LSTM in this setting is that it is a much smaller architecture, with only 2 layers. The paper can be accessed at http://arxiv.org/abs/1402.1128
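For readers who haven't seen the unit itself, here is one step of a standard LSTM cell in NumPy. The gate equations are the textbook ones; the weights are random placeholders, and real implementations also add bias terms and often peephole connections, which I omit here.

```python
import numpy as np

# One step of a standard LSTM cell. Gate weights are random
# placeholders; biases and peepholes are omitted for brevity.
np.random.seed(2)
n_in, n_h = 2, 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on [x_t; h_{t-1}] concatenated.
W_i, W_f, W_o, W_g = (np.random.randn(n_h, n_in + n_h) * 0.1
                      for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(W_i @ z)      # input gate
    f = sigmoid(W_f @ z)      # forget gate
    o = sigmoid(W_o @ z)      # output gate
    g = np.tanh(W_g @ z)      # candidate cell update
    c = f * c_prev + i * g    # new cell state
    h = o * np.tanh(c)        # new hidden state
    return h, c

h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(np.array([1.0, -1.0]), h, c)
print(h.shape, c.shape)
```

The additive cell-state update `c = f * c_prev + i * g` is what lets gradients flow over long sequences, which is why such a small 2-layer network can compete with much deeper DNNs.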
In other news, here is some progress on translating long sentences with neural nets: http://arxiv.org/abs/1409.0473