Well, it really is! The reproducible results, which carry over to many different domains, make deep learning really sexy. But it is also exciting because of the whole bunch of new ideas people are working on. I'll put just some of them here, though I believe there are many more:
- Hessian-Free optimization. It is well known that gradient descent can be pesky, yet it dominated the neural network community until James Martens's recent publication, in which he used more advanced second-order optimization techniques (and lots of tricks) to train deep architectures. He even applied it to RNNs, and people were quick to develop a stochastic version of it. Some other dudes were even faster: they implemented it on GPUs and provide deep-learning-as-a-service.
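The core trick in HF is that the Hessian is never formed explicitly: conjugate gradient only needs Hessian-vector products, which can be obtained from the gradient alone. Here is a minimal sketch of that truncated-Newton idea on a toy quadratic, using a finite-difference Hessian-vector product (Martens's actual recipe adds Gauss-Newton products, structural damping, preconditioning and more; all names below are mine):

```python
import numpy as np

def hessian_free_step(loss_grad, w, damping=0.1, cg_iters=10, eps=1e-4):
    """One truncated-Newton (HF) step: approximately solve
    (H + damping*I) d = -g with conjugate gradient, using only
    Hessian-vector products, so H is never built explicitly."""
    g = loss_grad(w)

    def hvp(v):
        # Finite-difference approximation of H @ v from two gradient calls.
        return (loss_grad(w + eps * v) - g) / eps + damping * v

    d = np.zeros_like(w)
    r = -g - hvp(d)            # residual b - A d, with b = -g
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Ap = hvp(p)
        alpha = rs / (p @ Ap)
        d = d + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < 1e-12:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w + d

# Toy demo: minimize the quadratic 0.5 * w^T A w - b^T w.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda w: A @ w - b
w = np.zeros(2)
for _ in range(5):
    w = hessian_free_step(grad, w)
```

On a quadratic this converges to the solution of A w = b in a few outer steps; in a real network the quadratic model only holds locally, which is what the damping term is for.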
- Unsupervised learning (i.e. feature learning). This is indeed a big movement. The breakthrough dates back to Hinton's 2006 paper on the Restricted Boltzmann Machine (RBM) trained with Contrastive Divergence. It is now widely understood that careful weight initialization is critical in deep architectures, and layer-by-layer RBM pre-training allows neural networks to be as deep as needed. This idea was then extended into the Deep Belief Network (DBN) and the Deep Boltzmann Machine (DBM).
Yann LeCun has been working on something called Predictive Sparse Decomposition (PSD), and Andrew Ng is famous for the giant network that automatically detects cat faces (which was featured in the NYT). Ng also seems to be working on combining deep learning and K-means (!)
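To make the RBM training concrete, here is a minimal sketch of CD-1 for a binary RBM in NumPy (the variable names and the toy 6-bit dataset are my own illustration, not from any particular paper): a positive phase driven by the data, one Gibbs step for the negative phase, and the difference of the two correlation statistics as the approximate gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One Contrastive Divergence (CD-1) update for a binary RBM.
    v0: batch of visible vectors, shape (batch, n_visible)."""
    # Positive phase: sample hidden units conditioned on the data.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to visibles, then hiddens again.
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    # Gradient approximation: <v h>_data - <v h>_reconstruction.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h

# Toy demo: learn to reconstruct a single repeated 6-bit pattern.
data = np.tile([1, 0, 1, 1, 0, 0], (32, 1)).astype(float)
n_v, n_h = 6, 4
W = 0.01 * rng.standard_normal((n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)
for _ in range(200):
    W, b_v, b_h = cd1_update(data, W, b_v, b_h)
```

After training, a reconstruction (visible → hidden → visible) of the training pattern should round back to the pattern itself. In the layer-by-layer scheme, the hidden activations of one trained RBM become the "data" for the next one.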
- Hyper-parameter optimization. Training neural networks requires significant expertise to deal with a bunch of hyper-parameters, ranging from the network architecture (number of layers/hidden units…) to the optimization parameters (learning rate, momentum and so on). But we have reached the point where we can let the machine do the boring job of tuning hyper-parameters, using so-called Bayesian optimization. This technique has proven effective on deep architectures.
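To give a flavour of Bayesian optimization, here is a toy 1-D sketch: a Gaussian-process surrogate with an RBF kernel, plus the expected-improvement acquisition function, minimizing a made-up "validation error" curve. The kernel length-scale, the candidate grid and the target function are all illustrative assumptions; a real system (like the ones used for deep networks) is considerably more sophisticated.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf_kernel(a, b, length=0.3):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean/std of a zero-mean GP with RBF kernel at points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf_kernel(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization: expected amount we beat `best` by."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2)))
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (best - mu) * Phi + sigma * phi

# Made-up "validation error" as a function of one hyper-parameter in [0, 1].
f = lambda x: (x - 0.654) ** 2

candidates = np.linspace(0.0, 1.0, 200)
X = np.array([0.1, 0.9])         # two initial (cheap) evaluations
y = f(X)
for _ in range(10):
    mu, sigma = gp_posterior(X, y, candidates)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
best_x = X[np.argmin(y)]
```

The point is the loop: fit the surrogate to all evaluations so far, then spend the next expensive training run where the acquisition says it is most worthwhile, trading off low predicted error against high uncertainty.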
- Dropout. This is a kinda new technique proposed by Hinton. Although his official paper about dropout was rejected, it is really a cool thing. It significantly improves lots of deep architectures, including ConvNNs (ImageNet 2012), feed-forward networks and deep belief networks. People, again, were quick to show that dropout is also effective on deep architectures trained with the HF optimizer.
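The mechanism itself is tiny: during training, each unit's activation is zeroed with some probability. The sketch below uses the "inverted" variant, which rescales the surviving units at training time so that nothing has to change at test time (Hinton's original formulation instead keeps activations unscaled and halves the weights at test time; the two are equivalent in expectation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, train=True):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale survivors by 1/(1-p_drop); identity at test time."""
    if not train:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask

h = np.ones((4, 10))               # pretend hidden-layer activations
h_train = dropout(h, 0.5, train=True)    # units are either 0.0 or 2.0
h_test = dropout(h, 0.5, train=False)    # unchanged
```

Each training case thus sees a different thinned network, which is what gives dropout its model-averaging flavour.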
- Scalability. Well, we have deepnet, which provides easy-to-use GPU implementations of ConvNNs, feed-forward nets, RBMs, DBMs and DBNs.
Andrew Ng and the dudes at Google also worked on something called Downpour SGD, an implementation of gradient descent running on 1,000 servers (each with 16 cores). Yeah, I know, it is Google style (and the funny name comes from the idea of 16,000 CPU cores all running gradient descent. Downpour SGD, quite legit). Some other dudes are planning to provide deep-learning-as-a-service. This is really a fancy thing to have, since not everyone has enough computational power for training deep models. I will keep an eye on this.
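The Downpour idea is easy to caricature: a central parameter server holds the weights, and many workers independently fetch (possibly stale) parameters, compute a gradient on their own data shard, and push the update back with no global synchronization. Here is a toy simulation with threads standing in for the 1,000 machines (the class and function names are my own, not Google's):

```python
import threading
import numpy as np

class ParameterServer:
    """Toy parameter server: workers fetch parameters and push gradients
    asynchronously, in the spirit of Downpour SGD."""
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad

def worker(ps, shard, steps=50):
    X, y = shard
    for _ in range(steps):
        w = ps.fetch()                          # may already be stale
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        ps.push(grad)                           # applied without any barrier

# Toy demo: fit y = X @ w_true with 4 asynchronous "machines".
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((400, 3))
y = X @ w_true
ps = ParameterServer(dim=3)
shards = [(X[i::4], y[i::4]) for i in range(4)]
threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
```

Despite every worker acting on slightly stale parameters, the shared weights still converge on this easy problem; tolerating exactly that kind of staleness at a much larger scale is the whole point of the approach.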
- A new conference has been founded! The International Conference on Learning Representations (ICLR), organized by Yoshua Bengio and Yann LeCun, might become a new source of inspiration for deep learning. They have a good publication model that should further speed up the development of the field.
As a side note, the ImageNet 2013 competition has been launched! This time it comes with a new object detection task, similar in style to PASCAL VOC, in which competitors have to precisely detect the location (bounding box) of the objects of interest in each image. It will be interesting to see how people in the deep learning community adapt their networks to this challenge, if that is applicable at all!
Finally, there is a critical comment on the deep learning movement, made by a psychology professor. But it really says nothing new. We can still expect some (significant) advances in this field over, probably, the next 10 years.