# NIPS 2016

So, NIPS 2016: the record-breaking NIPS with more than 6000 attendees, the massive recruiting event, the densest collection of great men with huge egos, whatever you want to call it.

I gotta write about this. Several hundred other people will probably also write something about NIPS, so I will start with something personal before going into the usual march through papers and ideas, you know…

One of the cool things about this NIPS is that I got to listen directly to the very men who taught me so many things over the last several years. Hearing Nando de Freitas talk on stage, I could easily recall his voice; the way he pronounces thee-ta ($\theta$) was so familiar. Listening to Rajesh Rao, I couldn’t help recalling the joke with the adventurer hat he once wore to “moisturise” his Neuroscience lectures. Sorry professor, nice try, but the joke didn’t quite work.

And of course, Yoshua Bengio with his usual hard-to-impress style (although he hasn’t changed much since the last time we talked). Also Alex Graves, whose work has wowed me so many times.

One of the highlights was Jurgen Schmidhuber who, in his deep, machine-generated voice, told a deep joke. It goes like this:

Three men were sentenced to death for inventing technology that caused mass unemployment in certain industries. They were a French guy named LeCun, a British guy named Hinton, and a German guy named Schmidhuber.

Before the execution, Death asked them: “Any last words?”
– The French guy said: “Je veux…” (“I want…”; the rest was in French, I couldn’t catch it)
– The German guy: I want to give a final speech about the history of Deep Learning!
– The British guy: Please shoot me before Schmidhuber gives his goddamn speech!

As some of my friends put it: if he can make a joke about himself, he is probably still mentally healthy (pun intended).

But the best part about NIPS is that I got to meet some of my long-time friends. Some of them are from France; they gave me my first hands-on experience with Deep Learning and taught me so many things that I still remember today. They are probably the most important mentors of my career. There were also other friends from other parts of the world.

The not-so-good part about NIPS is that most of the important papers were already available on arXiv several months before, and people come to the conference presenting their even newer results. But I guess this is a good problem to have.

That’s quite enough bullshitting. Here come the ideas I think will be important in the next several years of Deep Learning. I won’t do a march through the papers; you can do that yourself.

• End-to-end differentiable models: There was a symposium at NIPS dedicated to discussing the next frontiers of Recurrent nets. I am most fascinated by the Differentiable Neural Computer (DNC) by Alex Graves. As usual, the DNC doesn’t seem to have any practical use case yet, but people said the same about his Neural Turing Machines some years ago, and now Memory Networks, a very similar model to Neural Turing Machines, are used in most of the work on Question Answering, and maybe elsewhere.
Taking a step back, I am not even sure Backprop is the right algorithm (more on this later), so the idea of making everything differentiable, although it seems logical, might eventually hit some unseen blockers, and is most certainly not the Holy Grail of DL yet. Alex Graves acknowledges this too, but he said maybe we should push this idea to the limit and see what happens, with which I kinda agree, especially since we don’t really have anything more efficient than Backprop.
Also from Alex Graves is the somewhat unnoticed work on Adaptive Computation Time, which I find particularly interesting.
RNNs are also used heavily in other lines of work, like Metalearning (more on this later).
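For a taste of what “differentiable memory” means here, the content-based addressing at the core of the Neural Turing Machine (and, in extended form, the DNC) can be sketched roughly as a softmax over similarities between an emitted key $\mathbf{k}_t$ and the memory rows $\mathbf{M}_t(i)$:

$$
w_t(i) = \frac{\exp\big(\beta_t \, K[\mathbf{k}_t, \mathbf{M}_t(i)]\big)}{\sum_j \exp\big(\beta_t \, K[\mathbf{k}_t, \mathbf{M}_t(j)]\big)},
\qquad
K[\mathbf{u}, \mathbf{v}] = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert\mathbf{u}\rVert \, \lVert\mathbf{v}\rVert}
$$

The read vector is then the weighted average $\sum_i w_t(i)\,\mathbf{M}_t(i)$. Every step is smooth, which is exactly what lets Backprop train how the controller uses its memory.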
• Deep RL: I saw some lectures covering the original concepts of RL, without all the Deep Learning hype. People borrow ideas from RL for many DL problems, and I think this trend will last for some time.
• GANs: everybody is talking about GANs, all kinds of variations of GANs were presented, the GAN workshop was super crowded. The idea of doing generative modelling without MCMC is a fascinating one, especially when the results of GANs are so realistic.
Personally, I feel the way GANs were formulated (seemingly inspired by Adversarial examples) is somewhat misleading. I feel it has a stronger connection to the Actor-Critic model in RL than to Adversarial examples. Anyway, in academia, reformulating your problem is not a bad idea, and often a good skill to have. GANs look promising if you are shooting for a paper with applications.
Along the line of generative models, PixelCNN and WaveNet also seem interesting.
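For reference, the original minimax objective from Goodfellow et al. makes the adversarial setup concrete: the discriminator $D$ and the generator $G$ play

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

Seen this way, the Actor-Critic analogy feels quite natural: $D$ acts like a learned critic whose gradients are the only training signal $G$ ever receives.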
• Metalearning: or so-called Learning-to-learn, notably the work by Nando de Freitas, is not really new, but I think it opens a new horizon, although whether it lives up to expectations remains questionable.
• Yoshua Bengio’s works. As usual, works by Yoshua Bengio belong to their own kind. Besides the GSN (which I think still has some original ideas), TargetProp also seems promising. It was funny when he finished his presentation on TargetProp at NIPS and asked: “Questions, please?”, and the crowd was silent for five seconds, until he said: “It took me a lot of time to absorb what I’ve just presented, so I understand”.
I like the way those works try to step beyond Backprop (while many people, apparently including Geoff Hinton, believe the brain does something similar to Backprop), and I will be interested to see the follow-ups to this line of work.
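Roughly, in the difference target propagation variant (Lee et al.), each layer $l$ computes $h_l = f_l(h_{l-1})$ and learns an approximate inverse $g_l$; targets are then propagated downward as

$$
\hat{h}_{l-1} = h_{l-1} + g_l(\hat{h}_l) - g_l(h_l),
$$

and each layer is trained locally to minimize $\lVert f_l(h_{l-1}) - \hat{h}_l \rVert^2$, with no global backward pass of gradients through the whole network.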
• Machine Learning and Neuroscience: This is what NIPS is all about. I was amazed when I saw Rajesh Rao talking about POMDPs, and it is not even new; they have been working on it for several years already. On the other side, we have work from Yoshua Bengio entitled “Toward Biologically Plausible DL”. At the “Brain and Bits” workshop, Yoshua Bengio also said he would like to see more work in Neuroscience that helps us understand how the brain learns. But of course that is difficult, and it may take many years before any significant breakthrough. People are still debating what it even means to “understand how the brain works”, so I don’t see this happening in the near future, but the future might hold big surprises.
On this line of thought, I think there are two major questions. First, does the brain do backprop? On this I am quite hesitant. Intuitively, we wouldn’t believe that the brain does backprop, but hey, it is the best thing we have so far.
Second, is there a single objective function that the brain optimizes? At the highest level of abstraction, maybe yes (we try our best to survive, don’t we?), but at any level we can actually do something about, I think the answer is almost certainly no. And this is a problem, because the way we train our models is always by optimizing some objective function (which, by the way, is the best approach we know so far).
Anyway, I think I need to learn at least 10 more years to have an informed opinion on this.

And then you also have a bunch of young people at NIPS, which makes it less boring. For an academic conference, 6000 attendees is certainly an amazing number.