This is the slides I used for a talk I did recently in our reading group. The slides, particularly the Attention part, was based on one of Quoc Le’s talks on the same topic. I couldn’t come up with any better visual than what he did.
It has been quite a while since the last time I look at this topic, unfortunately I never managed to fully anticipate its beauty. Seq2Seq is one of those simple-ideas-that-actually-work in Deep Learning, which opened up a whole lot of possibilities and enabled many interesting work in the field.
A friend of mine did Variational Inference for his PhD, and once he said Variational Inference is one of those mathematically-beautiful-but-don’t-work things in Machine Learning.
Indeed, there are stuff like Variational, Bayesian inference, Sum-Product Nets etc… that come with beautiful mathematical frameworks, but don’t really work at scale, and stuff like Convolutional nets, GANs, etc.. that are a bit slippery in their mathematical foundation, often empirically discovered, but work really well in practice.
So even though many people might not really like the idea of GANs, for example, but given this “empirical tradition” in Deep Learning literature, probably they are here to stay.