Today I learned how to explain weight sharing to a Deep Learning noob.
Going through the basic, fundamental materials over and over again often reveals interesting insights you might otherwise never notice. This is one such example.
The whole point of many pattern recognition problems is recognizing invariants: the same object appearing at different locations in an image (shift invariance), the same word appearing at different positions in many sentences but carrying the same meaning, and so on. You want your model to learn those invariants efficiently, without having to add more capacity to it.
In the Deep Learning way, it is achieved by weight sharing:
- In Convolutional nets, weight sharing is achieved by the convolutional filters.
- In text processing, weight sharing manifests itself as word/character embeddings.
- In sequential data, weight sharing between the consecutive steps in a sequence is what we call recurrent neural nets.
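The convolutional case is the easiest to see concretely. As a minimal sketch (using an arbitrary hand-picked filter, not a learned one), a 1D convolution applies the same small set of weights at every position of the input, so a pattern triggers the same response wherever it appears:

```python
import numpy as np

# The SAME three weights are reused at every input position --
# this reuse is the weight sharing. The filter values here are
# arbitrary, chosen only for illustration (a simple edge detector).
filt = np.array([1.0, 0.0, -1.0])

def conv1d(x, w):
    """Slide one shared filter w across input x (valid correlation)."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

# A step up followed by a step down in the input signal.
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
y = conv1d(x, filt)
print(y)  # [-1. -1.  0.  1.  1.]
```

Only three parameters cover every position of the input, instead of one weight per position, and the rising and falling edges produce the same (mirrored) response wherever they occur.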
Weight sharing doesn’t only make the model more parameter-efficient; it also makes learning somewhat easier, because the model can reuse what it has learned in different contexts.
Now, this might sound obvious to many of us, but it probably took the community quite a bit of time to arrive at the current understanding.
People take for granted so many amazing ideas in Deep Learning nowadays, which by itself is pretty amazing.