On weight sharing in Deep Nets

Today I learned how to explain weight sharing to a Deep Learning noob.

Going through the basic, fundamental materials over and over again often reveals interesting insights that you might otherwise never notice. This is such an example.

Many pattern recognition problems are, at their core, about recognizing invariants: the same object appearing at different locations in an image (so-called shift invariance), the same word appearing at different positions in many sentences while bearing the same meaning, and so on. You want your model to learn those invariants efficiently, without having to add more capacity to it.

In the Deep Learning way, it is achieved by weight sharing:

  • In Convolutional nets, weight sharing is achieved by the convolutional filters: the same filter weights are applied at every spatial location.
  • In text processing, weight sharing manifests itself as word/character embeddings: the same embedding vector represents a word wherever it occurs.
  • In sequential data, weight sharing across consecutive steps of a sequence is what we call recurrent neural nets.
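The first bullet can be made concrete with a tiny sketch: a 1-D convolution where the same three filter weights are reused at every position of the input, so the parameter count stays fixed no matter how long the input gets (a fully connected layer, by contrast, would need a weight per input-output pair).

```python
# A minimal sketch of weight sharing via 1-D convolution.
# The same 3 filter weights in w are reused at every input position,
# so the parameter count is independent of the input length.

def conv1d(x, w):
    """Slide the shared weights w over x (no padding, stride 1)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.5, 0.0, -0.5]   # 3 shared parameters, regardless of len(x)

print(conv1d(x, w))    # → [-1.0, -1.0, -1.0]
```

Here the filter happens to be a simple difference detector, and it fires identically wherever the same local pattern appears: exactly the shift invariance described above.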

Weight sharing not only makes the model more parameter-efficient, but also makes learning somewhat easier, because the model can reuse what it has learned in different contexts.
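The recurrent case from the list above can be sketched the same way. Below is a minimal scalar Elman-style recurrence (the names w, u, b are just illustrative): a single shared set of weights is applied at every time step, so the parameter count does not grow with sequence length.

```python
import math

# A minimal RNN sketch illustrating weight sharing across time:
# the same parameters (w, u, b) are applied at every step.

def rnn(xs, w, u, b, h0=0.0):
    """Scalar recurrence: h_t = tanh(w * x_t + u * h_{t-1} + b)."""
    h = h0
    hs = []
    for x in xs:  # one shared set of weights, however long the sequence
        h = math.tanh(w * x + u * h + b)
        hs.append(h)
    return hs

# 3 parameters total, whether the sequence has 3 steps or 3 million.
states = rnn([0.5, -1.0, 2.0], w=1.0, u=0.5, b=0.0)
print(len(states))  # → 3
```

Because the same weights process every step, whatever the model learns about local patterns early in a sequence is automatically reused later in it.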

Now, this might sound obvious to many of us, but it probably took the community quite a bit of time to arrive at the current understanding.

People take for granted so many amazing ideas in Deep Learning nowadays, which by itself is pretty amazing.

