A neural net telling the story behind every photo

Computer thinks: 'I'm sure it's got a mind of its own!'

On this blog, I previously mentioned Google’s breakthrough in Machine Translation, in which an “encoder” LSTM-RNN is used to directly “read” a source sentence and translate it into a fixed-length feature representation; those features are then fed into a “decoder” LSTM-RNN that produces the target sentence.

The idea can be applied to images, too. We can simply use a DNN to “read” an image and translate it into a fixed-length feature representation, which is then fed into the “decoder” RNN. The ability of DNNs to encode images into fixed-length feature vectors is well established, so this approach is promising.
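To make that encoder-decoder picture concrete, here is a minimal PyTorch sketch, my own illustration rather than the paper's code: a tiny CNN squeezes the image into a single fixed-length vector, which is fed to an LSTM decoder as its first input, ahead of the caption words. The layer sizes and the toy CNN are assumptions purely for demonstration; the paper used a large pretrained CNN as the encoder.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder captioning model.
# Assumed for illustration: PyTorch, a toy CNN, and arbitrary layer sizes.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # "Encoder": maps an image to one fixed-length feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # "Decoder": an LSTM that emits the caption one word at a time,
        # conditioned on the image vector given as its first input.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) word indices.
        feats = self.encoder(images).unsqueeze(1)   # (B, 1, embed_dim)
        words = self.embed(captions)                # (B, T, embed_dim)
        inputs = torch.cat([feats, words], dim=1)   # image vector comes first
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                     # (B, T+1, vocab_size)

model = CaptionModel(vocab_size=1000)
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 6, 1000])
```

Training would then minimize cross-entropy between these word logits and the reference captions, exactly as in the translation setting, with the image taking the place of the source sentence.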

And indeed it works, as shown in a recent paper, which was also featured in the NYTimes.

Update: here is a similar study on video: http://lanl.arxiv.org/pdf/1411.4389.pdf
