This is about an interesting lecture on Recommender Systems at LinkedIn. It turns out that they cast the recommendation problem as a binary classification problem, and they actually use logistic regression as the classifier. To recommend a job to a user, they use features computed from the job description, the user profile, the company profile, and the user's connections. All those features are the input to logistic regression. The output probability given by logistic regression is treated as a confidence measure and used for ranking. Pretty neat, huh.
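The ranking idea can be sketched in a few lines of scikit-learn. This is only an illustration of "classify, then rank by predicted probability" — the features, labels, and data below are all made up, not LinkedIn's actual ones.

```python
# Sketch: rank candidate jobs for a user by the probability output
# of a logistic regression classifier. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Each row: hypothetical features for one (user, job) pair, e.g.
# title similarity, skill overlap, connections at the company.
X_train = rng.random((200, 4))
# Label: 1 if the user engaged with the job posting, 0 otherwise
# (generated from a noisy linear rule just to have something to fit).
y_train = (X_train @ np.array([2.0, 1.0, 0.5, -1.0])
           + rng.normal(0, 0.3, 200) > 1.2).astype(int)

clf = LogisticRegression().fit(X_train, y_train)

# Score candidate jobs for one user; the class-1 probability is the
# confidence measure, and sorting by it gives the recommendation order.
X_candidates = rng.random((10, 4))
scores = clf.predict_proba(X_candidates)[:, 1]
ranking = np.argsort(-scores)  # indices of jobs, best first
```

The appeal of this setup is that the model stays simple and interpretable; all the sophistication lives in the features.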
The rest of their effort is mostly feature engineering. Since the text is unconstrained, they have to deal with ambiguous text and entity resolution (IBM might stand for International Business Machines or the International Brotherhood of Magicians). They also use classical techniques like inverted indices to index the companies and the tf-idf measure for matching.
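A toy version of the tf-idf matching step might look like the following. The company descriptions and user profile are invented for illustration; the point is just that tf-idf vectors plus cosine similarity can separate the two "IBM"s given enough context.

```python
# Sketch: match a user profile against company descriptions with
# tf-idf and cosine similarity. All documents are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

companies = [
    "International Business Machines mainframe cloud software",
    "International Brotherhood of Magicians stage magic community",
    "Professional social network jobs recruiting platform",
]
user_profile = ["software engineer cloud mainframe"]

vec = TfidfVectorizer()
company_vecs = vec.fit_transform(companies)   # index the companies
user_vec = vec.transform(user_profile)        # project the query

sims = cosine_similarity(user_vec, company_vecs)[0]
best = int(sims.argmax())  # best-matching company index
```

Under the hood, `TfidfVectorizer` builds exactly the kind of term-to-document mapping an inverted index provides, which is why the same machinery scales to millions of documents.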
Scalability is also a big issue. They had more than 2 million companies (as of 2012), so matching every user against every company is infeasible. They handle this by exploiting the user's connection network: companies at first-order connection from the user are considered first. Of course this might significantly narrow the range of companies introduced to the user, but I guess there is no better way. Nonetheless, for a normal LinkedIn user, the moral is that you should connect to as many companies as possible, so you can be kept up to date about their job ads on LinkedIn. Of course, this is where LinkedIn's business comes into the game. If a company wants to reach more candidates than its own first-order connections, it must pay LinkedIn. But that is another story.
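The candidate-generation trick can be sketched as a simple graph lookup: instead of scoring all 2 million companies, only score companies reachable through the user's direct connections. The graph below is made up, and the real system surely does much more, but it shows the pruning step.

```python
# Sketch: restrict candidate companies to those where the user's
# first-order connections work. All names here are hypothetical.
connections = {
    "alice": ["bob", "carol"],  # alice's first-order connections
}
employer = {
    "bob": "AcmeCorp",
    "carol": "Initech",
    "dave": "Globex",           # dave is NOT connected to alice
}

def candidate_companies(user):
    """Companies employing at least one direct connection of `user`."""
    return {employer[c] for c in connections.get(user, [])
            if c in employer}

candidates = candidate_companies("alice")
# Only these candidates would then be scored by the classifier;
# Globex is never considered, since alice has no connection there.
```

This turns an all-pairs matching problem into one whose cost scales with the size of a user's network, at the price of never surfacing companies outside it.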
For feature engineering, they use quite a lot of sophisticated models for text processing, entity extraction, and so on. They even mentioned CRFs for doing that. This once again confirms that 90% of ML effort in industry is dedicated to feature engineering. When you have a good set of features, a simple model like logistic regression will do the job.
Some other nice topics were also briefly covered at the end of the presentation. Check it out!