Using NLP to Create a Recommender System

In the previous article, "Using Scikit-Surprise to Create a Simple Recipe Collaborative Filtering Recommender System", we built a simple recommender system with the scikit-surprise package and saw how to use its built-in algorithms, such as KNN and SVD.

I’d like to take my recommender systems practice a step further and write my own prediction algorithm. Surprise lets you subclass its core classes and override their methods, so you can tailor an algorithm that improves the recommender system’s results, or at the very least brings them closer to what you want from it. It’s important to remember that recommender systems aren’t only about accuracy; they’re also about knowing which recommendations you want to make to your clients, and that can differ from one company to the next.

The only really meaningful measure of a recommender system is how users react to its recommendations in testing. So in this post, I’ll focus on building my own recommender system that recommends recipes similar in content to the ones a user has rated previously (a content-based recommender system).

We’ll use the content of the recipe collection to determine how similar two recipes are. Similarity can be measured in many ways, but I’d like to use some NLP methods here, so we’ll base our algorithm on the similarity of the recipe text, which includes the title, steps, and description.

The first step is to tokenize the recipe text and lemmatize the words with WordNet. We’ll then use TfidfVectorizer to turn the lemmatized text into vectors and calculate the cosine similarity between recipes. Finally, we’ll write a custom Surprise algorithm that finds the recipes most similar to a given one and bases its recommendations on them.
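As a rough preview of the text-similarity step, the sketch below vectorizes three invented recipe texts with scikit-learn’s TfidfVectorizer and compares them with cosine_similarity. The recipe names, the texts, and the helper most_similar are assumptions made for illustration. WordNet lemmatization is left out so the example runs without downloading NLTK data; a lemmatizing function could be passed through TfidfVectorizer’s tokenizer parameter.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented recipe texts standing in for title + steps + description.
recipes = {
    "tomato soup": "Simmer tomatoes, onion, garlic. Blend until smooth.",
    "tomato pasta": "Boil pasta. Toss in a sauce of tomatoes, garlic, basil.",
    "chocolate cake": "Mix flour, cocoa, sugar, eggs. Bake forty minutes.",
}
names = list(recipes)

# One TF-IDF vector per recipe; common English words are dropped.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(list(recipes.values()))

# similarity[i][j] is the cosine similarity between recipes i and j.
similarity = cosine_similarity(tfidf)


def most_similar(name, k=1):
    """Return the k recipes most similar to `name`, as (name, score) pairs."""
    idx = names.index(name)
    others = [(names[j], similarity[idx][j]) for j in range(len(names)) if j != idx]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return others[:k]


print(most_similar("tomato soup"))
```

Here the soup and the pasta share the terms "tomatoes" and "garlic", so they score as more similar to each other than either does to the cake, which shares no terms with them.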

The first two sections (data loading and preparation) are identical to those described in our prior post. The new content is in the model creation section.