Building a Recommender System From Scratch with Matrix Factorization in Python


Introduction

In this article, we will build a movie recommender system in Python, step by step, based on matrix factorization. Recommender systems suggest products, services, or content to users based on their preferences and past interactions. Among the many approaches to building them, matrix factorization stands out as a powerful technique for collaborative filtering: it efficiently captures hidden patterns in user-item interactions from large-scale user and item databases.

Concretely, the tutorial will use a Python library called surprise, which contains handy implementations of matrix factorization algorithms for building recommender systems. We will also use the MovieLens 100K dataset: a popular dataset for movie recommendations, ideal for getting familiar with recommender systems from a practical standpoint.

Note: Some basic familiarity with recommender system concepts and fundamentals is recommended before beginning this tutorial.

Step-by-Step Process

The first step is to import the necessary libraries and packages. You may need to install the surprise library manually (it is published on PyPI as scikit-surprise) before you can import it.
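As a quick sanity check before importing anything, the following sketch reports which packages still need installing. The package set is an assumption based on the libraries this tutorial mentions (surprise, Pandas) plus NumPy; only the standard library is used, so it runs even when nothing else is installed.

```python
import importlib.util

# Importable module name -> name to pass to pip.
# The exact set is an assumption; adjust it to what your code imports.
REQUIRED = {"surprise": "scikit-surprise", "pandas": "pandas", "numpy": "numpy"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All packages are available.")
```

Once everything is installed, the imports used in the rest of this walkthrough would typically be `import pandas as pd` and `from surprise import Dataset, Reader, SVD, accuracy`.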

We will begin our coding by defining a function to load the MovieLens 100K dataset from the dataset's official website. The process entails unpacking the downloaded .zip file.
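A minimal sketch of such a loading function is shown below, using only the standard library. The download URL and the function names are assumptions (the text does not give them). The archive layout, a tab-separated `ml-100k/u.data` file with user ID, item ID, rating, and timestamp columns, follows the dataset's documented format.

```python
import csv
import io
import urllib.request
import zipfile

# Assumed download location of the MovieLens 100K archive (not stated in the text).
ML_100K_URL = "https://files.grouplens.org/datasets/movielens/ml-100k.zip"


def parse_ratings(zip_bytes, member="ml-100k/u.data"):
    """Read (user_id, item_id, rating, timestamp) tuples from the zipped archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t")
            # Skip empty rows, convert each field to a numeric type.
            return [
                (int(row[0]), int(row[1]), float(row[2]), int(row[3]))
                for row in reader
                if row
            ]


def load_movielens_100k(url=ML_100K_URL):
    """Download the archive into memory and return its ratings."""
    with urllib.request.urlopen(url) as response:
        return parse_ratings(response.read())
```

Keeping the parsing separate from the download makes the parser easy to check against a small hand-made archive. The returned rows can be wrapped with `pd.DataFrame(rows, columns=["user_id", "item_id", "rating", "timestamp"])` when a DataFrame is preferred.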

Next, we proceed to load the data by calling the newly defined function, putting the data into a Pandas DataFrame, and obtaining some basic information about it.
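The kind of basic information meant here can be sketched in plain Python as follows. The function name and the chosen statistics are illustrative assumptions. With a DataFrame, `df.nunique()` and `df.describe()` would give a similar picture.

```python
def summarize_ratings(rows):
    """Summarize an iterable of (user_id, item_id, rating, ...) tuples."""
    rows = list(rows)
    users = {row[0] for row in rows}
    items = {row[1] for row in rows}
    ratings = [row[2] for row in rows]
    n_ratings = len(ratings)
    return {
        "n_ratings": n_ratings,
        "n_users": len(users),
        "n_items": len(items),
        "min_rating": min(ratings),
        "max_rating": max(ratings),
        "mean_rating": sum(ratings) / n_ratings,
        # Fraction of the user-item matrix that has no rating.
        "sparsity": 1 - n_ratings / (len(users) * len(items)),
    }
```

The sparsity figure is worth a look: it tells us how much of the user-item matrix is empty, which is exactly the gap matrix factorization tries to fill.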

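The printed output describes important aspects of the dataset. According to its documentation, MovieLens 100K contains 100,000 ratings on a 1-to-5 scale from 943 users on 1,682 movies.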

As we can observe, the size of this dataset is pretty manageable for illustrative purposes in this tutorial, although real-world applications of matrix factorization would usually entail much larger user and item (e.g. movie) sets.

Now, with the aid of two classes imported from the surprise library, namely Dataset and Reader, we will pack the dataset into a format that will be easily manageable by the library’s implementation of matrix factorization techniques. We do so as follows, and also split the data into training and test sets for model evaluation. Notice the importance of specifying the correct range of numerical ratings in the dataset when initializing the Reader object:
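A sketch of this step could look like the following. The column names, the 80/20 split, and the seed are assumptions. `Reader`, `Dataset.load_from_df`, and `train_test_split` are part of surprise, and the rating scale (1, 5) matches the MovieLens 100K ratings.

```python
def build_train_test(df, test_size=0.2, seed=42):
    """Wrap a ratings DataFrame for surprise and split it into train/test sets.

    `df` is assumed to have the columns user_id, item_id and rating.
    """
    # Imported here so that this file still loads when scikit-surprise
    # has not been installed yet.
    from surprise import Dataset, Reader
    from surprise.model_selection import train_test_split

    # The rating scale must match the data: MovieLens 100K uses 1 to 5.
    reader = Reader(rating_scale=(1, 5))
    data = Dataset.load_from_df(df[["user_id", "item_id", "rating"]], reader)
    trainset, testset = train_test_split(
        data, test_size=test_size, random_state=seed
    )
    return data, trainset, testset
```

The column order passed to `load_from_df` matters: surprise expects user, item, rating, in that order.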

Now we get into real action by initializing, training, and evaluating the matrix factorization model. Concretely, we will use singular value decomposition (SVD), a popular matrix factorization approach whose implementation is provided via surprise’s SVD class. If you are familiar with training machine learning models with scikit-learn, you’ll find the process fairly similar:
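The training and evaluation step might be sketched as below, using the hyperparameter values discussed in this tutorial (20 latent factors, learning rate 0.01, regularization 0.01, 20 epochs). The function name and the random seed are assumptions. `trainset` and `testset` are expected to come from surprise's `train_test_split`.

```python
def train_and_evaluate(trainset, testset, seed=42):
    """Fit surprise's SVD on the training set and score it on the test set."""
    # Imported here so that this file still loads when scikit-surprise
    # has not been installed yet.
    from surprise import SVD, accuracy

    model = SVD(
        n_factors=20,   # dimension of the latent feature space
        lr_all=0.01,    # learning rate for all parameters
        reg_all=0.01,   # regularization term for all parameters
        n_epochs=20,    # number of passes over the training data
        random_state=seed,
    )
    model.fit(trainset)

    # Predict every (user, item) pair in the test set, then measure the error.
    predictions = model.test(testset)
    rmse = accuracy.rmse(predictions, verbose=False)
    mae = accuracy.mae(predictions, verbose=False)
    return model, rmse, mae
```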

In the SVD model instantiation above, n_factors is an important hyperparameter that sets the dimension (20 in our example) of the latent feature space. The model uses this space to build compact vector representations of users and items from the original data, which comes in the form of a large yet sparse user-item ratings matrix. For a better understanding of this key process in matrix factorization, be sure to check out this article. The other arguments are the learning rate (lr_all, 0.01), a regularization parameter (reg_all, 0.01) that helps prevent overfitting, and the number of training epochs (n_epochs, 20).
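To make the role of these hyperparameters more tangible, here is a simplified from-scratch sketch of biased matrix factorization trained with stochastic gradient descent. It is meant as an illustration of the general idea, not as the library's actual code. The function names, the initialization scheme, and the (user, item, rating) input format are assumptions.

```python
import random


def train_mf(ratings, n_factors=20, lr=0.01, reg=0.01, n_epochs=20, seed=0):
    """Learn biases and latent vectors from a list of (user, item, rating)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi, p, q = {}, {}, {}, {}
    for u, i, _ in ratings:
        if u not in p:
            bu[u] = 0.0
            p[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in q:
            bi[i] = 0.0
            q[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))
            err = r - pred
            # Move each parameter against the gradient of the regularized error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, bu, bi, p, q


def predict(model, u, i):
    """Estimate a rating; unknown users or items fall back to the known terms."""
    mu, bu, bi, p, q = model
    est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
    if u in p and i in q:
        est += sum(a * b for a, b in zip(p[u], q[i]))
    return est
```

Each predicted rating is the global mean plus a user bias, an item bias, and the dot product of the two latent vectors. A larger n_factors gives the vectors more room to encode taste, while lr and reg control how fast and how freely the parameters move during the n_epochs passes.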

Changing any of the described arguments’ values may impact the resulting model performance on the test data, measured by prediction error metrics like RMSE and MAE. In our specific setting, we get:
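For reference, both metrics are simple to compute by hand. The sketch below assumes predictions are given as (true rating, estimated rating) pairs; the function names are illustrative.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    pairs = list(pairs)
    return sum(abs(t - e) for t, e in pairs) / len(pairs)
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE does. On the same predictions, MAE is never larger than RMSE.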

For a more robust evaluation, we can optionally apply cross-validation:
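In surprise, this is available through `cross_validate` in `surprise.model_selection`, which can be called as `cross_validate(model, data, measures=["RMSE", "MAE"], cv=5)`. To show what happens underneath, here is a plain-Python sketch of how k-fold splitting works. The function name and the seed are assumptions.

```python
import random


def k_fold_splits(rows, k=5, seed=42):
    """Yield (train, test) lists so that every row is tested exactly once."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled rows into k folds, like dealing cards.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test
```

Training and scoring the model once per split, then averaging the k error values, gives an estimate that depends less on one lucky or unlucky train/test partition.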

Trying It Out

Now let’s see our recommender system in action by making some example recommendations. For this, we will first define two more custom functions. The first loads the set of movie titles. The second takes a user ID and a number N of desired recommendations, and uses the trained model to obtain a list of the top-N recommended movies for that user, based on their preferences as reflected in the original ratings data. The latter function is perhaps the most insightful part of the entire code, so we added some inline comments for a better understanding of the process involved.
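The two functions could be sketched as follows. All names are assumptions. Titles are parsed from lines in the pipe-separated format of the dataset's `u.item` file (movie ID first, title second). The `predict` argument is any callable returning an estimated rating, so the sketch does not depend on a particular model. With a trained surprise model, it could be `lambda u, i: model.predict(u, i).est`.

```python
def parse_titles(lines):
    """Map movie ID to title from lines such as '1|Toy Story (1995)|...'."""
    titles = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:
            titles[int(fields[0])] = fields[1]
    return titles


def top_n_recommendations(user_id, ratings, predict, titles, n=5):
    """Return the n unseen movies with the highest estimated rating."""
    # Movies the user has already rated are excluded from the candidates.
    seen = {item for user, item, *_ in ratings if user == user_id}
    candidates = {item for _, item, *_ in ratings} - seen

    # Estimate a rating for every candidate movie.
    scored = [(item, predict(user_id, item)) for item in candidates]

    # Highest estimate first; ties are broken by movie ID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))

    return [(titles.get(item, f"item {item}"), est) for item, est in scored[:n]]
```

Scoring every unseen movie is affordable at this dataset size. On much larger catalogs, one would typically restrict the candidate set first.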

All that remains is trying out these functions to get real recommendations!

Output:

Wrapping Up

And that’s it! With these steps, we have built our first matrix factorization-based movie recommender and seen it in action. Possible next steps for exploring recommender models like this one include visualizing interesting data patterns such as rating distributions per user or movie, finding movies that are similar to each other based on their latent factor representations, or visualizing the latent factors themselves.
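As a starting point for the similar-movies idea, the sketch below ranks items by the cosine similarity of their latent vectors. The function names and the dictionary input are assumptions. With a fitted surprise SVD model, the item vectors are available in its `qi` attribute, indexed by the inner IDs that `trainset.to_inner_iid` returns.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if degenerate)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_items(item_id, item_factors, n=5):
    """Rank the other items by similarity of their latent vectors."""
    target = item_factors[item_id]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in item_factors.items()
        if other != item_id
    ]
    # Most similar first; ties are broken by item ID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]
```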


