Comparison of Three Recent Personalization Algorithms

Personalization algorithms recommend products to users based on their previous interactions with the system; the products could be books, movies, or items in a retail catalog. The earliest personalization algorithms were based on factorization of the user-item matrix, where each entry corresponds to the presence or absence of an interaction between a user and a product. In this article, we compare three recently developed personalization algorithms: Bayesian Personalized Ranking, Taxonomy Discovery for Personalized Recommendations, and Multi-Matrix Factorization. We compare the three algorithms on hit rate @ position 10 on a held-out test set of 1 million users and 200 thousand items from the catalog of Target Corporation, and report our findings in table 1. We implement all three algorithms in parallel on Apache Spark.


Introduction
Personalization systems are important to the performance of a retailer: they improve the experience of a user by recommending items the user is likely to be interested in. The simplest and earliest personalization algorithms were based on factorizing the user-item matrix, also called matrix completion [1] [2]. These algorithms learn embeddings, or latent factors, of users and items and generate recommendations based on the similarity between the user and item latent factors. In an item-item recommender system, embeddings of items are learnt and the recommendations are items similar to the item being viewed. Bayesian Personalized Ranking (BPR) [5], Taxonomy Discovery for Personalized Recommendations (Taxonomy) [6], and Multi-Matrix Factorization (MMF) [7] are three algorithms that learn the latent factors of users and items through their respective loss functions. We compare the three algorithms on hit rate and report our findings in table 1. We find that MMF performs the best of the three. The next section describes the three algorithms briefly.
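To make the latent-factor idea concrete, the following toy sketch scores items for a user by the dot product between user and item factors and returns the top-k items. The random factors stand in for embeddings that any of the three algorithms would learn; the sizes and function names are ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 8          # toy sizes for illustration
U = rng.normal(size=(n_users, dim))      # user latent factors (stand-ins)
V = rng.normal(size=(n_items, dim))      # item latent factors (stand-ins)

def recommend(user_id, k=3):
    """Rank items for one user by the dot-product similarity v_u . v_i."""
    scores = V @ U[user_id]              # one similarity score per item
    return np.argsort(-scores)[:k]       # indices of the k best-scoring items
```

An item-item recommender would instead score items against the item being viewed, using the same dot-product similarity between item factors.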

Taxonomy
Taxonomy Discovery for Personalized Recommendations [6] learns the embeddings of users and items by alternating between latent-factor updates and learning a taxonomy over the items from their textual descriptions. The algorithm can start from an existing taxonomy or learn a completely new one.
The algorithm also uses a BPR cost function.
The probability of the user preferring item i over item j is then

p(i >_u j) = σ(v_u · v_i − v_u · v_j),

where v_u is the user's latent factor, v_i and v_j are the latent factors of the preferred item i and the non-preferred item j, and σ is the logistic sigmoid.
Each node and item shares information with the other nodes and items under the same parent in the hierarchy:

v_i = v_{π_i} + q_i,

where v_{π_i} is the latent factor of the parent of item i in the taxonomy and q_i is a bias term modeled as a zero-mean normal.
The cost function to learn the latent factors is

L = Σ_{(u,i,j)} ln σ(v_u · v_i − v_u · v_j) − λ Σ_i ||v_i − v_{π_i}||² − λ Σ_n ||v_n − v_{π_n}||² − λ_q Σ_i q_i² − λ_q Σ_n q_n².

The first term is the log-sigmoid of the difference between the dot products for an interacted item and a non-interacted item. The second and third terms are regularization terms that keep the latent factors of items and nodes close to those of their parents along each path in the taxonomy. The last two terms penalize the bias terms, which control for excessively popular items.
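As an illustration, the cost contribution of a single (user, preferred item, non-preferred item) triple can be sketched in numpy as below. This is a toy single-machine sketch, not the Spark implementation, and the regularization weights lam and lam_q are assumed hyperparameters rather than values from [6].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def taxonomy_cost(v_u, v_i, v_j, v_parent_i, q_i, q_j, lam=0.1, lam_q=0.1):
    """Cost contribution of one (user u, preferred item i, non-preferred
    item j) triple. v_parent_i is the latent factor of i's parent node in
    the taxonomy; q_i, q_j are the item bias terms. lam and lam_q are
    assumed regularization weights, not values from the paper."""
    rank = np.log(sigmoid(v_u @ v_i - v_u @ v_j))   # ranking preference term
    tax = -lam * np.sum((v_i - v_parent_i) ** 2)    # keep item near its parent
    bias = -lam_q * (q_i ** 2 + q_j ** 2)           # shrink popularity biases
    return rank + tax + bias
```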
We implement the algorithm on Apache Spark as described in [8].

BPR
The Bayesian Personalized Ranking method [5] uses a similar cost function. The probability of a user u preferring item i over item j is

p(i >_u j) = σ(v_u · v_i − v_u · v_j),

and the cost function sums the log of this probability over all observed (u, i, j) triples, with L2 regularization on the latent factors. We also implement this on Apache Spark, based on an implementation described in [9].
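A minimal single-machine sketch of one stochastic gradient step on the BPR criterion is shown below; the learning rate and regularization weight are assumed values, and a distributed implementation would apply such updates across many sampled triples in parallel.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(v_u, v_i, v_j, lr=0.05, reg=0.01):
    """One SGD step on the BPR criterion for a (user, interacted item i,
    non-interacted item j) triple; lr and reg are assumed hyperparameters."""
    x_uij = v_u @ v_i - v_u @ v_j        # score difference between i and j
    g = sigmoid(-x_uij)                  # gradient factor of ln sigmoid
    du = g * (v_i - v_j) - reg * v_u     # gradients w.r.t. each factor,
    di = g * v_u - reg * v_i             # computed before any update so the
    dj = -g * v_u - reg * v_j            # step uses consistent values
    return v_u + lr * du, v_i + lr * di, v_j + lr * dj
```

Repeated steps increase the score gap between the interacted and non-interacted items.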

MMF
The Multi-Matrix Factorization approach [7] learns latent factors of users and attributes, where the attributes are properties of items such as size, color, etc. The algorithm also learns each user's preference for each attribute, and how important each attribute is to an item.
The predicted rating of item j for user i is

r̂_{ij} = Σ_{k ∈ M_j} θ_{jk} (w_{ik} + u_i · f_k),

where M_j is the set of attributes of item j, w_{ik} is the preference of user i for attribute k, θ_{jk} is the weight of attribute k in item j, u_i is the latent factor of user i, and f_k is the latent factor of attribute k.
Similar to matrix factorization, this work optimizes an L2 loss to learn these variables.
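To make the aggregation concrete, here is a small sketch of computing a predicted rating from the quantities defined above. The exact way the preference w_ik, the weight θ_jk, and the factor similarity u_i · f_k combine is our assumption and may differ in detail from the formula in [7].

```python
import numpy as np

def mmf_rating(u_i, f, w_i, theta_j, attrs_j):
    """Predicted rating of item j for user i: each attribute k of the item
    contributes its weight theta_j[k] times the user's preference w_i[k]
    plus the factor similarity u_i . f[k]. This combination is an assumed
    reading of the definitions, not a verified transcription of the paper."""
    return sum(theta_j[k] * (w_i[k] + u_i @ f[k]) for k in attrs_j)
```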
We implement this method on a distributed cluster using Apache Spark.

Results
We use a sample of 1 million users and 200 thousand items from 1 week of online interactions on Target.com. We hold out 40% of the items as a test set on which to measure the hit rate. The results are in table 1: MMF clearly outperforms the other two algorithms. We developed distributed implementations of all three algorithms on Apache Spark.
A hit occurs when an item preferred by a user appears in the top 10 of all items sorted by predicted rating. The hit rate, calculated on the held-out test set and shown in table 1, is the mean of the hits over all users.
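The hit-rate computation can be sketched as follows. Here a user counts as a hit if any of their held-out items appears in their top 10; tie handling and the exact averaging in the full distributed implementation may differ.

```python
import numpy as np

def hit_rate_at_k(scores, held_out, k=10):
    """scores: (n_users, n_items) array of predicted ratings.
    held_out: dict mapping a user index to their set of held-out items.
    A user counts as a hit if any held-out item is in their top-k."""
    hits = []
    for user, items in held_out.items():
        top_k = set(np.argsort(-scores[user])[:k])  # k highest-rated items
        hits.append(1 if items & top_k else 0)      # any overlap is a hit
    return float(np.mean(hits))                     # mean hit over users
```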
In this article, we compared three recent algorithms for personalization. All three learn latent factors and use predicted ratings to recommend items matching a user's interests. We implemented all three algorithms on Apache Spark on a sample of 1 million users and 200 thousand items, and we find that MMF clearly outperforms the other two. MMF has more parameters and learns a user's preferences for attributes rather than items; it then generates ratings for unobserved user-item pairs by aggregating over all attributes of an item. Future work could run these algorithms on a much larger set of users and items to see whether the results carry over, and could vary the dimension of the latent factors (we use 50 for all three implementations).