Parallel Taxonomy Discovery
  • Shalin Shah
Target Corporation

Corresponding Author:[email protected]


Recommender systems aim to personalize a user’s shopping experience by suggesting related products, or products that match the user’s general interests. The information available about users and products is heterogeneous, and many systems use only one or a few of the available sources. These sources include the user’s interaction history with products and categories, textual information about the products, a hierarchical classification of the products into a taxonomy, user interests based on a questionnaire, user demographics, interests inferred from the user’s product reviews, interests based on the user’s physical location, and so on. Taxonomy discovery for personalized recommendation is work published in 2014 that uses the first three information sources: the user’s interaction history, textual information about the products and, optionally, an existing taxonomy of the products. In this paper, we describe a parallel implementation of this approach on Apache Spark and discuss the modifications to the algorithm needed to scale it to several hundred thousand users and a large inventory of products at Target Corporation. We run experiments on a sample of users and report results, including sample recommendations generated by our parallel algorithm.