TechRxiv

Parallel Taxonomy Discovery

preprint
posted on 04.12.2020, 21:19 by Shalin Shah

Recommender systems aim to personalize a user's shopping experience by suggesting related products, or products that match the user's general interests. The information available about users and products is heterogeneous, and many systems use only one or a few of these sources. Available sources include the user's history of interactions with products and categories, textual information about the products, a hierarchical classification of the products into a taxonomy, user interests collected through a questionnaire, user demographics, interests inferred from the product reviews a user has written, interests based on the user's physical location, and so on. Taxonomy discovery for personalized recommendation, published in 2014, uses the first three of these sources: the user's interaction history, textual information about the products and, optionally, an existing product taxonomy. In this paper, we describe a parallel implementation of this approach on Apache Spark and discuss the modifications to the algorithm needed to scale it to several hundred thousand users with a large product inventory at Target Corporation. We run experiments on a sample of users and report results, including sample recommendations generated by our parallel algorithm.

Email Address of Submitting Author

sshah100@jhu.edu

ORCID of Submitting Author

0000-0002-3770-1391

Submitting Author's Institution

Target Corporation

Submitting Author's Country

United States of America