It’s no secret that the most successful businesses, stores, and restaurants with a loyal customer base do a great job of tailoring the experience to each individual consumer. Sure, there’s a convenience factor, but nothing can replace the feeling you get when the server at your favorite restaurant remembers your “usual.” Or when a store associate picks up on your style preferences and makes recommendations that are just right. Those are the experiences that keep you coming back. They know who you are, tune into your preferences, and offer suggestions well suited to what you like. That’s what personalization is all about.
When building a personalization engine, a simple approach is to recommend the items most liked across all customers. This kind of recommendation is fast and simple, but not personalized: the most popular products are the same for every customer, not specific to any one of them. There exist many tailor-made algorithms for such recommendation scenarios, such as content-based filtering and collaborative filtering, but they all share one core notion: defining “similarity”.
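The popularity baseline above can be sketched in a few lines. This is a minimal illustration, not our production code; the purchase log and item names are hypothetical:

```python
from collections import Counter

def most_popular(purchases, k=3):
    """Rank items by how many purchase events they appear in.

    `purchases` is a hypothetical log of (customer_id, item_id) pairs.
    Every customer receives the same list — fast, but not personalized.
    """
    counts = Counter(item for _, item in purchases)
    return [item for item, _ in counts.most_common(k)]

purchases = [
    ("a", "shoes"), ("a", "socks"),
    ("b", "shoes"), ("b", "hat"),
    ("c", "shoes"), ("c", "socks"),
]
print(most_popular(purchases, k=2))  # → ['shoes', 'socks']
```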
- Content-based filtering is based on the similarity of the items being recommended. The idea is that if you like an item, you will like another item “similar” to it.
- Collaborative filtering is based entirely on past behavior, not on item content. If customer A likes items 1, 2, 3 and customer B likes items 2, 3, 4, then they have similar interests, so customer A should like item 4 and customer B should like item 1.
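The collaborative-filtering example above (customers A and B) can be turned into a toy sketch. This is a simplified, hypothetical illustration of user-based collaborative filtering, not the hybrid approach described later:

```python
def recommend(likes, target):
    """Recommend items liked by the customer whose liked-item set
    overlaps most with the target's (toy collaborative filtering).

    `likes` maps customer -> set of liked item ids (hypothetical data).
    """
    others = {c: items for c, items in likes.items() if c != target}
    # the "nearest neighbour" is whoever shares the most liked items
    neighbour = max(others, key=lambda c: len(others[c] & likes[target]))
    # recommend what the neighbour liked that the target hasn't seen
    return sorted(others[neighbour] - likes[target])

likes = {"A": {1, 2, 3}, "B": {2, 3, 4}}
print(recommend(likes, "A"))  # → [4]
print(recommend(likes, "B"))  # → [1]
```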
But a personalization engine cannot be built from standard algorithms alone; they have to be mixed with our own interests and requirements. For our personalization system, we applied collaborative filtering on item categories rather than individual items, and each category is then populated with its most popular items. It is a hybrid of collaborative filtering and content-based filtering. This reduced the computation significantly and resolved the problem of new customer onboarding.
Before serving personalized results to customers, we precompute the “similarity”/“affinity” between the various categories. A matrix between categories and customers, the Customer-Category Matrix, is computed, with each entry representing whether a customer has ever purchased an item from that category.
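Building that binary matrix is straightforward. A minimal sketch, assuming a hypothetical purchase log of (customer, category) pairs:

```python
# Hypothetical purchase log: which customer bought from which category.
purchases = [
    ("a", "shoes"), ("a", "bags"),
    ("b", "shoes"), ("b", "hats"),
    ("c", "bags"),
]

customers = sorted({cust for cust, _ in purchases})
categories = sorted({cat for _, cat in purchases})

# matrix[i][j] == 1 iff customer i has ever purchased from category j
matrix = [[0] * len(categories) for _ in customers]
for cust, cat in purchases:
    matrix[customers.index(cust)][categories.index(cat)] = 1

for cust, row in zip(customers, matrix):
    print(cust, row)
```

Each column of `matrix` is one category vector, which is exactly what the similarity computation below operates on.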
This Customer-Category matrix can now be used to calculate the similarity between categories. Each column represents a category vector, so the problem boils down to finding the closeness of two given category vectors, i.e. how similar two columns of the matrix are.
We used Jaccard similarity to define how close two category vectors are to each other. The advantage of Jaccard similarity is that it only considers non-sparse data, i.e. it ignores customers who have purchased from neither of the two categories under consideration. We say two categories are similar if the similarity score between them is greater than a specified threshold.
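For binary category vectors, Jaccard similarity is the number of customers who bought from both categories divided by the number who bought from at least one. A minimal sketch, with hypothetical vectors:

```python
def jaccard(u, v):
    """Jaccard similarity of two binary category vectors (columns of
    the Customer-Category matrix): |intersection| / |union|.
    Customers with a 0 in both vectors do not affect the score.
    """
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return both / either if either else 0.0

# shoes bought by customers a and b; bags bought by a and c
shoes = [1, 1, 0]
bags = [1, 0, 1]
print(jaccard(shoes, bags))  # 1 shared buyer out of 3 distinct buyers
```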
Once the Jaccard similarity formula is applied across all category pairs, we store the results in a separate Category-Category table. This table holds the affinities/similarities between the various categories, and can be updated every half hour with new interactions from customers. For any customer, personalized results can now be derived from this table by showing items from categories similar to the ones the customer bought from in the past. For a new customer, the categories with the most popular items can be shown, and similar categories can again be retrieved from the same table.
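The whole pipeline — score every category pair, keep the pairs above a threshold in a Category-Category table, and look up similar categories at serving time — can be sketched as follows. The category columns and the 0.3 threshold here are assumptions for illustration:

```python
from itertools import combinations

def jaccard(u, v):
    """Jaccard similarity of two binary category vectors."""
    both = sum(a and b for a, b in zip(u, v))
    either = sum(a or b for a, b in zip(u, v))
    return both / either if either else 0.0

# Columns of a hypothetical Customer-Category matrix, keyed by category.
columns = {
    "shoes": [1, 1, 0],
    "bags":  [1, 0, 1],
    "hats":  [0, 1, 1],
}

THRESHOLD = 0.3  # assumed cut-off for calling two categories "similar"

# Category-Category table: keep only pairs above the threshold.
cat_cat = {}
for a, b in combinations(columns, 2):
    score = jaccard(columns[a], columns[b])
    if score > THRESHOLD:
        cat_cat.setdefault(a, {})[b] = score
        cat_cat.setdefault(b, {})[a] = score

def similar_categories(category):
    """Categories ranked by stored affinity to `category`."""
    scores = cat_cat.get(category, {})
    return sorted(scores, key=scores.get, reverse=True)

print(similar_categories("shoes"))
```

At serving time, recommendations for a returning customer come from `similar_categories` of the categories they bought from; a new customer starts from the most popular categories and the same table supplies the neighbours. In production this table would be rebuilt periodically (every half hour, per the text above) as new interactions arrive.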
Any suggestions or thoughts, let me know:
Insta + Twitter + LinkedIn + Medium | @shivama205