MachineX: Demystifying Market Basket analysis

Category: IT Technology · Published: 5 years ago


Reading Time: 7 minutes

In this blog, we are going to see how we can anticipate customer behavior with Market Basket Analysis by using association rules.

MachineX: Demystifying Market Basket analysis — source: oracle.com

Introduction to Market Basket analysis

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

The approach is based on the theory that customers who buy a certain item are more likely to buy another specific item.

For example, people who buy bread usually buy butter too. The marketing team at a retail store could target customers who buy bread and butter with an offer that nudges them toward a third item, such as eggs.

So if customers buy bread and butter and see a discount or an offer on eggs, they will be encouraged to spend more and buy the eggs. This is what market basket analysis is all about.

This is just a small example. If you hand transaction data covering 10,000 or 20,000 items from your supermarket to a data scientist, just imagine the number of insights you can get. That is why association rule mining is so important.

Real-life application

Market basket analysis can also be used to cross-sell products. Amazon famously uses an algorithm to suggest items that you might be interested in, based on your browsing history or what other people have purchased.

A well-known urban legend is that a supermarket, after running a market basket analysis, found that men were likely to purchase beer and diapers together. Sales increased when the beer was placed next to the diapers.

It sounds simple (and much of the time, it is). However, there are pitfalls to be aware of:

For large inventories (for example, more than 10,000 items), the number of item combinations may explode into the billions, making the computation practically infeasible.
Data is often mined from large transaction histories. Such volumes of data are usually handled by specialized statistical software.
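To get a feel for the first pitfall, the number of possible item combinations can be counted directly. The sketch below assumes a hypothetical catalogue of exactly 10,000 items and only counts pairs and triples; real baskets can of course be larger.

```python
from math import comb

n_items = 10_000  # hypothetical catalogue size, matching the figure above

pairs = comb(n_items, 2)    # distinct 2-item combinations
triples = comb(n_items, 3)  # distinct 3-item combinations

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

Pairs already run into the tens of millions and triples into the hundreds of billions, which is why algorithms that prune the search space matter.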

Association Rule Mining

Association Rule Mining is used when we want to find associations between objects in a given set, or to uncover hidden patterns in a body of data.

Market Basket Analysis (also called Basket Data Analysis) in retailing and clustering are some applications of Association Rule Mining.

The most widely used approach to such problems is Market Basket Analysis. It is a key technique used by many large retail companies, such as Amazon and Flipkart, to analyze users' purchasing behavior by identifying relationships between the items that users place in their “shopping baskets”. Discovering these associations helps retailers develop marketing strategies by gaining insight into which items are frequently purchased together. These strategies may include:

Changing the store layout according to trends
Cross marketing on online stores
Identifying the trending items customers buy
Customized emails with add-on sales
Customer behavior analysis
Catalog design

Note: There is often confusion about how Market Basket Analysis differs from Recommendation Systems.

Difference between Association and Recommendation

Association rules do not work on an individual's preferences. They always look for relationships between sets of items across all transactions. This makes them quite different from the recommendation-system method called collaborative filtering.

If you want to learn about recommendation systems, you can go through my previous blog, Recommendation Engines.

Example:

To understand it better, take a look at the snapshot below from amazon.com. You will notice two headings, “Frequently Bought Together” and “Customers who bought this item also bought”, on each product's info page.

Frequently Bought Together → Association
Customers who bought this item also bought → Recommendation

So this was the difference between association rules and recommendations.

Now, let’s talk about one of the main association rule mining algorithms, i.e., the Apriori algorithm.

Apriori Algorithm

Let us assume that we have a transaction containing the set {Banana, Pineapple, Mango}; it therefore also contains the subset {Banana, Mango}. So, according to the Apriori principle, if {Banana, Pineapple, Mango} is frequent, then {Banana, Mango} must also be frequent.
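The principle can be checked on a small example. The sketch below uses invented transactions (they are not the dataset from this post) and a hypothetical helper named support to confirm that every subset of an itemset occurs at least as often as the itemset itself.

```python
from itertools import combinations

# Invented transactions, for illustration only
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango", "Grapes"},
    {"Grapes"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

full = ("Banana", "Pineapple", "Mango")
full_support = support(full)

# Support of every non-empty proper subset of the full itemset
subset_supports = {
    subset: support(subset)
    for size in range(1, len(full))
    for subset in combinations(full, size)
}

for subset, value in subset_supports.items():
    # The last column is True when the subset is at least as frequent
    print(subset, round(value, 3), value >= full_support)
```

Read the other way round, this is what lets Apriori prune: once an itemset falls below the support threshold, none of its supersets needs to be counted.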

We have a dataset that consists of some transactions.

0 -> absence of an item

1 -> presence of an item

In order to pick out the interesting rules from the many possible rules in this small business scenario, we will use the following metrics:

Support: Support is the popularity (frequency of occurrence) of an item. It is calculated as the number of transactions containing the item divided by the total number of transactions. So, if we want to calculate the support for Banana, here it is:

Support(Banana) = (Transactions involving Banana)/(Total transactions)

Support(Banana) = 0.666
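As a rough illustration of the 0/1 encoding described above, the sketch below computes per-item support from a small one-hot table. The table is invented (the original one is not reproduced here) and was chosen so that Banana appears in 4 of 6 transactions, in line with the 0.666 value quoted above.

```python
# Hypothetical one-hot transaction table (1 = item present, 0 = absent)
items = ["Banana", "Pineapple", "Mango", "Grapes"]
rows = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

# Support of a single item = share of rows in which its column is 1
support = {
    item: sum(row[col] for row in rows) / len(rows)
    for col, item in enumerate(items)
}

for item, value in support.items():
    print(f"Support({item}) = {value:.3f}")
```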

Confidence: Likelihood of occurrence of item B if item A occurs (conditional probability).

Confidence(A => B) = (Transactions involving both A and B)/(Transactions involving A)

Confidence({Banana, Pineapple} => {Mango}) = Support(Banana, Pineapple, Mango)/Support(Banana, Pineapple)

= (2/6) / (3/6)

= 0.667

Lift: How much more often item B occurs when item A occurs, compared with how often B occurs overall.

Lift(A => B) = Confidence(A => B) / Support(B)

Lift({Banana, Pineapple} => {Mango}) = 1

So, the likelihood of a customer buying B when A is bought is ‘lift-value’ times the likelihood of buying B on its own.

Lift(A => B) = 1 means that there is no correlation within the itemset.
Lift(A => B) > 1 means that there is a positive correlation within the itemset, i.e., products in the itemset, A and B, are more likely to be bought together.
Lift(A => B) < 1 means that there is a negative correlation within the itemset, i.e., products in the itemset, A and B, are unlikely to be bought together.
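The confidence and lift formulas can be tried out in a few lines of plain Python. This is only a sketch: the transactions are invented and were chosen so that the results agree with the worked figures in the text, and the helper names (support, confidence, lift, interpret) are hypothetical.

```python
# Invented transactions, chosen to reproduce the worked figures in the text
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango", "Grapes"},
    {"Grapes"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of A and B together, divided by Support(A)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence(A => B) divided by Support(B)."""
    return confidence(antecedent, consequent) / support(consequent)

def interpret(value, tol=1e-9):
    """Map a lift value onto the three cases listed above."""
    if abs(value - 1) <= tol:
        return "no correlation"
    return "positive correlation" if value > 1 else "negative correlation"

a, b = {"Banana", "Pineapple"}, {"Mango"}
print(round(confidence(a, b), 3), round(lift(a, b), 3), interpret(lift(a, b)))
```

The tolerance in interpret is there because lift is a floating-point ratio and may not land on exactly 1.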

Implementation

You can get the data from here.

This dataset contains the transaction data of a store with various products.

Install the apyori package before importing the library

conda install --yes apyori
OR
pip3 install apyori

Import the packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

We have imported all the necessary libraries:

NumPy and pandas for basic operations
Matplotlib for data visualization
apyori for our data modeling

Import the data

store_data = pd.read_csv("store_data.csv",header = None)

We have read the dataset using pandas into a data frame named “store_data”. Now let’s look at the data:

store_data.head()

So, this is what our data looks like: it contains the full transaction history of various products.

store_data.shape

7501 is the total number of transactions, each listing the items bought together. 20 is the number of columns used to hold the items of a transaction.

Data Preprocessing

The Apriori library requires our dataset to be in the form of a list of lists. The whole dataset is one big list, and each transaction in the dataset is an inner list within it: [ [transaction1], [transaction2], . . [transaction7501] ]

Let’s convert our pandas data frame into a list of lists as follows:

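records = []
n_rows, n_cols = store_data.shape  # (7501, 20) for this dataset
for i in range(n_rows):
    # Cast every cell to str; empty cells (NaN) become the string 'nan'
    records.append([str(store_data.values[i, j]) for j in range(n_cols)])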
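One side effect of casting every cell to a string is that empty cells turn into the string 'nan', which then behaves like an item. A possible alternative, sketched here with the standard csv module, is to read the file row by row and keep only the non-empty cells. The in-memory sample below is invented and merely stands in for store_data.csv; rule counts obtained this way may differ from the ones reported in this post.

```python
import csv
import io

# Invented sample standing in for store_data.csv: one transaction per line,
# with a varying number of items per line.
sample_csv = io.StringIO(
    "bread,butter,eggs\n"
    "milk\n"
    "bread,butter,,\n"
)

records = []
for row in csv.reader(sample_csv):
    # Keep only non-empty cells so that blanks never become pseudo-items
    items = [cell.strip() for cell in row if cell.strip()]
    if items:
        records.append(items)

print(records)
```

To run this against the real file, the StringIO object would be replaced by an open file handle.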

Let’s see these transaction sets:

for sets in records:
    print(sets)

Apriori Algorithm

Parameters of apriori:

records : list of lists
min_support : minimum support; only itemsets whose support is at least this value are kept
min_confidence : minimum confidence; rules with lower confidence are filtered out
min_lift : minimum lift a rule must have to be shortlisted
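min_length : minimum number of items you want in your rules (note: apyori's documented length option is max_length, so this argument may be ignored by the library)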

association_rules = apriori(records, min_support = 0.0055, min_confidence = .3, min_lift = 3, min_length = 2)

Convert the rules above into a list:

association_results = list(association_rules)
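To make the role of the three thresholds more concrete, here is a brute-force sketch in plain Python. It is not apyori's implementation and does none of Apriori's pruning; the toy transactions are invented and mine_rules is a hypothetical helper that simply enumerates every itemset and applies the filters in turn.

```python
from itertools import combinations

# Invented toy transactions; not taken from store_data.csv
transactions = [
    {"bread", "butter", "eggs"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(min_support, min_confidence, min_lift):
    """Brute-force rule search (hypothetical helper, not apyori's code)."""
    all_items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(all_items) + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s < min_support:
                continue  # first filter: drop rare itemsets
            # Split the itemset into every antecedent => consequent pair
            for k in range(1, size):
                for base in combinations(combo, k):
                    base = frozenset(base)
                    add = itemset - base
                    conf = s / support(base)
                    lift = conf / support(add)
                    # second and third filters: confidence and lift
                    if conf >= min_confidence and lift >= min_lift:
                        rules.append((sorted(base), sorted(add), s, conf, lift))
    return rules

for base, add, s, conf, lift in mine_rules(0.3, 0.7, 1.2):
    print(f"{base} -> {add}: support={s:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

Raising any of the three thresholds shrinks the list of rules, which is the same trade-off being tuned in the apriori call above.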

Now let’s see how many rules were generated by our algorithm:

print(len(association_results))

So, in total, we have 18 rules whose support, confidence, and lift meet the thresholds we set. Let’s look at one of them:

print(association_results[5])

We can see that the rule at index 5 contains (spaghetti, ground beef, frozen vegetables), which have a good association between them.

Display the list of rules

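for item in association_results:
    # Each result holds an itemset, its support, and a list of ordered statistics
    stat = item[2][0]  # first rule derived from this itemset
    # stat[0] is the antecedent (items_base), stat[1] the consequent (items_add)
    print("Rule : " + ", ".join(stat[0]) + " -> " + ", ".join(stat[1]))
    print("Support : {}".format(item[1]))
    print("Confidence : {}".format(stat[2]))
    print("Lift : {}".format(stat[3]))
    print("\n-------------------------------------------------\n")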

So, this was all about how to implement the Apriori algorithm to find associations in our set of transactions.

Stay tuned, happy learning!

Follow MachineX Intelligence for more:
