In this blog, we will see how to anticipate customer behavior with Market Basket Analysis using association rules.
Introduction to Market Basket Analysis
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
The approach is based on the theory that customers who buy a certain item are more likely to buy another specific item.
For example, people who buy bread usually buy butter too. Marketing teams at retail stores could target customers who buy bread and butter with an offer that encourages them to buy a third item, such as eggs.
So if customers buy bread and butter and see a discount or an offer on eggs, they will be encouraged to spend more and buy the eggs. This is what market basket analysis is all about.
This is just a small example. If you take the transaction data for the 10,000 or 20,000 items in your supermarket to a data scientist, just imagine the number of insights you can get. That is why association rule mining is so important.
Real-life application
Market basket analysis can also be used to cross-sell products. Amazon famously uses an algorithm to suggest items that you might be interested in, based on your browsing history or what other people have purchased.
A well-known urban legend is that a supermarket, after running a market basket analysis, found that men were likely to buy beer and diapers together. The store then increased sales by placing beer next to the diapers.
It sounds simple (and much of the time, it is). However, there are pitfalls to be aware of:
- For large inventories (for example, more than 10,000 items), the number of item combinations can explode into the billions, making the computation practically infeasible.
- Data is often mined from large transaction histories. Such large volumes of data are usually handled by specialized statistical software.
Association Rule Mining
Association Rule Mining is used when we need to find associations between objects in a given set, or to find hidden patterns in a body of data.
Market Basket Analysis (also called basket data analysis) in retailing and clustering are some applications of Association Rule Mining.
The most widely used approach for finding these patterns is Market Basket Analysis. It is a key technique used by many big companies in the retail sector, such as Amazon and Flipkart, to analyze customers’ purchasing behavior by identifying relationships between the items that customers place in their “shopping baskets”. Discovering these associations can help retailers develop marketing strategies by giving them insight into which items are frequently bought together. These strategies may include:
- Changing the store layout according to trends
- Cross marketing on online stores
- Identifying the trending items customers buy
- Customized emails with add-on sales
- Customer behavior analysis
- Catalog design
Note: Market Basket Analysis and recommendation systems are often confused with each other because they look similar.
Difference between Association and Recommendation
Association rules do not work on an individual’s preferences. They always look for relations between sets of items across transactions. This makes them quite different from the recommendation system method called collaborative filtering.
If you want to learn about recommendation systems, you can go through my previous blog, Recommendation Engines.
Example:
To understand it better, take a look at the snapshot below from amazon.com. You will notice two headings, “Frequently Bought Together” and “Customers who bought this item also bought”, on each product’s info page.
Frequently Bought Together → Association
Customers who bought this item also bought → Recommendation
So this was the difference between association rules and recommendations.
Now, let’s talk about one of the main association rule algorithms in machine learning: the Apriori algorithm.
Apriori Algorithm
Suppose we have a transaction containing the set {Banana, Pineapple, Mango}; it then also contains the subset {Banana, Mango}. So, according to the Apriori principle, if {Banana, Pineapple, Mango} is frequent, then {Banana, Mango} must also be frequent.
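The Apriori principle can be tried out on a small example. The sketch below is a minimal, unoptimized level-wise search written only for illustration: the six transactions, the `support` and `frequent_itemsets` helpers, and the 0.5 threshold are all assumptions made up for this example and are not taken from any library.

```python
from itertools import combinations

# Hypothetical transactions, invented for this illustration
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana"},
    {"Mango", "Pineapple"},
    {"Mango"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search that grows candidates only from frequent itemsets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, transactions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent (k-1)-item subset, without counting its support
        previous = set(current)
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous
                             for s in combinations(c, k - 1))}
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
    return result


frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), round(frequent[itemset], 3))
```

In this made-up data, {Banana, Mango} falls below the 0.5 threshold, so the three-item candidate {Banana, Pineapple, Mango} is discarded in the prune step without its support ever being counted.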
We have a dataset that consists of some transactions.
0 -> Absence of an item
1 -> Presence of an item
To find the interesting rules among the many possible rules in this small business scenario, we will use the following metrics:
Support: Support is the popularity (frequency of occurrence) of an item. It is calculated as the number of transactions containing the item divided by the total number of transactions. So, if we want to calculate the support for Banana, here it is:
Support(Banana) = (Transactions involving Banana)/(Total transactions)
Support(Banana) = 0.667
Confidence: The likelihood that item B occurs given that item A occurs (conditional probability).
Confidence(A => B) = (Transactions involving both A and B)/(Transactions involving A)
Confidence({Banana, Pineapple} => {Mango}) = Support(Banana, Pineapple, Mango)/Support(Banana, Pineapple)
= (2/6) / (3/6)
= 0.667
Lift: The ratio by which the occurrence of item B increases when item A occurs.
Lift(A => B) = Confidence(A => B) / Support(B)
Lift({Banana, Pineapple} => {Mango}) = 1
So, the likelihood of a customer buying A and B together is ‘lift-value’ times what would be expected if the two were purchased independently.
- Lift(A => B) = 1 means that there is no correlation within the itemset.
- Lift(A => B) > 1 means that there is a positive correlation within the itemset, i.e., the products in the itemset, A and B, are more likely to be bought together.
- Lift(A => B) < 1 means that there is a negative correlation within the itemset, i.e., the products in the itemset, A and B, are unlikely to be bought together.
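The three metrics can be computed by hand in a few lines of Python. The sketch below uses a hypothetical set of six transactions, chosen only so that it reproduces the figures quoted above (0.667, 0.667, and 1); it is not the original table, and the helper names `support`, `confidence`, and `lift` are assumptions made for this illustration.

```python
# Hypothetical transactions, chosen to match the figures quoted in the text
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana"},
    {"Mango", "Pineapple"},
    {"Mango"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(base, add):
    """Support of base and add together, divided by the support of base."""
    return support(set(base) | set(add)) / support(base)


def lift(base, add):
    """Confidence of base => add, divided by the support of add."""
    return confidence(base, add) / support(add)


print("Support(Banana):", round(support({"Banana"}), 3))
print("Confidence:", round(confidence({"Banana", "Pineapple"}, {"Mango"}), 3))
print("Lift:", round(lift({"Banana", "Pineapple"}, {"Mango"}), 3))
```

Here the lift comes out at 1 because Mango appears in the same share of all transactions as it does in the transactions that contain both Banana and Pineapple.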
Implementation
You can get the data from here .
This dataset contains the transaction data of a store with various products.
Install the apyori package before importing the library:
conda install --yes apyori
OR
pip3 install apyori
Import the packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori
We have imported all the necessary libraries:
- NumPy and pandas for basic operations
- Matplotlib for data visualization
- apyori for our data modeling
Import the data
store_data = pd.read_csv("store_data.csv",header = None)
We have read the dataset with pandas into a data frame named “store_data”. Now let’s look at the data:
store_data.head()
So, this is what our data looks like; it contains the transaction history of various products.
store_data.shape
7501 is the total number of transactions, each holding the items bought together. 20 is the number of columns available to hold the items of a transaction.
Data Preprocessing
The Apriori library requires our dataset to be in the form of a list of lists: the whole dataset is one big list, and each transaction is an inner list within it. [ [transaction1], [transaction2], . . [transaction7501] ]
Let’s convert our pandas data frame into a list of lists as follows:
records = []
for i in range(0, 7501):
    # Every cell is cast to str, so empty cells end up as the string 'nan'
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
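One side effect of the conversion above is that transactions with fewer than 20 items are padded with missing values, and `str()` turns each of those into the literal item 'nan'. If that is not wanted, a possible alternative is to build the transactions straight from the CSV rows and skip empty cells. The sketch below is only an illustration of that idea: `rows_to_transactions` is a hypothetical helper, and the three sample rows are made up. Rules mined from data cleaned this way may differ from the counts reported in this post.

```python
import csv
import io


def rows_to_transactions(csv_file):
    """Read one transaction per CSV row, skipping empty cells."""
    transactions = []
    for row in csv.reader(csv_file):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # ignore rows that contain no items at all
            transactions.append(items)
    return transactions


# Made-up sample rows standing in for the real file
sample = io.StringIO("bread,butter,eggs\nmilk,,\nbread,milk\n")
records_clean = rows_to_transactions(sample)
print(records_clean)
```

For the real file, the same helper could be called on an open file handle, for example `with open("store_data.csv", newline="") as f: records = rows_to_transactions(f)`.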
Let’s see these transaction sets:
for sets in records: print(sets)
Apriori Algorithm
Parameters of apriori:
- records : list of lists
- min_support : minimum support; only itemsets whose support is at least this value are kept
- min_confidence : minimum confidence; rules with lower confidence are filtered out
- min_lift : minimum lift value used to shortlist the rules
- min_length : minimum number of items you want in your rules (apyori may silently ignore this keyword; the length option it documents is max_length)
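To get a feel for what the three thresholds do, it can help to apply them by hand to a tiny example. The sketch below is a brute-force search over every itemset and every split, not how apyori works internally; the six transactions and the `find_rules` helper are hypothetical and exist only for this illustration.

```python
from itertools import combinations

# Hypothetical transactions, invented for this illustration
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana"},
    {"Mango", "Pineapple"},
    {"Mango"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def find_rules(min_support, min_confidence, min_lift):
    """Brute-force rule search; real Apriori prunes candidates instead."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Try every split of the itemset into base => add
            for r in range(1, size):
                for base in combinations(itemset, r):
                    add = tuple(i for i in itemset if i not in base)
                    conf = s / support(base)
                    lift = conf / support(add)
                    if conf >= min_confidence and lift >= min_lift:
                        rules.append((base, add, s, conf, lift))
    return rules


for base, add, s, conf, lift in find_rules(0.3, 0.7, 1.1):
    print(base, "=>", add, round(s, 3), round(conf, 3), round(lift, 3))
```

Raising any of the three thresholds can only shrink the list of rules, which is why strict values are usually needed on a large dataset to keep the output readable.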
association_rules = apriori(records, min_support = 0.0055, min_confidence = .3, min_lift = 3, min_length = 2)
Convert above rules into a list of rules:
association_results = list(association_rules)
Now let’s see how many rules were generated by our algorithm:
print(len(association_results))
So, in total we have 18 rules whose support, confidence, and lift are at least as high as the thresholds we set. Let’s look at one of them:
print(association_results[5])
We can see that rule 5 contains (spaghetti, ground beef, frozen vegetables), which have a strong association between them.
Display the list of rules
for item in association_results:
    # item[2] holds the ordered statistics; take the first one, which is
    # a tuple of (items_base, items_add, confidence, lift)
    stat = item[2][0]
    print("Rule : " + ", ".join(stat[0]) + " -> " + ", ".join(stat[1]))
    print("Support : {}".format(item[1]))
    print("Confidence : {}".format(stat[2]))
    print("Lift : {}".format(stat[3]))
    print("\n-------------------------------------------------\n")
So, this was all about how to implement the Apriori algorithm to find associations in our set of transactions.
Stay tuned, happy learning!
Follow MachineX Intelligence for more: