MachineX: Demystifying Market Basket analysis

Category: IT Technology · Published: 4 years ago


Reading Time: 7 minutes

In this blog, we are going to see how we can anticipate customer behavior with Market Basket Analysis by using association rules.

[Image. Source: oracle.com]

Introduction to Market Basket analysis

Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

The approach is based on the theory that customers who buy a certain item are more likely to buy another specific item.

For example, people who buy bread usually buy butter too. The marketing teams at retail stores can target customers who buy bread and butter and make them an offer so that they buy a third item, like eggs.


So if customers buy bread and butter and see a discount or an offer on eggs, they will be encouraged to spend more and buy the eggs. This is what market basket analysis is all about.

This is just a small example. If you take the data for the 10,000 or 20,000 items of your supermarket to a data scientist, just imagine the number of insights you can get. That is why association rule mining is so important.

Real-life application


Market basket analysis can also be used to cross-sell products. Amazon famously uses an algorithm to suggest items that you might be interested in, based on your browsing history or what other people have purchased.

A well-known urban legend is that a supermarket, after running a market basket analysis, found that men were likely to buy beer and diapers together. The store then increased sales by placing the beer next to the diapers.

It sounds simple (and much of the time, it is). However, there are pitfalls to be aware of:

  • For large inventories (for example, more than 10,000 items), the number of item combinations can explode into the billions, making the computation practically impossible.
  • Data is often mined from large transaction histories. Such large volumes of data are usually handled by specialized statistical software.
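To get a feel for how quickly the number of combinations grows, here is a small sketch using only Python's standard library. The 10,000-item inventory is the figure from the text; the helper name count_itemsets is made up for this illustration.

```python
from math import comb

def count_itemsets(n_items, size):
    """Number of distinct itemsets of the given size drawn from n_items products."""
    return comb(n_items, size)

n_items = 10_000  # inventory size mentioned in the text
for size in (2, 3):
    print(f"itemsets of size {size}: {count_itemsets(n_items, size):,}")
```

Pairs alone number about 50 million, and triples already exceed 166 billion, which is why practical algorithms avoid counting every combination.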

Association Rule Mining

Association Rule Mining is used when we want to find associations between objects in a given set, or to find hidden patterns in data.

Market Basket Analysis (also called basket data analysis) in retailing and clustering are some applications of Association Rule Mining.

The most widely used approach to finding these patterns is Market Basket Analysis. It is a key technique used by many big companies in the retail sector, such as Amazon and Flipkart, to analyze customers' purchasing behavior by identifying relationships between the items that customers place in their “shopping baskets”. Discovering these associations helps retailers develop marketing strategies by gaining insight into which items customers frequently buy together. The strategies may include:

  • Changing the store layout according to trends
  • Cross marketing on online stores
  • Identifying the trending items customers buy
  • Customized emails with add-on sales
  • Customer behavior analysis
  • Catalog design

Note: Market Basket Analysis is often confused with recommendation systems because the two look similar.

Difference between Association and Recommendation

Association rules do not depend on an individual’s preferences. They find relationships between sets of items across all transactions. This makes them quite different from the recommendation-system method called collaborative filtering.

If you want to learn about recommendation systems, you can go through my previous blog, Recommendation Engines.

Example:

To understand it better, look at a product page on amazon.com. You will notice two headings, “Frequently Bought Together” and “Customers who bought this item also bought”, on each product’s info page.

Frequently Bought Together → Association
Customers who bought this item also bought → Recommendation


So this was the difference between association rules and recommendations.

Now, let’s talk about one of the main association rule mining algorithms: the Apriori algorithm.

Apriori Algorithm

Let’s assume that a transaction containing the set {Banana, Pineapple, Mango} also contains the subset {Banana, Mango}. So, according to the Apriori principle, if {Banana, Pineapple, Mango} is frequent, then {Banana, Mango} must also be frequent.
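The principle can be checked on a toy example. The six transactions below are hypothetical, invented only for illustration, and the support helper is a minimal sketch rather than any library's API.

```python
# Hypothetical transactions, invented only to illustrate the principle.
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango"},
    {"Pineapple"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

superset = {"Banana", "Pineapple", "Mango"}
subset = {"Banana", "Mango"}

# Every transaction that contains the superset also contains the subset,
# so the subset's support can never be smaller than the superset's.
print("support(superset) =", round(support(superset, transactions), 3))
print("support(subset)   =", round(support(subset, transactions), 3))
```

Read the other way round, this is what lets Apriori prune its search: once an itemset is found to be infrequent, none of its supersets needs to be counted.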


We have a dataset that consists of some transactions.

0 -> Absence of an item

1 -> Presence of an item

In order to pick out the interesting rules from the many possible rules in this small business scenario, we will be using the following metrics:

Support: Support is the popularity (frequency of occurrence) of an item. It is calculated as the number of transactions containing the item divided by the total number of transactions. So, if we want to calculate the support for Banana, here it is:

Support(Banana) = (Transactions involving Banana)/(Total transactions)

Support(Banana) = 0.667
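As a minimal sketch, support can be read directly off a 0/1 table. The table below is hypothetical: its values were chosen only so that they agree with the figures quoted in this post, and they are not the original dataset.

```python
# Hypothetical 0/1 table: one row per transaction, one column per item.
# 1 = item present in the transaction, 0 = item absent.
items = ["Banana", "Pineapple", "Mango"]
rows = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]

def support(item, items, rows):
    """Share of transactions (rows) in which the item's column is 1."""
    col = items.index(item)
    return sum(row[col] for row in rows) / len(rows)

print(f"Support(Banana) = {support('Banana', items, rows):.3f}")
```

Here Banana appears in 4 of the 6 rows, so its support is 4/6.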

Confidence: Likelihood of occurrence of item B if item A occurs (conditional probability).

Confidence(A => B) = (Transactions involving both A and B)/(Transactions involving A)

Confidence({Banana, Pineapple} => {Mango}) = Support(Banana, Pineapple, Mango)/Support(Banana, Pineapple)

= (2/6) / (3/6)

= 0.667
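The same calculation can be sketched in a few lines of Python. The transactions are hypothetical, chosen to reproduce the 2/6 and 3/6 figures above; the function names are illustrative only.

```python
# Hypothetical transactions chosen to match the counts used in the text.
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango"},
    {"Pineapple"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A and B together) / Support(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)

value = confidence({"Banana", "Pineapple"}, {"Mango"}, transactions)
print(f"Confidence = {value:.3f}")
```

Note that this sketch assumes the antecedent occurs at least once; otherwise the division by Support(A) would fail.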

Lift: The increase in the ratio of occurrence of item B when item A occurs.

Lift(A => B) = Confidence(A => B) / Support(B)

Lift({Banana, Pineapple} => {Mango}) = 1

So, a customer who buys A is ‘lift’ times as likely to buy B as a customer picked at random.

  • Lift(A => B) = 1 means that there is no correlation within the itemset.
  • Lift(A => B) > 1 means that there is a positive correlation within the itemset, i.e., the products in the itemset, A and B, are more likely to be bought together.
  • Lift(A => B) < 1 means that there is a negative correlation within the itemset, i.e., the products in the itemset, A and B, are unlikely to be bought together.
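The three cases can be seen on a small example. The transactions below are hypothetical (picked so that the first rule reproduces the lift of 1 quoted above), and interpret is a made-up helper that simply applies the three bullet points.

```python
# Hypothetical transactions; not the original dataset.
transactions = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango"},
    {"Pineapple"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Confidence(A => B) / Support(B)."""
    a, b = set(antecedent), set(consequent)
    conf = support(a | b, transactions) / support(a, transactions)
    return conf / support(b, transactions)

def interpret(value, tol=1e-9):
    """Map a lift value onto the three cases listed above."""
    if abs(value - 1.0) <= tol:
        return "no correlation"
    return "positive correlation" if value > 1.0 else "negative correlation"

rules = [
    ({"Banana", "Pineapple"}, {"Mango"}),
    ({"Banana"}, {"Pineapple"}),
    ({"Mango"}, {"Pineapple"}),
]
for a, b in rules:
    value = lift(a, b, transactions)
    print(sorted(a), "=>", sorted(b), round(value, 3), interpret(value))
```

A small tolerance is used when comparing against 1 because the values are floating-point ratios.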

Implementation

You can get the data from here .

This dataset contains transaction data of a store with various products.

Install the apyori package before importing the library:

conda install --yes apyori
OR
pip3 install apyori

Import the packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

We have imported all the necessary libraries:

  • NumPy and pandas for basic operations
  • Matplotlib for data visualization
  • apyori for our data modeling

Import the data

store_data = pd.read_csv("store_data.csv",header = None)

We have read the dataset using pandas into a data frame named “store_data”. Now let’s see the data:

store_data.head()

So, this is what our data looks like. It contains the transaction history of various products.

store_data.shape

7501 is the total number of transactions, each listing the items bought together. 20 is the number of columns used to hold the items of a transaction.

Data Preprocessing

The apyori library requires our dataset to be in the form of a list of lists: the whole dataset is one big list, and each transaction is an inner list within it, i.e. [ [transaction1], [transaction2], ..., [transaction7501] ].

Let’s convert our pandas data frame into a list of lists as follows:

records = []
for i in range(0,7501):
    records.append([str(store_data.values[i,j]) for j in range(0,20)])
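One thing to watch for: transactions with fewer than 20 items leave empty cells, which pandas reads as NaN, and str() turns those into the literal string "nan", so "nan" can show up as if it were a product. A possible variant that skips missing cells is sketched below. The helper name to_transactions and the sample rows are made up for illustration, and the sketch assumes missing cells arrive as float NaN. Results obtained this way may differ from the numbers reported later in this post.

```python
import math

def to_transactions(rows):
    """Turn rows of cell values into lists of item names, skipping missing cells."""
    transactions = []
    for row in rows:
        items = [
            str(cell)
            for cell in row
            # Assumption: a missing cell is a float NaN, as pandas produces for blanks.
            if not (isinstance(cell, float) and math.isnan(cell))
        ]
        transactions.append(items)
    return transactions

# Tiny stand-in for the real data; the item names are illustrative only.
sample_rows = [
    ["burgers", "eggs", float("nan")],
    ["milk", float("nan"), float("nan")],
]
print(to_transactions(sample_rows))

# With the real data frame one might call:
# records = to_transactions(store_data.values.tolist())
```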

Let’s see these transaction sets:

for sets in records:
    print(sets)

Apriori Algorithm

Parameters of apriori:

  • records : list of lists
  • min_support : minimum support an itemset must have to be considered
  • min_confidence : minimum confidence a rule must have to be kept
  • min_lift : minimum lift a rule must have to be kept
  • min_length : minimum number of items you want in your rules (apyori's documented keyword arguments are min_support, min_confidence, min_lift and max_length, so this argument may have no effect)

association_rules = apriori(records, min_support = 0.0055, min_confidence = .3, min_lift = 3, min_length = 2)
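To make the thresholds concrete, here is a minimal pure-Python sketch of level-wise rule mining on a toy dataset. It is not apyori's implementation: the function mine_rules, the toy transactions, and the threshold values are all made up for illustration.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence, min_lift):
    """Level-wise rule mining on a tiny dataset (illustrative, not apyori's code)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Start with single items and grow the itemsets one item at a time.
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = {s: support(s) for s in level}
        kept = {s: sup for s, sup in kept.items() if sup >= min_support}
        frequent.update(kept)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        # Prune step (Apriori principle): every k-subset must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, len(c) - 1))
        ]

    # Split each frequent itemset into base -> add and apply the rule thresholds.
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for base in map(frozenset, combinations(sorted(itemset), r)):
                add = itemset - base
                conf = sup / frequent[base]
                lift = conf / frequent[add]
                if conf >= min_confidence and lift >= min_lift:
                    rules.append((sorted(base), sorted(add), sup, conf, lift))
    return rules

# Hypothetical toy data and thresholds.
toy = [
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple", "Mango"},
    {"Banana", "Pineapple"},
    {"Banana", "Mango"},
    {"Mango"},
    {"Pineapple"},
]
for base, add, sup, conf, lift in mine_rules(toy, 0.3, 0.6, 1.1):
    print(base, "->", add, round(sup, 3), round(conf, 3), round(lift, 3))
```

Raising any of the three thresholds shrinks the list of rules, which is how the values in the apriori call control the number of rules that come back.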

Convert the result of the apriori call into a list of rules:

association_results = list(association_rules)

Now let’s see how many rules were generated by our algorithm:

print(len(association_results))

So, in total, we have 18 rules whose support, confidence and lift are higher than the thresholds we set. Let’s look at one of the rules:

print(association_results[5])

We can see that the rule at index 5 contains (spaghetti, ground beef, frozen vegetables), which have a strong association between them.

Display the list of rules

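for item in association_results:
    # Each result is a RelationRecord(items, support, ordered_statistics);
    # the first ordered statistic holds one rule: items_base -> items_add.
    stat = item.ordered_statistics[0]
    print("Rule: " + ", ".join(stat.items_base) + " -> " + ", ".join(stat.items_add))
    print("Support: {}".format(item.support))
    print("Confidence: {}".format(stat.confidence))
    print("Lift: {}".format(stat.lift))
    print("\n-------------------------------------------------\n")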

So, this was all about how to implement the Apriori algorithm to find associations in our set of transactions.

Stay tuned, happy learning!
