Interpreting my 7-Eleven visits with hierarchical clustering, anomaly detection, and time s...

栏目: IT技术 · 发布时间: 4年前

内容简介:Name a better pair than 7-Eleven and Asia. I’ll wait…Ok, fair enough. Food, culture, and surreal cities are also valid choices. But my point still stands.During my traveling adventures in this beautiful continent, I’ve quite often found solace (and snacks)

Name a better pair than 7-Eleven and Asia. I’ll wait…

Ok, fair enough. Food, culture, and surreal cities are also valid choices. But my point still stands. 7-Eleven is, in my opinion, a staple of the lifestyle of certain Asian countries. There, you can find (almost) anything you need. Did you just land and need a SIM card? Go to 7–11. Do you need water because security made you empty your bottle? Go to 7–11. Hungry and want a cheap (and tasty) breakfast? You already know.

During my traveling adventures in this beautiful continent, I’ve quite often found solace (and snacks) in this convenience store. In fact, I’ve been so many times there that I have enough data to study, and since I don’t like wasting data, well, I analyzed it.

In this article, I’ll show what I discovered after investigating my 7-Eleven check-in data collected during my time in Asia. My analysis consists of two sections: which , when . In which , the goal is to discover which 7-Eleven’s I visited and how many times. In this part, I’ll summarize the dataset and explore it with a series of visualizations, while trying to answer the question, “which 7–11 have I visited?”

Then comes the when part. In this segment, the objective is finding the trend or pattern of my visits (if there’s any!). To find this out, I’ll use hierarchical clustering , anomaly detection , and time series .

The data

This project’s dataset consists of 99 check-ins I logged using the Foursquare’s Swarm app during the period of July 7, 2019, to December 15, 2019. I collected the data in the following countries: Singapore, Malaysia, Thailand, Hong Kong, and Japan.

The tools

The experiment uses R and Python code. The primary analysis — visualizations, clustering, and data exploration — is done in R. With Python, I used the library foursquare , Prophet to perform the time series analysis, and scikit-learn to do the anomaly detection.

Let’s begin!

Getting the data

Every data project starts with data — the new electricity (catchy phrase, I know). So, my first step was collecting it. To do this, I wrote a small Python script that retrieves my Swarm check-in data and stores it in a JSON file. The following code shows how:

Here, I’m getting all my check-ins created between two dates. Since each API call retrieves a max of 250 check-ins, in each iteration, I had to increase the offset variable by 250 to get the subsequent 250.

Which 7-Eleven?

From July 7 to December 15, 2019 (162 days), I had the pleasure of visiting the store 99 times on 78 different days. That’s 48% of the days — almost one visit every two days! However, and here comes the first pitfall of the analysis, I have to keep in mind that not every country I was at had 7-Elevens. For example, Cambodia has none. As a result, I need to remove those cases from my count. With that done, the total amount of days spent in countries with 7-Elevens reduces to 127, increasing the percentage to 61% — a bit more than once every two days. Figure 1 below presents the calendar. The days coded in red are those where I didn’t visit the shop, while those in blue are the days where I did visit.


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Introduction to Computer Science Using Python

Introduction to Computer Science Using Python

Dierbach, Charles / 2012-12 / $ 133.62

Introduction to Computer Science Using Python: A Computational Problem-Solving Focus introduces students to programming and computational problem-solving via a back-to-basics, step-by-step, objects-la......一起来看看 《Introduction to Computer Science Using Python》 这本书的介绍吧!

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

随机密码生成器
随机密码生成器

多种字符组合密码

HSV CMYK 转换工具
HSV CMYK 转换工具

HSV CMYK互换工具