Interpreting my 7-Eleven visits with hierarchical clustering, anomaly detection, and time s...

Category: IT Technology · Published: 4 years ago


Name a better pair than 7-Eleven and Asia. I’ll wait…

Ok, fair enough. Food, culture, and surreal cities are also valid choices. But my point still stands. 7-Eleven is, in my opinion, a staple of the lifestyle of certain Asian countries. There, you can find (almost) anything you need. Did you just land and need a SIM card? Go to 7–11. Do you need water because security made you empty your bottle? Go to 7–11. Hungry and want a cheap (and tasty) breakfast? You already know.

During my traveling adventures in this beautiful continent, I’ve quite often found solace (and snacks) in this convenience store. In fact, I’ve been there so many times that I have enough data to study, and since I don’t like wasting data, well, I analyzed it.

In this article, I’ll show what I discovered after investigating my 7-Eleven check-in data collected during my time in Asia. My analysis consists of two sections: which and when. In the which section, the goal is to discover which 7-Elevens I visited and how many times. In this part, I’ll summarize the dataset and explore it with a series of visualizations, while trying to answer the question, “Which 7–11s have I visited?”

Then comes the when part. In this segment, the objective is to find the trend or pattern in my visits (if there is one!). To find out, I’ll use hierarchical clustering, anomaly detection, and time series analysis.

The data

This project’s dataset consists of 99 check-ins I logged using Foursquare’s Swarm app between July 7, 2019, and December 15, 2019. I collected the data in the following countries: Singapore, Malaysia, Thailand, Hong Kong, and Japan.

The tools

The experiment uses R and Python code. The primary analysis (visualizations, clustering, and data exploration) is done in R. In Python, I used the foursquare library to retrieve the data, Prophet for the time series analysis, and scikit-learn for the anomaly detection.

Let’s begin!

Getting the data

Every data project starts with data — the new electricity (catchy phrase, I know). So, my first step was collecting it. To do this, I wrote a small Python script that retrieves my Swarm check-in data and stores it in a JSON file. The following code shows how:

Here, I’m getting all my check-ins created between two dates. Since each API call retrieves a max of 250 check-ins, in each iteration, I had to increase the offset variable by 250 to get the subsequent 250.
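The offset-based pagination described above can be sketched roughly as follows. This is a minimal illustration, not the original script: `fetch_page` is a hypothetical callable standing in for the actual API request (for example, a thin wrapper around the foursquare client), and `fetch_all_checkins` and `save_checkins` are names made up for this sketch. Only the page size of 250 and the JSON output come from the text.

```python
import json
from typing import Callable, Dict, List

PAGE_SIZE = 250  # maximum number of check-ins per API call, as stated above


def fetch_all_checkins(
    fetch_page: Callable[[int, int, int, int], List[Dict]],
    after_ts: int,
    before_ts: int,
    page_size: int = PAGE_SIZE,
) -> List[Dict]:
    """Collect every check-in between two timestamps, one page at a time.

    `fetch_page(offset, limit, after_ts, before_ts)` is a hypothetical
    function that returns the list of check-ins for a single API call.
    """
    checkins: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size, after_ts, before_ts)
        checkins.extend(page)
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < page_size:
            break
        # Otherwise move the window forward to get the subsequent page.
        offset += page_size
    return checkins


def save_checkins(checkins: List[Dict], path: str) -> None:
    """Store the collected check-ins in a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(checkins, fh, ensure_ascii=False, indent=2)
```

Passing the request in as a function keeps the loop independent of any particular client, so it can be tried out with a fake data source before touching the real API.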

Which 7-Eleven?

From July 7 to December 15, 2019 (162 days), I had the pleasure of visiting the store 99 times on 78 different days. That’s 48% of the days, or almost one visit day out of every two! However, and here comes the first pitfall of the analysis, I have to keep in mind that not every country I visited has 7-Elevens. For example, Cambodia has none. As a result, I need to remove those days from my count. With that done, the total number of days spent in countries with 7-Elevens drops to 127, which raises the percentage to 61%, a bit more than one day out of every two. Figure 1 below presents the calendar. The days coded in red are those where I didn’t visit the shop, while those in blue are the days where I did.
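The percentages above are easy to double-check. The short sketch below recomputes them from the numbers given in the text; the variable names are mine, and the count of 127 days in countries with 7-Elevens is taken as given rather than derived from the data.

```python
from datetime import date

start, end = date(2019, 7, 7), date(2019, 12, 15)

# Both endpoints count as days of the trip, hence the +1.
total_days = (end - start).days + 1

visit_days = 78                  # days with at least one check-in (from the text)
days_with_7eleven_nearby = 127   # days spent in countries that have 7-Elevens (from the text)

share_all = visit_days / total_days
share_adjusted = visit_days / days_with_7eleven_nearby

print(total_days)                # 162
print(f"{share_all:.1%}")        # 48.1%
print(f"{share_adjusted:.1%}")   # 61.4%
```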
