Retail Data: R Package

栏目: IT技术 · 发布时间: 4年前

内容简介:Have you ever noticed how things seem to get really expensive at specific times of the year? Like Mother’s Day and Valentine’s Day? Have you ever felt a bit ripped off when buying an over-priced bouquet of flowers or box of chocolates? Have you ever wonder
[This article was first published on R | datawookie , and kindly contributed toR-bloggers]. (You can report issue about the content on this pagehere)Want to share your content on R-bloggers?click here if you have a blog, or here if you don't.

Retail Data: R Package

Have you ever noticed how things seem to get really expensive at specific times of the year? Like Mother’s Day and Valentine’s Day? Have you ever felt a bit ripped off when buying an over-priced bouquet of flowers or box of chocolates? Have you ever wondered just how much those prices have been inflated?

Of course you have!

But it’s always been a niggling suspicion, never a fact. Where’s the evidence?

I set out to gather that evidence using the Retail Pricing Data API .

Retail Data: R Package

I could have gathered the required data by hitting the API with a series of curl requests from the command line. Or using the {httr} package from R. However, interacting with the API directly is somewhat laborious. It’d be a lot easier if the API was wrapped up in an R package.

The R Package

Retail Data: R Package

So I made a hex logo and after that the R package basically wrote itself.

The repository is at https://github.com/datawookie/retail/ and there’s a (WIP) homepage too.

Install the package from GitHub.

> remotes::install_github("datawookie/retail")

Then load it up.

> library(retail)

List of Retailers

The retailer() function gives access to a table of retailers, their website URLs and operating currencies.

> retailer()
# A tibble: 64 x 4
      id name             url                              currency
   
<int> 
 <chr>            
  <chr>                            
   <chr>
       
 1     1 EEM Technologies https://www.eemtechnologies.com/ USD     
 2     2 Clicks           https://clicks.co.za/            ZAR     
 3     3 Dischem          https://www.dischem.co.za/       ZAR     
 4     4 Game             https://www.game.co.za/          ZAR     
 5     5 Woolworths       https://www.woolworths.co.za/    ZAR     
 6     6 Fortnum & Mason  https://www.fortnumandmason.com/ GBP     
 7     7 John Lewis       https://www.johnlewis.com/       GBP     
 8     8 Marks & Spencer  https://www.marksandspencer.com/ GBP     
 9     9 Pick n Pay       https://www.pnp.co.za/           ZAR     
10    10 Makro            https://www.makro.co.za/         ZAR     
# … with 54 more rows
   </chr>
  </chr>
 </chr>
</int>

List of Products

To dig any deeper than that you’ll need an API key (ping me if you want one!). Use set_api_key() to specify the key. You only need to do this once. The key will then be used for all subsequent transactions.

> API_KEY = "5bed3ac9-6dc9-4926-aed8-8c97a7cb8057"
> set_api_key(API_KEY)

The retailer_products() function yields a product table for a specific retailer, where each product is assigned a name, brand, model, SKU and barcode (if available).

> retailer_products(5) %>% select(id, name, brand, sku)
# A tibble: 39,913 x 4
       id name                                              brand        sku          
    
<int> 
 <chr>                                             
  <chr>        
   <chr>
            
 1 611975 Two Toned Skater Print Trunks 3 Pack              NA           6009214350547
 2 611980 Sock Knit Bumper Sneakers (Size 4-13) Younger Boy NA           6009214831060
 3 611983 Pattern Cotton Boxers 2 Pack                      (&US)        6009214703176
 4 611985 Cargo Shorts                                      NA           6009214449494
 5 611990 Nautical Cotton Shirt                             (&US)        6009214476001
 6 611992 Dino Cotton Rich Socks 3 Pack                     NA           6009214359908
 7 611997 COUNTRY ROAD Spliced T-Shirt                      Country Road 9340243972506
 8 612000 Restlessness Flatbill Cap                         (&US)        6009214695327
 9 612024 Cuffed Abrasion Stonewash Jeans                   NA           6009214054353
10 612032 Striped Cotton Shirt                              NA           6009214471167
# … with 39,903 more rows
   </chr>
  </chr>
 </chr>
</int>

Product Details

You can get more granular, looking at a specific product using the product() and product_prices() functions.

Product: Wine

Let’s take a look at a bottle of wine.

> nederburg_lyric <- product(531589)
> names(nederburg_lyric)
[1] "id"          "retailer_id" "url"         "name"        "brand"       "sku"        
[7] "barcodes"
> nederburg_lyric$name
[1] "Nederburg Lyric 750ml"
> nederburg_lyric$sku
[1] "000000000000230428_EA"
> nederburg_lyric$barcodes
[1] "6001452314503"

The price history data includes both regular and promotion prices, as well as availability. Availability data are not currently being gathered for this product.

> product_prices(531589)
  product_id                      time price price_promotion available
1     531589 2020-03-07T01:04:08+00:00 55.00              45        NA
2     531589 2020-02-22T01:00:32+00:00 55.00              NA        NA
3     531589 2020-02-15T01:00:02+00:00 55.00              45        NA
4     531589 2020-02-08T00:43:46+00:00 51.99              45        NA
5     531589 2020-02-01T00:57:02+00:00 51.99              45        NA

Product: Clothing

Let’s take a look at an item of clothing from another retailer. In this case we are gathering availability data.

> product_prices(788165)
   product_id                      time  price price_promotion available
1      788165 2020-03-12T01:24:47+00:00  79.99              NA     FALSE
2      788165 2020-03-11T01:14:35+00:00  79.99              NA     FALSE
3      788165 2020-03-10T01:16:10+00:00  79.99              NA     FALSE
4      788165 2020-03-09T01:20:16+00:00  79.99              NA     FALSE
5      788165 2020-03-08T01:13:43+00:00  79.99              NA     FALSE
6      788165 2020-03-07T01:18:16+00:00  79.99              NA     FALSE
7      788165 2020-03-06T01:07:52+00:00  79.99              NA     FALSE
8      788165 2020-03-05T01:18:41+00:00  79.99              NA     FALSE
9      788165 2020-03-04T01:10:30+00:00  79.99              NA     FALSE
10     788165 2020-03-03T01:12:49+00:00  79.99              NA     FALSE
11     788165 2020-03-02T01:17:34+00:00  79.99              NA     FALSE
12     788165 2020-03-01T04:51:23+00:00  79.99              NA     FALSE
13     788165 2020-02-27T01:18:01+00:00  79.99              NA      TRUE
14     788165 2020-02-26T01:23:26+00:00  79.99              NA      TRUE
15     788165 2020-02-25T01:10:53+00:00  79.99              NA      TRUE
16     788165 2020-02-24T01:21:07+00:00  79.99              NA      TRUE
17     788165 2020-02-23T01:21:20+00:00  79.99              NA      TRUE
18     788165 2020-02-22T01:22:24+00:00  79.99              NA      TRUE
19     788165 2020-02-21T01:24:15+00:00  79.99              NA      TRUE
20     788165 2020-02-20T01:23:19+00:00  79.99              NA      TRUE
21     788165 2020-02-19T01:34:00+00:00  99.99              NA      TRUE
22     788165 2020-02-18T01:13:51+00:00  99.99              NA      TRUE
23     788165 2020-02-17T01:40:17+00:00  99.99              NA      TRUE
24     788165 2020-02-16T01:30:09+00:00  99.99              NA      TRUE
25     788165 2020-02-15T01:37:56+00:00  99.99              NA      TRUE
26     788165 2020-02-14T01:21:11+00:00 119.99              NA      TRUE
27     788165 2020-02-13T01:30:18+00:00 119.99              NA      TRUE

This product was initially selling at R 119.99. The price dropped to R 99.99 on 15 February 2020 and then R 79.99 on 20 February 2020. On 1 March 2020 it sold out (no longer available).

Let’s take a look at the product in question.

Retail Data: R Package

Yeah, I can see why that sold out!

Valentine’s Day

Let’s return to our original question: to what degree are prices inflated around “special” days like Valentine’s Day?

Getting Screwed

If you want to get screwed on Valentines day, buy roses.

Retail Data: R Package

From the middle of January the price of this bunch of roses increased repeatedly, so that by Valentine’s Day, you were paying at least 50% more per bunch. That’s really getting screwed. And not in a good way.

Here’s another example where the price of a bunch of 100 roses varied wildly before Valentine’s Day, the final price being roughly 60% higher than it was a month earlier.

Retail Data: R Package

The price of a small vase of pink roses would set you back by less than R300 at the beginning of January, but this shot up to just less than R500 in the latter part of January and remained around that level until after Valentine’s Day.

Retail Data: R Package

Clearly flowers in general (and roses in particular!) are not an economical gift on Valentine’s Day.

Not Getting Screwed

What if you don’t want to get screwed? Well, then the obvious solution is… buy beer.

Retail Data: R Package

Why? Because the price of beer, from the same retailer, remains completely unchanged.

Retail Data: R Package

The price of a bunch of roses is included in the above plot (grey dashed line) for reference.

Conclusion

This is just a fun illustration of what’s possible with retail price historical data. There are many serious and useful things that you can do with it too! If you’re interested in these data, then check out the project page and get in touch for an API key.

Finally, a note on sampling frequency: we are currently sampling most retailers only once a week. If there is interest in these data then we’ll be able to scale out our infrastructure and start to sample more retailers daily. Ultimate goal is to sample all retailers and products once per day.


以上所述就是小编给大家介绍的《Retail Data: R Package》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

统计思维:程序员数学之概率统计

统计思维:程序员数学之概率统计

Allen B.Downey / 张建锋、陈钢 / 人民邮电出版社 / 2013-5 / 29.00元

代码跑出来的概率统计问题; 程序员的概率统计开心辞典; 开放数据集,全代码攻略。 现实工作中,人们常被要求用数据说话。可是,数据自己是不能说话的,只有对它进行可靠分析和深入挖掘才能找到有价值的信息。概率统计是数据分析的通用语言,是大数据时代预测未来的根基。 站在时代浪尖上的程序员只有具备统计思维才能掌握数据分析的必杀技。本书正是一本概率统计方面的入门图书,但视角极为独特,折......一起来看看 《统计思维:程序员数学之概率统计》 这本书的介绍吧!

随机密码生成器
随机密码生成器

多种字符组合密码

URL 编码/解码
URL 编码/解码

URL 编码/解码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器