Storing GitHub Traffic with Azure Functions

Frustrated by GitHub Insights

As an owner of a handful of large public repos for my job, one of the things I care about is knowing the traffic to those repos so I can understand their visibility. GitHub has an amazing feature for this, called GitHub Insights, which lets you see some of this data; however, it is capped at a date range of 14 days in the past. I am not sure about the long-term goals of Insights, but if that cap stays, it will frustrate folks, some of them very well known folks.

Storing Traffic Data Over Time

I have seen a few ways people are getting around this date cap, written in all sorts of languages on all sorts of platforms. They all have one thing in common: they all use the GitHub API traffic endpoint to get this data (a sketch of the raw endpoint follows the list below). This API has all the info I need, so thinking like a lazy developer, here are my options:

  • Go to the Traffic pages for all my repos twice a month and collect the data by hand (yea…. no)
  • Write some tool that will “poll” the GitHub API and store the results in some data store
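
For reference, here is a minimal sketch of polling that raw endpoint directly with HttpClient (from System.Net.Http); the GITHUB_TOKEN variable name and the owner/repo values are just placeholders:

// GitHub's REST API requires a User-Agent header on every request
var http = new HttpClient();
http.DefaultRequestHeaders.UserAgent.ParseAdd("TrackRepos");
// Authenticate with a personal access token that has push access to the repo
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "token", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));
// GET /repos/{owner}/{repo}/traffic/views?per=day returns daily views for the last 14 days
string json = await http.GetStringAsync("https://api.github.com/repos/owner/repo/traffic/views?per=day");
Console.WriteLine(json);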

Azure Functions Saving Me Again

It seems like whenever I blog, the topic is usually some silly task that I have automated with Azure Functions, and this is a silly task, so time to automate! The first thing I needed to do was spin up Visual Studio (VS Code has great tooling for Functions as well) and create a new Azure Functions project. After that, it was time to plan out what I needed to do, which I concluded was:

  • Hit the GitHub API to get the traffic for some repos
  • Store that data somewhere I can easily build a Power BI report from

I wanted to get a working prototype as soon as possible, so I just picked a random repo out of the many I have and wired up the call to get the traffic data, which looked like this:

// Octokit client, identified by a product header and authenticated with a personal access token
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));
var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;
// Daily view counts for the trailing 14 days
var data = await client.Repository.Traffic.GetViews("owner", "repo", new RepositoryTrafficRequest(TrafficDayOrWeek.Day));

I found out very quickly that in order to get traffic data for a repo, I need push access, so I needed a token. After getting a token from GitHub and putting it in the GitHubToken app setting, I was seeing data come in.

Now that I had some data, I started to think about the quickest way to get that data into storage. I could put it in a SQL table, but that seemed like a ton of overkill for what I was doing (I would have had to create a server as well). Then I realized that I already needed an Azure Storage account to store the metadata for Azure Functions, and that same account has a Table feature that would allow me to store the data (Cosmos DB is an option here, but a little overkill as well). So the next thing I needed to do was structure my data and wire up all the bits to insert and update records in there. Good thing I could “borrow” all this code from the Table API docs, so that was solved pretty quickly. At this point, I wired it all up and had a function that hits the GitHub API and stores the results in Azure Table Storage.
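
The “borrowed” pieces roughly look like this; a minimal sketch, assuming the Microsoft.Azure.Cosmos.Table SDK (which matches the CloudTable types used later), with the RepoStats shape inferred from the code further down:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;

public class RepoStats : TableEntity
{
    public RepoStats() { } // parameterless ctor required by the SDK for deserialization
    public RepoStats(string partitionKey, string rowKey) : base(partitionKey, rowKey) { }

    public string OrgName { get; set; }
    public string CampaignName { get; set; }
    public string RepoName { get; set; }
    public string Date { get; set; }
    public int Views { get; set; }
    public int UniqueUsers { get; set; }
}

public static class TableStorageHelper
{
    public static async Task<CloudTable> CreateTableAsync(string tableName)
    {
        // Reuse the Functions storage account as the data store
        var account = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        CloudTable table = account.CreateCloudTableClient().GetTableReference(tableName);
        await table.CreateIfNotExistsAsync();
        return table;
    }

    public static async Task<T> InsertOrMergeEntityAsync<T>(CloudTable table, T entity) where T : TableEntity
    {
        // InsertOrMerge is an upsert: re-running the function for the same day merges rather than duplicates
        TableResult result = await table.ExecuteAsync(TableOperation.InsertOrMerge(entity));
        return result.Result as T;
    }
}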

Scaling for More Repos

Now that I had something working, it was time to build some code that I could use to collect data for all my repos. I hit on the idea that if I had some JSON representation of all my repos, I could easily alter the list and the function would just pick it up as it runs. I knew I had a hierarchy of sorts in my case:

  • Campaign Name (a collection of repos)
    • Organization Name (also known as the “owner”)
      • Repository Name (the repo itself)

So I created a class structure and a JSON schema that represent that (the classes are sketched after the JSON):

[
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [ { "RepoName": "" } ]
  },
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [ { "RepoName": "" } ]
  }
]
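
The matching classes are simple POCOs; a quick sketch, with the property names following the JSON above:

using System.Collections.Generic;

public class Campaign
{
    public string CampaignName { get; set; }
    public string OrgName { get; set; }
    public List<Repo> Repos { get; set; }
}

public class Repo
{
    public string RepoName { get; set; }
}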

Now, with a schema, I am able to effectively store and group all my traffic data. I decided to store the JSON file that houses my campaigns as a blob in the same Azure Storage account as the data store; reuse is awesome! So the flow of my code looks like this:

  • Read the JSON campaigns from Blob Storage
  • Loop through all the campaigns and the repos in each campaign
  • Call the GitHub API to get traffic data
  • Build a POCO that represents a row in Azure Table Storage
  • Store said data

A list is cool, but what about the code?

// Authenticate the Octokit client with the same personal access token as before
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));

var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;

// Read the campaign list (Repos.json) from the same storage account the Functions runtime uses
string storageConnectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
BlobServiceClient blobServiceClient = new BlobServiceClient(storageConnectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("repos");
BlobClient blobClient = containerClient.GetBlobClient("Repos.json");
BlobDownloadInfo download = await blobClient.DownloadAsync();

List<Campaign> campaigns = JsonConvert.DeserializeObject<List<Campaign>>(new StreamReader(download.Content).ReadToEnd());

List<RepoStats> views = new List<RepoStats>();

foreach (Campaign campaign in campaigns)
{
    // Skip incomplete entries so a half-edited JSON file doesn't blow up the run
    if (!string.IsNullOrEmpty(campaign.CampaignName) && !string.IsNullOrEmpty(campaign.OrgName))
    {
        foreach (Repo repo in campaign.Repos)
        {
            if (!string.IsNullOrEmpty(repo.RepoName))
            {
                // Daily view counts for the trailing 14 days
                var data = await client.Repository.Traffic.GetViews(campaign.OrgName, repo.RepoName, new RepositoryTrafficRequest(TrafficDayOrWeek.Day));
                foreach (var item in data.Views)
                {
                    // PartitionKey = campaign + repo; RowKey = the date with slashes stripped,
                    // so re-running on the same day merges rather than duplicates rows
                    var stat = new RepoStats($"{campaign.CampaignName}{repo.RepoName}", item.Timestamp.UtcDateTime.ToShortDateString().Replace("/", ""))
                    {
                        OrgName = campaign.OrgName,
                        CampaignName = campaign.CampaignName,
                        RepoName = repo.RepoName,
                        Date = item.Timestamp.UtcDateTime.ToShortDateString(),
                        Views = item.Count,
                        UniqueUsers = item.Uniques
                    };
                    views.Add(stat);
                }
                // Pause between repos to stay well clear of API rate limits;
                // Task.Delay avoids blocking the thread the way Thread.Sleep would
                await Task.Delay(3000);
            }
        }
    }
}

string tableName = "RepoStats";
CloudTable table = await TableStorageHelper.CreateTableAsync(tableName);

// Upsert one row per repo per day into Azure Table Storage
foreach (var view in views)
{
    Console.WriteLine("Insert an Entity.");
    await TableStorageHelper.InsertOrMergeEntityAsync(table, view);
}

I am glossing over a bit here, but most of the code you don’t see is boilerplate, and you can look at the repo if you are curious. Now all I needed to do was hook it all up to a function that runs on a CRON timer once a day, and reap the rewards.
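
The timer hookup is just an attribute on the function; a minimal sketch, assuming the in-process C# model (the function name and schedule here are illustrative, not from the repo):

[FunctionName("CollectRepoTraffic")]
public static async Task Run(
    [TimerTrigger("0 0 6 * * *")] TimerInfo timer, // six-field CRON: every day at 06:00 UTC
    ILogger log)
{
    log.LogInformation($"Collecting GitHub traffic at {DateTime.UtcNow}");
    // ...all of the collection code above goes here...
}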

Hope you enjoy!

I had fun with this little project, and it solved a need, which is exactly what I love. From idea to deployed it took a little less than an hour; man, Azure Functions are awesome. As always, feel free to let me know your thoughts. TTFN!

