Storing GitHub Traffic with Azure Functions

Frustrated by GitHub Insights

As an owner of a handful of large public repos for my job, one of the things I care about is knowing the traffic to those repos so I can understand their visibility. GitHub has an amazing feature for this, called GitHub Insights, which lets you see some of this data; however, it is capped at a date range of 14 days in the past. I am not sure about the long-term goals of Insights, but if that cap stays, it will frustrate folks, some of them very well known folks.

Storing Traffic Data Over Time

I have seen a few ways people are getting around this date cap, written in all sorts of languages on all sorts of platforms. They all have one thing in common: they all use the GitHub API traffic endpoint to get this data (a sketch of the raw endpoint follows the list below). This API has all the info I need, so thinking like a lazy developer, here are my options:

  • Go to the Traffic pages for all my repos twice a month and collect the data by hand (yea…. no)
  • Write some tool that will “poll” the GitHub API and store the results in some data store
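
For reference, here is a minimal sketch of polling that raw endpoint directly with HttpClient (from System.Net.Http); the GITHUB_TOKEN variable name and the owner/repo values are just placeholders:

// GitHub's REST API requires a User-Agent header on every request
var http = new HttpClient();
http.DefaultRequestHeaders.UserAgent.ParseAdd("TrackRepos");
// Authenticate with a personal access token that has push access to the repo
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "token", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));
// GET /repos/{owner}/{repo}/traffic/views?per=day returns daily views for the last 14 days
string json = await http.GetStringAsync("https://api.github.com/repos/owner/repo/traffic/views?per=day");
Console.WriteLine(json);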

Azure Functions Saving Me Again

It seems like whenever I blog, the topic is usually some silly task that I have automated with Azure Functions, and this is a silly task, so time to automate! The first thing I needed to do was spin up Visual Studio (VS Code has great tooling for Functions as well) and create a new Azure Functions project. After that, it was time to plan out what I needed to do, which I concluded was:

  • Hit the GitHub API to get the traffic for some repos
  • Store that data somewhere I can easily build a Power BI report from

I wanted to get a working prototype as soon as possible, so I just picked a random repo out of the many I have and wired up the call to get the traffic data, which looked like this:

// Octokit client, identified by a product header and authenticated with a personal access token
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));
var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;
// Daily view counts for the trailing 14 days
var data = await client.Repository.Traffic.GetViews("owner", "repo", new RepositoryTrafficRequest(TrafficDayOrWeek.Day));

I found out very quickly that in order to get traffic data for a repo, I need push access, so I needed a token. After getting a token from GitHub and putting it in the GitHubToken app setting, I was seeing data come in.

Now that I had some data, I started to think about the quickest way to get that data into storage. I could put it in a SQL table, but that seemed like a ton of overkill for what I was doing (I would have had to create a server as well). Then I realized that I already needed an Azure Storage account to store the metadata for Azure Functions, and that same account has a Table feature that would allow me to store the data (Cosmos DB is an option here, but a little overkill as well). So the next thing I needed to do was structure my data and wire up all the bits to insert and update records in there. Good thing I could “borrow” all this code from the Table API docs, so that was solved pretty quickly. At this point, I wired it all up and had a function that hits the GitHub API and stores the results in Azure Table Storage.
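
The “borrowed” pieces roughly look like this; a minimal sketch, assuming the Microsoft.Azure.Cosmos.Table SDK (which matches the CloudTable types used later), with the RepoStats shape inferred from the code further down:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;

public class RepoStats : TableEntity
{
    public RepoStats() { } // parameterless ctor required by the SDK for deserialization
    public RepoStats(string partitionKey, string rowKey) : base(partitionKey, rowKey) { }

    public string OrgName { get; set; }
    public string CampaignName { get; set; }
    public string RepoName { get; set; }
    public string Date { get; set; }
    public int Views { get; set; }
    public int UniqueUsers { get; set; }
}

public static class TableStorageHelper
{
    public static async Task<CloudTable> CreateTableAsync(string tableName)
    {
        // Reuse the Functions storage account as the data store
        var account = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        CloudTable table = account.CreateCloudTableClient().GetTableReference(tableName);
        await table.CreateIfNotExistsAsync();
        return table;
    }

    public static async Task<T> InsertOrMergeEntityAsync<T>(CloudTable table, T entity) where T : TableEntity
    {
        // InsertOrMerge is an upsert: re-running the function for the same day merges rather than duplicates
        TableResult result = await table.ExecuteAsync(TableOperation.InsertOrMerge(entity));
        return result.Result as T;
    }
}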

Scaling for More Repos

Now that I had something working, it was time to build some code that I could use to collect data for all my repos. I hit on the idea that if I had some JSON representation of all my repos, I could easily alter the list and the function would just pick it up as it runs. I knew I had a hierarchy of sorts in my case:

  • Campaign Name (a collection of repos)
    • Organization Name (also known as the “owner”)
      • Repository Name (the repo itself)

So I created a class structure and a JSON schema that represent that (the classes are sketched after the JSON):

[
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [ { "RepoName": "" } ]
  },
  {
    "CampaignName": "",
    "OrgName": "",
    "Repos": [ { "RepoName": "" } ]
  }
]
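
The matching classes are simple POCOs; a quick sketch, with the property names following the JSON above:

using System.Collections.Generic;

public class Campaign
{
    public string CampaignName { get; set; }
    public string OrgName { get; set; }
    public List<Repo> Repos { get; set; }
}

public class Repo
{
    public string RepoName { get; set; }
}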

Now, with a schema, I am able to effectively store and group all my traffic data. I decided to store the JSON file that houses my campaigns as a blob in the same Azure Storage account as the data store; reuse is awesome! So the flow of my code looks like this:

  • Read the JSON campaigns from Blob Storage
  • Loop through all the campaigns and the repos in each campaign
  • Call the GitHub API to get traffic data
  • Build a POCO that represents a row in Azure Table Storage
  • Store said data

A list is cool, but what about the code?

// Authenticate the Octokit client with the same personal access token as before
var client = new GitHubClient(new ProductHeaderValue("TrackRepos"));

var basicAuth = new Credentials(Environment.GetEnvironmentVariable("GitHubToken"));
client.Credentials = basicAuth;

// Read the campaign list (Repos.json) from the same storage account the Functions runtime uses
string storageConnectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
BlobServiceClient blobServiceClient = new BlobServiceClient(storageConnectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient("repos");
BlobClient blobClient = containerClient.GetBlobClient("Repos.json");
BlobDownloadInfo download = await blobClient.DownloadAsync();

List<Campaign> campaigns = JsonConvert.DeserializeObject<List<Campaign>>(new StreamReader(download.Content).ReadToEnd());

List<RepoStats> views = new List<RepoStats>();

foreach (Campaign campaign in campaigns)
{
    // Skip incomplete entries so a half-edited JSON file doesn't blow up the run
    if (!string.IsNullOrEmpty(campaign.CampaignName) && !string.IsNullOrEmpty(campaign.OrgName))
    {
        foreach (Repo repo in campaign.Repos)
        {
            if (!string.IsNullOrEmpty(repo.RepoName))
            {
                // Daily view counts for the trailing 14 days
                var data = await client.Repository.Traffic.GetViews(campaign.OrgName, repo.RepoName, new RepositoryTrafficRequest(TrafficDayOrWeek.Day));
                foreach (var item in data.Views)
                {
                    // PartitionKey = campaign + repo; RowKey = the date with slashes stripped,
                    // so re-running on the same day merges rather than duplicates rows
                    var stat = new RepoStats($"{campaign.CampaignName}{repo.RepoName}", item.Timestamp.UtcDateTime.ToShortDateString().Replace("/", ""))
                    {
                        OrgName = campaign.OrgName,
                        CampaignName = campaign.CampaignName,
                        RepoName = repo.RepoName,
                        Date = item.Timestamp.UtcDateTime.ToShortDateString(),
                        Views = item.Count,
                        UniqueUsers = item.Uniques
                    };
                    views.Add(stat);
                }
                // Pause between repos to stay well clear of API rate limits;
                // Task.Delay avoids blocking the thread the way Thread.Sleep would
                await Task.Delay(3000);
            }
        }
    }
}

string tableName = "RepoStats";
CloudTable table = await TableStorageHelper.CreateTableAsync(tableName);

// Upsert one row per repo per day into Azure Table Storage
foreach (var view in views)
{
    Console.WriteLine("Insert an Entity.");
    await TableStorageHelper.InsertOrMergeEntityAsync(table, view);
}

I am glossing over a bit here, but most of the code you don’t see is boilerplate, and you can look at the repo if you are curious. Now all I needed to do was hook it all up to a function that runs on a CRON timer once a day, and reap the rewards.
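
The timer hookup is just an attribute on the function; a minimal sketch, assuming the in-process C# model (the function name and schedule here are illustrative, not from the repo):

[FunctionName("CollectRepoTraffic")]
public static async Task Run(
    [TimerTrigger("0 0 6 * * *")] TimerInfo timer, // six-field CRON: every day at 06:00 UTC
    ILogger log)
{
    log.LogInformation($"Collecting GitHub traffic at {DateTime.UtcNow}");
    // ...all of the collection code above goes here...
}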

Hope you enjoy!

I had fun with this little project, and it solved a need, which is exactly what I love. From idea to deployed it took a little less than an hour; man, Azure Functions are awesome. As always, feel free to let me know your thoughts. TTFN!

