Practical use of Ruby PStore

Category: IT technology · Published: 4 years ago


The Arkency blog has undergone several improvements over recent weeks. One such change was opening the source of the blog articles. We’ve concluded that having posts in the open would shorten the feedback loop and allow our readers to collaborate and make the articles better for all.

Nanoc + Github

For years the blog has been driven by nanoc, which is a static-site generator. You put a bunch of markdown files in, drop in a layout, and out the other side comes the HTML. Let’s call this magic “compilation”. One of nanoc’s prominent features is data sources. With them one can render content not only from the local filesystem. Given an appropriate adapter, posts, pages or other data items can be fetched from a 3rd party source. Like an SQL database. Or Github!

Choosing Github as a backend for our posts was a no-brainer. Developers are familiar with it. It has quite a nice integrated web editor with Markdown preview, which gives us in-place editing. Pull requests create the space for discussion. Last but not least, there is the octokit gem for API interaction, taking much of the implementation burden off our shoulders.

An initial data adapter to fetch articles looked like this:

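require "octokit"

class Source < Nanoc::DataSource
  identifier :github

  def items
    client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])
    client
      .contents(ENV['GITHUB_REPO'])                    # one request: list the repository root
      .select { |item| item[:name].end_with?(".md") }  # keep Markdown files only
      .map    { |item| client.contents(ENV['GITHUB_REPO'], path: item[:path]) } # one request per file
      # note: the API returns file content Base64-encoded; decoding is left out of this spike
      .map    { |item| new_item(item[:content], item, Nanoc::Identifier.new("/#{item[:path]}")) }
  end
end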

This code:

  • gets a list of files in repository
  • filters it by extension to only let markdowns stay
  • gets content of each markdown file
  • transforms it into a nanoc item object

Good enough for a quick spike and exploration of the problem. It becomes problematic as soon as you start using it for real. Can you spot the problems?

Source data improved

For a repository with 100 markdown files we will have to make 100 + 1 HTTP requests in order to retrieve the content: one to list the files and one more per file. That hurts in two ways:

  • it takes time and becomes annoying when you’re in the change-layout-recompile-content cycle of the work on the site
  • there is an API request limit per hour (higher when using a token, but still present)

Making those requests in parallel will only make us hit the request quota faster. Something has to be done to limit the number of requests needed.
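To put rough numbers on the quota problem, here is a back-of-the-envelope sketch. The limits used (60 requests per hour without a token, 5000 with one) are my understanding of the default Github REST API limits and should be treated as assumptions; compiles_per_hour is a hypothetical helper, not part of nanoc or octokit.

```ruby
# Hypothetical helper: how many full compilations fit into one hour of quota,
# given that each compilation costs one listing request plus one per file.
def compiles_per_hour(hourly_limit, markdown_files)
  requests_per_compile = markdown_files + 1
  hourly_limit / requests_per_compile # integer division: only whole compiles count
end

UNAUTHENTICATED_LIMIT = 60   # assumed requests per hour without a token
AUTHENTICATED_LIMIT   = 5000 # assumed requests per hour with a token

puts compiles_per_hour(UNAUTHENTICATED_LIMIT, 100) # 0
puts compiles_per_hour(AUTHENTICATED_LIMIT, 100)   # 49
```

Under these assumptions a 100-post repository cannot be compiled even once per hour without a token, and a layout-tweaking session with a token would still run dry after a few dozen recompiles.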

Luckily enough, the octokit gem uses the faraday library for HTTP interaction, and some kind souls documented how one could leverage the faraday-http-cache middleware.

class Source < Nanoc::DataSource
  identifier :github

  def up
    stack = Faraday::RackBuilder.new do |builder|
      builder.use Faraday::HttpCache,
                  serializer: Marshal,
                  shared_cache: false
      builder.use Faraday::Request::Retry,
                  exceptions: [Octokit::ServerError]
      builder.use Octokit::Middleware::FollowRedirects
      builder.use Octokit::Response::RaiseError
      builder.use Octokit::Response::FeedParser
      builder.adapter Faraday.default_adapter
    end
    Octokit.middleware = stack
  end

  def items
    repository_items.map do |item|
      identifier     = Nanoc::Identifier.new("/#{item[:name]}")
      metadata, data = decode(item[:content])

      new_item(data, metadata, identifier, checksum_data: item[:sha])
    end
  end

  private

  def repository_items
    pool  = Concurrent::FixedThreadPool.new(10)
    items = Concurrent::Array.new
    client
      .contents(repository, path: path)
      .select { |item| item[:type] == "file" }
      .each   { |item| pool.post { items << client.contents(repository, path: item[:path]) } }
    pool.shutdown
    pool.wait_for_termination
    items
  rescue Octokit::NotFound
    # a missing repository or path simply yields no items
    []
  end

  def client
    Octokit::Client.new(access_token: access_token)
  end

  def repository
    # ...
  end

  def path
    # ...
  end

  def access_token
    # ...
  end

  def decode(content)
    # ...
  end
end

Notice two main additions here:

  • the up method, used by nanoc when spinning up the data source, which introduces the cache middleware
  • Concurrent::FixedThreadPool from the concurrent-ruby gem for concurrent requests in multiple threads

If only that cache worked… The middleware ships with an in-memory cache, which is useless for the workflow one has with nanoc. We’d very much like to persist the cache across runs of the compile process. The documentation does show how to switch the cache backend to one from Rails, but that is not helpful advice in the nanoc context either. You probably wouldn’t want to start a Redis or Memcached instance just to compile a bunch of HTML!

Time to roll up our sleeves again. Knowing what API is expected, we can build a file-based cache backend. And there is a little-known standard library gem we can use to free ourselves from reimplementing the basics. Standing on the shoulders of giants, once again.

Enter PStore

PStore is a file-based persistence mechanism built around a Hash. We can store Ruby objects in it; they’re serialized with Marshal before being written to disk. It supports transactional behaviour and can be made thread-safe. Sounds perfect for the job!
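If you have never touched PStore, a minimal sketch of the basics may help before we wrap it. The helper name pstore_roundtrip, the keys and the file name are made up for illustration; only PStore.new, transaction, abort and the [] / []= accessors come from the standard library.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper for illustration: exercises a store at the given path
# and returns what a fresh PStore instance reads back from the same file.
def pstore_roundtrip(path)
  store = PStore.new(path)

  # Read-write transaction: changes are committed when the block finishes.
  store.transaction do
    store[:title] = "Practical use of Ruby PStore"
    store[:tags]  = %w[ruby nanoc]
  end

  # abort leaves the transaction and discards the changes made inside it.
  store.transaction do
    store[:title] = "never written"
    store.abort
  end

  # A new instance on the same file stands in for a later process run.
  # Passing true opens a read-only transaction.
  PStore.new(path).transaction(true) do |reopened|
    { title: reopened[:title], tags: reopened[:tags] }
  end
end

Dir.mktmpdir do |dir|
  p pstore_roundtrip(File.join(dir, "example.store"))
end
```

The aborted title never reaches the disk, while the committed values survive being read through a completely separate PStore object. That second property is exactly what a cache that outlives a single compile run needs.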

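require "pstore"

# File-backed cache store exposing the read/write/delete interface
# that Faraday::HttpCache expects from its store.
class Cache
  def initialize(cache_dir)
    # second argument: make the store thread-safe
    @store = PStore.new(File.join(cache_dir, "nanoc-github.store"), true)
  end

  def write(name, value, options = nil)
    store.transaction { store[name] = value }
  end

  def read(name, options = nil)
    # read-only transaction
    store.transaction(true) { store[name] }
  end

  def delete(name, options = nil)
    store.transaction { store.delete(name) }
  end

  private
  attr_reader :store
end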

In the end that cache store turned out to be merely a wrapper around PStore. How convenient! Thread safety is achieved here by passing true as the second argument to PStore.new, which makes it use a Mutex internally around the transaction block.
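To see what that thread-safety flag buys us, here is a small sketch that hammers one store from several threads, much like the thread pool in the data source does with cache writes. The helper concurrent_increments and the :counter key are hypothetical names for this illustration.

```ruby
require "pstore"
require "tmpdir"

# Hypothetical helper: increments a counter from several threads that share
# one thread-safe PStore, then returns the final value.
def concurrent_increments(path, threads: 10, per_thread: 20)
  store = PStore.new(path, true) # true: transactions are serialized with a Mutex
  store.transaction { store[:counter] = 0 }

  workers = Array.new(threads) do
    Thread.new do
      per_thread.times do
        # Each read-modify-write happens inside its own transaction.
        store.transaction { store[:counter] += 1 }
      end
    end
  end
  workers.each(&:join)

  store.transaction(true) { store[:counter] }
end

Dir.mktmpdir do |dir|
  puts concurrent_increments(File.join(dir, "counter.store")) # 200
end
```

Because every transaction runs under the lock, no increment is lost. The price is that writes are serialized and each one rewrites the file, which is fine for a few hundred cached responses but would not be my choice for a busy server.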

class Source < Nanoc::DataSource
  identifier :github

  def up
    stack = Faraday::RackBuilder.new do |builder|
      builder.use Faraday::HttpCache,
                  serializer: Marshal,
                  shared_cache: false,
                  store: Cache.new(tmp_dir)
      # ...
    end
    Octokit.middleware = stack
  end

  # ...
end

With a persistent cache store plugged into Faraday we can now reap the benefits of cached responses. Subsequent requests to the Github API are skipped. Responses are served directly from local files. That is, as long as the cache stays fresh.

Cache validity can be controlled by several HTTP headers. In the case of the Github API it is Cache-Control: private, max-age=60, s-maxage=60 that matters. Together with the Date header this roughly means that the content will be valid for 60 seconds after the response was received. Is that a lot? For frequently changing content, probably. For blog articles I’d prefer something more long-lasting…
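The freshness rule described above can be sketched in a few lines. This is a deliberate simplification and not how faraday-http-cache is implemented: it looks only at max-age and Date and ignores everything else a real HTTP cache considers (Age, Expires, no-cache and friends). The helpers max_age and fresh? and the sample Date value are made up for illustration.

```ruby
require "time"

# Hypothetical helper: extracts max-age (in seconds) from a Cache-Control value.
# "s-maxage" is spelled without the inner hyphen, so it does not match here.
def max_age(cache_control)
  match = cache_control.match(/\bmax-age=(\d+)/)
  match && Integer(match[1])
end

# Hypothetical helper: a response counts as fresh while fewer than max-age
# seconds have passed since the time given in its Date header.
def fresh?(headers, now: Time.now)
  limit = max_age(headers.fetch("Cache-Control", ""))
  return false if limit.nil?

  received_at = Time.httpdate(headers.fetch("Date"))
  (now - received_at) < limit
end

SAMPLE_HEADERS = {
  "Cache-Control" => "private, max-age=60, s-maxage=60",
  "Date"          => "Tue, 01 Jun 2021 12:00:00 GMT" # made-up sample value
}.freeze

puts fresh?(SAMPLE_HEADERS, now: Time.utc(2021, 6, 1, 12, 0, 30)) # true
puts fresh?(SAMPLE_HEADERS, now: Time.utc(2021, 6, 1, 12, 1, 30)) # false
```

Thirty seconds after the response it would still be served from the cache; ninety seconds after, it would have to be fetched or revalidated again.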

And that is how we arrive at the last piece of nanoc-github: a faraday middleware that extends the cache time. It is quite a primitive piece of code that replaces the Cache-Control header with one carrying the desired max-age value. For my particular needs I set this value to 3600 seconds. The general idea is that we modify HTTP responses from the API before they hit the cache. Then the cache middleware examines cache validity based on the modified max-age rather than the original one. Simple and good enough. Just be careful to add this to the middleware stack in the correct order: after Faraday::HttpCache, so that responses pass through it before they reach the cache 😅

class ModifyMaxAge < Faraday::Middleware
  def initialize(app, time:)
    @app  = app
    @time = Integer(time)
  end

  def call(request_env)
    @app.call(request_env).on_complete do |response_env|
      response_env[:response_headers][:cache_control] = "public, max-age=#{@time}, s-maxage=#{@time}"
    end
  end
end
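Why the order matters is easier to see in a toy model. In a Faraday stack the middleware listed first is the outermost one, so a response travels back from the adapter through the stack in reverse order of listing. The classes below are stand-ins I made up for this sketch; none of them are Faraday's or nanoc-github's real classes.

```ruby
# Toy adapter: pretends to be the API answering with max-age=60.
class ToyAdapter
  def call(_env)
    { max_age: 60 }
  end
end

# Toy counterpart of ModifyMaxAge: rewrites max-age on the way back.
class ToyModifyMaxAge
  def initialize(app, time:)
    @app  = app
    @time = time
  end

  def call(env)
    @app.call(env).merge(max_age: @time)
  end
end

# Toy cache: records the max-age it would base freshness on.
class ToyCache
  attr_reader :seen

  def initialize(app)
    @app = app
  end

  def call(env)
    response = @app.call(env)
    @seen = response[:max_age]
    response
  end
end

# Hypothetical helper: builds both orderings and reports what the cache saw.
def max_age_seen_by_cache(modify_listed_after_cache:)
  adapter = ToyAdapter.new
  if modify_listed_after_cache
    # cache is outermost; modifier sits between cache and adapter
    cache = ToyCache.new(ToyModifyMaxAge.new(adapter, time: 3600))
    cache.call({})
  else
    # modifier is outermost; cache sees the untouched response
    cache = ToyCache.new(adapter)
    ToyModifyMaxAge.new(cache, time: 3600).call({})
  end
  cache.seen
end

puts max_age_seen_by_cache(modify_listed_after_cache: true)  # 3600
puts max_age_seen_by_cache(modify_listed_after_cache: false) # 60
```

With the modifier listed before the cache, the header still gets rewritten, just too late: the cache has already stored the response with the original 60 seconds.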

And that’s it! I hope you found this article useful and learned a thing or two. Drop me a line on Twitter or leave a star on the nanoc-github project.

Happy hacking!

