Building Hex Diff

栏目: IT技术 · 发布时间: 4年前

内容简介:I wanted to give some insight into the Hex Diff project, how it works, and some issues we ran into on the way. The project isI was excited to finally have a good use case for LiveView, and although it took some time to figure out how to interact withIf you

I wanted to give some insight into the Hex Diff project, how it works, and some issues we ran into on the way. The project is open-source , so feel free to look behind the curtain to see what makes it tick. Some parts of the project originate from my toy version, still running at https://diff.jola.dev , but others were written specifically for this project. We also ended up finding and fixing a bunch of bugs in other projects!

I was excited to finally have a good use case for LiveView, and although it took some time to figure out how to interact with select tags, the end result is amazingly snappy. Adding suggestions, based on what you’ve typed, and handling corner cases like packages with only one version was a breeze. The result is a user-friendly UI that searches as you type and when you find the package you’re looking for makes it easy to select a version range. A page like this really makes LiveView shine.

Reading Data From Hex

If you want to interact with Hex and you’re using a BEAM language, you’re in luck. The Hex team created a library called hex_core that cover all kinds of interesting functionality.

Hex has an API with loads of useful endpoints, including search, but we ended up not using those at all. The Hex API is rate limited based on your API key, and even though we could have set up a special key for this project, the very fact that the endpoints are rate-limited tell us something. Calling them is not free. If we can avoid them, we should.

Really, the only things we need are a list of package names and a list of versions for each of the packages, and we’ll be able to implement our own search. Luckily, hex_core also lets you query the CDN fronted registry itself for simple stuff, like getting all package names and versions! This means we can do super cheap requests for this data.

The last thing to consider is how LiveView works. While typing into the input field every single key input turns into an event on the backend. Although you could build throttling into this, the slower you are to react to the events, the less snappy the UI feels. Instead of loading all the repo data from Hex on mount into each LV process, or throttling or debouncing events, I went with a different approach. I set up an ETS table to own all the data, and then a process that periodically refreshes it with the latest results from Hex. Now we can query the package name and version data directly from ETS, and not worry about LiveView overloading anything while making the UI super fast.

Generating and Rendering Diffs

The first toy version of my web differ just shelled out to mix hex.package diff and used a JS library on the frontend to render the diff. This was an easy way to get started, but it limited our ability to make improvements. We needed to move more of the functionality into the project, and into Elixir.

To avoid requiring mix on the machine that runs diff.hex.pm , we recreated the functionality in mix hex.package diff command. Again relying on hex_core , which also exposes functions for downloading package tarballs and unpacking them. Once we have the two packages unpacked locally, it’s just a question of shelling out to git diff . A possible route of improvement, also suggested by Hauleth on the forum , is to run the git diff algorithm as a NIF, to avoid shelling out at all!

But now we have a text representation of a diff. How to do we get to rendered HTML from here? Luckily I found an Elixir package that parses that text representation and turns it into Elixir structs. To verify that the package works correctly, I downloaded the entire Hex registry, created diffs for the first and last version of each, and parsed all of them. I found some diffs that broke the parsing logic, and we were able to fix those bugs.

Once we had the diff represented as Elixir data structures, it was just a question of rendering it as HTML. Because this was already a Phoenix project, Phoenix.View.render_to_iodata did the trick. Note that the diff itself is rendered separately from the response sent to users. Every time a new diff is generated it is rendered and stored in the cloud. This means if you request the same package and from/to range, we serve a pre-rendered diff!

The key to effective use of caching is in how you create the cache key. An early version of Diff based the key on the package name, from version, and to version, but that introduces a subtle bug where the Diff service can return incorrect diffs. When a package is pushed to Hex it is immutable and kept forever, except not immediately. The package uploader has a grace period, where they can upload a changed version of the package. This means package maintainers have a chance to fix their mistake if they accidentally uploaded a broken package and noticed in time. It also means that an attacker with access to the credentials to upload a package can create a new version, ask Diff to generate a diff for it (and cache it), and then modify the package on Hex. The diff would be unchanged because the key only relies on name and version range.

To avoid this problem, Diff also incorporates the latest checksum of the two diffed packages in the key. So even if a diff is created in the grace period, it wouldn’t be served after the package modified because the checksum would change. Finally, in order to update the rendering logic for diffs, the key also includes a cache version key. In order to invalidate the diff cache, all you have to do is bump the cache version key in the configuration. Or clear out the storage itself. Here’s the module for local diff storage.

GCP Load Balancer

One thing we noticed just as it was put into production was that the GCP load balancer is not very friendly to apps using WebSockets. Unlike many other load balancers, it doesn’t differentiate between active and inactive connections. Both are closed after the timeout, which is 30s by default , and there’s no way to configure it to allow active connections to stay around longer than that timeout. The way Phoenix handles WebSockets is that it keeps the connection alive by repeatedly sending heartbeats, but that doesn’t help here.

For many WebSocket applications that might be fine, you just re-establish the WebSocket connection. But LiveView doesn’t handle it well. When the connection is lost, the LiveView process on the server-side dies. That means that when you reconnect, the state is lost. Steve Bussey , the author of Real-Time Phoenix , pointed me to this forum thread where Chris Mccord mentions a potential solution where you keep the state on the client-side. That way, when you reconnect, you can rehydrate the LiveView process with the state from the client. Hopefully, a future version of LiveView will have this built-in. Just as an experiment I implemented this, but it’s a lot of code to maintain when there’s an easier solution that covers most cases.

According to GCP’s documentation , if you want to use WebSockets or other long-running connections, the solution is just to increase the timeout. That’s what we did, and now it’s fairly unlikely that anyone is interrupted by the WebSocket connection timing out!

Testing the search view was an interesting problem. There’s a lot of logic there, plus the search itself ultimately relies on Hex. I decided to use Mox to be able to control the results of get_packages and get_versions in the search view. Once I had that set up, using the LiveView testing helpers made the rest a walk in the park. Being able to do this without having to set up automated browser testing was a great relief! You can see what that looks like here .


以上所述就是小编给大家介绍的《Building Hex Diff》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

颠覆式创新:移动互联网时代的生存法则

颠覆式创新:移动互联网时代的生存法则

李善友 / 机械工业出版社 / 2015-3-1

为什么把每件事情都做对了,仍有可能错失城池?为什么无人可敌的领先企业,却在一夜之间虎落平阳?短短三年间诺基亚陨落,摩托罗拉以区区29亿美元出售给联想,芯片业霸主英特尔在移动芯片领域份额几乎为零,风光无限的巨头转眼成为被颠覆的恐龙,默默无闻的小公司一战成名迅速崛起,令人瞠目结舌的现象几乎都能被“颠覆式创新”法则所解释。 颠覆式创新教你在新的商业竞争中“换操作系统”而不是“打补丁”,小公司用破坏......一起来看看 《颠覆式创新:移动互联网时代的生存法则》 这本书的介绍吧!

JSON 在线解析
JSON 在线解析

在线 JSON 格式化工具

HTML 编码/解码
HTML 编码/解码

HTML 编码/解码

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码