Symlinks and hardlinks, move over, make room for reflinks!

If you’ve been around Linux for a while, you know about symlinks and hardlinks. You’ve used them and you know the differences between how each of them behaves. Besides being a useful filesystem tool, they’re also a favorite interview question, used to gauge a candidate’s familiarity with filesystems.

What you might not know is that there’s also a thing called a reflink. Right now it’s supported on only a handful of filesystems, such as Apple’s APFS, used on all Apple devices; XFS, which is used on lots of Linux file-sharing servers; Btrfs; OCFS2; and Microsoft’s ReFS.

If a symlink is a shortcut to another file, and a hardlink is a first-class pointer to the same inode as another file, what’s a reflink, and when is it useful?

A reflink is a tool for doing copy-on-write on the filesystem.

If you’ve heard the term copy-on-write before, I’m willing to bet that it was in the context of the Linux fork call. Let’s talk a bit about that.

Copy-on-write when forking a process

When you fork a process in Linux, the new process gets its own copy of the old process’s memory. This is essential, because if the two processes shared memory, either one could crash when the other made an unexpected change to it. Therefore, Linux needs to make a copy.

However, Linux is smart, and it knows better than to just make a naive copy. Making a naive copy could be a waste of memory, especially if your process has several gigabytes of memory allocated, and you’re forking lots of processes for small tasks. If Linux were to make naive copies, you could find yourself with an out-of-memory crash very quickly.

When you fork a process, Linux uses copy-on-write to create the new process’s memory. This means that it holds off on making actual copies of the existing memory pages until the last possible moment, which is the moment when the two processes start having different ideas about what the content of those pages should be. In other words, as soon as one of the processes starts writing to a memory page, Linux makes a copy of that page, so that each process ends up with its own private version.

This is a huge boon, because most of the time, the new process will either only be reading the memory, or not even that. So many copy actions are avoided thanks to this technique. The beauty part is that these shenanigans are completely transparent to the process, and to the developer who’s writing the logic that this process performs. The new process behaves as if it has its own copy of the parent’s memory pages, and the floor is being paved ahead of it as it walks forward, so to speak. It’ll never even know that copy-on-write was performed.
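The isolation described above can be observed from Python. What follows is a minimal sketch, assuming a Unix-like system where os.fork is available; the function name fork_and_write is made up for this illustration. It shows only the visible effect (each process sees its own data), not the kernel's page bookkeeping, which stays hidden from the program.

```python
import os


def fork_and_write():
    """Fork, let the child modify its memory, and return what each side sees."""
    data = bytearray(b"original")
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: writes here land on the child's private pages,
        # so the parent never observes them.
        try:
            os.close(read_fd)
            data[:] = b"modified"
            os.write(write_fd, bytes(data))
        finally:
            os._exit(0)  # exit right away without running the parent's cleanup

    # Parent process: collect what the child saw, then reap the child.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_write()
    print(parent_view, child_view)  # b'original' b'modified'
```

The parent still sees its original bytes after the child has overwritten its own version, which is the behavior copy-on-write has to preserve.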

Now we’re ready to talk about reflinks.

Reflinks are copy-on-write for the filesystem

If you read the section above, you already know 90% of what you need to know to understand and use reflinks.

A reflink is a copy of a file, except that the copy isn’t really created on the drive until the last possible moment. As with forking, this logic is invisible. You could make a reflink of a 10 gigabyte file, and the new “copy” would be created immediately, because the 10 gigabytes wouldn’t really be duplicated. Data only gets duplicated once you start modifying one of the copies, and then only the blocks you actually change.

All the while, you could treat the reflink as if it was a completely legitimate copy of your original file.

How do you create reflinks?

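On Linux, run the following. With GNU cp, a bare --reflink fails if the filesystem can’t create a reflink; use --reflink=auto if you’d rather fall back to a regular copy: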

$ cp --reflink old_file new_file

On Mac, there’s a different flag for some reason:

$ cp -c old_file new_file

If you’re creating reflinks programmatically, you could also use dedicated libraries such as this one for Python.
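If you’d rather not depend on a library, here is a minimal Linux-only sketch that asks the kernel for a reflink directly through the FICLONE ioctl and falls back to an ordinary copy when that fails. The function name reflink_or_copy is made up for this illustration, and the FICLONE value is the request number from the Linux headers; this is not how any particular library does it.

```python
import fcntl
import shutil

# FICLONE request number from <linux/fs.h>. Linux-specific assumption.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Create dst as a reflink of src, or as a regular copy if that fails.

    Returns True if a reflink was made, False if a full copy was made.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's data blocks with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Typically: the filesystem has no reflink support, or src and
            # dst live on different filesystems.
            pass
    shutil.copyfile(src, dst)
    return False
```

Calling reflink_or_copy("old_file", "new_file") gives you a usable new_file either way; the return value only tells you whether the cheap path was taken.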

When are reflinks useful?

Here’s an example of where I’ve used reflinks for a client of mine years back. They had a tool for developers that takes their entire codebase and copies it into a Docker container to run tests on it. (Don’t ask.)

That recursive copying took a while, and the developers couldn’t change their code in the meantime, or check out any other branches, because then an inconsistent version of their code would be copied into the container. That was pretty annoying for me personally, because I was twiddling my thumbs whenever I started the test process.

I figured, why not use reflinks?

I wrote some Python code that creates reflinks to the code in a temporary folder, and then does a real copy from that temporary folder to the Docker container. The big advantage here is that as soon as the reflinks were created, I could modify the original code as much as I wanted, without affecting the tests.
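The original code isn’t shown here, so the following is only a rough sketch of the idea, written for Linux rather than for the Macs mentioned in this story. The names clone_file and snapshot_tree and the scratch_dir parameter are made up for this illustration. It snapshots a directory tree into a temporary folder, using a reflink for each file where the filesystem allows it and a regular copy otherwise.

```python
import fcntl
import shutil
import tempfile
from pathlib import Path

FICLONE = 0x40049409  # Linux ioctl request number from <linux/fs.h>


def clone_file(src, dst):
    """copy_function for shutil.copytree: reflink if possible, else copy."""
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
    except OSError:
        shutil.copyfile(src, dst)  # no reflink support here: do a real copy
    shutil.copystat(src, dst)  # keep permissions and timestamps
    return dst


def snapshot_tree(source_dir, scratch_dir=None):
    """Snapshot source_dir into a fresh temporary folder and return its path.

    scratch_dir should be on the same filesystem as source_dir, because
    reflinks cannot cross filesystems.
    """
    root = Path(tempfile.mkdtemp(prefix="snapshot-", dir=scratch_dir))
    snapshot = root / "tree"
    shutil.copytree(source_dir, snapshot, copy_function=clone_file)
    return snapshot
```

Once snapshot_tree returns, the slow copy into the container can read from the snapshot while you keep editing the original tree.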

Fortunately, all the developers were using Macs in that company, so I knew I didn’t have to worry about filesystem support.

How can reflinks go wrong?

You might be thinking, “What happens if I create a reflink of a huge file that’s bigger than the amount of space I have available on the hard drive?”

I’ve never tried this, but here’s what I’ve heard: the reflink will be created, but you’ll get an error as soon as one of the copies is changed and actual copying needs to happen. Take this into account if you’re relying on reflinks in your business logic.
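If that worries you, one cautious option is to check up front whether the drive could hold a full second copy of the file. This is a minimal sketch; the function name worst_case_fits is made up for this illustration, and the check is only advisory, since free space can change after you look.

```python
import os
import shutil


def worst_case_fits(path):
    """Return True if the filesystem holding path has room for a full copy."""
    needed = os.path.getsize(path)
    folder = os.path.dirname(os.path.abspath(path))
    free = shutil.disk_usage(folder).free
    return needed <= free
```

If worst_case_fits returns False, later writes to either copy might fail once enough of the data has diverged.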

Written on June 8th, 2020 by Ram Rachum

I’m a software developer based in Israel, specializing in the Python programming language. I write about technology, programming, startups, Python, and any other thoughts that come to my mind.

I’m sometimes available for freelance work in Python and Django. My expertise is in developing a product from scratch.
