Bloomfilters debunked: Dispelling 30 Years of math with Coq

栏目: IT技术 · 发布时间: 5年前

内容简介:There's this rather nifty feature of modern web browsers (such asWhile

There's this rather nifty feature of modern web browsers (such as Firefox or Chrome ) where the browser will automatically warn the user if they happen to navigate to a " malicious " url:

Bloomfilters debunked: Dispelling 30 Years of math with Coq

While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.

This is because, on one hand, trying to store the database of all known malicious URLs, which, bear in mind, may contain millions and millions of entries, is something that is just practically infeasible for most users.

Bloomfilters debunked: Dispelling 30 Years of math with Coq

Conversely, sending every URL that a user visits to some external service, where it could be logged and data-mined by nefarious third parties (i.e Google), is something that most users should probably not be comfortable with.

As it turns out, browsers actually implement this functionality by means of a rather interesting compromise .

Using a probabilistic data structure known as a Bloomfilter , Browsers maintain a approximate representation of the set of known malicious URLs locally. By querying this space-efficient local set, browsers will only send up a small proportion of URLs that have a high likelihood of actually being malicious.

Use a Bloomfilter to act as a local proxy for queries to an external database .

This proxy technique, which safe-guards the privacy of millions of users ever day, depends on two key properties of Bloomfilters:

No false negatives
This states that if a query for an URL in the Bloomfilter returns negative, then the queried item can be guaranteed to not be present in the set of malicious URLs - i.e the Bloomfilter is guaranteed to return a positive result for all known malicious URLs .
Low false positive rate
This property states that for any URL that is not in the set of malicious URLs, the likelihood of a Bloomfilter query returning a positive result should be fairly low - thus minimising the number of unnecessary infractions on user privacy.

This mechanism works quite well, and the guarantees of a Bloomfilter seem to be perfectly suited for this particular task, however it has one small problem, and that is:

The widely cited expression for the false positive rate of a bloomfilter is wrong!

In fact, as it turns out, the behaviours of a Bloomfilter have actually been the subject of 30 years of mathematical contention, requiring multiple corrections and even corrections of these corrections .

Given this history of errors, can we really have any certainty in our understanding of a Bloomfilter at all?

Well, never fear , I am writing this post to inform you that we have just recently used Coq to produce the very first certified proof of the false positive rate of a Bloomfilter, finally putting an end to this saga of errors and returning certainty (pardon the pun) to a mechanism that countless people rely on every single day.

In the rest of this post, we'll explore the main highlights of this research, answering the following questions:

  • What is a Bloomfilter?
  • Why did these errors initially arise?
  • What is the true behaviour of a Bloomfilter? and how can we be certain?

This research was just recently presented at CAV2020 under the title "Certifying Certainty and Uncertainty in Approximate Membership Query Structures", you can find the paper here , and a video presentation of it here .

The code and proofs used in this research project are FOSS, and can be found on the Ceramist repo: https://github.com/certichain/ceramist


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

奔跑吧 Linux内核

奔跑吧 Linux内核

张天飞 / 人民邮电出版社 / 2017-9-1 / CNY 158.00

本书内容基于Linux4.x内核,主要选取了Linux内核中比较基本和常用的内存管理、进程管理、并发与同步,以及中断管理这4个内核模块进行讲述。全书共分为6章,依次介绍了ARM体系结构、Linux内存管理、进程调度管理、并发与同步、中断管理、内核调试技巧等内容。本书的每节内容都是一个Linux内核的话题或者技术点,读者可以根据每小节前的问题进行思考,进而围绕问题进行内核源代码的分析。 本书内......一起来看看 《奔跑吧 Linux内核》 这本书的介绍吧!

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具

随机密码生成器
随机密码生成器

多种字符组合密码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器