Bloomfilters debunked: Dispelling 30 Years of math with Coq

栏目: IT技术 · 发布时间: 5年前

内容简介:There's this rather nifty feature of modern web browsers (such asWhile

There's this rather nifty feature of modern web browsers (such as Firefox or Chrome ) where the browser will automatically warn the user if they happen to navigate to a " malicious " url:

Bloomfilters debunked: Dispelling 30 Years of math with Coq

While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.

This is because, on one hand, trying to store the database of all known malicious URLs, which, bear in mind, may contain millions and millions of entries, is something that is just practically infeasible for most users.

Bloomfilters debunked: Dispelling 30 Years of math with Coq

Conversely, sending every URL that a user visits to some external service, where it could be logged and data-mined by nefarious third parties (i.e Google), is something that most users should probably not be comfortable with.

As it turns out, browsers actually implement this functionality by means of a rather interesting compromise .

Using a probabilistic data structure known as a Bloomfilter , Browsers maintain a approximate representation of the set of known malicious URLs locally. By querying this space-efficient local set, browsers will only send up a small proportion of URLs that have a high likelihood of actually being malicious.

Use a Bloomfilter to act as a local proxy for queries to an external database .

This proxy technique, which safe-guards the privacy of millions of users ever day, depends on two key properties of Bloomfilters:

No false negatives
This states that if a query for an URL in the Bloomfilter returns negative, then the queried item can be guaranteed to not be present in the set of malicious URLs - i.e the Bloomfilter is guaranteed to return a positive result for all known malicious URLs .
Low false positive rate
This property states that for any URL that is not in the set of malicious URLs, the likelihood of a Bloomfilter query returning a positive result should be fairly low - thus minimising the number of unnecessary infractions on user privacy.

This mechanism works quite well, and the guarantees of a Bloomfilter seem to be perfectly suited for this particular task, however it has one small problem, and that is:

The widely cited expression for the false positive rate of a bloomfilter is wrong!

In fact, as it turns out, the behaviours of a Bloomfilter have actually been the subject of 30 years of mathematical contention, requiring multiple corrections and even corrections of these corrections .

Given this history of errors, can we really have any certainty in our understanding of a Bloomfilter at all?

Well, never fear , I am writing this post to inform you that we have just recently used Coq to produce the very first certified proof of the false positive rate of a Bloomfilter, finally putting an end to this saga of errors and returning certainty (pardon the pun) to a mechanism that countless people rely on every single day.

In the rest of this post, we'll explore the main highlights of this research, answering the following questions:

  • What is a Bloomfilter?
  • Why did these errors initially arise?
  • What is the true behaviour of a Bloomfilter? and how can we be certain?

This research was just recently presented at CAV2020 under the title "Certifying Certainty and Uncertainty in Approximate Membership Query Structures", you can find the paper here , and a video presentation of it here .

The code and proofs used in this research project are FOSS, and can be found on the Ceramist repo: https://github.com/certichain/ceramist


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

写给大家看的Web设计书

写给大家看的Web设计书

Robin Williams、John Tollett / 苏金国、刘亮 / 人民邮电出版社 / 201005 / 69.00元

在这个网络时代,Web设计几乎已经成为每个人生活的必备技能。如果你想自力更生创建一个网站,或者认为自己的网站在设计上还不尽如人意,希望它看上去更具创意和专业性,那么本书正是为你准备的! 作者Robin和John先采用通俗易懂的方式将有关基础知识娓娓道来,比如Internet、搜索信息、构建简单网页等,然后为我们奉上了精妙的技巧、技术和设计示例来启发大家的设计灵感,比如Web基本设计原则、实现......一起来看看 《写给大家看的Web设计书》 这本书的介绍吧!

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

SHA 加密
SHA 加密

SHA 加密工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换