Earthstar - Offline-first, distributed, syncable, embedded document database for use in p2p

栏目: IT技术 · 发布时间: 4年前

内容简介:An offline-first, distributed, syncable, embedded document database for use in p2p softwareAlpha - do not use for important data yet

Earthstar

Earthstar - Offline-first, distributed, syncable, embedded document database for use in p2p

An offline-first, distributed, syncable, embedded document database for use in p2p software

Alpha - do not use for important data yet

Discuss Earthstar at #earthstar:matrix.org

Related tools:

Other docs:

Motivation

Centralized services like Facebook have lots of problems: privacy, surveillance, addiction, dark patterns, ads, "real name" policies that harm minority groups, poor handling of moderation, bots, ...

These problems come from 2 root causes:

  • They have a profit motive , so their purpose is to extract value from users.
  • They're big , so they're a bureaucracy that can't understand the nuances of the diverse world population they serve, and they're unresponsive to demands from their users.

We'd be better off with communication services that are:

  • Not for profit - run by volunteers on small, cheap servers
  • Small - close to their users, locally governed, more democratic, understanding local norms and culture

This is the "federated" model, like Mastodon: instead of one big Twitter, make a thousand small Twitters.

But this has problems too -- volunteer-run servers don't last forever, and when they shut down their users lose their communities.

Earthstar goes one step further: you can communicate without servers at all, directly from device to device; or you can run Earthstar on servers ("called pubs") which help sync your data but have no authority over your account. You can easily run your own servers for yourself and your friends. Instructions coming soon.

All the authority in the system rests in the users and is enforced by cryptography, not central servers. Since servers only exist to help with syncing, and your data and identity are independent of the servers, you can easily use as many or as few servers as you want, and move between them without losing anything.

Earthstar's philosophy is that apps should be tools, not services . Authority no longer rests in a central server, it rests in each user.

Data model

An Earthstar workspace holds mutable documents with unique paths (ids), similar to leveldb or CouchDb.

There are no transactions. The unit of atomic change is writing a value to one path. Causal order is not preserved for edits across multiple paths.

Earthstar - Offline-first, distributed, syncable, embedded document database for use in p2p

Block diagram

Earthstar - Offline-first, distributed, syncable, embedded document database for use in p2p

There isn't a formal written spec yet - it will be written once things stabilize.

Scope, typical usage

Each user will have their own instance of an Earthstar database, maybe in their browser or embedded in an Electron app. There might also be some instances on cloud servers ("pubs"). The databases can sync with each other across HTTP or duplex stream connections.

Each database instance can hold a subset of the entire data, specified by a query such as "paths starting with a certain substring", "paths written after a certain timestamp", etc.

The universe of data is divided into workspaces -- collections of documents accessed by certain users. Syncing happens one workspace at a time.

Read the vocabulary and concepts documentation for more about workspaces.

Editing data; multiple users and devices; conflicts

A single author can use multiple devices at the same time. You "log into" a device by providing your public and private key, like a username and password.

Data is mutable. Authors can update paths with new values. The old data is thrown away and doesn't take up space. You can also delete paths (soon); this replaces them with a tombstone value which is kept around.

There is a write permission system to allow only certain authors to write to certain paths. Other paths can be written by anyone.

Soon, there will also be a way to write immutable messages, for use cases where you need those.

Conflicts from a single author (on multiple devices) are resolved by timestamp and the old data is discarded. Conflicts from multiple authors are resolved by timestamp by default, but the old data is kept to allow for manual conflict resolution if needed.

Security properties

Each author is identified by a public key.

Data updates are signed by the author, so it's safe for untrusted peers to help host and propagate data. (Soon) you will also be able to encrypt data so untrusted peers can host it without reading it.

Malicious peers can:

  • Withold updates to individual documents during a sync
  • See the data (until encryption is implemented, soon)
  • Know which peers they're connecting to

Malicious peers cannot:

  • Alter documents from another author
  • Forge new documents signed by another author

Although malicious peers can withold individual documents, as long as there are honest peers around you will eventually get the latest verison of everything.

Because it allows partial sync and mutation of data, Earthstar doesn't cryptographically guarantee that you have a complete dataset with no gaps from a certain author. Authors could chose to use sequential paths or embed a blockchain of hash backlinks into their messages, and you could verify these in your app (outside of Earthstar).

Comparisons

Like leveldb, but:

  • multi-author
  • it syncs

Like CouchDb, but:

  • with signatures and encryption, so untrusted nodes can help move data around.

Like Scuttlebutt, but:

  • one author can use multiple devices
  • data is mutable
  • instead of an append-only log, it's a set of mutable items
  • supports partial replication
  • will sync over HTTP, not just TCP, so it's easy to host in places like Glitch.com
  • has less strict data guarantees -- you can't tell if you have someone's entire dataset or not

Like DAT, but:

  • multi-author, multi-device
  • simpler

Like IPFS, but:

  • multi-author, multi-device
  • mutable

Like Firebase, but:

  • open source
  • distributed

Sometimes immutability is needed, like if you're running a package registry or want an audit log. For social-network style use cases, though, immutability can be a privacy liability instead of a desirable guarantee.

Use cases

Earthstar can be used for invite-only tools where you have all the data (think Slack) and large open-world tools where you only replicate part of the entire dataset (think SSB). Apps tell Earthstar what paths and what authors to replicate.

These styles of app could be built on top of Earthstar:

  • wikis
  • chat: Slack, IRC
  • social: Facebook, Scuttlebutt, Discourse, LiveJournal, Forums
  • microblogging: Twitter, Mastodon
  • productivity: Trello, GitHub Issues, Asana, Todo lists
  • office: Google sheets
  • art: Collaborative drawing or music tools, multi-user Twine for making interactive fiction games

What doesn't it do?

Earthstar doesn't handle multi-user text editing like Google Docs. This is a different challenge. You could probably use a CRDT like Automerge and rely on Earthstar to move and sync the patches it produces.

It will eventually stream live changes, but it's not designed for low-latency live data.

The cryptography is not audited yet.

Earthstar is not focused on anonymity -- more on autonomy. You could use it over Tor or something.

Earthstar doesn't provide transactions or causality guarantees.

Moderation and blocking support is not built in, but apps can build it on top of Earthstar.

Example

The main interface to Earthstar is the Storage class:

// A Store is all about a single workspace.
// Workspaces are separate universes of data
// that don't sync with each other.
// You also choose which feed formats you want to support
// by supplying validators (one for each format).
constructor(validators : IValidator[], workspace : string)

// onChange is called whenever any data changes.
// it doesn't yet send any details about the changes.
// subscribe with onChange.subscribe(...cb...);
onChange: Emitter<undefined>;

// look up a path and get the corresponding value...
getValue(path: string): string | undefined;
// or get the whole document, which is an object with more details (author, timestamp...)
getDocument(path: string): Document | undefined;

// query with a variety of options - filter by paths and authors, etc
documents(query?: QueryOpts): Document[];
paths(query?: QueryOpts): string[];
values(query?: QueryOpts): string[];

// list all authors who have written
authors(): EncodedKey[];

// write a document to the database, which will be signed by your author key.
set(keypair: Keypair, docToSet: DocToSet): boolean;

// try to import an doc from another Store.
ingestDocument(doc: Document): boolean;

// basic sync algorithm.  a faster one will be made later.
sync(otherStore: IStorage, opts?: SyncOpts): SyncResults;

Usage example:

const { StorageMemory, ValidatorEs2, generateAuthorKeypair } = require('earthstar');

// Create a database for a particular workspace, '//gardening.xxxxxxxx'
// We've chosen to use the latest 'es.2' feed format so we supply the matching validator.
let storage = new StorageMemory([ValidatorEs2], '//gardening.xxxxxxxx');

// Make up some authors for testing.
// A keypair is { address: '@aaa.xxx', secret: 'xxx' }.
let keypair1 = generateAuthorKeypair('aaaa');
let keypair2 = generateAuthorKeypair('bbbb');
let author1 = keypair1.address;
let author2 = keypair2.address;

// It's a key-value store.
storage.set(keypair1, { format: 'es.2', path: '/wiki/Strawberry', value: 'Tasty' });
storage.getValue('/wiki/Strawberry'); // --> 'Tasty'

// One author can use multiple devices with no problems.
// Conflicts are resolved by timestamp.
// Here the same author overwrites their previous value,
// which is forgotten from the database.
storage.set(keypair1, { format: 'es.2', path: '/wiki/Strawberry', value: 'Tasty!!' });
storage.getValue('wiki/Strawberry'); // --> 'Tasty!!'

// Multiple authors can overwrite each other (also by timestamp).
storage.set(keypair2, { format: 'es.2', path: '/wiki/Strawberry', value: 'Yum' });
storage.getValue('wiki/Strawberry'); // --> 'Yum'

// We keep the one most recent value from each author, in case
// you need to do better conflict resolution later.
// To see the old values, use a query:
storage.values({ path: 'wiki/Strawberry', includeHistory: true });
// --> ['Yum', 'Tasty!!']  // newest first

// Get more context about a document, besides just the value.
storage.getDocument('wiki/Strawberry');
/* --> {
    format: 'es.2',
    workspace: '//gardening.xxxxxxxx',
    path: 'wiki/Strawberry',
    value: 'Yum',
    author: '@aaa.xxx',
    timestamp: 1503982037239,  // in microseconds: Date.now()*1000
    signature: 'xxxxxxxx.sig.ed25519',
} */

// WRITE PERMISSIONS
//
// Paths can specify which authors are allowed to write to them.
// Author names that occur prefixed by '~' in a path can write to that path.
//
// Examples:
// One author write permission:
//   '/about/~@aaa/name'  -- only @aaa can write here.
//   '/about/~@aaa/follows/@bbb'  -- only @aaa can write here
// Public:
//   '/wall/@aaa'  -- no tilde, so anyone can write here
//   '/wiki/Kittens'  -- no tilde, so anyone can write here
// Multiple authors:
//   '/whiteboard/~@aaa~@bbb'  -- both @aaa and @bbb can write here
//
// Here we'll set the author's display name
storage.set(keypair1, {
    format: 'es.2',
    path: '/about/~' + keypair1.address + '/name',
    value: 'Suzie',
});

// You can do leveldb style queries.
storage.paths()
storage.paths({ lowPath: '/abc', limit: 100 })
storage.paths({ pathPrefix: '/wiki/' })

// You can sync to another Storage with the same workspace address
let storage2 = new StorageMemory([ValidatorEs2], '//gardening.xxxxxxxx');
storage.sync(storage2);
// Now storage and storage2 are identical.

// Get notified when anything changes.
let unsub = storage.onChange.subscribe(() => console.log('something changed'));

// Later, you can turn off your subscription
unsub();

Details and notes

Signatures

Document hashes are used within signatures and to link to specific versions of a doc.

There's a simple canonical way to hash a doc: mostly you just concat all the fields in a predefined order:

sha256([
        doc.schema,
        doc.workspace,
        doc.path,
        sha256(doc.value),
        '' + doc.timestamp,
        doc.author,
    ].join('\n')).hexDigest();

None of those fields are allowed to contain newlines (except value, which is hashed for that reason) so newlines are safe to use as a delimiter.

To sign a doc, you sign its hash with the author's secret key.

Note that docs only ever have to get transformed INTO this representation just before hashing -- we never need to convert from this representation BACK into a real doc.

There is no canonical "feed encoding" - only the canonical way to hash an doc, above. Databases and network code can represent the doc in any way they want.

The hash and signature specification may change as the schema evolves beyond es.2 . For example there may be another schema ssb.1 which wraps and embeds SSB messages and knows how to validate their signatures.

Sync over duplex streams:

Here's a very simple but inefficient algorithm to start with:

sort paths by (path, timestamp DESC, signature ASC)
    filter by my interest query
    while true:
        both sides send next 100 docs to each other as (path, timestamp, signature)
        both sides figure out what the other needs and send it in full
        both sides send 'ok'

Sync over HTTP when only one peer is publicly available:

Here's a very simple but inefficient algorithm to start with:

the client side is in charge and does these actions:
    sort paths by (path, timestamp DESC, signature ASC)
    filter by my interest query
    GET /interest-query   the server's interest query
    while true:
        GET /doc-versions?after='foo/bar'&limit=100 from server
        compare with my own docs
        pull things I need
            POST /batch-get-docs       paths=[foo, foo/a, foo/b, ...]
        push things they need:
            POST /batch-ingest-docs    docs=[ {...}, {...} ]

Note that everything happens in batches instead of infinite streams. The code avoids using streams.

For efficient sync, we'll use something similar to this set reconciliation algorithm described by Aljoscha Meyer .

HTTP API

This is out of date -- See earthstar-pub for the latest

HTTP replication:

GET /interest-query   the server's interest query
    return a list of query objects -- what does the server want?
GET /doc-versions [after='...'] [limit=100] [before='...']
    return array of docs with just path, version, first N bytes of signature
POST /batch-get-docs       paths=[foo, foo/a, foo/b, ...]
    return complete docs
POST /batch-ingest-docs    docs=[ {...}, {...} ]
    ask server to write these docs
    return 'ok' or error

Future plans

Immutable documents by putting "(hash)" in the path

You'll be able to put the literal string " (hash) " in a path. It will be replaced with the hash of that document. Since no other document can have the same path, it will be impossible to overwrite or delete this document, making it immutable.

  • Signatures are based on the shorthand path
  • Upon ingestion to the db, replace (hash) with the actual hash of the document
  • Database lookups use the full expanded path
  • Across the network, we only need to send the shorthand path. The expanded path is derived state.
  • (hash) lets you create docs that can never be overwritten, because you can never make another doc with the same hash.
  • The expanded path needs to contain a character that's not allowed in regular paths, so the only way to get that path is via (hash) . e.g. a percent sign like (hash) --> %a93jDj39sdH...

Path overlays

  • It would be easy make an overlay view of different path prefixes ("folders") within a Storage
  • Or overlay two Storage instances
  • For example you could combine /wiki/~@a.xxxx and /wiki/~@b.xxxx into one namespace

以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

奔腾年代

奔腾年代

郭万盛 / 中信出版集团 / 2018-9-1 / 68.00

1994年4月20日,一条64K国际专线将中关村地区教育与科研示范网络与互联网连在了一起。中国,成为第77个全功能接入互联网的国家。 1995年,中国互联网正式开始商业化应用。浪潮开始! 这是一个波澜壮阔的年代,带给我们翻天覆地的变化。中国互联网25年发展史的全景展示、忠实梳理和记录。 在更宏观的角度审视互联网与中国的关系,人们将会发现,互联网革命给中国带来了重新崛起的时代机遇。......一起来看看 《奔腾年代》 这本书的介绍吧!

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具