Earthstar
An offline-first, distributed, syncable, embedded document database for use in p2p software
Alpha - do not use for important data yet
Discuss Earthstar at #earthstar:matrix.org
Related tools:
- earthstar-cli -- command line utility
- earthstar-pub -- server to help you sync
- earthstar-wiki -- an example app built with Earthstar
Other docs:
- Earthstar concepts and vocabulary -- more details about workspaces, etc
Motivation
Centralized services like Facebook have lots of problems: privacy, surveillance, addiction, dark patterns, ads, "real name" policies that harm minority groups, poor handling of moderation, bots, ...
These problems come from 2 root causes:
- They have a profit motive, so their purpose is to extract value from users.
- They're big, so they're a bureaucracy that can't understand the nuances of the diverse world population they serve, and they're unresponsive to demands from their users.
We'd be better off with communication services that are:
- Not for profit - run by volunteers on small, cheap servers
- Small - close to their users, locally governed, more democratic, understanding local norms and culture
This is the "federated" model, like Mastodon: instead of one big Twitter, make a thousand small Twitters.
But this has problems too -- volunteer-run servers don't last forever, and when they shut down their users lose their communities.
Earthstar goes one step further: you can communicate without servers at all, directly from device to device; or you can run Earthstar on servers (called "pubs") which help sync your data but have no authority over your account. You can easily run your own servers for yourself and your friends. Instructions coming soon.
All the authority in the system rests in the users and is enforced by cryptography, not central servers. Since servers only exist to help with syncing, and your data and identity are independent of the servers, you can easily use as many or as few servers as you want, and move between them without losing anything.
Earthstar's philosophy is that apps should be tools, not services. Authority no longer rests in a central server, it rests in each user.
Data model
An Earthstar workspace holds mutable documents with unique paths (ids), similar to leveldb or CouchDB.
There are no transactions. The unit of atomic change is writing a value to one path. Causal order is not preserved for edits across multiple paths.
Block diagram
There isn't a formal written spec yet - it will be written once things stabilize.
Scope, typical usage
Each user will have their own instance of an Earthstar database, maybe in their browser or embedded in an Electron app. There might also be some instances on cloud servers ("pubs"). The databases can sync with each other across HTTP or duplex stream connections.
Each database instance can hold a subset of the entire data, specified by a query such as "paths starting with a certain substring", "paths written after a certain timestamp", etc.
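A minimal sketch of how such a subset query could be evaluated. The names `Doc`, `InterestQuery` and `matchesQuery` are hypothetical and exist only for illustration; they are not Earthstar's actual query API.

```typescript
// Hypothetical shapes for illustration only; not Earthstar's real types.
interface Doc {
  path: string;
  timestamp: number; // microseconds
}

interface InterestQuery {
  pathPrefix?: string;   // keep paths starting with this substring
  writtenAfter?: number; // keep docs with a timestamp greater than this
}

// Returns true when a document falls inside the subset the query describes.
function matchesQuery(doc: Doc, query: InterestQuery): boolean {
  if (query.pathPrefix !== undefined && !doc.path.startsWith(query.pathPrefix)) {
    return false;
  }
  if (query.writtenAfter !== undefined && doc.timestamp <= query.writtenAfter) {
    return false;
  }
  return true;
}

const allDocs: Doc[] = [
  { path: "/wiki/Strawberry", timestamp: 2000 },
  { path: "/wiki/Old", timestamp: 500 },
  { path: "/about/x", timestamp: 3000 },
];

// This instance only wants recent wiki pages.
const subset = allDocs.filter((d) =>
  matchesQuery(d, { pathPrefix: "/wiki/", writtenAfter: 1000 })
);
```

An instance holding only `subset` would store the recent wiki page and skip the other two documents.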
The universe of data is divided into workspaces -- collections of documents accessed by certain users. Syncing happens one workspace at a time.
Read the vocabulary and concepts documentation for more about workspaces.
Editing data; multiple users and devices; conflicts
A single author can use multiple devices at the same time. You "log into" a device by providing your public and private key, like a username and password.
Data is mutable. Authors can update paths with new values. The old data is thrown away and doesn't take up space. You can also delete paths (soon); this replaces them with a tombstone value which is kept around.
There is a write permission system to allow only certain authors to write to certain paths. Other paths can be written by anyone.
Soon, there will also be a way to write immutable messages, for use cases where you need those.
Conflicts from a single author (on multiple devices) are resolved by timestamp and the old data is discarded. Conflicts from multiple authors are resolved by timestamp by default, but the old data is kept to allow for manual conflict resolution if needed.
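The conflict rules above can be sketched as a small in-memory store that keeps one document per author per path. `TinyStore` and its methods are hypothetical names, and this is an illustration of the rule rather than Earthstar's implementation; it ignores signatures and does not break ties between equal timestamps.

```typescript
// Hypothetical document shape for illustration.
interface Doc {
  path: string;
  author: string;
  value: string;
  timestamp: number; // microseconds
}

class TinyStore {
  // path -> author -> that author's most recent doc at the path
  private docs = new Map<string, Map<string, Doc>>();

  // Returns false when the same author already has an equal or newer doc here.
  ingest(doc: Doc): boolean {
    let byAuthor = this.docs.get(doc.path);
    if (byAuthor === undefined) {
      byAuthor = new Map<string, Doc>();
      this.docs.set(doc.path, byAuthor);
    }
    const existing = byAuthor.get(doc.author);
    if (existing !== undefined && existing.timestamp >= doc.timestamp) {
      return false;
    }
    // Same author: the old version is overwritten and discarded.
    byAuthor.set(doc.author, doc);
    return true;
  }

  // One doc per author, newest first. Other authors' versions are kept
  // so an app can do manual conflict resolution.
  history(path: string): Doc[] {
    const byAuthor = this.docs.get(path);
    if (byAuthor === undefined) return [];
    return Array.from(byAuthor.values()).sort((a, b) => b.timestamp - a.timestamp);
  }

  // The default winner is simply the newest doc across all authors.
  latest(path: string): Doc | undefined {
    return this.history(path)[0];
  }
}

const store = new TinyStore();
const p = "/wiki/Strawberry";
store.ingest({ path: p, author: "@aaa", value: "Tasty", timestamp: 1 });
store.ingest({ path: p, author: "@aaa", value: "Tasty!!", timestamp: 2 });
store.ingest({ path: p, author: "@bbb", value: "Yum", timestamp: 3 });
const historyValues = store.history(p).map((d) => d.value);
```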
Security properties
Each author is identified by a public key.
Data updates are signed by the author, so it's safe for untrusted peers to help host and propagate data. (Soon) you will also be able to encrypt data so untrusted peers can host it without reading it.
Malicious peers can:
- Withhold updates to individual documents during a sync
- See the data (until encryption is implemented, soon)
- Know which peers they're connecting to
Malicious peers cannot:
- Alter documents from another author
- Forge new documents signed by another author
Although malicious peers can withhold individual documents, as long as there are honest peers around you will eventually get the latest version of everything.
Because it allows partial sync and mutation of data, Earthstar doesn't cryptographically guarantee that you have a complete dataset with no gaps from a certain author. Authors could choose to use sequential paths or embed a blockchain of hash backlinks into their messages, and you could verify these in your app (outside of Earthstar).
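As a rough illustration of the hash-backlink idea, an app could store the hash of the previous message inside each new message and check the chain itself. Everything in this sketch (`ChainedMessage`, `hashMessage`, `appendMessage`, `verifyChain`, and the JSON encoding used for hashing) is an assumption made for the example; Earthstar does not provide or prescribe it.

```typescript
import { createHash } from "node:crypto";

// Hypothetical app-level message format: each message points at the
// hash of the message before it (null for the first one).
interface ChainedMessage {
  text: string;
  prev: string | null;
}

function hashMessage(m: ChainedMessage): string {
  return createHash("sha256")
    .update(JSON.stringify([m.text, m.prev]), "utf8")
    .digest("hex");
}

// Returns a new chain with one more message linked to the current tail.
function appendMessage(chain: ChainedMessage[], text: string): ChainedMessage[] {
  const last = chain[chain.length - 1];
  const prev = last === undefined ? null : hashMessage(last);
  return [...chain, { text, prev }];
}

// True only if every message links to the hash of its predecessor,
// which means no message in the middle is missing or altered.
function verifyChain(chain: ChainedMessage[]): boolean {
  let expected: string | null = null;
  for (const m of chain) {
    if (m.prev !== expected) return false;
    expected = hashMessage(m);
  }
  return true;
}

let chain: ChainedMessage[] = [];
chain = appendMessage(chain, "first");
chain = appendMessage(chain, "second");
chain = appendMessage(chain, "third");

const complete = verifyChain(chain);
// Dropping the middle message leaves a gap that the check detects.
const withGap = verifyChain([chain[0], chain[2]]);
```

Note that this only detects gaps before the newest message you hold; it cannot tell you whether newer messages exist.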
Comparisons
Like leveldb, but:
- multi-author
- it syncs
Like CouchDB, but:
- with signatures and encryption, so untrusted nodes can help move data around.
Like Scuttlebutt, but:
- one author can use multiple devices
- data is mutable
- instead of an append-only log, it's a set of mutable items
- supports partial replication
- will sync over HTTP, not just TCP, so it's easy to host in places like Glitch.com
- has less strict data guarantees -- you can't tell if you have someone's entire dataset or not
Like DAT, but:
- multi-author, multi-device
- simpler
Like IPFS, but:
- multi-author, multi-device
- mutable
Like Firebase, but:
- open source
- distributed
Sometimes immutability is needed, like if you're running a package registry or want an audit log. For social-network style use cases, though, immutability can be a privacy liability instead of a desirable guarantee.
Use cases
Earthstar can be used for invite-only tools where you have all the data (think Slack) and large open-world tools where you only replicate part of the entire dataset (think SSB). Apps tell Earthstar what paths and what authors to replicate.
These styles of app could be built on top of Earthstar:
- wikis
- chat: Slack, IRC
- social: Facebook, Scuttlebutt, Discourse, LiveJournal, Forums
- microblogging: Twitter, Mastodon
- productivity: Trello, GitHub Issues, Asana, Todo lists
- office: Google sheets
- art: Collaborative drawing or music tools, multi-user Twine for making interactive fiction games
What doesn't it do?
Earthstar doesn't handle multi-user text editing like Google Docs. This is a different challenge. You could probably use a CRDT like Automerge and rely on Earthstar to move and sync the patches it produces.
It will eventually stream live changes, but it's not designed for low-latency live data.
The cryptography is not audited yet.
Earthstar is not focused on anonymity -- more on autonomy. You could use it over Tor or something.
Earthstar doesn't provide transactions or causality guarantees.
Moderation and blocking support is not built in, but apps can build it on top of Earthstar.
Example
The main interface to Earthstar is the Storage class:
    // A Store is all about a single workspace.
    // Workspaces are separate universes of data
    // that don't sync with each other.
    // You also choose which feed formats you want to support
    // by supplying validators (one for each format).
    constructor(validators : IValidator[], workspace : string)

    // onChange is called whenever any data changes.
    // it doesn't yet send any details about the changes.
    // subscribe with onChange.subscribe(...cb...);
    onChange: Emitter<undefined>;

    // look up a path and get the corresponding value...
    getValue(path: string): string | undefined;

    // or get the whole document, which is an object with more details (author, timestamp...)
    getDocument(path: string): Document | undefined;

    // query with a variety of options - filter by paths and authors, etc
    documents(query?: QueryOpts): Document[];
    paths(query?: QueryOpts): string[];
    values(query?: QueryOpts): string[];

    // list all authors who have written
    authors(): EncodedKey[];

    // write a document to the database, which will be signed by your author key.
    set(keypair: Keypair, docToSet: DocToSet): boolean;

    // try to import a doc from another Store.
    ingestDocument(doc: Document): boolean;

    // basic sync algorithm. a faster one will be made later.
    sync(otherStore: IStorage, opts?: SyncOpts): SyncResults;
Usage example:
    const { StorageMemory, ValidatorEs2, generateAuthorKeypair } = require('earthstar');

    // Create a database for a particular workspace, '//gardening.xxxxxxxx'
    // We've chosen to use the latest 'es.2' feed format so we supply the matching validator.
    let storage = new StorageMemory([ValidatorEs2], '//gardening.xxxxxxxx');

    // Make up some authors for testing.
    // A keypair is { address: '@aaa.xxx', secret: 'xxx' }.
    let keypair1 = generateAuthorKeypair('aaaa');
    let keypair2 = generateAuthorKeypair('bbbb');
    let author1 = keypair1.address;
    let author2 = keypair2.address;

    // It's a key-value store.
    storage.set(keypair1, { format: 'es.2', path: '/wiki/Strawberry', value: 'Tasty' });
    storage.getValue('/wiki/Strawberry'); // --> 'Tasty'

    // One author can use multiple devices with no problems.
    // Conflicts are resolved by timestamp.
    // Here the same author overwrites their previous value,
    // which is forgotten from the database.
    storage.set(keypair1, { format: 'es.2', path: '/wiki/Strawberry', value: 'Tasty!!' });
    storage.getValue('/wiki/Strawberry'); // --> 'Tasty!!'

    // Multiple authors can overwrite each other (also by timestamp).
    storage.set(keypair2, { format: 'es.2', path: '/wiki/Strawberry', value: 'Yum' });
    storage.getValue('/wiki/Strawberry'); // --> 'Yum'

    // We keep the one most recent value from each author, in case
    // you need to do better conflict resolution later.
    // To see the old values, use a query:
    storage.values({ path: '/wiki/Strawberry', includeHistory: true });
    // --> ['Yum', 'Tasty!!']  // newest first

    // Get more context about a document, besides just the value.
    storage.getDocument('/wiki/Strawberry');
    /* --> {
        format: 'es.2',
        workspace: '//gardening.xxxxxxxx',
        path: '/wiki/Strawberry',
        value: 'Yum',
        author: '@bbb.xxx',  // the most recent write came from keypair2
        timestamp: 1503982037239000,  // in microseconds: Date.now()*1000
        signature: 'xxxxxxxx.sig.ed25519',
    } */

    // WRITE PERMISSIONS
    //
    // Paths can specify which authors are allowed to write to them.
    // Author names that occur prefixed by '~' in a path can write to that path.
    //
    // Examples:
    // One author write permission:
    //   '/about/~@aaa/name'  -- only @aaa can write here.
    //   '/about/~@aaa/follows/@bbb'  -- only @aaa can write here
    // Public:
    //   '/wall/@aaa'  -- no tilde, so anyone can write here
    //   '/wiki/Kittens'  -- no tilde, so anyone can write here
    // Multiple authors:
    //   '/whiteboard/~@aaa~@bbb'  -- both @aaa and @bbb can write here
    //
    // Here we'll set the author's display name
    storage.set(keypair1, {
        format: 'es.2',
        path: '/about/~' + keypair1.address + '/name',
        value: 'Suzie',
    });

    // You can do leveldb style queries.
    storage.paths()
    storage.paths({ lowPath: '/abc', limit: 100 })
    storage.paths({ pathPrefix: '/wiki/' })

    // You can sync to another Storage with the same workspace address
    let storage2 = new StorageMemory([ValidatorEs2], '//gardening.xxxxxxxx');
    storage.sync(storage2);
    // Now storage and storage2 are identical.

    // Get notified when anything changes.
    let unsub = storage.onChange.subscribe(() => console.log('something changed'));

    // Later, you can turn off your subscription
    unsub();
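The tilde rule from the write permissions part of the usage example can be sketched as a standalone check. `canWrite` is a hypothetical helper, not part of the Earthstar API, and the sketch makes two assumptions of its own: author names contain no `/` or `~`, and a path that has a tilde but names no author is writable by nobody.

```typescript
// Hypothetical helper illustrating the '~' write-permission rule.
function canWrite(author: string, path: string): boolean {
  // No tilde anywhere: the path is public.
  if (path.indexOf("~") === -1) return true;

  // Collect every '~@name' occurrence, e.g. ['~@aaa', '~@bbb'].
  const owners = path.match(/~@[^~\/]+/g);
  if (owners === null) return false;

  // Compare whole names so '@aa' does not match '~@aaa'.
  return owners.indexOf("~" + author) !== -1;
}

const checks = {
  ownerWrites: canWrite("@aaa", "/about/~@aaa/name"),
  strangerWrites: canWrite("@bbb", "/about/~@aaa/name"),
  publicWall: canWrite("@bbb", "/wall/@aaa"),
  sharedBoard: canWrite("@bbb", "/whiteboard/~@aaa~@bbb"),
  prefixOnly: canWrite("@aa", "/about/~@aaa/name"),
};
```

Here `ownerWrites`, `publicWall` and `sharedBoard` are true, while `strangerWrites` and `prefixOnly` are false.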
Details and notes
Signatures
Document hashes are used within signatures and to link to specific versions of a doc.
There's a simple canonical way to hash a doc: mostly you just concat all the fields in a predefined order:
    sha256([
        doc.format,
        doc.workspace,
        doc.path,
        sha256(doc.value),
        '' + doc.timestamp,
        doc.author,
    ].join('\n')).hexDigest();
None of those fields are allowed to contain newlines (except value, which is hashed for that reason) so newlines are safe to use as a delimiter.
To sign a doc, you sign its hash with the author's secret key.
Note that docs only ever have to get transformed INTO this representation just before hashing -- we never need to convert from this representation BACK into a real doc.
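A runnable sketch of the hashing and signing steps, using Node's built-in `crypto` module. The `Doc` interface follows the fields shown in the usage example, with the format name (`'es.2'`) held in a field called `format`. The throwaway keypair and the hex encoding are assumptions of this example; Earthstar's own key and signature encodings are not modelled here.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Assumed document shape, following the fields in the usage example.
interface Doc {
  format: string;
  workspace: string;
  path: string;
  value: string;
  timestamp: number;
  author: string;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Join the fields in a fixed order with newlines. The value is hashed
// first because it is the only field that may itself contain newlines.
function hashDoc(doc: Doc): string {
  return sha256Hex(
    [
      doc.format,
      doc.workspace,
      doc.path,
      sha256Hex(doc.value),
      "" + doc.timestamp,
      doc.author,
    ].join("\n")
  );
}

const doc: Doc = {
  format: "es.2",
  workspace: "//gardening.xxxxxxxx",
  path: "/wiki/Strawberry",
  value: "Tasty\nand sweet",
  timestamp: 1503982037239000,
  author: "@aaa.xxx",
};

// A throwaway ed25519 keypair standing in for the author's keys.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Sign the hash, not the document. Ed25519 takes null as the digest algorithm.
const signature = sign(null, Buffer.from(hashDoc(doc), "utf8"), privateKey);
const isValid = verify(null, Buffer.from(hashDoc(doc), "utf8"), publicKey, signature);

// Changing any field changes the hash, so the old signature no longer verifies.
const tampered: Doc = { ...doc, value: "Sour" };
const tamperedIsValid = verify(
  null,
  Buffer.from(hashDoc(tampered), "utf8"),
  publicKey,
  signature
);
```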
There is no canonical "feed encoding" - only the canonical way to hash a doc, above. Databases and network code can represent the doc in any way they want.
The hash and signature specification may change as the schema evolves beyond es.2. For example there may be another schema ssb.1 which wraps and embeds SSB messages and knows how to validate their signatures.
Sync over duplex streams:
Here's a very simple but inefficient algorithm to start with:
    sort paths by (path, timestamp DESC, signature ASC)
    filter by my interest query
    while true:
        both sides send next 100 docs to each other as (path, timestamp, signature)
        both sides figure out what the other needs and send it in full
        both sides send 'ok'
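One way to picture the batch exchange is the in-memory sketch below. It simplifies the algorithm by running it as two one-directional passes (A offers to B, then B offers to A) instead of both sides talking at once, it identifies documents by path plus author, and it skips signature checks and interest queries. All names (`Store`, `push`, `sync`, `wanted`) are hypothetical.

```typescript
interface Doc {
  path: string;
  author: string;
  timestamp: number;
  signature: string;
  value: string;
}
type Fingerprint = Pick<Doc, "path" | "author" | "timestamp" | "signature">;
type Store = Map<string, Doc>; // key: path + "\n" + author (an assumption)

function keyOf(d: { path: string; author: string }): string {
  return d.path + "\n" + d.author;
}

// Sort by (path, timestamp DESC, signature ASC), as in the pseudocode.
function sortForSync(docs: Doc[]): Doc[] {
  return docs.slice().sort((a, b) => {
    if (a.path !== b.path) return a.path < b.path ? -1 : 1;
    if (a.timestamp !== b.timestamp) return b.timestamp - a.timestamp;
    return a.signature < b.signature ? -1 : a.signature > b.signature ? 1 : 0;
  });
}

// Receiver side: which offered docs are missing here or newer than my copy?
function wanted(store: Store, offered: Fingerprint[]): Set<string> {
  const keys = new Set<string>();
  for (const f of offered) {
    const mine = store.get(keyOf(f));
    if (mine === undefined || mine.timestamp < f.timestamp) keys.add(keyOf(f));
  }
  return keys;
}

// Sender offers fingerprints one batch at a time, then sends the full
// docs the receiver asked for. Returns how many docs were transferred.
function push(from: Store, to: Store, batchSize: number = 100): number {
  const docs = sortForSync(Array.from(from.values()));
  let sent = 0;
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    const fingerprints: Fingerprint[] = batch.map(
      ({ path, author, timestamp, signature }) => ({ path, author, timestamp, signature })
    );
    const need = wanted(to, fingerprints);
    for (const doc of batch) {
      if (need.has(keyOf(doc))) {
        to.set(keyOf(doc), doc);
        sent++;
      }
    }
  }
  return sent;
}

function sync(a: Store, b: Store, batchSize: number = 100) {
  return { pushed: push(a, b, batchSize), pulled: push(b, a, batchSize) };
}

// Tiny demo: a has a newer '/b', b has a doc that a lacks.
const mk = (path: string, author: string, timestamp: number): Doc =>
  ({ path, author, timestamp, signature: "sig" + timestamp, value: path + "@" + timestamp });
const a: Store = new Map();
const b: Store = new Map();
for (const d of [mk("/a", "@x", 1), mk("/b", "@x", 5)]) a.set(keyOf(d), d);
for (const d of [mk("/b", "@x", 3), mk("/c", "@y", 2)]) b.set(keyOf(d), d);
const result = sync(a, b, 1); // batch size 1 just to exercise batching
```

After the call both stores hold the same three documents, with the newer version of `/b` on each side.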
Sync over HTTP when only one peer is publicly available:
Here's a very simple but inefficient algorithm to start with:
    the client side is in charge and does these actions:
    sort paths by (path, timestamp DESC, signature ASC)
    filter by my interest query
    GET /interest-query   the server's interest query
    while true:
        GET /doc-versions?after='foo/bar'&limit=100 from server
        compare with my own docs
        pull things I need
            POST /batch-get-docs       paths=[foo, foo/a, foo/b, ...]
        push things they need:
            POST /batch-ingest-docs    docs=[ {...}, {...} ]
Note that everything happens in batches instead of infinite streams. The code avoids using streams.
For efficient sync, we'll use something similar to this set reconciliation algorithm described by Aljoscha Meyer.
HTTP API
This is out of date -- See earthstar-pub for the latest
    HTTP replication:

    GET /interest-query   the server's interest query
        return a list of query objects -- what does the server want?
    GET /doc-versions [after='...'] [limit=100] [before='...']
        return array of docs with just path, version, first N bytes of signature
    POST /batch-get-docs       paths=[foo, foo/a, foo/b, ...]
        return complete docs
    POST /batch-ingest-docs    docs=[ {...}, {...} ]
        ask server to write these docs
        return 'ok' or error
Future plans
Immutable documents by putting "(hash)" in the path
You'll be able to put the literal string "(hash)" in a path. It will be replaced with the hash of that document. Since no other document can have the same path, it will be impossible to overwrite or delete this document, making it immutable.
- Signatures are based on the shorthand path
- Upon ingestion to the db, replace (hash) with the actual hash of the document
- Database lookups use the full expanded path
- Across the network, we only need to send the shorthand path. The expanded path is derived state.
- (hash) lets you create docs that can never be overwritten, because you can never make another doc with the same hash.
- The expanded path needs to contain a character that's not allowed in regular paths, so the only way to get that path is via (hash). e.g. a percent sign like (hash) --> %a93jDj39sdH...
Path overlays
- It would be easy to make an overlay view of different path prefixes ("folders") within a Storage
- Or overlay two Storage instances
- For example you could combine /wiki/~@a.xxxx and /wiki/~@b.xxxx into one namespace