Merge branch 'ipfs' into 'master'

Ipfs

See merge request arnoldjun/bluesky!71
This commit is contained in:
Jay Graber 2020-06-15 17:16:47 +00:00
commit a2ad7de88f
2 changed files with 85 additions and 54 deletions

@ -1,75 +1,111 @@
# IPFS
## Overview
IPFS is a [content-addressed protocol](https://docs.ipfs.io/concepts/content-addressing/) for peer-to-peer hypermedia storage and distribution. The IPFS network is mainly used as a storage layer for decentralized applications. Its main purpose is to add, share, find, and transfer files in a globally distributed file system. Components of IPFS, such as [libp2p](https://libp2p.io/), the p2p networking library, and [IPLD](https://ipld.io/), the data model for content-addressed Merkle DAGs, are used separately by applications that do not participate in the IPFS network.
IPFS is a content-addressed protocol for peer-to-peer hypermedia storage and distribution. It builds on top of libp2p (as the peer-to-peer transport layer) and IPLD (as the data model for content-addressed merkle dags). Its main purpose is to add, share, find, and transfer files in a globally distributed file system without any central controller. IPFS is entirely open source, as are libp2p and IPLD - supported by an OSS community of over 4 thousand contributors. The IPFS protocol continues to evolve through the main reference implementations in go and JS, stewarded by the core implementations working group.
The IPFS Alpha launched in February 2015, and the ecosystem has since grown to serve millions of monthly users through hundreds of applications including social networking platforms (Peepeth, Akasha, Peergos), content distribution networks (Dtube, Everipedia, Audius), decentralized identity solutions (Microsoft ION, 3box, ENS) and a host of other projects (Textile, Terminal, Infura, AnyType, etc). The majority of users today use IPFS as a content addressable storage layer to back their decentralized applications, using libp2p for networking and IPLD as a low-level data model. Many groups, like Textile, orbitdb, and 3box, build additional layers of tooling on top of IPFS to support a wider range of developers.
![Ecosystem](https://ipfs.io/ipfs/QmSKi1BCzVmiPPSFNbTN5FafZxA1kLxsKyLQtQrAQEVS3H?filename=Ecosystem%20Diagram.png)
In order to build a universal, inclusive, resilient and sustainable network for human knowledge and information, we need a network that provides technical features like:
- Ability to connect to any other user in the network, independently of the terminal that the user is using (e.g. Browser, Mobile, VR, Desktop and so on)
- Ability to verify that information is received from and sent to the desired destination, without having to reveal what that data is or who the destination is
- Ability to verify the integrity of the information received as consumers of the network
- Support for the creation of applications and businesses
- Support for an ever growing number of devices and users
- Ability to adapt and evolve to adjust to new needs (future-proofing)
These challenges need to be solved at the network fabric level in order to preserve a baseline of what the values of the network are.
### Network architecture & Connectivity
IPFS is fully peer-to-peer. When joining the network, nodes bootstrap off of long-lived peers or those in their local area network. A node running the IPFS protocol can opt-in to be part of the main public network and/or some other alternative network, either independently or simultaneously. Nodes joining the main public network will join the DHT as either clients (consumers) or servers (participants in providing the content routing service), or by directly connecting to nodes they're interested in either by peer id or through subscribing to relevant pubsub channels. All nodes in the IPFS network use libp2p to arrange peer-to-peer connections, a project that is now used by other decentralized networks such as Polkadot, ETH2 and more (including Matrix, which is experimenting with it to become full p2p).
IPFS is a distributed protocol, what this means is that every single node gets to participate in the network in whatever capacity they so desire, enabling different kinds of network topologies to emerge (Distributed, Decentralized, Federated, Centralized, Full Mesh and so on).
Peers dial to each other using a multiaddr, a self-describing address that lets peers know a node's preferred way to be dialed. All connections in IPFS are end-to-end encrypted and authenticated using modern cryptographic primitives (Public/Private Key Crypto).
The IPFS Alpha launched in February 2015. IPFS protocols are designed for upgradeability.
### Identity
IPFS is frequently used as a content addressed storage system for decentralized identity solutions (like Microsoft ION for example). This allows IPFS to flexibly serve a variety of opinionated privacy constraints and application-layer preferences around reputation, trust, and anonymity.
IPFS nodes have a peer ID, the [hash of their public/private keypair](https://docs.libp2p.io/concepts/peer-id/). When peers connect, they exchange public keys and check to make sure they match the node IDs. Communications are encrypted using these keys. Node IDs are pseudonymous, and can be reset as needed to maintain privacy. Node private keys are stored in the IPFS config by default.
At the protocol layer, IPFS peer ids are pseudonymous - and can be reset as needed for increased privacy. IPNS and provider records both rely on time-based revocation after which records expire.
IPFS can serve as a content addressed storage system for decentralized identity solutions (such as [Microsoft's ION](https://techcommunity.microsoft.com/t5/identity-standards-blog/ion-booting-up-the-network/ba-p/1441552)). [IPID](https://github.com/jonnycrunch/ipid) is an implementation of the DID (decentralized identifiers) specification over IPFS, using [IPNS](https://docs.ipfs.io/concepts/ipns/).
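The idea that a peer ID is a hash of the node's public key, and that peers check a received key against the claimed ID, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: real libp2p peer IDs hash a serialized form of the public key and have special cases for small keys, and the key bytes and the function names `sketch_peer_id` and `key_matches_id` are hypothetical.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes using the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def sketch_peer_id(public_key_bytes: bytes) -> str:
    """Hash the key with SHA-256 and wrap the digest as a multihash.

    Assumption: the raw key bytes are hashed directly, which is a
    simplification of what libp2p actually does.
    """
    digest = hashlib.sha256(public_key_bytes).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    return base58btc(multihash)


def key_matches_id(public_key_bytes: bytes, claimed_id: str) -> bool:
    """Check that a received public key matches the ID a peer claims."""
    return sketch_peer_id(public_key_bytes) == claimed_id
```

Because the ID is derived from the key, a node that generates a new keypair also gets a new, unlinkable ID, which is what makes resetting the ID possible.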
### Data layer
### Network Architecture & Connectivity
IPFS is not just for files; in fact, the IPFS data layer is much closer to a graph database with elements of linked data than to a file system. IPFS uses IPLD for representing any piece of data available in the network. This is a low level data structure that empowers many higher level abstractions to be built on top of it - like the orbitdb database, or Textile threads. These application-layer data models also often include privacy-preserving aspects like end-to-end encryption and access controls (like the boom.fyi exploding links). IPFS content identifiers (CIDs) are immutable, however many groups also build mutability layers on IPFS using public key cryptography - our native version of this is called IPNS, but many other versions are also compatible and supported (ENS, etc).
IPFS is a distributed protocol. Every node gets to participate in the network in any configuration, enabling different kinds of network topologies to emerge. Nodes can connect to each other independently of the client (browser, mobile, desktop, command line).
IPFS doesn't have any built-in persistence incentives, however it is compatible with many. Users tend to store their own data by pinning it to their local IPFS nodes, small communities collaborate in backing up relevant data to the group through tools like collaborative clusters, enterprises use pinning services like Infura to pay 3rd parties to store their data on IPFS and ensure fast reliability, and there is a growing contingent of decentralized incentive layers including Storj, Filecoin, and others.
When nodes join the network, they bootstrap off of long-lived peers or those in their local area network. IPFS comes with a [default list of trusted peers](https://docs.ipfs.io/how-to/modify-bootstrap-list/), which can be modified. A node can opt-in to be part of the main public network and/or an alternative network. Nodes joining the main public network will join the [DHT](https://docs.ipfs.io/concepts/dht/) as either clients (consumers) or servers (providers of the content routing service). Nodes can also directly connect to peers they're interested in either through peer ID or through subscribing to relevant pubsub channels.
Since each user is responsible for pinning their own data, deleting data just requires all the users with the data to stop hosting it. This is good for censorship-resistance (individual nodes with valuable data being blacklisted by an authoritarian regime won't delete all copies), and also good for agreed content deletion (where all data hosts can unpin or avoid resolving content deemed bad).
All nodes in the IPFS network use [libp2p](https://libp2p.io/), the modular networking library, to make peer-to-peer connections. Libp2p is [transport agnostic](https://docs.libp2p.io/concepts/transport/), leaving the choice of transport protocol up to the developer, and allowing an application to support many different transports at the same time. Peers dial to each other using a [multiaddr](https://multiformats.io/multiaddr), a self-describing network address that lets peers know a node's preferred way to be dialed. The use of multiaddrs is intended to future-proof addresses, and allow multiple transport protocols and addresses to coexist. All connections in IPFS are end-to-end encrypted and authenticated using public/private key cryptography.
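To make the "self-describing address" idea concrete, here is a minimal sketch that splits the textual form of a multiaddr into (protocol, value) pairs. It is illustrative only: the two protocol tables are a small hand-picked subset rather than the full multiaddr registry, the binary encoding is not covered, and `parse_multiaddr` is a hypothetical helper, not a library function.

```python
from typing import List, Optional, Tuple

# Illustrative subset only; the real multiaddr registry is much larger.
PROTOCOLS_WITH_VALUE = {"ip4", "ip6", "dns4", "dns6", "tcp", "udp", "p2p"}
PROTOCOLS_WITHOUT_VALUE = {"quic", "ws", "wss"}


def parse_multiaddr(addr: str) -> List[Tuple[str, Optional[str]]]:
    """Split a textual multiaddr into (protocol, value) pairs."""
    if not addr.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    parts = addr.strip("/").split("/")
    result: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(parts):
        proto = parts[i]
        if proto in PROTOCOLS_WITH_VALUE:
            if i + 1 >= len(parts):
                raise ValueError(f"protocol {proto!r} is missing its value")
            result.append((proto, parts[i + 1]))
            i += 2
        elif proto in PROTOCOLS_WITHOUT_VALUE:
            result.append((proto, None))
            i += 1
        else:
            raise ValueError(f"unknown protocol {proto!r} in this sketch")
    return result
```

For example, `parse_multiaddr("/ip4/127.0.0.1/tcp/4001")` yields the pairs `("ip4", "127.0.0.1")` and `("tcp", "4001")`. Because each component names its own protocol, new transports can be added without changing the address format.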
### Monetization & Business models
[Gateways](https://docs.ipfs.io/concepts/ipfs-gateway/) allow IPFS to be accessed over HTTP, which makes content stored in IPFS accessible through a standard browser.
IPFS is fully open source and free to use. Each individual in the network is responsible for persisting the data they care about by either adding their own resources (run a node) or incentivizing another group to persist their data (pay a pinning service). Therefore, the network grows in capacity as new users join. Services that rely on IPFS are incentivized to participate in the public DHT as servers to improve performance and availability of their data - and all participating nodes help with peer-to-peer data transfer and routing.
### Data Layer
A number of developers in the IPFS ecosystem are supported by companies who have raised money to build projects on top of IPFS - including Protocol Labs, Textile, Anytype, Infura, 3box, Audius, and many others. While many open source developers contribute pro bono part time, other groups are funded through grants or bounties by one or many of these organizations.
IPFS is commonly known as a distributed file system, but the data layer is closer to a graph database with elements of linked data. IPFS uses [IPLD](https://ipld.io/) for representing any piece of data available in the network. The IPLD data model treats all hash-linked data structures as subsets of a unified information space. Higher level abstractions can be built on top of it, like the OrbitDB database, or Textile threads.
### Curation/Discovery
Files are located in the IPFS network by their [CID](https://docs.ipfs.io/concepts/content-addressing/), a content identifier that is based on the hash of the file. The hash of a file does not change, so this form of addressing does not allow updates. However, mutable addresses can be built on top by using the hash of a public key as an address. IPFS's native version is [IPNS (InterPlanetary Name System)](https://docs.ipfs.io/concepts/ipns/), although other versions (such as [ENS](https://gist.github.com/PhyrexTsai/cffcbfa1d752b9cf817d920dfcd1ec9f)) are compatible. The keypair associated with the address is used to sign content that is published under it. [DNSLink](https://docs.ipfs.io/concepts/dnslink/) is also used to map a domain name to an IPFS address. It is currently faster than IPNS and has the advantage of being human-readable and memorable.
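A small sketch may help show why a hash-based address cannot be updated in place. The function below builds a CIDv1 for a single raw block: a version byte, a codec byte, and a SHA-256 multihash, encoded as lowercase base32. This is an assumption-laden simplification: `raw_cid_v1` is a hypothetical helper, and adding a file through an IPFS node usually chunks and wraps the data first, so the CID a node reports for the same bytes can differ.

```python
import base64
import hashlib


def raw_cid_v1(content: bytes) -> str:
    """Build a CIDv1 string for one raw block of bytes."""
    digest = hashlib.sha256(content).digest()
    # 0x01 = CID version 1, 0x55 = raw codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # 'b' is the multibase prefix for lowercase base32


original = raw_cid_v1(b"hello world")
edited = raw_cid_v1(b"hello world!")
# The same bytes always map to the same CID; any edit produces a different one.
```

A mutable name therefore needs a level of indirection, such as a record signed by a keypair that points at the current CID.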
IPFS generally uses a DHT for finding content in the network. Each data host advertises the data they're storing once per day, which can be looked up by consumers through the DHT. Nodes also discover peers through their local area network, and by bootstrapping with other nodes in the network. The DHT is also used for bootstrapping pubsub channels which groups can subscribe to for topic-based updates to content they care about.
IPFS does not have built-in incentives for nodes to persist content to the network. Users store their own data by [pinning](https://docs.ipfs.io/concepts/persistence/) it to their local IPFS nodes, communities collaborate in backing up data through tools like [collaborative clusters](https://blog.ipfs.io/2020-01-09-collaborative-clusters/), and enterprises pay 3rd party pinning services like Infura to ensure availability and reliability. Projects like [Storj](https://storj.io/blog/2019/10/ipfs-now-on-storj-network/) and [Filecoin](https://filecoin.io/) are building blockchain networks for incentivizing persistent IPFS storage.
These tools can be mixed and matched by applications developing on IPFS to create very flexible structures for curating and sharing data. A great example of this is Textile [threads](https://docs.textile.io/concepts/threads/) and [buckets](https://docs.textile.io/concepts/buckets/) - which are both higher-level structures built on IPFS.
If all users stop hosting a piece of data, it is removed from the network. However, if a node chooses to continue hosting it, [it can still be located](https://github.com/ipfs-inactive/faq/issues/9) by its content ID.
### Moderation & Reputation
Libp2p has some low-level primitives around connection management which can be used to encode peer reputation (how good has this peer been at sending me the data I'm asking for), however IPFS is mostly a layer below identity and reputation right now. Both libp2p and IPFS do support the explicit configuration to avoid or block known bad IPs.
IPFS is mostly a layer below identity and reputation, but libp2p has some low-level primitives around connection management which can be used to encode peer reputation. Peer reputation is based on how reliable the peer is at returning requested data. Both libp2p and IPFS support the explicit configuration to avoid or block known bad IPs.
In terms of content moderation - each IPFS node is in full control of the data it pins, and we have early designs for how to implement an autonomy-preserving content moderation system by which nodes can subscribe to denylists from entities they trust to help avoid or filter unwanted content.
Each IPFS node is in full control of the data it pins. Nodes can add a denylist to their configuration, optionally using the one [used by public gateways](https://github.com/ipfs/infra/blob/master/ipfs/gateway/denylist.conf) to block DMCA takedown content, malware, and other illegal or pernicious content. There is a proposed design for an autonomy-preserving content moderation system by which nodes can subscribe to denylists from entities they trust to help avoid or filter unwanted content.
### Social & Discovery
IPFS uses a DHT for finding content in the network. Each host advertises the data they're storing once per day, which can be looked up through the DHT. Nodes also discover peers through their local area network, and by bootstrapping with other nodes. The DHT is also used for bootstrapping pubsub channels which groups can subscribe to for topic-based updates to content they care about.
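The advertise-then-look-up cycle can be illustrated with a toy, single-process provider table in which records expire unless the host re-advertises. This is only a sketch: a plain dictionary stands in for the distributed hash table, the class `ToyProviderTable` and its method names are hypothetical, and the expiry interval is a parameter rather than a value taken from IPFS.

```python
from collections import defaultdict
from typing import Dict, Set


class ToyProviderTable:
    """In-memory stand-in for the provider records kept in a DHT."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        # content ID -> {peer ID -> expiry time}
        self._records: Dict[str, Dict[str, float]] = defaultdict(dict)

    def advertise(self, cid: str, peer_id: str, now: float) -> None:
        """Record (or refresh) the claim that peer_id hosts cid."""
        self._records[cid][peer_id] = now + self.ttl

    def find_providers(self, cid: str, now: float) -> Set[str]:
        """Return peers whose record for cid has not yet expired."""
        return {
            peer
            for peer, expires_at in self._records.get(cid, {}).items()
            if expires_at > now
        }
```

A host that stops re-advertising simply drops out of lookup results once its record expires, without any explicit delete message.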
### Privacy & Access Control
Content published to IPFS is public by default. Encryption can be used to add privacy and access control layers on top of IPFS (see [Peergos](peergos.md)).
### Interoperability
IPFS [Gateways](https://docs.ipfs.io/concepts/ipfs-gateway/) allow the network to be accessed over HTTP in browsers without native IPFS support.
The IPLD data structure is designed so that any kind of hash-linked data can be ingested into IPFS, including blockchains like [Bitcoin](https://github.com/ipld/go-ipld-btc) and [Ethereum](https://github.com/ipfs/go-ipld-eth), and [git repos](https://github.com/ipfs-shipyard/git-remote-ipld).
### Scalability
The IPFS public network currently has hundreds of thousands of nodes, but there are also many private networks running IPFS without connecting to the main DHT. Most nodes participate as DHT clients, using the network to find desired content or propagate messages or data to other peers.
The IPFS public network currently has hundreds of thousands of nodes. Private networks also run IPFS without connecting to the main DHT, and are not included in the node count.
Other groups have built distributed search indexes over the public DHT either through incentivized curation or by introspecting public data announced to the wider network.
IPFS nodes have historically had [high resource consumption](https://hackernoon.com/ipfs-a-complete-analysis-of-the-distributed-web-6465ff029b9b), although improvements and ['low power' settings](https://www.reddit.com/r/ipfs/comments/7sfcbq/im_running_ipfs_on_my_raspberry_pi/) for weaker devices have since been added.
### Governance
### Metrics
The core implementations working group is responsible for reviewing and merging/rejecting internal and external contributions to the IPFS protocol, rather than through broader consensus. There is also a wider community of 4000+ OSS contributors helping improve and test IPFS.
- [Automated metrics about IPFS related projects](https://github.com/ipfs/metrics)
- [100ks of nodes as of May 2020](https://youtu.be/RxJSUBeqOKU?t=392)
- [~4000 contributors](https://github.com/ipfs-shipyard/get-gh-contributors)
IPFS contributors interface closely with contributors to both libp2p and IPLD, since many features or improvements require cross-cutting collaboration, however all 3 protocols are independently stewarded and have their own unique end users they optimize for.
### Governance & Business Models
IPFS is developed by [Protocol Labs](https://protocol.ai/), a VC-funded company that [raised over 200 million](https://filecoin.io/blog/sale-completed/) in a token sale for Filecoin. The core implementations working group, consisting of both employees of the company and external contributors, has decision-making authority over contributions to the IPFS protocol. Libp2p, IPLD, and Filecoin are stewarded by separate working groups.
Contributors from the open source community either volunteer their time, or are funded through companies that have raised money to build on top of IPFS.
### Implementations & Applications
Implementations of IPFS exist in Go and JavaScript, and a [Rust implementation is under development](https://blog.ipfs.io/2020-03-18-announcing-rust-ipfs/). Projects like [Textile](https://textile.io/), [OrbitDB](https://orbitdb.org/), and [3box](https://www.3box.io/) have built additional layers of tooling on top of IPFS to support a wider range of applications.
Examples of tools that have expanded the use cases of IPFS include:
- Textile [buckets](https://docs.textile.io/concepts/buckets/) - dynamic folders for decentralized applications, distributed over IPFS
- [OrbitDB](https://github.com/orbitdb/orbit-db) is a serverless, p2p database that uses IPFS to store data, and IPFS PubSub to sync databases with peers. It uses CRDTs to resolve conflicts.
- 3box has a [web application framework](https://docs.3box.io/build/web-apps) that stores data in IPFS
Libp2p is used, independently of IPFS, by other decentralized networks such as Polkadot, ETH2, and [Matrix](matrix.md), which is experimenting with it as a transport layer for the p2p version.
A list of applications that use IPFS: https://awesome.ipfs.io/
![Ecosystem](https://ipfs.io/ipfs/QmSKi1BCzVmiPPSFNbTN5FafZxA1kLxsKyLQtQrAQEVS3H?filename=Ecosystem%20Diagram.png)
Notable p2p applications include:
- [OpenBazaar](https://openbazaar.org/), a marketplace
- [Peepeth](https://peepeth.com/welcome), a social network on IPFS and Ethereum
- [Peergos](https://peergos.org/), a private distributed file sharing protocol and application
- [Dtube](https://about.d.tube/), a YouTube alternative
- [Everipedia](https://everipedia.org/), a Wikipedia alternative [built on IPFS](https://qz.com/1151073/wikipedias-cofounder-on-how-hes-creating-a-bigger-better-rival-on-the-blockchain/)
- [Audius](https://github.com/AudiusProject), a music streaming service
- Anytype, a locally hosted Notion-like writing platform
Enterprise adoptions and integrations include:
- [Microsoft](https://techcommunity.microsoft.com/t5/identity-standards-blog/ion-booting-up-the-network/ba-p/1441552) uses IPFS for their DID implementation
- [Cloudflare](https://developers.cloudflare.com/distributed-web/ipfs-gateway/) runs a popular IPFS gateway hosted at https://cloudflare-ipfs.com/
- [Netflix](https://blog.ipfs.io/2020-02-14-improved-bitswap-for-container-distribution/) switched to IPFS for docker container distribution, improving performance 2x
- [Opera](https://blog.ipfs.io/2020-03-30-ipfs-in-opera-for-android/) supports IPFS by default in its browser for Android
### Related
[OrbitDB](https://github.com/orbitdb/orbit-db) is a serverless, distributed, p2p database. It uses IPFS as its data storage, and IPFS PubSub to sync databases with peers. It uses CRDTs to resolve conflicts.
- [Hypercore/DAT](hypercore.md)
### Links
[Docs](https://docs.ipfs.io/)
[Mapping the Interplanetary Filesystem](https://arxiv.org/pdf/2002.07747.pdf)

@ -2,21 +2,21 @@
GUN is a decentralized graph database with a conflict resolution algorithm (CRDT) and synchronization protocol. It includes a library of tools for merging conflicting data and handling routing, security, and storage.
In GUN's graph store, entries are [javascript objects under UUID keys](https://gun.eco/docs/Porting-GUN). Objects can be data of any type, including key-value, files, JSON, documents, tables, relational, and graph or hyper-graph data. Data is stored in the browser by default, with backup "superpeers" to ensure persistence. Peers connect to other peers, and choose what data to synchronize and persist.
In GUN's graph store, entries are [javascript objects under UUID keys](https://gun.eco/docs/Porting-GUN). Objects can be data of any type, including files, JSON, or other documents. Data is stored in the browser by default, with backup "superpeers" to ensure persistence. Peers connect to other peers, and choose what data to synchronize and persist.
There is a public space and a user space. In the public space are all graphs without a public key as their ID. In the user space, graphs are signed with the user's keys, and their IDs must include the user's public key.
There is a public space, and a subset of that is the user space. In the public space are all graphs without a public key as their ID. In the user space, graphs are signed with the user's keys, and their IDs must include the user's public key.
### Identity
Gun's [User System](https://gun.eco/docs/Auth) creates a username and password. Usernames are global but not unique.
[Multi-device login](https://gun.eco/docs/Auth) is handled by encrypting a user's cryptographic keypair, which is stored in the GUN graph. Keypairs are not derived from the password. PBKDF proof is derived from the password, and AES is derived from that to encrypt the keypair. GUN treats this method as "secure enough" for applications in which private keys do not control financial information. "Auth" is doing a GUN query for that account, subscribing to it, and then attempts to brute force decrypt the keys of all accounts that match that username. Once loaded once, it's cached on that device, loading from localstorage or local harddrive.
[Multi-device login](https://gun.eco/docs/Auth) is handled by encrypting a user's cryptographic keypair, which is stored in the GUN graph. Keypairs are not derived from the password. A PBKDF2 proof is derived from the password, and AES keys are derived from that to encrypt the keypair. GUN treats this method as "secure enough" for applications in which private keys do not control financial information. "Auth" is doing a GUN query for that account, subscribing to it, and then attempting to brute force decrypt the keys of all accounts that match that username. Once an account has been loaded, it's cached on that device, loading from localstorage or the local hard drive.
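The password-to-key step can be sketched with the PBKDF2 function from Python's standard library. This is not GUN's SEA code: the hash function, salt handling, iteration count, and the name `derive_key` are all assumptions chosen for illustration, and the AES encryption of the keypair is left out because the standard library does not provide it.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    A 32-byte result is the right size to use as an AES-256 key.
    The iteration count here is an illustrative choice, not GUN's.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # stored alongside the encrypted keypair
key = derive_key("correct horse battery staple", salt)
```

Because the keypair itself is random and only its encrypted form depends on the password, a user can log in from a new device by fetching the encrypted keypair and repeating the derivation with the same password and salt.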
GUN's SEA (Security, Encryption, Authorization) module provides the capability to directly create a [public/private keypair](https://gun.eco/docs/SEA) for a user, without a username and account.
### Network structure
GUN uses a gossip protocol along with a topic-based PubSub protocol to sync data between peers. GUN peers fall back to the [gossip-based protocol](https://gun.eco/docs/DAM) when the more optimized PubSub [routing](https://gun.eco/docs/Routing) protocol fails. Messages are routed across different transport layers (websockets, WebRTC, multicast UDP etc).
GUN uses a gossip protocol along with a topic-based PubSub protocol to sync data between peers. GUN peers fall back to the [gossip-based protocol](https://gun.eco/docs/DAM) when the more optimized PubSub [routing](https://gun.eco/docs/Routing) protocol fails. Messages can be routed across different transport layers (websockets, WebRTC, multicast UDP etc).
Peers subscribe to graphs relevant to their application's logic, although the global GUN graph is accessible to all peers.
@ -24,16 +24,12 @@ Planned future network upgrades include the addition of a DHT. A [tokenized ince
### Data Storage
Peers subscribe to the data they need and the network retrieves it from any peer (including browsers, where GUN stores data in localStorage). Running always-online peers, such as a "superpeer", is recommended for most applications to ensure availability of data when most browser-based peers may be offline. A superpeer is an IP addressable machine running node.js that persists data to disk. [RAD](https://gun.eco/docs/RAD), GUN's storage adaptor, saves data to disk using a radix tree.
Peers subscribe to the data they need and the network retrieves it from any peer (including browsers, where GUN stores data in localStorage). Running an always-online peer, a "superpeer", is recommended for most applications to ensure availability of data when most browser-based peers may be offline. A superpeer is an IP addressable machine running node.js that persists data to disk. [RAD](https://gun.eco/docs/RAD), GUN's storage adaptor, saves data to disk.
GUN uses a CRDT (Conflict-free Replicated Data Type) to merge data. Conflicts are handled by a [conflict resolution algorithm](https://gun.eco/docs/Conflict-Resolution-with-Guns) that uses lexical sort. GUN is [strongly eventually consistent](https://pages.lip6.fr/Marc.Shapiro/slides/CRDTs%20Google%20Zurich-2011-09.pdf), meaning that peers will eventually converge upon the last updated value when nodes that are offline eventually receive updates.
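The convergence property can be illustrated with a last-write-wins merge that falls back to a lexical comparison when two updates carry the same timestamp. This is a simplified sketch rather than GUN's actual algorithm, which has additional rules not modeled here; the `(timestamp, value)` entry shape and the `merge` function are hypothetical.

```python
import json
from typing import Any, Tuple

Entry = Tuple[float, Any]  # (state timestamp, value)


def merge(a: Entry, b: Entry) -> Entry:
    """Pick a winner between two versions of the same field."""
    if a[0] != b[0]:
        # The later update wins.
        return a if a[0] > b[0] else b
    # Tie: compare the serialized values lexically, so every peer
    # picks the same winner regardless of arrival order.
    sa = json.dumps(a[1], sort_keys=True)
    sb = json.dumps(b[1], sort_keys=True)
    return a if sa >= sb else b
```

Since the result does not depend on the order in which updates are merged, a peer that was offline reaches the same value as everyone else once it has received the same set of updates.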
GUN focuses on mutability: it does not use an append-only log, a design in which updates, insertions, and deletions have to be implemented as a layer on top. [Deletion](https://stackoverflow.com/questions/37758618/how-to-delete-data-in-gun-db) in GUN works by overwriting bytes with `null`, or by de-referencing portions of a graph. A content-addressed graph space is used to implement immutable, append-only data.
#### Filtering
There is a [GraphQL](https://github.com/brysgo/graphql-gun) API for the gun p2p graph database. SQL and Mango (MongoDB) queries were available in the past, but have since been deprecated.
### Privacy and Access Control
Access control is built into the [User system](https://gun.eco/docs/Auth) and can be combined with [SEA](https://gun.eco/docs/SEA), GUN's encryption utilities, for more advanced use cases.
@ -52,7 +48,6 @@ Test relays (superpeers) on GUN can handle about 10k simultaneous connections: h
### Metrics
- 11K+ [github](https://github.com/amark/gun) stars
- 10M ~ 30M monthly [downloads](https://www.jsdelivr.com/package/npm/gun)
### Monetization