diff --git a/protocols/IPFS.md b/protocols/IPFS.md index 2f22027..43d38f9 100644 --- a/protocols/IPFS.md +++ b/protocols/IPFS.md @@ -1,75 +1,111 @@ # IPFS -## Overview +IPFS is a [content-addressed protocol](https://docs.ipfs.io/concepts/content-addressing/) for peer-to-peer hypermedia storage and distribution. The IPFS network is mainly used as a storage layer for decentralized applications. Its main purpose is to add, share, find, and transfer files in a globally distributed file system. Components of IPFS, such as [libp2p](https://libp2p.io/), the p2p networking library, and [IPLD](https://ipld.io/), the data model for content-addressed Merkle DAGs, are used separately by applications that do not participate in the IPFS network. -IPFS is a content-addressed protocol for peer-to-peer hypermedia storage and distribution. It builds on top of libp2p (as the peer-to-peer transport layer) and IPLD (as the data model for content-addressed merkle dags). It’s main purpose is to add, share, find, and transfer files in a globally distributed file system without any central controller. IPFS is entirely open source, as are libp2p and IPLD - supported by an OSS community of over 4 thousand contributors. The IPFS protocol continues to evolve through the main reference implementations in go and JS, stewarded by the core implementations working group. - -The IPFS Alpha launched in February 2015, and the ecosystem has since grown to serve millions of monthly users through hundreds of applications including social networking platforms (Peepeth, Akasha, Peergos), content distribution networks (Dtube, Everipedia, Audius), decentralized identity solutions (Microsoft ION, 3box, ENS) and a host of other projects (Textile, Terminal, Infura, AnyType, etc). The majority of users today use IPFS as a content addressable storage layer to back their decentralized applications, using libp2p for networking and IPLD as a low-level data model. 
Many groups, like Textile, orbitdb, and 3box, build additional layers of tooling on top of IPFS to support a wider range of developers. - -![Ecosystem](https://ipfs.io/ipfs/QmSKi1BCzVmiPPSFNbTN5FafZxA1kLxsKyLQtQrAQEVS3H?filename=Ecosystem%20Diagram.png) - -In order to build a universal, inclusive, resilient and sustainable network for human knowledge and information, we need a network that provides technical features like: -Ability to connect to any other user in the network, independently of the terminal that the user is using (e.g. Browser, Mobile, VR, Desktop and so on) -Ability to verify receiving and sending information to the desired destination, without requiring to reveal what that data is or who the destination is. -Ability to verify the integrity of the information received as consumers of the network -Support for the creation of applications and businesses -Support for an ever growing number of devices and users -Ability to adapt and evolve to adjust to new needs (future-proofing) - -These challenges need to be solved at the network fabric level in order to preserve a baseline of what the values of the network are. - -### Network architecture & Connectivity - -IPFS is fully peer-to-peer. When joining the network, nodes bootstrap off of long-lived peers or those in their local area network. A node running the IPFS protocol can opt-in to be part of the main public network and/or some other alternative network, either independently or simultaneously. Nodes joining the main public network will join the DHT as either clients (consumers) or servers (participants in providing the content routing service), or by directly connecting to nodes they’re interested in either by peer id or through subscribing to relevant pubsub channels. 
All nodes in the IPFS network use libp2p to arrange peer-to-peer connections, a project that is now used by other decentralized networks such as Polkadot, ETH2 and more (including Matrix, which is experimenting with it to become full p2p). - -IPFS is a distributed protocol, what this means is that every single node gets to participate in the network in whatever capacity they so desire, enabling different kinds of network topologies to emerge (Distributed, Decentralized, Federated, Centralized, Full Mesh and so on). - -Peers dial to each other using a multiaddr, a self-describing address that lets peers know a node’s prefered way to be dialed. All connections in IPFS are end-to-end encrypted and authenticated using modern cryptographic primitives (Public/Private Key Crypto). +The IPFS Alpha launched in February 2015. IPFS protocols are designed for upgradeability. ### Identity -IPFS is frequently used as a content addressed storage system for decentralized identity solutions (like Microsoft ION for example). This allows IPFS to flexibly serve a variety of opinionated privacy constraints and application-layer preferences around reputation, trust, and anonymity. +IPFS nodes have a peer ID, a [hash of their public key](https://docs.libp2p.io/concepts/peer-id/). When peers connect, they exchange public keys and check that the keys match the peer IDs. Communications are encrypted using these keys. Peer IDs are pseudonymous, and can be reset as needed to maintain privacy. Node private keys are stored in the IPFS config by default. -At the protocol layer, IPFS peer ids are pseudonymous - and can be reset as needed for increased privacy. IPNS and provider records both rely on time-based revocation after which records expire. +IPFS can serve as a content-addressed storage system for decentralized identity solutions (such as [Microsoft's ION](https://techcommunity.microsoft.com/t5/identity-standards-blog/ion-booting-up-the-network/ba-p/1441552)). 
[IPID](https://github.com/jonnycrunch/ipid) is an implementation of the DID (decentralized identifiers) specification over IPFS, using [IPNS](https://docs.ipfs.io/concepts/ipns/). -### Data layer +### Network Architecture & Connectivity -IPFS is not just for files, in fact, the IPFS data layer is much more close to a graph database with elements of linked data then a file system. IPFS uses IPLD for representing any piece of data available in the network. This is a low level data structure that empowers many higher level abstractions to be built on top of it - like the orbitdb database, or Textile threads. These application-layer data models also often include privacy-preserving aspects like end-to-end encryption and access controls (like the boom.fyi exploding links). IPFS content identifiers (CIDs) are immutable, however many groups also build mutability layers on IPFS using public key cryptography - our native version of this is called IPNS, but many other versions are also compatible and supported (ENS, etc). +IPFS is a distributed protocol. Every node gets to participate in the network in any configuration, enabling different kinds of network topologies to emerge (Distributed, Decentralized, Federated, Centralized, Full Mesh, etc.). Nodes can connect to each other independently of the client (browser, mobile, desktop, command line). -IPFS doesn’t have any built-in persistence incentives, however it is compatible with many. Users tend to store their own data by ‘pinning’ it to their local IPFS nodes, small communities collaborate in backing up relevant data to the group through tools like ‘collaborative clusters’, enterprises use pinning services like Infura to pay 3rd parties to store their data on IPFS and ensure fast reliability, and there is a growing contingent of decentralized incentive layers including Storj, Filecoin, and others. +When nodes join the network, they bootstrap off of long-lived peers or those in their local area network. 
IPFS comes with a [default list of trusted peers](https://docs.ipfs.io/how-to/modify-bootstrap-list/), which can be modified. A node can opt in to be part of the main public network and/or an alternative network. Nodes joining the main public network will join the [DHT](https://docs.ipfs.io/concepts/dht/) as either clients (consumers) or servers (providers of the content routing service). Nodes can also directly connect to peers they’re interested in, either by peer ID or by subscribing to relevant pubsub channels. -Since each user is responsible for pinning their own data, deleting data just requires all the users with the data to stop hosting it. This is good for censorship-resistance (individual nodes with valuable data being blacklisted by an authoritarian regime won’t delete all copies), and also good for agreed content deletion (where all data hosts can unpin or avoid resolving content deemed bad). +All nodes in the IPFS network use [libp2p](https://libp2p.io/), the modular networking library, to make peer-to-peer connections. Libp2p is [transport agnostic](https://docs.libp2p.io/concepts/transport/), leaving the choice of transport protocol up to the developer, and allowing an application to support many different transports at the same time. Peers dial each other using a [multiaddr](https://multiformats.io/multiaddr), a self-describing network address that lets peers know a node’s preferred way to be dialed. The use of multiaddrs is intended to future-proof addresses, and allow multiple transport protocols and addresses to coexist. All connections in IPFS are end-to-end encrypted and authenticated using public/private key cryptography. -### Monetization & Business models +[Gateways](https://docs.ipfs.io/concepts/ipfs-gateway/) allow IPFS to be accessed over HTTP, which makes content stored in IPFS accessible through a standard browser. -IPFS is fully open source and free to use. 
Each individual in the network is responsible for persisting the data they care about by either adding their own resources (run a node) or incentivizing another group to persist their data (pay a pinning service). Therefore, the network grows in capacity as new users join. Services that rely on IPFS are incentivized to participate in the public DHT as servers to improve performance and availability of their data - and all participating nodes help with peer-to-peer data transfer and routing. +### Data Layer -A number of developers in the IPFS ecosystem are supported by companies who have raised money to build projects on top of IPFS - including Protocol Labs, Textile, Anytype, Infura, 3box, Audius, and many others. While many open source developers contribute pro bono part time, other groups are funded through grants or bounties by one or many of these organizations. +IPFS is commonly known as a distributed file system, but the data layer is closer to a graph database with elements of linked data. IPFS uses [IPLD](https://ipld.io/) for representing any piece of data available in the network. The IPLD data model treats all hash-linked data structures as subsets of a unified information space. Higher level abstractions can be built on top of it, like the OrbitDB database, or Textile threads. -### Curation/Discovery +Files are located in the IPFS network by their [CID](https://docs.ipfs.io/concepts/content-addressing/), a content identifier that is based on the hash of the file. The hash of a file does not change, so this form of addressing does not allow updates. However, mutable addresses can be built on top by using the hash of a public key as an address. IPFS's native version is [IPNS (InterPlanetary Name System)](https://docs.ipfs.io/concepts/ipns/), although other versions (such as [ENS](https://gist.github.com/PhyrexTsai/cffcbfa1d752b9cf817d920dfcd1ec9f)) are compatible. The keypair associated with the address is used to sign content that is published under it. 
[DNSLink](https://docs.ipfs.io/concepts/dnslink/) is also used to map a domain name to an IPFS address. It is currently faster than IPNS and has the advantage of being human-readable and memorable. -IPFS generally uses a DHT for finding content in the network. Each data host advertises the data they’re storing once per day, which can be looked up by consumers through the DHT. Nodes also discover peers through their local area network, and by bootstrapping with other nodes in the network. The DHT is also used for bootstrapping pubsub channels which groups can subscribe to for topic-based updates to content they care about. +IPFS does not have built-in incentives for nodes to persist content to the network. Users store their own data by [‘pinning’](https://docs.ipfs.io/concepts/persistence/) it to their local IPFS nodes, communities collaborate in backing up data through tools like [‘collaborative clusters’](https://blog.ipfs.io/2020-01-09-collaborative-clusters/), and enterprises pay 3rd party pinning services like Infura to ensure availability and reliability. Projects like [Storj](https://storj.io/blog/2019/10/ipfs-now-on-storj-network/) and [Filecoin](https://filecoin.io/) are building blockchain networks for incentivizing persistent IPFS storage. -These tools can be mixed and matched by applications developing on IPFS to create very flexible structures for curating and sharing data. A great example of this is Textile [threads](https://docs.textile.io/concepts/threads/) and [buckets](https://docs.textile.io/concepts/buckets/) - which are both higher-level structures built on IPFS. +If all users stop hosting a piece of data, it is removed from the network. However, if a node chooses to continue hosting it, [it can still be located](https://github.com/ipfs-inactive/faq/issues/9) by its content ID. 
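As a rough illustration of the content addressing described in this section, the Python sketch below derives an address from the SHA-256 hash of the bytes being stored. It is a simplification under stated assumptions: real CIDs also encode a version, a codec, and the hash function, and are computed over IPFS's own block encoding rather than over raw file bytes. `content_address` and `ContentStore` are hypothetical names made up for this example and are not part of any IPFS API.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a simplified content address: the hex SHA-256 digest of the data.

    Assumption: this stands in for a CID and is NOT the real CID format.
    """
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store keyed by content address (hypothetical, not IPFS)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical content
        # always maps to the same address.
        addr = content_address(data)
        self._blocks[addr] = data
        return addr

    def get(self, addr: str) -> bytes:
        data = self._blocks[addr]
        # The consumer can verify integrity by re-hashing what it received.
        if content_address(data) != addr:
            raise ValueError("content does not match its address")
        return data

    def remove(self, addr: str) -> None:
        # Loosely analogous to a host no longer storing the data.
        self._blocks.pop(addr, None)


store = ContentStore()
original = store.put(b"hello ipfs")
updated = store.put(b"hello ipfs, v2")
```

Because the address is derived from the content, `updated` differs from `original`. This is why an update cannot keep the same address, and why mutable naming such as IPNS or DNSLink is layered on top.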
### Moderation & Reputation -Libp2p has some low-level primitives around connection management which can be used to encode peer reputation (how good has this peer been at sending me the data I’m asking for), however IPFS is mostly a layer below identity and reputation right now. Both libp2p and IPFS do support the explicit configuration to avoid or block known bad IPs. +IPFS is mostly a layer below identity and reputation, but libp2p has some low-level primitives around connection management which can be used to encode peer reputation. Peer reputation is based on how reliable the peer is at returning requested data. Both libp2p and IPFS can be explicitly configured to avoid or block known bad IP addresses. -In terms of content moderation - each IPFS node is in full control of the data it pins, and we have early designs for how to implement an autonomy-preserving content moderation system by which nodes can subscribe to denylists from entities they trust to help avoid or filter unwanted content +Each IPFS node is in full control of the data it pins. Nodes can add a denylist to their configuration, optionally using the one [used by public gateways](https://github.com/ipfs/infra/blob/master/ipfs/gateway/denylist.conf) to block DMCA takedown content, malware, and other illegal or pernicious content. There is a proposed design for an autonomy-preserving content moderation system by which nodes can subscribe to denylists from entities they trust to help avoid or filter unwanted content. + +### Social & Discovery + +IPFS uses a DHT for finding content in the network. Each host advertises the data it is storing once per day, and these advertisements can be looked up through the DHT. Nodes also discover peers through their local area network, and by bootstrapping with other nodes. The DHT is also used for bootstrapping pubsub channels which groups can subscribe to for topic-based updates to content they care about. 
+ +### Privacy & Access Control + +Content published to IPFS is public by default. Encryption can be used to add privacy and access control layers on top of IPFS (see [Peergos](peergos.md)). + +### Interoperability + +IPFS [Gateways](https://docs.ipfs.io/concepts/ipfs-gateway/) allow the network to be accessed over HTTP in browsers without native IPFS support. + +The IPLD data model is designed to allow any kind of hash-linked data to be ingested into IPFS, including blockchains like [Bitcoin](https://github.com/ipld/go-ipld-btc) and [Ethereum](https://github.com/ipfs/go-ipld-eth), and [Git repos](https://github.com/ipfs-shipyard/git-remote-ipld). ### Scalability -The IPFS public network currently has hundreds of thousands of nodes, but there are also many private networks running IPFS without connecting to the main DHT. Most nodes participate as DHT clients, using the network to find desired content or propagate messages or data to other peers. +The IPFS public network currently has hundreds of thousands of nodes. Private networks also run IPFS without connecting to the main DHT, and are not included in the node count. -Other groups have built distributed search indexes over the public DHT either through incentivized curation or by introspecting public data announced to the wider network. +IPFS nodes have historically had [high resource consumption](https://hackernoon.com/ipfs-a-complete-analysis-of-the-distributed-web-6465ff029b9b), although improvements and ['low power' settings](https://www.reddit.com/r/ipfs/comments/7sfcbq/im_running_ipfs_on_my_raspberry_pi/) for weaker devices have since been added. -### Governance +### Metrics -The core implementations working group is responsible for reviewing and merging/rejecting internal and external contributions to the IPFS protocol, rather than through broader consensus. There is also a wider community of 4000+ OSS contributors helping improve and test IPFS. 
+- [Automated metrics about IPFS-related projects](https://github.com/ipfs/metrics) +- public network node count? +- 4000 contributors? -IPFS contributors interface closely with contributors to both libp2p and IPLD, since many features or improvements require cross-cutting collaboration, however all 3 protocols are independently stewarded and have their own unique end users they optimize for. +### Governance & Business Models + +IPFS is developed by [Protocol Labs](https://protocol.ai/), a VC-funded company that [raised over 200 million](https://filecoin.io/blog/sale-completed/) in a token sale for Filecoin. The core implementations working group, consisting of both employees of the company and external contributors, has decision-making authority over contributions to the IPFS protocol. Libp2p, IPLD, and Filecoin are stewarded by separate working groups. + +Contributors from the open source community either volunteer their time, or are funded through companies that have raised money to build on top of IPFS. + +### Implementations & Applications + +Implementations of IPFS exist in Go and JavaScript, and a [Rust implementation is under development](https://blog.ipfs.io/2020-03-18-announcing-rust-ipfs/). Projects like [Textile](https://textile.io/), [OrbitDB](https://orbitdb.org/), and [3box](https://www.3box.io/) have built additional layers of tooling on top of IPFS to support a wider range of applications. + +Examples of tools that have expanded the use cases of IPFS include: + +- Textile [buckets](https://docs.textile.io/concepts/buckets/) - dynamic folders for decentralized applications, distributed over IPFS +- [OrbitDB](https://github.com/orbitdb/orbit-db) is a serverless, p2p database that uses IPFS to store data, and IPFS PubSub to sync databases with peers. It uses CRDTs to resolve conflicts. 
+- 3box has a [web application framework](https://docs.3box.io/build/web-apps) that stores data in IPFS + +Libp2p is used, independently of IPFS, by other decentralized networks such as Polkadot, ETH2, and [Matrix](matrix.md), which is experimenting with it as a transport layer for its p2p version. + +A list of applications that use IPFS: https://awesome.ipfs.io/ + +![Ecosystem](https://ipfs.io/ipfs/QmSKi1BCzVmiPPSFNbTN5FafZxA1kLxsKyLQtQrAQEVS3H?filename=Ecosystem%20Diagram.png) + +Notable p2p applications include: + +- [OpenBazaar](https://openbazaar.org/), a marketplace +- [Peepeth](https://peepeth.com/welcome), a social network on IPFS and Ethereum +- [Peergos](https://peergos.org/), a private distributed file sharing protocol and application +- [Dtube](https://about.d.tube/), a YouTube alternative +- [Everipedia](https://everipedia.org/), a Wikipedia alternative [built on IPFS](https://qz.com/1151073/wikipedias-cofounder-on-how-hes-creating-a-bigger-better-rival-on-the-blockchain/) +- [Audius](https://github.com/AudiusProject), a music streaming service +- Anytype, a locally hosted Notion-like writing platform + +Enterprise adoptions and integrations include: + +- [Microsoft](https://techcommunity.microsoft.com/t5/identity-standards-blog/ion-booting-up-the-network/ba-p/1441552) uses IPFS for their DID implementation +- [Cloudflare](https://developers.cloudflare.com/distributed-web/ipfs-gateway/) runs a popular IPFS gateway hosted at https://cloudflare-ipfs.com/ +- [Netflix](https://blog.ipfs.io/2020-02-14-improved-bitswap-for-container-distribution/) switched to IPFS for Docker container distribution, improving performance 2x +- [Opera](https://blog.ipfs.io/2020-03-30-ipfs-in-opera-for-android/) supports IPFS by default in its browser for Android ### Related -[OrbitDB](https://github.com/orbitdb/orbit-db) is a serverless, distributed, p2p database. It uses IPFS as its data storage, and IPFS PubSub to sync databases with peers. 
It uses CRDTs to resolve conflicts. +- [Hypercore/DAT](hypercore.md) + +### Links + +[Docs](https://docs.ipfs.io/) +[Mapping the Interplanetary Filesystem](https://arxiv.org/pdf/2002.07747.pdf)