Download and Installation of IPFS Cluster
In order to run an IPFS Cluster peer and perform actions on the Cluster, you will need to obtain the `ipfs-cluster-service` and `ipfs-cluster-ctl` binaries. The former runs the Cluster peer; the latter allows you to interact with it:
- Visit the download page for instructions on the different ways to obtain `ipfs-cluster-service` and `ipfs-cluster-ctl`.
- Place the binaries somewhere they can be run unattended by an `ipfs` system user (usually `/usr/local/bin`). IPFS Cluster should be installed and run alongside `ipfs` (go-ipfs).
- Consider configuring your systems to start `ipfs` and `ipfs-cluster-service` automatically (but make sure your cluster is fully operational and that peers discover each other before doing so). Some sample systemd service files are available here: ipfs-cluster-service, ipfs. A minimal sketch of enabling them follows this list.
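For example, assuming the sample unit files above were installed as `ipfs.service` and `ipfs-cluster.service` (the exact names depend on the files you use, so treat this only as a sketch), they could be enabled with:
$ sudo systemctl enable ipfs
$ sudo systemctl enable ipfs-cluster
$ sudo systemctl start ipfs ipfs-cluster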
Initialization
To generate a default configuration file, a unique identity for your peer and an empty peerstore file, run:
$ ipfs-cluster-service init --consensus <crdt/raft>
This assumes that the `ipfs-cluster-service` command is installed in one of the folders in your `$PATH`.
If all went well, after running this command there will be three different files in `$HOME/.ipfs-cluster`:
- `service.json` contains a default peer configuration. Usually, all peers in a Cluster should have exactly the same configuration.
- `identity.json` contains the peer private key and ID. These are unique to each Cluster peer.
- `peerstore` is an empty file used to store the addresses of other peers so that this peer knows where to contact them.
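After a successful `init`, listing the folder should therefore show something like:
$ ls $HOME/.ipfs-cluster
identity.json  peerstore  service.json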
The `--consensus` flag chooses whether to initialize the configuration with a `raft` or a `crdt` section. All peers should be initialized in the same way. The choice between `raft` and `crdt` depends on multiple factors and affects how the cluster is started and how the peerset is modified. We have gathered more in-depth explanations in the Consensus Components section.
The new `service.json` file generated by `ipfs-cluster-service init` will have a randomly generated `secret` value in the `cluster` section. For a Cluster to work, this value must be the same on all cluster peers. This is a common source of pitfalls, since initializing default configurations everywhere results in different random secrets.
If present, the `CLUSTER_SECRET` environment variable is used when running `ipfs-cluster-service init` to set the cluster `secret` value.
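For example, one way to generate a 32-byte hex secret and use it during initialization (the generation command is just one possibility) is:
$ export CLUSTER_SECRET=$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')
$ ipfs-cluster-service init --consensus crdt
The same exported value must then be used when initializing every other peer.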
Remote configuration
`ipfs-cluster-service` can be initialized with a remote configuration file: an HTTP(S) location which is read to obtain the running configuration every time the peer is launched. This is useful to initialize all peers with the same configuration and to provide seamless upgrades to it.
A good trick is to use IPFS to store the actual configuration and, for example, call `init` with a gateway URL as follows:
$ ipfs-cluster-service init http://localhost:8080/ipns/config.mydomain.com
(A DNSLink TXT record needs to be configured for the example above to work. A regular URL can be used too.)
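For instance, without DNSLink, you could add the configuration file to IPFS and point `init` at the resulting CID through the local gateway (`<cid>` is a placeholder for the hash printed by `ipfs add`):
$ ipfs add -q service.json
$ ipfs-cluster-service init http://localhost:8080/ipfs/<cid>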
Do not host configurations publicly unless it is OK to expose the Cluster secret. This is only OK in crdt-based clusters which have configured `trusted_peers` to something other than `*`.
Trusted peers
The `crdt` section of the `service.json` file includes a single `*` value in the `trusted_peers` array. By default, peers running in crdt mode trust all other peers. In `raft` mode, all peers trust all other peers and this option does not exist.
Read more about trusted peers in the Security and Ports guide.
The peerstore file
The `peerstore` file will be maintained by the running Cluster peer and will be used to store known-peer addresses. However, you can also pre-fill this file (one multiaddress per line) to help this peer connect to others during its first start. Here is an example:
/dns4/cluster1.domain/tcp/9096/ipfs/QmcQ5XvrSQ4DouNkQyQtEoLczbMr6D9bSenGy6WQUCQUBt
/dns4/cluster2.domain/tcp/9096/ipfs/QmdFBMf9HMDH3eCWrc1U11YCPenC3Uvy9mZQ2BedTyKTDf
/ip4/192.168.1.10/tcp/9096/ipfs/QmSGCzHkz8gC9fNndMtaCZdf9RFtwtbTEEsGo4zkVfcykD
Ports
By default, Cluster uses:
- `9096/tcp` as the cluster swarm endpoint, which should be open and diallable by other cluster peers.
- `9094/tcp` as the HTTP API endpoint.
- `9095/tcp` as the Proxy API endpoint.
A full description of the ports and endpoints is available in the Security guide.
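As a purely illustrative sketch, on a host using `ufw` you might open the swarm port widely while restricting the API ports to a trusted subnet (the `192.168.1.0/24` range is an assumption):
$ sudo ufw allow 9096/tcp
$ sudo ufw allow from 192.168.1.0/24 to any port 9094 proto tcp
$ sudo ufw allow from 192.168.1.0/24 to any port 9095 proto tcp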
Settings for production
The default IPFS and Cluster settings are conservative and work for most setups out of the box. There are, however, a number of options that can be optimized with regards to:
- Large pinsets
- Large numbers of peers
- Networks with very high or very low latencies
In addition to the settings mentioned here, the configuration reference contains detailed information for every configuration section, with extended descriptions of what each value means.
IPFS Configuration
IPFS daemons can be optimized for production. The options are documented in the official repository:
Server profile for cloud deployments
Initialize ipfs using the `server` profile: `ipfs init --profile=server`, or `ipfs config profile apply server` if the configuration already exists.
Pay attention to the `AddrFilters` and `NoAnnounce` options. They should be pre-filled to sensible values by the `server` configuration profile, but depending on the type of network you are running on, you may want to modify them.
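For example, to apply the profile to an existing repository and then review the filter values it set (the grep is only a convenience to locate the relevant entries in the output):
$ ipfs config profile apply server
$ ipfs config show | grep -A 8 -E 'AddrFilters|NoAnnounce'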
Datastore settings
For very large repos, consider enabling the Badger datastore. You can convert between datastores using `ipfs-ds-convert` (instructions). Badger should be significantly faster for very large pinsets, at the expense of memory.
Increase `Datastore.BloomFilterSize` according to your repo size (in bytes): `1048576` (1MB) is a good value (more info here).
Do not forget to set `Datastore.StorageMax` according to the amount of disk you want to dedicate to the ipfs repo. This affects how Cluster calculates how much free space there is on every peer.
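A sketch of setting both values (the figures are examples to adjust to your disk and repo size):
$ ipfs config --json Datastore.BloomFilterSize 1048576
$ ipfs config Datastore.StorageMax 100GB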
Connection manager settings
Increase `Swarm.ConnMgr.HighWater` (maximum number of connections) and reduce `GracePeriod` to `20s`. The HighWater value can be as high as your machine can take (`10000` is a good value for large machines). Adjust `Swarm.ConnMgr.LowWater` to about 25% of the HighWater value.
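For example (the figures are only illustrative):
$ ipfs config --json Swarm.ConnMgr.HighWater 10000
$ ipfs config --json Swarm.ConnMgr.LowWater 2500
$ ipfs config Swarm.ConnMgr.GracePeriod 20s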
File descriptor limit
The `IPFS_FD_MAX` environment variable controls the file descriptor (`ulimit`) value that `go-ipfs` sets for itself. Depending on your HighWater value, you may want to increase it to `8192` or more.
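For example, when launching the daemon manually (with systemd you would set the variable in the unit file instead):
$ export IPFS_FD_MAX=8192
$ ipfs daemon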
IPFS Cluster configuration
The `service.json` configuration file contains a few options which should be tweaked according to your environment, capacity and requirements.
`cluster` section
When dealing with a large amount of pins, you may want to further increase `cluster.state_sync_interval` and `cluster.ipfs_sync_interval`. These operations perform checks for every pin in the pinset and trigger `ipfs pin ls --type=recursive` calls, which may be slow when the number of pinned items is huge.
Consider increasing `cluster.monitor_ping_interval` and `monitor.*.check_interval`. These dictate how long the cluster takes to realize that a peer is not responding (which may then trigger re-pins). Re-pinning might be a very expensive operation in your cluster, so you may want to set these values rather high (several minutes). You can use the same value for both.
Under the same consideration, you may want to set `cluster.disable_repinning` to `true` if you do not wish re-pins to be triggered at all on peer downtime and prefer to handle things manually when content becomes underpinned. `replication_factor_max` and `replication_factor_min` allow some leeway: i.e. a 2/3 min/max setting will allow one peer to be down without re-allocating the content assigned to it somewhere else.
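As an illustration only (using `jq`, a generic JSON tool that is not part of Cluster; the values are examples), these options can be edited in `service.json` while the peer is stopped:
$ cd $HOME/.ipfs-cluster
$ jq '.cluster.monitor_ping_interval = "5m" | .cluster.disable_repinning = true' service.json > service.json.new
$ mv service.json.new service.json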
`raft` section
These options only apply when running raft-based clusters.
If you are planning to restart all Raft peers at the same time (for example, after an upgrade), consider setting `raft.wait_for_leader_timeout` to something that gives ample time for all your peers to be restarted and come online at once, usually `30s` or `1m`.
If your network is very unstable, you can try increasing `raft.commit_retries` and `raft.commit_retry_delay`. Note that more retries and higher delays imply slower failures.
For high-latency clusters (like having peers around the world), you can try increasing `heartbeat_timeout`, `election_timeout`, `commit_timeout` and `leader_lease_timeout`, although the defaults are already quite generous. For low-latency clusters, these can all be decreased (at least by half).
For very large pinsets, increase `raft.snapshot_interval`. If your cluster pins or unpins very frequently, increase `raft.snapshot_threshold`.
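Following the same `jq` pattern as above, and assuming the Raft options live under the `consensus.raft` path of `service.json` (check the configuration reference for your version), the leader timeout could be raised with:
$ jq '.consensus.raft.wait_for_leader_timeout = "1m"' service.json > service.json.new
$ mv service.json.new service.json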
`crdt` section
These options only apply when running crdt-based clusters.
Reducing `crdt.rebroadcast_interval` (default `1m`) to a few seconds should make new peers start downloading the state faster, and badly connected peers will have more opportunities to receive bits of information, at the expense of increased pubsub chatter in the network.
You can edit `crdt.cluster_name`, as long as it is the same for all peers.
`restapi` section
Adjust the `api.restapi` network timeouts depending on your API usage. This may protect against misuse of the API or DDoS attacks. Note that there are usually client-side timeouts that can be modified too if you control the clients.
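As a sketch following the same `jq` approach, assuming timeout keys such as `read_timeout` and `write_timeout` as listed in the configuration reference (double-check against your version; the values are examples):
$ jq '.api.restapi.read_timeout = "30s" | .api.restapi.write_timeout = "1m"' service.json > service.json.new
$ mv service.json.new service.json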
The API can be disabled by removing the configuration section.
`ipfshttp` section
Adjust the `ipfs_connector.ipfshttp` network timeouts if you are using the ipfs proxy in the same fashion as the `restapi`.
The Proxy API can be disabled by removing the configuration section.