By vasa in IPFS Cluster — Dec 15, 2019

Data, Backups, and Recovery

The configuration and data persisted by a running IPFS Cluster peer (started with ipfs-cluster-service) are stored, by default, in the $HOME/.ipfs-cluster/ folder. A Cluster peer persists several types of information on disk:

The list of known peer addresses, kept for future use, is stored in the peerstore file during shutdown.
The cluster pinset (the list of objects pinned in the cluster, along with all their associated options, such as the name, the allocations or the replication factor) is stored differently depending on the chosen consensus component:
crdt stores everything in a key-value BadgerDB datastore in the badger folder.
raft stores the append-only log making up the pinset, along with the list of cluster peers, in a BoltDB store that is snapshotted frequently. Everything is saved in the raft folder.
service.json and identity.json are also persistent data, but normally they are not modified.
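To check which of these items a given peer has actually written, the Python sketch below looks for them under the cluster folder. It assumes the default $HOME/.ipfs-cluster/ location and the entry names listed above (peerstore, service.json, identity.json, badger, raft); inspect_cluster_folder is a hypothetical helper, not part of IPFS Cluster, and it only checks names on disk without opening any store.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional

# Entry names taken from the description above (assumed default layout).
PERSISTED_ITEMS = ("service.json", "identity.json", "peerstore", "badger", "raft")


def inspect_cluster_folder(base: Optional[Path] = None) -> Dict[str, object]:
    """Report which persisted entries exist under a cluster folder.

    Hypothetical helper: it only checks names on disk; it does not open
    or validate the BadgerDB or BoltDB stores.
    """
    if base is None:
        base = Path(os.path.expanduser("~/.ipfs-cluster"))
    found = {name: (base / name).exists() for name in PERSISTED_ITEMS}
    consensus: List[str] = []
    if found["badger"]:
        consensus.append("crdt")  # crdt keeps its pinset in the badger folder
    if found["raft"]:
        consensus.append("raft")  # raft keeps its log and snapshots in raft/
    return {"base": str(base), "found": found, "consensus_data": consensus}


if __name__ == "__main__":
    report = inspect_cluster_folder()
    print("Cluster folder:", report["base"])
    for name, present in report["found"].items():
        print(f"  {name:<14} {'present' if present else 'missing'}")
    print("Consensus data found for:", ", ".join(report["consensus_data"]) or "none")
```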

Offline state: export and import

Since the pinset information is persisted on disk, it can be exported from an offline peer with:

$ ipfs-cluster-service state export

This produces a list of JSON objects representing the current pinset (very similar to the output of ipfs-cluster-ctl --enc=json pin ls on an online peer). The resulting file can be re-imported with:

$ ipfs-cluster-service state import

Always re-import using the same ipfs-cluster-service version that you exported with.
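Before re-importing a dump, it can help to sanity-check it. The Python sketch below counts the pins in an export file and reports CIDs that appear more than once. Because the exact file layout is not described here, it assumes the export is either a JSON array or one JSON object per line, and that each object carries its CID under a "cid" key, either as a plain string or as a {"/": "<cid>"} link object; summarize_export is a hypothetical name.

```python
import json
import sys
from collections import Counter
from typing import Any, Dict, List


def _load_pins(text: str) -> List[Dict[str, Any]]:
    """Parse a dump as a JSON array or as newline-delimited JSON (assumed formats)."""
    stripped = text.strip()
    if not stripped:
        return []
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def _cid_of(pin: Dict[str, Any]) -> str:
    """Extract the CID, accepting a plain string or a {"/": "<cid>"} link object."""
    cid = pin.get("cid")
    if isinstance(cid, dict):
        cid = cid.get("/")
    if not isinstance(cid, str):
        raise ValueError(f"pin without a recognizable 'cid' field: {pin!r}")
    return cid


def summarize_export(text: str) -> Dict[str, Any]:
    """Count pins, unique CIDs and duplicated CIDs in an exported pinset."""
    pins = _load_pins(text)
    counts = Counter(_cid_of(p) for p in pins)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    return {"pins": len(pins), "unique": len(counts), "duplicates": duplicates}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage, assuming the export is redirected from stdout into a file:
    #   ipfs-cluster-service state export > pinset.json
    #   python summarize_export.py pinset.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(summarize_export(f.read()))
```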

Note that the state dump contains only the pinset. It does not include any bookkeeping information, such as Raft peerset membership, the current Raft term or CRDT Merkle-DAG nodes. Thus, when re-importing a pinset it is important to remember that:

In raft, the given pinset will be used to create a new snapshot, newer than any existing one, which still includes information such as the current peerset, when one exists.
In crdt, importing will clean the state completely and create a single batch Merkle-DAG node. This effectively compacts the state by replacing the Merkle-DAG, but to prevent this peer from re-downloading the old DAG, all other peers in the Cluster should have replaced or removed it too.

See Disaster recovery below for more information.

raft state dumps can be imported as crdt pinsets and vice-versa.

Resetting a peer: state cleanup

Cleaning up the state results in a blank cluster peer. Such a peer will need to re-bootstrap (raft) or reconnect (crdt) to a Cluster in order to re-download the state. Alternatively, the state can be provided by importing it, as described above. The cleanup is performed with:

$ ipfs-cluster-service state cleanup

Note that this does not remove or rewrite the configuration, the identity or the peerstore files. Removing the raft folder (raft) or the badger folder (crdt) is, for all intents and purposes, equivalent to a state cleanup.

When using Raft, the raft folder is renamed to raft.old.X, and the number of copies kept depends on the backups_rotate configuration value. When using CRDT, the crdt-related data is deleted from the badger datastore.
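The backup rotation can be pictured with the following Python sketch. It is an illustration under assumptions, not IPFS Cluster's own code: it assumes raft.old.0 is the newest backup, that older backups move to higher numbers, and that at most backups_rotate copies are kept. rotate_raft_folder is a hypothetical name.

```python
import shutil
from pathlib import Path


def _backup(base: Path, i: int) -> Path:
    return base / f"raft.old.{i}"


def rotate_raft_folder(base: Path, backups_rotate: int) -> None:
    """Move base/raft to base/raft.old.0, keeping at most `backups_rotate` backups.

    Assumed scheme (illustrative only): raft.old.0 is the newest backup and
    higher numbers are older.
    """
    if backups_rotate < 1:
        raise ValueError("backups_rotate must be at least 1 in this sketch")
    raft = base / "raft"
    if not raft.is_dir():
        return  # nothing to back up
    existing = sorted(
        int(p.name.rsplit(".", 1)[1])
        for p in base.glob("raft.old.*")
        if p.is_dir() and p.name.rsplit(".", 1)[1].isdigit()
    )
    # Delete backups that would fall beyond the limit once shifted by one.
    for i in existing:
        if i >= backups_rotate - 1:
            shutil.rmtree(_backup(base, i))
    # Shift survivors starting from the oldest so no rename overwrites another.
    for i in reversed([i for i in existing if i < backups_rotate - 1]):
        _backup(base, i).rename(_backup(base, i + 1))
    raft.rename(_backup(base, 0))
```

With backups_rotate set to 2, four consecutive cleanups leave only the two most recent raft folders, as raft.old.0 and raft.old.1.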

Disaster recovery

The only content that IPFS Cluster stores and that is unique to a cluster peer is the pinset; the content itself is stored by IPFS. Usually, a cluster runs several peers that replicate both the content and the cluster pinset, so that when one or several peers crash, are destroyed, disappear or simply fail, they can be reset to a clean state and re-synced from the remaining peers.

A cluster is healthy when more than half of its peers are online and healthy (raft) or when at least one trusted, healthy peer is available (crdt).
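The two health conditions can be written as small predicates. This is a sketch: the function names are hypothetical, and the raft check encodes the standard Raft majority rule (strictly more than half of the peers online) rather than anything queried from a running cluster.

```python
def raft_has_quorum(total_peers: int, online_peers: int) -> bool:
    """Raft can operate only while a strict majority of peers is online."""
    if not 0 <= online_peers <= total_peers:
        raise ValueError("online_peers must be between 0 and total_peers")
    return online_peers > total_peers // 2


def crdt_can_resync(trusted_online_peers: int) -> bool:
    """A crdt peer can re-sync as long as at least one trusted peer is reachable."""
    return trusted_online_peers >= 1


if __name__ == "__main__":
    for total in (3, 4, 5):
        needed = total // 2 + 1
        print(f"{total} raft peers: {needed} must be online for quorum")
```

For example, a four-peer raft cluster needs three peers online, so losing two of them already stops it.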

Thus, any peer can be fully reset and re-join an otherwise healthy cluster using the same procedure as for adding a new peer. In raft, however, departed peers should be removed manually with ipfs-cluster-ctl peer rm if they are never going to re-join.

Unhealthy clusters

Things change for unhealthy clusters:

In crdt, the lack of trusted peers will prevent the restored peer from re-syncing to the cluster state (although, as a workaround, it could temporarily trust any other peer).
In raft, the loss of quorum, which happens when half or more of the peers are down, prevents adding new peers, removing broken peers or operating the cluster at all.
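For the crdt workaround, temporary trust is granted through the peer's configuration. The Python sketch below assumes that service.json keeps the crdt settings under consensus.crdt.trusted_peers and that the value "*" means "trust every peer"; verify both against the configuration reference for your version. set_trusted_peers is a hypothetical helper that returns the previous value so the original list can be restored once the peer has re-synced. The peer presumably needs a restart to pick up the change.

```python
import json
from pathlib import Path
from typing import Any, List, Optional


def set_trusted_peers(service_json: Path, peers: Optional[List[str]]) -> Any:
    """Replace consensus.crdt.trusted_peers in service.json; return the old value.

    Assumes the config layout {"consensus": {"crdt": {"trusted_peers": [...]}}}.
    """
    config = json.loads(service_json.read_text(encoding="utf-8"))
    crdt = config.get("consensus", {}).get("crdt")
    if crdt is None:
        raise KeyError("no consensus.crdt section found; is this a crdt peer?")
    previous = crdt.get("trusted_peers")
    crdt["trusted_peers"] = peers
    service_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return previous


# Example (paths and workflow are assumptions):
#   cfg = Path.home() / ".ipfs-cluster" / "service.json"
#   old = set_trusted_peers(cfg, ["*"])   # trust everyone temporarily
#   ... restart the peer and wait for it to re-sync ...
#   set_trusted_peers(cfg, old)           # restore the original list
```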

In such events, it may be easier to simply salvage the state and re-create your cluster with the following procedure:

1. Locate a peer that still stores the state (raft or badger folder).
2. Export the pinset with ipfs-cluster-service state export.
3. Reset your peer or set up a new peer from scratch.
4. Run ipfs-cluster-service state import to import the state copy from step 2.
5. Start the peer as a single-peer cluster.
6. Fully clean up, upgrade and bootstrap the rest of the peers to the running one.
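Steps 2 to 5 of this procedure can be scripted. The Python sketch below only prints the commands unless dry_run is turned off. It assumes that state export writes the pinset to stdout, that state import accepts the dump's path as an argument, and that ipfs-cluster-service daemon starts the peer; check the command help for your version, and expect cleanup and import to possibly ask for confirmation. recovery_plan and run_plan are hypothetical names. Steps 1 and 6 stay manual, and if the salvaged peer and the rebuilt peer are different machines, step 2 runs on the former and steps 3 to 5 on the latter.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

CLI = "ipfs-cluster-service"


def recovery_plan(dump: Path) -> List[Tuple[str, List[str], Optional[Path]]]:
    """Return (description, argv, stdout_file) for steps 2-5 of the procedure."""
    return [
        ("2. export pinset (on the salvaged peer)", [CLI, "state", "export"], dump),
        ("3. reset the peer to be rebuilt", [CLI, "state", "cleanup"], None),
        ("4. import the saved pinset", [CLI, "state", "import", str(dump)], None),
        ("5. start as a single-peer cluster", [CLI, "daemon"], None),
    ]


def run_plan(dump: Path, dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also execute it (the daemon blocks)."""
    for description, argv, stdout_file in recovery_plan(dump):
        shown = " ".join(argv) + (f" > {stdout_file}" if stdout_file else "")
        print(f"{description}: {shown}")
        if dry_run:
            continue
        if stdout_file is not None:
            with open(stdout_file, "w", encoding="utf-8") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_plan(Path("pinset-backup.json"))  # dry run: prints the commands only
```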

State upgrades

Since version 0.10.0, Cluster peers no longer need manual state upgrades (the state upgrade command has been removed).
