Building an Online Publishing Platform on IPLD

In this tutorial, we will go through how IPLD works and build an online publishing platform.

  • Understanding IPLD
  • Creating an IPLD node
  • Connecting IPLD objects
  • Reading Nested data using Links
  • Creating a Publication System

You can check out the complete code for the tutorial here.

If you want to follow along, make sure that you have installed and initialized IPFS and connected it to the public network.

Understanding IPLD

When you add the photo to IPFS, this is what happens:

When we add something to IPFS, we get something like this:

You can see the final hash here:

QmQgQUbBeMTnH1j3QWwNw9LkXjpWDJrjyGYfZpnPp8x5Lu

But we don’t see anything related to the 2 intermediate steps (Raw and Digest). This all happens under the hood.

When we added the image, we converted it into Raw data that the computer can understand. Now, to make it content-addressable, we have to come up with a method by which we can convert this image data to a label that uniquely identifies its content.

This is where hash functions come into play.

How Do Hash Functions Work?

Hash functions take data (any data: text, photos, the whole Bible, etc.) as input and give us an output (the Digest) that is, for all practical purposes, unique to that input. If we change even a single pixel in this image, the output will be different. This makes the data tamper-evident, which is why IPFS is called a Self-certifying File System. So if you transfer this image to somebody, they can check whether the photo they received has been tampered with.

Also, you cannot work out the input (in this case, the cat photo) just by looking at its output (the Digest). So, this also provides a good measure of security.
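To make the "change one pixel, get a different Digest" idea concrete, here is a minimal sketch using Node's built-in crypto module. It does not talk to IPFS at all; the byte string standing in for the photo and the helper name sha256Hex are made up for illustration.

```javascript
// Sketch only: plain SHA-256 hashing with Node's crypto module, not IPFS itself.
const crypto = require('crypto');

// Returns the SHA-256 digest of a string or Buffer as a hex string.
function sha256Hex(data) {
    return crypto.createHash('sha256').update(data).digest('hex');
}

// Stand-in for the raw image bytes (an assumption for this example).
const original = Buffer.from('pretend these bytes are the cat photo');

// Flip a single bit in a copy, standing in for a one-pixel change.
const tampered = Buffer.from(original);
tampered[0] ^= 1;

const originalDigest = sha256Hex(original);
const tamperedDigest = sha256Hex(tampered);

console.log(originalDigest);
console.log(tamperedDigest);
console.log('same digest?', originalDigest === tamperedDigest); // false
```

Anyone who receives the bytes can recompute the digest and compare it with the one they were given, which is the check described above.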

Now we pass the Raw image data into the SHA256 hash function and get the unique Digest. Next, we need to convert this Digest into a CID (Content Identifier). This CID is what IPFS will search for when we try to get back the image. For doing this, IPFS uses something called Multihash.

To understand the significance of Multihash, consider this situation.

You stored this image on the internet and you have its CID, which you can give to anybody who wants to get this image. Now, what if you discover in the future that SHA256 is broken (this would mean that this process is no longer tamper-proof and secure) and you want to use SHA3 instead? This would mean changing the whole process of converting your photo to a CID, and the previous CIDs would be useless…

This problem may seem like a small issue in this context, but you should know that hash functions secure TRILLIONS of dollars. Banks, national security agencies, etc. use them to operate securely. Even the green lock that you see beside the address of a site in the browser would not work without them.

To solve this problem, IPFS uses Multihash. Multihash lets us define hashes that are self-describing: each hash carries a short prefix saying which hash function produced it and how long the digest is. So, we can have multiple versions of CIDs, according to the hash function used. We will talk more about it below.
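Here is a rough sketch of what "self-describing" means in practice. It builds a sha2-256 multihash by hand (the byte 0x12 identifies sha2-256 and 0x20 is the digest length of 32 bytes) and base58-encodes it. The helper names base58Encode and sha256Multihash are written for this example and are not part of any IPFS library. Note that the result for the string 'hello' will not match what ipfs add prints for a file with the same content, because IPFS wraps file data in extra structure before hashing.

```javascript
// Sketch only: hand-rolled multihash, to show the self-describing prefix.
const crypto = require('crypto');

const BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Minimal base58btc encoder (hypothetical helper written for this sketch).
function base58Encode(bytes) {
    if (bytes.length === 0) return '';
    let value = BigInt('0x' + Buffer.from(bytes).toString('hex'));
    let out = '';
    while (value > 0n) {
        out = BASE58_ALPHABET[Number(value % 58n)] + out;
        value /= 58n;
    }
    // Each leading zero byte is encoded as a leading '1'.
    for (const byte of bytes) {
        if (byte !== 0) break;
        out = '1' + out;
    }
    return out;
}

// Prefixes the digest with <hash function code><digest length>.
function sha256Multihash(data) {
    const digest = crypto.createHash('sha256').update(data).digest();
    return Buffer.concat([Buffer.from([0x12, digest.length]), digest]);
}

const multihash = sha256Multihash('hello');
const encoded = base58Encode(multihash);

console.log(multihash.length);         // 34 (2 prefix bytes + 32 digest bytes)
console.log(encoded);
console.log(encoded.startsWith('Qm')); // true
```

The fixed 0x12 0x20 prefix is why the hashes in this tutorial that were produced with sha2-256 all begin with "Qm". If a different hash function were used, the prefix (and so the leading characters) would change, and a reader of the hash could tell which function to use to verify it.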

Well, now we have added our photo to IPFS, but this was not the whole story. What is actually happening is something like this:

If a file is bigger than 256 kB, it is broken down into smaller parts, so that every part is 256 kB or smaller. We can see the chunks of our photo using this command:

$ ipfs object get QmQgQUbBeMTnH1j3QWwNw9LkXjpWDJrjyGYfZpnPp8x5Lu

This gives us 3 chunks, each holding at most 256 kB of file data. Each of these chunks is first converted into a digest and then into a CID.

{
    "Links": [
        {
            "Name": "",
            "Hash": "QmYN9f4cRGPReJDSi3YoFTt5eTVS2Jo9ePN3wH3TfgbB8u",
            "Size": 262158
        },
        {
            "Name": "",
            "Hash": "QmTJ1rwQQ7FC4HiwmxS1jFe2eJeb6kyxgRWKGyHjf7nYMN",
            "Size": 262158
        },
        {
            "Name": "",
            "Hash": "QmSEuztdUaJNLGhf3Hrpd9f8eHXftusY8QCbqUbzGv7LNX",
            "Size": 210174
        }
    ],
    "Data": "\u0008\u0002\u0018��, ��\u0010 ��\u0010 ��\u000c"
}
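The splitting step can be sketched in a few lines. This is only an illustration of fixed-size chunking, assuming a chunk size of 256 * 1024 = 262144 bytes; the function name chunkBuffer and the fake file are made up for the example, and the real chunker inside IPFS is more configurable than this. The Size values in the output above are slightly larger than 262144, presumably because each chunk is wrapped in a few extra bytes of metadata before it is stored.

```javascript
// Sketch only: fixed-size chunking of a Buffer, not the actual IPFS chunker.
const CHUNK_SIZE = 256 * 1024; // 262144 bytes

// Splits data into consecutive pieces of at most chunkSize bytes.
function chunkBuffer(data, chunkSize = CHUNK_SIZE) {
    const chunks = [];
    for (let offset = 0; offset < data.length; offset += chunkSize) {
        chunks.push(data.subarray(offset, offset + chunkSize));
    }
    return chunks;
}

// A fake "file" sized so that it splits into two full chunks and one smaller one.
const fakeFile = Buffer.alloc(2 * CHUNK_SIZE + 210160);
const chunks = chunkBuffer(fakeFile);

console.log(chunks.map((chunk) => chunk.length)); // [ 262144, 262144, 210160 ]
```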

IPFS uses IPLD (which is built on a Merkle DAG, or directed acyclic graph) to manage all the chunks and link them to the base CID.

An IPLD object consists of 2 components:

  • Data — a blob of unstructured binary data of size < 256 kB.
  • Links — array of Link structures. These are links to other IPFS objects.

Every IPLD Link (in our case, the 3 links that we got above) has 3 parts:

  • Name — name of the Link
  • Hash — the hash of the linked IPFS object
  • Size — the cumulative size of linked IPFS object, including following its links
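As a small worked example of how these fields can be used, the sketch below takes the Links array from the output above and adds up the Size of each link. The variable and function names are made up for illustration, and the Data field is left out because it is not needed here.

```javascript
// Sketch only: reading the Links structure printed by `ipfs object get` above.
const photoNode = {
    Links: [
        { Name: '', Hash: 'QmYN9f4cRGPReJDSi3YoFTt5eTVS2Jo9ePN3wH3TfgbB8u', Size: 262158 },
        { Name: '', Hash: 'QmTJ1rwQQ7FC4HiwmxS1jFe2eJeb6kyxgRWKGyHjf7nYMN', Size: 262158 },
        { Name: '', Hash: 'QmSEuztdUaJNLGhf3Hrpd9f8eHXftusY8QCbqUbzGv7LNX', Size: 210174 }
    ]
};

// Adds up the cumulative sizes of everything this node links to.
function linkedSize(node) {
    return node.Links.reduce((total, link) => total + link.Size, 0);
}

console.log(photoNode.Links.length);  // 3
console.log(linkedSize(photoNode));   // 734490
```

Because each link already carries the cumulative size of what it points to, we can estimate how much data sits behind a node without fetching any of the linked objects.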

IPLD is built on the ideas of Linked Data, which is actually something that folks in the decentralized web community have been talking about for quite some time. It’s something Tim Berners-Lee has been working on for ages, and his company, Inrupt, is building a business around it through the Solid project.

Using IPLD also has other benefits. To see this, let’s create a folder named photos and add 2 photos into it (the cat pic and a copy of the same pic).

De-duplication using IPFS

As you can see, both photos have the same hash (which proves that I haven’t changed anything in the copy of the image). This gives IPFS a de-duplication property: even if your friend adds the same cat photo to IPFS, the image will not be stored twice. This saves a lot of storage space.
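De-duplication falls out naturally once content is stored under its own hash. The toy store below is a sketch of that idea only: the class name ContentStore and its add method are hypothetical and are not how IPFS is implemented internally.

```javascript
// Sketch only: a toy content-addressed store, to show why duplicates cost nothing.
const crypto = require('crypto');

class ContentStore {
    constructor() {
        // Maps digest (hex string) -> stored bytes.
        this.blocks = new Map();
    }

    // Stores the data under its SHA-256 digest and returns that digest.
    add(data) {
        const key = crypto.createHash('sha256').update(data).digest('hex');
        if (!this.blocks.has(key)) {
            this.blocks.set(key, Buffer.from(data));
        }
        return key;
    }
}

const store = new ContentStore();
const catKey = store.add('cat photo bytes');
const copyKey = store.add('cat photo bytes'); // the "copy" of the same photo

console.log(catKey === copyKey); // true
console.log(store.blocks.size);  // 1
```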

Now that we understand what IPLD is, how it works, and why it matters, let’s get our hands dirty!

Playing With IPLD

In IPFS, IPLD helps to structure and link all the data chunks/objects. So, as we saw above, IPLD was responsible for organizing all the data chunks that constituted the image of the kitty🐱.

In this part, we will create a Medium.com-like publication system, and link the tags, articles, and authors using IPLD. This will help you to get a more intuitive understanding of IPLD. You can also find the complete tutorial on GitHub.

Let’s get started!

Before creating the publication system, we’ll be exploring the IPFS DAG API, which lets us store data objects in IPFS in IPLD format. (You can store more exciting things in IPFS, like your favorite cat GIF, but we will stick to simpler things for now.)

If you are not familiar with Merkle trees and DAGs then head over here. If you have an understanding of what these terms mean then continue…

Create a folder named ipld-blogs. Run npm init inside it and press Enter for all the questions.

Now install the dependencies using:

$ npm install ipfs-http-client cids --save

After installing the module your project structure will look like this:

Creating an IPLD format node

You can create a new node by passing a data object into the ipfs.dag.put method, which returns a Content Identifier (CID) for the newly created node.

ipfs.dag.put({name: 'vasa'})

A CID is an address for a block of data in IPFS that is derived from its content. Every time someone puts the same {name: 'vasa'} data into IPFS, they'll get back an identical CID to the one you got. If they put in {name: 'vAsa'} instead, the CID will be different.

Paste this code in tut.js and run node tut.js

//Initiate ipfs and CID instance
const ipfsClient = require('ipfs-http-client');
const CID = require('cids');

//Connecting ipfs http client instance to local IPFS peer.
const ipfs = new ipfsClient({ host: 'localhost', port: '5001', protocol: 'http' });

/*
Creating an IPLD format node:
ipfs.dag.put(dagNode, [options], [callback])
For more information see: 
https://github.com/ipfs/interface-js-ipfs-core/blob/master/SPEC/DAG.md#ipfsdagputdagnode-options-callback
*/

ipfs.dag.put({ name: "vasa" }, { format: 'dag-cbor', hashAlg: 'sha2-256' }, (err, cid) => {
    if (err) {
        console.error("ERR\n", err);
        //Stop here: cid is undefined when the call fails.
        return;
    }

    //Fetching multihash buffer from cid object.
    const multihash = cid.multihash;

    //Passing multihash buffer to CID object to convert multihash to a readable format
    const cids = new CID(1, 'dag-cbor', multihash);

    //Printing out the cid in a readable format
    console.log(cids.toBaseEncodedString());
    //bafyreiekjzonwkqd7vcfescxlhvuyn6atdvgevirauupbkncpyebllcuh4
});

You will get this CID bafyreiekjzonwkqd7vcfescxlhvuyn6atdvgevirauupbkncpyebllcuh4 . We have successfully created an IPLD format node.

Connecting IPLD objects

One important feature of Directed Acyclic Graphs (DAGs) is the ability to link them together.

The way you express links in the ipfs DAG store is with the CID of another node.

For example, if we wanted one node to have a link called foo pointed to another CID instance previously saved as barCid, it might look like:

{
  "foo": "barCid"
}

When we give a field a name and make its value a link to a CID, we call this a named link.

Below is the code showing how to create a named link.

//Initiate ipfs and CID instance
const ipfsClient = require('ipfs-http-client');
const CID = require('cids');

//Connecting ipfs http client instance to local IPFS peer.
const ipfs = new ipfsClient({ host: 'localhost', port: '5001', protocol: 'http' });

/*
Creating an IPLD format node:
ipfs.dag.put(dagNode, [options], [callback])
For more information see: 
https://github.com/ipfs/interface-js-ipfs-core/blob/master/SPEC/DAG.md#ipfsdagputdagnode-options-callback
*/
async function linkNodes() {
    //Creating the node that we want to link to. ipfs.dag.put resolves to its CID.
    let vasa = await ipfs.dag.put({ name: 'vasa' });

    //Linking secondNode to vasa using named link.
    let secondNode = await ipfs.dag.put({ linkToVasa: vasa });

    //Fetching multihash buffer from cid object.
    const multihash = secondNode.multihash;

    //Passing multihash buffer to CID object to convert multihash to a readable format
    const cids = new CID(1, 'dag-cbor', multihash);

    //Printing out the cid in a readable format
    console.log(cids.toBaseEncodedString());
    //bafyreigwftw565twi6kw3azu7hgz2lxoev3um6cjpgbn6lunqp6f3ewpve
}

//Logging the error instead of leaving the promise rejection unhandled.
linkNodes().catch((err) => console.error("ERR\n", err));
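If you don't have a daemon running, the idea behind a named link can still be tried out with a toy model. In the sketch below, toyPut is a hypothetical stand-in for ipfs.dag.put that uses the SHA-256 of a node's JSON form as its address, and a link is written as an object with a single "/" key (the convention used by the dag-json format; dag-cbor encodes links differently). Real CIDs will not match these addresses.

```javascript
// Sketch only: an in-memory model of nodes that link to each other by hash.
const crypto = require('crypto');

const toyStore = new Map();

// Hypothetical stand-in for ipfs.dag.put: address = SHA-256 of the JSON encoding.
function toyPut(node) {
    const encoded = JSON.stringify(node);
    const address = crypto.createHash('sha256').update(encoded).digest('hex');
    toyStore.set(address, encoded);
    return address;
}

// A node and a parent that points to it through the named link "linkToVasa".
const vasaAddr = toyPut({ name: 'vasa' });
const parentAddr = toyPut({ linkToVasa: { '/': vasaAddr } });

// The same parent shape, but linking to a slightly different child.
const otherAddr = toyPut({ name: 'vAsa' });
const otherParentAddr = toyPut({ linkToVasa: { '/': otherAddr } });

console.log(vasaAddr === otherAddr);         // false
console.log(parentAddr === otherParentAddr); // false
```

The second result is the interesting one: because the parent contains the child's address, changing the child also changes the address of the parent. This is the Merkle property that lets a single root CID vouch for everything linked beneath it.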

Reading Nested Data Using Links

You can read data from deeply nested objects using path queries.

ipfs.dag.get allows queries using IPFS paths. These queries return an object containing the value of the query and any remaining path that was unresolved.

The cool thing about this API is that it can also traverse through links. Here is an example of how you can read nested data using links.

//Initiate ipfs and CID instance
const ipfsClient = require('ipfs-http-client');
const CID = require('cids');

//Connecting ipfs http client instance to local IPFS peer.
const ipfs = new ipfsClient({ host: 'localhost', port: '5001', protocol: 'http' });

function errOrLog(err, result) {
    if (err) {
        console.error('error: ' + err)
    } else {
        console.log(result)
    }
}

async function createAndFetchNodes() {
    let vasa = await ipfs.dag.put({ name: 'vasa' });

    //Linking secondNode to vasa using named link.
    let secondNode = await ipfs.dag.put({
        publication: {
            authors: {
                authorName: vasa
            }
        }
    });

    //Fetching multihash buffer from cid object.
    const multihash = secondNode.multihash;

    //Passing multihash buffer to CID object to convert multihash to a readable format
    const cids = new CID(1, 'dag-cbor', multihash);

    //Fetching the value using links
    ipfs.dag.get(cids.toBaseEncodedString() + '/publication/authors/authorName/name', errOrLog);
    /* prints { value: 'vasa', remainderPath: '' } */
}
//Logging the error instead of leaving the promise rejection unhandled.
createAndFetchNodes().catch((err) => console.error('error: ' + err));
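To get a feel for what a path query has to do when it crosses a link, here is a toy resolver that works on an in-memory Map instead of IPFS. Everything in it (toyPut, toyGet, isLink, and the "/" link convention borrowed from dag-json) is an assumption made for this sketch; the real ipfs.dag.get does considerably more work.

```javascript
// Sketch only: resolving a path across linked nodes in an in-memory store.
const crypto = require('crypto');

const nodes = new Map();

// Hypothetical stand-in for ipfs.dag.put: address = SHA-256 of the JSON encoding.
function toyPut(node) {
    const address = crypto.createHash('sha256').update(JSON.stringify(node)).digest('hex');
    nodes.set(address, node);
    return address;
}

// A link is modelled as an object of the form { '/': address }.
function isLink(value) {
    return value !== null && typeof value === 'object' && typeof value['/'] === 'string';
}

// Walks the path one segment at a time, following links whenever it meets one.
function toyGet(rootAddress, path) {
    let value = nodes.get(rootAddress);
    const segments = path.split('/').filter(Boolean);
    for (let i = 0; i < segments.length; i++) {
        if (isLink(value)) {
            value = nodes.get(value['/']);
        }
        if (value === null || typeof value !== 'object' || !(segments[i] in value)) {
            // Could not go further: report what is left of the path.
            return { value, remainderPath: segments.slice(i).join('/') };
        }
        value = value[segments[i]];
    }
    return { value, remainderPath: '' };
}

const vasaAddr = toyPut({ name: 'vasa' });
const rootAddr = toyPut({ publication: { authors: { authorName: { '/': vasaAddr } } } });

console.log(toyGet(rootAddr, 'publication/authors/authorName/name'));
// { value: 'vasa', remainderPath: '' }
```

The first three path segments are resolved inside the root node; the last one, name, is only reachable after the resolver follows the authorName link into the second node.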

You can also explore your IPLD nodes using this cool IPLD explorer. For example, if I want to see this CID: bafyreiekjzonwkqd7vcfescxlhvuyn6atdvgevirauupbkncpyebllcuh4 , I will go to this link:

https://explore.ipld.io/#/explore/bafyreiekjzonwkqd7vcfescxlhvuyn6atdvgevirauupbkncpyebllcuh4

Now that we have explored the IPFS DAG API, we are ready to work with IPLD and create our publication system.

Creating a Publication System

We will create a simple blogging application. This blogging application can:

  • Add a new Author IPLD object. An author will have 2 fields: name and profile (a tagline for your profile).
  • Create a new Post IPLD object. A post will have 4 fields: author, content, tags and publication date-time.
  • Read a Post using post CID.

Below is the code implementation for the above goals.

/*
PUBLICATION SYSTEM
Adding new Author
An author will have
-> name
-> profile
Creating A Blog
A Blog will have a:
-> author
-> content
-> tags
-> timeOfPublish
Read a Blog
What more we could do with this?
Try Listing all Blogs for an author. 
Send me solution at hi@simpleaswater.com and get SimpleAsWater T-Shirts 
*/

//Initiate ipfs and CID instance
const ipfsClient = require('ipfs-http-client');
const CID = require('cids');

//Connecting ipfs http client instance to local IPFS peer.
const ipfs = new ipfsClient({ host: 'localhost', port: '5001', protocol: 'http' });

//Create an Author
async function addNewAuthor(name) {
    //creating blog author object
    var newAuthor = await ipfs.dag.put({
        name: name,
        profile: "@SimpleAsWater | @TowardsBlockChain, an MIT CIC incubated startup | https://vaibhavsaini.com"
    });

    console.log("Added new Author " + name + ": " + newAuthor);

    return newAuthor;
}

//Creating a Blog
async function createBlog(author, content, tags) {

    //creating blog object
    var post = await ipfs.dag.put({
        author: author,
        content: content,
        tags: tags,
        timeOfPublish: Date()
    });

    //Fetching multihash buffer from cid object.
    const multihash = post.multihash;

    //passing multihash buffer to CID object to convert multihash to a readable format   
    const cids = new CID(1, 'dag-cbor', multihash);

    console.log("Published a new Post by " + author + ": " + cids.toBaseEncodedString());

    return cids.toBaseEncodedString();

}

//Read a blog
async function readBlog(postCID) {
    try {
        //Awaiting the result so that it is actually returned to the caller.
        const result = await ipfs.dag.get(postCID);
        console.log("Post Details\n", result);
        return result;
    } catch (err) {
        console.error('Error while reading post: ' + err);
    }
}


function startPublication() {
    //Chaining the steps: create author -> create post -> read post.
    addNewAuthor("vasa")
        .then((newAuthor) => createBlog(newAuthor, "my first post", ["ipfs", "ipld", "vasa", "towardsblockchain"]))
        .then((postCID) => readBlog(postCID))
        .catch((err) => console.error('Error while publishing: ' + err));
}

startPublication();

Running this code will first create an author via addNewAuthor, which returns the author’s CID. This CID is then passed to the createBlog function, which returns the postCID. Finally, readBlog uses this postCID to fetch the post details.
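One thing to notice is that a post links to its author, but the author node has no link back to its posts. So if we want to list all the posts by an author, we need to keep track of them ourselves. Below is one possible starting point, assuming a simple in-memory index; the names postsByAuthor, recordPost and listPosts are hypothetical, the CID strings are placeholders, and the index is lost when the process exits.

```javascript
// Sketch only: an in-memory index from author CID to post CIDs.
// Maps author CID (as a string) -> array of post CID strings.
const postsByAuthor = new Map();

// Call this after a post has been created, e.g. with the values used by createBlog.
function recordPost(authorCid, postCid) {
    const key = String(authorCid);
    if (!postsByAuthor.has(key)) {
        postsByAuthor.set(key, []);
    }
    postsByAuthor.get(key).push(String(postCid));
}

// Returns a copy of the list of post CIDs recorded for the author.
function listPosts(authorCid) {
    return [...(postsByAuthor.get(String(authorCid)) || [])];
}

// Placeholder identifiers, standing in for real CIDs.
recordPost('author-cid-placeholder', 'post-cid-1');
recordPost('author-cid-placeholder', 'post-cid-2');

console.log(listPosts('author-cid-placeholder')); // [ 'post-cid-1', 'post-cid-2' ]
console.log(listPosts('unknown-author'));         // []
```

A more durable variant could store the index itself as an IPLD node and publish a new version each time a post is added, but that is beyond this sketch.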

You can create more complex applications using IPLD…

OK, that’s it for this part. If you have any questions, you can post them in the comments.

I hope you have learned a lot from this post. In the next post, we will dive into the naming system of the distributed web, IPNS. So stay tuned…