Data Immutability

This post is part of my three-part series on the fundamental pillars of Blockchains: start here to read the introduction.

Blockchains get data immutability in the most basic way possible: by making lots of copies of the data and distributing them to all participants. Of course, copying alone doesn't guarantee that the data isn't changed, but it is a simple solution to data loss.

If you want to protect your data, start with loss prevention. So, first, all data in a blockchain is typically copied to every participant. This has implications for what you will and won't store on a chain: in particular, bandwidth and storage become real concerns.

Aside: these concerns hit the Bitcoin blockchain hard in 2015 and 2016, and because of cultural difficulties in the Bitcoin community (see: should I hire a Bitcoiner to do blockchain work?), they continue to cause real problems for network participants.

How do blockchains ensure that data sent to possibly unreliable participants isn't changed? They use a simple piece of cryptography called a hash function. A hash function takes any amount of data in and produces a small, fixed-size output (SHA-256, for example, always produces 32 bytes, usually written as 64 hexadecimal characters). For a cryptographic hash function, it is computationally infeasible to find different data that produces the same output, so the hash acts as a fingerprint of the original. If you hash all the data you want to store and keep that hash, you can check whether a copy you get back is the original, unmodified data: apply the hash function to the copy and check that the output equals the hash you kept. If it matches, you can rely on the data.
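To make the store-then-verify idea concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The helper names `fingerprint` and `is_unmodified` and the sample payloads are hypothetical, chosen only for illustration; a real system would also need somewhere trustworthy to keep the hash.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (32 bytes -> 64 hex characters)."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hash: str) -> bool:
    """Re-hash a retrieved copy and compare it against the hash we kept."""
    return fingerprint(data) == expected_hash


# Store: hash the original data and keep only the (small) hash somewhere safe.
original = b"Alice pays Bob 10 coins"
kept_hash = fingerprint(original)

# Retrieve: copies come back from possibly unreliable participants.
print(is_unmodified(b"Alice pays Bob 10 coins", kept_hash))  # True
print(is_unmodified(b"Alice pays Bob 99 coins", kept_hash))  # False
```

Note that the hash is the same size whether the data is 20 bytes or 20 gigabytes, which is what makes it cheap to keep.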

If you are now asking "how do I safely store the hashes and check that I have the right ones?", then you're on exactly the right paranoid track, and it will make you love Blockchain designs.
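One common answer, and the one blockchains take their name from, is to hash the hashes: each block records the hash of the block before it, so a single trusted hash of the latest block (the "tip") lets you check the entire history. The sketch below is a simplified illustration under stated assumptions: the `Block` layout, the all-zero starting hash, and the function names are hypothetical, and this is not Bitcoin's actual block format (Bitcoin hashes a structured block header with double SHA-256).

```python
import hashlib
from dataclasses import dataclass

GENESIS_PREV = "0" * 64  # hypothetical placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    prev_hash: str  # hex hash of the previous block
    payload: bytes  # the data this block records

    def block_hash(self) -> str:
        # The hash covers both the link to the previous block and this block's data.
        return hashlib.sha256(self.prev_hash.encode() + self.payload).hexdigest()


def build_chain(payloads):
    """Link each new block to the hash of the one before it."""
    chain, prev = [], GENESIS_PREV
    for payload in payloads:
        block = Block(prev, payload)
        chain.append(block)
        prev = block.block_hash()
    return chain


def verify_chain(chain, trusted_tip: str) -> bool:
    """Check every link, then check the last block against the one hash we trust."""
    prev = GENESIS_PREV
    for block in chain:
        if block.prev_hash != prev:
            return False  # a block was altered, removed, or reordered
        prev = block.block_hash()
    return prev == trusted_tip


chain = build_chain([b"block 0 data", b"block 1 data", b"block 2 data"])
tip = chain[-1].block_hash()
print(verify_chain(chain, tip))  # True

tampered = list(chain)
tampered[0] = Block(chain[0].prev_hash, b"rewritten history")
print(verify_chain(tampered, tip))  # False
```

Rewriting any old block changes its hash, which breaks the link stored in the next block; patching that link changes the next hash in turn, all the way up to the tip. So the problem of protecting many hashes shrinks to protecting one.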

But you are also in need of an expert. If you are responsible for hiring and retaining the experts who will specify and build your blockchain, the key takeaways are:

  • Keep careful track of what you need to know forever
  • Consider which data, if any, could be 'ephemeral'
  • Plan for significant growth of the chain
  • Have a plan for bootstrapping after success. (As of this writing, the Bitcoin blockchain exceeds 50 gigabytes of data; an initial sync is a large, large job.)
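Planning for growth starts with simple arithmetic: if every block is full, the chain grows by block size times blocks per year, and every participant holding a full copy pays that cost. The sketch below assumes 1 MB blocks every 10 minutes, roughly Bitcoin's block size limit and target interval at the time of writing; the function names and the 1,000-node count are hypothetical inputs for illustration, not measurements.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years; fine for an estimate


def max_growth_per_year(block_size_bytes: int, block_interval_seconds: int) -> float:
    """Upper bound on chain growth per year, in bytes, assuming every block is full."""
    blocks_per_year = SECONDS_PER_YEAR / block_interval_seconds
    return blocks_per_year * block_size_bytes


def network_storage_per_year(block_size_bytes: int, block_interval_seconds: int,
                             full_copies: int) -> float:
    """Every participant keeping a full copy stores the same growth."""
    return max_growth_per_year(block_size_bytes, block_interval_seconds) * full_copies


per_node = max_growth_per_year(block_size_bytes=1_000_000, block_interval_seconds=600)
print(f"{per_node / 1e9:.2f} GB per node per year")  # 52.56 GB per node per year

# Hypothetical network of 1,000 full copies.
total = network_storage_per_year(1_000_000, 600, full_copies=1_000)
print(f"{total / 1e12:.2f} TB across the network per year")  # 52.56 TB across the network per year
```

Real chains rarely fill every block, so treat this as a ceiling; the point is that the bound is easy to compute before launch, and it is also roughly the yearly amount a new participant must download during its initial sync.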