We’re surrounded by data, and it continues to grow at unprecedented rates. Every day, businesses, individuals, and Internet-connected hardware devices generate huge amounts of data that are pushed onto the Internet.
We now have sophisticated digital plumbing, integration solutions, and analysis engines that allow us to create “Big Data” sets that promise to help us make informed decisions for business, the economy, and the environment.
Big Data? Beware of Big BAD Data
But how do we know all this big data is not just a bunch of Big BAD Data? Consider how “accurate” your own data is. From simple things (errors in your contact list) to important things (erroneous inventory levels or bank account balances), we often tolerate inaccuracies for our own use that would totally frustrate the efforts of the big data analysis engines.
Where does Big Data come from? Can you trust it? Big Data is just aggregated small data, and most small data is, to a greater or lesser extent, BAD data. Yet Big Data has become a powerful business buzzword. There’s a race on today to build, publish, and otherwise exploit Big Data on the part of virtually every large company and government in the world.
The “Internet of Things” is exploding with small sensors, cameras, remote meters, automobile-based telemetry devices, and so on, and these sensors are all transmitting data at an ever-increasing rate. Their data is being captured, and then what? The data is DUMPED into databases—without context and without validation.
Most small data is flawed (databases with inaccurate, missing, out-of-date, miscoded, corrupt, or just plain wrong information). If you look at your own data, maybe in your personal records or even your own accounting system, you’ll probably be able to easily spot an alarmingly large percentage of errors in your data. Some of these errors occur because when the data was recorded, there simply wasn’t enough information to record all of the facts accurately.
So what are we doing to ensure that this powerful capability we have to aggregate terabytes of data is in fact producing correct big data? What about your accounting data, for example—from the AR aging data you report to a credit agency to the credit agency data you use to make business decisions? True, we can and do perform audits to ensure that the original data recorded was correct, and—if it wasn’t—adjust to correct for any errors found.
But wouldn’t it be better if bad data could never even enter the database? Set aside the fact that you may be making a living with accounting or auditing, and imagine a world where we always had GOOD data in the database, and that we could begin with facts from the data and then start working on advisory services, or other high-value services that only humans can do. Let your mind ponder that idea for a while and we’ll come back to it.
The documents on which we rely every day, from patents to contracts to blog posts, all have the following in common. They must be:
- Recorded (on paper in the old world, but increasingly in large, centralized databases—more on that later)
- Securely stored
- Protected from hackers or unauthorized alterations
What happens when this information is wrong? Amending such documents is next to impossible, and because the Internet has no way of erasing incorrect versions, the newer, amended information carries no greater weight than the incorrect, older information.
The Internet can transmit information about documents, but the Internet cannot provide:
- Fraud protection
- Verification of the truth or authenticity of the information
And that’s not all. There is also no way to know for certain the order in which these documents were created, amended, or executed, and in every case the timing matters. What was once true about any one of them may or may not still be true, yet there is no way to know definitively whether the information is still true or, for that matter, whether it was ever true.
While the Internet is a global repository of information (and has changed the world as a result), what we need today is a global, secure ledger of truth—one that is not corruptible by human fraud or subject to manipulation by any group, corporation, organization, or government.
The Promise of Blockchain Technology
Fortunately, some of the best mathematicians, computer scientists, and economists have combined forces to develop this global, secure ledger of truth. It’s called Blockchain technology.
In a nutshell, Blockchain technology is characterized by the following:
- It is a public ledger of transactions between trading partners.
- It is an open source, globally distributed database.
- There is no central authority that controls the blockchain. No bank, government, or company runs it, so none of them can unilaterally control or alter it. Instead, the blockchain is a global network of computers that run open source software to store and update a distributed database.
- Transactions can only be added once they are validated through complex mathematics that proves the authenticity of the data:
  - This mathematical architecture makes the recorded history extremely difficult to tamper with.
  - Hashing algorithms and digital signatures secure the data itself and prevent tampered or unauthenticated data from being added to the database.
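To make the hashing idea concrete, here is a minimal Python sketch of a hash-linked ledger. It is an illustration only, not how any particular blockchain is implemented: the names `block_hash`, `append_block`, and `is_valid`, the block layout, and the sample transactions are all made up for this example, and it leaves out digital signatures, networking, and consensus. What it does show is that each block's hash covers the previous block's hash, so changing old data breaks verification.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block (assumption)


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    expected_prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


ledger = []
append_block(ledger, ["alice pays bob 5"])
append_block(ledger, ["bob pays carol 2"])
print(is_valid(ledger))  # True

# Quietly rewrite history in the first block.
ledger[0]["transactions"][0] = "alice pays bob 500"
print(is_valid(ledger))  # False
```

Note what this does and does not catch: the check detects that recorded data was altered after the fact, but it says nothing about whether the original entry was true when it was written.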
But, you may be asking, how could such a public ledger be secure?
Blockchain is a cryptographically secured, distributed public ledger with the following security validations:
- Duplicate copies of the same database are held by thousands of disinterested yet incentivized participants, who solve math problems on their computers to “validate” the accuracy and authenticity of the data. Contrast this with centralized organizations, which can be, and are, hacked every day.
- By distributing the “workload” of verifying security, we both spread the work and enhance our confidence in the validity of the data. By having multiple, disinterested parties all providing what’s called “proof of work” to validate transactions in the blockchain, we gain huge confidence in the security and accuracy of all data that gets added to the chain.
- “In Math We Trust”—Using encryption and complex hashing methodology, mathematics is the basis of trust instead of humans. Math cannot be corrupted, deceived, or hacked.
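The “proof of work” idea can be sketched in a few lines of Python. This is a toy version under stated assumptions: the function names `mine` and `verify`, the `|` separator, the sample block string, and the rule "the SHA-256 hex digest must start with a given number of zeros" are chosen for illustration and do not match the exact rules of any real network. The point is the asymmetry: finding a valid nonce takes many attempts, while checking one takes a single hash.

```python
import hashlib


def _digest(block_data, nonce):
    """SHA-256 over the block data and a candidate nonce (illustrative format)."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty=4):
    """Try nonces 0, 1, 2, ... until the digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty=4):
    """Check a claimed nonce with a single hash computation."""
    return _digest(block_data, nonce).startswith("0" * difficulty)


block = "alice pays bob 5"  # hypothetical block contents
nonce, digest = mine(block, difficulty=4)
print(verify(block, nonce, difficulty=4))  # True
```

Each extra zero required in the prefix multiplies the expected mining effort by 16, which is how a network could tune how expensive it is to add, or to rewrite, a block.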
For a technical description of Blockchain technology, see Part 1 in this series.
While we may never completely halt the creation and distribution of BAD data, Blockchain technology is one of the most significant developments in decades to combat the big bad data problem by providing a complete “trust layer” for the Internet. Few things in society are more pressing than our ability to record, store, and transmit accurate, trustworthy data.