Blockchain: the end of Big data?

by bold-lichterman

If the production of digital data worldwide is soaring (90% of it was produced in the last two years), Facebook has something to do with it. The largest online social network has collected 300 petabytes of personal data since its inception, one hundred times the amount the Library of Congress was able to collect in two centuries!

In the age of big data, data is eagerly gathered and analyzed: businesses and organizations use it to personalize services, optimize decision-making processes, predict future trends and more. Today, data is a valuable asset in the global economy.

There is growing concern about user privacy. Large amounts of personal information, some of it sensitive, are stored on the servers of large corporations. People have little or no control over this data and fear that they cannot fully control how it is used. In recent years, the media have repeatedly covered controversial privacy incidents. Among the best-known examples are government surveillance and Facebook's large-scale scientific experiment, which was apparently conducted without explicitly informing participants.

Researchers have developed various techniques aimed at securing privacy and personal data. The arrival of Bitcoin in 2009 and its underlying blockchain provided researchers with a platform naturally geared towards data protection. Paradoxically, for example, while Facebook records even the most innocuous actions on its platform, Bitcoin goes so far as to bypass the KYC (Know Your Customer) rule, a cornerstone of banking and financial law.

We know the adage: “When a service is free on the web, you are the product.”

Take the case of any mobile app. It typically suffers from three privacy flaws:

  1. data ownership. Very often, by agreeing to the terms at installation, the user contractually hands ownership of the data to the application.
  2. data transparency and control over how it is used. The user loses track of their data.
  3. detailed access control. Upon installation, the user must grant a set of permissions to the app. These permissions are granted indefinitely, and the only way to change them is to uninstall the application altogether.

The same mobile application built on a blockchain would have the same interface but a completely different access / control policy for private data.

  1. data ownership. The blockchain app can ensure that users fully own and control their personal data. The system can recognize users as owners of the data and services as guests with delegated permissions.
  2. data transparency and control. Each user can see transparently how data is collected and who has access to it.
  3. detailed access control. At any time, the user can change any permission and revoke access to previously collected data, even retroactively. This happens dynamically, without uninstalling the application.
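
The owner / guest model described in the three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataVault` class, its method names and the `photo_app` service are hypothetical, and an in-memory list stands in for the blockchain's transaction log.

```python
from dataclasses import dataclass, field
from time import time


@dataclass
class DataVault:
    """Hypothetical per-user record: the user is the owner, services are guests."""
    owner: str
    # service name -> set of permissions currently delegated to it
    grants: dict = field(default_factory=dict)
    # append-only audit trail of (timestamp, actor, action, detail)
    log: list = field(default_factory=list)

    def _record(self, actor, action, detail):
        self.log.append((time(), actor, action, detail))

    def grant(self, caller, service, permission):
        if caller != self.owner:
            raise PermissionError("only the owner can delegate permissions")
        self.grants.setdefault(service, set()).add(permission)
        self._record(caller, "grant", f"{service}:{permission}")

    def revoke(self, caller, service, permission):
        if caller != self.owner:
            raise PermissionError("only the owner can revoke permissions")
        self.grants.get(service, set()).discard(permission)
        self._record(caller, "revoke", f"{service}:{permission}")

    def can_access(self, service, permission):
        # Every access attempt is logged, so the owner can see who asked for what.
        allowed = permission in self.grants.get(service, set())
        self._record(service, "access" if allowed else "denied", permission)
        return allowed


vault = DataVault(owner="alice")
vault.grant("alice", "photo_app", "location")
print(vault.can_access("photo_app", "location"))   # True
vault.revoke("alice", "photo_app", "location")     # no uninstall needed
print(vault.can_access("photo_app", "location"))   # False
```

In this sketch a permission can be withdrawn at any moment, and the log gives the owner a complete view of every grant, revocation and access attempt.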

Now imagine a Facebook-like application on a blockchain. It would allow us:

  • to truly own our data

  • to decide which posts, images and links we want to make public, visible to anyone

  • to limit access to our content to certain friends exclusively

  • to know, for life, who has access to our public data and when.

And yet, if we think about it for a moment, we easily understand that after all, the philosophy of the blockchain is indeed that of tracing.

Blockchains change the rules of the game: less centralization, less authority, more sharing. There is no longer any sensitive data on centralized servers, but we still have tracing.

The question is therefore: does tracing become more acceptable if the surveillance is controlled by those being watched? Just as we emerge from a passionate debate on the right to be forgotten, the blockchain arrives, built on a system that can only function to the extent that it forgets absolutely nothing.

In a blockchain, the massive collection of data takes place according to a strictly time-stamped logical sequence, a sequence of blocks that remains indelible for life.
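
To make this "indelible sequence" concrete, here is a minimal sketch of a hash-linked chain of time-stamped blocks. It is a simplification: the field names and helper functions are assumptions made for this illustration, and a real blockchain adds a consensus mechanism (such as Bitcoin's proof of work) and replication across many nodes on top of this structure.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 over the block's canonical JSON form (hash field excluded)."""
    body = {k: block[k] for k in ("index", "timestamp", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_block(chain, data, timestamp=None):
    """Add a time-stamped block that commits to the hash of its predecessor."""
    block = {
        "index": len(chain),
        "timestamp": time.time() if timestamp is None else timestamp,
        "data": data,
        "prev_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
    return True


chain = []
append_block(chain, "alice posts a photo")
append_block(chain, "alice shares it with bob")
print(is_valid(chain))                  # True
chain[0]["data"] = "nothing happened"   # try to rewrite history
print(is_valid(chain))                  # False
```

Because each block commits to the hash of the one before it, altering an old entry invalidates the chain from that point on, which is why nothing can be quietly forgotten.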

Why, then, would freezing data for life in a blockchain be better for privacy than leaving it on a proprietary server?

Because in a blockchain we can protect user data through complex but feasible software development. In their current design, however, blockchains do not handle privacy at all. Because they are public, potentially private data would pass unencrypted through every full node of the blockchain, fully exposed. In addition, blockchains do not tolerate heavy computation, which limits the number of conceivable implementations.

Despite these obstacles, researchers already have several ideas on how to add privacy to a blockchain. These solutions aim to make any app as robust as Bitcoin. Bitcoin has been around since 2009, and in six years no one has been able to steal a thousandth of a euro by attacking the blockchain itself, even though everything is public. For the same reason, it will one day be impossible to read a post or comment that was marked as private when it was configured.

There is a caveat here: it is often said that Bitcoin is not 100% anonymous. Indeed, as long as we are logged in to, say, Google with an account that identifies us, we constantly leave traces. Google can run a data-mining algorithm on our profile and deduce our Bitcoin identity from our visits to Bitcoin sites and wallets.

So let’s come back to the question of the article: can the blockchain spell the end of big data? Let us first remember that in this article we are talking about the personally identifiable data that big data uses to classify users into categories (potential customers, political orientation, tastes, habits…) and not anonymous data such as that from certain IoT sensors.

The answer is yes, provided that researchers arrive at a fairly stable solution and Internet users adopt the coming blockchain-based applications en masse.

Block Apps

Given that identities are always protected by a private key anyway, it remains to be seen how we store user data. For efficiency reasons, some applications might want to avoid cumbersome encryption and decryption.

We must distinguish two cases:

  1. the data is in the clear on centralized servers and the blockchain just keeps references. In this case, the blockchain guarantees that the data will never be tied to a named identity. However, the owners of the apps and servers will still have access to it, as potentially will hackers. Big data will therefore still be possible, anonymously.
  2. the data is encrypted, then stored, either on centralized servers or on systems like IPFS, Swarm, Enigma… In this case, the data is inaccessible and big data inconceivable.
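
The two cases can be contrasted in a short sketch. Everything here is a stand-in chosen for illustration: a dictionary plays the role of the off-chain store (a server, IPFS, Swarm…), a list plays the role of the blockchain, the function names are hypothetical, and a one-time pad replaces the vetted cipher (AES, for example) that a real application would use.

```python
import hashlib
import secrets

off_chain_store = {}   # stands in for a server, IPFS, Swarm, ...
ledger = []            # stands in for the blockchain: references only


def put(pseudonym, payload):
    """Store payload off-chain; record only its SHA-256 reference on the ledger."""
    ref = hashlib.sha256(payload).hexdigest()
    off_chain_store[ref] = payload
    ledger.append({"owner": pseudonym, "ref": ref})
    return ref


def get(ref):
    """Fetch a payload and check it still matches its on-chain reference."""
    payload = off_chain_store[ref]
    if hashlib.sha256(payload).hexdigest() != ref:
        raise ValueError("off-chain data does not match the ledger reference")
    return payload


def xor(data, key):
    """XOR two byte strings of equal length (one-time pad, illustration only)."""
    return bytes(d ^ k for d, k in zip(data, key))


message = b"private post"

# Case 1: plaintext off-chain. Whoever operates the store can read it.
ref_clear = put("pseudonym-1", message)

# Case 2: encrypt before storing. The key never leaves the user.
key = secrets.token_bytes(len(message))
ref_secret = put("pseudonym-1", xor(message, key))

print(get(ref_clear))               # b'private post'
print(xor(get(ref_secret), key))    # b'private post'
```

In both cases the ledger holds only a pseudonym and a hash. The difference is what sits in the store: readable content in the first case, bytes that are meaningless without the user's key in the second.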

In the second case, it would be impossible to access content marked as private. So impossible, in fact, that with the blockchain it is rather the opposite problem that could be worrying: a pedophile or terrorist group hiding behind the blockchain would be out of reach even for a State's highest supervisory authorities.

Data privacy is pivotal to the success of blockchains. Admittedly, making the data of an Ethereum-style application private is quite complex to develop today (we know something about it…), but turnkey solutions will undoubtedly arrive.

You can, for example, take a look at this MIT project.

Then there is the question of traceability: if, in 30 years, a quantum computer makes it possible to crack private keys that are inviolable today, it will be possible to access all of the content and retroactively read the history of any user.

David Teruzzi is a blockchain consultant. He is co-founder of Blockchain Consulting and a blogger at Blockchain Café. Also in the blockchain field, he is an affiliate developer on the Decred project, a new currency inspired by Bitcoin. He is also an expert programmer in applied mathematics.

Article originally published on Blockchain Café.

Photo credit: Fotolia