The Digital Data Hike
Stroll forward from the birth of computing through the mists of time, and you witness more and more data making its way onto computers with every passing year. At first, it was just structured data, then text, then graphical data and images and then sound and eventually video. The capacity to create, capture and store digital data kept on increasing as almost everything spiraled down into the black hole that is the digital world. A vast amount of personal data ended up digitized.
Few people thought there was a need to stake a claim on their personal data. As a consequence, it was generally assumed by the business world that if they captured data or bought data from data brokers, they had a right to use it in whatever way they pleased. It was their data, not your data.
Nobody in the US cared much until various bad actors, from the denizens of the dark web to businesses like Equifax and Facebook, began to act badly: mismanaging your data, abusing your data, losing your data. Personal data was suddenly news.
And then came the slow-motion GDPR earthquake. It came as a shock to the hundreds of thousands of businesses that hadn’t been keeping their eyes on Europe, by which I mean American firms.
“Gosh,” they said, “people can regulate for data privacy.”
And “Dammit,” they said, “these regulations apply to me.”
So here we are. Let me share some thoughts about personal data.
The Personal Data Situation
Personal data comes in three flavors. These are the first two:
- Data that can identify who you are: This includes official documentation (driving license, birth certificate, etc.), biographical information, your living situation, your appearance (and so also photos and videos you appear in), your education and employment, and your health and genetic data.
- Data involving you: This is any data that includes you but does not relate directly to identity. For example, data is gathered when you visit a website, eat at a restaurant, or vote in an election.
Many things you do are captured digitally. You may volunteer some of this data via Facebook or Twitter. Also, your data is captured by cell phone apps or by websites or by surveillance cameras, often without your knowledge.
Let’s imagine you could capture all that data for yourself. It would probably include data that you would not want others to see. Many people would not want their medical data to be available. Many would not want their location tracked. Some might not want their ownership of particular investments to be known. And so on.
There may be many reasons you want your data to be private, but in the end, it doesn’t matter why. It’s your data, dammit!!
The Deductive Data Conundrum
Personal data exploitation is not the result of a massive conspiracy. It wasn’t as though a group of supervillains got together and plotted to steal people’s data from under their noses and squeeze a fortune out of it. Companies like Google, Facebook, LinkedIn and Equifax had business ideas (ad-based search, ad-based social network, business network, credit scoring) and they gathered the data they needed to pursue them.
They hired smart guys, and the smart guys found ways to improve the profit that these businesses could make. And if any of the improvements involved gathering more personal data, then naturally — as there was no law against it — they did that.
They never assembled anything like a full collection of your data; they just grabbed what their business could use — and, of course, some of them sold your data to others. That was the data you know is yours.
But there is also data that you do not know is yours. This is deductive data.
Deductive Data Is The Third Type of Personal Data
When someone has a collection of your data, they can deduce many things about you that are not explicitly in your data and which you maybe never knew. The biggest area of concern here involves the application of psychometrics (which means “measuring people’s minds”).
An expert in the field, Michal Kosinski, pioneered the kind of psychometric analytics that Cambridge Analytica reportedly deployed on behalf of the Trump campaign to influence the 2016 election, using your data from Facebook.
Let’s examine the Facebook data story.
In 2012, Kosinski demonstrated something surprising. Using an average of just 68 Facebook “likes” by a user, you could predict their skin color (95% accurate), their sexual orientation (88% accurate), and their affiliation to US political parties (85% accurate). It was also possible to predict intelligence, religious affiliation, and alcohol, cigarette and drug use with high levels of accuracy. It was even possible to deduce whether someone’s parents were divorced.
The analytics works like this: the data analyst builds psychographic profiles based on your preferences, and these profiles are then used to predict inclinations, opinions, and facts about the individual.
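To give a feel for how a handful of "likes" can turn into a prediction, here is a minimal sketch. It is not Kosinski's actual method: it uses a simple naive Bayes scorer as a stand-in, and the pages, users and the "outdoorsy" trait are all invented toy data. The names `TRAINING`, `train` and `predict` are hypothetical.

```python
import math

# Hypothetical toy data: each entry is (set of liked pages, made-up binary trait).
# None of this is real Facebook data; the trait here is "is outdoorsy".
TRAINING = [
    ({"hiking", "camping", "folk_music"}, True),
    ({"hiking", "kayaking"}, True),
    ({"camping", "folk_music", "cooking"}, True),
    ({"video_games", "cooking"}, False),
    ({"video_games", "sci_fi"}, False),
    ({"sci_fi", "cooking"}, False),
]


def train(examples):
    """Count, for each trait value, how many users liked each page."""
    pages = set().union(*(likes for likes, _ in examples))
    totals = {True: 0, False: 0}
    counts = {True: dict.fromkeys(pages, 0), False: dict.fromkeys(pages, 0)}
    for likes, trait in examples:
        totals[trait] += 1
        for page in likes:
            counts[trait][page] += 1
    return pages, totals, counts


def predict(model, likes):
    """Estimate the probability that a user with these likes has the trait."""
    pages, totals, counts = model
    # Start from the prior odds of the trait in the training set.
    log_odds = math.log(totals[True] / totals[False])
    for page in pages:
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        p_true = (counts[True][page] + 1) / (totals[True] + 2)
        p_false = (counts[False][page] + 1) / (totals[False] + 2)
        if page in likes:
            log_odds += math.log(p_true / p_false)
        else:
            log_odds += math.log((1 - p_true) / (1 - p_false))
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


model = train(TRAINING)
outdoor_score = predict(model, {"hiking", "folk_music"})
gamer_score = predict(model, {"video_games", "sci_fi"})
print(f"outdoorsy given hiking + folk_music: {outdoor_score:.2f}")
print(f"outdoorsy given video_games + sci_fi: {gamer_score:.2f}")
```

The point of the sketch is that nobody in the toy data ever states the trait alongside the new user's likes; it is deduced from what similar users liked. With millions of real users and thousands of pages, the same idea gets far more powerful.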
A worrying aspect of psychographic modeling is that people rarely know what their psychographic profile is. It is data about them that they do not know. You might think “Well that data is only predictive,” but the predictions are disturbingly accurate.
If there is data you know about yourself that you do not want revealed, there is almost certainly data about yourself that you do not know that you would not want revealed if you knew it.
And that should be a gargantuan data privacy issue.
The Cold Turkey Dynamic
Psychographic modeling is a relatively new field, so it will probably become more accurate as it grows. What is already claimed for it leads me to two conclusions:
- We should never let any business that analyzes our data retain our data. In fact, we should be the sole repository of our personal data and in full control of access to it.
- Where we allow access and analysis of our data, we need a contract (smart contract) that fully controls what is done with it. In particular, all deduced data, if we allow its creation, must be given to us and never retained.
The good news is that this is possible using blockchain technology. The bad news is that it is not available right now.
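To make the two rules above a little more concrete, here is a rough sketch of the checks such a contract might encode. It is ordinary Python, not a smart contract or any real blockchain API, and every name in it (`Grant`, `Vault`, `example_insurer`, the field names and the placeholder values) is hypothetical. It also cannot stop a business from keeping a copy of what it has read; it only illustrates the access rules.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Grant:
    """Hypothetical terms of access that the data owner hands to one business."""
    grantee: str
    allowed: frozenset        # names of the records the grantee may read
    purpose: str              # the single purpose the access is for
    expires: date             # last day the grant is valid
    may_derive: bool = False  # may the grantee compute deduced data?


@dataclass
class Vault:
    """The owner's sole repository; every request is checked against a grant."""
    records: dict
    grants: list = field(default_factory=list)
    derived: dict = field(default_factory=dict)

    def _active(self, grantee, today):
        # Grants for this grantee that have not yet expired.
        return [g for g in self.grants
                if g.grantee == grantee and today <= g.expires]

    def read(self, grantee, name, purpose, today):
        for g in self._active(grantee, today):
            if name in g.allowed and g.purpose == purpose:
                return self.records[name]
        raise PermissionError(f"{grantee} may not read {name!r} for {purpose!r}")

    def submit_derived(self, grantee, name, value, today):
        # Deduced data is handed back to the owner and stored in the vault.
        if not any(g.may_derive for g in self._active(grantee, today)):
            raise PermissionError(f"{grantee} may not create deduced data")
        self.derived[name] = value


# Placeholder values only; no real personal data.
vault = Vault(records={"postcode": "PLACEHOLDER", "medical_history": "PLACEHOLDER"})
vault.grants.append(
    Grant("example_insurer", frozenset({"postcode"}), "quote",
          date(2030, 1, 1), may_derive=True)
)
postcode = vault.read("example_insurer", "postcode", "quote", today=date(2029, 6, 1))
vault.submit_derived("example_insurer", "risk_band", "PLACEHOLDER", today=date(2029, 6, 1))
```

In this sketch, a request for `medical_history`, a request for a different purpose, or any request after the expiry date raises `PermissionError`. The hard part, which this does not solve, is making sure the business cannot hold on to the data once it has seen it.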
As for those businesses whose foundation is data exploitation, they will have to change their business dramatically; they cannot have the data.
Cold turkey will be the cure.