The General Data Protection Regulation, or GDPR, the European Union's sweeping effort to combat the misuse of personal data and the rise of identity theft, has gone from obscurity to being one of the biggest issues anyone involved in data management has faced in years. The idea behind it is relatively simple - by unifying a set of requirements on data management, the EU hopes to stanch the abuse of data about people that is collected for one purpose and then sold for something very different.
The recent scandals involving Cambridge Analytica and Facebook illustrate very clearly what GDPR is intended to stop. Before the 2016 US elections, Cambridge Analytica had created a set of seemingly innocuous games and quizzes - "Who Would Your Star Wars Character Be?", for example - which could, with some processing, correlate a Facebook account with a likely political alignment. A flaw in the service's architecture also made it possible to access the friends of the individual who took the quiz, whose data could then be cross-correlated to infer the political alignment of others (as well as expose them to chatbots and similar low-level AIs). The nature of social media makes it especially ripe for this kind of abuse, as these "graphs" of related links are central to the way most social media networks function.
Similarly, as more health and credit records move into the digital realm and onto the Internet, such records have also ended up being hoovered up by nefarious actors - from organized crime to unscrupulous companies to repressive governments - and used for blackmail, character assassination, electoral fraud, or outright theft.
GDPR was adopted in April 2016, but on May 25th, 2018, the final piece - its enforcement across all EU member states - will test the concept and determine whether such a broad form of data governance is even enforceable. GDPR permits the processing of personal data only when at least one of the following lawful bases applies (from the GDPR Wikipedia page):
- The data subject has given consent to the processing of personal data for one or more specific purposes.
- Processing is necessary for the performance of a contract to which the data subject is party or to take steps at the request of the data subject prior to entering into a contract.
- Processing is necessary for compliance with a legal obligation to which the controller is subject.
- Processing is necessary to protect the vital interests of the data subject or of another natural person.
- Processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.
- Processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party unless such interests are overridden by the interests or fundamental rights and freedoms of the data subject, which require protection of personal data, in particular if the data subject is a child.
The key to this is the notion of informed consent. Personal data cannot be used without consent - not just for the initial acquisition of the content, but for every subsequent use of that content. The ramifications of this are huge. First of all, it means that personal data cannot be "mined" without each person in question specifically indicating that the use of the data is acceptable to them.
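To make the per-purpose nature of consent concrete, here is a minimal sketch (in Python, with hypothetical names - GDPR specifies no API) of a registry that records consent per subject and per purpose, so permission for one use never carries over to another:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: consent must be tracked per subject *and*
# per purpose -- consent granted for one use does not cover another.

@dataclass
class ConsentRegistry:
    # maps subject id -> set of purposes the subject has consented to
    _grants: dict = field(default_factory=dict)

    def grant(self, subject_id: str, purpose: str) -> None:
        self._grants.setdefault(subject_id, set()).add(purpose)

    def revoke(self, subject_id: str, purpose: str) -> None:
        # consent can be withdrawn at any time
        self._grants.get(subject_id, set()).discard(purpose)

    def may_process(self, subject_id: str, purpose: str) -> bool:
        return purpose in self._grants.get(subject_id, set())

registry = ConsentRegistry()
registry.grant("user-42", "order-fulfilment")
print(registry.may_process("user-42", "order-fulfilment"))  # prints True
print(registry.may_process("user-42", "targeted-ads"))      # prints False
```

The point of the sketch is the lookup key: it is the pair (subject, purpose), not the subject alone.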
The immediate impact will be relatively minimal, because the Internet is already awash in information about everyone, but after the 25th of May, those records, whether of purchases on Amazon or the latest medical report you receive from your providers, will start to become out of date. After a year or two, that information becomes so stale as to be useless. Advertising keywords, tied into your search persona, can no longer be used without your permission, which puts a major crimp in the effectiveness of targeted marketing. In effect, this data must be put behind a firewall, and can only be released if the owner provides a (potentially one-time) key (or, put another way, the data is escrowed).
The first effect of this is to establish a provenance trail: if you do not have permission to use data, you will not have access to the relevant keys, and you become legally liable to lawsuits if you abuse that data anyway. Not only will this make corporate data operations such as those Cambridge Analytica pursued far more expensive, it will also help resolve another major problem in the data world - the lack of golden records. A golden record is considered definitive because it has a clear provenance chain and represents the most up-to-date snapshot of a person's personal information. Oftentimes, credit records contain incorrect information, or fail to reflect information that overrides existing data (a person has passed out of the seven-year window associated with bankruptcy liability, as just one of many examples).
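The escrow-plus-provenance idea can be illustrated with a toy model - everything here (class names, token scheme) is an assumption for illustration, not any real system: personal data sits behind a keyholder, each release requires a one-time key minted by the data subject, and every release is logged, producing the provenance trail.

```python
import hashlib
import secrets

class EscrowStore:
    """Toy escrow: data is released only against a subject-issued key."""

    def __init__(self):
        self._data = {}       # subject id -> personal data
        self._keys = {}       # subject id -> outstanding one-time tokens
        self.provenance = []  # audit log of every release

    def deposit(self, subject_id, record):
        self._data[subject_id] = record

    def issue_key(self, subject_id):
        # The data subject mints a one-time token for a specific request.
        token = secrets.token_hex(16)
        self._keys.setdefault(subject_id, set()).add(token)
        return token

    def release(self, subject_id, token, requester):
        if token not in self._keys.get(subject_id, set()):
            raise PermissionError("no valid key: access not authorized")
        self._keys[subject_id].discard(token)  # token is spent
        # log who received the data, keyed by a hash of the spent token
        self.provenance.append(
            (subject_id, requester,
             hashlib.sha256(token.encode()).hexdigest()))
        return self._data[subject_id]
```

A requester without a token simply gets a `PermissionError`, and every successful release leaves an entry in `provenance` - the trail a regulator or court could later inspect.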
One other likely effect of this (something that is already happening) is that credit card usage will be mediated by digital wallets integrated with browsers. The wallet performs the transaction on behalf of the user through a trusted intermediary (such as a bank) without ever letting the merchant have access to the credit card information.
For data that is collected, the other aspect of GDPR with real impact is the anonymization clause. This effectively means that data that is made available must lose enough information to protect the identity of the people involved - categories often referred to as PII (personally identifiable information) or PHI (protected health information). This is more complex than it sounds, as the data has to be scrubbed in such a way that, given a data set, the identity of its owner cannot be conclusively inferred from that data. If you have a nine-digit ZIP code, you can usually infer identity to within a house or two, while a five-digit code makes identification much harder (a typical five-digit ZIP covers on the order of 10,000 people).
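As a rough illustration of that scrubbing, the sketch below (hypothetical helper names, and a single quasi-identifier where real datasets have several) generalizes a nine-digit ZIP to its five-digit prefix and applies a simplified k-anonymity check - each record must hide in a group of at least k others sharing the same value:

```python
from collections import Counter

# Illustrative sketch: truncating ZIP+4 to five digits enlarges the
# group each record hides in. k_anonymous is a simplified k-anonymity
# test on the single quasi-identifier "zip".

def generalize_zip(zip9: str) -> str:
    """Reduce a ZIP+4 code (e.g. '98101-1234') to its five-digit prefix."""
    return zip9[:5]

def k_anonymous(records, k: int) -> bool:
    """True if every ZIP value is shared by at least k records."""
    counts = Counter(r["zip"] for r in records)
    return all(n >= k for n in counts.values())

records = [{"zip": generalize_zip(z)}
           for z in ["98101-1234", "98101-5678", "98101-9012"]]
print(k_anonymous(records, k=3))  # prints True: all three share '98101'
```

Before generalization, each nine-digit value would be unique, so the same data would fail the check for any k greater than 1.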
In effect, GDPR is intended to shift the control of data from the aggregators of that data to its subjects, and the leverage it holds is the fact that companies cannot do business in Europe (even if located elsewhere) without consistently following GDPR requirements. This has a major impact on American data collection projects, as it requires either adopting the far more restrictive European requirements or writing off a huge market consisting of hundreds of millions of Europeans.
Realistically, this is also likely to be one of the areas where distributed ledger technologies (blockchain and its ilk) really come into play - likely even more so than ICOs (which are primarily pure speculation plays). A key part of GDPR is the ability to conclusively confirm identity, as that identity token is essential to managing transaction histories. I'm not completely convinced that the blockchain architecture by itself is sufficient for this task (and its computational requirements tend to limit its utility), but this is an area where a triple-store ledger structure could be built in conjunction with the blockchain to manage both transactional information and authorization auditing.
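As a sketch of what such a ledger might look like at its simplest - this is an illustrative toy, not a blockchain implementation, and it omits signatures, distribution, and the triple-store query layer - consider an append-only chain of consent events in which each entry hashes its predecessor, so that tampering with history is detectable:

```python
import hashlib
import json

class ConsentLedger:
    """Toy append-only ledger of consent/authorization events."""

    def __init__(self):
        self.entries = []

    def append(self, subject_id: str, action: str, purpose: str) -> dict:
        # each entry commits to the previous entry's hash
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"subject": subject_id, "action": action,
                "purpose": purpose, "prev": prev_hash}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        # recompute every hash; any edit to history breaks the chain
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("subject", "action", "purpose", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

The hash chain gives an auditable history of grants and revocations; a real deployment would still need a queryable store on top of it to answer "who may process what" efficiently, which is where a triple store could fit.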
It is unlikely, given the current shift away from regulatory control within the US, that there will be similar legislation in that country soon, but because most data-centric companies are transnational in scope, this will likely only slow the adoption of a stronger data privacy regime in America, not stop it. Ironically, within the EU, while data-farming applications will decline, data breach occurrences will also drop (since data will no longer be centralized), and data quality will improve. The biggest losers will be advertisers and marketers, who will need to rethink their business models once consumer data is no longer effectively free to them.