I. The problem(s)
Data security is one of the main problems created by this surge in data generation, since a greater volume of data correlates with looser control, a higher probability of fraud, a greater likelihood of losing one's privacy, and of becoming the target of illicit or unethical activities. Today more than ever, a universal data regulation is needed, and some steps have already been taken toward one (OECD, 2013). This is especially true because everyone complains about privacy leakages, yet no one wants to give up the extra services and customized products that companies develop based on our personal data.
It is essential to protect individual privacy without erasing companies' capacity to use data to drive their businesses in a heterogeneous but harmonized way. Every fragment of data has to be collected with prior explicit consent, and safeguarded against manipulation and misuse. A privacy impact assessment to understand how people would be affected by the use of their data is crucial as well.
II. Fairness and Data Minimization
There are two important concepts to be considered from a data protection point of view: fairness and minimization.
Fairness concerns how data are obtained, and the transparency required of the organizations collecting them, especially about potential future uses.
Data minimization, instead, concerns gathering only the right amount of data. Although big data is usually understood as "all data", and even though relevant correlations are often drawn from unexpected data merged together, this is no excuse for collecting every data point or retaining records longer than required. Best practice is hard to pin down here, and until strict regulation is issued, internal business practices grounded in industry common sense have to be used.
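As a rough illustration of what data minimization can look like in code, the sketch below keeps only a whitelisted set of fields and flags records held past a retention window. The field names and the one-year retention period are hypothetical policy choices, not values from the text:

```python
from datetime import datetime, timedelta

# Hypothetical whitelist: the fields the business can justify collecting.
REQUIRED_FIELDS = {"user_id", "email", "purchase_history"}
RETENTION = timedelta(days=365)  # assumed retention period

def minimize(record: dict) -> dict:
    """Keep only the fields explicitly required for the stated purpose."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}

def is_expired(collected_at: datetime, now: datetime) -> bool:
    """Flag records held longer than the retention period for deletion."""
    return now - collected_at > RETENTION

raw = {"user_id": 42, "email": "a@b.com", "ssn": "123-45-6789", "browser": "..."}
print(minimize(raw))  # {'user_id': 42, 'email': 'a@b.com'}
```

The point of the sketch is that minimization is enforced at collection time, rather than trusting downstream processes to ignore fields they should never have seen.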
III. Shared and Sharing Data
People may not be open to sharing data for different reasons: either lack of trust, or because they have something to hide. This may generate an adverse selection problem that is not always considered, because clients who do not want to share might be perceived as hiding something relevant (and think about the importance of this adverse selection problem when it comes to governmental or fiscal issues). Private does not necessarily mean secret, and shared information can still remain confidential and have value for both parties, the clients and the companies.
This is a really complicated matter, especially if we consider the possibility that stricter regulation raises data awareness among individuals. The moment customers understand the real value lying in their personal data, they will start becoming more demanding, picky, selective, and eager to maximize their own data's potential. This will probably affect some businesses and industries that are currently based on the freeconomics model, in which clients incur zero costs in exchange for alternative sources of revenue for the firm ("if it is free, it means you are the product").
IV. Blockchain and Data Security
An important achievement regarding this matter could be reached, for instance, by using blockchain; the auditing use case is a great example of that.
The blockchain is essentially made up of three components: a distributed database, an append-only structure, and a cryptographically secure write-permission system.
This would be translated into allowing two people to have a conversation
(i) without needing a server;
(ii) without knowing each other, and without having to verify that the other is who they claim to be; and finally
(iii) to make the conversation public without unpleasant consequences.
In other words, blockchain technology will secure the data, confirm the parties' existence, and provide different actors with tailored access to specific pieces of information. The important effect of the use of blockchain in relation to big data is that the data can be verified once and for all at the beginning, so it is often no longer necessary to transfer them, even within virtual spaces such as clouds. In this way, verified personal data will no longer rely on the original maker; they become 'truth set in stone'. This would improve the trust relationship between data providers, users, and final clients, removing the need for a continuous control loop.
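A minimal sketch can make the append-only component concrete. The toy ledger below is not a real blockchain client (it omits distribution and write permissions); it only shows how hash-chaining blocks makes later tampering with a recorded entry detectable:

```python
import hashlib
import json

def _hash(block: dict) -> str:
    # Canonical JSON serialization so the hash is deterministic.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class AppendOnlyLedger:
    def __init__(self):
        # Genesis block anchors the chain.
        self.chain = [{"index": 0, "data": "genesis", "prev": "0" * 64}]

    def append(self, data) -> dict:
        # Each new block commits to the hash of its predecessor.
        block = {"index": len(self.chain), "data": data, "prev": _hash(self.chain[-1])}
        self.chain.append(block)
        return block

    def verify(self) -> bool:
        # Recompute every link; editing any earlier block breaks a link.
        return all(
            self.chain[i]["prev"] == _hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )

ledger = AppendOnlyLedger()
ledger.append({"event": "kyc_check", "result": "passed"})
ledger.append({"event": "audit", "result": "ok"})
print(ledger.verify())  # True
ledger.chain[1]["data"] = "tampered"
print(ledger.verify())  # False
```

This is the property behind "verified once and for all": once an entry is chained, any retroactive change is visible to anyone who re-verifies the chain.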
V. New Technologies and Bigger Repositories
Special attention has to be paid as well to new technologies, such as Hadoop. Hadoop was not designed with a high security level in mind, which is why it can be highly subject to unauthorized access and why it has generated problems related to data provenance.
But this is also why organizations have to adopt a security-centric approach and focus on increasing the security of their infrastructures (especially in distributed computing environments), protecting sensitive information, and implementing real-time monitoring and auditing processes. In particular, it may be necessary to structure access in layers, requiring authorization and granting privileged access only to specific users.
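The layered-access idea can be sketched as follows; the roles, resources, and required layers are hypothetical examples, not a prescribed scheme:

```python
# Hypothetical privilege layers: higher number, more privileged.
LAYERS = {"viewer": 0, "analyst": 1, "admin": 2}

# Minimum layer required per resource (illustrative resources only).
ACL = {
    "aggregate_reports": "viewer",
    "raw_records": "analyst",
    "audit_logs": "admin",
}

def authorize(user_role: str, resource: str) -> bool:
    """Grant access only if the user's layer meets the resource's minimum."""
    required = ACL.get(resource)
    if required is None:
        return False  # default deny for unknown resources
    return LAYERS[user_role] >= LAYERS[required]

print(authorize("admin", "raw_records"))   # True
print(authorize("viewer", "audit_logs"))   # False
```

The default-deny branch is the important design choice: resources that were never classified are treated as the most sensitive, not the least.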
Bigger repositories increase the risk of cyber attacks because they come with higher payoffs for hackers or tech scoundrels. Many different sources contribute to big data repositories, which means many different points of access to be secured; a good infrastructure should therefore balance flexible data extraction and analysis with technology that restricts unauthorized access. Besides, the cloud is more likely to be attacked: server configurations may be inconsistent and contain gaps, so extra care with distributed servers is recommended.
Several solutions exist for some of these problems, and many others are being studied these days. It is a vicious circle, though: for each problem a countermeasure can be found, but it is never fully conclusive and is always subject to new gaps. Current working solutions for enhancing data security include monitoring audit logs as much as possible; establishing preventive measures (deactivating inactive accounts, capping failed login attempts, requiring stronger passwords, extra-secure configurations for hardware and software, etc.); and using only secure, tested open-source software.
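One of the preventive measures above, capping failed login attempts, can be sketched as a small in-memory guard. The threshold of three attempts is an assumed policy value, and a production system would also add expiry and persistent storage:

```python
MAX_FAILED = 3  # assumed lockout threshold

class LoginGuard:
    def __init__(self):
        self.failed: dict[str, int] = {}   # per-user failed-attempt counter
        self.locked: set[str] = set()      # accounts locked pending review

    def record_failure(self, user: str) -> None:
        self.failed[user] = self.failed.get(user, 0) + 1
        if self.failed[user] >= MAX_FAILED:
            self.locked.add(user)

    def record_success(self, user: str) -> None:
        # A successful login resets the counter.
        self.failed.pop(user, None)

    def is_locked(self, user: str) -> bool:
        return user in self.locked

guard = LoginGuard()
for _ in range(3):
    guard.record_failure("alice")
print(guard.is_locked("alice"))  # True
```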
VI. Data Ethics
A big role will be played by the ethics behind big data, i.e., to what extent companies are going to push the data boundaries and dig into people's lives.
A lot can be said about the philosophical implications of big data and their relationship with human ethics (Zwitter, 2014), but from a practical point of view, every company should create internal data-ethics guidelines, which should also be displayed on the company website. They should develop a data stewardship, a code of professional conduct, which has to convey a few main aspects: transparency (what data are used and how); simple design (simple adjustments to privacy settings if desired); and a win-win scenario (making sure customers get value from the data they provide).
But above all, the golden rule (which may sound biblical) is: "do not collect or use personal data in a manner you would not consider acceptable for yourself".
VII. Data Ownership
Finally, the million-dollar question is:
Does a customer keep the ownership of his own data, or does he lose it the moment he accepts the company's terms and conditions?
There is no straightforward answer to this question, but my proposal is the following:
- Customers should retain ownership of the raw data they initially provide; the right to use it should be granted to the company, and cannot be withdrawn unless specific circumstances arise that could negatively impact the individual.
- The company keeps ownership of "constructed data", i.e., data obtained by manipulating the original data, which cannot be reverse-engineered to infer the raw data.
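To illustrate the distinction between raw and constructed data, the sketch below derives hypothetical "constructed" values (an average, a segment label, a salted one-way token) from raw purchase amounts; the derived outputs cannot be reversed to recover the individual raw records. All values and labels here are invented for illustration:

```python
import hashlib
from statistics import mean

# Raw data: owned by the customer under the proposal above.
raw_purchases = [12.5, 40.0, 7.25]

# Constructed data: aggregates that discard the individual values.
constructed = {
    "avg_purchase": round(mean(raw_purchases), 2),
    "segment": "mid-spender" if mean(raw_purchases) < 50 else "high-spender",
}

# A salted one-way digest lets the firm link records pseudonymously
# without storing the customer's identifier in the clear.
constructed["customer_token"] = hashlib.sha256(b"salt:" + b"client-001").hexdigest()[:16]

print(constructed["avg_purchase"])  # 19.92
```

The boundary the proposal draws is exactly this one: the list of purchases belongs to the customer, while the average, the segment, and the token are the company's derived assets.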
OECD (2013). OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (pp. 1–27).
Zwitter, A. (2014). “Big data ethics”. Big Data & Society, 1(2), 1–6.
Note: the above is an adapted excerpt from my book “Big Data Analytics: A Management Perspective” (Springer, 2016).