In case you missed the first two parts, I previously proposed some tips for analyzing corporate data, as well as a data maturity map for assessing the stage of data development of an organization. In this final article, I want to close the mini-series with some food for thought on big data capabilities in a company context.
I. Where is home for big data capabilities?
First of all, I want to spend a few more words on the organizational home (Pearson and Wegener, 2013) for data analytics. I claimed that the center of excellence is the cutting-edge structure for incorporating and supervising the data functions within a company. Its main task is to coordinate cross-unit activities, which include:
- Maintaining and upgrading the technology infrastructure;
- Deciding what data have to be gathered and from which department;
- Helping with talent recruitment;
- Planning the insight generation phase and setting the privacy, compliance, and ethics policies.
However, other organizational forms exist, and it is worth knowing them because they sometimes fit better into preexisting business models.
The figure shows different combinations of data analytics independence and business models. They range from business units (BUs) that are completely independent of one another, to independent BUs that join forces on specific projects, to an internal (corporate center) or external (center of excellence) unit that coordinates different initiatives.
II. Data Startups vs Data Incumbents
That said, the considerations made so far mean different things and yield different insights depending on each firm's specific characteristics. In particular, the company's stage in its life cycle deeply influences the type of strategy that needs to be implemented.
Although smaller companies often have structural competitive advantages over bigger players, there is no strong correlation between data maturity and a company's life-cycle stage (e.g., some startups manage their data better than big pharma companies, and vice versa). However, startups usually move through the data maturity stages faster, because they are more agile and operate at a smaller organizational scale.
The point I want to highlight here is that startups and incumbents need to approach the data problem in two completely different ways (albeit with the same final goal). I call these two approaches the retrospective approach and the prospective approach.
The prospective approach mainly concerns startups, i.e., companies that have not been in business for long and are not (yet) producing huge amounts of data. They will soon produce and gather a lot of data, though, so setting an efficient data strategy from the beginning is extremely important.
The retrospective approach, instead, applies to established businesses that are overwhelmed by data but either do not know how to use them or face specific problems (e.g., centralized integration).
The prospective approach (Startups)
A startup is free from any predetermined structure, so it can easily establish a strong internal data policy from the start and adopt a long-term vision that prevents most future data-related issues. This should not be underestimated: it requires an initial investment of time and resources, but if the firm does it well once, it avoids a lot of trouble later on.
A well-designed data policy would indeed help the startup stay lean through all subsequent stages. Moreover, young companies are often less regulated, both internally (i.e., there is less internal bureaucracy) and externally (i.e., fewer compliance rules and laws apply). They also have a different risk appetite, which pushes them to experiment with and adopt cutting-edge technologies. Nonetheless, they should always focus on data quality rather than data quantity from the start.
The retrospective approach (Incumbents)
Bigger companies instead usually face two main issues:
i) They have piles of data and they do not know what to do with them;
ii) They have the data and a specific purpose in mind, but they cannot even start the project because of poor data quality, inadequate data integration, or shortage of skills.
In the first case, they are in the Primitive stage, meaning that they have data but no clue how to extract any value from them. Since roles in big institutions are usually tightly defined and demanding, innovating internally is sometimes impossible; in other words, they are "too busy to innovate". Some sectors are more affected by this problem (e.g., banking/fintech) than others (e.g., biopharma).
I believe a good starting point for this issue is hiring a business idea generator: an experienced, senior individual who becomes a sort of data evangelist and provides valuable insights even without a strong technical computer science background. After that, hiring a proper data scientist is essential.
For the second scenario (they have data and a purpose but cannot use the data), I see two main solutions:
i) The firm builds a new data platform/team/culture from scratch;
ii) The firm outsources the analysis/problem.
The first solution is, of course, more robust (if it succeeds) and more transformative for the organization, but also much more expensive. If the firm decides to build a new platform/team/culture from scratch, it needs to run a simple cost-benefit analysis:
What is the marginal utility of the new data platform/team/culture relative to its implementation and running costs?
But, most of all, never forget that it is usually a single individual (or a small group of people) who makes this decision under high uncertainty:
“I am investing a ton of money in something that may, but quite possibly may not, pay off within five years.”
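The cost-benefit question above can be made concrete with a small expected-value calculation. The sketch below is a hypothetical illustration, not a method from the book: it assumes the upfront and yearly running costs are certain, the yearly benefit materializes only with probability `p_success`, and cash flows are discounted at a constant rate. The function names and all figures are made up for illustration.

```python
# Hypothetical sketch: expected-value cost-benefit analysis for building an
# in-house data platform/team. All inputs are illustrative assumptions.


def annuity_factor(years: int, discount_rate: float) -> float:
    """Sum of the discount factors 1 / (1 + r)^t for t = 1..years."""
    return sum(1 / (1 + discount_rate) ** t for t in range(1, years + 1))


def expected_npv(upfront_cost: float, yearly_running_cost: float,
                 yearly_benefit: float, p_success: float,
                 years: int, discount_rate: float) -> float:
    """Expected net present value of the investment.

    Costs are paid in every scenario; the yearly benefit is earned only if
    the project succeeds, which happens with probability p_success.
    """
    a = annuity_factor(years, discount_rate)
    return -upfront_cost + (p_success * yearly_benefit - yearly_running_cost) * a


def break_even_probability(upfront_cost: float, yearly_running_cost: float,
                           yearly_benefit: float, years: int,
                           discount_rate: float) -> float:
    """Probability of success at which the expected NPV is exactly zero."""
    a = annuity_factor(years, discount_rate)
    return (upfront_cost + yearly_running_cost * a) / (yearly_benefit * a)


if __name__ == "__main__":
    # Made-up figures: 1M upfront, 200k/year to run, 800k/year benefit if it
    # works, 50% chance of success, five-year horizon, 8% discount rate.
    npv = expected_npv(1_000_000, 200_000, 800_000, 0.5, 5, 0.08)
    p_star = break_even_probability(1_000_000, 200_000, 800_000, 5, 0.08)
    print(f"Expected NPV: {npv:,.0f}")
    print(f"Break-even probability of success: {p_star:.0%}")
```

With these made-up figures the expected NPV is negative and the break-even probability of success is above 50%, which is exactly the kind of bet the decision-maker in the quote is weighing. Because the expected NPV is linear in `p_success`, the break-even probability has a closed form, so no numerical search is needed.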
III. Data Science Outsourcing
This brings us to the second solution: outsourcing.
When it comes to choosing an outsourcing partner, universities are often a preferred avenue for big corporations: universities always need funding, and they need data to run their studies (and publish their work). They cost far less than startups and have a good pool of talent, as well as the time and willingness to analyze messy datasets.
Startups, on the other hand, are revenue-generating entities and will cost big incumbents more, but they often attract the best minds with good compensation packages and with interesting applied research and datasets that universities cannot always provide.
In both cases, the biggest concerns are data security, confidentiality, and privacy. What data the company actually outsources, how third parties secure and store them, and how the decision-making process is structured (data-driven vs. HiPPO, i.e., the highest paid person's opinion) are a few of the most common issues to deal with.
Meetups and hackathons are another relatively new and interesting way for big corporations to get some analysis virtually for free and to identify potential vendors. They are partly window-dressing for the firm, but also a good way to scout talent and run pilot experiments.
IV. Other Alternatives
There is also a middle way between complete outsourcing and in-house development, the buy-in mentality, which consists of buying and integrating (horizontally or vertically) whatever the company does not develop in-house. It is definitely more costly than the other options, but because the data never leave the (now enlarged) company, it solves the problems related to data privacy and security that come with sharing data with third parties.
Incubators and accelerators offer another alternative: investing smaller amounts in more companies of interest, gaining exposure to several useful areas without fully buying any company. The downside of this fragmented investment approach, however, is that new companies have a high risk of failing (and a culture that accepts failure is neither well regarded nor deeply rooted in big organizations), and the company needs to invest in a team dedicated to selecting and supporting the onboarded ventures.
Finally, it is also possible to design hybrid solutions, such as this two-step approach. In the first phase, universities can be used to run a pilot, or the first two or three worthwhile projects, to move the business from the Primitive stage to the Bespoke stage. Then, the results can be used to persuade management to invest in data analytics and either build a proper internal team or pursue a different acquisition strategy.
V. Why do big data projects fail?
Simple answer: a ton of different reasons.
There are, though, some common mistakes made by companies trying to implement data science projects:
- Lack of clear business objectives and poor problem framing;
- Lack of scalability, or incorrectly sized projects;
- Absence of C-level or senior management sponsorship;
- Excessive costs and time, especially when people with the wrong skill sets are selected (which is more common than you think);
- Incorrect management of expectations and metrics;
- Internal barriers (e.g., data silos, poor inter-team communication, infrastructure problems, etc.);
- Treating the work as a one-time project rather than a continuous learning process;
- Data governance, privacy, and protection issues.
Pearson, T., & Wegener, R. (2013). “Big data: The organizational challenge”. Bain & Company white paper.
Note: the above is an adapted excerpt from my book “Big Data Analytics: A Management Perspective” (Springer, 2016).