Fantastic (data)-Beasts and Where to Find Them: Data Scientists and Data Engineers
What it takes to be a good data scientist (and how to become one)

I. A Philosophical Introduction
There is a great deal of confusion and vagueness around what big data and AI really are, and the technicalities of the data black box have turned the people who analyze huge datasets into some kind of mythological figures. These people, who possess all the skills and the willingness to crunch numbers and provide insights based on them, are usually called data scientists.
They have inherited their faith in numbers from the Pythagoreans before them, so it may be appropriate to fancily name them Datagoreans. Their school of thinking, Datagoreanism, encourages them to pursue the truth through data and to exploit the blending and fruitful interaction of different fields and approaches to postulate new theories and identify hidden connections.

However, the general consensus about who they are and what they are supposed to do (and internally deliver) is quite loose. Simply browsing job offers for data scientists shows that employers often do not really know what exactly they are looking for, and this is probably one of the reasons for the apparent shortage of data scientists in the job market (Davenport and Patil, 2012).
II. Data Toolbox and Skill Set
In reality, data scientists as imagined by most do not exist, because the role is completely new, especially at the initial levels of seniority. However, the proliferation of boot camps and structured university programs on the one hand, and companies’ increased awareness of this field on the other, will drive the job market towards its demand-supply equilibrium: firms will understand what they actually need in terms of skills, and talents will eventually be able to provide those (verified) required abilities.
It is therefore necessary at the moment to outline this new role, which is still half scientist and half designer, and which includes a series of different skills and capabilities, akin to the mythological chimera. An ideal profile is provided in the following table, and it basically merges five different job roles into one: the computer scientist, the businessman, the statistician, the communicator, and the domain expert.

Clearly, it is very cumbersome, if not impossible, to substitute five different people with a single one. This consideration allows us to draw several conclusions. First, collapsing five job functions into one has an ambiguous effect on productivity because it might be:
i) efficient because the entire value and product chain is concentrated and not dispersed;
ii) risky because a single individual can sometimes be less productive than five different people working on the same problem at the same time.
Second, hiring one specialist should cost less than hiring five semi-specialists, but much more than any one of them considered individually (because of the specialist's specialization, high-level knowledge, and flexibility). Looking at some numbers, though, this does not seem to be reflected in the job market.
III. A Toy-Model for Data Jobs
Using Glassdoor.com, it is possible to notice that on average in 2015 in the United States (i) a computer scientist earns around $110,000 annually, (ii) a statistician around $75,000, (iii) a business analyst $65,000, (iv) a communication manager $80,000, and finally, (v) a domain expert about $57,000. On the other hand, the median data scientist salary is around $100,000 according to the survey run by O’Reilly the same year (King and Magoulas, 2015).
From the survey it is also possible to notice that the average working week often lasts 40 hours, and that data scientists spend twice as much time on ETL and cleaning data as on running proper analysis or creating models.
According to these statistics, and roughly (maybe incorrectly from a practitioner’s point of view) assuming that the rest of their time is equally divided among the other three activities, a data scientist should earn around $92,000. This is, of course, a very approximate estimate, which does not take into account seniority, differences across industries, etc., and where the domain expertise is computed as the average of the marketing ($55,000), finance ($65,000), database ($57,000), network ($64,000), and social media ($41,000) specializations.
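The arithmetic behind this kind of toy model can be sketched as a time-weighted average of the five role salaries. This is only a rough illustration under stated assumptions: the text does not give the exact time shares behind the $92,000 figure, so `analysis_share` is a hypothetical parameter, and mapping ETL work to the computer scientist salary and analysis work to the statistician salary is likewise an assumption. The function names are made up for this example.

```python
# Average 2015 US salaries quoted in the text (USD per year).
ROLE_SALARIES = {
    "computer_scientist": 110_000,
    "statistician": 75_000,
    "business_analyst": 65_000,
    "communication_manager": 80_000,
}

# Domain-expert specializations listed in the text (USD per year).
DOMAIN_SALARIES = {
    "marketing": 55_000,
    "finance": 65_000,
    "database": 57_000,
    "network": 64_000,
    "social_media": 41_000,
}


def domain_expert_salary() -> float:
    """Simple average across the listed specializations."""
    return sum(DOMAIN_SALARIES.values()) / len(DOMAIN_SALARIES)


def blended_salary(analysis_share: float) -> float:
    """Time-weighted salary for a given share of time spent on analysis.

    ASSUMPTIONS (not from the text):
    - analysis_share is a hypothetical input between 0 and 1/3;
    - ETL and cleaning are priced at the computer scientist salary,
      analysis and modelling at the statistician salary.
    From the text: ETL takes twice the analysis time, and the remaining
    time is split equally among the other three roles.
    """
    if analysis_share < 0 or analysis_share > 1 / 3:
        raise ValueError("analysis_share must be between 0 and 1/3")
    etl_share = 2 * analysis_share
    other_share = (1 - etl_share - analysis_share) / 3
    return (
        etl_share * ROLE_SALARIES["computer_scientist"]
        + analysis_share * ROLE_SALARIES["statistician"]
        + other_share * ROLE_SALARIES["business_analyst"]
        + other_share * ROLE_SALARIES["communication_manager"]
        + other_share * domain_expert_salary()
    )


if __name__ == "__main__":
    # Illustrative share only: a quarter of the time on analysis.
    print(round(blended_salary(0.25)))
```

The estimate rises as more time is attributed to ETL and analysis, since those are the two best-paid activities in this setup, so the result is quite sensitive to the assumed time split.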
But it does convey a broad concept: data scientists seem to be (almost) fairly compensated in absolute terms, but their remuneration is definitely lower when compared to the cost structure they face to become such specialized figures.
It is really expensive in terms of education, effort, and opportunity costs to become a data scientist, and the average job market does not compensate a candidate enough for it.
Well, truth be told, the market is quickly becoming polarized: either you are a top scientist employed by huge companies (and you get paid a ton of money) or you don’t get fairly compensated for the incredible work it took you to enter the data world.
IV. Final Considerations
All the considerations drawn so far point to a few suggestions for hiring data scientists: first of all, data science is a team effort, not a solo sport. It is important to hire different figures as part of a bigger team, rather than hiring exclusively for individual abilities.
‘Data science is a team effort’
Moreover, if a data science team is a company priority, the data scientists have to be hired to stay and not simply on a project basis, because managing big data is a marathon, not a 100-metre sprint.
Data scientists have two DNAs
Second, data scientists come with two different DNAs: the scientific and the creative one. For this reason, they should be left free to learn and continuously study on the one hand (the science side) and to create, experiment, and fail on the other (the creative side). They will never grow systematically and at a fixed pace, but they will do so organically, based on their inclinations and multi-faceted nature. It is recommended to leave them some spare time to follow their ‘scientific inspiration’.
Big Money is not all that matters
Finally, they need to be incentivized with something more than simply big money. The retention power of a good salary is indeed quite low compared with interesting daily challenges, relevant and impactful problems to be solved, and being part of a bigger scientific community (i.e., being able to work with peers and publish their research).
I am also aware I did not spend much time in this post discussing the differences between Data Scientists and Data Engineers, because for the sake of this article I considered them as variations of the same job paradigm. Nonetheless, you might want to check this post to know more about those two different roles.
References
Davenport, T. H., & Patil, D. J. (2012). “Data scientist: The sexiest job of the 21st century”. Harvard Business Review, 90(10), 70–76.
King, J., & Magoulas, R. (2015). “2015 data science salary survey”. United States: O’Reilly Media, Inc.
Note: the above is an adapted excerpt from my book “Big Data Analytics: A Management Perspective” (Springer, 2016).