Big Data Strategy (Part I): tips for analyzing your data

Francesco Corea
4 min read · Sep 21, 2016
Image Credit: a-image/Shutterstock

In a previous post we saw the common misconceptions in big data analytics, and how important it is to start looking at data with a goal in mind.

Even though I personally believe that posing the right question is 50% of what a good data scientist should do, there are alternative approaches that can be implemented. The one most often suggested, in particular by non-technical professionals, is the “let the data speak” approach: a sort of magic random data discovery that is supposed to spot valuable insights a human analyst would not notice.

Well, the reality is that this is a highly inefficient method: (random) data mining is resource-consuming and potentially value-destructive. The main reason why data mining is often ineffective is that it is undertaken without any rationale, and this leads to common mistakes such as false positives; over-fitting; neglected spurious relations; sampling biases; causation-correlation reversal; inclusion of the wrong variables; or, possibly, model selection errors (Doornik and Hendry, 2015; Harford, 2014). We should especially…
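To make the false-positive problem concrete, here is a minimal sketch (my own illustration, not taken from the cited papers) of what "letting the data speak" can look like on pure noise. It generates a random target and a few hundred random, unrelated features, then counts how many features look "significantly" correlated with the target. The function names, the sample sizes, and the 0.279 cut-off (roughly the two-sided 5% critical value of the Pearson correlation for 50 observations) are assumptions chosen for the example.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_spurious_hits(n_obs=50, n_features=200, threshold=0.279, seed=0):
    """Count noise features whose |correlation| with a noise target exceeds
    the threshold. Every such hit is a false positive by construction,
    because all series are drawn independently of each other."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 pure-noise features look 'significant'")
```

With a 5% significance level, around one in twenty unrelated variables is expected to pass the test by chance alone, so an undirected search over hundreds of variables will almost always "find" something. Starting from a question narrows the set of variables tested, and with it the number of accidental hits.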


Written by Francesco Corea

Data science @ Greycroft. Previously @Balderton @Anthemis @UCLA. All opinions are my own.