My learning journey: Strata

Takeaways from the Strata Data conference (London, May 2017)

Image Credit: Jackie Niam/Shutterstock
  • They understand why model overfitting matters and know how to avoid it (this is their main priority when it comes to modeling);
  • They use version control (at the end of the day they are scientists, and they keep their data as scientific logs and notebooks of their experiments).
  • A data engineer: this is the person who maintains the architecture and makes the data available for data scientists to use straight away;
  • A business intelligence analyst: this is the liaison between the data team on one side and executives and other teams on the other;
  • A customer intelligence analyst: this person is in charge of increasing customer satisfaction through data models and by communicating with end users.
Kim Nilsson from Pivigo presenting her data science journey
Stages of data maturity
  • Lack of scalability, or a project that is not correctly sized;
  • Lack of C-level or senior management sponsorship;
  • Excessive costs and time, especially when people with the wrong skill sets are selected (which is more common than you think);
  • Incorrect management of expectations and metrics;
  • Internal barriers (e.g., data silos, poor inter-team communication, infrastructure problems);
  • Treating the work as a one-time project rather than a continuous learning process;
  • Data governance, privacy and protection.
  1. Maintain agility throughout the entire project;
  2. Select the skills you actually need. You don’t necessarily need a machine learning superstar, but you do need a stellar team;
  3. Manage expectations correctly. This is what will make your team either essential to the company or discarded within a few months;
  4. Convince the skeptics. Data have value, so win over those who do not think so.
Harvinder Atwal’s analysis (@HarvinderSAtwal)
  • GitHub accounts and packages look better, and carry more weight, than a well-written resume in this world;
  • It creates a competitive advantage in data creation/collection, in attracting talent (thanks to a stronger technical brand), and in creating complementary software/packages/products built on the underlying technology;
  • Data scientists and developers are first of all scientists with a sharing mindset, and part of the industry’s power to attract and retain talent comes from improving on what academia offers (i.e., better datasets, interesting problems, better compensation packages, intellectual freedom);
Analysis by Prasad Pore and Gregory Piatetsky-Shapiro (https://www.linkedin.com/pulse/top-20-python-machine-learning-open-source-projects-2016-prasad-pore).
  • It increases the overall value, ease of integration and reliability of internal closed-source systems;
  • It lowers the barrier to adoption, and gains traction for products that would not otherwise have it;
  • It shortens the product cycle: from the moment a technical paper or a software release is published, it takes only weeks for improvements on that product to appear;
  • Most importantly, it can generate a data network effect, i.e., a situation in which more (final or intermediate) users create more data using the software, which in turn makes the algorithms smarter and the product better, and eventually attracts more users.
  • Writing code that others can read and understand is what makes you a better developer and scientist. This is something you only know if you have done it.
  • What’s the solution to our broken data landscape? (spoiler: Metadata!)
  • How important is causal impact in machine learning modeling, and how do we measure it?
  • Does more data beat a better model? (spoiler again: a big YES!)
  • Instead of using a single big deep learning network, can we use several networks?
  • Can data be useful and perfectly anonymous at the same time?
  • Are big data technologies ethically neutral?

Written by

Research Lead @Balderton. Formerly @Anthemis @UCLA. All opinions are my own.