# Data Science Skill Set

## Detailed list of skills for data scientists

This is not meant to be an exhaustive list of skills for data scientists: the field is moving so fast that a tool relevant today might not be relevant in six months. It is rather an attempt to provide an extensive list of skills and tools that are useful in developing data science projects. Of course, lacking one of these skills does not prevent someone from being considered a data scientist.

**Programming:** R, Python, Scala, JavaScript, Java, Ruby, C++, C#.

**Statistics and Econometrics:** probability theory, ANOVA, maximum likelihood estimation (MLE), regressions, time series, spatial statistics, Bayesian statistics (MCMC, Gibbs sampling, the Metropolis-Hastings algorithm, hidden Markov models), simulations (Monte Carlo, agent-based modeling, etc.).

**Scientific Approach:** experimental design, A/B testing, technical writing, randomized controlled trials.

**Machine Learning:** supervised and unsupervised learning, CART, algorithms (support vector machines, PCA, GMM, k-means, deep learning, neural networks), Python data and scientific computing packages (pandas, NumPy, SciPy, etc.), and artificial intelligence packages (TensorFlow, H2O, etc.).

**Mathematics:** matrix algebra, relational algebra, calculus, optimization (linear, integer, convex, global).

**Big Data Platforms:** Hadoop, MapReduce, Hive, Pig, Spark, Storm, Cassandra.

**Text Mining:** natural language processing, latent Dirichlet allocation (LDA), latent semantic analysis (LSA), part-of-speech tagging, parsing, machine translation.

**Visualization:** graph analysis, social network analysis, Tableau, ggplot, D3, Gephi, Neo4j, Alteryx.

**Business:** business and product development, budgeting and funding, project management, marketing surveys, domain/sector knowledge.

**Systems Architecture and Administration:** database administration (DBA), storage area networks (SAN), cloud, Apache, RDBMS.

**Dataset Management:**

- **Structured datasets:** SQL, JSON, BigTable
- **Unstructured datasets:** text, audio, video, BSON, NoSQL, MongoDB, CouchDB
- **Multi-structured datasets:** IoT, M2M

**Data Analysis:** feature extraction, stratified sampling, data integration, normalization, web scraping, pattern recognition.

*Note: the above is an adapted excerpt from my book “Big Data Analytics: A Management Perspective” (Springer, 2016).*
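To make one entry of the list concrete, here is a small Python illustration of A/B testing: a two-sided, two-proportion z-test comparing the conversion rates of a control group (A) and a treatment group (B). This is a minimal sketch under stated assumptions, not the only way to analyze an A/B test. The helper name `two_proportion_z_test` and the conversion counts are hypothetical. The test relies on a normal approximation, so it assumes reasonably large samples.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Hypothetical helper: two-sided z-test for a difference in conversion rates.

    Assumes independent samples and large enough counts for the normal
    approximation to hold. Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that both rates are equal
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference in proportions under that null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 200/2000 conversions in control A, 260/2000 in treatment B
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, z is roughly 2.97 and the p-value is about 0.003, so the difference would be judged significant at the usual 5% level. The pooled standard error reflects the null hypothesis of equal rates. For small samples or very low conversion rates, an exact test such as Fisher's exact test is usually the safer choice.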