Becoming a Data Scientist: How to Structure Your Knowledge


Here are two common questions…

  1. If I want to work in data science, should I aim for depth of knowledge or breadth?
  2. On which subjects should I concentrate?

…and an answer:

In general, we can describe the structure of someone’s knowledge (their skills profile) using one of the following shapes:

  • I shape: a true specialist; one skill to rule them all.
  • T shape: a specialist in one area, yet familiar with other relevant areas.
  • Pi shape: a couple of key areas of interest; skills are fairly widespread.
  • Comb shape: a generalist; multiple areas of interest; typically geared towards collaboration.
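As a rough illustration, the four shapes can be sketched in Python as a rule over self-rated skill levels. Everything here is an assumption made for the example: the 0–10 rating scale, the `DEEP` and `FAMILIAR` cut-offs, the function name `profile_shape`, and the rule that three or more strong areas counts as a comb. It is one possible reading of the shapes, not a standard definition.

```python
from typing import Dict

# Hypothetical cut-offs on an illustrative 0-10 self-rating scale (assumptions).
DEEP = 7      # at or above this, a skill counts as a key area
FAMILIAR = 3  # at or above this (but below DEEP), a skill counts as familiarity


def profile_shape(skills: Dict[str, int]) -> str:
    """Return 'I', 'T', 'Pi', 'Comb' or 'unclassified' for a skills dict."""
    deep = [name for name, level in skills.items() if level >= DEEP]
    familiar = [name for name, level in skills.items() if FAMILIAR <= level < DEEP]

    if len(deep) >= 3:
        return "Comb"  # several key areas: generalist (assumed rule)
    if len(deep) == 2:
        return "Pi"    # a couple of key areas
    if len(deep) == 1:
        # One key area: T if backed by familiarity elsewhere, otherwise I.
        return "T" if familiar else "I"
    return "unclassified"  # no key area yet


# Made-up example ratings, for illustration only.
researcher = {"statistics": 9, "mathematics": 5, "big data": 4, "business": 1}
developer = {"programming": 8, "machine learning/big data": 8, "mathematics": 4}

print(profile_shape(researcher))  # T
print(profile_shape(developer))   # Pi
```

The cut-offs are deliberately crude; shifting them changes which shape a borderline profile falls into, so treat the output as a prompt for reflection rather than a verdict.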

The shape to aim for depends on what kind of data scientist you’d like to become. With that in mind, here are the skills profiles of four common types of data scientist:

  1. Data Researcher: T shape – a focus on statistical techniques with additional skills in mathematics and big data analysis.
  2. Data Developer: Pi shape – strong programming and machine learning/big data skills. A familiarity with mathematics is also beneficial.
  3. Data Businessperson: T shape – strong business skills and a familiarity with statistical techniques.
  4. Data Creative: Comb shape – a broad skill set including programming, statistics, business and machine learning/big data.

(Harris, H., Murphy, S. and Vaisman, M. (2013): “Analyzing the Analyzers”, O’Reilly Media, Inc.)

In broad terms, the more technical the role, the more T-shaped the skills profile. If you’re looking for a developer role, go for programming and machine learning/big data techniques. If you’re interested in research, concentrate on machine learning/big data and practice using statistics in R (or even Python).

It’s interesting that data developers tend to have a Pi-shaped skills profile. This is probably due to the rapid rise in demand across the industry for machine learning/big data manipulation on top of core programming skills.

Getting more specific, the top five technical skills for data scientists in 2016 are SQL, Hadoop, Python, Java and R:

Skills for 2016

What about the future?

Demand for machine learning skills will undoubtedly continue to rise. It’s a fascinating field and right on the cutting edge. If you’re looking to gain a new skill, machine learning would be a good investment of your time. There is an excellent course on Coursera which can give you some sound training.

Whichever shape and skill set you go for, data science is a rich industry with an extremely promising future. It’s perhaps little wonder ‘data scientist’ is the world’s sexiest job.

Quick Summary:

Skills profiles take one of four shapes. Ordered from specialist to generalist, they are: I, T, Pi and Comb.

Within data science, the more technical one’s job, the more one’s skills profile approaches a T shape.


  • Researchers: T shape – focus on stats.
  • Developers: Pi shape – concentrate on programming and machine learning/big data.
  • Businesspeople: T shape – business skills and familiarity with statistical techniques.
  • Creatives: Comb shape – broad skill set: programming, mathematics, stats, business and big data.
  • In future, machine learning skills will be a priority. Currently, the most in-demand skills in data science are SQL, Hadoop, Python, Java and R.