What is Data Science?

Data science is a multidisciplinary field that combines statistics, mathematics, computer science and domain knowledge to extract insights from data.

Unfortunately, data science is full of new terminology that often has no agreed meaning, so we see a lot of confusion around terms like AI, Big Data and Data Mining.

At the company I work with, we do Exploratory Data Analysis as a way of investigating the data: digging in and trying to find correlations. When a customer first reaches out to us, though, we often find that they confuse this with other buzzwords. So we may get a request to build a dashboard that shows the results of a survey in a way that communicates a targeted message.

The request often comes across to us as something like this:

Can we use artificial intelligence to mine the results from a survey so that we can create a dashboard for senior management?

This is a bit of a mouthful, but we've seen requests similar to this on a number of occasions. We don't blame the customer; we blame the media and the industry for not using commonly agreed terminology.

In a situation like this, we'd reach out to the customer for a follow-up meeting. We don't go in telling them they have the terminology wrong, of course; instead we ask what it is they have and what it is they want to achieve.

Understanding the problem and the domain are equally important

One thing that is often overlooked is that domain knowledge is critical to the successful delivery of a data project. If we're blindly given access to a large volume of data, we could go to work and produce statistics showing how the data correlates, but without domain knowledge we could miss some important insight.
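To make "statistics showing how the data correlates" a little more concrete, here is a minimal sketch in Python. The survey columns and numbers are made up purely for illustration, and the `pearson` helper is our own small function rather than anything from a particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical survey data: years of service and satisfaction score (1-10)
years_of_service = [1, 2, 3, 5, 8, 10]
satisfaction = [8, 7, 7, 5, 4, 3]

r = pearson(years_of_service, satisfaction)
print(f"correlation: {r:.2f}")  # strongly negative for this made-up data
```

A number like this tells us that two columns move together, but not what that means for the business. That interpretation is where the domain knowledge comes in.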

Data is nearly always valuable, but with domain knowledge, its value increases.

Equally, by spending time with the client we are better able to understand the end goal. Sometimes the goal is quite simple: they just need a way of condensing a large volume of data into a format that works well for upper management. But we see just as many requests where the aim is to take the data and look for ways in which it supports a hypothesis.

The hypothesis

Companies run surveys all the time. A yearly internal survey can be used to get a feel for the mood within the company. Some companies want the honest, raw feeling; others want that meaning to be there but massaged so that it shines less light on the negatives.

That's not to say they want to lie, but they want to use visualisations that draw the eyes to the positives more so than the negatives.

We've even seen surveys run by different divisions within a single organisation where, once you understand a little more about the aim of the survey and see the responses, the survey looks like a thinly veiled attack on another division of the same company.

Most companies don't use external consultants for Data Science

It should be noted that my employer is normally involved in the Data Analysis scope of work rather than data science, because most companies use their own internal people for data science.

As I said earlier, domain knowledge is key to successful data science, so what we often find is that companies may bring external parties in for the data analysis part of the data pipeline. The pipeline itself is what the data scientist came up with.

So what is Data Science?

A multidisciplinary field that extracts relevant insights from data by combining:

  • Statistics
  • Mathematics
  • Computer Science
  • Domain Knowledge (or Domain Expertise)

Data Science is largely about using historic data to make predictions about the future.

We usually work with clients in the Data Analysis phase of the Data Pipeline.

A Data Pipeline

A data pipeline might look something like this:

[Image: data-pipeline]

Data Analysis

This is where we look at how things are connected, based on historic data. We ask "How?"

We rely on historical data to see what has happened; from this we can make more informed decisions.
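As a rough sketch of what "seeing what has happened" can look like in practice, the Python below condenses some historic survey responses into one average score per year. The responses and the `mean_score_by_year` helper are hypothetical, invented only for this example.

```python
from collections import defaultdict

# Hypothetical historic survey responses: (year, satisfaction score 1-10)
responses = [
    (2021, 7), (2021, 8), (2021, 6),
    (2022, 6), (2022, 5), (2022, 7),
    (2023, 5), (2023, 4), (2023, 6),
]


def mean_score_by_year(rows):
    """Group scores by year and return {year: mean score}, ordered by year."""
    grouped = defaultdict(list)
    for year, score in rows:
        grouped[year].append(score)
    return {
        year: sum(scores) / len(scores)
        for year, scores in sorted(grouped.items())
    }


summary = mean_score_by_year(responses)
for year, mean in summary.items():
    print(year, round(mean, 2))
```

A summary like this is the kind of thing that ends up on a dashboard: it describes the past without trying to explain or predict anything.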

Why

"Why?" is the question that Data Scientists want to answer. They again rely on historic data, but ask why instead of how, and use the answer to make predictions for the future.
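To illustrate the prediction side only, here is a minimal sketch that fits a straight line to some historic yearly averages and extends it one year forward. The figures are made up, `fit_line` is our own helper, and a straight-line trend is just the simplest possible stand-in for a real model. It does not answer "why" by itself.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historic data: average satisfaction score per year
years = [2021, 2022, 2023]
mean_scores = [7.0, 6.0, 5.0]

slope, intercept = fit_line(years, mean_scores)

# Extrapolate the trend one year beyond the historic data
predicted_2024 = slope * 2024 + intercept
print(f"trend per year: {slope:.2f}, predicted 2024 score: {predicted_2024:.2f}")
```

Whether a trend like this can be trusted depends on understanding what is driving it, which is exactly the "why" question.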

Both are valid and often take place in conjunction.