10 Oct 2019
Data science is a relatively new field in business, so it's typical that companies looking to take advantage of it don't yet know exactly what they want.
"Do I want data science or data analytics in my company?" is one of the questions business owners ask most often, and this article aims to answer it.
Data science is the process of combining computer science, math and statistics, and domain knowledge to derive insights from data. Those insights help businesses and other entities understand their customers, understand their competition, and make decisions.
Data analytics is much the same as data science except for one important detail: data analytics zeroes in on insights based on knowledge and goals that the data analyst has defined in advance.
The difference between data science and data analytics comes down to scope: data science covers a wider scope than data analytics.
Data analytics, as mentioned, focuses on getting insights based on predefined knowledge and goals. Data science goes further. It explores the data in more depth and asks more questions to come up with new knowledge and new goals.
So a data analyst would be content analyzing data from one dataset (for example, real estate markets), whereas a data scientist might combine multiple datasets from different sources to come up with insights that the analyst, limited to a single dataset, would not have generated.
Data science is not limited to predefined knowledge and goals, so data scientists are very much in line with scientists: they seek to discover new knowledge and insights that have not yet occurred to a business.
A short example of this difference is predicting whether it will rain tomorrow. Data analytics will gather weather data and determine whether it will rain tomorrow or not.
Straightforward, right? Data science goes further. A data scientist may suspect that rain affects the business, so they combine financial data with the weather data to see whether rain does affect business performance.
In a data science project (or a data analytics project; at this point it just comes down to preference), there are certain questions that need to be answered, especially for businesses.
Both data science and data analytics answer the "What" question: preliminary questions that surface assumptions, situations and correlations in the dataset. Examples of these kinds of questions include "What was the general trend of sales over the past six months?" or "What is the total sales number for a specific month in Hong Kong?"
This is because both data science and data analytics need to do exploratory data analysis (EDA) to have an overview of the data. This is important so as to get a better grasp of the dataset and in turn generate insights that are more, well, insightful.
Both data science and data analytics cover the "Why" question as well. These are the investigative questions that, when answered, can be turned into actionable insights.
As an example, we can ask "Why was this the sales number for a specific month in Hong Kong?" We then use data analysis techniques to find the answers.
For example, we can look at how each feature correlates with the sales numbers to find any positive or negative relationships. Both data science and data analytics do this as well.
This is where data science and data analytics start to differ. In data analytics, the questions stop at the whys. Data science covers what comes next, such as "What would the sales number be in this specific month in Hong Kong next year?" Data science explores questions that are "out of the syllabus", so it uses more advanced statistical techniques to find insights and goals that may not yet have occurred to a data analyst.
To illustrate this a bit further, let's create a simple dataset for a supermarket to do some simple data analysis. Let's leave big data analytics for another article.
| Month | Total sales ($) | Apple sales ($) | Mango sales ($) |
| ------------- | --------------- | ---------------- | ----------------- |
| October 2019 | 100000 | 60000 | 40000 |
| November 2019 | 130000 | 65000 | 20000 |
| December 2019 | 120000 | 62000 | 30000 |
| January 2020 | 90000 | 58000 | 10000 |
At this stage, I want to visualize the data so that I can understand it better. An example of this is charting the trend of sales over the months.
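The original chart is not reproduced here. As a rough, text-only stand-in, the sketch below prints the monthly totals from the table as simple bars and computes the month-over-month change. The variable names are my own, and the scale of one `#` per $10,000 is an arbitrary choice for illustration.

```python
# Monthly totals copied from the table above.
months = ["October 2019", "November 2019", "December 2019", "January 2020"]
total_sales = [100_000, 130_000, 120_000, 90_000]

# Month-over-month change: each month's total minus the previous month's.
changes = [curr - prev for prev, curr in zip(total_sales, total_sales[1:])]

# Rough text chart: one '#' per $10,000 of total sales.
for month, total in zip(months, total_sales):
    print(f"{month:<14} {'#' * (total // 10_000)} {total}")

print("Month-over-month changes:", changes)
```

The changes come out as +30,000, then -10,000, then -30,000: sales rise into November and fall in each month after that.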
Okay, forgive me if the visualizations don't look that nice. If you're more on the artistic side of things, Tableau can help tremendously in making your visualizations beautiful.
I notice that total sales have been falling since the November peak. So I start to look at correlations with the other features, and I notice that total sales follow a similar trend to apple sales.
As it turns out, there is a positive correlation between total sales and apple sales. If we want to increase total sales, the observed correlation suggests one hypothesis worth testing: investing in marketing for apples to increase apple sales.
To do sales prediction, we can create a regression model. A regression model is easy to build with Python, and it gives us metrics that we can check our hypothesis against. We can also use this model to predict sales numbers.
The coefficient of determination (R²) measures how much of the variation in the predicted variable the regression model explains, on a scale from 0 to 1. Here it is around 0.957, so the fitted line tracks the observed values closely and we can expect its predictions to be relatively accurate. According to the model, if total sales are $150,000, apple sales would be about $67,650.
Python code and result to find the coefficient of determination and prediction.
Because the dataset is small and simple, this prediction was generated quickly with only a few lines of code. I'd say it's more efficient than doing the same analysis in Excel, with its many clicks and keystrokes.
Hopefully, this article helped you gain a better understanding of the difference between data analytics and data science. Whatever the case may be, the demand for people who can work with data is on the absolute rise. So if you're looking to learn something worthwhile, head over to Xccelerate to find our data science and Python courses in Hong Kong.