
What Data Science is All About, and How to Get Started?

by Martin D. Maas, Ph.D
@MartinDMaas

Last updated: 2021-11-10

In this post, I will give an overview of what I think are the basic skills required to get into the data world, and discuss the many ways to acquire them.

Data Science is Coding, Statistics, and Domain-Specific Knowledge

“When we are in front of a blackboard, we call it statistics; when in front of a computer, it becomes machine learning; and in a business presentation, we refer to the process as artificial intelligence”.

Honestly, let’s stop making things seem more complicated than they actually are. Data Science is about coding, statistics, and domain-specific knowledge, and that’s basically it.

By the way, my formal education consists of an MSc and a PhD in Applied Mathematics. I acquired what we now call Data Skills mostly in Computational Statistics courses, using the old-fashioned method of reading books, and doing research. Most of my recent work lies in the field of remote sensing, trying to make sense of vast amounts of satellite data.

Basic Professional Roles in the Data Industry

“The world’s most valuable resource is no longer oil, but data”, claimed The Economist a few years ago. Indeed, companies continue to collect increasing amounts of data to serve business needs. And just like oil, raw data isn’t as valuable as “refined” data. Consequently, companies require an increasing number of data-savvy professionals.

Among those data professionals, the best known are the “Data Scientists”. But new technical roles have emerged as well, such as Data Analysts or Data Engineers.

Data Analyst: Make Sense out of Data

From a technical point of view, this is probably the most basic role out there. However, even basic data analysis skills, combined with sensible domain expertise, can be a very powerful tool in today’s environment.

The skills you should probably aim to learn are:

Data Science: Make Predictions Based on Data

After data analysis and visualization, the next step is the ability to create predictive models.

Indeed, predictive models have a long history in statistics. They are often referred to as “forecasts”, and statisticians have been making them for well over a century. But of course, it’s the modern tooling that makes this field so exciting.

The required skills here can probably be classified into two categories: mastering classical statistical techniques, and approaching the more advanced computational tools. Important fact: if you don’t know where the classical techniques fall short, you won’t know why you are applying more sophisticated ones, and that’s not a good place to be.
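To make the point about classical techniques falling short a bit more concrete, here is a minimal sketch in plain Python. It fits a straight line by ordinary least squares (about as classical as it gets) and uses it to forecast one point ahead. The data and the function names (fit_line, forecast) are made up for illustration; they are assumptions of this sketch, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope, intercept, x):
    """Evaluate the fitted line at a new point x."""
    return slope * x + intercept


# Hypothetical toy data: one series is truly linear, the other is not
xs = [0, 1, 2, 3, 4]
linear_ys = [1.0, 3.0, 5.0, 7.0, 9.0]    # generated by y = 2x + 1
curved_ys = [0.0, 1.0, 4.0, 9.0, 16.0]   # generated by y = x^2

slope_lin, intercept_lin = fit_line(xs, linear_ys)
slope_cur, intercept_cur = fit_line(xs, curved_ys)

# Forecast at x = 6, outside the range of the observed data
linear_forecast = forecast(slope_lin, intercept_lin, 6)
curved_forecast = forecast(slope_cur, intercept_cur, 6)

print(linear_forecast)  # 13.0, which matches the true value 2*6 + 1
print(curved_forecast)  # 22.0, while the true value is 6**2 = 36
```

On the linear series the forecast is exact, while on the curved one the same technique misses by a wide margin. Spotting that kind of mismatch is what tells you when a more flexible model is actually justified.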

Data and Cloud Engineering: When Data Applications Get Real

Data engineers, like all engineers, are called upon when things “get real”. That is, when models have to be implemented on a large scale, with massive amounts of data, and run in real time.

In order to do this, we have “the cloud”.

The cloud is this amazing technical and commercial innovation that makes it possible to rent massive server infrastructure “by the second” (or is it by the millisecond already?). This has created so many possibilities that I can’t even get started. Interacting with Data Science applications is just one of these new possibilities.

The Importance of Domain Expertise in Data Science

Arguably, domain expertise in areas like business, finance, or science and engineering, combined with solid data skills, is far preferable to purely data-handling knowledge, no matter how sophisticated.

With domain expertise you can:

These matters are undoubtedly crucial when working for an organization.

Also, we should consider the level of maturity of the field of Data Science. Today it is still a young field, but there are many ongoing efforts to improve the infrastructure and automate many software development tasks.

In a few years, much of the technical heavy-lifting might well disappear. With domain knowledge, on the other hand, you will be able to target the right business problem to solve with the available data-science techniques, whatever those might be. Demand for such skill will never go away.

So What is the Mathematics of Data Science?

As a mathematician myself, it’d be rather odd for me not to mention the mathematics involved in this post.

However, frankly, I don’t think math should be the main concern of someone who wants to enter the field. As discussed in the previous sections, acquiring so-called “data skills” (coding and statistics) and combining them with domain-specific knowledge looks far more valuable to me than trying to focus on the theoretical heavy-lifting. This is especially so as the field matures, and more automatic tools become available to an increasing number of practitioners.

On the other hand, if you already have some background in math, and are just looking to brush up on what’s most useful, that’s a whole different story.

How to Learn Data Science with Free Content?

There are many ways of learning the data skills we have been discussing. Options vary wildly: there are undergraduate degrees, master’s, online programs, and even lots of free material out there.

However, it’s easy to feel overwhelmed and lost with so many options.

Before moving on, let me just share two free resources on YouTube for learning beginner data-science material:

Let’s now move on to discuss the elephant in the room, when it comes to online learning:

Why Pay for Online Courses with So Much Free Content?

In the world we live in today, most of the existing information and knowledge can be found online for free, in places like YouTube or blogs (like this one!). So why could it be worth paying anything for content you can get for free? There are actually a few reasons:

What Massive Online Programs Lack

Don’t get me wrong on this one. I don’t think there is any fundamental problem with online learning. I have been an online teacher myself, and I’m absolutely enthusiastic about where digital materials might lead us.

I think, however, we have to acknowledge that the massive scale that commercial online learning companies are striving for comes with certain unavoidable limitations, when compared with the more personal, one-on-one experiences we have in a traditional classroom.

Conclusion

Starting a career in the Data Industry can be quite competitive. But as companies’ use of data continues to grow, so will the demand for data-savvy professionals.

So whether you choose a DIY approach based on free online material, enroll in online training programs, or even go for a full University degree, I hope the experience will be rewarding for you!

Of course, if you want to move past the intermediate level and into more advanced roles within the Data industry, you will probably switch back to a DIY approach: reading books and consulting free online content in blogs like this one. For example, if you are interested in Julia, an up-and-coming programming language for Scientific and Statistical Computing, see my tutorial series.


