Data Analysis Without Using Math: An example workflow
In this post, we'll discuss how you can indeed perform data analysis tasks without using anything beyond basic high-school math.
In my previous post, I discussed what kind of mathematics you need to know for the different roles in the data industry, like data analyst, data scientist, and data engineer. Some roles and tasks do require quite a lot of math, but there are also many cases where little or no math is needed beyond the high-school basics.
In fact, data analysis can be approached without relying heavily on mathematical techniques, especially when dealing with exploratory analysis or initial insights. Let’s consider an example workflow in data analysis to illustrate this point.
An example workflow
So let’s say you have a dataset containing information about customer purchases at an online store. The dataset includes columns such as customer ID, purchase date, product category, and purchase amount.
The steps of a typical workflow in analyzing this data could be the following:
Data Cleaning and Exploration. Start by examining the dataset’s structure and identifying missing values, duplicates, or inconsistencies. Clean the data by removing or filling in missing values and resolving any discrepancies. Explore the dataset visually using charts and graphs. For instance, create a bar chart to visualize the distribution of purchases across different product categories or a line chart to show the trend of purchases over time.
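As a rough illustration of the cleaning part of this step, here is a minimal sketch using only Python's standard library. The sample rows, the column names (customer_id, purchase_date, category, amount), and the clean helper are all made up for this example. Dropping rows with a missing amount is just one possible choice; filling them in is equally valid.

```python
# Hypothetical sample rows; the column names are assumptions for this sketch.
raw_rows = [
    {"customer_id": "C1", "purchase_date": "2023-01-05", "category": "Electronics", "amount": "120.00"},
    {"customer_id": "C2", "purchase_date": "2023-01-07", "category": "beauty ", "amount": "35.50"},
    {"customer_id": "C2", "purchase_date": "2023-01-07", "category": "beauty ", "amount": "35.50"},  # exact duplicate
    {"customer_id": "C3", "purchase_date": "2023-02-11", "category": "Electronics", "amount": ""},  # missing amount
    {"customer_id": "C1", "purchase_date": "2023-02-20", "category": "Beauty", "amount": "18.00"},
]


def clean(rows):
    """Drop exact duplicates and rows with a missing amount; tidy category labels."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows we have already seen
        seen.add(key)
        if not row["amount"].strip():
            continue  # skip rows where the amount is missing
        cleaned.append({
            "customer_id": row["customer_id"],
            "purchase_date": row["purchase_date"],
            # "beauty " and "Beauty" should count as the same category
            "category": row["category"].strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned


clean_rows = clean(raw_rows)
print(len(raw_rows), "raw rows ->", len(clean_rows), "clean rows")
```

Nothing here goes beyond comparing, skipping, and relabeling rows.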
Descriptive Statistics. Calculate basic descriptive statistics to gain insights into the data. For example, determine the average purchase amount, the most frequent product category, or the total number of purchases. Use simple aggregations to generate summary statistics like counts, frequencies, or percentages. This can help identify patterns or trends within the data.
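To make this concrete, here is a small sketch of those summaries on a handful of invented purchases (the data and variable names are assumptions for illustration only). It uses nothing more than counting, adding, and dividing.

```python
from collections import Counter
from statistics import mean

# Hypothetical (category, amount) pairs standing in for the cleaned dataset.
purchases = [
    ("Electronics", 120.0),
    ("Beauty", 35.5),
    ("Beauty", 18.0),
    ("Electronics", 80.0),
    ("Beauty", 26.5),
]

amounts = [amount for _, amount in purchases]
category_counts = Counter(category for category, _ in purchases)

total_purchases = len(purchases)
average_amount = mean(amounts)
top_category, top_count = category_counts.most_common(1)[0]

# Share of purchases per category, as a percentage.
category_share = {c: 100 * n / total_purchases for c, n in category_counts.items()}

print("Total purchases:", total_purchases)
print("Average purchase amount:", average_amount)
print("Most frequent category:", top_category, f"({top_count} purchases)")
print("Share per category (%):", category_share)
```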
Data Visualization. Utilize data visualization techniques to uncover meaningful patterns or relationships. Create visual representations such as scatter plots, pie charts, or histograms to understand the distribution of data points, identify outliers, or spot any correlations.
Qualitative Analysis. Incorporate qualitative analysis by examining text-based data. For instance, analyze customer feedback or reviews to identify common sentiments or recurring themes. Use text mining techniques such as word clouds or sentiment analysis to gain insights from textual data and understand customer opinions or preferences.
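A word cloud is essentially a picture of word frequencies, so a simple word count already gets you most of the way. The sketch below assumes a few invented review texts and a tiny hand-picked stop-word list; a real analysis would use a fuller list or a dedicated text-mining library.

```python
import re
from collections import Counter

# Hypothetical customer reviews, invented for this example.
reviews = [
    "Fast delivery and great price",
    "Great product, but delivery was slow",
    "Price is fine, delivery was fast",
]

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
stop_words = {"and", "but", "was", "is", "the", "a"}

words = []
for review in reviews:
    for word in re.findall(r"[a-z']+", review.lower()):
        if word not in stop_words:
            words.append(word)

word_counts = Counter(words)
print(word_counts.most_common(3))
```

The most frequent words hint at recurring themes (here, delivery) that you can then read up on in the original reviews.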
Comparative Analysis. Perform comparative analysis between different groups or segments in the dataset. Compare purchase patterns between customer demographics (age groups, geographic locations, etc.), if your data includes them, or analyze differences in purchasing behavior before and after a promotional campaign. At this stage, look for differences between the numbers that are obvious or very large.
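Here is a minimal sketch of the before-and-after comparison, assuming a made-up campaign start date and a few invented purchases. It only splits the purchases into two groups and compares their counts and averages.

```python
from datetime import date
from statistics import mean

# Hypothetical campaign start date and purchases (date, amount).
campaign_start = date(2023, 2, 1)
purchases = [
    (date(2023, 1, 5), 40.0),
    (date(2023, 1, 18), 60.0),
    (date(2023, 2, 3), 90.0),
    (date(2023, 2, 14), 110.0),
    (date(2023, 2, 25), 100.0),
]

before = [amount for day, amount in purchases if day < campaign_start]
after = [amount for day, amount in purchases if day >= campaign_start]

summary = {
    "before": {"purchases": len(before), "average": mean(before)},
    "after": {"purchases": len(after), "average": mean(after)},
}

for period, stats in summary.items():
    print(period, stats)
```

With numbers like these the difference is plain to see. When the gap is small, that is where the statistical tests mentioned later come in.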
Contingency and Pivot Tables. One way to start analyzing relationships between different variables is to create contingency tables, or pivot tables. For instance, cross-tabulate product category against customer segment (based on age group and gender, if you have that information for each customer ID) to see which categories (say, electronics and beauty products) are popular among which segments. You can also explore summaries such as the average purchase amount per product category or the total revenue generated per month.
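As an example of such a table, the sketch below builds a small pivot of total revenue per month and product category. The purchases are invented, and the dates are assumed to be strings in YYYY-MM-DD format so that the first seven characters give the month.

```python
from collections import defaultdict

# Hypothetical purchases: (purchase date, category, amount).
purchases = [
    ("2023-01-05", "Electronics", 120.0),
    ("2023-01-07", "Beauty", 35.5),
    ("2023-01-20", "Beauty", 14.5),
    ("2023-02-11", "Electronics", 80.0),
    ("2023-02-20", "Beauty", 18.0),
]

# revenue[month][category] accumulates the total amount for that cell.
revenue = defaultdict(lambda: defaultdict(float))
for purchase_date, category, amount in purchases:
    month = purchase_date[:7]  # "2023-01-05" -> "2023-01"
    revenue[month][category] += amount

# Print the pivot table: one row per month, one column per category.
categories = sorted({category for _, category, _ in purchases})
print("month".ljust(10) + "".join(c.rjust(14) for c in categories))
for month in sorted(revenue):
    print(month.ljust(10) + "".join(f"{revenue[month][c]:14.2f}" for c in categories))
```

Spreadsheet pivot tables and the pivot functions in data libraries do the same grouping and summing for you.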
Hypothesis Generation. Based on the insights gained from the analysis, generate hypotheses or possible explanations for observed patterns or trends. These hypotheses can later be tested using more rigorous statistical methods if desired.
Conclusion
As you can see, a data analyst is a generalist who can utilize different data skills like those mentioned above, and combine them with suitable domain knowledge (see also my post about domain knowledge in data science) to yield valuable business insights.
A specialist can rely on more sophisticated mathematics for additional rigour (for example, to answer the question of what exactly counts as a large difference between given numbers when it’s not obvious, or for hypothesis testing), or to produce forecasts with confidence intervals.
Depending on the type and size of the business you are working in, this could be more or less valuable. Maybe once you see for yourself the value that even a few basic courses on statistics could bring to the table, you will find the motivation to learn statistics. With the proper motivation, learning becomes much easier.