Mean
Median
Mode
Min and Max
Variance
Standard Deviation
Descriptive statistics are general statistics used to describe data. Statistics such as average customer value and average sales amount are typically used as Key Performance Indicators for the business.
In the graph below we plot each sales amount for a fictitious electrical store. It could also represent a supermarket’s sales, charitable donations, purchases from a manufacturer, or any other business where you have this kind of data.
Here is the basic spreadsheet for the fictitious Electric Now shop I used to create this data.
This is just a graph created from Excel using a list table of 115 sales amounts. Looking at the data visually, we can make some observations. The average sales look to be around £60 at a guess, there are a few high sales amounts that look like outliers. Outliers lie outside the normal range and so we need to calculate what that normal range is, to determine what to exclude.
Mean
Mean or average is probably the most commonly used descriptive statistic. Everyone understands averages, so they are commonly used throughout analysis and reporting.
In the example above the mean is £74.18, so the guess wasn’t far out. The means is just the total sum of all the sale amounts divided by the count of sale amounts. I’m lazy so I just selected the column of sales amounts in Excel and looked at the average at the bottom of the sheet. This also gives you the total and the count, used to calculate the sum. The formula in Excel looks like =AVERAGE(A2:A116) where A2:A116 is the range of cells holding the range of sales amounts.
Examples of using the mean are as follows:
Average Order Value (AOV) – A commonly used statistic to gauge the average value of orders. Businesses will try and improve this number by increasing prices or selling more to existing customers. Typically the value you be presented along with other KPIs such as Revenue, Number of customers, etc.
Average Customer Value (ACV) – A measure that can be used over time, such as a month, quarter, or year to determine the the average value of customers. This helps planning, in that to achieve x amount of revenue a business may need to sell to x amount of customers with an AOV of x.
Average Open Rate, Average Clickthrough Rate – Common statistics used in email marketing.
Average Daily Revenue, Average Monthly Revenue – Overall averages that a business can use to forecast its monthly revenue. e.g. Monthly revenue = Average daily revenue x 30/31 (assuming a flat sales pattern, which is often not the case). Using trend lines or phasing revenue is a better way of forecasting or breaking down budgets by day, week, or month.
If you use the average to predict the sales for the rest of the month, you can see due to the outlier, you run the risk of grossly overestimating.
Total sales for the week commencing 1st November is £8531, which you can see in the screenshot above. So the daily run rate is £1,219 for the first week in November (£8531/7).
So to forecast the month I could do something like £1219 * 30 as there are 30 days in November. So that gives me £36,560.
Businesses love talking about run-rate as they are always focused on revenue. Using historical data, you should be able to get a picture of the historical run rate, to compare how you are currently performing. Businesses will want to look at this vs. the previous year as seasonality impacts a lot of businesses. November is different from October.
Note: for newbies, you might wonder how you get all the sale amounts. Well, essentially what you’re doing is picking up all the sales receipts and entering the totals onto a list from which you can perform your calculations. In the modern world of cause, this is all captured in a database. Organizations typically develop data warehouses, which take a copy of this information into a safe non-live database. SQL is used to report on the data or to hook the data into platforms such as data cubes or Power BI workflows, where analysts can access the data easily. Note, that you may not have to do all the nitty gritty work to get your data. It all depends on how the business is organized.
Median
Median is less used, but can still be useful when displayed alongside the Medium. An average can be influenced more by high or low outliers, which can distort the picture.
In our Electric Now sales store example, the medium is £61.44. It’s quite a bit less than the average of £74.18 and as we thought, the outliers are skewing the data up a bit. Having flagged the medium you may want to remove outliers to give a forecast of to give you a forecast range.
In Excel, the formula is: =MEDIAN(A2:A116) where A2:A116 is the range of sales amounts.
To manually get the median, sort your sales amounts from lowest to highest and find the number in the middle.
Mode
The mode is useful when you want to know the most frequent points in a set of data.
For example, if you had a subscription option to a website with variable rates available. Looking at the mode will tell you the most common subscription amount.
If your data varies significantly, you might find it helps to group your data into bands in order to create more frequent groups.
For example, if you had 1000 sales amounts with all varying pence amounts, these could be banded into groups of £10-15, £20-25, £30-35, etc. This would also be much easier to graph.
If we pivot the data from the Electric Now shop in the spreadsheet and graph the data, we can see that £39.44 is the most common sales amount, but not by much.
I can also just use the formula =MODE(A2:A116)
As this isn’t very informative we can choose to group, bucket, or band the data together as in the graph below.
From this we can make some better higher-level observations, such as sales amounts are evenly distributed from 25 to 100.
We don’t have many sales below 25.
But to be more accurate, we want to say what % of sales are in these ranges. As in the table below.
Min and Max
Depending on the data in question, the Min and the Max can be useful, when used in combination with the other statistics.
The Min is just the minimum value in your data range and the Max is the Maximum.
In the Electric Now example, the Min is £2.12 and the Max is £412.
What it does tell you is that some orders can be quite small amounts. If you wanted to have a report on the maximum sale amount that you wanted to beat, it would be useful to display the MAX sale amount.
Once you’ve understood all these descriptive statistics work, you can skip ahead by using the Excel data analysis tool-pack (File > More > Options > Add-ins > Analysis Toolpack > Go. You can then summarize the data using the descriptive statistics tool in a table like the one below.
Note the range is here is the difference between the Min and the Max. The standard deviation we can look at next.
Standard Deviation
The standard deviation helps describe how data points are distributed relative to the mean. Are they all close to the mean, do they distribute evenly from the mean in the form of a normal distribution? In a normal distribution, 68% of data lies within 1 standard deviation of the mean, 95% of data lies within 2 standard deviations of the mean and 97.5% of data lies within 3 standard deviations of the mean.
As we saw in the description statistics table, the standard deviation is 59.6, so with our mean of £74.2,
The range of values between £44.4 to £104 are within the first standard deviation of the mean and include 68% of our data.
Likewise, values between £14.6 and £134 are within two standard deviations of the mean and include 95% of our data.
-Lastly, values between -£15 and £163 are within three standard deviations of the mean and include 97.5% of our data.
Manually calculations of the Standard deviation go like this:
1. Sum up all the differences of the values from the mean and square the result. In this example that equals: 404479
2. Next Divide that number by Count -1 or 114. In this example we get 3548 this is the variance.
3. The Standard deviation is the square root of the variance which is 59.6
Plotting all the sales amounts on a single axis and overlaying the standard distribution points (blue is 1st, red is 2nd, and yellow is 3rd – each including the whole of the range in between, we get a graph as below.