This is pretty powerful. You can spend your time creating graphs one at a time or you can create a pairs plot using the Seaborn library. The pairs plot graphs every combination of variables, to create multiple graphs. The scatter graphs are below and the simple code is further down the page. There is a big outlier for the first purchase amount that stands out. We need to fish that one out first and re-run.
import seaborn as sns
import matplotlib.pyplot as plt
#from sklearn.datasets import fetch_california_housing
import pandas as pd
csvlocation = 'outliercheck.csv'
df = pd.read_csv(csvlocation)
#df.columns # display the columns available
pairplot = sns.pairplot(df)
# Display the plot
plt.show()
If we sort the data frame by the 'FirstPurchaseAmount' field we can have a look at the output.
# Sort the DataFrame by the 'FirstPurchaseAmount' column, largest first
df_sorted = df.sort_values(by='FirstPurchaseAmount', ascending=False)
# Display the first few rows of the sorted DataFrame
print(df_sorted.head())
The record with CustomerID = 1 is the big outlier. It was possibly a test, which raises another question: is there test data that needs to be removed? Anyway, let's remove it.
We can remove just this one customer using the following code. We can also remove the Age and Gender fields as they look normal from a previous look and will reduce the number of graphs in the pair plot.
#remove row where CustomerID = 1
df_filtered = df[df['CustomerID'] != 1]
# Remove the 'age', 'gender' and CustomerID columns
df_filtered = df_filtered.drop(['Age', 'Gender', 'CustomerID'], axis=1)
Below is the output of re-running the graphs after the major outlier is removed. I've also removed the Age, CustomerID, and Gender columns from the output as they aren't of interest.
Now we have a clearer view of the data. As expected, there are more outliers to remove. We can use the standard deviation method to trim the extremes: filtering at roughly 2 standard deviations from the mean removes about the top and bottom 2.5% of normally distributed data, while 3 standard deviations is a stricter cut-off that only removes the most extreme values.
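As a rough sketch of that step (assuming the df_filtered dataframe from above and the FirstPurchaseAmount column; adjust the column and threshold to suit your data):
# Standard deviation based outlier filter (sketch)
# n_std = 3 keeps values within 3 standard deviations of the mean;
# use 2 for an approximate top/bottom 2.5% cut on normally distributed data
col = 'FirstPurchaseAmount'
n_std = 3
mean = df_filtered[col].mean()
std = df_filtered[col].std()
df_trimmed = df_filtered[(df_filtered[col] > mean - n_std * std) & (df_filtered[col] < mean + n_std * std)]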
In this project I'm trying to predict customer lifetime sales (well, actually first 12-month sales), initially using scikit-learn's linear regression model. The code can be downloaded from GitHub here. The results haven't been great so far, and I wonder how the data in the Contoso Retail data warehouse was created, as I have an r-squared value of only around 10%. Still, it's been a good exercise, and adding additional fields has improved the model gradually.
A lifetime value model is particularly useful in business for predicting the value of newly acquired customers.
The benefits of the model are as follows:
1. It can both guide the forecast for future sales, based on the predicted value of new customers, and forecast new customers being acquired.
2. It can guide the acquisition team in both targeting higher-value customers and understanding how much they can spend on acquiring new customers.
3. It helps the business understand what the maximum CPA (Cost Per Acquisition) should be to keep the business profitable, while still growing the business.
Lifetime Timeframe
One of the immediate challenges you will face when creating an LTV model is that your older customers are naturally more likely to have a higher lifetime value than more recently acquired customers. To move forward, we can create lifetime models based on the first x amount of time from the customer acquisition date: for example, predicted 12-month LTV. An analysis of lifetime data can of course give you a much better understanding of how long a customer's lifetime is likely to last and which timeframes are the most useful for the prediction.
Building a Model with Linear Regression
In this example, we use a linear regression model as the data we want to predict (lifetime sales) is continuous. We'll guess which fields we think will predict the highest lifetime sales, and we'll soon find out that some of them have very low correlations, forcing us back to the drawing board because we tried to skip the first two stages of analytics: exploration and diagnosis (correlation).
Here are the main steps for building the model:
1. Create an SQL query for importing the data into Python.
2. Create the Python connection and import the data using the SQL query.
3. Transform and clean the data.
4. Build the model.
5. Evaluate the model.
6. Improve the model.
1. Create SQL Query to import the historical customer data for training the model. This includes the independent variables and the dependent variable (first 12-month lifetime sales), which we want to predict for future new customers. 12 months is used as we don’t have a lot of historical data.
I'm choosing to start with the following demographic fields: Age, Gender, Yearly Income, and First Purchase Amount.
The first part of building the code is to create an SQL script that will get us the historical data we need to train our model.
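I won't reproduce the exact script here, but a minimal sketch of the idea, using the Contoso tables referenced elsewhere in this project, might look like the query below. The 12-month sales derivation, the age calculation, and the column aliases are my own assumptions, not the original query, and the first purchase amount could be derived in a similar way.
SELECT c.CustomerKey
    ,DATEDIFF(YEAR, c.BirthDate, GETDATE()) AS Age        -- assumed derivation of Age
    ,c.Gender
    ,c.YearlyIncome
    ,SUM(o.SalesAmount) AS LifetimeSales                  -- first 12-month sales, the value to predict
FROM dbo.DimCustomer c
JOIN dbo.FactOnlineSales o
    ON o.CustomerKey = c.CustomerKey
WHERE o.DateKey < DATEADD(MONTH, 12, c.DateFirstPurchase) -- keep the first 12 months only
GROUP BY c.CustomerKey, c.BirthDate, c.Gender, c.YearlyIncome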
Now we have the SQL, we can start building the Python script which will pull in the data and build the model.
2. Python – Importing the Data (Extract)
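The import step itself is short. A sketch using pyodbc and pandas, assuming a local SQL Server install with Windows authentication (the connection details are placeholders):
import pandas as pd
import pyodbc

# Placeholder connection string for a local SQL Server instance
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=ContosoRetailDW;Trusted_Connection=yes;"
)

query = "SELECT ..."  # the SQL script created in step 1

# Pull the training data into a DataFrame
df = pd.read_sql(query, conn)
print(df.head())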
4. Building the model
Now for the exciting bit, where we get to build the model using the sklearn (scikit-learn) machine learning module. The process is as follows:
1. Separate the independent variables (Age, Gender, First Purchase Amount, Income) and the dependent variable (LifetimeSales) into their own dataframes.
2. Split the data further into train and test segments. In this case, 20% of the data is held back for testing.
3. Create the linear regression model.
4. Evaluate the model.
5. Save the model to a .pkl file for future use.
(A condensed code sketch of these steps follows below.)
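This sketch assumes df already holds the imported, cleaned data; the column names follow the fields above, and Gender would need to be numerically encoded (e.g. 0/1) before fitting:
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# 1. Independent variables and the target (Gender assumed already encoded)
X = df[['Age', 'Gender', 'YearlyIncome', 'FirstPurchaseAmount']]
y = df['LifetimeSales']

# 2. Hold back 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Create and fit the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Evaluate on train and test data
print('Train MSE:', mean_squared_error(y_train, model.predict(X_train)))
print('Test MSE:', mean_squared_error(y_test, model.predict(X_test)))
print('R-squared:', r2_score(y_test, model.predict(X_test)))

# 5. Save the model for future use
joblib.dump(model, 'ltv_model.pkl')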
We can ask ChatGPT and Gemini to interpret the results. Here is a response from Gemini:
Model Performance
Train MSE (Mean Squared Error): 98009842.04325
Test MSE: 98328891.393038
R-squared: 0.0943
Both the Train MSE and Test MSE are relatively high. This indicates that the model has a high average error in predicting lifetime sales, both on the data it was trained on and unseen test data. Ideally, you want the MSE to be as low as possible.
The R-squared value is also quite low (around 0.1). This means that the model only explains a small portion of the variance in lifetime sales. In other words, there are other factors that influence lifetime sales that are not captured by this model.
I keep adding additional fields, which is slightly improving the model, but there is a long way to go. As the Contoso data is not real, it’s possible this isn’t the best data to use.
The Power BI KPI demo report is a demonstration of how a typical KPI report will look. The report uses data from the fictitious Microsoft Contoso Retail Data Warehouse, which is stored in a SQL Server database and which I've renamed for demo purposes.
The report is currently divided into 5 pages.
1. The first page is a typical KPI report, leading with the financial KPIs, then breaking down to store type, and then some online KPIs. As the database is quite old (going back to 2009), it doesn't have a lot of person-level information, but it does for online sales (hence online KPIs are included here). The KPIs are presented in comparison to the previous year's performance, but targets would also typically be included. The month selector allows the user to go back to any point in time in the year (or years, if more data were included).
2. The second page covers sales MTD, QTD, and YTD vs. the previous year. There are graphs showing the YoY breakdown by store type and a table allowing more drill-down into individual store performance YoY. This interactive tool allows users to quickly pinpoint some of the reasons responsible for higher or lower YoY performance. The measure calculations can be found lower down this page.
3. The third page goes into more detail on stores, broken down by country and city, and includes the sales per square foot calculation. Again, this helps the user explore the data to identify the best and worst-performing stores in more detail.
4. The fourth page shows sales by product category and product, allowing you to drill down from the top category levels. There are then graphs for monthly and yearly performance.
5. The fifth page focuses on activity. This can be developed further to show actives by activity segment (e.g., new, active, reactivated, lapsed), once activity segmentation is added to the model.
The Data Model:
Time Intelligence DAX Measures
The DAX measures for MTD, QTD, and YTD, and the comparisons to the previous year, are below:
//Month to Date Sales
Sales MTD = CALCULATE(SUM(FactSales[SalesAmount]), DATESMTD('Date'[Date]))

//Quarter to Date Sales
Sales QTD = CALCULATE(SUM(FactSales[SalesAmount]), DATESQTD('Date'[Date]))

//Year to Date Sales
Sales YTD = CALCULATE(SUM(FactSales[SalesAmount]), DATESYTD('Date'[Date]))

//Last Year Month to Date Sales
Sales LY MTD = CALCULATE(SUM(FactSales[SalesAmount]), SAMEPERIODLASTYEAR(DATESMTD('Date'[Date])))

//Last Year Quarter to Date Sales
Sales LY QTD = CALCULATE(SUM(FactSales[SalesAmount]), SAMEPERIODLASTYEAR(DATESQTD('Date'[Date])))

//Last Year Year to Date Sales
Sales LY YTD = CALCULATE(SUM(FactSales[SalesAmount]), SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

//Variance Month to Date vs. Last Year Sales
Sales VAR MTD vs. Last Year = [Sales MTD]-[Sales LY MTD]

//Variance Quarter to Date vs. Last Year Sales
Sales VAR QTD vs. Last Year = [Sales QTD]-[Sales LY QTD]

//Variance Year to Date vs. Last Year Sales
Sales VAR YTD vs. Last Year = [Sales YTD]-[Sales LY YTD]

//Variance % Month to Date vs. Last Year Sales
Sales VAR% MTD vs. Last Year = ([Sales MTD]-[Sales LY MTD])/[Sales LY MTD]

//Variance % Quarter to Date vs. Last Year Sales
Sales VAR% QTD vs. Last Year = ([Sales QTD]-[Sales LY QTD])/[Sales LY QTD]

//Variance % Year to Date vs. Last Year Sales
Sales VAR% YTD vs. Last Year = ([Sales YTD]-[Sales LY YTD])/[Sales LY YTD]
Analysis: UK House Price Index
Data Source: Office for National Statistics
Period: Monthly
Graph: Showing December Yearly Prices
Linear Trendline
Polynomial Trendline (order 3)
Observations (from the raw data): Over the 19 years between YE 2005 and YE 2023, average UK house prices increased from £160,209 (2006) to £284,681 (2023), a rise of 78%, with an average yearly increase of £6,551.
The maximum Average House Price was: £288,784 in December 2022.
The drop of £28,239 from 2007 to 2008 stands out against the positive trend and relates to the 2007 financial crash. The drop of £4,053 from 2022 to 2023 relates to the higher interest rates in the UK and the cost of living crisis.
Trendlines
If we add a linear regression line (left graph) to the data points, we get an R-squared score of 0.85, meaning 85% of the variability in average house prices is accounted for by year, which is a pretty strong correlation, as we would expect.
If we add the forecast based on the linear regression line equation (not included here), we get £303,618 in 2028. (I’ve used the FORECAST.LINEAR() Excel function to generate the forecasts).
An alternative is to use the polynomial regression line which is used when there are peaks and valleys in the data. In this example, we are using an order 3 polynomial trendline which accounts for 2 peaks or valleys. This gives an R-squared value of 0.96, which appears to be a very good fit, but I’m somewhat hesitant as we’re right at the point of a 2nd drop. One could look back at the years following 2007 and apply the growth to the years following 2022. This would be similar to a seasonal adjustment, but I’m not sure what tool will pick this up at present (and of course, it’s not the same situation).
If we apply the forecasting from the Polynomial regression trendline, we get £389,243 for 2028.
So the linear regression forecast gives £303K for 2028 and the polynomial regression method gives £389K; that's a pretty big difference. As you can see, the linear trendline is less impacted by the peaks and troughs, while the polynomial line follows them more closely.
If we use the Excel forecast tool, which uses the 'AAA Exponential Triple Smoothing (ETS) algorithm', for the next 5 years, we get a forecast of £319,647 for 2028. However, the upper and lower (non-bold) lines indicate the confidence interval, which tells us the estimate will lie between £204,369 and £434,925 with 95% certainty, which is very uncertain.
Some of the reasons for this are that we only have 18 years (18 data points) to work with, there are 2 dips in the data, and a somewhat flat period. This doesn't help the model, hence the confidence interval is so wide.
The algorithm is best used for linear seasonal fluctuating data, which isn’t quite what we have here.
To create a better model, we could create a multi-variate model, including additional features such as interest rate, population, housing supply, and region. We can also see if there is more data available pre-2005.
Most businesses will have sales and marketing departments as well as top-level C-Suite/director levels. They are all going to be interested in a bunch of similar reports from organization to organization.
One of the most important reports, that sales and marketing teams are going to be interested in is: Who are our best customers?
Top customers often contribute a significant amount of revenue to any organization. The 80/20 rule, where 80% of the revenue comes from 20% of customers, is quite common. Identifying top customers allows teams to segment their audience. Top customers can be given special treatment, such as dedicated account managers and special offers.
In the model from the Contoso database we use here, the DimCustomer table is joined to the FactOnlineSales table, and the FactOnlineSales table is joined to the Date table. We use the FactOnlineSales table rather than the FactSales table as it includes a CustomerKey we can join to the DimCustomer table to pull out customer data at the person level (the FactSales table appears to include store-level data only).
To start creating the report, we can drag in the SalesAmount field from the FactOnlineSales table and the FirstName, LastName, and CustomerKey from the DimCustomer table as well as DateFirstPurchased. These are some of the fields a sales team is going to be interested in and they will likely ask for more to be added as time goes by.
Adding the Year field from the Date table (which is already joined to the FactOnlineSales table by the unique date key) as a filter visual allows us to filter the sales amount by year (and we could add months, for example, as required). We filter the report at the person level by adding the CustomerType field in the Filters panel:
You will notice here I just dragged in the SalesAmount field from the FactOnlineSales table, but that’s a bit lazy and we should create a measure for the field, like this:
OnlineSales = SUM(FactOnlineSales[SalesAmount])
The sales amount filter is useful, but teams are likely to be interested in several different sales calculations, which we can calculate using additional measures. For example, we might want to see the current year-to-date sales and the lifetime sales of a customer in the same report as follows:
Using REMOVEFILTERS('Date') removes all filters applied from the Date table, such as the year selected in the drop-down in this report, so the SalesAmount is calculated over the customer's lifetime. This now makes the report more interesting. The year filter still applies to the OnlineSales measure, but does not apply to the OnlineSales Lifetime measure, as below:
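A sketch of that lifetime measure, building on the OnlineSales measure defined above:
OnlineSales Lifetime = CALCULATE([OnlineSales], REMOVEFILTERS('Date'))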
Comparing year-to-date vs. previous year-to-date is another common request. To calculate year-to-date sales amount, we use the CALCULATE() function again, but this time combined with the DATESYTD() function:
Calculating Year-to-Date and Previous Year-to-Date Sales with DAX
To calculate Year to date sales we can use the DATESYTD function with CALCULATE:
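For example (the measure names match those referenced later; the previous-year version wraps DATESYTD in SAMEPERIODLASTYEAR):
OnlineSales YTD = CALCULATE([OnlineSales], DATESYTD('Date'[Date]))
OnlineSales Prev YTD = CALCULATE([OnlineSales], SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))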
Adding these measures to our report table, we should now have something looking like this :
Here we now have a top customer report for 2009. We can see which customers have brought in the highest revenue this year vs. last year to date, so we are comparing equal periods. We can also see customers' lifetime value. In this example, Jon Yang is our best online customer, with $61K in online sales, which is equal to his lifetime sales. As you will notice, his first date of purchase is 22nd July 2001, which is a long time ago, but the database doesn't have a record of those earlier transactions, as 2009 YTD sales equal lifetime sales. Presumably, this data was not recorded historically, which is often the case with older databases. The data you work with is often not perfect and we need to be aware of these specifics.
Now that we have this year's YTD measure and the previous year's YTD measure, it is easy for us to calculate the variance % (or difference) between the two years.
We have an infinity value for the first customer as they had no sales in the previous year. We can tidy that up using the DIVIDE() function, which automatically handles the division errors and infinity:
Online Sales Var YTD% = DIVIDE(([OnlineSales YTD]-[OnlineSales Prev YTD]),[OnlineSales Prev YTD])
Let’s now add the month as an additional filter. As you can see the YTD and Prev YTD, are recalculated to show the online sales amount up until the end of the month selected. That’s useful and pretty powerful.
In a real-world scenario, sales teams are going to want the most up-to-date information, so we want to look to add previous days’ sales, week-to-date sales, and month-to-date sales. The issue here is that our data is old, so we need to do some juggling first to bring the Contoso sales data into the future to use for experimentation. The process of adding a future date is outlined here.
DAX Previous Day Sales Calculation
Once we have the future date in our FactOnlineSales table, as in the link above, and have joined it to the Date table, our report is brought into the future and we can create a measure calculating yesterday's sales amount:
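A sketch of such a measure, following the same pattern as the other measures here (the name matches the OnlineSalesYesterday field used below):
OnlineSalesYesterday = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], TODAY()-1, TODAY()-1))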
As you can see I can now filter on the current year, which is 2023, and remove the month filter. I can now see the top customers for yesterday if I sort by the OnlineSalesYesterday field. This is now giving the sales teams up-to-date, daily reporting, so they can keep their finger on the pulse of the business:
Now that we're using more 'current' data, we can see we need Week-to-Date and Month-to-Date measures.
If you look around the DAX functions, you are going to be disappointed: there is no DAX week-to-date function like there is in Tableau. There are different methods you can Google, but let's create our own here from scratch. In this example, we are using the FactOnlineSales table from the Contoso Retail DW. Before we start, we can, as an option, bring the old data into the future using the method outlined here. This is not a requirement, but it means we can create live-looking scenarios with sales data up until yesterday.
To verify the measure we are going to create, we can create a fixed date range measure that we can compare our dynamic measure to:
This measure uses the DATESBETWEEN() function as the filter for the CALCULATE() function. Note that sales are inclusive of the start and end dates. You can change the date range to reflect the current week you are in. The dates used in this example reflect the time when I created the function.
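As a sketch, with an example week hard-coded (the dates here are placeholders; swap in the Monday and yesterday of your current week):
OnlineSales WTD Check = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], DATE(2023, 11, 13), DATE(2023, 11, 17)))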
To create a dynamic measure, we can start by adding a new measure OnlineSales WTD (week to date). Let’s start by adding in the TODAY() function
OnlineSales WTD = TODAY()-1
We are going to use TODAY()-1 as we usually want the data up until the end of the previous day, and not to show any partial data for the current day that may be imported to the database. If we add it to a card visual, we can verify the output as today-1 (at the time of writing):
Next, we can use the WEEKDAY function to get the day number from today. We also use variables here to tidy up the code:
OnlineSales WTD =
VAR ThisDay = TODAY()-1
VAR ThisDayNum = WEEKDAY(ThisDay)
RETURN ThisDayNum
The measure now displays 6. Today is the 16th of November 2023 (at the time of writing), which is a Saturday, but we added the -1, which gives Friday, so we want it to display 5. To get the WEEKDAY() function to start on a Monday, we can add 2 as the second argument, which sets the first day of the week (1) to Monday and the last day of the week (7) to Sunday. You can change this according to your needs.
OnlineSales WTD =
VAR ThisDay = TODAY()-1
VAR ThisDayNum = WEEKDAY(ThisDay, 2)
RETURN ThisDayNum
This now gives us 5, which is what we want:
Next, we want to calculate the first day of the week, which we can do as follows:
OnlineSales WTD =
VAR ThisDay = TODAY()-1
VAR ThisDayNum = WEEKDAY(ThisDay, 2)
VAR FirstDayOfWeek = ThisDay - (ThisDayNum - 1)
RETURN FirstDayOfWeek
So now we have the logic for calculating the first day of the week and the current reporting day, and we want to calculate the sales between these 2 dates. Let's use this logic to create a dynamic WTD measure:
OnlineSales WTD =
VAR ThisDay = TODAY()-1
VAR ThisDayNum = WEEKDAY(ThisDay, 2)
VAR FirstDayOfWeek = ThisDay - (ThisDayNum - 1)
VAR MonSales = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], ThisDay, ThisDay))
VAR WTDSales = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], FirstDayOfWeek, ThisDay))
RETURN IF(ThisDayNum=1, MonSales, IF(ThisDayNum>1, WTDSales))
The code uses two different calculations for WTD: if the previous day is a Monday, it just calculates the Monday sales; if it is after Monday, it calculates the WTD sales.
Now let’s say we want to go back to any point in time and get the Week to Date sales, we can tweak the code as follows:
OnlineSales WTD =
VAR ThisDay = MAX('Date'[Date])-1
VAR ThisDayNum = WEEKDAY(ThisDay, 2)
VAR FirstDayOfWeek = ThisDay - (ThisDayNum - 1)
VAR MonSales = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], ThisDay, ThisDay))
VAR WTDSales = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESBETWEEN('Date'[Date], FirstDayOfWeek, ThisDay))
RETURN IF(ThisDayNum=1, MonSales, IF(ThisDayNum>1, WTDSales))
Now if we add the day filter and select today’s date, the week-to-date measure should still match the check, but we can also go back to any point in time and calculate week-to-date.
Lastly, to calculate the previous year's week-to-date sales, we can create a measure based on our existing OnlineSales WTD measure, using the SAMEPERIODLASTYEAR() function to calculate the previous year's week-to-date without having to re-write all the code.
OnlineSales WTD Prev Year = CALCULATE([OnlineSales WTD], SAMEPERIODLASTYEAR('Date'[Date]))
You can check out my DAX function reference guide here
In the Contoso Retail database, we have old historical data which only goes up until the end of 2009. For our demo reports, we’d like to use more up-to-date data, so we want to add a date field to the FactOnlineSales (and FactSales) tables, so we can mimic more current reports.
To add what we’ll call a ‘future date’ field, we want to find out the number of days from the present day to the last date in the FactOnlineSales table, then we can recalculate all the old dates bringing them forward by this difference.
Getting the Max Date in Power Query
There doesn’t appear to be a direct M code function for max dates, but there is a List.Max() function which works with lists, so we need to create a list from the date field in the FactOnlineSales table. The steps are as follows:
1. Create a list from the date field (right-click on the DateKey field and select 'Add as New Query').
2. This will create a list of dates from the table – here I renamed it to FactOnlineSalesDates.
3. We can now reference this list in the FactOnlineSales table using the List.Max() function, so we need to create a new custom column in the FactOnlineSales table as follows:
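The custom column formula is a one-liner referencing the list query created in step 2 (I've called the column MaxDate):
= List.Max(FactOnlineSalesDates)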
We then just need to transform it into a Date type:
That will give us a column containing the max date (the last date) in the FactOnlineSales table. The MaxDate is 31/12/2009. We can now use that date in our calculations. Sure, we could have just looked it up, but if the data is dynamic, it's good to be able to calculate it, so we are sure it is correct.
4. Next we can add another custom column containing today's date and transform it to a Date type as well. We use the Date.From() function applied to the DateTime.LocalNow() function:
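The custom column formula (I've called the column Today) is:
= Date.From(DateTime.LocalNow())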
5. Next, add another custom column to calculate the date difference between today and the max date in the FactOnlineSales table. We can use the Duration.Days() function to calculate the difference in days.
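Assuming the two columns created above are called MaxDate and Today, the custom column formula is along these lines:
= Duration.Days([Today] - [MaxDate])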
This gives us the number we are looking for.
Now all we need to do is add another custom column to transform the dates into the present day. We can call this 'DateFuture' and transform it to Date type.
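Assuming the difference column created in the previous step is called DateDiff, the DateFuture custom column is along these lines:
= Date.AddDays(Date.From([DateKey]), [DateDiff])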
We’ve added 4 additional columns here to a big table. To limit the impact on performance, we should remove the 3 columns we don’t need, just leaving us with the DateFuture column.
Now close & apply the data from Power Query to load it into the model. In the model view, we can now edit the relationship between the Date table and the FactOnlineSales table, so they are joined using the DateFuture key, which will bring all the data into the future, when we use filters from the Date table.
If we now look at the FactOnlineSales table, we can see the latest date is today (at the time of writing).
In practice, a data warehouse is updated overnight, so the latest complete day's worth of data should be the day before, so we want to tweak our calculation.
If we go back into Power Query, we can subtract a day from today's date using the Date.AddDays() function, so this feeds into the future date calculation:
Date.AddDays(Date.From(DateTime.LocalNow()), -1)
Now we see the data is up to yesterday.
So this is useful if we want to mimic a real-life scenario, where data runs up until yesterday relative to today, but it would be confusing for financial reports if our future date keeps changing, so it makes sense to have a fixed date option as well.
So to fix today's date, we can hard-code it using the #date() function:
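For example, replacing the DateTime.LocalNow() call with a hard-coded date (the date here is just an illustration):
= #date(2023, 11, 16)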
There are three steps to get up and running with a Contoso Retail Datawarehouse that you can use for experimenting with Power BI.
Step 1. If you don’t already have access to an installation of Microsoft SQL server you can install one on your local machine.
There are 2 free non-production versions available:
Microsoft SQL Server Express – This includes just the database engine required to get up and running with the database.
Microsoft SQL Server Developer – In addition to the database engine, this also includes SQL Server Analysis Services, Reporting Services, Integration Services, and Machine Learning Services. It's over a GB, so a larger installation than the Express version. Of note, Analysis Services is included in the Developer version, so it's useful to install this one.
Microsoft SQL Server Express and Developer versions are available with instructions here. Installation is very straightforward, just follow the instructions.
Step 2. Install SQL Server Management Studio (SSMS). You will use this tool to connect to the SQL Server installation and run queries on the database. You can also download this from Microsoft – here.
This is the connection information for connecting to SQL Server Developer on my local machine. Note that if I use localhost\MSSQLSERVER for the server name I get an error; it works with localhost only.
For further troubleshooting, open the Services app (Run > services.msc) and check that the SQL Server services are running, starting them as required.
If you connect OK, you should see a screen like this one. You only need to worry about the Databases folder, so don't worry about all the other stuff for now. Note: this is the Developer version; the Express version will look slightly different.
Step 3: The Contoso Retail Datawarehouse database can be downloaded from Microsoft here. There are two files: ContosoBIdemoABF.exe and ContosoBIdemoBAK.exe .
The ContosoRetailDW.bak file (from ContosoBIdemoBAK.exe) is the raw database backup, which you can get started with; it only requires the database engine (which is included in SQL Server Express).
The ContosoBIdemoABF.exe contains the OLAP version of the database, which requires SQL Server Analysis Services. For this, you will need either a full version or the free Developer version.
To restore the .bak file, connect to the SQL Server Database Engine (open SSMS and select Database Engine), then right-click on Databases and select 'Restore Database'.
Select Device from the next screen.
You can then select ‘Add’ to add the file. I placed the ContosoRetailDW.bak file in my C: drive to make it easier to find:
Then just select the file and click OK, OK, and if all goes well, the database will be restored.
Now, to check it is really there, return to the main screen, expand the ContosoRetailDW database, and check the tables are there.
Now you can start running some queries against it and you can connect the Power BI desktop to it.
A standard starting point for creating customer insight reports is to combine sales data with your customer table. In the Contoso Retail DW, we have the FactSales and DimCustomer tables that we can use. The DimCustomer table has a field called CustomerKey, but if we look in the main FactSales table we cannot see a CustomerKey.
So what we can do is search all the tables for a field name like 'CustomerKey'.
Here is the code:
SELECT c.name AS 'FieldName'
,(SCHEMA_NAME(t.schema_id) + '.' + t.name) AS 'TableName'
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE c.name LIKE 'CustomerKey%'
ORDER BY FieldName, TableName
The output is below:
We can see FactOnlineSales table does have a CustomerKey.
There is a CustomerType field in the DimCustomer table which segments the data into 2 different types: Person and Company.
SELECT customerType
FROM DimCustomer
GROUP BY CustomerType
Going back to the FactSales table, we can see there is a StoreKey field, which looks useful and there is a table called DimStore.
A basic join across these tables (see below) returns 306 store names, so we know that we can at least work at the store level. Whether online sales data includes customer-level data is still to be investigated. If so, this will help us structure the reports, so we will potentially have Store vs. Online (with online presumably being a single website for the Contoso company). At this stage, we don’t know yet.
SELECT st.StoreName, SUM(s.SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[DimStore] st LEFT OUTER JOIN dbo.FactSales s
ON s.[StoreKey] = st.[StoreKey]
GROUP BY st.StoreName
With this new information, we can import the DimStore table, as we know we are going to use it in the report. In this example, we are importing the table directly into the report, but with the Power BI service, we would want to set up the table imports in Power BI dataflows, so they become available to all reports.
On exploring the DimStore table, we find there is a StoreType field with 4 values: Catalog, Online, Reseller, and Store.
We can now have a look at sales under these new segments, to see how relevant they are:
We can use the following query:
SELECT st.StoreType, format(SUM(s.SalesAmount),'$0,,M') as SalesAmount
FROM
[ContosoRetailDW].[dbo].[DimStore] st LEFT OUTER JOIN dbo.FactSales s
ON s.[StoreKey] = st.[StoreKey]
GROUP BY st.StoreType
This gives us a really good new understanding of how the business is organized. We can see that, as well as sales via Catalog, Online, and Reseller, there are also Store sales, which may indicate that the database includes sales by store as opposed to sales by customer.
Now, let’s go back to that question about individual customer-level data. The following query will return sales from the FactOnlineSales table by Customer Type
Select [CustomerType], format(SUM(o.SalesAmount),'$0,,M') as SalesAmount
from
[ContosoRetailDW].[dbo].[DimCustomer] c
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[FactOnlineSales] o
ON c.CustomerKey =o.CustomerKey
GROUP BY [CustomerType]
So we can see that sales in the FactOnlineSales table by company are over 3 times that of person.
Let's see if the FactSales table contains the data in the FactOnlineSales table. When we try to join the FactSales table to the FactOnlineSales table, we find there is no relationship between the SalesKey and OnlineSalesKey, as we might have expected.
SELECT
[SalesKey]
FROM
[ContosoRetailDW].[dbo].[FactSales] s
INNER JOIN [dbo].[FactOnlineSales] o
ON s.[SalesKey] = o.[OnlineSalesKey]
The output is blank so there is no relationship here.
So, let's take all the online sales from the FactSales table by date and see how closely this matches the sales by date in the FactOnlineSales table. We have 2 queries:
FactSales
SELECT s.DateKey, SUM(s.SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE st.StoreType = 'Online'
GROUP BY s.DateKey
FactOnlineSales
SELECT [DateKey], SUM(SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales]
group by [DateKey]
If we grab the output of these queries and graph them in Excel, we can see the sales are similar, but with differences in the 2 tables.
So now, we just need to figure out what the differences are. Again, if we could reach out to someone in the company, that would be great, but we can’t so we need to figure it out ourselves.
The next step is to get a bit more granular, so select a specific date from the 2 tables and see if we can figure out the differences.
Here is the code. We're going to see if filtering both tables by StoreName = 'Contoso Europe Online Store' gives us matching sales across the 2 tables.
SELECT d.StoreName, SUM(SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales] s
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimCustomer] c
ON s.CustomerKey=c.CustomerKey
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] d
ON s.StoreKey = d.StoreKey
WHERE DateKey = '2009-01-01' AND d.StoreName = 'Contoso Europe Online Store'
group by d.StoreName
SELECT s.DateKey, SUM(s.SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
The first query output is as follows:
StoreName: Contoso Europe Online Store
SalesAmount: 432284.288
So that's a shame: there is still a big difference between the 2 tables. The next step seems to be to go down to the product level, so we try the following two queries:
Both tables also include two fields of interest: ReturnAmount and Discount. It would be good to check they are similar across tables. We can run both queries together.
SELECT SUM(s.SalesAmount) as SalesAmount, SUM(s.[ReturnAmount]) as ReturnAmount, SUM(s.[DiscountAmount])
as DiscountAmount
FROM
[ContosoRetailDW].[dbo].[FactSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
SELECT SUM(s.SalesAmount) as SalesAmount, SUM(s.[ReturnAmount]) as ReturnAmount, SUM(s.[DiscountAmount])
as DiscountAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
What we see here is that the return amounts are similar but the discount amounts on the FactOnlineSales table are over 3x that of the FactSales table.
The first thing I do is add the ReturnAmount and the DiscountAmount to both queries. The discount amount for the first two products, ordered by sales amount, is much higher in the FactOnlineSales table than in the FactSales table. If I add the discount amount and return amount to the sales amount in each table, the totals get much closer.
That's it for now. I'm pretty convinced all the sales in FactOnlineSales are also in the FactSales table. As FactOnlineSales includes a CustomerKey, we can use this table for customer-level analysis, bearing in mind it is online sales data only.
Probably the most essential table you need in your Power BI model is a date or calendar table. You join your date table to any other table containing dates, such as your main FactSales table.
The advantage of using a calendar table is that it contains associated fields such as Year, Month, Start of Month, Start of Quarter, and relevant Financial Periods that will be bespoke to the business.
The date table should be stored in a data flow so everyone can use it. Here we create one using DAX, which could be exported and imported as a table in the Power Query service. You could also create a date table with Power Query directly. Creating one in DAX gives us practice with Time Intelligence functions.
The steps to create a date table are as follows:
1. Open Table view and select 'New Table'.
2. Start by creating the first column, which will be our date key. We can use the CALENDAR() function to auto-create a list of dates in the table (see the sketch below). The date range should run from the minimum date relevant to your data up to any dates you need in the future. You could use TODAY() as the end date, but sometimes, for example with forecasting, it can be useful to have future dates. Of course, a big table may slow performance, so bear that in mind.
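A minimal sketch of that CALENDAR() step, with an example date range (adjust it to your data):
Date = CALENDAR(DATE(2005, 1, 1), DATE(2025, 12, 31))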
Be careful with the day of the week number; it's always best to double-check against another calendar. The default is 1 for Sunday. To set the first day of the week to Monday, use WEEKDAY('Date'[Date], 2).
Once you have the day number set up, which can be useful in itself, you can then create a column for the day name:
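A sketch of the two columns (the column names here are my own):
Day Number = WEEKDAY('Date'[Date], 2)
Day Name = FORMAT('Date'[Date], "dddd")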
Once we've finished creating our date table, we can join it within the model to the relevant tables, such as the Contoso FactSales table, which has a date key. Note that the Date field in the Date table is unique.
An interesting dimension to report on for demographics is the age of customers. Understanding what type of customers buy what types of products can help inform our customer insight that can be used for targeting marketing offers more effectively.
In the DimCustomer table of the Contoso Retail DW, we have a field called BirthDate, but of course this is unique to each customer. What we want to do is group these ages into buckets or age bands, so we can create a more informative report. We can then use the grouping in graphs or as filters, as in the graph below:
To create the categorization, I created a new calculated column in the DimCustomer table with the DAX code following the steps outlined below. Note the exact categories you use are up to you or the business. There are no definitive rules on what age bands you should follow, but by studying other reports, you will get the gist of the most common methods.
The first best-practice step is to create variables. The _dob variable is assigned the [BirthDate] field. The _now variable is assigned today's date, using the TODAY() function. The _agedays variable is then assigned the difference in days between the date of birth and today's date, which is essentially the person's age in days. The _ageyears variable divides the age in days by 365, to get the age in years. The section after the RETURN statement then returns the age band category, based on nested IF logic. It's important to put the conditions in the correct order, working upwards from the youngest band. Note that there are some blank BirthDates, which is why I have used IF(_ageyears = BLANK(), "None"). I only knew there were blank dates by visually inspecting the BirthDate column; if I hadn't handled this, the blanks would have been picked up as "<18", so it's important to sense-check as much as possible.
DAX for the calculated column (note Age Range is the column name):
Age Range =
VAR _dob = DimCustomer[BirthDate]
VAR _now = TODAY()
VAR _agedays = DATEDIFF(_dob, _now, DAY)
VAR _ageyears = _agedays/365
RETURN
IF(_ageyears = BLANK(), "None",
IF(_ageyears<18, "<18",
IF(_ageyears<30, "18 to 29",
IF(_ageyears<40, "30 to 39",
IF(_ageyears<50, "40 to 49",
IF(_ageyears<60, "50 to 59",
IF(_ageyears<70, "60 to 69",
IF(_ageyears<80, "70 to 79",
IF(_ageyears>=80, "80 plus")))))))))
It's very useful for users, administrators, and developers to have the last refresh date added to Power BI reports, as this helps confirm how up to date the data is. Sometimes, when a Power BI file is refreshed, you will notice the data doesn't actually refresh and there is no error, which can be confusing.
This is how to add the refresh date:
1. Open Power Query and add a blank query
2. In the formula bar of the blank query enter = DateTime.LocalNow(). This function gets the current time. Click on the source on the right-hand side to activate the function.
3. You should now have a second Transform tab, to transform the blank query to a table. Select To Table > Table.
4. You can then change the data type to Date/Time.
5. Finally, rename the table and the column to something meaningful, like Last Refresh Date.
6. Then Close & Apply to return to the report view.
7. Add a card to your page and add the field from the Last Refresh Date table you just created. Adjust the size and you should get something like this.
8. The date and time of the last refresh should now update each time you refresh the report (as the last refresh table is refreshed).
So I'm going on a quest to find a video generation tool that I can use to make money on TikTok. It's interesting that the top 5 sites my first Google search for 'AI video generation' brings up are .io and .ai domains. Something to think about.
Synthesia is my first port of call: turn your text into a video in minutes, with natural-sounding AI voices and 140+ avatars. It's $22/month.
AI Script Generator AI Voices AI Avatar AI Video Generator
The free version only lets you input text to generate an AI video that it sends you via email.
AI Video Generator Free Review and Price Comparison
Most of them offer more expensive full-business packages in addition to the prices below. Quoted on 15/11/2023. Prices are likely to change. When subscribing to a full package, watch out for the annual billing, amount of video minutes included, the resolution, and any other features you may need that may not be included.
All the sites give prices monthly, but bill annually, which is a bit well… Internet. So I’m just putting the prices here, so you know what to expect.
The script only generated an avatar that didn't move.
$30/month for 10 mins of video
$225/month for 90 mins
Prices excl. VAT. Note: the price dropped once I took the free trial.
The trial experience was not good. I had a non-animated avatar and some separate speech generated from ChatGPT. It said to export to see the animated version, but it didn't work.
Text, voice, tweet, or PowerPoint to video and animation.
I chose text to animation. It generated a video with a Getty Images logo, and it appeared more like an image that was just panned over to give the feel of motion than AI. Maybe it's just the free trial, but it wasn't what I expected.
To learn Power BI we need data to work with. The Contoso Retail DW database from Microsoft is freely available, so we'll use that one. You can just install a copy on a local desktop version of SQL Server Express. A description of the dataset from Microsoft is as follows:
‘The Contoso BI Demo dataset is used to demonstrate DW/BI functionalities across the entire Microsoft Office product family. This dataset includes C-level, sales/marketing, IT, and common finance scenarios for the retail industry and support map integration. In addition, this dataset offers large volumes of transactions from OLTP and well-structured aggregations from OLAP, along with reference and dimension data’.
So, although the data warehouse will be different when you work in business, in a retail business it will have many similarities to the Contoso data warehouse. Indeed, there will still be similarities across other types of organisations, such as business-to-business (B2B) and charitable organisations. They will all, for example, have a customer table and a sales/revenue table. Customers will have similar characteristics, such as their first order date, last order date, and demographics, such as age, gender, and location. Field names might be different, but there will be similarities. Some organizations will have additional datasets bespoke to their industry. Online gambling companies, for example, will have tables for bonus amounts, although this might be compared to rewards given by retail companies. Another example is VIP/loyalty levels, which are also used in many industries. Supermarkets have loyalty programs, and charities and B2B organizations will identify high-level donors or high-value customers. Gambling companies will have high-value losing customers.
So let’s have a look at the Contoso Retail Datawarehouse and see what’s in it:
Dimension Tables
dbo.DimAccount SELECT TOP 100 * FROM dbo.DimAccount Inspecting the information, we can see a list of accounts. This will likely have been pulled from an accounting system (Finance department).
dbo.DimChannel SELECT TOP 100 * FROM dbo.DimChannel This table is just a list of 4 channels: Store, Online, Catalog, and Reseller. We would expect to use this information when querying transaction information to add what channel the transactions are from. We would expect to join using the ChannelKey field.
dbo.DimCurrency SELECT TOP 100 * FROM dbo.DimCurrency Again a similar table to DimChannel but for currency. We would expect a sales table to include the currency key to which we can join on to add currency to the query.
dbo.DimCustomer SELECT TOP 100 * FROM dbo.DimCustomer The most important table (in addition to sales) in the data warehouse: the big list of the business's customers. We would expect to join this to a sales table, to report on what customers are buying. Fields in this table include gender, date of birth, email address, address, phone numbers, and first purchase date. All sensitive personal information, so you need to think about how you work with this data and who has access to it.
dbo.DimDate SELECT TOP 100 * FROM dbo.DimDate An extremely useful and important table that you would expect to work with a lot in your reporting. By joining the table to sales orders, you can report not just on the relevant calendar year/quarter/month, but also the financial year, quarter, month, and week, which are often specific to the organization.
dbo.DimEmployee SELECT TOP 100 * FROM dbo.DimEmployee A useful table of employee information likely used for a variety of report requirements, not just related to pay, but maybe related to sales.
dbo.DimEntity SELECT TOP 100 * FROM dbo.DimEntity A table of global region and country sites, which are relevant to multinational organizations, used for Global group reporting.
dbo.DimGeography SELECT TOP 100 * FROM dbo.DimGeography A table with countries and regions, probably used to classify countries by global region in some way.
dbo.DimMachine SELECT TOP 100 * FROM dbo.DimMachine Appears to relate to certain types of equipment in stores.
dbo.DimOutage SELECT TOP 100 * FROM dbo.DimOutage A table of computer outages, possibly related to DimMachine
dbo.DimProduct SELECT TOP 100 * FROM dbo.DimProduct A key table, listing all the products presumably for sales, with a product key to join it to the sales table.
dbo.DimProductCategory SELECT TOP 100 * FROM dbo.DimProductCategory A table one would expect to join to DimProduct to allow for category level reporting.
dbo.DimProductSubCategory SELECT TOP 100 * FROM dbo.DimProductSubCategory Similar to dbo.DimProductCategory but presumably the subcategory level
dbo.DimPromotion SELECT TOP 100 * FROM dbo.DimPromotion A table of promotions, we can join it to the sales table and add marketing information to sales
dbo.DimSalesTerritory SELECT TOP 100 * FROM dbo.DimSalesTerritory A table of territories, presumably relevant to the sales team, which may also join to dbo.DimEmployee
dbo.DimScenario SELECT TOP 100 * FROM dbo.DimScenario A Finance table with keys for Actual, Budget, and Forecast. There should be sales information we can join to, to determine what type of data it is (Actual, Budget, and forecast). Top-level reporting for example will want to show actual revenue against budget and forecast.
dbo.DimStore SELECT TOP 100 * FROM dbo.DimStore A table of store names and addresses.
Fact Tables
dbo.FactExchangeRate SELECT TOP 100 * FROM dbo.FactExchangeRate A table of exchange rates. With a multi-national organization, the business will want to be able to report in Global sales (perhaps in dollars), so this table will be key. One would expect this to be updated on a regular basis, but it might be just once a quarter.
dbo.FactInventory SELECT TOP 100 * FROM dbo.FactInventory An essential table keeping a count of all products in stock.
dbo.FactITMachine SELECT TOP 100 * FROM dbo.FactITMachine A table relating to the costs of machines (presumably check-outs).
dbo.FactITSLA SELECT TOP 100 * FROM dbo.FactITSLA A table with information about the uptime of machines
dbo.FactOnlineSales SELECT TOP 100 * FROM dbo.FactOnlineSales A transaction table of online sales
dbo.FactSales SELECT TOP 100 * FROM dbo.FactSales The most important table of all (along with the customers). The transaction table, possibly including the Online sales data. More investigation is needed.
dbo.FactSalesQuota SELECT TOP 100 * FROM dbo.FactSalesQuota The sales quotas by store, with scenarios set next to them (Actual, Budget, Forecast), presumably used for reporting on which stores are hitting their targets.
dbo.FactStrategyPlan SELECT TOP 100 * FROM dbo.FactStrategyPlan An empty table
So that’s a brief look at all the tables. As you can see we’ve covered data relevant to C-Suite (targets, store performance, global/regional sales), Sales (transaction/region/customer, employee/quotas), Marketing (Promotions), Finance (Accounts, Sales Quotas, exchange rates, costs).
So I’m going to learn how to create videos in order to present my Power BI visuals in an interesting medium. Hey, people are moving more into watching videos than reading content, so I might as well play catchup. My plan is also to be able to do this as a digital nomad, so I start off with some reading. I know I’m going to need to learn to use some video editing software, but what software?
The first notes are about how it talks about investing in software. I was actually thinking about open source, but I guess needs must. The blog mentions Adobe products. I know the more modern licensing model, where you can pay a monthly fee is a lot more accessible than the old school, one-off several hundred to a thousand type costs, I remember from 15 years ago when I worked in a digital marketing agency as an email marketing bod. It also mentions how video editing is both a technical and a creative skill. I guess I like graphic design. It seems like a long time ago, there was Macromedia Flash, which seemed revolutionary at the time, but I guess it was more of a motion graphics tool than video editing. Still, I imagine some of the editing skills, such as using a timeline, will be relevant. Then there are separate audio editing tools. I can’t remember the tool off the top of my head. Vestax or something was the tool. Technology has changed an awful lot in the last 20 years. Costs have come down, the software has advanced significantly due to memory, and computer speeds have increased dramatically. I am writing this on a £150 Dell Latitude 7490, Core I7 8th Generation Intel laptop. It’s got 16 GB of RAM, but I don’t know if the built-in graphics gizmo will be powerful enough for what I need. I’ve got 187GB of free space. Not long ago, that would have been the combined capacity of several hundred computers. It’s insane.
And then there is AI. I’ve started using TikTok recently, which I’ve found to be a pretty useful educational platform. The Twitter app search functionality stopped working, which I only realised needed an upgrade to fix, but in the few days in between, I found myself on another social media tool. I always thought TikTok was for kids, but it’s obviously evolved to include content from all realms, and I found it really educational, for tutorials and great for music. I found myself considering doing a self-video of ‘Do you know the muffin man’. Anyway, to cut a long story short, I found myself over the last few days rethinking my vague plan about building a Power BI DAX website and trying to get some money from Amazon affiliate links to learn video editing with a view to opening up more opportunities.
So, where to start?
Well, I know I will need to get learning ASAP and stop watching Twitter and TikTok videos about how to make money online. I also need to stop reading the news constantly, which is another addiction I have picked up. I need to find out which software to learn and how much money it will cost. To become a freelancer, I will need to build a website or a YouTube channel to showcase my work (which seems like a long way off). I need a website too (hey, how about this one), where I can stick some affiliate links and maybe generate some money. Being an oldie newbie, I'm not really expecting this site to gain a lot of traction, but who knows, maybe I'll get lucky. I also know I'm going to need perseverance. I'm enthusiastic this morning, but I've got a day's work ahead of me (as a Power BI developer). How will I feel at the end of the day? Will I reflect on this post I wrote at the start of the day and think, 'Well, that was just another one of my random ideas that seemed like a good idea at the time and then I just forgot about it'?
Motivation – I need motivation:
Reasons to learn video editing.
1. It's cool – People will be interested in what I'm doing. There is nothing wrong with enjoying communication and interest. This is what drives community.
2. It's interesting – Well, I'm pretty sure it will be. There is a lot to learn and the projects will be endless. There are only so many financial reports I can create in Power BI.
3. It's technical – If you're technically inclined, you should enjoy this. There is going to be a lot of detail, and precision is going to be important, as will understanding how it all works.
4. It could set you free – Yes, although we will still be slaves to the money system, being a freelancer is certainly a step in the right direction towards freedom.
5. It can be done anywhere – If you can't afford a house in the UK, then think abroad. If you can work abroad, then there is no reason to buy a £300K house and spend the next 30 years paying it off.
6. It's more natural – What I mean by that is that visual things are more natural to us. Hey, most of us dream at night.
7. It's funky – Well, potentially.
8. It's not fully readable by AI – AI may be able to read all text, and computer vision is giving it the ability to interpret video, but I'm guessing we will still have an edge over AI for a while.
Well, that’s it for now. It’s 08:33 UK 6th July 2023. I start work at home at 09:00. Maybe I’ll listen to some video editing podcasts today when I work.
So I continue. I do a bit of learning from some podcasts on Castbox about video editing. I learned about the de facto industry Premiere tool, which is around $50 a month and used for making TV adverts, films and the like. I don't think it's what I need for creating YouTube videos. DaVinci Resolve is named as a top free tool, and I make a note. Adobe Premiere Pro was also mentioned as another top tool. I learned about the top 3 video editing jobs in demand, which are:
1. Micro-content for YouTube/Shorts/TikTok 2. YouTube full video editing. 3. Something else, which I think was making ads for digital marketing.
I also make a note of some of the industry jargon being mentioned, such as storytelling, workflow, proxies, editing, cutting, and creating a show reel for your CV. By the sounds of it, to get into making a big movie, the industry wants degrees in film and the like, but to get a freelance editing job doing short-form work, a reel is critical to getting hired. There is no bragging about video editing skills; you can either do it or not. It would take time to develop the skills, maybe six months to get good, but the better you get, the more in demand your skills will be. I also read about lulls in work that you need to save for, but I think that mainly applied to bigger projects like adverts.
So that was it. The goal is to make a show reel consisting of a video highlighting your ability across content. I listened to a YouTube video (just listened, as I was working on something else). It was about using Adobe Premiere Pro. I heard about panels, timelines, and using zoom a lot. It kind of made sense; I would have to build my knowledge up slowly until it all sank in. Lastly, I listened a little to a piece about the monetisation of YouTube ads and how pay depends on the audience attracted by the content produced. This would be relevant if I were going to produce my own content. An example was given of financial videos, which attract audiences that stockbrokers want to target. Advertisers will spend a lot to reach this audience, so it pays more, but I imagine the audience would be smaller. Still, something to think about.
Email marketing, which is closely related to and sits under Customer Relationship Management (CRM), has been the number one method for acquisition and retention marketing ever since email was born, due to its low cost. It is also the easiest contact method for website messages such as transaction and registration confirmations and inquiries.
GDPR (General Data Protection Regulation), however, has been making email marketing more difficult, due to the requirement for recipients to have explicitly opted in to receive an email, and learning about the law is now essential for anyone dealing with email marketing. A bit about that later.
So what is email marketing? Note: the image below was created by AI; this article was written by a human.
At its most basic, email marketing is the process of selecting data, creating content, and sending emails to a list of contacts, which may range from a handful to millions (although if you are sending to millions and are not a blue chip, you are probably a spammer). Email marketing typically uses a dedicated mass-send platform (an online software-as-a-service provider), where you can send an email to many contacts individually, without the whole audience being able to see each other’s email addresses. Email marketing is subject to law, so it must follow country-specific guidelines for features such as unsubscribe and privacy links. Yes, you can send an email from Outlook to 50 people with a list of your latest cars in stock, but doing so while following the rules isn’t so easy, and how will you know your message has even arrived without tracking? That’s where the dedicated cloud email platforms come in.
So what about the data?
So, as with all CRM activities, everything starts with the data. With no data, there is no one to contact and with no one to contact, there is no one to send anything to.
So where does the data come from?
With tighter regulation, people must have opted in to receive communications from the company you are sending from. This means a real interaction: typically a tick box on your website, a tick on a returned post-based communication such as a sales flyer with an offer on it, or a phone conversation in which the opt-in is requested. The most abundant form of data collection is via the website, when people sign up to register their interest or to buy. The data is collected in the website database, which may be integrated directly into an email marketing platform, or it may be exported separately and entered into a CRM or uploaded manually to an email platform.
The days of buying lists of email contacts for acquiring new customers and leads are over. This was common practice in the old days, when less-regulated methods of email collection were used. These days marketers rely more on mediums such as social media advertising, PPC (pay-per-click) advertising, and content designed to draw individuals in, typically shared on social networks and picked up by search engines (Google/Bing) through SEO (Search Engine Optimization) methods.
Investigate
To get a better idea of how data is collected, try some of the following for your own research:
1. Sign up to a popular website to study the data collection process and notice the email opt-in and privacy policy requirements.
2. Look through the marketing messages you receive, and check the header and footer of the emails for information on how your data was collected, along with the associated privacy policies and opt-out links.
3. Search through your unopened ‘junk mail’ – no, it’s not junk mail, it’s marketing communications! Make a note of the various contact methods available, and if there is a data entry section to collect your details, note the email and related opt-in fields.
4. Phone up a company from a TV ad and sign up for a charity or order a set of kitchenware! Note the point at which your email details are collected and the language the phone operator uses to request your permission to opt in.
5. Go to your local sports shop or DIY outlet and buy something over the counter. Do they ask for your email address?
These are some of the main methods employed to collect your email address. It’s planned, organized, and will be actioned; you will be receiving an email in the next month or so to get you to buy again. The email address, being a unique identifier, can also be readily matched against any data already stored about you, and your offline purchases matched against your online purchase data. This enriches the intelligence the organization has about you, enabling it to target you with more personal messaging and improve your experience, so you buy more, again and again and again.
Email Data Job Roles
So if you are employed to work with email data, you could be in one of the following positions:
1. An email marketing executive or manager – This will be a dedicated person in a business or agency who is responsible for handling part or all of the email process. They may simply be sent lists of email contacts already processed by a marketing person or ‘data bod’, or they may be responsible for maintaining the data themselves. More professional setups have the data stored on an email platform where lists can be segmented into more targeted lists using the tools available on the platform itself. A simple example would be selecting an age and gender range to target an offer on a clothing item, such as a pair of ski goggles for women.
2. A CRM manager – This person will be responsible for managing the company’s CRM. The CRM is typically a cloud-based database solution with a web-based portal for managing all the company’s contacts and transaction history outside of any live website environment. Examples are Salesforce and Microsoft Dynamics.
Descriptive statistics are general statistics used to describe data. Statistics such as average customer value and average sales amount are typically used as Key Performance Indicators for the business.
In the graph below we plot each sales amount for a fictitious electrical store. It could also represent a supermarket’s sales, charitable donations, purchases from a manufacturer, or any other business where you have this kind of data.
This is just a graph created in Excel from a table of 115 sales amounts. Looking at the data visually, we can make some observations: the average sale looks to be around £60 at a guess, and there are a few high sales amounts that look like outliers. Outliers lie outside the normal range, so we need to calculate what that normal range is to determine what to exclude.
Mean
The mean, or average, is probably the most commonly used descriptive statistic. Everyone understands averages, so they are commonly used throughout analysis and reporting.
In the example above the mean is £74.18, so the guess wasn’t far out. The mean is just the total sum of all the sale amounts divided by the count of sale amounts. I’m lazy, so I just selected the column of sale amounts in Excel and looked at the average at the bottom of the sheet. This also gives you the total and the count used to calculate the mean. The formula in Excel is =AVERAGE(A2:A116), where A2:A116 is the range of cells holding the sale amounts.
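If you’d rather do this in Python than Excel, a minimal pandas sketch looks like the one below. The file name salesamounts.csv and the column name SaleAmount are assumptions for illustration only; swap in your own.
import pandas as pd
# Assumed file and column names, for illustration only
df = pd.read_csv('salesamounts.csv')
# The mean is the total of the sale amounts divided by their count
total = df['SaleAmount'].sum()
count = df['SaleAmount'].count()
print(total / count)            # manual calculation
print(df['SaleAmount'].mean())  # pandas equivalent of =AVERAGE(A2:A116)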
Examples of using the mean are as follows:
Average Order Value (AOV) – A commonly used statistic to gauge the average value of orders. Businesses will try to improve this number by increasing prices or selling more to existing customers. Typically the value will be presented alongside other KPIs such as revenue, number of customers, etc.
Average Customer Value (ACV) – A measure that can be used over a time period, such as a month, quarter, or year, to determine the average value of customers. This helps planning: to achieve a given amount of revenue, a business may need to sell to a certain number of customers at a given average value.
Average Open Rate, Average Clickthrough Rate – Common statistics used in email marketing.
Average Daily Revenue, Average Monthly Revenue – Overall averages that a business can use to forecast its monthly revenue, e.g. Monthly revenue = Average daily revenue × 30/31 (assuming a flat sales pattern, which is often not the case). Using trend lines or phasing revenue by day, week, or month is a better way of forecasting or breaking down budgets.
If you use the average to predict the sales for the rest of the month, you can see that, due to the outlier, you run the risk of grossly overestimating. Total sales for the week commencing 1st November are £8,531, which you can see in the screenshot above, so the daily run rate for the first week of November is £1,219 (£8,531/7). To forecast the month I could multiply the daily rate by 30, as there are 30 days in November, which gives me roughly £36,560. Businesses love talking about run rate as they are always focused on revenue. Using historical data, you should be able to get a picture of the historical run rate to compare against current performance. Businesses will also want to look at this versus the previous year, as seasonality impacts a lot of businesses; November is different from October.
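As a quick sketch, the run-rate arithmetic above looks like this in Python (the figures are the ones from the example, not pulled from any data source):
# Run-rate forecast using the example figures above
week_one_sales = 8531                # total sales, week commencing 1st November
daily_run_rate = week_one_sales / 7  # roughly £1,219 per day
days_in_november = 30
forecast = daily_run_rate * days_in_november
print(round(daily_run_rate), round(forecast))  # 1219 and 36561, i.e. roughly £36,560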
Note for newbies: you might wonder how you get all the sale amounts. Essentially, what you’re doing is picking up all the sales receipts and entering the totals onto a list from which you can perform your calculations. In the modern world, of course, this is all captured in a database. Organizations typically develop data warehouses, which take a copy of this information into a safe, non-live database. SQL is used to report on the data or to hook it into platforms such as data cubes or Power BI workflows, where analysts can access it easily. Note that you may not have to do all the nitty-gritty work to get your data; it all depends on how the business is organized.
Median
The median is less used, but it can still be useful when displayed alongside the mean. The mean can be influenced by high or low outliers, which can distort the picture.
In our Electric Now sales store example, the median is £61.44. That’s quite a bit less than the mean of £74.18, so, as we thought, the outliers are skewing the data upwards. Having flagged this, you may want to remove the outliers to give yourself a forecast range. In Excel, the formula is =MEDIAN(A2:A116), where A2:A116 is the range of sale amounts.
To manually get the median, sort your sales amounts from lowest to highest and find the number in the middle.
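In Python, assuming the same hypothetical salesamounts.csv file as before, the median can be taken directly or reproduced manually with a sort:
import pandas as pd
df = pd.read_csv('salesamounts.csv')  # assumed file name
# Equivalent of =MEDIAN(A2:A116)
print(df['SaleAmount'].median())
# The manual approach: sort the values and take the middle one
sorted_amounts = df['SaleAmount'].sort_values().reset_index(drop=True)
middle = len(sorted_amounts) // 2     # 115 values -> index 57, the middle value
print(sorted_amounts[middle])         # with an even count you would average the two middle values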
Mode
The mode is useful when you want to know the most frequent value in a set of data.
For example, if you had a subscription option on a website with several variable rates available, looking at the mode would tell you the most common subscription amount.
If your data varies significantly, you might find it helps to group the values into bands in order to create more frequent groups. For example, if you had 1,000 sales amounts, all with varying pence amounts, these could be banded into groups of £10–15, £20–25, £30–35, etc. This would also be much easier to graph.
If we pivot the data from the Electric Now shop in the spreadsheet and graph it, we can see that £39.44 is the most common sales amount, but not by much. I can also just use the formula =MODE(A2:A116).
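The pandas equivalent is below, again assuming the hypothetical salesamounts.csv file. Note that .mode() returns a Series, because a dataset can have more than one value tied for most frequent.
import pandas as pd
df = pd.read_csv('salesamounts.csv')  # assumed file name
# Equivalent of =MODE(A2:A116); returns every value tied for most frequent
print(df['SaleAmount'].mode())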
As this isn’t very informative, we can choose to group, bucket, or band the data together, as in the graph below. From this we can make some better higher-level observations, such as that sales amounts are fairly evenly distributed from £25 to £100 and we don’t have many sales below £25.
To be more precise, we can state what percentage of sales falls within each range, as in the table below.
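In pandas, the banding and percentage calculation can be sketched as below, with band edges chosen purely for illustration:
import pandas as pd
df = pd.read_csv('salesamounts.csv')  # assumed file name
# Band the sale amounts into buckets (edges chosen for illustration)
bands = pd.cut(df['SaleAmount'], bins=[0, 25, 50, 75, 100, 125, 500])
# Count of sales per band, then the same as a percentage of all sales
print(bands.value_counts().sort_index())
print((bands.value_counts(normalize=True).sort_index() * 100).round(1))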
Min and Max
Depending on the data in question, the min and the max can be useful when used in combination with the other statistics. The min is just the minimum value in your data range and the max is the maximum.
In the Electric Now example, the Min is £2.12 and the Max is £412.
What the min does tell you is that some orders can be quite small amounts. If you wanted a report showing the maximum sale amount you want to beat, it would be useful to display the max sale amount.
Once you’ve understood how all these descriptive statistics work, you can skip ahead by using the Excel Analysis ToolPak (File > More > Options > Add-ins > Analysis ToolPak > Go). You can then summarize the data using the Descriptive Statistics tool in a table like the one below. Note that the range here is the difference between the min and the max. The standard deviation we can look at next.
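If you’re working in Python rather than Excel, pandas’ describe() gives a rough equivalent of the Descriptive Statistics table (count, mean, standard deviation, min, quartiles, and max); the range you add yourself. The file and column names are assumptions as before.
import pandas as pd
df = pd.read_csv('salesamounts.csv')  # assumed file name
# Rough equivalent of Excel's Descriptive Statistics output
print(df['SaleAmount'].describe())
# Range = max - min, which describe() doesn't report directly
print(df['SaleAmount'].max() - df['SaleAmount'].min())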
Standard Deviation
The standard deviation helps describe how data points are distributed relative to the mean. Are they all close to the mean, or do they spread out evenly from the mean in the form of a normal distribution? In a normal distribution, roughly 68% of the data lies within 1 standard deviation of the mean, 95% lies within 2 standard deviations, and 99.7% lies within 3 standard deviations.
As we saw in the descriptive statistics table, the standard deviation is 59.6. With our mean of £74.2, values between £14.60 and £133.80 are within one standard deviation of the mean and should cover roughly 68% of our data. Likewise, values between -£45.00 and £193.40 are within two standard deviations, covering roughly 95% of the data, and values between -£104.60 and £253.00 are within three standard deviations, covering roughly 99.7% of the data. The negative lower bounds are a hint that the data isn’t perfectly normally distributed, since a sale amount can’t be negative.
A manual calculation of the standard deviation goes like this:
1. Take each value’s difference from the mean, square it, and sum the results. In this example that equals 404,479.
2. Divide that number by the count minus 1, i.e. 114. In this example we get 3,548; this is the (sample) variance.
3. The standard deviation is the square root of the variance, which is 59.6.
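The same steps can be sketched in pandas against the hypothetical salesamounts.csv file. Pandas’ .std() divides by count − 1 by default, matching the manual calculation above, and the loop prints the one, two, and three standard deviation ranges around the mean.
import pandas as pd
df = pd.read_csv('salesamounts.csv')  # assumed file name
amounts = df['SaleAmount']
# Manual calculation: squared differences from the mean, summed,
# divided by count - 1 (sample variance), then square-rooted
mean = amounts.mean()
variance = ((amounts - mean) ** 2).sum() / (amounts.count() - 1)
std_dev = variance ** 0.5
print(std_dev, amounts.std())  # these two should match
# The 1, 2 and 3 standard deviation ranges around the mean
for k in (1, 2, 3):
    print(f"{k} std dev: {mean - k * std_dev:.1f} to {mean + k * std_dev:.1f}")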
Plotting all the sales amounts on a single axis and overlaying the standard deviation ranges (blue is the 1st, red the 2nd, and yellow the 3rd, each including the whole of the range in between), we get a graph like the one below.