For installing and getting started with the Contoso Retail Database, please see this article.
A standard starting point for creating customer insight reports is to combine data from sales with your customer table. In the Contoso Retail DW, we have the FactSales and DimCustomer tables that we can use. The DimCustomer Table has a field called Customerkey, but if we look in the main FactSales table we cannot see a Customerkey.
So what we can do is search all the tables for tables with a field name like ‘CustomerKey’.
Here is the code:
SELECT c.name AS 'FieldName'
,(SCHEMA_NAME(t.schema_id) + '.' + t.name) AS 'TableName'
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
WHERE c.name LIKE 'CustomerKey%'
ORDER BY FieldName, TableName
The output is below:
We can see FactOnlineSales table does have a CustomerKey.
There is a CustomerType field in the DimCustomer table which segments the data into 2 different types: Person and Company.
SELECT customerType
FROM DimCustomer
GROUP BY CustomerType
Going back to the FactSales table, we can see there is a StoreKey field, which looks useful and there is a table called DimStore.
A basic join across these tables (see below) returns 306 store names, so we know that we can at least work at the store level. Whether online sales data includes customer-level data is still to be investigated. If so, this will help us structure the reports, so we will potentially have Store vs. Online (with online presumably being a single website for the Contoso company). At this stage, we don’t know yet.
SELECT st.StoreName, SUM(s.SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[DimStore] st LEFT OUTER JOIN dbo.FactSales s
ON s.[StoreKey] = st.[StoreKey]
GROUP BY st.StoreName
With this new information, we can import the DimStore Table as we know we are going to use that in the report. In this example, we are importing the table directly into the report, but with the Power BI service, we would want to set up the table imports into the Power BU dataflows, so they become available to all reports.
On exploration of the DimStore table, find there is a StoreType field that has 4 rows:
Catalog,
Online,
Reseller,
Store.
We can now have a look at sales under these new segments, to see how relevant they are:
We can use the following query:
SELECT st.StoreType, format(SUM(s.SalesAmount),'$0,,M') as SalesAmount
FROM
[ContosoRetailDW].[dbo].[DimStore] st LEFT OUTER JOIN dbo.FactSales s
ON s.[StoreKey] = st.[StoreKey]
GROUP BY st.StoreType
This gives us a really good new understanding of how the business is organized.
We can see that there as well as sales via Catalog, Online, and Reseller, there are also Store sales, which may indicate that the database includes sales by store as opposed to sales by customer.
Now, let’s go back to that question about individual customer-level data.
The following query will return sales from the FactOnlineSales table by Customer Type
Select [CustomerType], format(SUM(o.SalesAmount),'$0,,M') as SalesAmount
from
[ContosoRetailDW].[dbo].[DimCustomer] c
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[FactOnlineSales] o
ON c.CustomerKey =o.CustomerKey
GROUP BY [CustomerType]
So we can see that sales in the FactOnlineSales table by company are over 3 times that of person.
Let’s see if the FactSales Table contains data in the FactOnlineSales table
When we try to join the FactSales table to the FactOnlineSales table, we find there is no relationship between the SalesKey and OnlineSalesKey, which we might expect.
SELECT
[SalesKey]
FROM
[ContosoRetailDW].[dbo].[FactSales] s
INNER JOIN [dbo].[FactOnlineSales] o
ON s.[SalesKey] = o.[OnlineSalesKey]
The output is blank so there is no relationship here.
So, let’s take all the online sales from the FactSales table by Date and see how closely this matches the sales by date of the FactOnlineSales Table:
We have 2 queries:
FactSales
SELECT
[SalesKey]
FROM
[ContosoRetailDW].[dbo].[FactSales] s
INNER JOIN [dbo].[FactOnlineSales] o
ON s.[SalesKey] = o.[OnlineSalesKey]
FactOnlineSales
SELECT [DateKey], SUM(SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales]
group by [DateKey]
If we grab the output of these queries and graph them in Excel, we can see the sales are similar, but with differences in the 2 tables.
So now, we just need to figure out what the differences are. Again, if we could reach out to someone in the company, that would be great, but we can’t so we need to figure it out ourselves.
The next step is to get a bit more granular, so select a specific date from the 2 tables and see if we can figure out the differences.
Here is the code. We’re going to see if filtering both tables by the Store name= ‘Contoso Europe Online Store’. This will give us matching sales across the 2 tables.
SELECT d.StoreName, SUM(SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales] s
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimCustomer] c
ON s.CustomerKey=c.CustomerKey
LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] d
ON s.StoreKey = d.StoreKey
WHERE DateKey = '2009-01-01' AND d.StoreName = 'Contoso Europe Online Store'
group by d.StoreName
SELECT s.DateKey, SUM(s.SalesAmount) as SalesAmount
FROM
[ContosoRetailDW].[dbo].[FactSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
The first query output is as follows:
StoreName SalesAmount
Contoso Europe Online Store 432284.288
The second query output is:
DateKey SalesAmount
2009-01-01 00:00:00.000 521675.102
So that’s a shame there is still a big difference between the 2 there. The next step seems to be go down to the product level, so we try the following two queries:
Both tables also include two fields of interest: ReturnAmount and Discount. It would be good to check they are similar across tables. We can run both queries together.
SELECT SUM(s.SalesAmount) as SalesAmount, SUM(s.[ReturnAmount]) as ReturnAmount, SUM(s.[DiscountAmount])
as DiscountAmount
FROM
[ContosoRetailDW].[dbo].[FactSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
SELECT SUM(s.SalesAmount) as SalesAmount, SUM(s.[ReturnAmount]) as ReturnAmount, SUM(s.[DiscountAmount])
as DiscountAmount
FROM
[ContosoRetailDW].[dbo].[FactOnlineSales] s LEFT OUTER JOIN [ContosoRetailDW].[dbo].[DimStore] st
ON s.StoreKey = st.StoreKey
WHERE StoreType = 'Online' AND s.DateKey = '2009-01-01' AND st.StoreName = 'Contoso Europe Online Store'
GROUP BY s.DateKey
What we see here is that the return amounts are similar but the discount amounts on the FactOnlineSales table are over 3x that of the FactSales table.
The first thing I do is add the ReturnAmount and the DiscountAmount to both queries.
The discount amount from the first two products ordered by sales amount is much higher in the FactOnlineSales table than in the FactSales table. If I add the discount amount and return amount to the sales amount in each table, we it gets much closer.
That’s it for now, I’m pretty convinced all the sales in FactOnlineSales are in the FactSales table.
As the FactOnlineSales key includes a CustomerKey, we can use this table for customer-level analysis, but bearing in mind it is Online sales data only.
Leave a Reply