Category: Power BI

  • Testing Power BI DAX Measures Speed with DAX Studio

    The processing chain is: DAX expression > DAX query > VertiPaq query.

    Test 1: Testing the Impact of USERELATIONSHIP() in a DAX Measure

    Does using USERELATIONSHIP() with inactive relationships slow down a DAX measure? I’m not sure so I thought I would test it out.

    To test this, I create a measure using the existing active relationship between my date table and my FactOnlineSales table. I then create a copy of the date table and join it to the FactOnlineSales table in the same way, but make the relationship inactive.

    I use DAX Studio to create two measures for calculating total sales, and then throw in some filters for year and continent as follows:

    DEFINE
        MEASURE '_Measures'[test1] =
            CALCULATE (
                SUMX ( 'FactOnlineSales', [Sales Amount] ),
                'DimDate'[CalendarYear] = 2009,
                'DimGeography'[ContinentName] = "Europe"
            )

        MEASURE '_Measures'[test2] =
            CALCULATE (
                SUMX ( 'FactOnlineSales', [Sales Amount] ),
                'DimDate (2)'[CalendarYear] = 2009,
                'DimGeography'[ContinentName] = "Europe",
                USERELATIONSHIP ( 'DimDate (2)'[Datekey], FactOnlineSales[DateKey] )
            )

    EVALUATE
    ROW ( "test", [test2] )

    Below is a screenshot of my DAX Studio settings. As you can see, I have the 'Clear cache on Run' option selected, which helps ensure each test starts from the same state. I also have the 'Server Timings' option selected. Note: be careful using 'Clear cache on Run' on a connection to a production model, as it will clear the cache (and hence slow down reports).

    For the original sales measure, which uses the active relationship to the date table, I get a total time of 2,197 ms, broken down into 624 ms for the FE (formula engine) and 1,573 ms for the SE (storage engine, i.e. the VertiPaq engine). The FE turns the DAX expression into a DAX query and creates a query plan that it sends to the SE to retrieve the results. The FE also caches query results (which is why I have turned on 'Clear cache on Run'; otherwise subsequent runs would return results very quickly).

    Comparing the two measures
    Here I run the query on each measure five times to get an average, and as you can see there was very little difference in performance. In fact, the first three runs of the measure with the active relationship took longer (though this may have been because my laptop was busy doing something else). My results are below. I think they are pretty inconclusive, but they are interesting nonetheless, and the method would be worth repeating on a larger model.


    In the Server Timings you can view the VertiPaq query that is executed by Analysis Services:

    Each step of the execution is below:
    The first step selects the data from FactOnlineSales with the two filters applied, returning 1.4M rows (around 133 KB).
    The second step sums the sales amount and returns one row. The third step returns the start and end time as well as other information (visible if you mouse over the query).

    The Par. field stands for Parallelism, which represents the number of parallel threads used by the storage engine to retrieve the result at each step.

  • Analysing the Power BI Data Model to Improve Refresh Rates with DAX Studio


    I’ll start with some references:

    The Definitive Guide to DAX book by Marco Russo and Alberto Ferrari (the definitive experts).
    The Guy in a Cube video on slow model refresh.
    ChatGPT (I don’t know who that is!)
    My own testing and experience, demonstrated using a Power BI model built on data from the Contoso Retail data warehouse.

    First things first: connecting DAX Studio to your data model in the Power BI service
    1. Get the XMLA endpoint URL from Power BI by selecting Workspace settings > License info:
    You will need to log in with your Power BI login to connect.


    2. You can connect to your model in the service with DAX studio as follows (you may need to download and install it first). You can open it either independently or using the external tools within Power BI.



    File and Model Size
    Typical models use data imported into physical storage from a data warehouse and potentially other sources, such as Google Analytics. The Vertipaq engine stores the model in virtual memory. My Contoso Retail Power BI file is 193 MB (that’s not very big). It is compressed to reduce storage requirements.

    Model Size in DAX Studio (Advanced > View Metrics > Summary)
    When I look at the model summary from the advanced Metrics in DAX studio, the model loaded into virtual memory by the Vertipaq engine is 253 MB. Also of interest are the 37 tables and 366 columns. My model only has 8 tables and 1 calculated table, so it will be interesting later to understand what these other tables are.



    So why is this different?

    In-memory Compression: Power BI compresses data in the .pbix file to reduce storage, so the file size on disk is often smaller than its in-memory model size. However, once loaded, Power BI expands it for efficient query performance. There is still compression (next section), it’s just not as much as in the stored file.

    Metadata Overhead: The model size shown in DAX Studio includes metadata, indexes, dictionaries, and structures needed for DAX calculations and relationships that Power BI uses while executing queries, which aren’t directly reflected in the file’s storage size.

    Cache and Temp Data: DAX Studio may include caches or other temporary data generated during analysis, which increases the apparent size of the model in memory.

    Unused Columns or Tables: Any tables or columns loaded but not used may also contribute to the model’s size in DAX Studio, while Power BI might not fully load them in the saved file.

    Note: According to the model settings, the model is not using caching (which is off by default), so I can rule out one of the four possibilities.

    Power BI License requirements based on model size

    The Power BI Pro license allows for models up to 1GB in size, 10 GB of native storage, and 8 refreshes a day
    The Power BI Premium license allows for models up to 100GB in size and 100 TB of native storage and up to 48 refreshes a day.

    Note: Beyond just loading the raw model, Power BI needs additional memory overhead for processing, caching, and temporary storage during data refreshes and calculations. This means that even if your model fits within the 1 GB limit, you’re likely to experience slow performance or even errors during complex operations, especially if near that cap.

    Semantic Model Refresh Speed (Vertipaq engine: imported tables)

    We can easily access the model refresh times in the Power BI workspace (select 'Refresh history' from the three-dots menu next to the semantic model entry). This gives us our total model refresh benchmarks, which we can average (removing outliers from known busy times as required).

    So what are the stages of the model refresh and which ones can we troubleshoot and optimize?

    1. Data import time: This one we can measure. Adam from the Guy in a Cube video gives us a simple suggestion for benchmarking each table's load time: simply run a SELECT * on each table in the data warehouse and record the time (repeat the process at different times of the day, perhaps removing outliers at busy times, and average). There is also network latency to consider with large volumes of data, but let's just calculate: Total refresh time – total table SELECT * time:

    Here is an example from my Contoso data warehouse tables:
    1. DimCurrency: Time: 0, Rows: 28
    2. DimCustomer: Time: 0, Rows: 18,869
    3. DimDate: Time: 0, Rows: 648
    4. DimGeography: Time: 0, Rows: 274
    5. DimProduct: Time: 0, Rows: 2,467
    6. DimProductCategory: Time: 0, Rows: 8
    7. DimProductSubcategory: Time: 0, Rows: 48
    8. DimStore: Time: 0, Rows: 306
    9. FactOnlineSales: Time: 2 mins 49 secs, Rows: 12.6M

    As you can see, only the FactOnlineSales table took any real time to SELECT.
    But then you might say: hey, why are you using tables in your model and not views? If a new column appears, the model refresh will fail.
    Fair enough, in which case you can run the same tests on your views.

    Now, I don’t have a gateway set up to connect to my laptop today, so I would need to set something up in the cloud to perform the test. But let's say the model refreshed in 4 minutes and your SELECT * benchmarks totalled 2 minutes and 49 seconds; then you can say that for 1 minute and 11 seconds the Power BI service is doing something else.

    So what is happening once the data has been imported? Well, that's when we need to get into the VertiPaq engine. Note that the engine is proprietary and so has secrets we will never learn, but what we do know about the steps it performs is as follows:

    Vertipaq engine steps
    1. Loads the tables into columns
    2. Checks each column for the best sort order
    3. Applies compression to create the smallest, fastest model
    4. Loads the model into memory

    Note: the VertiPaq engine only deals with imported data. DirectQuery sources are processed by the DirectQuery engine.

    DAX formula engine
    1. Creates calculated columns and tables (in memory) – these are not compressed as well as the data compressed by VertiPaq
    2. Creates relationships between columns

    So we need to learn more about what these two engines do.

    The Vertipaq Engine
    1. Stores imported data into a columnar database (for faster data retrieval).

    2. Sorts the columns by the best order for compression techniques (such as RLE)

    3. Compresses the columns using various (proprietary) techniques, such as:

    a) Value encoding: e.g. deducting a constant value from each column entry to create a value that needs fewer bits and hence less space.

    b) Hash encoding (dictionary encoding): creating a lookup table, similar to a normalization technique, where an ID and the value are stored in memory (e.g. product sizes: 0: small, 1: medium, 2: large). The 0, 1, and 2 then replace the values small, medium, and large in the column, resulting in fewer bits being stored.

    c) Run Length Encoding (RLE): when values are repeated in a column, for example entries 1–2000 are all Q1, this can be stored as Q1: 2000. If entries 2001 to 3500 are all Q2, this can be stored as Q2: 1500, and so on.
    Using this technique the VertiPaq engine can compute results very quickly. Sorting is therefore very important for RLE-type compression. The engine does its own sorting, but you can potentially help it by sorting the data by the columns with the lowest cardinality first.
    Factors impacting compression are therefore as follows:
    1. Cardinality: the more unique values, the less compression can be done, hence increasing the size of the column in memory.
    2. Number of repeated values: the more there are, the better the RLE compression possible.
    3. The number of rows is important, but cardinality and repeated values matter more for compression.
    4. The data type: e.g. automatic date/time should be turned off, as it creates a larger entry than the date field alone. The column will be smaller to start with, and if removing the time component results in much lower cardinality, the column can be compressed much further using either RLE or hash encoding.

    Primary keys, such as integer user IDs, will most likely use value encoding.
    Numbers such as sales amounts that don’t have massive outliers will probably also be value encoded; otherwise dictionary encoding may be used.

    To determine the best sort order, a sample of the first x rows (I think it's a million) is taken, so it's important that those first rows are of good quality. You can therefore help the VertiPaq engine by ensuring your initial data is representative of the whole dataset; otherwise the engine may have to re-encode data, e.g. if it finds outliers and its value encoding no longer works (hence slowing down the process).

    We can look at the type of compression used, by using DAX Studio (Advanced > View Metrics)


    Firstly, you’ll see an overview of the tables. Each column in a table is encoded differently, so you’ll see the encoding type as ‘Many’.
    You need to drill down to get to the column level (or you could just select the column tab on the left-hand side).

    Now, let’s have a look at the FactOnlineSales table.

    The table is sorted by default by the column size. This is the size in Bytes that the column is consuming in memory.

    The first column, OnlineSalesKey, has 12,627,608 rows and a cardinality of 12,627,608; as the name suggests it is a unique key, so its cardinality (unique values) equals its row count.
    It is also the largest column, consuming 84 MB of memory and 31.77% of the database.
    It is encoded as VALUE type:

    Rows: 12,627,608
    Cardinality: 12,627,608
    Col Size: 84,156,128 (Data + Dictionary + Hier Size)
    Data: 33,645,560
    Dictionary: 128
    Hier Size: 50,510,440
    % Table: 33.07%

    Unique keys are generally required fields in most tables, so they're not usually somewhere you can save space by optimizing (unless you can reduce the size of the key somehow).

    By applying the same analysis you can determine which columns and tables are having the biggest impact on your model size and hence slowing your model refresh time.

  • How to Build a Survey Results Report with Power BI

    Power BI can be useful for displaying and storing survey results for your organization. In this example we build some data for the results of a fictitious Airline survey, using questions taken from a real online airport survey.


    The data was built in Excel with a sheet for questions and a sheet for answers. You can download the Excel file here.

    It’s easy to publish the report as a PDF, so you can share it with users who don’t have access to Power BI. Here is the airport survey report as PDF

    The questions are a simple list with a Question ID and section.


    The answers on the second sheet are formatted so that each row represents all the answers for each respondent.



    Steps for Building the Airline Survey Report.

    Step 1
    The data is imported into our Power BI Desktop report using Power Query.

    The only change to the question table is a simple column rename.


    The answer table is transformed so that each line of the table contains the respondent ID, the question, and the Value (the answer). This prepares the data so it can be joined to the question table and easily visualized.
    The applied steps are as follows:

    As you can see, the first step after the import is simply to promote the headers, so the first row becomes the column names.
    The second step changes the column types to the correct types.
    The key third step is to unpivot the data, so the table is transformed, as below, into a separate row for each respondent’s answer. This makes the report easy to build.
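
    For reference, here is a minimal sketch of what these applied steps could look like in M (the file path, sheet name, and the Respondent column name are placeholders for illustration):

    let
        // Load the answers sheet from the survey workbook (path is a placeholder)
        Source = Excel.Workbook(File.Contents("C:\Surveys\AirlineSurvey.xlsx"), null, true),
        Answers = Source{[Item = "Answers", Kind = "Sheet"]}[Data],
        // Promote the first row to column headers
        PromotedHeaders = Table.PromoteHeaders(Answers, [PromoteAllScalars = true]),
        // Set the respondent ID to a whole number type
        ChangedTypes = Table.TransformColumnTypes(PromotedHeaders, {{"Respondent", Int64.Type}}),
        // Unpivot the question columns into one row per respondent per answer
        Unpivoted = Table.UnpivotOtherColumns(ChangedTypes, {"Respondent"}, "Question", "Value")
    in
        Unpivoted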


    The report only requires one measure, which is just a unique count of the respondent ID

    Answer Count = DISTINCTCOUNT(Answers[Respondent])


    Basic Graphs can be added.

    And Likert Scale style graphs using a 100% Stacked bar graph visual.

    It’s a simple but useful report.



  • How to Build a KPI Report with Power BI


    The Power BI KPI demo report is a demonstration of how a typical KPI report can look. The report uses data from the fictitious Microsoft Contoso Retail Data Warehouse, which is stored in a SQL Server database. I’ve simply renamed it to rn retail for demo purposes.

    View the Demo Report on Youtube:



    The report is currently divided into 5 pages.

    1. The first page is a typical KPI report, leading with the financial KPIs, then breaking down by store type, and then some online KPIs. As the database is quite old (going back to 2009), it doesn’t have a lot of person-level information, but it does for online sales (hence online KPIs are included here). The KPIs are presented in comparison to the previous year’s performance, but targets would also typically be included.
      The month selector allows the user to go back to any point in time in the year (or years, if more data were included).



      2. The second page covers sales MTD, QTD, and YTD vs. the previous year. There are graphs showing the YoY breakdown by store type and a table allowing drill-down into individual store performance YoY. This interactive tool allows users to quickly pinpoint some of the reasons for higher or lower YoY performance. The measure calculations can be found lower down this page.



      3. The third page of the report goes into more detail on stores broken down by country and city and includes the sales per square feet calculation. Again this helps the user explore the data to identify best and worst-performing stores in more detail.


    4. The fourth page shows sales by product category and product, allowing you to drill down from the top category levels. There are then graphs for monthly and yearly performance.



      5. The fifth page focuses on activity. This can be developed further to show active customers by activity segment (e.g. new, active, reactivated, lapsed, etc.) once activity segmentation is added to the model.


      The Data Model:




      Time Intelligence DAX measures
      The DAX measures for MTD, QTD and YTD and comparisons to the previous year are below:
    //Month to Date Sales
    Sales MTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESMTD('Date'[Date]))

    //Quarter to Date Sales
    Sales QTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESQTD('Date'[Date]))

    //Year to Date Sales
    Sales YTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESYTD('Date'[Date]))

    //Last Year Month to Date Sales
    Sales LY MTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESMTD('Date'[Date])))

    //Last Year Quarter to Date Sales
    Sales LY QTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESQTD('Date'[Date])))

    //Last Year Year to Date Sales
    Sales LY YTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

    //Variance Month to Date vs. Last Year Sales
    Sales VAR MTD vs. Last Year = [Sales MTD]-[Sales LY MTD]

    //Variance Quarter to Date vs. Last Year Sales
    Sales VAR QTD vs. Last Year = [Sales QTD]-[Sales LY QTD]

    //Variance Year to Date vs. Last Year Sales
    Sales VAR YTD vs. Last Year = [Sales YTD]-[Sales LY YTD]

    //Variance % Month to Date vs. Last Year Sales
    Sales VAR% MTD vs. Last Year = ([Sales MTD]-[Sales LY MTD])/[Sales LY MTD]

    //Variance % Quarter to Date vs. Last Year Sales
    Sales VAR% QTD vs. Last Year = ([Sales QTD]-[Sales LY QTD])/[Sales LY QTD]

    //Variance % Year to Date vs. Last Year Sales
    Sales VAR% YTD vs. Last Year = ([Sales YTD]-[Sales LY YTD])/[Sales LY YTD]



  • How to Create a Power BI Top Customers Report

    Most businesses will have sales and marketing departments as well as top-level C-Suite/director levels. They are all going to be interested in a bunch of similar reports from organization to organization.

    One of the most important reports, that sales and marketing teams are going to be interested in is:
    Who are our best customers?

    Top customers often contribute a significant amount of revenue to any organization. The 80/20 rule (that 80% of revenue comes from 20% of customers) is quite common. Identifying top customers allows teams to segment their audience, and top customers can be given special treatment, such as dedicated account managers and special offers.

    In the model from the Contoso database we use here, the DimCustomer table is joined to the FactOnlineSales table, and the FactOnlineSales table is joined to the Date table. We use the FactOnlineSales table rather than the FactSales table because it includes a CustomerKey we can join to the DimCustomer table to pull out customer data at the person level (the FactSales table appears to include store-level data only).

    To start creating the report, we can drag in the SalesAmount field from the FactOnlineSales table and the FirstName, LastName, and CustomerKey from the DimCustomer table as well as DateFirstPurchased. These are some of the fields a sales team is going to be interested in and they will likely ask for more to be added as time goes by.

    Adding the Year field from the date table as a filter visual, with the date table already joined to the FactOnlineSales table by the unique date key, allows us to filter the sales amount by year (and we could add months, for example, as required).
    We filter the report at the person level by adding the CustomerType field in the Filters panel:

    You will notice here I just dragged in the SalesAmount field from the FactOnlineSales table, but that’s a bit lazy and we should create a measure for the field, like this:

    OnlineSales = SUM(FactOnlineSales[SalesAmount])

    The sales amount filter is useful, but teams are likely to be interested in several different sales calculations, which we can calculate using additional measures. For example, we might want to see the current year-to-date sales and the lifetime sales of a customer in the same report as follows:

    Calculating Lifetime Sales with DAX

    OnlineSales Lifetime = CALCULATE(SUM(FactOnlineSales[SalesAmount]), REMOVEFILTERS('Date'))

    Using REMOVEFILTERS('Date') removes all filters applied from the Date table, such as the year selected in the drop-down in this report, so the SalesAmount is calculated over the customer’s lifetime. This makes the report more interesting. The year filter still applies to the OnlineSales measure, but not to the OnlineSales Lifetime measure, as below:

    Comparing year-to-date vs. previous year-to-date is another common request.
    To calculate year-to-date sales amount, we use the CALCULATE() function again, but this time combined with the DATESYTD() function:

    Calculating Year-to-Date and Previous Year-to-Date Sales with DAX

    To calculate Year to date sales we can use the DATESYTD function with CALCULATE:

    OnlineSales YTD = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESYTD('Date'[Date]))

    and to calculate the OnlineSales YTD Prev Year, we can wrap the DATESYTD() function with the SAMEPERIODLASTYEAR() function as follows:

    OnlineSales Prev YTD = CALCULATE(SUM(FactOnlineSales[SalesAmount]), SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

    Adding these measures to our report table, we should now have something looking like this :

    Here we now have a top customer report for 2009; we can see which customers have brought in the highest revenue this year vs. last year to date, so we are comparing equal periods. We can also see customers’ lifetime value. In this example, Jon Yang is our best online customer, with $61K in online sales, which is equal to his lifetime sales. You will notice his first date of purchase is 22nd July 2001, which is a long time ago, yet the database has no record of those earlier transactions, since his 2009 YTD sales equal his lifetime sales. Presumably this data was not recorded historically, which is often the case with older databases. The data you work with is often not perfect, and we need to be aware of these specifics.

    Now we have this year’s YTD measure and the previous year’s YTD measure, it is easy for us to calculate the variance % (or difference), between the two years.

    Online Sales Var YTD% = ([OnlineSales YTD]-[OnlineSales Prev YTD])/[OnlineSales Prev YTD]

    We get an infinity value for the first customer, as they had no sales in the previous year. We can tidy that up using the DIVIDE() function, which automatically handles division by zero and blanks.

    Online Sales Var YTD% = DIVIDE(([OnlineSales YTD]-[OnlineSales Prev YTD]),[OnlineSales Prev YTD])


    Let’s now add the month as an additional filter. As you can see, the YTD and Prev YTD measures are recalculated to show the online sales amount up to the end of the month selected. That’s useful and pretty powerful.

    In a real-world scenario, sales teams are going to want the most up-to-date information, so we want to look to add previous days’ sales, week-to-date sales, and month-to-date sales. The issue here is that our data is old, so we need to do some juggling first to bring the Contoso sales data into the future to use for experimentation.
    The process of adding a future date is outlined here.

    DAX Previous Day Sales Calculation

    Once we have the future date in our FactOnlineSales table as in the link above and join it to the date table, our report is brought into the future and we can create a measure calculating yesterday’s sales amount:

    OnlineSales Yesterday = CALCULATE(SUM(FactOnlineSales[SalesAmount]), PREVIOUSDAY('Date'[Date]))

    As you can see I can now filter on the current year, which is 2023, and remove the month filter.
    I can now see the top customers for yesterday if I sort by the OnlineSalesYesterday field.
    This is now giving the sales teams up-to-date, daily reporting, so they can keep their finger on the pulse of the business:

    Now we’re using more ‘current’ data, we can now see we need Week-to-Date and Month-to-Date measures.

    Calculating Week-to-Date Sales can be found here
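
    In the meantime, here is a minimal sketch of a week-to-date measure, assuming a week starting on Monday and anchoring on today's date (the linked article covers this properly):

    OnlineSales WTD =
    // Week-to-date sales, assuming Monday is the first day of the week
    VAR _today = TODAY()
    VAR _weekStart = _today - WEEKDAY(_today, 2) + 1
    RETURN
        CALCULATE(
            SUM(FactOnlineSales[SalesAmount]),
            DATESBETWEEN('Date'[Date], _weekStart, _today)
        )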

  • How to find Max Date with List.Max() in Power Query to Create a Future Date

    In the Contoso Retail database, we have old historical data which only goes up until the end of 2009. For our demo reports, we’d like to use more up-to-date data, so we want to add a date field to the FactOnlineSales (and FactSales) tables, so we can mimic more current reports.

    To add what we’ll call a ‘future date’ field, we want to find out the number of days from the present day to the last date in the FactOnlineSales table, then we can recalculate all the old dates bringing them forward by this difference.

    Getting the Max Date in Power Query

    There doesn’t appear to be a direct M code function for max dates, but there is a List.Max() function which works with lists, so we need to create a list from the date field in the FactOnlineSales table. The steps are as follows:

    1. Create a list from the date field (right-click on the DateKey field and select ‘Add as New Query’).

    2. This will create a list of dates from the column – here I renamed it to FactOnlineSalesDates.

    3. We can now reference this list in the FactOnlineSales table using the List.Max() function.
    So we need to create a new custom column in the FactOnlineSales table as follows:
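
    For reference, the custom column formula can be as simple as the following (assuming the list query was named FactOnlineSalesDates, as above, and the column is called MaxDate):

    // Custom column "MaxDate": the largest date in the FactOnlineSalesDates list query
    List.Max(FactOnlineSalesDates)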

    We then just need to transform it into a Date type:

    That will give us a column containing the max date (the last date) in the FactOnlineSales table. The max date is 31/12/2009. We can now use that date in our calculations. Sure, we could have just looked it up, but if the data is dynamic it’s good to be able to calculate it, so we can be sure it is correct.

    4. Next we can add another custom column containing today’s date and transform it to a Date type as well. We use the Date.From() function, working on the DateTime.LocalNow() function:

    5. Next, add another custom column to calculate the difference in days between today’s date and the max date in the FactOnlineSales table, using the Duration.Days() function.
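
    A rough sketch of these two custom column formulas, assuming the columns are named TodayDate, MaxDate, and DateDiff (the names are just illustrative):

    // Custom column "TodayDate": today's date
    Date.From(DateTime.LocalNow())

    // Custom column "DateDiff": whole days between the last sales date and today
    Duration.Days([TodayDate] - [MaxDate])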

    This gives us the number we are looking for.

    Now all we need to do is add another custom column to transform the dates into the present day.
    We can call this ‘DateFuture’. And we can transform it to Date type.
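
    Assuming the DateKey and DateDiff columns above, the formula can look like this:

    // Custom column "DateFuture": each historical date shifted forward by the day difference
    Date.AddDays(Date.From([DateKey]), [DateDiff])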


    We’ve added 4 additional columns here to a big table. To limit the impact on performance, we should remove the 3 columns we don’t need, just leaving us with the DateFuture column.

    Now close & apply the data from Power Query to load it into the model.
    In the model view, we can now edit the relationship between the Date table and the FactOnlineSales table, so they are joined using the DateFuture key, which will bring all the data into the future, when we use filters from the Date table.

    If we now look at the FactOnlineSales table, we can see the latest date is today (at the time of writing).


    In practice, a data warehouse is updated overnight, so the latest complete day’s worth of data should be the day before; so we want to tweak our calculation.

    If we go back into Power Query, we can subtract a day from today’s date using the Date.AddDays() function, so this feeds into the future date calculation.
    Date.AddDays(Date.From(DateTime.LocalNow()), -1)


    Now we see the data runs up to yesterday.

    So, this is useful if we want to mimic a real-life scenario, where data is up until yesterday relative to today, but it will be confusing for financial reports if our future date keeps changing, so it makes sense to have a fixed date as well.

    So to fix the ‘today’s date’ field to a constant date, we can do the following, using the #date function:
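
    For example (the date here is just a placeholder; use whatever fixed reference date suits your reporting):

    // Custom column "TodayDate" pinned to a fixed date instead of DateTime.LocalNow()
    #date(2024, 12, 31)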



  • How to Build a Financial KPI Report with Power BI

    Developing a financial KPI report is probably the most important report you will create for a business. It will be used by management and senior management to monitor and gain insight into the performance of the business, and essentially to track whether or not they are on target for their KPIs (on which they may be bonused, or fired!).

    To demonstrate the development of a financial KPI report, we will use data from the Contoso Retail database, which is a fictitious database created by Microsoft.

    Instructions for installing the database on your local machine are here

    Once you have installed the database you can connect Power BI to it and start pulling in some tables.
    The first table we work with is the FactSales Table. This is the main table containing all the sales transactions for Contoso.

    You might wonder how to know which tables to pull in and this is all part of the developer’s job. With a bit of luck, you will have other developers who can guide you through how the database is organized, but sometimes you will be left to figure it all out for yourself and this can be a lengthy process in learning and understanding. Talking to subject matter experts on areas of the data will also be of great assistance to you as you develop your knowledge of how everything fits together.

    To import the FactSales table from the Contoso Retail database on your SQL Server installation, select Get Data > SQL Server.

    You will then need to enter the Server information. I’m using SQL Server developer edition installed on my local machine. For me, I just enter ‘localhost’ as the server name. If you have a server on the network, ask your IT guy for the server name. Also, you can enter the database name here ‘ContosoRetailDW’. Persevere, you are nearly there.

    Once you have connected, you will see a list of all the tables and views in the Contoso Database.
    Select the tables you want to import and click ‘Transform’ to load the table into Power Query or ‘Load’, to just load the table directly into the model without bringing up the Power Query window.

    It’s best practice to transform the data first before working on it, so open the data in Power Query. If you’re not there already, hit the Transform data button from the Home ribbon.

    The FactSales table is in pretty clean shape, but if you need to transform dates into date format, add additional columns, pivot the table, or perform many other operations, you can do it here in Power Query.
    If the tables were loaded into dataflows on the Power BI service, you can work with Power Query in the service, so the table is transformed for anyone using the dataflow and not just in the report you have built.

    Once you’ve finished any transformation you wanted, you can close the Power Query window and return to the main view.
    In addition to the FactSales table, as we want to create measures using Time Intelligence in this demonstration, we also want to use a Date table, to join to our FactSales table.
    A date table is a key table in most Power BI reports and it’s good to include it as standard in your Power BI template.
    Instructions for creating a Date table with DAX can be found here.

    Here’s what a basic date table looks like. It can have a lot of columns, including a separate set of columns related to the financial year.

    Once you have your Date table available, we can join the Date table to the FactSales table using the date field as the Key in each. Open up the Modelling view.

    You can then join the tables by selecting the Date field on the date table and dragging it to the FactSales DateKey. Power BI will automatically assign the join type. In this case, One (on the Date table) to Many (on the FactSales table), because Date is a unique field on the Date table, but non-unique on the FactSales table.

    Now these basic elements of the model are set up, we can start creating the measures to use in the Financial Report, starting with the most important to any business, the sales or turnover numbers.

    Calculating the Sales Amount for YTD, QTD, MTD

    1. The first measure we create is Sales Year to Date. We use the CALCULATE function to filter the Sales Amount by the DATESYTD function. The DATESYTD function takes the date from the date table.

    Sales YTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESYTD('Date'[Date]))

    2. Next, we create the Quarter to Date measure.
    Sales QTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESQTD('Date'[Date]))

    3. Then we create a Month to Date measure in a similar format, but with the DATESMTD function.

    Sales MTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESMTD('Date'[Date]))

    We can then use the measures in cards in the Power BI report.

    Calculating Sales Measures for the Same Period in the Previous Year (YTD, QTD, MTD)


    Once we have measures for the currently selected period, we typically want to display performance against the previous year, to see how the business is growing year on year (YoY).
    Here we can use the following DAX formulas:

    4. For the same period last year – year-to-date, we wrap the date filter DATESYTD, with a second filter SAMEPERIODLASTYEAR(), to get the YTD revenue for the same period of the previous year.

    Revenue LY YTD =
    CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

    5. For the same quarter to date last year, we can use the following:

    Revenue LY QTD =
    CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESQTD('Date'[Date])))

    6. And lastly, for the same period last year month to date, we can do this:
    Revenue LY MTD =
    CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESMTD('Date'[Date])))

    We can test that the calculations work in Power BI, using filters on the Date table and the six measures we created, shown on the cards below.

    Calculating Sales Variance Year on Year YTD, QTD, MTD

    We can now create some simple variance measures, for calculating the differences from the period selected versus the previous year.

    The following 3 measures just take the difference between the measures taken above for the current year vs. the previous year.

    Revenue VAR YTD vs. Last Year = [Sales YTD]-[Revenue LY YTD]
    Revenue VAR QTD vs. Last Year = [Sales QTD]-[Revenue LY QTD]
    Revenue VAR MTD vs. Last Year = [Sales MTD]-[Revenue LY MTD]

    We can now display the variances below the current and previous years’ calculations.

    Now that we have these initial measures, we can develop additional measures for other KPIs, such as Orders, Quantity, and Average Order Value (AOV).

    The first measure created is for Orders YTD. Here we use a similar time intelligence formula to the one for sales, in the format:
    Orders YTD = CALCULATE(DISTINCTCOUNT(FactSales[SalesKey]), DATESYTD('Date'[Date]))

    And from there we can create the Average Order Value (AOV) Year to Date (YTD) measure of:
    AOV YTD = [Sales YTD]/[Orders YTD]

    The next measure to add is quantity. We can create the actual quantity:
    Quantity = SUM(FactSales[SalesQuantity])
    and the YTD quantity:
    Quantity YTD = CALCULATE(SUM(FactSales[SalesQuantity]), DATESYTD('Date'[Date]))
    As we can see from the report, the quantity YTD for 2008 is 16.5M vs. Orders of 1.1M.

    Once we have the measures we can start building the final KPI report, using only the most relevant measures in a summary report. It often helps to have a more basic summary report in addition to a more detailed report for reference, as different report audiences will require different levels of detail. In a meeting of twenty people, delivering the top-level figures will often be sufficient, to get the key message across. The report below shows just the top-level KPI at the top of each measure set into a separate card.
    Below the measures are displayed as trends over time, using bar charts and on the first graph we compare sales against the previous years.

    From our exploratory analysis of the Contoso database, we learned that the StoreType in the DimStore table could be used to break the data down into four main sales channels. We can import the DimStore table with Power Query, just as we did for the FactSales table.
    In the Model view, we can then join the DimStore table to the FactSales table using the StoreKey.

    Once we have the model set up in this way we can start to drill down into the top-line KPI and break them down by Catalog, Online, Reseller, and Store channels. Here we simply add a slicer for Store Type to our report page, but of course, many options are available. It would be good to view sales by StoreType side by side for example.

    That’s it for now. These are the structural elements of getting started building a financial KPI report. The report can branch out into many different areas. It’s easy to get lost; the best practice is to listen to the person who requested the report and address all their requirements as well as possible. When the report starts being used, it will inevitably require further development as more questions are raised that people want answered.

  • How to Build a Date Table in Power BI

    Probably the most essential table you need in your Power BI model is a date or calendar table.
    You join your date table to any other table containing dates, such as your main fact sales table.

    The advantage of using a calendar table is that it contains associated fields such as Year, Month, Start of Month, Start of Quarter, and relevant Financial Periods that will be bespoke to the business.

    The date table should be stored in a dataflow so everyone can use it. Here we create one using DAX, which could then be exported and imported as a table in the Power BI service. You could also create a date table with Power Query directly. Creating one in DAX gives us practice with DAX date functions.


    The steps to create a date table are as follows:
    1. Open Table View and select ‘New Table’
    2. Start by creating the first column, which will be our date key. We can use the CALENDAR() function to auto-create a list of dates in the table. The date range should run from the minimum date relevant to your data up to any dates needed in the future. You could use TODAY() for the end date to generate dates up to today, but sometimes, for example with forecasting, it can be useful to have future dates.
    Of course, a big table may slow performance, so just bear that in mind.

    Date = 
    VAR DateStart = DATE(2007, 01, 01)
    VAR DateEnd = DATE(2030, 12, 31)
    RETURN
    CALENDAR(DateStart, DateEnd)

    3. Once you have your first date column set up, which will be the key for joining your other tables, you can then start adding any other useful columns.

    Here are some for starters:

    Year = YEAR('Date'[Date])
    Month = MONTH('Date'[Date])
    Day = DAY('Date'[Date])
    Quarter = QUARTER('Date'[Date])
    StartOfYear = STARTOFYEAR('Date'[Date])
    StartOfQuarter = STARTOFQUARTER('Date'[Date])
    StartOfMonth = STARTOFMONTH('Date'[Date])
    DayOfWeek = WEEKDAY('Date'[Date], 2)


    Be careful with the day-of-week number. It’s always best to double-check against another calendar.
    By default, WEEKDAY() returns 1 for Sunday; to make Monday day 1 (as used here), use WEEKDAY('Date'[Date], 2).

    Once you have the day number setup, which can be useful in itself, you can then create a column for the day name:

    DayName = 
    
    SWITCH('Date'[DayOfWeek],
    1, "Mon",
    2, "Tue",
    3, "Wed",
    4, "Thu",
    5, "Fri",
    6, "Sat",
    7, "Sun")
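
    As an example of the bespoke financial period columns mentioned earlier, a calculated column along these lines can work (this sketch assumes a financial year starting on 1 July; adjust to your business):

    FinancialYear =
    // Label such as "FY2009/10" for a financial year running July to June
    VAR _startYear = IF(MONTH('Date'[Date]) >= 7, YEAR('Date'[Date]), YEAR('Date'[Date]) - 1)
    RETURN
    "FY" & _startYear & "/" & RIGHT(_startYear + 1, 2)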

    Once we’ve finished creating our date table, we can join it within the model to the relevant tables, such as the Contoso FactSales table, which has a date key. Note that the Date field in the Date table is unique.


  • How to create Customer Age Bands in Power BI using DAX

    An interesting dimension to report on for demographics is the age of customers. Understanding what types of customers buy what types of products helps build customer insight that can be used to target marketing offers more effectively.

    In the DimCustomer table of the Contoso Retail Sales DW, we have a field called BirthDate, but of course this is an individual date for each customer. What we want to do is group these ages into buckets or age bands, so we can create a more informative report. We can then use the grouping in graphs or as filters, as in the graph below:


    To create the categorization, I created a new calculated column in the DimCustomer table with the DAX code following the steps outlined below. Note the exact categories you use are up to you or the business. There are no definitive rules on what age bands you should follow, but by studying other reports, you will get the gist of the most common methods.

    The first best-practice step is to create variables.
    The _dob variable is assigned the [BirthDate] field.
    The _now variable is assigned today’s date, using the TODAY() function.
    The _agedays variable is then assigned the difference in days between the date of birth and today’s date, which is essentially the person’s age in days.
    The _ageyears variable divides the age in days by 365 to get the age in years.
    The section after the RETURN statement then returns the age band category, based on the nested IF logic.
    It’s important to put the conditions in the correct order, starting from the lowest band upwards. Note that there are some blank birth dates, which is why I have used IF(_ageyears = BLANK(), "None", ...).
    I only knew there were some blank dates by visually inspecting the BirthDate column. If I hadn’t handled this, the blanks would have been picked up as "<18", so it’s important to sense-check as much as possible.

    DAX for Calculated Column (note Age Range is the column name):
    Age Range =
    VAR _dob = DimCustomer[BirthDate]
    VAR _now = TODAY()
    VAR _agedays = DATEDIFF(_dob, _now, DAY)
    VAR _ageyears = _agedays/365

    RETURN
    IF(_ageyears = BLANK(), "None",
    IF(_ageyears<18, "<18",
    IF(_ageyears<30, "18 to 29",
    IF(_ageyears<40, "30 to 39",
    IF(_ageyears<50, "40 to 49",
    IF(_ageyears<60, "50 to 59",
    IF(_ageyears<70, "60 to 69",
    IF(_ageyears<80, "70 to 79",
    IF(_ageyears>=80, "80 plus")))))))))


  • How to Add a last refreshed date in Power BI

    It’s very useful for users, administrators, and developers to have the last refresh date added to Power BI reports, as this helps confirm how up to date the data is. Sometimes, when a Power BI file is refreshed, you will notice the data doesn’t actually refresh and there is no error, which can be confusing.

    This is how to add the refresh date:

    1. Open Power Query and add a blank query.

    2. In the formula bar of the blank query, enter = DateTime.LocalNow(). This function gets the current date and time. Click the Source step on the right-hand side to apply the function (a sketch of the finished query’s M code is shown after these steps).

    3. You should now have a Transform tab available for converting the blank query to a table. Select To Table.

    4. You can then change the data type to Date/Time.

    5. Next, rename the query and the column to something meaningful, like Last Refresh Date.

    6. Then Close & Apply to return to the report view.

    7. Add a card to your page and add the field from the Last Refresh Date table you just created. Adjust the size and you should get something like this.

    8. The date and time of the last refresh should now update each time you refresh the report (as the Last Refresh Date table is refreshed along with it).
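
    For reference, the finished query’s M code will look roughly like this (a sketch; the step names generated by the UI may differ):

    let
        // Capture the date and time at the moment of refresh
        Source = DateTime.LocalNow(),
        // Convert the single value into a one-column, one-row table
        ConvertedToTable = #table(1, {{Source}}),
        // Rename the default column to something meaningful
        Renamed = Table.RenameColumns(ConvertedToTable, {{"Column1", "Last Refresh Date"}}),
        // Set the column type to Date/Time
        ChangedType = Table.TransformColumnTypes(Renamed, {{"Last Refresh Date", type datetime}})
    in
        ChangedType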