Category: Power BI

  • Testing Dual mode in Power BI and BigQuery



In this test, we will look at the performance impact of using dual mode on a Power BI dimension table with a cloud database and a shared dataset, so we avoid the resource limitations experienced in the first dual mode test on a laptop.

    Test 1
In the first test, we will use a single dimension table and a single fact table set to direct query mode.

    I will use a public dataset from Google BigQuery.
Get Data > Google BigQuery. I enter the project ID and am prompted for my Google login.

    BigQuery connection from Power BI

    I can then navigate to the dataset I want to work with.

Navigate to BigQuery dataset
Table Selection

    We choose a table from the New York taxi trips dataset, which has over 16 million rows.

    SQL row count.

    This is configured with a Direct Query connection.

    Selecting table import mode


    I will then add some simple measures for testing as follows:

    Measures
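
The measures are just simple aggregations along these lines (a rough sketch; the fare_amount, tip_amount, and trip_distance column names are assumptions based on the public tlc_green_trips schema):

// Simple test measures over the direct query fact table
Trip Count = COUNTROWS(tlc_green_trips_2016)
Total Fare = SUM(tlc_green_trips_2016[fare_amount])
Total Tips = SUM(tlc_green_trips_2016[tip_amount])
Avg Trip Distance = AVERAGE(tlc_green_trips_2016[trip_distance])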

Next, we’ll add the taxi_zone_geom table as a dimension, initially in Direct Query mode. It has 263 rows.

    I find there are duplicates in my dimension, which is a pain, so I need a new view in BigQuery to fix this.
    I will create a simple view as follows (replace myproject_id with your own):

CREATE OR REPLACE VIEW `customer-myprojectid.pbitesting.DimZone` AS
SELECT DISTINCT
  Zone_id,
  Zone_name
FROM
  `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom`

    I can then connect Power BI to BigQuery using the different ProjectID and select my de-duped dimension table.

Table Selection DimZone


    DimZone Distinct Values


    Next, I change the zone_id and pickup_location_id fields to whole numbers to save space.

I will then connect my DimZone table to my tlc_green_trips_2016 table using zone_id and pickup_location_id.

    Model relationships

I then change my DimZone dimension table to dual mode.

    Selecting Dual Mode



Then I publish my dataset to the Power BI service so we can connect to it from a new report.

    Get Data > Power BI semantic models >

    Selecting the semantic dataset

    When I try to connect, I get the following BigQuery authentication error:

    DataSource.Error: ADBC: The service bigquery has thrown an exception. HttpStatusCode is Unauthorized. Request is missing the required authentication credentials. Expected OAuth 2 access token, login cookie, or other valid authentication credential

So I need to add the credentials for accessing BigQuery in the dataset settings, and then sign in with my Google account.

    Configuring dataset credentials

    My settings then look like this:

    dataset credentials for BigQuery

    I can then return to my Power BI Desktop report and connect to my semantic model.
    I can now view the tables in my shared dataset.

    Dataset tables



I can then build a simple matrix with my measures from my tlc_green_trips_2016 fact table and the Zone_name from my dimension table, as well as adding a Zone_name slicer to my model.
    It takes about 40 seconds to load.

    Matrix for taxi data

Initial testing results are interesting. Even though caching is switched off in the dataset, we still see caching when testing with the Performance Analyzer. Apparently, this is due to session-level caching, which can’t be switched off.

Checking the report query settings in Power BI

We didn’t see the session-level caching in the previous test, so we will have to make different slicer selections each time for the dual mode vs. import mode test. Also, we can’t do the full table test, as this is also affected by session-level caching.

In the first test, we will use one dimension table and a fact table in direct query mode. We will test the dimension table in dual mode and then in import mode, and see the impact on performance.

    Dual Mode Test 1 Results: 1 Dimension Table and 1 Fact Table in Direct Query Mode

    Dual mode results test 1

After excluding outliers, which look more like anomalies caused by resource limitations than part of the norm, we can see that import mode is trending faster than dual mode, and the T-test shows a significant difference between the 2 sets of observations.



    Test 2

    In the next test, we will test dual mode and 2 different fact tables. One is in import mode and the other is in direct query mode. This is where we should see the performance improvement of using dual mode.
    We start by adding an additional fact table to the model, but this time in import mode.

    Adding another table

The tlc_green_trips_2017 table has 12.7M rows.
Then I rebuild the model: the 2016 table is in direct query mode, the DimZone table is in dual mode, and the 2017 table is in import mode. I then add some additional measures for each table to use in the matrix test.

    Model 2 with dual mode


My new test matrix now has measures from both fact tables: 2016 (direct query) and 2017 (import).

    New Matrix design

Test Results:
The trend appears to favor import mode again, which was kind of disappointing, as I wanted to see a similar improvement to the one I saw using SQL Server locally.

Running a Test using Python in Power BI
We can use the Python visual to create some Python graphs and a T-test.
I use AI to generate the code for me and add it into the Python script editor.
I then add the Zone, mode, and load time fields to the values well of the Visualizations pane.
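
A rough sketch of the kind of script this produces is below. Power BI passes the fields in the values well to the script as a pandas DataFrame named dataset; the 'Mode' and 'Load Time' column names are assumptions and need to match the actual field names.

import matplotlib.pyplot as plt
from scipy import stats

# Power BI supplies the fields from the values well as the 'dataset' DataFrame
df = dataset.dropna(subset=['Mode', 'Load Time'])

# Split the load-time observations by storage mode (labels are assumptions)
dual_times = df.loc[df['Mode'] == 'Dual', 'Load Time']
import_times = df.loc[df['Mode'] == 'Import', 'Load Time']

# Two-sample t-test comparing the two sets of observations
t_stat, p_value = stats.ttest_ind(dual_times, import_times, equal_var=False)

# Box plot of load time by mode, with the t-test result in the title
ax = df.boxplot(column='Load Time', by='Mode')
plt.suptitle('')
ax.set_title(f'Load time by mode (t = {t_stat:.2f}, p = {p_value:.4f})')
ax.set_ylabel('Load time (seconds)')
plt.show()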

The results are shown below, above the box plot, with a p-value of 0.0001, or a 1 in 10,000 chance of this result occurring by chance. It seems that import mode is faster again. Perhaps dual mode doesn’t work as well with BigQuery; it seems testing is the only way of finding out. Models may be faster in some circumstances than in others.


  • Testing Dual Mode Impact using the performance analyzer


Dual mode can potentially speed up queries in certain situations by utilizing the cache.
    Microsoft’s description is here


In the following example, we start with the V_FactOnlineSales view. It is a large table with over 12M rows and 19 columns. We could import it into our dataset, but that would increase the refresh time of our dataset and use valuable memory in our model that we may want to conserve.


    To get an idea of the size of the table in our model, we can use the Vertipaq analyzer in DAX Studio.
    I have imported 3 tables into our data model so far.


    1. V_FactOnlineSales in import mode
    2. V_DimProduct in import mode
3. FactOnlineSales table in direct query mode (which is the same size as V_FactOnlineSales)

The total size of our dataset is now 233MB, and we can see that the imported V_FactOnlineSales table is taking up nearly all that space. When we have added another 50 tables to our model, we might decide to switch the V_FactOnlineSales table to direct query mode to reduce refresh time and free up memory. You will also notice that the FactOnlineSales table in direct query mode uses no memory.

    Vertipaq analyzer dataset summary
Vertipaq analyzer table sizes


After deleting the imported V_FactOnlineSales table, we can add it again, but in direct query mode, so it does not take up any space in our dataset. We have a one-to-many relationship between V_DimProduct and V_FactOnlineSales, with a single cross-filter direction, so only V_DimProduct can filter V_FactOnlineSales and not the other way around. They are joined by the ProductKey.

    one to many relationship between dimension and fact table.

    We’ll start by creating some simple measures from the FactOnlineSales table as follows:

    Sales = SUM(V_FactOnlineSales[SalesAmount])
    Cost = SUM(V_FactOnlineSales[TotalCost])
    Profit = [Sales]-[Cost]
    Profit Margin = DIVIDE([Profit], [Sales], 0)

We can organise them in the home table of V_FactOnlineSales and give them a display folder to keep them together.

We can then add our measures to a matrix visual, along with a slicer on product name, using the measures and the product name field from the DimProduct table. We can then use the performance analyzer and refresh the visuals. It takes about 5 seconds for the matrix to load. If I clear the query results and refresh the performance analyzer, I get a nice benchmark of about 5 seconds to work with. The performance analyzer is designed not to cache the queries, so this seems reliable to work with.

    Power BI matrix showing product sales and profit.
    Performance analyzer results.

    If I select a single product from the slicer, it takes about 1 second to load the matrix.

Performance analyzer results sliced

    Now, if I remove a product from my slicer, it takes longer (about 6 seconds, consistently).

Performance analyzer results with one product removed

    Next, I could move on to testing the DimProduct table in dual mode. Note I had to recreate the table in Direct Query mode and switch it to dual mode, as I wasn’t allowed to switch from import mode to dual mode.

    A comparison of the tests between the two modes is in the table below.
I found I had quite consistent results, although when the full table was selected, I ran into performance problems on my laptop. This was more of an issue in dual mode, which uses more memory and some additional CPU (it creates an import-mode copy of the table in memory to execute the query). Therefore, I will need to do a cloud-based test to get a more reliable picture; in this test, I’m using Power BI locally with SQL Server on an aging laptop.

The most obvious difference I found between the 2 modes was for individual selections in the slicer, with dual mode performing at over twice the speed of import mode, which is consistent with what should be expected from dual mode.

    Matrix loading speed using the performance analyzer

Product Slicer Selection | DimProduct: Import, FactOnlineSales: Direct Query | DimProduct: Dual Mode, FactOnlineSales: Direct Query
Select All | 5-6 seconds | 6-8 seconds (CPU maxing out)
One product selected | 1.5 seconds | 0.5 seconds
Exclude one product | 6 seconds | 1.5 seconds

    To summarize, performance improvements were apparent for slicing with a dimension using dual mode, but a cloud-based test is required, in particular, to see the speed of the full unfiltered matrix refresh.

  • Power BI Report Builder Report Page formatting basics


Many people use Power BI but never try Report Builder, yet it is a handy tool for certain projects. If you have large volumes of detailed data that would take up too much space in your data model, or you find that direct queries are too slow, paginated reports can provide an alternative.

    In this example, we’ll grab some data from Kaggle. I want a big customer file that I can work with. In real life, you’re most likely to be importing this from a Data Warehouse, but the only difference here is the connection and import; all the methods of work will be the same once you’ve imported the data into your dataset.
    Here is the file I will use, entitled Bank Customer Segmentation 1M+ rows.
    https://www.kaggle.com/datasets/shivamb/bank-customer-segmentation

The file we have is from a fictitious bank that someone has randomly generated data for.
On inspection in Excel, the file is 1,048,569 rows.

    report builder Transactions example


    You can download the report builder for free from here:

    https://www.microsoft.com/en-us/download/details.aspx?id=10594

    Open the report builder after you’ve installed it and select a blank report to get started:

    report builder creation

Now the first thing we want to do is get our data, so right-click on the Datasets folder and select ‘Get Data’, and you can log in to your Power BI account here. If you don’t have one, there is a free trial. I’ll let you figure that bit out.

Once you’ve logged in, you’ll get the data source screen where you can select your data source. There is a whole host of data connectors to choose from, including data warehouses such as Google BigQuery, SQL Server, and Snowflake, as well as other tools such as Google Analytics. In this case, because I’m lazy, I will just use the CSV file that I downloaded, but it doesn’t matter for learning Report Builder.

    report builder data source

    On the next screen, you need to enter your data connection information, but in this example, there is the option of uploading a file at the top below connection settings.

The main difference here is that with a connection to a data warehouse or similar source, once you publish your report to the Power BI service, the report will refresh with the latest data when the report is reloaded, whereas I will need to upload my file again.

    report builder data source connection

    Once my file is uploaded, I get a preview of the data, and I can change a few things, like the delimiter, which is set to comma-delimited by default.

    importing data


Next, I click on Transform Data, which takes me into a scaled-down version of Power Query compared to the version included with Power BI Desktop.

    report builder power query



    Here you can do things like add custom columns and format your data.
    Click Create to import your dataset from Power Query into the Report Builder design window.

    Your first dataset


    So let’s start by building a simple paginated report. I will use a simple table: Insert a table, and then I will drag my fields into the table.

    report builder adding a table

    Design and formatting the paginated report

    First, we need to turn on the properties window (View > Properties), where we can easily edit things like fonts and background colours.

I got our mutual friend to create a logo for me and supply the background colours in RGB format.

For the logo, I will add it to the header of the report (Insert > Header). I add a rectangle and move the image into the rectangle, so I can control its properties. The design is now coming to life (Roar).

    Report builder adding a header

    I can preview the report to test how it works by selecting Home > Run:
    It takes a while to load as it loads the whole dataset, not page by page.

Below is a screenshot of the report in preview mode. There are a million rows, so it breaks the data across pages by default: it paginates the report. The logo in the header displays across pages, but the table headers do not, so I need to fix that.

    Report builder preview report

    Fixing Row Headers in place with Power BI Report Builder

    To fix the row headers in place, we need to do the following:
    Select the table (Tablix) and select advanced mode in the bottom right corner. The row and column groups will display at the bottom of the screen.

    manage rows in report builder


    In the row groups box (bottom left), I can control the positioning of the rows during pagination and scrolling.
    I set the following for the Title row and the headers row:

FixedData = True. This will fix the row in position when I scroll.
KeepWithGroup = After. This stops the row from breaking away from the row below.
RepeatOnNewPage = True. This repeats the row on each page.

    I set these on both the title row and the headers row (click on static for each, and the properties will show on the right). To turn on properties, click View> Properties from the main ribbon.

    Now, when I scroll, the header and title are fixed at the top of the page and also stay in place when I navigate pages.

    fixing the top row

    Now we have a basic report working, let’s publish it to the service. Remember, if we had a connection to a data warehouse table, the report would grab the latest data for us, but for now, we’re lazy and just using a static file that won’t change.

To do so, select File > Publish and select the workspace you want to publish to:

    Once I’ve published my report, I can view it on the service and give others access:
The report takes a while to load the 1M-plus rows. Once it’s loaded, you can see some of the features available, such as export to Excel, which is a popular option. You can also set up a subscription so the report will run on a schedule and be emailed to you. This can be a useful option for some reports.

    exporting to excel
    subscribing to a report
  • Testing Power BI DAX Measures Speed with DAX Studio

A measure goes through the following stages: DAX expression > DAX query > Vertipaq query.

    Test 1: Testing the Impact of USERELATIONSHIP() in a DAX Measure

Does using USERELATIONSHIP() with an inactive relationship slow down a DAX measure? I’m not sure, so I thought I would test it out.

To test this, I will create a measure using the existing relationship between my date table and my FactOnlineSales table, and then create a copy of the date table joined to the FactOnlineSales table in the same way, but with the relationship set to inactive.

I use DAX Studio to create 2 measures for calculating total sales, and then I throw in some filters for year and continent as follows:

    DEFINE
        MEASURE '_Measures'[test1] =
            CALCULATE (
                SUMX (
                    'FactOnlineSales',[Sales Amount]
                ),
                
                'DimDate'[CalendarYear] = 2009,
                'DimGeography'[ContinentName] = "Europe"
                ) 
    
        MEASURE '_Measures'[test2] =
            CALCULATE (
                SUMX (
                    'FactOnlineSales',[Sales Amount]
                ),
                'DimDate (2)'[CalendarYear] = 2009,
              	'DimGeography'[ContinentName] = "Europe",
                USERELATIONSHIP('DimDate (2)'[Datekey], FactOnlineSales[DateKey])
            )
    EVALUATE
    ROW (
        "test", [test2] 
    )

Below is a screenshot of my DAX Studio settings. As you can see, I’ve got the ‘Clear cache on Run’ button selected. This will help ensure each test is equal. I also have the ‘Server Timings’ button selected. Note: be careful using ‘Clear cache on Run’ on a connection to a production model, as it will clear the cache (hence slowing down reports).

For the original sales measure that has an active relationship to the date table, I get a total time of 2,197 ms, broken down into 624 ms for the FE (formula engine) and 1,573 ms for the SE (storage engine, i.e. the Vertipaq engine). The FE turns the DAX expression into a DAX query and creates a query plan that it sends to the SE to retrieve the results of the query. The FE also caches the results of queries (this is why I have turned on clearing the cache when the DAX query is run; otherwise, subsequent queries would return results very quickly).

    Comparing the 2 measures
Here we run the query on each measure 5 times to get an average, and as you can see, there was very little difference in performance. Indeed, the first 3 tests on the measure with the active relationship took longer (but this may have been because my laptop was busy doing something). My results are below. I think they are pretty inconclusive, but they are interesting nonetheless, and the method would be interesting to test with a larger model.


In the Server Timings you can view the Vertipaq query that is executed by Analysis Services:

    Each step of the execution is below:
The first step selects the data from FactOnlineSales with the 2 filters, returning 1.4M rows of data (133 KB).
The second step does the sum of the sales amount and returns one row. The third step returns the start and end time, as well as other information (if you mouse over the query).

The Par. field stands for Parallelism, which represents the number of parallel threads used by the storage engine to retrieve the result at each step.

  • Analysing the Power BI Data Model to Improve Refresh Rates with DAX Studio


    I’ll start with some references:

    Definitive Guide to DAX Book by Marco Russo and Alberto Ferrari (the definitive experts).
    The Guy in the Cube video on model refresh slow:
    Chat GPT (I don’t know who that is!)
My own testing and experience, and a demo Power BI model using data from the Contoso Retail data warehouse

    First things: Connecting DAX Studio to your data model in the Power BI service
    1. Get the XMLA entry point URL from Power BI by selecting workspace settings > License info:
    You will need to log in with your Power BI login to connect.


    2. You can connect to your model in the service with DAX studio as follows (you may need to download and install it first). You can open it either independently or using the external tools within Power BI.



    File and Model Size
    Typical models use data imported into physical storage from a data warehouse and potentially other sources, such as Google Analytics. The Vertipaq engine stores the model in virtual memory. My Contoso Retail Power BI file is 193 MB (that’s not very big). It is compressed to reduce storage requirements.

    Model Size in DAX Studio (Advanced > View Metrics > Summary)
    When I look at the model summary from the advanced Metrics in DAX studio, the model loaded into virtual memory by the Vertipaq engine is 253 MB. Also of interest are the 37 tables and 366 columns. My model only has 8 tables and 1 calculated table, so it will be interesting later to understand what these other tables are.



    So why is this different?

    In-memory Compression: Power BI compresses data in the .pbix file to reduce storage, so the file size on disk is often smaller than its in-memory model size. However, once loaded, Power BI expands it for efficient query performance. There is still compression (next section), it’s just not as much as in the stored file.

    Metadata Overhead: The model size shown in DAX Studio includes metadata, indexes, dictionaries, and structures needed for DAX calculations and relationships that Power BI uses while executing queries, which aren’t directly reflected in the file’s storage size.

    Cache and Temp Data: DAX Studio may include caches or other temporary data generated during analysis, which increases the apparent size of the model in memory.

    Unused Columns or Tables: Any tables or columns loaded but not used may also contribute to the model’s size in DAX Studio, while Power BI might not fully load them in the saved file.

    Note: According to the model settings, the model is not using caching (which is off by default), so I can rule out one of the four possibilities.

    Power BI License requirements based on model size

    The Power BI Pro license allows for models up to 1GB in size, 10 GB of native storage, and 8 refreshes a day
    The Power BI Premium license allows for models up to 100GB in size and 100 TB of native storage and up to 48 refreshes a day.

    Note: Beyond just loading the raw model, Power BI needs additional memory overhead for processing, caching, and temporary storage during data refreshes and calculations. This means that even if your model fits within the 1 GB limit, you’re likely to experience slow performance or even errors during complex operations, especially if near that cap.

    Semantic Model Refresh Speed (Vertipaq engine: imported tables)

We can easily access the model refresh times in the Power BI workspace (select ‘Refresh history’ from the three-dots menu next to the model entry). This gives us our total model refresh benchmarks, which we can average (removing outliers at known busy times as required).

    So what are the stages of the model refresh and which ones can we troubleshoot and optimize?

1. Data import time: this one we can measure. Adam from the Guy in the Cube video gives us a simple suggestion for benchmarking our tables’ load times: simply run a SELECT * on each table in the data warehouse and note the time (repeat the process at different times of the day, perhaps removing outliers at busy times, and average). I guess there is also network latency to consider with large volumes of data, but let’s just do: total refresh time – table SELECT * time.

Here is an example from my Contoso data warehouse tables:
1. DimCurrency: Time: 0, Rows: 28
2. DimCustomer: Time: 0, Rows: 18,869
3. DimDate: Time: 0, Rows: 648
4. DimGeography: Time: 0, Rows: 274
5. DimProduct: Time: 0, Rows: 2,467
6. DimProductCategory: Time: 0, Rows: 8
7. DimProductSubcategory: Time: 0, Rows: 48
8. DimStore: Time: 0, Rows: 306
9. FactOnlineSales: Time: 2 mins 49 secs, Rows: 12.6M

As you can see, only the FactOnlineSales table took any real time to SELECT.
But then you might say: hey, why are you using tables in your models and not views? If a new column appears, the model refresh will fail.
You’re right, in which case you can run the tests on your views.

Now, I don’t have a gateway set up to connect to my laptop today, so I need to set up something in the cloud to perform the test. But let’s say the model refreshed in 4 minutes and your SELECT * benchmarks totaled 2 minutes and 49 seconds; then you can say that for 1 minute and 11 seconds the Power BI service is doing something else.

So what is happening once the data has been imported? Well, that’s when we need to get into the Vertipaq engine. Note that the engine is proprietary and so has secrets that we will never learn, but what we do know about the steps it performs is as follows:

Vertipaq engine steps
1. Loads the tables into columns
2. Checks each column for the best sort order
3. Applies compression to create the smallest, fastest model
4. Loads the model into memory

    Note: the Vertipaq engine only deals with imported data. Direct Queries are processed by the Direct Query engine.

    DAX formula engine
1. Creates calculated columns and tables (in memory), which are not compressed as well as columns processed by the Vertipaq engine
    2. Creates relationships between columns

    So we need to learn more about what these two engines do.

    The Vertipaq Engine
    1. Stores imported data into a columnar database (for faster data retrieval).

2. Sorts the columns by the best order for compression techniques (such as RLE)

3. Compresses the columns using various (proprietary) techniques, such as:

a) Value encoding: e.g. deducting a constant value from each column entry to create a value that needs fewer bits and hence less space.

b) Hash encoding (dictionary encoding): creating a table, similar to a normalization technique, where an ID and the value are stored in a dictionary in memory (e.g. product sizes 0: small, 1: medium, and 2: large). The 0, 1, and 2 then replace the sizes small, medium, and large, resulting in fewer bits being stored in the table.

c) Run Length Encoding (RLE): when values are repeated in a column, e.g. entries 1-2000 are all Q1, this can be stored in a table as Q1: 2000 (the first 2,000 rows are Q1). Entries 2001 to 3500 are Q2, which can be stored as Q2: 1500, and so on.
Using this technique, the Vertipaq engine can compute results very quickly. Sorting is therefore very important for RLE-type compression. The engine does do sorting, but you can potentially help it by sorting the data by the columns with the lowest cardinality first.
Factors impacting compression are therefore as follows:
1. Cardinality: the more unique values, the less compression can be done, hence increasing the size of the column in memory.
2. Number of repetitive values: the more there are, the better the RLE compression possible.
3. The number of rows is important, but cardinality and repetitive values matter more for compression.
4. The data type: e.g. automatic date/time should be turned off, as it creates a larger entry than the date field alone. The date-only field is smaller, and if removing the time results in much lower cardinality, the column can be compressed much further using either RLE or hash encoding.

Primary keys, such as integer user IDs, will most likely use value encoding.
Numbers such as sales amounts that don’t have massive outliers will probably also be value encoded; otherwise, dictionary encoding may be used.

To determine the best sort order, a sample of the first x number of rows (I think it’s a million) is taken, so it’s important that those first rows are of the best quality. You can therefore help the Vertipaq engine by ensuring your initial data is representative of the whole dataset; otherwise the engine may have to re-encode data, e.g. if it finds outliers and its value encoding doesn’t work (hence slowing down the process).

    We can look at the type of compression used, by using DAX Studio (Advanced > View Metrics)


    Firstly, you’ll see an overview of the tables. Each column in a table is encoded differently, so you’ll see the encoding type as ‘Many’.
    You need to drill down to get to the column level (or you could just select the column tab on the left-hand side).

    Now, let’s have a look at the FactOnlineSales table.

    The table is sorted by default by the column size. This is the size in Bytes that the column is consuming in memory.

The first column, OnlineSalesKey, has 12,627,608 rows and a cardinality of 12,627,608, and as the name suggests it is a unique key, hence it has the same cardinality (unique values) as rows.
It is also the largest column, consuming 84MB of memory and 31.77% of the database.
It is encoded as VALUE type.

Rows: 12,627,608
Cardinality: 12,627,608
Col Size: 84,156,128 (Data + Dictionary + Hier Size)
Data: 33,645,560
Dictionary: 128
Hier Size: 50,510,440
% Table: 33.07%

Unique keys are generally required fields in most tables, so it’s not usually somewhere you can save space by optimizing (unless you can reduce the size of the key somehow).

    By applying the same analysis you can determine which columns and tables are having the biggest impact on your model size and hence slowing your model refresh time.

  • How to Build a Survey Results Report with Power BI

    Power BI can be useful for displaying and storing survey results for your organization. In this example we build some data for the results of a fictitious Airline survey, using questions taken from a real online airport survey.


    The data was built in Excel with a sheet for questions and a sheet for answers. You can download the Excel file here.

    It’s easy to publish the report as a PDF, so you can share it with users who don’t have access to Power BI. Here is the airport survey report as PDF

    The questions are a simple list with a Question ID and section.


    The answers on the second sheet are formatted so that each row represents all the answers for each respondent.



    Steps for Building the Airline Survey Report.

    Step 1
The data is imported into our Power BI Desktop file using Power Query.

    The only change to the question table is a simple column rename.


    The answer table is transformed so that each line of the table contains the respondent ID, the question, and the Value (the answer). This prepares the data so it can be joined to the question table and easily visualized.
    The applied steps are as follows:

As you can see, the first step after the import is simply to promote the headers, so the column names are the actual column names.
The second step changes the column types to the correct types.
The key third step is to unpivot the data, so the table is transformed as below, with a separate row for each respondent’s answer. This makes the report easy to build; a sketch of these applied steps in M is shown below.
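
As a rough sketch, the applied steps for the Answers query look something like this in the Advanced Editor (the file path and column names are assumptions):

let
    Source = Excel.Workbook(File.Contents("C:\Data\AirlineSurvey.xlsx"), null, true),
    AnswersSheet = Source{[Item = "Answers", Kind = "Sheet"]}[Data],
    // Step 1: promote the first row to column headers
    #"Promoted Headers" = Table.PromoteHeaders(AnswersSheet, [PromoteAllScalars = true]),
    // Step 2: set the column types (only the Respondent column is shown here)
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Respondent", Int64.Type}}),
    // Step 3: unpivot every column except Respondent, giving one row per respondent's answer
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Respondent"}, "Question", "Value")
in
    #"Unpivoted Columns"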


    The report only requires one measure, which is just a unique count of the respondent ID

    Answer Count = DISTINCTCOUNT(Answers[Respondent])


    Basic Graphs can be added.

    And Likert Scale style graphs using a 100% Stacked bar graph visual.

It’s a simple but useful report.



  • How to Build a KPI Report with Power BI


The Power BI KPI demo report is a demonstration of how a typical KPI report might look. The report uses data from the fictitious Microsoft Contoso Retail Data Warehouse, which is stored in a SQL Server database and which I’ve renamed for demo purposes.

    The report is currently divided into 5 pages.

1. The first page is a typical KPI report, leading with the financial KPIs, then breaking down to store type, and then some online KPIs. As the database is quite old (going back to 2009), it doesn’t have a lot of person-level information, but it does for online sales (hence online KPIs are included here). The KPIs are presented here in comparison to the previous year’s performance, but targets would also typically be included.
  The month selected allows the user to go back to any point in time in the year (or years, if more data were included).



2. The second page covers sales MTD, QTD, and YTD vs. the previous year. There are graphs showing the YoY breakdown by store type and a table allowing more drill-down into individual store performance YoY. This interactive tool allows users to quickly pinpoint some of the reasons for higher or lower YoY performance. The measure calculations can be found lower down this page.



3. The third page of the report goes into more detail on stores, broken down by country and city, and includes a sales per square foot calculation. Again, this helps the user explore the data to identify the best and worst-performing stores in more detail.


4. The fourth page shows sales by product category and product, allowing you to drill down from the top category levels. There are then graphs for monthly and yearly performance.



5. The fifth page focuses on activity. This can be developed further to show active customers by activity segment (e.g., new, active, reactivated, lapsed) once activity segmentation is added to the model.


      The Data Model:




      Time Intelligence DAX measures
      The DAX measures for MTD, QTD and YTD and comparisons to the previous year are below:
    //Month to Date Sales
    Sales MTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESMTD('Date'[Date]))

    //Quarter to Date Sales
    Sales QTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESQTD('Date'[Date]))

    //Year to Date Sales
    Sales YTD = CALCULATE(SUM(FactSales[SalesAmount]) , DATESYTD('Date'[Date]))

    //Last Year Month to Date Sales
    Sales LY MTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESMTD('Date'[Date])))

    //Last Year Quarter to Date Sales
    Sales LY QTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESQTD('Date'[Date])))

    //Last Year Year to Date Sales
    Sales LY YTD = CALCULATE(SUM(FactSales[SalesAmount]) , SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

    //Variance Month to Date vs. Last Year Sales
    Sales VAR MTD vs. Last Year = [Sales MTD]-[Sales LY MTD]

    //Variance Quarter to Date vs. Last Year Sales
    Sales VAR QTD vs. Last Year = [Sales QTD]-[Sales LY QTD]

    //Variance Year to Date vs. Last Year Sales
    Sales VAR YTD vs. Last Year = [Sales YTD]-[Sales LY YTD]

    //Variance % Month to Date vs. Last Year Sales
    Sales VAR% MTD vs. Last Year = ([Sales MTD]-[Sales LY MTD])/[Sales LY MTD]

    //Variance % Quarter to Date vs. Last Year Sales
    Sales VAR% QTD vs. Last Year = ([Sales QTD]-[Sales LY QTD])/[Sales LY QTD]

    //Variance % Year to Date vs. Last Year Sales
    Sales VAR% YTD vs. Last Year = ([Sales YTD]-[Sales LY YTD])/[Sales LY YTD]



  • How to Create a Power BI Top Customers Report

    Most businesses will have sales and marketing departments as well as top-level C-Suite/director levels. They are all going to be interested in a bunch of similar reports from organization to organization.

    One of the most important reports, that sales and marketing teams are going to be interested in is:
    Who are our best customers?

Top customers often contribute a significant amount of revenue to any organization. The 80/20 rule, where 80% of revenue comes from 20% of customers, is quite common. Identifying top customers allows teams to segment their audience. Top customers can be given special treatment, such as dedicated account managers and special offers.

In the model from the Contoso database we use here, the DimCustomer table is joined to the FactOnlineSales table, and the FactOnlineSales table is joined to the Date table. We use the FactOnlineSales table rather than the FactSales table, as it includes a CustomerKey we can join to the DimCustomer table to pull out customer data at the person level (the FactSales table appears to include store-level data only).

    To start creating the report, we can drag in the SalesAmount field from the FactOnlineSales table and the FirstName, LastName, and CustomerKey from the DimCustomer table as well as DateFirstPurchased. These are some of the fields a sales team is going to be interested in and they will likely ask for more to be added as time goes by.

Adding the Year field as a slicer from the date table, which is already joined to the FactOnlineSales table by the unique date key, allows us to filter the sales amount by year (and we could add month, for example, as required).
    We filter the report at the person level by adding the CustomerType field in the Filters panel:

    You will notice here I just dragged in the SalesAmount field from the FactOnlineSales table, but that’s a bit lazy and we should create a measure for the field, like this:

    OnlineSales = SUM(FactOnlineSales[SalesAmount])

    The sales amount filter is useful, but teams are likely to be interested in several different sales calculations, which we can calculate using additional measures. For example, we might want to see the current year-to-date sales and the lifetime sales of a customer in the same report as follows:

    Calculating Lifetime Sales with DAX

    OnlineSales Lifetime = CALCULATE(SUM(FactOnlineSales[SalesAmount]), REMOVEFILTERS('Date'))

Using REMOVEFILTERS('Date') removes all filters applied from the Date table, such as the year being used in the drop-down in this report, so the SalesAmount will be calculated for the customer’s lifetime. This now makes the report more interesting. The year filter still applies to the OnlineSales measure, but does not apply to the OnlineSales Lifetime measure, as below:

    Comparing year-to-date vs. previous year-to-date is another common request.
    To calculate year-to-date sales amount, we use the CALCULATE() function again, but this time combined with the DATESYTD() function:

    Calculating Year-to-Date and Previous Year-to-Date Sales with DAX

    To calculate Year to date sales we can use the DATESYTD function with CALCULATE:

    OnlineSales YTD = CALCULATE(SUM(FactOnlineSales[SalesAmount]), DATESYTD('Date'[Date]))

and to calculate OnlineSales Prev YTD, we can wrap the DATESYTD() function with the SAMEPERIODLASTYEAR() function as follows:

    OnlineSales Prev YTD = CALCULATE(SUM(FactOnlineSales[SalesAmount]), SAMEPERIODLASTYEAR(DATESYTD('Date'[Date])))

Adding these measures to our report table, we should now have something looking like this:

    Here we now have a top customer report for 2009, we can see which customers have brought in the highest revenue this year vs. last year to date, so we are comparing equal periods. We can also see customers’ lifetime value. In this example, Jon Yang is our best online customer, with $61K in online sales. This is equal to his lifetime sales. As you will notice, their first Date of Purchase is 22nd July 2001, which is a long time ago, but the database doesn’t have a record of this transaction as 2009 YTD sales equal lifetime sales. Presumably, this data was not recorded historically, which is often the case with older databases. The data you work with is often not perfect and we need to be aware of these specifics.

    Now we have this year’s YTD measure and the previous year’s YTD measure, it is easy for us to calculate the variance % (or difference), between the two years.

    Online Sales Var YTD% = ([OnlineSales YTD]-[OnlineSales Prev YTD])/[OnlineSales Prev YTD]

We have an infinity value for the first customer, as they had no sales in the previous year. We can tidy that up using the DIVIDE() function, which automatically handles errors and infinity.

    Online Sales Var YTD% = DIVIDE(([OnlineSales YTD]-[OnlineSales Prev YTD]),[OnlineSales Prev YTD])


Let’s now add the month as an additional filter. As you can see, the YTD and Prev YTD measures are recalculated to show the online sales amount up until the end of the month selected. That’s useful and pretty powerful.

    In a real-world scenario, sales teams are going to want the most up-to-date information, so we want to look to add previous days’ sales, week-to-date sales, and month-to-date sales. The issue here is that our data is old, so we need to do some juggling first to bring the Contoso sales data into the future to use for experimentation.
    The process of adding a future date is outlined here.

    DAX Previous Day Sales Calculation

    Once we have the future date in our FactOnlineSales table as in the link above and join it to the date table, our report is brought into the future and we can create a measure calculating yesterday’s sales amount:

    OnlineSales Yesterday = CALCULATE(SUM(FactOnlineSales[SalesAmount]), PREVIOUSDAY('Date'[Date]))

    As you can see I can now filter on the current year, which is 2023, and remove the month filter.
    I can now see the top customers for yesterday if I sort by the OnlineSalesYesterday field.
    This is now giving the sales teams up-to-date, daily reporting, so they can keep their finger on the pulse of the business:

Now that we’re using more ‘current’ data, we can see we need Week-to-Date and Month-to-Date measures.

    Calculating Week-to-Date Sales can be found here

  • How to find Max Date with List.Max() in Power Query to Create a Future Date

    In the Contoso Retail database, we have old historical data which only goes up until the end of 2009. For our demo reports, we’d like to use more up-to-date data, so we want to add a date field to the FactOnlineSales (and FactSales) tables, so we can mimic more current reports.

    To add what we’ll call a ‘future date’ field, we want to find out the number of days from the present day to the last date in the FactOnlineSales table, then we can recalculate all the old dates bringing them forward by this difference.

    Getting the Max Date in Power Query

    There doesn’t appear to be a direct M code function for max dates, but there is a List.Max() function which works with lists, so we need to create a list from the date field in the FactOnlineSales table. The steps are as follows:

1. Create a list from the date field (right-click on the DateKey field and select ‘Add as New Query’).

2. This will create a list of dates from the column – here I renamed it to FactOnlineSalesDates.

    3. We can now reference this list in the FactOnlineSales table using the List.Max() function.
    So we need to create a new custom column in the FactOnlineSales table as follows:
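
In the Custom Column dialog, the formula is simply:

= List.Max(FactOnlineSalesDates)

which generates a step along these lines in the Advanced Editor (a rough sketch; the step names are assumptions):

#"Added MaxDate" = Table.AddColumn(#"Changed Type", "MaxDate", each List.Max(FactOnlineSalesDates))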

    We then just need to transform it into a Date type:

That will give us a column containing the max date (the last date) in the FactOnlineSales table. The MaxDate is 31/12/2009. We can now use that date in our calculations. Sure, we could have just looked it up, but if the data is dynamic, it’s good to be able to calculate it, so we are sure it is correct.

4. Next, we can add another custom column containing today’s date and transform it to a Date type as well. We use the Date.From() function on the result of the DateTime.LocalNow() function:

5. Next, add another custom column to calculate the difference between today and the max date in the FactOnlineSales table. We can use the Duration.Days() function to calculate the difference in days.

    This gives us the number we are looking for.

Now all we need to do is add another custom column to transform the dates into the present day. We can call this ‘DateFuture’ and transform it to a Date type as well. A sketch of steps 4 to 6 as generated M code is below.
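
As a rough sketch, steps 4 to 6 end up as custom-column steps along these lines (the step and column names are assumptions, and DateKey is assumed to be the original date column):

// Step 4: today's date
#"Added Today" = Table.AddColumn(#"Added MaxDate", "Today", each Date.From(DateTime.LocalNow()), type date),
// Step 5: difference in days between today and the last date in the table
#"Added DayDiff" = Table.AddColumn(#"Added Today", "DayDiff", each Duration.Days([Today] - [MaxDate]), Int64.Type),
// Step 6: shift every historical date forward by that difference
#"Added DateFuture" = Table.AddColumn(#"Added DayDiff", "DateFuture", each Date.AddDays([DateKey], [DayDiff]), type date)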


    We’ve added 4 additional columns here to a big table. To limit the impact on performance, we should remove the 3 columns we don’t need, just leaving us with the DateFuture column.

    Now close & apply the data from Power Query to load it into the model.
    In the model view, we can now edit the relationship between the Date table and the FactOnlineSales table, so they are joined using the DateFuture key, which will bring all the data into the future, when we use filters from the Date table.

If we now look at the FactOnlineSales table, we can see the latest date is today (at the time of writing).


In practice, a data warehouse is updated overnight, so the latest complete day’s worth of data should be the day before, so we want to tweak our calculation.

If we go back into Power Query, we can subtract a day from today’s date using the Date.AddDays() function, so this will feed into the future date calculation:
Date.AddDays(Date.From(DateTime.LocalNow()), -1)


Now we see the data is up to yesterday.

So, this is useful if we want to mimic a real-life scenario, where data is up until yesterday relative to today, but it would be confusing for financial reports if our future date kept changing, so it makes sense to have a fixed date as well.

So, to fix the today’s date field, we can do the following, using the #date function:
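
A minimal sketch, assuming we simply hard-code the Today column with the #date(year, month, day) function instead of DateTime.LocalNow() (the date shown is just an illustration):

#"Added Today" = Table.AddColumn(#"Added MaxDate", "Today", each #date(2023, 12, 31), type date)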



  • How to Build a Date Table in Power BI

    Probably the most essential table you need in your Power BI model is a date or calendar table.
You join your date table to any other table containing dates, such as your main fact sales table.

    The advantage of using a calendar table is that it contains associated fields such as Year, Month, Start of Month, Start of Quarter, and relevant Financial Periods that will be bespoke to the business.

    The date table should be stored in a data flow so everyone can use it. Here we create one using DAX, which could be exported and imported as a table in the Power Query service. You could also create a date table with Power Query directly. Creating one in DAX gives us practice with Time Intelligence functions.


    The steps to create a date table are as follows:
    1. Open Table View and select ‘New Table’
2. Start by creating the first column, which will be our date key. We can use the CALENDAR() function to auto-create a list of dates in the table. The date range should run from the minimum date relevant to your data up to any dates you need in the future. You could use TODAY() for the end date to include dates only up to today, but sometimes, for example with forecasting, it can be useful to have future dates.
Of course, a big table may slow performance, so just bear that in mind.

    Date = 
    VAR DateStart = DATE(2007, 01, 01)
    VAR DateEnd = DATE(2030, 12, 31)
    RETURN
    CALENDAR(DateStart, DateEnd)

3. Once you have your first date column set up, which will be the key for joining your other tables, you can then start adding any other useful columns.

    Here are some for starters:

    Year = YEAR('Date'[Date])
    Month = MONTH('Date'[Date])
    Day = DAY('Date'[Date])
    Quarter = QUARTER('Date'[Date])
    StartOfYear = STARTOFYEAR('Date'[Date])
    StartOfQuarter = STARTOFQUARTER('Date'[Date])
    StartOfMonth = STARTOFMONTH('Date'[Date])
    DayOfWeek = WEEKDAY('Date'[Date], 2)


Be careful with the day-of-week number. It’s always best to double-check against another calendar.
The default return type numbers Sunday as 1; to make Monday day 1, use WEEKDAY('Date'[Date], 2).

    Once you have the day number setup, which can be useful in itself, you can then create a column for the day name:

    DayName = 
    
    SWITCH('Date'[DayOfWeek],
    1, "Mon",
    2, "Tue",
    3, "Wed",
    4, "Thu",
    5, "Fri",
    6, "Sat",
    7, "Sun")

Once we’ve finished creating our date table, we can then join it within the model to the relevant tables, such as the Contoso FactSales table, which has a date key. Note that the Date field in the Date table is unique.


  • How to create Customer Age Bands in Power BI using DAX

    An interesting dimension to report on for demographics is the age of customers. Understanding what type of customers buy what types of products can help inform our customer insight that can be used for targeting marketing offers more effectively.

    In the DimCustomer table of the Contoso Retail Sales DW, we have a field called: Birthdate, but of course, this is unique for all the customers. What we want to do is group these ages into buckets or age bands, so we can create a more informative report. We can then use the grouping in graphs or as filters as in the graph below:


    To create the categorization, I created a new calculated column in the DimCustomer table with the DAX code following the steps outlined below. Note the exact categories you use are up to you or the business. There are no definitive rules on what age bands you should follow, but by studying other reports, you will get the gist of the most common methods.

The first best-practice step is to create variables.
The _dob variable is assigned the [BirthDate] field.
The _now variable is assigned today’s date, using the TODAY() function.
The _agedays variable is then assigned the difference in days between the date of birth and today’s date, which is essentially the person’s age in days.
The _ageyears variable divides the age in days by 365 to get the age in years.
The section after the RETURN statement then returns the age band category, based on the IF logic.
It’s important to put the statements in the correct order, starting from the lowest band upwards. Note that there are some blank BirthDates, which is why I have used IF(_ageyears = BLANK(), "None", ...).
I only knew there were blank dates by visually inspecting the BirthDate column. If I hadn’t handled them, the blanks would have been picked up as "<18", so it’s important to sense-check as much as possible.

    DAX for Calculated Column (note Age Range is the column name):
Age Range =
VAR _dob = DimCustomer[BirthDate]
VAR _now = TODAY()
VAR _agedays = DATEDIFF(_dob, _now, DAY)
VAR _ageyears = _agedays/365

RETURN
IF(_ageyears = BLANK(), "None",
IF(_ageyears<18, "<18",
IF(_ageyears<30, "18 to 29",
IF(_ageyears<40, "30 to 39",
IF(_ageyears<50, "40 to 49",
IF(_ageyears<60, "50 to 59",
IF(_ageyears<70, "60 to 69",
IF(_ageyears<80, "70 to 79",
IF(_ageyears>=80, "80 plus")))))))))


  • How to Add a last refreshed date in Power BI

It’s very useful for users, administrators, and developers to have the last refresh date added to Power BI reports, as this helps confirm how up to date the data is. Sometimes, when a Power BI file is refreshed, you will notice the data doesn’t actually refresh and there is no error, which can be confusing.

    This is how to add the refresh date:

    1. Open Power Query and add a blank query

    2. In the formula bar of the blank query enter = DateTime.LocalNow(). This function gets the current time. Click on the source on the right-hand side to activate the function.

3. You should now have a Transform tab where you can convert the blank query to a table: select To Table.


4. You can then change the data type to Date/Time.


5. Finally, rename the query and its column to something meaningful, like Last Refresh Date.

6. Then Close & Apply to return to the report view.

7. Add a card to your page and add the field from the Last Refresh Date table you just created. Adjust the size and you should get something like this.

8. The date and time of the last refresh should now update each time you refresh the report (as the Last Refresh Date table is refreshed).
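
Putting the steps together, the finished query looks something like this in the Advanced Editor (a minimal sketch; the exact ‘To Table’ step the UI generates may differ slightly):

let
    // Current date/time captured each time the report refreshes
    Source = DateTime.LocalNow(),
    // Convert the single value into a one-column, one-row table
    #"Converted to Table" = #table(1, {{Source}}),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "Last Refresh Date"}}),
    // Set the column to Date/Time
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Last Refresh Date", type datetime}})
in
    #"Changed Type"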