Latest Posts

  • Testing Power BI DAX Measures Speed with DAX Studio

    DAX Expression > DAX query > Vertipaq query.

    Test 1: Testing the Impact of USERELATIONSHIP() in a DAX Measure

    Does using USERELATIONSHIP() with inactive relationships slow down a DAX measure? I’m not sure so I thought I would test it out.

    To test this, I create a measure using the existing active relationship between my date table and my FactOnlineSales table. I then create a copy of the date table and join it to the FactOnlineSales table in the same way, but with the relationship set to inactive.

    I use DAX Studio to create 2 measures for calculating total sales, and then I throw in some filters for year and continent as follows:

    DEFINE
        MEASURE '_Measures'[test1] =
            CALCULATE (
                SUMX (
                    'FactOnlineSales', [Sales Amount]
                ),
                'DimDate'[CalendarYear] = 2009,
                'DimGeography'[ContinentName] = "Europe"
            )

        MEASURE '_Measures'[test2] =
            CALCULATE (
                SUMX (
                    'FactOnlineSales', [Sales Amount]
                ),
                'DimDate (2)'[CalendarYear] = 2009,
                'DimGeography'[ContinentName] = "Europe",
                USERELATIONSHIP ( 'DimDate (2)'[Datekey], FactOnlineSales[DateKey] )
            )
    EVALUATE
    ROW (
        "test", [test2] 
    )

    Below is a screenshot of my DAX Studio settings. As you can see, I have the 'Clear cache on Run' button selected. This helps ensure each test starts from the same point. I also have the 'Server Timings' button selected. Note: be careful using 'Clear cache on Run' on a connection to a production model, as it will clear the cache (and hence slow down reports).

    For the original sales measure that uses the active relationship to the date table, I get a total time of 2,197 ms, broken down into 624 ms for the FE (formula engine) and 1,573 ms for the SE (storage engine, i.e. the Vertipaq engine). The FE turns the DAX expression into a DAX query and creates a query plan that it sends to the SE to retrieve the results. The FE also caches query results (this is why I have turned on clearing the cache when the DAX query is run; otherwise subsequent queries would return results very quickly).
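    The EVALUATE above only returns [test2]. To time the measure with the active relationship on its own, a minimal variation (reusing the same DEFINE block) is simply:

    EVALUATE
    ROW ( "test", [test1] )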

    Comparing the 2 measures
    Here we run the query on each measure 5 times to get an average, and as you can see there was very little difference in performance. Indeed, the first 3 tests on the measure with the active relationship took longer (but this may have been because my laptop was busy doing something else). My results are below. I think they are pretty inconclusive, but they are interesting nonetheless, and the method would be worth repeating on a larger model.


    In the Server Timings pane you can view the Vertipaq query that is executed by Analysis Services:

    Each step of the execution is below:
    The first step selects the data from FactOnlineSales with the 2 filters, returning 1.4M rows of data (133 KB).
    The second step does the sum of the sales amount and returns one row. The third step returns the start and end time as well as other information (shown if you mouse over the query).

    The Par. field stands for Parallelism, which represents the number of parallel threads used by the storage engine to retrieve the result at each step.

  • Analysing the Power BI Data Model to Improve Refresh Rates with DAX Studio


    I’ll start with some references:

    The Definitive Guide to DAX book by Marco Russo and Alberto Ferrari (the definitive experts).
    The Guy in a Cube video on slow model refresh.
    ChatGPT (I don't know who that is!)
    My own testing and experience, using a demo Power BI model built with data from the Contoso Retail data warehouse.

    First things first: connecting DAX Studio to your data model in the Power BI service
    1. Get the XMLA endpoint URL from Power BI by selecting Workspace settings > License info:
    You will need to log in with your Power BI account to connect.


    2. You can then connect to your model in the service with DAX Studio as follows (you may need to download and install it first). You can open it either independently or via the External Tools ribbon within Power BI.
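    For reference, the XMLA endpoint address copied in step 1 typically takes the following form (the workspace name here is just a placeholder):

    powerbi://api.powerbi.com/v1.0/myorg/<Your Workspace Name>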



    File and Model Size
    Typical models use data imported into physical storage from a data warehouse and potentially other sources, such as Google Analytics. The Vertipaq engine stores the model in virtual memory. My Contoso Retail Power BI file is 193 MB (that’s not very big). It is compressed to reduce storage requirements.

    Model Size in DAX Studio (Advanced > View Metrics > Summary)
    When I look at the model summary from Advanced > View Metrics in DAX Studio, the model loaded into virtual memory by the Vertipaq engine is 253 MB. Also of interest are the 37 tables and 366 columns. My model only has 8 tables and 1 calculated table, so it will be interesting later to understand what these other tables are.



    So why is this different?

    In-memory Compression: Power BI compresses data in the .pbix file to reduce storage, so the file size on disk is often smaller than its in-memory model size. However, once loaded, Power BI expands it for efficient query performance. There is still compression (next section), it’s just not as much as in the stored file.

    Metadata Overhead: The model size shown in DAX Studio includes metadata, indexes, dictionaries, and structures needed for DAX calculations and relationships that Power BI uses while executing queries, which aren’t directly reflected in the file’s storage size.

    Cache and Temp Data: DAX Studio may include caches or other temporary data generated during analysis, which increases the apparent size of the model in memory.

    Unused Columns or Tables: Any tables or columns loaded but not used may also contribute to the model’s size in DAX Studio, while Power BI might not fully load them in the saved file.

    Note: According to the model settings, the model is not using caching (which is off by default), so I can rule out one of the four possibilities.

    Power BI License requirements based on model size

    The Power BI Pro license allows for models up to 1 GB in size, 10 GB of native storage, and 8 refreshes a day.
    The Power BI Premium license allows for models up to 100 GB in size, 100 TB of native storage, and up to 48 refreshes a day.

    Note: Beyond just loading the raw model, Power BI needs additional memory overhead for processing, caching, and temporary storage during data refreshes and calculations. This means that even if your model fits within the 1 GB limit, you’re likely to experience slow performance or even errors during complex operations, especially if near that cap.

    Semantic Model Refresh Speed (Vertipaq engine: imported tables)

    We can easily access the model refresh times in the Power BI workspace (select 'Refresh history' from the 3-dots menu next to the model entry). This gives us our total model refresh benchmarks, which we can average (removing outliers from known busy times as required).

    So what are the stages of the model refresh and which ones can we troubleshoot and optimize?

    1. Data import time: This one we can measure. Adam from the Guy in a Cube video gives us a simple suggestion for benchmarking our tables' load times: simply run a SELECT * on each table in the data warehouse and record the time (repeat the process at different times of the day, perhaps removing outliers at busy times, and average). There is also network latency to consider with large volumes of data, but let's just compute: total refresh time minus total SELECT * time.

    Here is an example from my Contoso data warehouse tables:
    1. DimCurrency: Time 0; Rows: 28
    2. DimCustomer: Time 0; Rows: 18,869
    3. DimDate: Time 0; Rows: 648
    4. DimGeography: Time 0; Rows: 274
    5. DimProduct: Time 0; Rows: 2,467
    6. DimProductCategory: Time 0; Rows: 8
    7. DimProductSubcategory: Time 0; Rows: 48
    8. DimStore: Time 0; Rows: 306
    9. FactOnlineSales: 2 mins 49 secs; Rows: 12.6M

    As you can see, only the FactOnlineSales table took any real time to SELECT.
    But then you might say – hey, why are you using tables in your models and not views? (If a new column appears, the model refresh will fail.)
    Fair point, in which case you can run the same tests on your views.

    Now, I don't have a gateway set up to connect to my laptop today, so I would need to set something up in the cloud to perform the test. But let's say the model refreshed in 4 minutes and your SELECT * benchmarks totaled 2 minutes and 49 seconds; then you can say that for 1 minute and 11 seconds the Power BI service is doing something else.

    So what happens once the data has been imported? Well, that's when we need to get into the Vertipaq engine. Note that the engine is proprietary and so has secrets we will never learn, but what we do know about the steps it performs is as follows:

    Vertipaq engine steps
    1. Loads the tables into columns
    2. Checks each column for the best sort order
    3. Applies compression to create the smallest, fastest model
    4. Loads the model into memory

    Note: the Vertipaq engine only deals with imported data. DirectQuery tables are processed by the DirectQuery engine.

    DAX formula engine
    1. Creates calculated columns and tables (in memory) – these are created after Vertipaq compression, so they are not compressed as effectively as imported columns
    2. Creates relationships between columns

    So we need to learn more about what these two engines do.

    The Vertipaq Engine
    1. Stores imported data into a columnar database (for faster data retrieval).

    2. Sorts the columns into the best order for compression techniques (such as RLE).

    3. Compresses the columns using various (proprietary) techniques, such as:

    a) Value encoding: e.g. deducting a constant value from each column entry to create a value that needs fewer bits and hence less space (a worked illustration follows this list).

    b) Hash encoding (dictionary encoding): creating a table, similar to a normalization technique, where an ID and the value are stored in a table in memory (e.g. product sizes 0: small, 1: medium, and 2: large). The 0, 1, and 2 replace the sizes small, medium, and large, resulting in fewer bits being stored in the table.

    c) Run Length Encoding (RLE): when values are repeated in a column – e.g. entries 1-2000 are all Q1 – this can be stored in a small table as Q1: 2000. Entries 2001 to 3500 are Q2, which can be stored as Q2: 1500, and so on.
    Using this technique the Vertipaq engine can compute results very quickly. Sorting is therefore very important for RLE-type compression. The engine does its own sorting, but you can potentially help it by sorting the data by the columns with the lowest cardinality first.
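    To make value encoding concrete, here is a small worked illustration (the numbers are invented purely for this example):

    Raw column values:        10,000,101   10,000,205   10,000,957
    Base value stored once:   10,000,000
    Stored (encoded) values:         101          205          957   (each value now needs far fewer bits)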
    Factors impacting compression are therefore as follows:
    1. Cardinality: the more unique values, the less compression can be done, hence increasing the size of the column in memory.
    2. Number of repetitive values: the more there are, the better the RLE compression possible.
    3. The number of rows is important, but cardinality and repetitive values matter more for compression.
    4. The data type: e.g. automatic date/time should be turned off, as it creates a larger entry than the date field alone. The size will be smaller without it, and if removing the time results in much lower cardinality, the column can be compressed much further using either RLE or hash encoding.

    Primary keys such as integer user IDs will most likely use value encoding.
    Numbers such as sales amounts that don't have massive outliers will probably also be value encoded; otherwise dictionary encoding may be used.

    To determine the best sort order, a sample of the first x rows (I think it's a million) is taken, so it's important that those first rows are of good quality. You can therefore help the Vertipaq engine by ensuring your initial data is representative of the whole dataset; otherwise the engine may have to re-encode data, e.g. if it finds outliers and its value encoding no longer works (hence slowing down the process).

    We can look at the type of compression used, by using DAX Studio (Advanced > View Metrics)


    Firstly, you'll see an overview of the tables. Each column in a table can be encoded differently, so at table level you'll see the encoding type as 'Many'.
    You need to drill down to get to the column level (or you could just select the Columns tab on the left-hand side).

    Now, let’s have a look at the FactOnlineSales table.

    The table is sorted by default by the column size. This is the size in Bytes that the column is consuming in memory.

    The first column, OnlineSalesKey, has 12,627,608 rows and a cardinality of 12,627,608; as the name suggests it is a unique key, hence it has the same cardinality (unique values) as rows.
    It is also the largest column, consuming around 84 MB of memory and 31.77% of the database.
    It is encoded as VALUE type.

    Rows: 12,627,608
    Cardinality: 12,627,608
    Col Size: 84,156,128 (Data + Dictionary + HierSize)
    Data: 33,645,560
    Dictionary: 128
    Hier Size: 50,510,440
    % Table: 33.07%

    Unique keys are generally required fields in most tables, so they are not usually something you can save space on by optimizing (unless you can reduce the size of the key somehow).

    By applying the same analysis you can determine which columns and tables are having the biggest impact on your model size and hence slowing your model refresh time.

  • How to use the Window and OFFSET Functions in DAX Measures


    The Window Function

    To use the WINDOW function, we can start by creating a virtual table of customer sales by customer key using the CALCULATETABLE function, and then create a window on the table to return only the results we want by position. We can make use of a DAX query to do this.

    
    DEFINE
        VAR CustomerSales =
            CALCULATETABLE (
                SUMMARIZECOLUMNS (
                    'V_FactOnlineSales'[CustomerKey],
                    "Sales", SUMX ( 'V_FactOnlineSales', V_FactOnlineSales[SalesAmount] )
                ),
                NOT ( ISBLANK ( V_FactOnlineSales[CustomerKey] ) )
            )

        VAR CustomerSalesWindow =
            WINDOW (
                1, ABS, 3, ABS,
                CustomerSales,
                ORDERBY ( [Sales], DESC )
            )

    EVALUATE
    CustomerSalesWindow


    As you can see from the code above, we start off by creating a table in a variable called CustomerSales, which has a column for CustomerKey and a summarized sales column. The table is filtered, so that the customer is not blank.

    We then create a second variable called ‘CustomerSalesWindow’ to filter our CustomerSales table using the Window function.
    The initial customer sales table is as follows:

    We then create the window using the following code:

    
    VAR CustomerSalesWindow =
        WINDOW (
            1, ABS, 3, ABS,
            CustomerSales,
            ORDERBY ( [Sales], DESC )
        )

    EVALUATE
    CustomerSalesWindow


    As you can see, the original table is sorted by sales in descending order (we could also use a partition, but not in this example).

    The syntax for the window function is as follows:
    WINDOW ( from[, from_type], to[, to_type][, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )

    So you can see that we have:
    WINDOW(
    from: 1
    from-type: ABS
    to: 3
    to-type: ABS
    relation: CustomerSales
    ORDERBY([Sales], DESC))


    As you can see, ORDERBY and PARTITIONBY are very similar to SQL, in that we are sorting the Sales field in descending order.
    The positions from ABS 1 to ABS 3 give us the top 3 sales results.

    Of course, we could have used something like the TOPN() function to get the top 3 sales, but if we wanted a specific range of positions, e.g. the 3rd to 5th positions, then the WINDOW() function makes it much easier.
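    For comparison, a minimal TOPN() version of the same top-3 query (an untested sketch reusing the CustomerSales variable defined above) could be:

    EVALUATE
    TOPN ( 3, CustomerSales, [Sales], DESC )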

    The OFFSET() Function

    The OFFSET() function ‘Returns a single row that is positioned either before or after the current row within the same table, by a given offset’.

    In this example, we use OFFSET() to display the previous month's sales in a table. We start off by creating a _SummarySales table and then add a second column via _SalesOffset, which uses -1 as the delta; this shows the previous month's sales when the ORDERBY clause on CalendarMonth is set to ascending order (ASC). We can then add the final SalesMonthOnMonth column to the table to get the final summary.

    DEFINE
    	-- Syntax:
    	-- OFFSET ( <delta>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
    	
    	-- Create Sales by month table
    	
    	VAR _SummarySales = SUMMARIZECOLUMNS(
    		DimDate[CalendarMonth],
    		"Sales", SUMX(
    			RELATEDTABLE(FactOnlineSales),
    			FactOnlineSales[SalesAmount]
    		)
    	)
    
    -- Add the offset column by one month
    
    	VAR _SalesOffset =
    	ADDCOLUMNS(
    		_SummarySales,
    		"PreviousMonth", SELECTCOLUMNS(
    			OFFSET(
    				-1,
    				_SummarySales,
    				ORDERBY(
    					[CalendarMonth],
    					ASC
    				)
    			),
    			[Sales]
    		)
    	)
    
    -- Add a month on month difference column
    
    	VAR _SalesMonthOnMonth =
    	ADDCOLUMNS(
    		_SalesOffset,
    		"Month on Month", [Sales] - [PreviousMonth]
    	)
    
    EVALUATE
    	_SalesMonthOnMonth

    After adding the previous month column, we can then add a month-on-month column.
    This gives us a table as follows:



  • How to filter DAX Measures In Power BI

    The application of various filter arguments is crucial to controlling the output of DAX measures when visual filters are present on a report page. When CALCULATE() is used in an expression, any filter applied as an argument will override any existing filter on the same column.
    e.g. CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware")
    Any filter on V_DimProduct[BrandName] is removed and the new filter is applied, so a slicer on that column would stop working.

    In this example, we create a number of measures in a matrix, using different filter functions. There are 3 slicers that can be applied to the matrix: Year, BrandName, and ProductCategoryName. The original simple measure is sales amount, which is simply just a sum of sales amount from the FactOnlineSales table from the Contoso Retail Datawarehouse.



    Simple SUM Measure:
    The original measure, with no filter arguments, is simply affected by the current filter context applied by the slicers and the matrix or table rows and columns:
    Sales Amount =
    SUM(V_FactOnlineSales[SalesAmount])

    CALCULATE(): Using the powerful CALCULATE() function, we can apply filters to the original measure to change the way the visual filters affect the measure.

    Sales Amount (Filter Brand Litware) = CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware")

    In the next example, we apply a filter using calculate, but on the year, which is a filter in the matrix. As you can see, the filter on year is removed from the columns and the sum of sales for 2008 is repeated for each year from 2007 to 2009.

    Sales Amount (Filter Year 2008) = CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear]=2008)

    To fix this issue we can use KEEPFILTERS().

    KEEPFILTERS(): The KEEPFILTERS() function helps us keep the filters on Year in the matrix:
    Sales Amount (KeepFilters) =

    CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear]=2008, KEEPFILTERS(V_DimDate))

    REMOVEFILTERS(): By using the REMOVEFILTERS() function along with CALCULATE() we can remove all the filters applied to the report. As you can see in the table above, this removes the filters from the Year, Brand, and Product Category Name columns. Essentially giving us the total sales for the company for all time.

    Sales Amount (RemoveFilters: all) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS())

    REMOVEFILTERS('TableName'): We can also use the REMOVEFILTERS() function to remove filters only on a specific table (not all tables). In this example, we remove any filters on the V_DimDate table. Other filters will continue to filter the measure:

    Sales Amount (RemoveFilters: Table) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS(V_DimDate))

    REMOVEFILTERS('TableName'[ColumnName]): A more granular method is to remove filters just on specific columns. In this example, we remove any filtering applied by 'V_DimProductCategory'[ProductCategoryName]:

    Sales Amount (RemoveFilters: Column) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS('V_DimProductCategory'[ProductCategoryName]))

    ALL('TableName'): Removes all filters from a specified table. This is similar to REMOVEFILTERS('TableName'), but ALL() can also be used as a table expression returning all the rows of the table, whereas REMOVEFILTERS() can only be used as a filter modifier inside CALCULATE().

    Sales Amount (All: DimProductCategory Table) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALL(V_DimProductCategory))

    ALL('TableName'[ColumnName]): Removes all filters from the specified column (or columns):
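    As a hypothetical example following the same pattern as the other measures (the measure name and the choice of column are mine):

    Sales Amount (All: CalendarYear) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALL(V_DimDate[CalendarYear]))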

    ALLEXCEPT(): Removes all filters except those on the specified column or columns. In this example, we remove all filters except the one on the 'V_DimProductCategory'[ProductCategoryName] column.

    Sales Amount (AllExcept: ProductCategoryName) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALLEXCEPT(V_DimProductCategory, 'V_DimProductCategory'[ProductCategoryName]))

    ALLNOBLANKROW(): If your data has blank rows in it, then this may be useful to you. If it doesn't, then you don't need it. You can apply it to a table or to columns:

    SalesTableRows (non blank) = COUNTROWS(ALLNOBLANKROW('V_FactOnlineSales'))
    SalesOrderLines (non blank) = COUNTROWS(ALLNOBLANKROW('V_FactOnlineSales'[SalesOrderLineNumber]))

    ALLSELECTED('TableName'): Removes filters coming from within the visual. So in this example, ALLSELECTED() removes the filter on year within the matrix (i.e. the columns), but if you use the slicer on year, it will still work. You can also apply it to columns.

    Sales Amount (AllSelected DimDate) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALLSELECTED(V_DimDate))

    Using multiple Tables and Columns in Filter Expressions


    The above use of filter functions affecting visuals can be further expanded to include multiple tables and multiple columns, as in the examples below:

    CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware" || V_DimProduct[BrandName]="Northwind Traders")

    Sales Amount (Filter Multiple Year) =

    CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear] IN {2008, 2009})

    Sales Amount (RemoveFilters multiple) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS('V_DimProductCategory'[ProductCategoryName], 'V_DimProductCategory'[ProductCategoryDescription]))

    Sales Amount (All: multiple) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALL(V_DimProductCategory), ALL(V_DimDate))













  • How to Create Multiple Measures With Power BI DAX Query Editor

    Creating multiple measures in DAX Query Editor

    It may be useful to create templates of commonly used DAX measures when building DAX models. Here we create a template of commonly used sales measures, such as Sales YTD, Sales Last Year, Sales Year on Year, and Sales Year on Year %; we can then apply the same logic for QTD and MTD.

    I will add the full code for creating the measures to the bottom of this page.

    For this example, I am using just 2 tables from the Contoso Retail Data warehouse: DimDate and FactSales. They are joined on the DateKey.

    We start with the Year sales measures as shown below in the DAX query editor. To add them to the model, we just click ‘Update model: Add new measure’, but first we want to format the code, using the Format Query button.

    Here is the code, with the DAX formatted.

    We can then click the 4 'Update model: Add new measure' links and it will add the 4 measures to the model.


    We can then create similar measures for QTD and MTD as follows:

    Here is the code for the Quarterly measures:

    The code for creating the Monthly measures is as follows:

    That gives me the 12 measures in record time!


    As promised, here is the full code that can be copied and pasted. Of course, you'll need to change the table names as required. Note, I have created an empty '_Measures' table to act as a container for the measures.

    // Learn more about DAX queries at https://aka.ms/dax-queries
    DEFINE
    //Year Measures
    	MEASURE '_Measures'[Sales_YTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			DATESYTD('DimDate'[Datekey])
    		)
    	MEASURE '_Measures'[Sales_LY_YTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			SAMEPERIODLASTYEAR(DATESYTD('DimDate'[Datekey]))
    		)
    	MEASURE '_Measures'[Sales_YOY] = '_Measures'[Sales_YTD] - '_Measures'[Sales_LY_YTD]
    	MEASURE '_Measures'[Sales_YOY%] = ('_Measures'[Sales_YTD] - '_Measures'[Sales_LY_YTD]) / '_Measures'[Sales_LY_YTD]
    	
    //QTD Measures
    	MEASURE '_Measures'[Sales_QTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			DATESQTD('DimDate'[Datekey])
    		)
    	MEASURE '_Measures'[Sales_LY_QTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			SAMEPERIODLASTYEAR(DATESQTD('DimDate'[Datekey]))
    		)
    	MEASURE '_Measures'[Sales_QTD_YOY] = '_Measures'[Sales_QTD] - '_Measures'[Sales_LY_QTD]
    	MEASURE '_Measures'[Sales_QTD_YOY%] = ('_Measures'[Sales_QTD] - '_Measures'[Sales_LY_QTD]) / '_Measures'[Sales_LY_QTD]
    	
    	//MTD Measures
    	MEASURE '_Measures'[Sales_MTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			DATESMTD('DimDate'[Datekey])
    		)
    	MEASURE '_Measures'[Sales_LY_MTD] = CALCULATE(
    			SUM(FactSales[SalesAmount]),
    			SAMEPERIODLASTYEAR(DATESMTD('DimDate'[Datekey]))
    		)
    	MEASURE '_Measures'[Sales_MTD_YOY] = '_Measures'[Sales_MTD] - '_Measures'[Sales_LY_MTD]
    	MEASURE '_Measures'[Sales_MTD_YOY%] = ('_Measures'[Sales_MTD] - '_Measures'[Sales_LY_MTD]) / '_Measures'[Sales_LY_MTD]
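    As a small optional variation (my own suggestion, not part of the template above), the % measures could use DIVIDE() to avoid errors when the prior-period value is blank or zero, for example:

        MEASURE '_Measures'[Sales_YOY%] = DIVIDE('_Measures'[Sales_YTD] - '_Measures'[Sales_LY_YTD], '_Measures'[Sales_LY_YTD])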
  • How to use the Index Function in DAX Measures

    Demonstrating the use of the INDEX function with the Contoso Retail data warehouse. We can start off by building a virtual table in the DAX query editor, which we can then use to apply the INDEX function. The table created is a list of the first 10 customers by customer key from the DimCustomer table.

    DEFINE
    	-- Syntax
    	-- INDEX ( <position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
    
    	VAR CustomerSalesSample =
    	CALCULATETABLE(
    		SELECTCOLUMNS(
    			DimCustomer,
    			"CustomerKey", DimCustomer[CustomerKey],
    			"FirstName", DimCustomer[FirstName],
    			"LastName", DimCustomer[LastName],
    			"Total Sales", SUMX(
    				RELATEDTABLE(FactOnlineSales),
    				FactOnlineSales[SalesAmount]
    			)
    		),
    		TOPN(
    			10,
    			DimCustomer,
    			DimCustomer[CustomerKey],
    			ASC
    		)
    	)
    
    VAR Ind = INDEX(
    		1,
    		CustomerSalesSample,
    		ORDERBY(
    			[Total Sales],
    			DESC
    		)
    	)
    EVALUATE
    	Ind

    If we evaluate the CustomerSalesSample table first we can see the table we are working with.

    The syntax for the INDEX function is as follows:

    INDEX ( <position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
    In our example, we use a <position> of 1 to get the first sales amount, which is the highest sales amount, as the Total Sales column is sorted in descending order (DESC).

    And then when we evaluate the Index expression (variable Ind), we get the following, which is the correct output we are looking for.

    To get the last position, we could either sort the data in ascending order (ASC) or use an index of -1, as in the following example:

    VAR Ind = INDEX(
    		-1,
    		CustomerSalesSample,
    		ORDERBY(
    			[Total Sales],
    			DESC
    		)
    	)

    When we use -1 as the index, we get the blank sales row returned, which isn't what we wanted, so we need to modify the code.

    One way of filtering the blanks out of the sales table is to add a filter on the table as an additional variable, as in the code below.

    DEFINE
    	-- Syntax
    	-- INDEX ( <position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
    
    	VAR CustomerSalesSample =
    	CALCULATETABLE(
    		SELECTCOLUMNS(
    			DimCustomer,
    			"CustomerKey", DimCustomer[CustomerKey],
    			"FirstName", DimCustomer[FirstName],
    			"LastName", DimCustomer[LastName],
    			"Total Sales", SUMX(
    				RELATEDTABLE(FactOnlineSales),
    				FactOnlineSales[SalesAmount]
    			)
    		),
    		TOPN(
    			10,
    			DimCustomer,
    			DimCustomer[CustomerKey],
    			ASC
    		) 
    
    	)
    	VAR CustomerSalesNotBlank = FILTER(CustomerSalesSample, NOT(ISBLANK([Total Sales])))	
    
    	VAR Ind = INDEX(
    		-1,
    		CustomerSalesNotBlank,
    		ORDERBY(
    			[Total Sales],
    			DESC
    		)
    	)
    EVALUATE
     Ind

    Evaluating this code now gives us the least total sales amount that is not blank:

  • How to use the RankX Function in DAX Measures

    Using the RANK() Function in DAX

    To demonstrate the use of the RANK function, we'll start by creating a simple sales table with the sales amount and the year. Then we'll add additional columns to create examples of using the RANK function.

    In the first example (SimpleSalesRank), we'll just create a simple ranking. By default the function ranks the Sales column from the smallest amount to the highest, and you'll notice the blank sales value is included as rank one. Note, we can order by more than one column, and we can also partition by more than one column.

    SimpleSalesRank = RANK(ORDERBY(SalesRank[Sales]))
    

    The first thing we can do is move the blank value to the end of the ranking, using the LAST parameter.

    SimpleSalesRankBLANKLast = RANK(ORDERBY(SalesRank[Sales]), LAST)

    Then we can rank the sales values from the highest first, while still retaining blank in the last position.

    SimpleSalesRankDESC0 = RANK(ORDERBY(SalesRank[Sales], DESC), LAST)

    We can then partition the ranking by using the Year column.

    SimpleSalesRankPARTITION = RANK(DENSE, ORDERBY(SalesRank[Sales], DESC),  LAST, PARTITIONBY(SalesRank[Year]))

    Using RANKX() function in DAX

    In the next example, we create a similar Sales table with a Sales column and a Year column. We can then use the RANKX function to create our rankings.

    First, we’ll create a simple ranking as before, but with RANKX(). Included here is the function syntax in the comments.
    As you can see from the table above, the RANKX function defaults to sorting the highest sales value first, whereas the RANK function sorts it last. The RANKX function also defaults to putting the blank last, whereas the RANK function ordered it first.

    RankXFunction = 
    -- RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])
    RANKX(SalesRankX2, SalesRankX2[Sales])

    We can reverse the order with the following code, but you’ll notice we don’t use the ORDERBY clause in the code:

    RankXFunctionASC = RANKX(SalesRankX2, SalesRankX2[Sales],,  ASC)

    We can also apply the DENSE clause for ties, as the default is to SKIP ranks after equal values. For example, there are 2 sales of value 94, which are both ranked 6; because the default treatment of ties is SKIP, the next rank value jumps to 8. With the DENSE clause, the next rank does not jump.

    RankXFunctionDENSE = 
    -- RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])  -- note: default treatment of ties is SKIP
    RANKX(SalesRankX2, SalesRankX2[Sales], , ASC, DENSE)

    In the next example we use RANKX() with RELATEDTABLE(). We start off by creating a sample of the DimCustomer table, joining it to the FactOnlineSales table, and then adding a ranking column for total sales to the new customer table. We then check the ranking by adding a total sales check column.

    Step 1: Create a sample customer table. In the code below, we create a simple selection of CustomerKey, FirstName, and LastName, and then filter the table to the first 10 customers by CustomerKey. So this gives us a table of the first 10 customers in the DimCustomer table. These data come from the Contoso Retail Data Warehouse.

    CustomerSimple = CALCULATETABLE(
        SELECTCOLUMNS(
            DimCustomer,
            "CustomerKey", DimCustomer[CustomerKey],
            "FirstName", DimCustomer[FirstName],
            "LastName", DimCustomer[LastName]
        ),
        TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
    )

    The table below is created (first 3 columns); then we add the SalesRank and CustomerTotalSalesCheck columns to demonstrate the use of the RANKX function with the RELATEDTABLE function.

    The code for the sales ranking is below. Then we add the CustomerTotalSalesCheck column to confirm that our ranking is working correctly.
    As you can see, it is. Rank 1 is allocated to total sales of £3,932. The last value is the blank, ranked 10. You will notice there is no rank 8, as the default behaviour is to SKIP a rank when there is a tie. We can change this by adding the DENSE clause.

    SalesRank = RANKX(CustomerSimple, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]))

    Add the CustomerTotalSalesCheck column:

    CustomerTotalSalesCheck = SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 

    Next, we add the DENSE argument so the ties don't cause a jump (optional):

    SalesRankDENSE = RANKX(CustomerSimple, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense)

    Creating a Measure using RANKX()
    Now we have some good examples of using the RANK and RANKX functions, we can use them in a measure, which creates a temporary table in memory.

    To create the measure we can use the DAX editor in Power BI, Tabular Editor, or DAX Studio. My preference is Tabular Editor, but the code will work in all three, allowing us to see the data as we build the measure. If you just build the measure blind, you can't be sure what's going on unless you build the table physically in Power BI, but physical tables are slower to work with.

    Here we borrow the code from the previous example creating a physical table, but ensure the table is set to DimCustomer in the Rank function.

    DEFINE
    VAR CustomerSample = 
    CALCULATETABLE(
        SELECTCOLUMNS(
            DimCustomer,
            "CustomerKey", DimCustomer[CustomerKey],
            "FirstName", DimCustomer[FirstName],
            "LastName", DimCustomer[LastName], 
    		"Rank", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
    	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
    		    ),
        TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
    )
    
    EVALUATE
    CustomerSample
    

    Here is the output of the code in DAX Query View (inside Power BI, I haven’t used tabular editor this time).

    As you can see, the top rank goes to the £3,932 total sales, as before.

    I can now, for example, filter the rankings table; in this example, I filter for ranks greater than 1 and less than 4.

    DEFINE
    VAR CustomerSample = 
    CALCULATETABLE(
        SELECTCOLUMNS(
            DimCustomer,
            "CustomerKey", DimCustomer[CustomerKey],
            "FirstName", DimCustomer[FirstName],
            "LastName", DimCustomer[LastName], 
    		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
    	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
    		    ),
        TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
    )
    
    VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
    
    EVALUATE
    TopThreeRankSales

    This gives me a new table:

    Next, we want to get the sum of the 2nd and 3rd-ranked sales. We can do this by using the GROUPBY function, but without grouping on any columns.

    DEFINE
    VAR CustomerSample = 
    CALCULATETABLE(
        SELECTCOLUMNS(
            DimCustomer,
            "CustomerKey", DimCustomer[CustomerKey],
            "FirstName", DimCustomer[FirstName],
            "LastName", DimCustomer[LastName], 
    		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
    	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
    		    ),
        TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
    )
    
    VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
    VAR SalesOfSecondAndThird = GROUPBY(TopThreeRankSales, "SalesOfSecondAndThird", SUMX(CURRENTGROUP(), [TotalSalesCheck]))
    
    EVALUATE
    SalesOfSecondAndThird

    This gives the following output:

    We should now be able to use the DAX in a measure, so we transfer the DAX code into a measure as follows:
    The code using DEFINE and EVALUATE is a DAX query, written in the DAX query editor. When we create a measure, we are creating a DAX expression; DAX expressions are converted to DAX queries when they are evaluated. Here is the code for creating the measure below.

    _SecondAndThirdTotalSales = 
    VAR CustomerSample = 
    CALCULATETABLE(
        SELECTCOLUMNS(
            DimCustomer,
            "CustomerKey", DimCustomer[CustomerKey],
            "FirstName", DimCustomer[FirstName],
            "LastName", DimCustomer[LastName], 
    		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
    	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
    		    ),
        TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
    )
    
    VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
    VAR SalesOfSecondAndThird = GROUPBY(TopThreeRankSales, "SalesOfSecondAndThird", SUMX(CURRENTGROUP(), [TotalSalesCheck]))
    
    RETURN
    SalesOfSecondAndThird

    We can then use the measure in a visualization as follows:

  • An Example of Creating a Sales Report with SQL

    We can create a more interesting sales report using window functions and additional calculations in the query, including the gross margin %, total profit, and ranks for product price, sales quantity, and sales amount. The query uses a Common Table Expression (CTE) to group the data first (as we can't group using window functions).

    This is the kind of report decision-makers will be interested in and as more questions come up, additional fields can be added.

    USE [ContosoRetailDW];
    -- Note: Qty * UnitPrice - Discount = TotalSalesAmount (small discrepancy)
    -- CTE Function
    
    WITH sub1 AS
    (
    SELECT p.ProductName, s.UnitCost, s.UnitPrice, SUM(SalesQuantity) as TotalSalesQuantity, 
    (s.UnitCost * SUM(SalesQuantity)) as TotalCost, SUM(DiscountAmount) as TotalDiscount,
    SUM(SalesAmount) as TotalSalesAmount
    FROM [dbo].[FactOnlineSales] as s
    LEFT OUTER JOIN [dbo].[DimProduct] as p ON s.[ProductKey] = p.[ProductKey]
    LEFT OUTER JOIN [dbo].[DimDate] d on s.DateKey = d.DateKey
    LEFT OUTER JOIN [dbo].[DimStore] st ON s.StoreKey = st.StoreKey
    WHERE 
    d.CalendarYear = 2007 AND d.CalendarMonthLabel = 'January'
    AND
    st.StoreName = 'Contoso Europe Online Store'
    GROUP BY  p.ProductName, s.UnitCost, s.UnitPrice
    )
    
    --Main Query referencing the CTE Function
    
    SELECT ProductName, UnitCost, UnitPrice, (UnitPrice-UnitCost)/UnitCost as Margin, 
    TotalSalesQuantity, format(TotalCost,'$0,,.0M') as TotalCost, format(TotalSalesAmount, '$0,,.0M') as TotalSales, 
    format(TotalSalesAmount-TotalCost, '$0,,.0M')  as TotalProfit,
    TotalDiscount,
    RANK() OVER(ORDER BY UnitPrice DESC) as PriceRank,
    RANK() OVER(ORDER BY TotalSalesQuantity DESC) as QtyRank,
    RANK() OVER(ORDER BY TotalSalesAmount DESC) as SalesRank
    FROM sub1
    ORDER BY SalesRank
  • Using Python for Simple Linear Regression

    Creating a simple Linear regression model and preparing for multi-linear regression.

    In this example, we use a sample of marketing spend data vs. sales and inspect the correlation between radio spend and total sales. The regression line is fitted using the ols function from statsmodels.formula.api.

    You can download the original file from Kaggle here, then just replace the file location in the df = pd.read_csv() line.

    
    import pandas as pd
    import seaborn as sns
    from statsmodels.formula.api import ols
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    pd.set_option('display.max_columns', None)
    df = pd.read_csv("marketing_sales_data.csv")
    
    # Drop rows with any null values
    df.dropna(inplace=True)
    
    # Check and handle duplicates if needed
    if df.duplicated().sum() > 0:
        df.drop_duplicates(inplace=True)
    
    #rename columns to snake 
    df.rename(columns = {'Social Media': 'Social_Media'}, inplace = True)
    
    # Simple order encoding
    tv_dict = {'Low': 1, 'Medium': 2, 'High': 3}
    df['TV'] = df['TV'].replace(tv_dict)
    
    # One-hot encoding for non-ordinal variable
    df = pd.get_dummies(df, columns=['Influencer'], dtype=int)
    
    
    # Define and fit the model
    ols_formula = "Sales ~ Radio"
    OLS = ols(formula=ols_formula, data=df)
    model = OLS.fit()
    summary = model.summary()
    print(summary)  #Prints off the statistical summary including R squared and the Beta coefficients.
    
    # Calculate residuals and predicted values
    residuals = model.resid
    y_pred = model.predict(df['Radio'])
    

    The results are returned from the model.summary() method of the fitted OLS (ordinary least squares) model from the statsmodels module. R squared is calculated as 0.757, meaning roughly 76% of the variability in y (sales) is accounted for by radio spend. However, if we look at other media, we will see that other variables (TV) also have a strong correlation.

    The beta coefficient for radio spend is 8.17, which means that for every $1 million in radio spend, we get an estimated $8.17 million in sales.
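    Written as the fitted equation (the intercept comes from the same summary output and is left symbolic here since its value isn't quoted above):

    Sales ≈ intercept + 8.17 × Radio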

  • How to check for model assumptions with Python Seaborn Graphics


    We need to check assumptions for models to give us confidence that our models have integrity and are not biased or overfitting the data.

    We check for three assumptions in this example, with sub-plotted seaborn graphics for a linear regression model.
    The code for creating the linear regression model can be found in this post
    You can run the code below once you have built the model. The model captures the relationship between radio advertising spend and sales.


    Graph 1: Checking the Linearity Assumption
    The linearity assumption is as follows: ‘Each predictor variable (x) is linearly related to the outcome of variable y.’
    In the first graph, we plot radio advertising spend against sales and can see there is a linear relationship, so we can conclude the linearity assumption is met.

    Graph 2: Checking Homoscedacity assumption with a scatterplot

    The homoscedasticity assumption (extra points if you can spell it correctly) concerns the residuals plotted against y_pred, the predicted y values from the regression line.

    In the second graph, we plot the residuals of the model, which are the difference between actuals and model forecasts.

    Homoscedasticity means that the residuals have equal or almost equal variance across the regression line.
    By plotting the error terms against the predicted values, we can check that there is no pattern in the error terms. Good homoscedasticity therefore shows as a balanced graph of residuals above and below zero.

    Graph 3: Check for Normality Assumption
    In the third graph, the histogram is used to plot the residuals of the regression line (the actual y values vs. the predicted y values) for x. If the model is unbiased, the residuals should be normally distributed (and we see that).

    The fourth graph is a Q-Q plot which is also used to check the normality assumption.

    fig, ax = plt.subplots(2, 2, figsize=(18, 10))
    
    fig.suptitle('Assumption Checks')
    
    #Check for linearity
    ax[0, 0] = sns.regplot(
        ax=ax[0, 0],
        data = df,
        x = df['Radio'],
        y = df['Sales'], 
        );
    ax[0, 0].set_title('Radio Sales')
    ax[0, 0].set_xlabel('Radio Spend ($K)')
    ax[0, 0].set_ylabel('Sales ($)')
    #ax[0].set_xticks(range(0,10,10))
    #ax[0].set_xticks(rotation=90)
    
    
    #Check for homoscedasticity
    # Plot residuals against the fitted values
    ax[0, 1] = sns.scatterplot( ax=ax[0, 1],x=y_pred, y=residuals)
    ax[0, 1].set_title("Residuals vs Fitted Values")
    ax[0, 1].set_xlabel("Fitted Values")
    ax[0, 1].set_ylabel("Residuals")
    ax[0, 1].axhline(0, linestyle='--', color='red')
    
    
    #Check for normality
    ax[1, 0] = sns.histplot(ax=ax[1, 0], x=residuals)
    ax[1, 0].set_xlabel("Residual Value")
    ax[1, 0].set_title("Histogram of Residuals")
    
    #Check for normality: Q-Q plot
    sm.qqplot(residuals, line='s', ax=ax[1, 1])
    ax[1, 1].set_title("Q-Q Plot")
    
    
    
    
    plt.show()
