Author: CB

  • Power BI Report Builder Report Page formatting basics

    While many people use Power BI, few ever try Report Builder, yet it is a handy tool for certain projects. If you have large volumes of detailed data that would take up too much space in your data model, or you find that DirectQuery is too slow, paginated reports can provide an alternative.

    In this example, we’ll grab some data from Kaggle. I want a big customer file that I can work with. In real life, you’re most likely to be importing this from a Data Warehouse, but the only difference here is the connection and import; all the methods of work will be the same once you’ve imported the data into your dataset.
    Here is the file I will use, entitled Bank Customer Segmentation 1M+ rows.
    https://www.kaggle.com/datasets/shivamb/bank-customer-segmentation

    The file we have is from a fictitious bank; the data has been randomly generated.
    On inspection in Excel, the file is 1,048,569 rows (just under Excel's display limit of 1,048,576 rows).

    report builder Transactions example


    You can download the report builder for free from here:

    https://www.microsoft.com/en-us/download/details.aspx?id=10594

    Open the report builder after you’ve installed it and select a blank report to get started:

    report builder creation

    Now, the first thing we want to do is get our data, so right-click on the Datasets folder and select 'Get Data'. You can log in to your Power BI account here; if you don't have one, there is a free trial. I'll let you figure that bit out.

    Once you’ve logged in, you’ll get the data source screen where you can select your data source. There are a whole host of data connectors to choose from, including data warehouses such as Google BigQuery, SQL Server, and Snowflake, as well as other tools such as Google Analytics. In this case, because I’m lazy, I will just use the CSV file that I downloaded, but the choice doesn’t matter for learning Report Builder.

    report builder data source

    On the next screen, you would normally enter your data connection information, but in this example there is the option of uploading a file at the top, below the connection settings.

    The main difference is that with a connection to a data warehouse, once you publish your report to the Power BI service, the report will refresh with the latest data when it is reloaded, whereas with an uploaded file I will need to upload the file again to get new data.

    report builder data source connection

    Once my file is uploaded, I get a preview of the data, and I can change a few things, like the delimiter, which is set to comma-delimited by default.

    importing data


    Next, I click on Transform Data, which opens a scaled-down version of Power Query compared to the one included with Power BI Desktop.

    report builder power query



    Here you can do things like add custom columns and format your data.
    Click Create to import your dataset from Power Query into the Report Builder design window.

    Your first dataset


    So let’s start by building a simple paginated report. I will use a simple table: Insert a table, and then I will drag my fields into the table.

    report builder adding a table

    Design and formatting the paginated report

    First, we need to turn on the properties window (View > Properties), where we can easily edit things like fonts and background colours.

    I got our mutual friend to create a logo for me, who also supplied the background colours in RGB format.

    For the logo, I will add it to the header of the report (Insert > Header). I add a rectangle and move the image into the rectangle, so I can control its properties. The design is now coming to life (Roar).

    Report builder adding a header

    I can preview the report to test how it works by selecting Home > Run:
    It takes a while to load as it loads the whole dataset, not page by page.

    Below is a screenshot of the report in preview mode. There are a million rows, so it breaks the report across pages by default: it paginates the report. The logo in the header displays on every page, but the column headers do not, so I need to fix that.

    Report builder preview report

    Fixing Row Headers in place with Power BI Report Builder

    To fix the row headers in place, we need to do the following:
    Select the table (tablix) and enable advanced mode in the bottom-right corner of the Grouping pane. The row and column groups will display at the bottom of the screen.

    manage rows in report builder


    In the row groups box (bottom left), I can control the positioning of the rows during pagination and scrolling.
    I set the following for the Title row and the headers row:

    FixedData = True. This will fix the row in position when I scroll.
    KeepWithGroup = After. This stops the row from breaking away from the row below it.
    RepeatOnNewPage = True. This repeats the row on each page.

    I set these on both the title row and the headers row (click on the static member for each, and the properties will show on the right). To turn on the Properties pane, click View > Properties on the main ribbon.

    Now, when I scroll, the header and title are fixed at the top of the page and also stay in place when I navigate pages.

    fixing the top row

    Now we have a basic report working, let’s publish it to the service. Remember, if we had a connection to a data warehouse table, the report would grab the latest data for us, but for now, we’re lazy and just using a static file that won’t change.

    To do so, select File > Publish and choose the workspace you want to publish to:

    Once I’ve published my report, I can view it in the service and give others access:
    The report takes a while to load the 1M-plus rows. Once it’s loaded, you can see some of the features available, such as export to Excel, which is a popular option. You can also set up a subscription so the report will run on a schedule and be emailed to you, which can be a useful option for some reports.

    exporting to excel
    subscribing to a report
  • SQL Window Functions

    Summary


    1. Aggregate Window Functions
    These functions perform calculations across a set of table rows that are somehow related to the current row.

      SUM() OVER(…) – Running total or sum per partition
      AVG() OVER(…) – Average per partition
      COUNT() OVER(…) – Count rows in partition
      MIN() OVER(…) – Minimum value in partition
      MAX() OVER(…) – Maximum value in partition

      2. Ranking Window Functions
      These assign a rank or row number to each row within a partition.

        ROW_NUMBER() OVER(…) – Unique sequential number per row in partition
        RANK() OVER(…) – Rank with gaps for ties
        DENSE_RANK() OVER(…) – Rank without gaps for ties
        NTILE(n) OVER(…) – Divides partition into n buckets and assigns a bucket number

        3. Value Navigation (Analytic) Functions
        These functions return values from other rows in the window frame relative to the current row.

          LAG(expression, offset, default) OVER(…) – Value from a previous row
          LEAD(expression, offset, default) OVER(…) – Value from a following row
          FIRST_VALUE(expression) OVER(…) – First value in the window frame
          LAST_VALUE(expression) OVER(…) – Last value in the window frame
          NTH_VALUE(expression, n) OVER(…) – The nth value in the window frame


          Examples


          1. Aggregate Window Functions

          These perform calculations over a window of rows but return a value for each row.

          Name    Sales  Date
          Alice   100    2025-01-01
          Alice   150    2025-01-03
          Bob     200    2025-01-01
          Alice   50     2025-01-05
          Bob     300    2025-01-04


          SUM() OVER()

          SELECT
            Name,
            Sales,
            SUM(Sales) OVER (PARTITION BY Name ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
          FROM SalesData;


          The window function SUM(Sales) OVER (...) calculates a running total of sales for each Name.

          The window is partitioned by Name (so calculations are done separately for each person).

          The rows are ordered by Date within each partition.

          The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means the sum includes all rows from the first date up to the current row’s date.

          Output:

          Name    Sales  RunningTotal
          Alice   100    100
          Alice   150    250
          Alice   50     300
          Bob     200    200
          Bob     300    500


          AVG() OVER()

          customer_id  sales
          101          200
          102          150
          101          300
          103          400
          102          250
          101          100
          SELECT 
          customer_id, 
          sales, 
          AVG(sales) OVER (PARTITION BY customer_id) AS AvgSales 
          FROM orders;
          • The window function AVG(sales) OVER (PARTITION BY customer_id) calculates the average sales for each customer_id.
          • The average is computed over all rows with the same customer_id.
          • The result is shown on every row corresponding to that customer.
          customer_id  sales  AvgSales
          101          200    200
          101          300    200
          101          100    200
          102          150    200
          102          250    200
          103          400    400


          COUNT(*) OVER

          department  employee_id
          Sales       101
          Sales       102
          HR          201
          HR          202
          HR          203
          IT          301
          SELECT
            department,
            employee_id,
            COUNT(*) OVER (PARTITION BY department) AS DeptCount
          FROM employees;
          • The window function COUNT(*) OVER (PARTITION BY department) counts the total number of employees in each department.
          • This count is repeated on every row for that department.
          • No ordering is required here because the count is the same for all rows in the partition.

          department  employee_id  DeptCount
          Sales       101          2
          Sales       102          2
          HR          201          3
          HR          202          3
          HR          203          3
          IT          301          1


          2. Ranking Functions

          Assign ranks or row numbers within partitions.

          ROW_NUMBER() — Assigns unique row numbers

          employee_id  department  salary
          101          Sales       5000
          102          Sales       7000
          103          Sales       6000
          201          HR          4500
          202          HR          4800
          301          IT          8000

          SELECT
            employee_id,
            salary,
            ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS RowNum
          FROM employees;

          employee_id  salary  RowNum
          102          7000    1
          103          6000    2
          101          5000    3
          202          4800    1
          201          4500    2
          301          8000    1

          The window function ROW_NUMBER() assigns a unique sequential number to each row within the partition defined by department.

          Rows are ordered by salary in descending order within each department.

          The highest salary in each department gets RowNum = 1, the next highest gets 2, and so on.

          RANK() — Assigns rank, with gaps for ties

          employee_id  department  salary
          101          Sales       7000
          102          Sales       7000
          103          Sales       6000
          201          HR          4800
          202          HR          4500
          301          IT          8000
          302          IT          8000
          303          IT          7500
          SELECT
            employee_id,
            salary,
            RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS Rank
          FROM employees;

          The RANK() function assigns a rank to each employee within their department, ordering by salary descending.

          Employees with the same salary get the same rank.

          The next rank after a tie skips the appropriate number of positions (i.e., gaps in ranking).

          employee_id  salary  Rank
          101          7000    1
          102          7000    1
          103          6000    3
          201          4800    1
          202          4500    2
          301          8000    1
          302          8000    1
          303          7500    3


          DENSE_RANK() — Like RANK but no gaps

          employee_id  department  salary
          101          Sales       7000
          102          Sales       7000
          103          Sales       6000
          201          HR          4800
          202          HR          4500
          301          IT          8000
          302          IT          8000
          303          IT          7500
          SELECT
            employee_id,
            salary,
            DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS DenseRank
          FROM employees;
          • DENSE_RANK() assigns ranks within each department, ordered by salary descending.
          • Employees with the same salary get the same rank.
          • Unlike RANK(), DENSE_RANK() does not skip ranks after ties; the next distinct value gets the next consecutive rank.
          employee_id  salary  DenseRank
          101          7000    1
          102          7000    1
          103          6000    2
          201          4800    1
          202          4500    2
          301          8000    1
          302          8000    1
          303          7500    2
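
          NTILE(n) – Divides partition into n buckets

          NTILE() from the summary is not shown above, so here is a minimal sketch using the same employees table, assuming we want to split each department into two salary buckets:

          SELECT
            employee_id,
            salary,
            NTILE(2) OVER (PARTITION BY department ORDER BY salary DESC) AS SalaryHalf
          FROM employees;

          Each department's rows are split into 2 buckets in salary order; when a partition does not divide evenly (e.g. the three HR rows), the earlier buckets get the extra rows, so HR would have two rows in bucket 1 and one in bucket 2.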



          3. Value Navigation Functions

          Access values from other rows relative to the current row.

          LAG() — Value from previous row

          employee_id  department  salary
          101          Sales       5000
          102          Sales       6000
          103          Sales       7000
          201          HR          4500
          202          HR          4800
          301          IT          8000
          SELECT
            employee_id,
            salary,
            LAG(salary, 1) OVER (PARTITION BY department ORDER BY salary) AS PrevSalary
          FROM employees;

          LAG(salary, 1) returns the salary value from the previous row within the same department, based on the ascending order of salary.

          For the first row in each department, there is no previous salary, so the result is NULL.

          employee_id  salary  PrevSalary
          101          5000    NULL
          102          6000    5000
          103          7000    6000
          201          4500    NULL
          202          4800    4500
          301          8000    NULL



          LEAD() — Value from next row

          employee_id  department  salary
          101          Sales       5000
          102          Sales       6000
          103          Sales       7000
          201          HR          4500
          202          HR          4800
          301          IT          8000

          SELECT
            employee_id,
            salary,
            LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary) AS NextSalary
          FROM employees;


          LEAD(salary, 1) returns the salary value from the next row within the same department, based on the ascending order of salary.

          For the last row in each department, there is no next salary, so the result is NULL.


          employee_id  salary  NextSalary
          101          5000    6000
          102          6000    7000
          103          7000    NULL
          201          4500    4800
          202          4800    NULL
          301          8000    NULL


          LAST_VALUE() — Last value in the window frame

          employee_id  department  salary
          101          Sales       5000
          102          Sales       7000
          103          Sales       6000
          201          HR          4500
          202          HR          4800
          301          IT          8000

          SELECT
            employee_id,
            salary,
            LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS MaxSalary
          FROM employees;
          
          • LAST_VALUE(salary) returns the last salary value in the ordered partition.
          • The window frame ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ensures the entire partition is considered (not just up to the current row).
          • When ordered by salary ascending, the last value is the highest salary in the department.
          employee_id  salary  MaxSalary
          101          5000    7000
          103          6000    7000
          102          7000    7000
          201          4500    4800
          202          4800    4800
          301          8000    8000
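
          FIRST_VALUE() – First value in the window frame

          FIRST_VALUE() from the summary is not shown above either; here is a minimal sketch on the same employees table. Ordered by salary ascending, the first value in each department's window is its lowest salary:

          SELECT
            employee_id,
            salary,
            FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary) AS MinSalary
          FROM employees;

          NTH_VALUE(expression, n) works in the same way for the nth row of the frame, although not every database supports it.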

        1. Testing Power BI DAX Measures Speed with DAX Studio

          The execution flow: DAX expression > DAX query > Vertipaq query.

          Test 1: Testing the Impact of USERELATIONSHIP() in a DAX Measure

          Does using USERELATIONSHIP() with an inactive relationship slow down a DAX measure? I’m not sure, so I thought I would test it out.

          To test this, I create a measure using the existing active relationship between my date table and my FactOnlineSales table. Then I create a copy of the date table and join it to FactOnlineSales in the same way, but make the relationship inactive.

          I use DAX Studio to create 2 measures for calculating total sales, and then I throw in some filters for year and continent as follows:

          DEFINE
              MEASURE '_Measures'[test1] =
                  CALCULATE (
                      SUMX (
                          'FactOnlineSales',[Sales Amount]
                      ),
                      
                      'DimDate'[CalendarYear] = 2009,
                      'DimGeography'[ContinentName] = "Europe"
                      ) 
          
              MEASURE '_Measures'[test2] =
                  CALCULATE (
                      SUMX (
                          'FactOnlineSales',[Sales Amount]
                      ),
                      'DimDate (2)'[CalendarYear] = 2009,
                    	'DimGeography'[ContinentName] = "Europe",
                      USERELATIONSHIP('DimDate (2)'[Datekey], FactOnlineSales[DateKey])
                  )
          EVALUATE
          ROW (
              "test", [test2] 
          )

          Below is a screenshot of my DAX Studio settings. As you can see, I have the ‘Clear cache on Run’ button selected, which helps ensure each test starts from the same point. I also have the ‘Server Timings’ button selected. Note: be careful using ‘Clear cache on Run’ on a connection to a production model, as it will clear the cache (and hence slow down reports).

          For the original sales measure, which has an active relationship to the date table, I get a total time of 2,197 ms, broken down into 624 ms for the FE (formula engine) and 1,573 ms for the SE (storage engine, i.e. the Vertipaq engine). The FE turns the DAX expression into a DAX query and creates a query plan that it sends to the SE to retrieve the results. The FE also caches query results (this is why I have turned on ‘Clear cache on Run’; otherwise subsequent queries would return results very quickly).

          Comparing the 2 measures
          Here we run the query on each measure 5 times to get an average, and as you can see there was very little difference in performance. Indeed, the first 3 runs of the measure with the active relationship took longer (but this may have been because my laptop was busy doing something else). My results are below; I think they are pretty inconclusive, but they are interesting nonetheless, and the method would be worth repeating on a larger model.


          In the Server Timings you can view the Vertipaq query that is executed by Analysis Services:

          Each step of the execution is below:
          The first step selects the data from FactOnlineSales with the 2 filters, returning 1.4M rows of data (133 KB).
          The second step does the sum of the sales amount and returns one row. The third step returns the start and end time as well as other information (if you mouse over the query).

          The Par. field stands for Parallelism, which is the number of parallel threads used by the storage engine to retrieve the result at each step.

        2. Analysing the Power BI Data Model to Improve Refresh Rates with DAX Studio


          I’ll start with some references:

          The Definitive Guide to DAX book by Marco Russo and Alberto Ferrari (the definitive experts).
          The Guy in the Cube video on slow model refresh:
          ChatGPT (I don’t know who that is!)
          My own testing, experience, and a demo Power BI model using data from the Contoso Retail data warehouse.

          First things first: connecting DAX Studio to your data model in the Power BI service
          1. Get the XMLA endpoint URL from Power BI by selecting Workspace settings > License info:
          You will need to log in with your Power BI login to connect.


          2. You can connect to your model in the service with DAX Studio as follows (you may need to download and install it first). You can open it either independently or from the External Tools ribbon within Power BI Desktop.



          File and Model Size
          Typical models use data imported into physical storage from a data warehouse and potentially other sources, such as Google Analytics. The Vertipaq engine stores the model in virtual memory. My Contoso Retail Power BI file is 193 MB (that’s not very big). It is compressed to reduce storage requirements.

          Model Size in DAX Studio (Advanced > View Metrics > Summary)
          When I look at the model summary from the advanced Metrics in DAX studio, the model loaded into virtual memory by the Vertipaq engine is 253 MB. Also of interest are the 37 tables and 366 columns. My model only has 8 tables and 1 calculated table, so it will be interesting later to understand what these other tables are.



          So why is this different?

          In-memory Compression: Power BI compresses data in the .pbix file to reduce storage, so the file size on disk is often smaller than its in-memory model size. However, once loaded, Power BI expands it for efficient query performance. There is still compression (next section), it’s just not as much as in the stored file.

          Metadata Overhead: The model size shown in DAX Studio includes metadata, indexes, dictionaries, and structures needed for DAX calculations and relationships that Power BI uses while executing queries, which aren’t directly reflected in the file’s storage size.

          Cache and Temp Data: DAX Studio may include caches or other temporary data generated during analysis, which increases the apparent size of the model in memory.

          Unused Columns or Tables: Any tables or columns loaded but not used may also contribute to the model’s size in DAX Studio, while Power BI might not fully load them in the saved file.

          Note: According to the model settings, the model is not using caching (which is off by default), so I can rule out one of the four possibilities.

          Power BI License requirements based on model size

          The Power BI Pro license allows for models up to 1GB in size, 10 GB of native storage, and 8 refreshes a day
          The Power BI Premium license allows for models up to 100GB in size and 100 TB of native storage and up to 48 refreshes a day.

          Note: Beyond just loading the raw model, Power BI needs additional memory overhead for processing, caching, and temporary storage during data refreshes and calculations. This means that even if your model fits within the 1 GB limit, you’re likely to experience slow performance or even errors during complex operations, especially if near that cap.

          Semantic Model Refresh Speed (Vertipaq engine: imported tables)

          We can easily access the model refresh times in the Power BI workspace (select ‘Refresh history’ from the three-dots menu next to the semantic model). This gives us our total model refresh benchmarks, which we can average, removing outliers from known busy times as required.

          So what are the stages of the model refresh and which ones can we troubleshoot and optimize?

          1. Data import time: this one we can measure. Adam from the Guy in the Cube video gives us a simple suggestion on how to benchmark our tables’ load times: simply run a SELECT * on each table in the data warehouse and record the time (repeat the process at different times of day, perhaps removing outliers at busy times, and average). There is also network latency to consider with large volumes of data, but let’s just do: total refresh time – total SELECT * time.

          Here is an example from my Contoso data warehouse tables:
          1. DimCurrency: time 0 s, rows: 28
          2. DimCustomer: time 0 s, rows: 18,869
          3. DimDate: time 0 s, rows: 648
          4. DimGeography: time 0 s, rows: 274
          5. DimProduct: time 0 s, rows: 2,467
          6. DimProductCategory: time 0 s, rows: 8
          7. DimProductSubcategory: time 0 s, rows: 48
          8. DimStore: time 0 s, rows: 306
          9. FactOnlineSales: 2 mins 49 secs, rows: 12.6M

          As you can see, only the FactOnlineSales table took any real time to SELECT.
          But then you might say: hey, why are you using tables in your model and not views? (If a new column appears in a table, the model refresh will fail.)
          Fair point, in which case you can run the same tests on your views; a minimal T-SQL sketch is below.
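
          As a rough sketch of how those SELECT * benchmarks could be captured on SQL Server (assuming the Contoso tables live in the dbo schema), you can have the engine report the elapsed time of each statement:

          USE [ContosoRetailDW];
          SET STATISTICS TIME ON;

          SELECT * FROM [dbo].[DimCustomer];
          SELECT * FROM [dbo].[FactOnlineSales];
          -- repeat for the remaining tables

          SET STATISTICS TIME OFF;

          The elapsed times appear in the Messages output; run the batch at a few different times of day and average the results, as suggested above.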

          Now, I don’t have a gateway set up to connect to my laptop today, so I would need to set something up in the cloud to perform the test. But suppose the model refreshed in 4 minutes and your SELECT * benchmarks totalled 2 minutes 49 seconds; then for 1 minute and 11 seconds the Power BI service is doing something else.

          So what happens once the data has been imported? Well, that’s when we need to get into the Vertipaq engine. Note that the engine is proprietary and so has secrets we will never learn, but what we do know about the steps it performs is as follows:

          Vertipaq engine steps
          1. Loads the tables into columns
          2. Checks each column for the best sort order
          3. Applies compression to create the smallest, fastest model
          4. Loads the model into memory

          Note: the Vertipaq engine only deals with imported data. Direct Queries are processed by the Direct Query engine.

          DAX formula engine
          1. Creates calculated columns and tables (in memory) – these are not compressed as well as columns that went through Vertipaq
          2. Creates relationships between columns

          So we need to learn more about what these two engines do.

          The Vertipaq Engine
          1. Stores imported data into a columnar database (for faster data retrieval).

          2. Sorts the columns into the best order for compression techniques (such as RLE)

          3. Compresses the columns using various (proprietary techniques), such as:

          a) Value encoding (e.g. deducting a constant value from each column entry to create a value that needs fewer bits and hence less space).

          b) Hash encoding (dictionary encoding): creating a table, similar to a normalization technique, where an ID and the value are stored in an in-memory dictionary (e.g. product sizes 0: small, 1: medium, 2: large). The values 0, 1, and 2 then replace small, medium, and large, resulting in fewer bits being stored in the table.

          c) Run Length Encoding (RLE): when values repeat in a column, e.g. entries 1–2000 are all Q1, this can be stored in a small table as Q1: 2000. If entries 2001 to 3500 are Q2, this can be stored as Q2: 1500, and so on.
          Using this technique the Vertipaq engine can compute results very quickly. Sorting is therefore very important for RLE-type compression. The engine does its own sorting, but you can potentially help it by sorting the data by the columns with the lowest cardinality first.
          Factors impacting compression are therefore as follows:
          1. Cardinality (the more unique values, the less compression can be done, hence increasing the size of the column in memory).
          2. Number of repetitive values – the more there are, the better the RLE compression possible.
          3. The number of rows matters, but cardinality and repetitive values are more important for compression.
          4. The data type – e.g. automatic date/time should be turned off, as it creates a larger entry than the date field alone. The column will be smaller, and if removing the time component results in much lower cardinality, the column can be compressed much further using either RLE or hash encoding.

          Primary Keys such as user ID integer types will most likely use value encoding.
          Numbers such as sales amounts that don’t have massive outliers will probably also be value encoded otherwise dictionary encoding may be used.

          To determine the best sort order, a sample of the first x rows (I think it’s a million) is taken, so it’s important that those first rows are of good quality. You can therefore help the Vertipaq engine by ensuring your initial data is representative of the whole dataset; otherwise the engine may have to re-encode data, e.g. if it finds outliers and its value encoding no longer works (hence slowing down the process).

          We can look at the type of compression used, by using DAX Studio (Advanced > View Metrics)


          Firstly, you’ll see an overview of the tables. Each column in a table is encoded differently, so you’ll see the encoding type as ‘Many’.
          You need to drill down to get to the column level (or you could just select the column tab on the left-hand side).

          Now, let’s have a look at the FactOnlineSales table.

          The table is sorted by default by the column size. This is the size in Bytes that the column is consuming in memory.

          The first column, OnlineSalesKey, has 12,627,608 rows and a cardinality of 12,627,608; as the name suggests, it is a unique key, hence it has the same cardinality (unique values) as rows.
          It is also the largest column, consuming 84 MB of memory and 31.77% of the database.
          It is encoded as VALUE type.

          Rows: 12,627,608
          Cardinality: 12,627,608
          Col Size: 84,156,128 (Data + Dictionary + Hier Size)
          Data: 33,645,560
          Dictionary: 128
          Hier Size: 50,510,440
          % Table: 33.07%

          Unique keys are generally required fields in most tables, so they are not usually somewhere you can save space by optimizing (unless you can reduce the size of the key somehow).

          By applying the same analysis you can determine which columns and tables are having the biggest impact on your model size and hence slowing your model refresh time.

        3. How to use the Window and OFFSET Functions in DAX Measures:


          The Window Function

          To use the Window function, we can start by creating a virtual table of customer sales by sales key using the CALCULATETABLE function and then creating a window on the table to return only the results we want by position. We can make use of DAX query to do this.

          
          DEFINE
          
          VAR CustomerSales =  CalculateTable(SUMMARIZECOLUMNS('V_FactOnlineSales'[CustomerKey], "Sales", SUMX('V_FactOnlineSales', V_FactOnlineSales[SalesAmount])), NOT(ISBLANK(V_FactOnlineSales[CustomerKey])))
          
          VAR CustomerSalesWindow = 
          
              WINDOW(
                  1, ABS, 3, ABS,
                  CustomerSales,
                  ORDERBY([Sales], DESC)
              )
              
          EVALUATE
          CustomerSalesWindow


          As you can see from the code above, we start off by creating a table in a variable called CustomerSales, which has a column for CustomerKey and a summarized sales column. The table is filtered, so that the customer is not blank.

          We then create a second variable called ‘CustomerSalesWindow’ to filter our CustomerSales table using the Window function.
          The initial customer sales table is as follows

          We then create the window using the following code:

          
          VAR CustomerSalesWindow = 
          
              WINDOW(
                  1, ABS, 3, ABS,
                  CustomerSales,
                  ORDERBY([Sales], DESC)
              )
          
          EVALUATE
          CustomerSalesWindow


          As you can see, the original table is sorted by sales in descending order (we could also use a partition, but not in this example).

          The syntax for the window function is as follows:
          WINDOW ( from[, from_type], to[, to_type][, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )

          So you can see that we have:
          WINDOW(
          from: 1
          from_type: ABS
          to: 3
          to_type: ABS
          relation: CustomerSales
          ORDERBY([Sales], DESC))


          As you can see, ORDERBY and PARTITIONBY are very similar to their SQL counterparts; here we are sorting the Sales field in descending order.
          The positions from ABS 1 to ABS 3 give us the top 3 sales results.

          Of course, we could have used something like the TOPN() function to get the top 3 sales, but if we wanted a specific range of positions, e.g. the 3rd to 5th, then the WINDOW() function makes it much easier.

          The OFFSET() Function

          The OFFSET() function ‘Returns a single row that is positioned either before or after the current row within the same table, by a given offset’.

          In this example, we use OFFSET() to display the previous month’s sales in a table. We start off by creating a _SummarySales table and then add a second column, _SalesOffset, which uses -1 as the delta; this shows the previous month’s sales when the ORDERBY clause on CalendarMonth is set to ascending order (ASC). We can then add the final SalesMonthOnMonth column to the table to get the final summary.

          DEFINE
          	-- Syntax:
          	-- OFFSET ( <delta>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
          	
          	-- Create Sales by month table
          	
          	VAR _SummarySales = SUMMARIZECOLUMNS(
          		DimDate[CalendarMonth],
          		"Sales", SUMX(
          			RELATEDTABLE(FactOnlineSales),
          			FactOnlineSales[SalesAmount]
          		)
          	)
          
          -- Add the offset column by one month
          
          	VAR _SalesOffset =
          	ADDCOLUMNS(
          		_SummarySales,
          		"PreviousMonth", SELECTCOLUMNS(
          			OFFSET(
          				-1,
          				_SummarySales,
          				ORDERBY(
          					[CalendarMonth],
          					ASC
          				)
          			),
          			[Sales]
          		)
          	)
          
          -- Add a month on month difference column
          
          	VAR _SalesMonthOnMonth =
          	ADDCOLUMNS(
          		_SalesOffset,
          		"Month on Month", [Sales] - [PreviousMonth]
          	)
          
          EVALUATE
          	_SalesMonthOnMonth

          After adding the previous month column, we can then add a month-on-month column.
          This gives us a table as follows:



        4. How to filter DAX Measures In Power BI

          The application of various filter arguments is crucial to controlling the output of DAX measures when visual filters are present on a report page. When CALCULATE() is used in an expression, any filter argument it applies will override any existing filter on the same column.
          e.g. CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware")
          Any existing filter on V_DimProduct[BrandName] is removed and the new filter is applied, so a slicer on that column would stop working.

          In this example, we create a number of measures in a matrix, using different filter functions. There are 3 slicers that can be applied to the matrix: Year, BrandName, and ProductCategoryName. The original simple measure is Sales Amount, which is just a sum of SalesAmount from the FactOnlineSales table in the Contoso Retail data warehouse.



          Simple SUM Measure:
          The original measure, with no filter arguments, will just be affected by the current filter context applied by the slicers and the matrix or table rows and columns.
          Sales Amount =
          SUM(V_FactOnlineSales[SalesAmount])

          CALCULATE(): Using the powerful Calculate() function, we can apply filters to the original measure to change the way the visual filters affect the measure.

          Sales Amount (Filter Brand Litware) = CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware")

          In the next example, we apply a filter using calculate, but on the year, which is a filter in the matrix. As you can see, the filter on year is removed from the columns and the sum of sales for 2008 is repeated for each year from 2007 to 2009.

          Sales Amount (Filter Year 2008) = CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear]=2008)

          To fix this issue we can use KEEPFILTERS()

          KEEPFILTERS(): The KEEPFILTERS() function helps us keep the filters on Year in the matrix:
          Sales Amount (KeepFilters) =

          CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear]=2008, KEEPFILTERS(V_DimDate))

          REMOVEFILTERS(): By using the REMOVEFILTERS() function along with CALCULATE() we can remove all the filters applied to the report. As you can see in the table above, this removes the filters from the Year, Brand, and Product Category Name columns. Essentially giving us the total sales for the company for all time.

          Sales Amount (RemoveFilters: all) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS())

          REMOVEFILTERS('TableName'): We can also use the REMOVEFILTERS() function to remove filters only from a specific table (not all tables). In this example, we remove any filters on the V_DimDate table; other filters will continue to filter the measure:

          Sales Amount (RemoveFilters: Table) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS(V_DimDate))

          REMOVEFILTERS('TableName'[ColumnName]): A more granular method is to remove filters just on specific columns. In this example, we remove any filtering applied by 'V_DimProductCategory'[ProductCategoryName].

          Sales Amount (RemoveFilters: Column) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS('V_DimProductCategory'[ProductCategoryName]))

          ALL('TableName'): Removes all filters from a specified table. Inside CALCULATE this behaves much like REMOVEFILTERS('TableName'), but unlike REMOVEFILTERS, ALL can also be used on its own as a table expression.

          Sales Amount (All: DimProductCategory Table) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALL(V_DimProductCategory))

          ALL('TableName'[ColumnName]): Removes all filters from the specified column or columns; for example, ALL(V_DimDate[CalendarYear]) would remove only any filter on the year column.

          ALLEXCEPT(): Removes all filters except those on the specified column or columns. In this example, we remove all filters except the one on the 'V_DimProductCategory'[ProductCategoryName] column.

          Sales Amount (AllExcept: ProductCategoryName) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALLEXCEPT(V_DimProductCategory, 'V_DimProductCategory'[ProductCategoryName]))

          ALLNOBLANKROW(): If your data has a blank row in it, then this may be useful to you; if it doesn't, you don't need it. You can apply it to a table or to columns:

          SalesTableRows (non blank) = COUNTROWS(ALLNOBLANKROW('V_FactOnlineSales'))
          SalesOrderLines (non blank) = COUNTROWS(ALLNOBLANKROW('V_FactOnlineSales'[SalesOrderLineNumber]))

          ALLSELECTED('TableName'): Removes filters coming from within the visual. So in this example, ALLSELECTED() removes the filters on year within the matrix (i.e. the columns), but a slicer on year will still work. You can also apply it to specific columns.

          Sales Amount (AllSelected DimDate) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALLSELECTED(V_DimDate))

          Using multiple Tables and Columns in Filter Expressions


          The above use of filter functions affecting visuals can be further expanded to include multiple tables and multiple columns as in the examples below:

          CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimProduct[BrandName]="Litware" || V_DimProduct[BrandName]="Northwind Traders")

          Sales Amount (Filter Multiple Year) =

          CALCULATE(SUM(V_FactOnlineSales[SalesAmount]), V_DimDate[CalendarYear] IN {2008, 2009})

          Sales Amount (RemoveFilters multiple) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), REMOVEFILTERS('V_DimProductCategory'[ProductCategoryName], 'V_DimProductCategory'[ProductCategoryDescription]))

          Sales Amount (All: multiple) = CALCULATE(SUMX(V_FactOnlineSales, V_FactOnlineSales[SalesAmount]), ALL(V_DimProductCategory), ALL(V_DimDate))

          Recommended Book: The Definitive Guide to DAX by Marco Russo and Alberto Ferrari













        5. How to Create Multiple Measures With Power BI DAX Query Editor

          Creating multiple measures in DAX Query Editor

          It can be useful to create templates of commonly used DAX measures for building DAX models. Here we create a template of common sales measures, such as Sales YTD, Sales Last Year, Sales Year on Year, and Sales Year on Year %; we can then apply the same logic for QTD and MTD.

          The full code for creating the measures is at the bottom of this page.

          For this example, I am using just 2 tables from the Contoso Retail Data warehouse: DimDate and FactSales. They are joined on the DateKey.

          We start with the Year sales measures as shown below in the DAX query editor. To add them to the model, we just click ‘Update model: Add new measure’, but first we want to format the code, using the Format Query button.

          Here is the code, with the DAX formatted.

          We can then click the 4 ‘Update model: Add new measure’ links and it will add the 4 measures to the model.


          We can then create similar measures for QTD and MTD as follows:

          Here is the code for the Quarterly measures:

          The code for creating the Monthly measures is as follows:

          That gives me the 12 measures in record time!


          As promised, here is the full code that can be copied and pasted. Of course, you’ll need to change the table names as required. Note that I have created an empty ‘_Measures’ table to act as a container for the measures.

          // Learn more about DAX queries at https://aka.ms/dax-queries
          DEFINE
          //Year Measures
          	MEASURE '_Measures'[Sales_YTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			DATESYTD('DimDate'[Datekey])
          		)
          	MEASURE '_Measures'[Sales_LY_YTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			SAMEPERIODLASTYEAR(DATESYTD('DimDate'[Datekey]))
          		)
          	MEASURE '_Measures'[Sales_YOY] = '_Measures'[Sales_YTD] - '_Measures'[Sales_LY_YTD]
          	MEASURE '_Measures'[Sales_YOY%] = ('_Measures'[Sales_YTD] - '_Measures'[Sales_LY_YTD]) / '_Measures'[Sales_LY_YTD]
          	
          //QTD Measures
          	MEASURE '_Measures'[Sales_QTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			DATESQTD('DimDate'[Datekey])
          		)
          	MEASURE '_Measures'[Sales_LY_QTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			SAMEPERIODLASTYEAR(DATESQTD('DimDate'[Datekey]))
          		)
          	MEASURE '_Measures'[Sales_QTD_YOY] = '_Measures'[Sales_QTD] - '_Measures'[Sales_LY_QTD]
          	MEASURE '_Measures'[Sales_QTD_YOY%] = ('_Measures'[Sales_QTD] - '_Measures'[Sales_LY_QTD]) / '_Measures'[Sales_LY_QTD]
          	
          	//MTD Measures
          	MEASURE '_Measures'[Sales_MTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			DATESMTD('DimDate'[Datekey])
          		)
          	MEASURE '_Measures'[Sales_LY_MTD] = CALCULATE(
          			SUM(FactSales[SalesAmount]),
          			SAMEPERIODLASTYEAR(DATESMTD('DimDate'[Datekey]))
          		)
          	MEASURE '_Measures'[Sales_MTD_YOY] = '_Measures'[Sales_MTD] - '_Measures'[Sales_LY_MTD]
          	MEASURE '_Measures'[Sales_MTD_YOY%] = ('_Measures'[Sales_MTD] - '_Measures'[Sales_LY_MTD]) / '_Measures'[Sales_LY_MTD]
        6. How to use the Index Function in DAX Measures

          Demonstrating the use of the INDEX function with the Contoso Retail data warehouse. We can start off by building a virtual table in the DAX query editor, which we can use to apply the INDEX function. The table created is a list of the first 10 customers by CustomerKey from the DimCustomer table.

          DEFINE
          	-- Syntax
          	-- INDEX(<position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
          
          	VAR CustomerSalesSample =
          	CALCULATETABLE(
          		SELECTCOLUMNS(
          			DimCustomer,
          			"CustomerKey", DimCustomer[CustomerKey],
          			"FirstName", DimCustomer[FirstName],
          			"LastName", DimCustomer[LastName],
          			"Total Sales", SUMX(
          				RELATEDTABLE(FactOnlineSales),
          				FactOnlineSales[SalesAmount]
          			)
          		),
          		TOPN(
          			10,
          			DimCustomer,
          			DimCustomer[CustomerKey],
          			ASC
          		)
          	)
          
          VAR Ind = INDEX(
          		1,
          		CustomerSalesSample,
          		ORDERBY(
          			[Total Sales],
          			DESC
          		)
          	)
          EVALUATE
          	Ind

          If we evaluate the CustomerSalesSample table first we can see the table we are working with.

          The syntax for the INDEX function is as follows:

          — INDEX(<position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
          In our example, we use the <position> of 1 to get the first sales amount, which is the highest sales amount as the Total Sales column is sorted in descending order (DESC)

          And then when we evaluate the Index expression (variable Ind), we get the following, which is the correct output we are looking for.

          To get the last position, we could either sort the data in ascending order (ASC) or use an index of -1, as in the following example:

          VAR Ind = INDEX(
          		-1,
          		CustomerSalesSample,
          		ORDERBY(
          			[Total Sales],
          			DESC
          		)
          	)

          When we use -1 as the index, we get the blank sales row returned, which isn’t what we wanted, so we need to modify the code.

          One way of filtering out the blanks is to add a filter on the table as an additional variable, as in the code below.

          DEFINE
          	-- Syntax
          	-- INDEX(<position>[, <relation> or <axis>][, <orderBy>][, <blanks>][, <partitionBy>][, <matchBy>][, <reset>] )
          
          	VAR CustomerSalesSample =
          	CALCULATETABLE(
          		SELECTCOLUMNS(
          			DimCustomer,
          			"CustomerKey", DimCustomer[CustomerKey],
          			"FirstName", DimCustomer[FirstName],
          			"LastName", DimCustomer[LastName],
          			"Total Sales", SUMX(
          				RELATEDTABLE(FactOnlineSales),
          				FactOnlineSales[SalesAmount]
          			)
          		),
          		TOPN(
          			10,
          			DimCustomer,
          			DimCustomer[CustomerKey],
          			ASC
          		) 
          
          	)
          	VAR CustomerSalesNotBlank = FILTER(CustomerSalesSample, NOT(ISBLANK([Total Sales])))	
          
          	VAR Ind = INDEX(
          		-1,
          		CustomerSalesNotBlank,
          		ORDERBY(
          			[Total Sales],
          			DESC
          		)
          	)
          EVALUATE
           Ind

          Evaluating this code now gives us the least total sales amount that is not blank:

        7. How to use the RankX Function in DAX Measures

          Using the RANK() Function in DAX

          To demonstrate the use of the DAX RANK function, we’ll start by creating a simple sales table with the sales amount and the year. Then we’ll add additional columns to create examples of using the RANK function.

          In the first example (SimpleSalesRank), we’ll just create a simple ranking. By default the function ranks the Sales column from the lowest amount to the highest, and you’ll notice the blank sales value is included as rank one. Note that we can order by more than one column, as well as partition by more than one column.

          SimpleSalesRank = RANK(ORDERBY(SalesRank[Sales]))
          

          The first thing we can do is move the blank value to the end of the ranking, using the LAST parameter.

          SimpleSalesRankBLANKLast = RANK(ORDERBY(SalesRank[Sales]), LAST)

          Then we can rank the sales values from the highest first, while still retaining blank in the last position.

          SimpleSalesRankDESC0 = RANK(ORDERBY(SalesRank[Sales], DESC), LAST)

          We can then partition the ranking by using the Year column.

          SimpleSalesRankPARTITION = RANK(DENSE, ORDERBY(SalesRank[Sales], DESC),  LAST, PARTITIONBY(SalesRank[Year]))

          Using RANKX() function in DAX

          In the next example, we create a similar Sales table with a Sales column and a year column. We can then use the RANK functions to create our rankings.

          First, we’ll create a simple ranking as before, but with RANKX(). Included here is the function syntax in the comments.
          As you can see from the table above, the RANKX function defaults to sorting the highest sales value first, whereas the RANK function sorts it last. The RANKX function also defaults to putting the blank last, whereas the RANK function ordered it first.

          RankXFunction = 
          --RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])  
          RANKX(SalesRankX2, SalesRankX2[Sales])

          We can reverse the order with the following code, but you’ll notice we don’t use the ORDERBY clause in the code:

          RankXFunctionASC = RANKX(SalesRankX2, SalesRankX2[Sales],,  ASC)

          We can also apply the DENSE option for ties, as the default is SKIP. For example, there are 2 sales of value 94, which are both ranked 6; because ties are skipped by default, the next rank jumps to 8. With DENSE, the next rank does not jump.

          RankXFunctionDENSE = 
          --RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])  
          --note: default ties is SKIP
          RANKX(SalesRankX2, SalesRankX2[Sales], ,  ASC, DENSE)

          In the next example, we use RANKX with RELATEDTABLE(). We start off by creating a sample of the DimCustomer table, joining it to the FactOnlineSales table, and then adding a ranking column to the new customer table for total sales. We then check the ranking by adding a total sales check column.

          Step 1: Create a sample customer table. In the code below, we create a simple selection of CustomerKey, FirstName, and LastName, and then filter the table to the first 10 customers by CustomerKey. This gives us a table of the first 10 customers in the DimCustomer table. The data comes from the Contoso Retail data warehouse.

          CustomerSimple = CALCULATETABLE(
              SELECTCOLUMNS(
                  DimCustomer,
                  "CustomerKey", DimCustomer[CustomerKey],
                  "FirstName", DimCustomer[FirstName],
                  "LastName", DimCustomer[LastName]
              ),
              TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
          )

          The table below is created (first 3 columns); we then add the SalesRank and CustomerTotalSalesCheck columns to demonstrate the use of the RANKX function with the RELATEDTABLE function.

          The code for ranking the sales is below. We then add the CustomerTotalSalesCheck to confirm that our ranking is working correctly.
          As you can see, it is. The first rank of 1 is allocated to total sales of £3,932. The last value is blank, ranked 10. You will notice there is no rank 8, because the default behaviour is to SKIP a rank when there is a tie. We can change this by adding the DENSE option.

          SalesRank = RANKX(CustomerSimple, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]))

          Add the CustomerTotalSalesCheck column:

          CustomerTotalSalesCheck = SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 

          Next, we add DENSE so the ranks don’t jump after ties (optional):

          SalesRankDENSE = RANKX(CustomerSimple, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense)

          Creating a Measure using RANKX()
          Now we have some good examples of using the RANK and RANKX functions, we can use them in a measure, which creates a temporary table in memory.

          To create the measure we can use the DAX query editor in Power BI, Tabular Editor, or DAX Studio. My preference is Tabular Editor, but the code will work in all three, allowing us to see the data as we build the measure. If you just try to build the measure directly, you can’t be sure what’s going on, unless you build the tables physically in Power BI, which is slower to work with.

          Here we borrow the code from the previous example creating a physical table, but ensure the table is set to DimCustomer in the Rank function.

          DEFINE
          VAR CustomerSample = 
          CALCULATETABLE(
              SELECTCOLUMNS(
                  DimCustomer,
                  "CustomerKey", DimCustomer[CustomerKey],
                  "FirstName", DimCustomer[FirstName],
                  "LastName", DimCustomer[LastName], 
          		"Rank", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
          	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
          		    ),
              TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
          )
          
          EVALUATE
          CustomerSample
          

          Here is the output of the code in DAX Query View (inside Power BI, I haven’t used tabular editor this time).

          As you can see, the top rank goes to total sales of 3,932, as before.

          I can now filter the ranking table; in this example, I filter for ranks greater than 1 and less than 4.

          DEFINE
          VAR CustomerSample = 
          CALCULATETABLE(
              SELECTCOLUMNS(
                  DimCustomer,
                  "CustomerKey", DimCustomer[CustomerKey],
                  "FirstName", DimCustomer[FirstName],
                  "LastName", DimCustomer[LastName], 
          		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
          	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
          		    ),
              TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
          )
          
          VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
          
          EVALUATE
          TopThreeRankSales

          This gives me a new table:

          Next, we want to get the sum of the 2nd and 3rd-ranked sales. We can do this by using the GROUPBY function, but without grouping on any columns.

          DEFINE
          VAR CustomerSample = 
          CALCULATETABLE(
              SELECTCOLUMNS(
                  DimCustomer,
                  "CustomerKey", DimCustomer[CustomerKey],
                  "FirstName", DimCustomer[FirstName],
                  "LastName", DimCustomer[LastName], 
          		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
          	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
          		    ),
              TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
          )
          
          VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
          VAR SalesOfSecondAndThird = GROUPBY(TopThreeRankSales, "SalesOfSecondAndThird", SUMX(CURRENTGROUP(), [TotalSalesCheck]))
          
          EVALUATE
          SalesOfSecondAndThird

          This gives the following output:

          We should now be able to use the DAX in a measure, so we transfer the DAX code into a measure as follows.
          The code using DEFINE and EVALUATE is a DAX query, as used in the DAX query editor; when we create a measure, we are creating a DAX expression. DAX expressions are converted to DAX queries when they are evaluated. Here is the code for creating the measure:

          _SecondAndThirdTotalSales = 
          VAR CustomerSample = 
          CALCULATETABLE(
              SELECTCOLUMNS(
                  DimCustomer,
                  "CustomerKey", DimCustomer[CustomerKey],
                  "FirstName", DimCustomer[FirstName],
                  "LastName", DimCustomer[LastName], 
          		"RankNumber", RANKX(DimCustomer, SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]),,,Dense) ,
          	    "TotalSalesCheck",  SUMX(RELATEDTABLE(FactOnlineSales), FactOnlineSales[SalesAmount]) 
          		    ),
              TOPN( 10, DimCustomer, DimCustomer[CustomerKey], ASC )
          )
          
          VAR TopThreeRankSales = FILTER(CustomerSample, AND([RankNumber] > 1, [RankNumber]< 4))
          VAR SalesOfSecondAndThird = GROUPBY(TopThreeRankSales, "SalesOfSecondAndThird", SUMX(CURRENTGROUP(), [TotalSalesCheck]))
          
          RETURN
          SalesOfSecondAndThird

          We can then use the measure in a visualization as follows:

        8. An Example of Creating a Sales Report with SQL

We can create a more interesting sales report using window functions and additional calculations in the query, including the gross margin %, total profit, and ranks for product price, sales quantity, and sales amount. The query uses a Common Table Expression (CTE) to group the data first (as we can't use GROUP BY directly with window functions).

          This is the kind of report decision-makers will be interested in and as more questions come up, additional fields can be added.

          USE [ContosoRetailDW];
-- Note: Qty * UnitPrice - Discount = TotalSalesAmount (small discrepancy)
          -- CTE
          
          WITH sub1 AS
          (
          SELECT p.ProductName, s.UnitCost, s.UnitPrice, SUM(SalesQuantity) as TotalSalesQuantity, 
          (s.UnitCost * SUM(SalesQuantity)) as TotalCost, SUM(DiscountAmount) as TotalDiscount,
          SUM(SalesAmount) as TotalSalesAmount
          FROM [dbo].[FactOnlineSales] as s
          LEFT OUTER JOIN [dbo].[DimProduct] as p ON s.[ProductKey] = p.[ProductKey]
          LEFT OUTER JOIN [dbo].[DimDate] d on s.DateKey = d.DateKey
          LEFT OUTER JOIN [dbo].[DimStore] st ON s.StoreKey = st.StoreKey
          WHERE 
          d.CalendarYear = 2007 AND d.CalendarMonthLabel = 'January'
          AND
          st.StoreName = 'Contoso Europe Online Store'
          GROUP BY  p.ProductName, s.UnitCost, s.UnitPrice
          )
          
--Main Query referencing the CTE
          
          SELECT ProductName, UnitCost, UnitPrice, (UnitPrice-UnitCost)/UnitCost as Margin, 
          TotalSalesQuantity, format(TotalCost,'$0,,.0M') as TotalCost, format(TotalSalesAmount, '$0,,.0M') as TotalSales, 
          format(TotalSalesAmount-TotalCost, '$0,,.0M')  as TotalProfit,
          TotalDiscount,
          RANK() OVER(ORDER BY UnitPrice DESC) as PriceRank,
          RANK() OVER(ORDER BY TotalSalesQuantity DESC) as QtyRank,
          RANK() OVER(ORDER BY TotalSalesAmount DESC) as SalesRank
          FROM sub1
          ORDER BY SalesRank
        9. Using Python for Simple Linear Regression

Creating a simple linear regression model and preparing for multiple linear regression.

          In this example, we use a sample of marketing spend data vs. sales and inspect the correlation between radio spend and total sales. The regression line is fitted using the ols function from statsmodels.formula.api

You can download the original file from Kaggle here, then just replace the file path in the df = pd.read_csv() line with the location where you saved it.

          
          import pandas as pd
          import seaborn as sns
          from statsmodels.formula.api import ols
          import statsmodels.api as sm
          import matplotlib.pyplot as plt
          pd.set_option('display.max_columns', None)
          df = pd.read_csv("marketing_sales_data.csv")
          
          # Drop rows with any null values
          df.dropna(inplace=True)
          
          # Check and handle duplicates if needed
          if df.duplicated().sum() > 0:
              df.drop_duplicates(inplace=True)
          
          #rename columns to snake 
          df.rename(columns = {'Social Media': 'Social_Media'}, inplace = True)
          
          # Simple order encoding
          tv_dict = {'Low': 1, 'Medium': 2, 'High': 3}
          df['TV'] = df['TV'].replace(tv_dict)
          
          # One-hot encoding for non-ordinal variable
          df = pd.get_dummies(df, columns=['Influencer'], dtype=int)
          
          
          # Define and fit the model
          ols_formula = "Sales ~ Radio"
          OLS = ols(formula=ols_formula, data=df)
          model = OLS.fit()
          summary = model.summary()
          print(summary)  #Prints off the statistical summary including R squared and the Beta coefficients.
          
# Calculate residuals and predicted (fitted) values
residuals = model.resid
y_pred = model.predict(df)  # the formula API looks up the 'Radio' column in the DataFrame
          

The results are returned by the model.summary() method of the fitted OLS (ordinary least squares) model from the statsmodels module. R-squared is calculated as 0.757, meaning roughly 76% of the variability in y (sales) is accounted for by radio spend. However, if we look at other media, we will see that other variables (TV) also have a strong correlation.

          The Beta coefficient for radio spend is 8.17, which means that for every $1 million in Radio spend, we get $8.17 million in sales.
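As a quick sanity check on that interpretation, you can feed a hypothetical spend value back into the fitted model from the code above (the value of 25 below is made up purely for illustration):

# Hypothetical example: predict sales for a radio spend of 25 using the fitted model above
new_data = pd.DataFrame({"Radio": [25]})   # made-up spend value
predicted_sales = model.predict(new_data)  # the formula API builds the design matrix from the DataFrame
print(predicted_sales)                     # roughly intercept + 8.17 * 25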

        10. How to check for model assumptions with Python Seaborn Graphics


We need to check model assumptions to give us confidence that our models have integrity and are not biased or overfitting the data.

          We check for three assumptions in this example, with sub-plotted seaborn graphics for a linear regression model.
The code for creating the linear regression model can be found in the previous section (Section 9).
You can run the code below once you have built the model, which models the relationship between radio advertising spend and sales.

          assumption-checking

          Graph 1: Checking the Linearity Assumption
The linearity assumption is as follows: each predictor variable (x) is linearly related to the outcome variable (y).
In the first graph, we plot radio advertising spend against sales and can see that there is a linear relationship, so we can conclude the linearity assumption is met.

Graph 2: Checking the Homoscedasticity Assumption with a Scatterplot

The homoscedasticity assumption (extra points if you can spell it correctly) is that the residuals have equal or almost equal variance across the range of predicted values.

In the second graph, we plot the residuals of the model (the differences between the actual values and the model's predictions) against the predicted values, where y_pred are the predicted y values from the regression line.

By plotting the error terms against the predicted values, we can check that there is no pattern in the error terms. Good homoscedasticity therefore looks like a balanced scatter of residuals above and below zero.

          Graph 3: Check for Normality Assumption
In the third graph, a histogram is used to plot the residuals of the regression (the actual y values minus the predicted y values). If the model is unbiased, the residuals should be normally distributed (and we see that).

          The fourth graph is a Q-Q plot which is also used to check the normality assumption.

          fig, ax = plt.subplots(2, 2, figsize=(18, 10))
          
          fig.suptitle('Assumption Checks')
          
          #Check for linearity
          ax[0, 0] = sns.regplot(
              ax=ax[0, 0],
              data = df,
              x = df['Radio'],
              y = df['Sales'], 
              );
          ax[0, 0].set_title('Radio Sales')
          ax[0, 0].set_xlabel('Radio Spend ($K)')
          ax[0, 0].set_ylabel('Sales ($)')
          #ax[0].set_xticks(range(0,10,10))
          #ax[0].set_xticks(rotation=90)
          
          
#Check for homoscedasticity
          # Plot residuals against the fitted values
          ax[0, 1] = sns.scatterplot( ax=ax[0, 1],x=y_pred, y=residuals)
          ax[0, 1].set_title("Residuals vs Fitted Values")
          ax[0, 1].set_xlabel("Fitted Values")
          ax[0, 1].set_ylabel("Residuals")
          ax[0, 1].axhline(0, linestyle='--', color='red')
          
          
          #Check for normality
          ax[1, 0] = sns.histplot(ax=ax[1, 0], x=residuals)
          ax[1, 0].set_xlabel("Residual Value")
          ax[1, 0].set_title("Histogram of Residuals")
          
#Check for normality: Q-Q plot
sm.qqplot(residuals, line='s', ax=ax[1, 1])
ax[1, 1].set_title("Q-Q Plot")
          
          
          
          #sm.qqplot(test, loc = 20, scale = 5 ,  line='45')
          
          plt.show()

        11. A Summary of the data process used in Classification Models

          Introduction:

Classification models are machine learning models that are used to predict binary outcomes, such as:

Spam / not spam
Fraudulent transaction / non-fraudulent transaction
Customer will churn / customer will not churn
Customer high value / customer low value
Loan approval / non-approval

          The Data Process

          Planning

          1. Understand the business requirements. Understand what the measure of success is and what needs to be measured.
e.g. for binary outcomes, the metric might be precision, recall, or F1 score. Identify the Type 1 and Type 2 errors and their business cost.
          2. Identify the key stakeholders and subject matter experts relevant to the project
3. Understand where the data is and how it can be accessed. For larger data projects, if the data comes from many different sources, can it be brought together in a data warehouse such as Google BigQuery?
          4. Understand the technology required for the project. Are extra resources required?
          5. Is there a data dictionary describing all the data field types and purposes?

          Exploratory Data Analysis (Python)

1. Explore the data, and list the number of columns, rows, and data types. If there are any questions, these may need to be referred back to the business.
2. Explore the data ranges (df.describe()). Are the data counts complete? Do the means and ranges make sense? Do the min and max statistics flag any potential errors or outliers in the data?
          3. Explore null values. If there are null values, either look to fill the data or drop the rows.
          4. Remove or adjust outliers.
          5. Summarize and graph the data:
          6. Use boxplots to look for outliers in columns.
          7. Use histograms to understand the distributions of data.
          8. Use a correlation matrix and pair plot to understand co-variance between columns.
9. Visualize the data with interactive tools such as Tableau or Power BI for the initial analysis of data for clients. (A minimal pandas/seaborn sketch of checks 2-8 follows this list.)
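A rough sketch of these checks in pandas and seaborn is below; the file name is a placeholder, so swap in your own dataset:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("your_data.csv")   # placeholder file name

print(df.info())            # columns, row counts and data types
print(df.describe())        # means, ranges, min/max of numeric columns
print(df.isnull().sum())    # null counts per column

numeric = df.select_dtypes("number")
numeric.hist(figsize=(10, 6))              # distributions per column
plt.show()
sns.boxplot(data=numeric)                  # outliers per column
plt.show()
sns.heatmap(numeric.corr(), annot=True)    # correlation matrix
plt.show()
sns.pairplot(df)                           # pairwise relationships / covariance
plt.show()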

          Model Selection (Classification)

          Classification Models: Naive Bayes, Logistic Regression, Decision Tree, Random Forests, XG Boost

          1. Identify the variable to be predicted: Is it a continuous variable or a categorical variable?
          2. Select the machine learning model relevant to the task and 1 or 2 additional models to compare results to.
          3. Confirm the assumptions required by the model and check the data to confirm they meet the requirements.
4. Feature selection: select the features (independent variables) and the dependent variable (columns) to be used in the model.
5. Feature transformation: transform categorical data into numeric data using:
  a) one-hot encoding for non-ordered categories, e.g. departments
  b) ordinal encoding for ordered data such as 'low', 'medium', 'high'
  Where the scale of numeric data varies widely across features (e.g. small numbers 0-1 and large numbers 100+), consider applying scaling to the data:
  a) Log normalization (taking a logarithm of the data)
  b) Standardization (which converts data to Z-scores on a standard normal distribution, with a mean of zero within each feature)
6. Feature extraction: create new features from existing features (for example, weekly hours derived from daily records).

7. Check for class imbalance in the data. In the case of a binary dependent variable (True/False), we would ideally like an even split, but as a minimum the smaller class should be around 10% of the data. Class imbalances can be addressed with either (see the sketch after this list):
a) Downsampling: randomly sampling the larger class down to the size of the smaller one.
b) Upsampling: randomly sampling (with replacement) the smaller class up to the size of the larger one.
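Here is a rough sketch of the encoding and class-imbalance steps; the file, column names, and target ("Churned") are hypothetical, and upsampling is shown with sklearn's resample utility:

import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("customer_data.csv")   # hypothetical file

# Ordinal encoding for ordered categories (hypothetical column)
df["Spend_Level"] = df["Spend_Level"].map({"Low": 1, "Medium": 2, "High": 3})

# One-hot encoding for non-ordered categories (hypothetical column)
df = pd.get_dummies(df, columns=["Department"], dtype=int)

# Check the class balance of the (hypothetical) binary target
print(df["Churned"].value_counts(normalize=True))

# Upsample the minority class so the two classes are level
majority = df[df["Churned"] == 0]
minority = df[df["Churned"] == 1]
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
df_balanced = pd.concat([majority, minority_upsampled])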

          Model Execution

1. Set up the X (feature) variables and the y (target) variable in separate data frames.
2. Decide whether to use a separate validation set or cross-validation (see Model Validation below).
3. Use train_test_split to create training and test sets, and potentially a third validation set. The test dataset should be around 25%, but can be larger if the dataset is large. Should the split be stratified?
4. Select the hyperparameter requirements of the model. GridSearchCV is a powerful sklearn function that takes a grid of hyperparameters, a scoring metric, and the training data, and runs multiple models to find the best one (a sketch follows this list).
5. Build and run the model.
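A minimal sketch of steps 3-5 with sklearn, assuming X and y have been prepared as above (the model choice and parameter grid are just examples):

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Hold out 25% of the data as a stratified test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Example hyperparameter grid; GridSearchCV cross-validates each combination
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, scoring="f1", cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
best_model = grid.best_estimator_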

          Model Validation

In a simple scenario, the best-performing model is run against the hold-out sample (e.g. 25% of the data) that the model has not been trained on.

a) In cross-validation, the training data is repeatedly split into folds; the model is trained on some folds and validated on the remaining fold each time (e.g. 5 times). Cross-validation is slower, but makes good use of limited data.

b) Separate validation set: a random sample is held out from the training data and used to validate candidate models before the final test. This is advantageous when we don't want to keep re-sampling from the training data.

          Measurement
1. Metrics: Accuracy, Precision, Recall, F1, AUC Score, Classification Report (a short sklearn sketch follows below).

          2. Visualise: Confusion Matrix, ROC Curve
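A short sklearn sketch of these metrics and visuals, assuming the best_model, X_test, and y_test objects from the sketch above:

import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             roc_auc_score, classification_report,
                             ConfusionMatrixDisplay, RocCurveDisplay)

y_pred = best_model.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(precision_score(y_test, y_pred))
print(recall_score(y_test, y_pred))
print(f1_score(y_test, y_pred))
print(roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1]))
print(classification_report(y_test, y_pred))

ConfusionMatrixDisplay.from_predictions(y_test, y_pred)     # confusion matrix
RocCurveDisplay.from_estimator(best_model, X_test, y_test)  # ROC curve
plt.show()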

          Check Assumptions:

Linearity: all independent and dependent variables should exhibit a degree of linearity (use a pairplot).
Independent observations: this is business-specific, so it requires understanding of how the variables are generated.
No multicollinearity: there should be no collinearity between independent variables, or this will reduce the reliability of the model.
No extreme outliers: extreme outliers should be excluded from the model.



        12. An Example of Using K-Means Cluster modelling

          Supermarket Example

Import libraries and read the CSV file into a data frame.
The data comes from Kaggle here (mall customer data).

          import pandas as pd
          import matplotlib.pyplot as plt
          import seaborn as sns
          from sklearn.cluster import KMeans
          
df = pd.read_csv("Supermarket Customers.csv")
df.info()


          First, we run some basic checks on the data to check for data integrity.
The data includes 200 rows by 5 columns.



We can change Gender to numeric with the following:

df['Gender'] = df['Gender'].replace({'Female': 0, 'Male': 1})
df.describe()





          Then we check for nulls as follows:

          
          df.isnull().sum() #check for nulls - gives a count of nulls values by column name.

          Get counts of unique values

          
          len(df['CustomerID'].unique()) #how many unique values are there


          CustomerID: 200
          Gender: 2
          Age: 51
          Annual Income (k$): 65
          Spending Score (1-100): 85

          Fixing Null Values
          Annual Income: We have 2 values missing from the Annual Income field. We can remove these rows from the data or use the averages to fill in the gaps.

          Drop rows where annual income (k$) is null

          df.dropna(subset=['Annual Income (k$)'], inplace=True)
          df.isnull().sum() #re-run check for missing data

          Then do the same for spending score:

          df.dropna(subset=['Spending Score (1-100)'], inplace=True)
          df.isnull().sum() #re-run check for missing data
          df.info() # the number of rows is reduced to 197.

We can run the seaborn pairplot to plot graphs of each combination of variables.

          pairplot = sns.pairplot(df)
          plt.show()


From here we can see there are some interesting, distinct-looking clusters around annual income and spending score. It also makes sense that these two would be related, so we can use them in the k-means model.

          For the K-means model, we need to determine the value of K which is the number of clusters we want to identify.
          We use the elbow method to do this as follows:

          # Extracting features for clustering
          X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
          
# Using the elbow method to determine the optimal number of clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
          # Plotting the elbow graph
          plt.plot(range(1, 11), wcss)
          plt.title('Elbow Method')
          plt.xlabel('Number of clusters')
          plt.ylabel('WCSS')
          plt.show()

The graph produced by the elbow method is below. For the value of K, we select the point where the rate of decrease in WCSS (the within-cluster sum of squares) drops off sharply (the elbow). In this case, we select 5 clusters.

          K-Means Model

          Now we have the number of clusters to be used in the model, we can run the K-Means model.

          ##Use 5 clusters based on elbow graph
          
# Fitting K-Means to the dataset with the optimal number of clusters (5, based on the elbow graph)
          kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=42)
          y_kmeans = kmeans.fit_predict(X)
          
          # Visualizing the clusters
          plt.scatter(X.iloc[y_kmeans == 0, 0], X.iloc[y_kmeans == 0, 1], s=100, c='red', label='Cluster 1')
          plt.scatter(X.iloc[y_kmeans == 1, 0], X.iloc[y_kmeans == 1, 1], s=100, c='blue', label='Cluster 2')
          plt.scatter(X.iloc[y_kmeans == 2, 0], X.iloc[y_kmeans == 2, 1], s=100, c='green', label='Cluster 3')
          plt.scatter(X.iloc[y_kmeans == 3, 0], X.iloc[y_kmeans == 3, 1], s=100, c='orange', label='Cluster 4')
          plt.scatter(X.iloc[y_kmeans == 4, 0], X.iloc[y_kmeans == 4, 1], s=100, c='purple', label='Cluster 5')
          
          # Plotting the centroids of the clusters
          plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
          plt.title('Clusters of customers')
          plt.xlabel('Annual Income (k$)')
          plt.ylabel('Spending Score (1-100)')
          plt.legend()
          plt.show()

The graph is below, with the 5 identified clusters coloured accordingly. Once the clusters are identified, we can use the labels to segment our data, which can then be used to determine, for example, the best marketing campaign for each segment, using A/B testing and t-tests for significance.
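To make the segmentation usable, one option is to attach the cluster labels back onto the cleaned dataframe and profile each segment (a sketch using the df, X, and y_kmeans objects from the code above):

# Attach cluster labels to the cleaned data and profile each segment
df_clustered = df.loc[X.index].copy()
df_clustered["Cluster"] = y_kmeans

print(df_clustered["Cluster"].value_counts())   # size of each segment
print(df_clustered.groupby("Cluster")[["Annual Income (k$)", "Spending Score (1-100)"]].mean())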

13. How to Select the Correct Statistical Test (A Table)

          Significance Tests (normal distribution assumed)

Each test below is listed with the number of groups, requirements, number of tails, test statistic, distribution, degrees of freedom, and an example.

z-test
Groups: 1. Requirements: normal distribution, n > 30 and known variance (population mean and standard deviation are known, sample > 30).
Tails: 1 and 2. Statistic: z-score. Distribution: Normal. Degrees of freedom: NA.
Example: A sample of the population is tested for height. Do they match the known population?

One-sample t-test
Groups: 1. Requirements: n < 30 and unknown variance.
Tails: 1 and 2. Statistic: t-score. Distribution: Student's t. Degrees of freedom: n-1.
Example: A company wants to compare the average weekly sales of a new product to the historical average weekly sales of similar products, which is $5000.

Paired t-test
Groups: 2. Requirements: the same group measured twice (before and after); 2 dependent samples.
Tails: 1 and 2. Statistic: t-score. Distribution: Student's t. Degrees of freedom: n-1.
Example: A company implements a new training program and wants to determine its effectiveness by comparing employee performance before and after the training.

Independent t-test (equal variance, pooled)*
Groups: 2. Requirements: 2 independent groups, equal variance.
Tails: 1 or 2. Statistic: t-score. Distribution: Student's t. Degrees of freedom: n1 + n2 - 2.
Example: A retailer wants to compare the average sales of two different stores located in different regions.

Independent t-test (unequal variance, Welch's)*
Groups: 2. Requirements: 2 independent groups, unequal variance.
Tails: 1 and 2. Statistic: t-score. Distribution: Student's t. Degrees of freedom: n1 + n2 - 2.

One-way ANOVA
Groups: 3+. Requirements: 3+ groups, looking at one variable.
Tails: 1 only. Statistic: F-score. Distribution: F. Degrees of freedom: between groups, k-1 (where k is the number of groups).
Example: Three groups are given different drugs in a test to see the improvement in blood sugar.

Two-way ANOVA
Groups: 3+. Requirements: 3+ groups, looking at multiple variables.
Tails: 1 only. Statistic: F-score. Distribution: F.
Example: Effect of water and sun on the height of sunflowers, where there are 3+ combinations of water and sun.

Pearson's Chi-Square
Groups: 2 categories, compared to see if they are related.
Tails: neither, just difference. Statistic: Chi-squared. Distribution: Chi-squared. Degrees of freedom: (no. of rows - 1) * (no. of columns - 1).
Example: Individuals that received social media vs. those that received email, split into purchased and not purchased.

Correlation

Pearson's correlation coefficient (r)
Groups: 2 continuous variables.
Tails: 1 or 2. Statistic: t-statistic. Distribution: t-distribution table. Degrees of freedom: n - 2.
Example: Relationship between daily hours of sunlight and temperature. Best for continuous data with linear relationships.
Provides both the direction (positive or negative) and the strength of a linear relationship between two variables. Note that for simple linear regression, R squared is the Pearson r squared. It ranges between -1 and 1.

Spearman's rank correlation
Groups: 2 ranked (ordinal) variables.
Tails: 1 or 2. Statistic: t-statistic (?). Distribution: Spearman's rho (?). Degrees of freedom: n - 2.
Example: Relationship between the ranks of employees' performance and their years of experience. Best for ordinal data or when data does not meet Pearson's assumptions.

R squared (used in regression analysis)
Variables: one independent variable (simple linear regression) or more (multiple linear regression), plus 1 dependent variable.
Not used in significance testing directly. Degrees of freedom: NA.
Example: Explaining the variance in academic performance based on study hours, attendance, and other factors.
Used in the context of regression analysis to explain the proportion of variance in the dependent variable. It indicates the proportion of the variance in the dependent variable that can be explained by the independent variable(s), but does not provide information about the direction of the relationship. Ranges from 0 to 1.
For the overall significance of a regression: fit the regression model, compute R squared, and perform an F-test (F-statistic, F-distribution) to determine overall significance.

Non-Parametric Tests (distribution not assumed to be normal)

Mann-Whitney U
Groups: 2 independent groups (like the independent t-test); compares the distributions of the 2 groups.
Statistic: U statistic. Distribution: Mann-Whitney U distribution. Degrees of freedom: NA.

Wilcoxon signed rank
Groups: 2; compares the medians of the 2 groups.
Statistic: T statistic. Distribution: Wilcoxon signed rank table. Degrees of freedom: NA.
Example: When comparing two related samples, matched samples, or repeated measurements on a single sample.

* Note: Use Levene's test to determine equal/unequal variance.

Significance Testing Steps

1. Null hypothesis: There is no significant difference between the two groups.

            Examples:
            1. In a drug efficacy study, the null hypothesis might state that the drug does not affect patients compared to a placebo.
            2. The mean cholesterol levels of patients taking the new drug are equal to the mean cholesterol levels of patients taking a placebo.
            3. The mean sales after the advertising campaign are equal to the mean sales before the campaign.
            4. The mean sales of the product with the new packaging design are equal to the mean sales of the product with the old packaging design.

            Alternative hypothesis: There is a significant difference between the means of the groups.
2. The significance level (α) is the probability threshold for rejecting the null hypothesis. Commonly used significance levels include 0.05, 0.01, and 0.10. e.g. at 5%, there is only a 5% chance of observing a result this extreme if the null hypothesis is true.
3. Calculate the test statistic (t-value, z-value, etc.). The test statistic is used to look up the p-value from the relevant distribution tables.

4. Calculate the p-value using the test statistic. The p-value is used to determine if the test is significant: e.g. a p-value of less than 5% (0.05) is significant and we can reject the null hypothesis. If the p-value is higher than the significance level, the result is reported as not statistically significant. (A quick scipy.stats example follows below.)
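As a small illustration of steps 2-4, here is a two-sample t-test in scipy.stats on made-up data (the group values are randomly generated, so this is only a sketch of the workflow):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=100, scale=10, size=50)   # made-up sample data
group_b = rng.normal(loc=105, scale=10, size=50)

# Independent two-sample t-test (two-tailed by default)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)

alpha = 0.05   # significance level
if p_value < alpha:
    print("Reject the null hypothesis: the difference in means is statistically significant")
else:
    print("Fail to reject the null hypothesis")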

          1 or 2 Tailed Tests
One-tailed test: used when testing for an effect in one direction only (greater than or less than), i.e. left- or right-sided.
Two-tailed test: used when testing for a difference in either direction.

          Type 1 and Type 2 Errors
Type 1 Error: we reject the null hypothesis even though it is true (alpha).
Type 2 Error: we fail to reject the null hypothesis even though it is false (beta).

          Two Methods of accepting or rejecting Null Hypothesis
          1. Critical Values (taken from normal or t-distribution)
          2. P-values (if p < or > than 0.05)


        14. Python Notes: Pandas from getting started Tutorials

          Notes from Getting Started Tutorial:
          https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html

          1. Create a Pandas Series and DataFrame
          2. Create Graphs with Matplotlib
          3. Adding new Columns from Existing Columns (inc. split)
          4. Summary Statistics including groupby
          5. Reshape layout of tables incl. Sorting and Pivot tables
          6. Sub-sets and filtering
7. Combining data from multiple tables (concatenate (union) and merge (join))
          8. Working with timeseries (pd.to_datetime(), dt, DatetimeIndex and Resample)
          9. Textual data (.str and .replace and split)


          1. Creating a Dataframe and Series

          import pandas as pd
          
          #Create a series
          books= pd.Series(["Papillion", "Great Expectations", "Magic"], name="Books") #Create a series
          
          #Create DataFrame with Key value pairs. In this case the values are lists of values.
          
          df = pd.DataFrame(
              {
                  "Name": [
                      "Humpfries, Mr. Bill",
                      "Derby, Mrs. Sheila",
                      "Winfrey, Miss. Elizabeth",
                  ],
                  "Age": [44, 66, 42],
                  "Gender": ["male", "female", "female"],
                  "City": ["London", "Manchester", "Leeds"],
                  "Occupation": ["Doctor", "Architect", "Analyst"]
              }
          )
          
          #Using the list
          df["Age"] #Output Age Column
          #df["Age"].max() #outputs max age

          2. Create Plots with Matplotlib

          import pandas as pd
          import matplotlib.pyplot as plt
          
          #Create Pandas dataframe from csv file
          air_quality = pd.read_csv("air_quality_no2.csv", index_col=0, parse_dates=True)
          air_quality["station_paris"].plot() #craate plot of subset from DataFrame
          plt.show() 
          
          #Plot with Matplot lib
          air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
          plt.show() 
          
          #Create scatter from DataFrame
          air_quality.plot.box() #Create box plot
          plt.show() #Display boxplot
          
          #subplots
          fig, axs = air_quality.plot.area(figsize=(10, 4), subplots=True)
          plt.show()
          


          3. Add new columns from existing columns

import pandas as pd
          #index_col=0, sets first column to index, parse_dates converts dates to date type
          air_quality = pd.read_csv("air_quality_no2.csv", index_col=0, parse_dates=True)
          
          #Add new column based on existing column - this adds column to DataFrame
          air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
#Another example, this time a ratio
air_quality["ratio_paris_antwerp"] = (air_quality["station_paris"] / air_quality["station_antwerp"])
          
          #renaming columns
          air_quality_renamed = air_quality.rename(columns={"station_antwerp": "BETR801","station_paris": "FR04014", "station_london": "London Westminster"})
          

          4. Summary Statistics


          import pandas as pd
          #read_csv function - also read_ excel, sql, json, parquet and more
          
          titanic = pd.read_csv("titanic.csv") #read_csv function reads from csv
          titanic["Age"].mean() #average of Age column
          titanic[["Age", "Fare"]].median() #median of Age and Fare Coluns
          titanic[["Age", "Fare"]].describe() #summary stats of 2 columns
          titanic.agg({"Age": ["min", "max", "median", "skew"],"Fare": ["min", "max", "median", "mean"]})
           #multiple stats on Age and Fare columns
           
           #groupby to create summary tables
           titanic[["Sex", "Age"]].groupby("Sex").mean()
          
           #Average age by sex
           titanic.groupby("Sex").mean(numeric_only=True) 
#average of all numeric columns by Sex
           titanic.groupby(["Sex", "Pclass"])["Fare"].mean() #table average fare by sex and PClass
           
           titanic["Pclass"].value_counts() #count passengers in Pclass
           titanic.groupby("Pclass")["Pclass"].count() #count longer method


          5. Re-shape layout of tables

          import pandas as pd
          titanic = pd.read_csv("titanic.csv") #read_csv function reads from csv
          air_quality = pd.read_csv("air_quality_long.csv", index_col="date.utc", parse_dates=True)
          
          #Sorting Tables
sort1 = titanic.sort_values(by="Age").head() #sort by Age

sort2 = titanic.sort_values(by=['Pclass', 'Age'], ascending=False) #sort by Pclass then Age, descending
          
          # filter for no2 data only
          no2 = air_quality[air_quality["parameter"] == "no2"]
          
# get the top 2 rows per location (sorted by index)
no2_subset = no2.sort_index().groupby(["location"]).head(2)
          no2_subset
          #pivot table wide format
          air_quality.pivot_table(values="value", index="location", columns="parameter", aggfunc="mean")
          air_quality.pivot_table(values="value",index="location",columns="parameter",aggfunc="mean",margins=True)
          
          #wide to long format
          no2_pivoted = no2.pivot(columns="location", values="value").reset_index()
          no2_pivoted.head(5)
#pandas.melt() method converts a pivot table back to long format
no_2 = no2_pivoted.melt(id_vars="date.utc")
no_2.head(5)


          6. Sub-sets and Filtering

          import pandas as pd
#read csv file to pandas DataFrame
titanic = pd.read_csv("titanic.csv")
air_quality = pd.read_csv("air_quality_no2.csv", index_col=0, parse_dates=True)

#save as excel, index=False removes row index numbers from export
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)


#attributes and methods of DataFrame
titanic.shape #row and column count
titanic.info() #summary of the DataFrame
titanic.columns #column names
titanic.dtypes #data types
titanic.head(5) #head of df
titanic.tail(5) #tail of df
titanic.describe() #statistics of file

#Create sub-set of DataFrame as a Series
ages = titanic["Age"] #create sub-set from dataframe: note a single bracket creates a pandas Series.
type(titanic["Age"]) #confirm type of Age object

#Create sub-set of DataFrame as a DataFrame
age_sex = titanic[["Age", "Sex"]] #inner brackets are a list, outer brackets return a DataFrame

#Filtering rows: get passenger list with Age > 35
above_35 = titanic[titanic["Age"] > 35] #return rows with age over 35
#Filter passenger list with Pclass = 2 or 3 (2 ways of doing this)
class_23 = titanic[titanic["Pclass"].isin([2, 3])] #isin returns a Boolean mask where Pclass is 2 or 3
class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)] #same result with an OR condition

#Filter passenger list with Age not NA
age_no_na = titanic[titanic["Age"].notna()]
#Filter passenger list and return list of Names
adult_names = titanic.loc[titanic["Age"] > 35, "Name"]

#Filter specified rows and columns
titanic.iloc[9:25, 2:5]

#set the 4th column of the first 3 rows to "anonymous"
titanic.iloc[0:3, 3] = "anonymous"



          7. Combining Data from tables

          import pandas as pd
          import os

#check current working directory
cwd = os.getcwd()
cwd

#Import data into DataFrames
titanic = pd.read_csv("titanic.csv")
air_quality_no2 = pd.read_csv("air_quality_no2_long.csv", parse_dates=True)
air_quality_pm25 = pd.read_csv("air_quality_long.csv", parse_dates=True)
stations_coord = pd.read_csv("air_quality_stations.csv")
air_quality_parameters = pd.read_csv("air_quality_parameters.csv")

#Create sub-sets
air_quality_no2 = air_quality_no2[["date.utc", "location",
"parameter", "value"]]

air_quality_pm25 = air_quality_pm25[["date.utc", "location",
"parameter", "value"]]

#Concatenate sub-sets - creates a union between the 2 tables.
#the keys argument adds another hierarchical index to show which table the data comes from
air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0, keys=["PM25", "NO2"])
air_quality_no2.head(5)
air_quality_pm25.head(5)
air_quality_no2.shape #2068 rows, 4 columns
air_quality_pm25.shape #5274 rows, 4 columns
air_quality.shape #7340 rows, 4 columns

air_quality = air_quality.sort_values("date.utc")
air_quality.head(25)
air_quality.to_csv("checkcombine.csv")

#note you can reset the index to a column
air_quality.reset_index(level=0)

stations_coord = pd.read_csv("air_quality_stations.csv")

stations_coord.head()

#merge the columns from the stations_coord dataframe to the air_quality df
#use location as the key to join the two dataframes
#how="left" is a left join with air_quality on the left
air_quality = pd.merge(air_quality, stations_coord, how="left", on="location")
air_quality.head(10)
air_quality.to_csv("checkqualitymerge.csv")

air_quality_parameters.head(5)
#merge left join using the parameter column on the left and id on the right
air_quality = pd.merge(air_quality, air_quality_parameters,
how='left', left_on='parameter', right_on='id')

          8. Timeseries

          import pandas as pd

          import matplotlib.pyplot as plt

air_quality = pd.read_csv("air_quality_no2_long.csv")

air_quality = air_quality.rename(columns={"date.utc": "datetime"})
air_quality.head(5)
air_quality.columns

#unique cities in air_quality df
air_quality.city.unique()

#convert the datetime column of the air_quality dataframe to a datetime object instead of text
air_quality["datetime"] = pd.to_datetime(air_quality["datetime"])

air_quality["datetime"]

#can also do this to read the dates into datetime format
#pd.read_csv("../data/air_quality_no2_long.csv", parse_dates=["datetime"])

#check start and end datetime
air_quality["datetime"].min(), air_quality["datetime"].max()

#datetime difference using pandas timestamp
air_quality["datetime"].max() - air_quality["datetime"].min()

#Add a month column using the dt accessor
air_quality["month"] = air_quality["datetime"].dt.month

#Average no2 concentration by day of week by location
air_quality.groupby([air_quality["datetime"].dt.weekday, "location"])["value"].mean()

#Plot bar graph using hour on the x axis and average no2 on the y
fig, axs = plt.subplots(figsize=(12, 4))

air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
kind='bar', rot=0, ax=axs)
plt.xlabel("Hour of the day"); # custom x label using Matplotlib
plt.ylabel("$NO_2 (µg/m^3)$");

#create pivot table
no_2 = air_quality.pivot(index="datetime", columns="location", values="value")
no_2.head(5)

#use index.year and index.weekday of the datetime index
no_2.index.year, no_2.index.weekday
#plot using a time range
no_2["2019-05-20":"2019-05-21"].plot();

#the resample method of a datetime index works like groupby. It takes an
#aggregation function, and "M" is time-based grouping by Month (month end)
monthly_max = no_2.resample("M").max()
monthly_max

monthly_max.index.freq #tells you the frequency attribute - e.g. MonthEnd

#Create plot of average daily no2 levels for each station
no_2.resample("D").mean().plot(style="-o", figsize=(10, 5));

          9. Textual Data

          import pandas as pd

titanic = pd.read_csv("titanic.csv")

titanic["Name"].str.lower()

#Split the Name field on the comma separator
titanic["Name"].str.split(",")
#create a Surname series using the first element of the split Name field
titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
titanic["Surname"]

#Return Name series with True/False if it contains Countess
titanic["Name"].str.contains("Countess")

#return rows where Name contains Countess
titanic[titanic["Name"].str.contains("Countess")]

#string length
titanic["Name"].str.len()
#find the row index of the longest name
titanic["Name"].str.len().idxmax()
#Return longest name
titanic.loc[titanic["Name"].str.len().idxmax(), "Name"]

titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})

titanic["Sex_short"]

        15. Useful Power BI DAX Code Examples

           DAX Code Examples:
          1. Using Variables
          2. FORMAT()
          3. HASONEVALUE()
          4. AND, &&
          5. CALCULATETABLE() and SUMMARIZE()
          6. USERELATIONSHIP()
          7. SWITCH()
          8. ISFILTERED() and making visual transparent
          9. SELECTEDVALUE() and creating a dynamic Graph Title
          10. FILTER and ADDCOLUMNS
          11. RANK()

          VAR: Using Variables

          Running Total =
VAR MaxDateInFilterContext = MAX ( Dates[Date] ) //variable 1: max date
VAR MaxYear = YEAR ( MaxDateInFilterContext ) //variable 2: year of max date
VAR DatesLessThanMaxDate = //variable 3: filter dates <= variable 1 within the year of variable 2
          FILTER (
          ALL ( Dates[Date], Dates[Calendar Year Number] ),
          Dates[Date] <= MaxDateInFilterContext
          && Dates[Calendar Year Number] = MaxYear
          )
          VAR Result = //variable 4 total sales filtered by variable 3
          CALCULATE (
          [Total Sales],
          DatesLessThanMaxDate
          )
          RETURN
          Result //return variable 4

FORMAT: Formatting Numbers
Actual Formatted = IF(SUM([actual]) > 1000000, FORMAT(SUM([actual]), "#,,.0M"), IF(SUM([actual]) >= 1000, FORMAT(SUM([actual]), "#,.0K")))

FORMAT(MIN([column]), "0.0%")
FORMAT(MIN([column]), "Percent")

e.g. if the matrix is filtered:
IF(ISFILTERED([field]), SELECTEDVALUE([column]))

HASONEVALUE: Check if a column has only one value in an IF
ValueCheck = IF(HASONEVALUE([column]), VALUES([field]))


FILTER the table where the related SalesTerritoryCountry is not "United States", then SUMX SalesAmount_USD
= SUMX(FILTER( 'InternetSales_USD', RELATED('SalesTerritory'[SalesTerritoryCountry]) <> "United States" ), 'InternetSales_USD'[SalesAmount_USD])

          AND, can also use &&

          Demand =
              SUMX (
                  FILTER (
                      RELATEDTABLE ( Assignments ),
                      AND (
                          [AssignmentStartDate] <= [TimeByDay],
                          [TimeByDay] <= [AssignmentFinishDate]
                      )
                  ),
                  Assignments[Av Per Day]
              )


          CALCULATETABLE, SUMMARIZE

          Calculate Table with Summarize and Filter

Order Profile =
CALCULATETABLE (
SUMMARIZE (
'Sales Table',
'Sales Table'[Order_Num_Key],
Customer[Sector],
"Total Value", SUM ( 'Sales Table'[Net Invoice Value] ),
"Order Count", DISTINCTCOUNT ( 'Sales Table'[Order_Num_Key] )
),
YEAR ( DimDate[Datekey] ) = YEAR ( TODAY () )
)

          USERELATIONSHIP Uses inactive relationship between tables

CALCULATE (
[Sales Amount],
Customer[Gender] = "Male",
Products[Color] IN { "Green", "Yellow", "White" },
USERELATIONSHIP ( Dates[Date], Sales[Due Date] ),
FILTER ( ALL ( Dates ), Dates[Date] < MAX ( Dates[Date] ) )
)


          SWITCH

          SWITCH(<expression>, <value>, <result>[, <value>, <result>]…[, <else>])
= SWITCH([Month], 1, "January", 2, "February", 3, "March", 4, "April", 5, "May", 6, "June", 7, "July", 8, "August", 9, "September", 10, "October", 11, "November", 12, "December", BLANK() ) //place each pair on a separate line for readability

          SWITCH with Measure

= SWITCH(TRUE(), 
        [measure] = "Turnover", [Turnover],
        [measure] = "Profit", [Profit],
        BLANK()
)

          Visuals
          ISFILTERED()
          Check Filtered = ISFILTERED([column])

          Dynamic Visual 

MakeTransparent = 
IF([Check Filtered], "#FFFFFF00", // returns transparent: 8-digit hex where the final 00 is the alpha value
"White")

Message = IF([Check Filtered], "", "Please select a row")

Dynamic Graph Title
Graph year title = SELECTEDVALUE([columnname]) & " - My graph Title"

ADDCOLUMNS
New Table >
Creates a new table and adds columns to it (and in this case also filters it):

2013Sales = FILTER(ADDCOLUMNS(FactInternetSales, "Dates", FactInternetSales[OrderDate], "Sales2", SUM(FactInternetSales[SalesAmount])), FactInternetSales[OrderYear] = 2013)

          RANK by 2 Columns (calculated column)
Measure = RANKX( FILTER( ALL('Table'), 'Table'[Customer] = EARLIER('Table'[Customer]) ), 'Table'[Txn Number],, DESC, DENSE)

          Creates a rank for each customer based on the txn number for each customer

        16. Python Notes: Python Lists Examples


          Create a list
          Simple list
mylist = ["apple", "orange", "banana", "mango", "pineapple"]
mylist[2] # returns 'banana'
mylist[-1] # returns 'pineapple'
mylist[2:4] # returns ['banana', 'mango'] (the end index is not included)
if "apple" in mylist:
  print("Yes, 'apple' is in the fruits list")
mylist.insert(2, "watermelon") #insert at the position specified
mylist.append('grapes') #adds to end of list
mylist = ["apple", "banana", "cherry"]
tropical = ["mango", "pineapple", "papaya"]
mylist.extend(tropical) # adds the tropical list to mylist
mylist.remove("banana") #removes first occurrence
mylist.pop(1) #removes the element at the given index
del mylist[0] #also removes an element by index
del mylist # deletes the list
mylist.clear() # empties the list

for i in mylist: #loop through the list
  print(i)

i = 0
while i < len(mylist): #while loop
  print(mylist[i])
  i = i + 1

newlist = []
for x in mylist: #adds list items containing "a" to a new list
  if "a" in x:
    newlist.append(x)

mylist.sort() #sort alphabetically or numerically depending on data type
mylist.sort(reverse = True) #sort in reverse
mylist.sort(key = str.lower) #case-insensitive sort using lower-case keys
mylist.reverse() #reverse the order of the list
newlist = mylist.copy() #copy the list
newlist = list(mylist) #also makes a copy of the list
list3 = list1 + list2 #concatenate lists





A list of lists: 5 rows, 2 columns

b1 = [[2308, 6], [2408, 6.2], [2508, 5.8], [2608, 5.6], [2708, 5.9]] #create the list
print(b1) #print the list
print(len(b1)) #print length of list
print(type(b1)) #print data type of list
print(b1[:2]) #print the first 2 elements of the list (indices 0 and 1; the end index is excluded)
print(b1[3:]) #print from index 3 (the 4th element) onwards

          Reference Index in array

b1[3] # returns the fourth element (index 3)
b1[0:2] # returns the first and second elements of the list
b1[-1] # returns the last element

          Add and Delete, Update
          Delete
          del(b1[0]) # delete first element
          or
          b1.pop(1) #remove second element


          Insert
          b1.insert(1,[24082, 111]) #insert element at position 1
Update
b1[2] = [2808, 6.7] #update the value at index 2

          Append
          Additions = [[29008, 6.6], [3008, 6], [3108, 6.1]]
          b1.extend(Additions) #Adds the Additions list to b1
          b1.append([2808,6.6]) # add 1 element (only 1)

          Clear the list
          b1.clear() #empties the list

          Check if element in array

if [2308, 6] in b1: print("yes") #check whether the element is in the list
          Loop
          for x in b1: print(x)

def country_select(countrylist):
    count = 1
    for i in countrylist:
        print(f"{count}: {i}") #print the position and the item
        count = count + 1
    return countrylist



          Numpy

          baseball = [180, 215, 210, 210, 188, 176, 209, 200]
          import numpy as np
          np_baseball = np.array(baseball)
          print(type(np_baseball))
          mean = np_baseball.mean()
print("mean is: " + str(mean))
med = np.median(np_baseball)
print("median is: " + str(med))
std = np.std(np_baseball)
print("standard deviation is: " + str(std))

        17. Simple Linear Regression Example with Python

          Linear Regression using Salary and Years of Experience Data

          Data Source: Salary_dataset.csv Kaggle
          The salary data set includes 2 columns: Years Experience which will be our independent variable (X) and Salary (Y).

          Linear regression is a fundamental statistical method used to model the relationship between a dependent variable and one or more independent variables. The primary goal of linear regression is to predict the value of the dependent variable based on the values of the independent variables (Chat GPT)

          For this example: First, we want to see if there is a correlation between the 2 variables by building a regression line and calculating r squared. Then we want to assess the significance of the relationship using the p-value to test the null hypothesis that there is no relationship between X and Y (X does not predict Y).

          Simple Linear Regression
          Formula
          Formula: Y = b + aX + e

          1. Dependent Variable (Y): The outcome or the variable we aim to predict or explain.
          2. Independent Variable(s) (X): The variable(s) used to predict or explain changes in the dependent variable.
3. a: the slope; the change in Y for a one-unit change in X.
4. b: the intercept on the Y axis; this represents the value of Y when X is zero. (A quick worked example with made-up numbers follows below.)
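As a quick worked example of the formula with made-up numbers (not the actual coefficients from the salary dataset):

b = 25000      # hypothetical intercept: predicted salary at zero years of experience
a = 9500       # hypothetical slope: extra salary per additional year of experience
X = 5          # years of experience

Y = b + a * X  # 25000 + 9500 * 5 = 72500
print(Y)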

The following Python code (which I used ChatGPT to optimize) calculates the regression line and p-value, and evaluates the null hypothesis.

            The steps are as follows:
            1. Use Pandas to import the CSV file to a data frame and convert each series to an array.
            2. Fit a linear regression model using Sklearn that returns the slope and intercept of the regression line.
3. The statsmodels library is then used to calculate the R-squared value and the p-value of the slope.
4. The null hypothesis is then evaluated based on the p-value.
5. scipy.stats is used to fit the line again for comparison, and matplotlib.pyplot is used to plot the regression line on a graph.
          
          import pandas as pd
          import numpy as np
          import matplotlib.pyplot as plt
          from sklearn.linear_model import LinearRegression
          import statsmodels.api as sm
          from scipy import stats
          
          def load_data(csv_location):
              """Load CSV data into a pandas DataFrame."""
              df = pd.read_csv(csv_location)
              return df
          
          def prepare_data(df):
              """Prepare independent and dependent variables for regression."""
              X = df[['YearsExperience']].values  # Extract as 2D array
              y = df['Salary'].values  # Extract as 1D array
              return X, y
          
          def fit_sklearn_model(X, y):
              """Fit a linear regression model using sklearn."""
              model = LinearRegression()
              model.fit(X, y)
              return model
          
          def fit_statsmodels_ols(X, y):
              """Fit a linear regression model using statsmodels OLS."""
              X_with_const = sm.add_constant(X)  # Add an intercept to the model
              model = sm.OLS(y, X_with_const).fit()
              return model
          
          def plot_regression_line(df, intercept, slope):
              """Plot the regression line along with data points."""
              plt.scatter(df['YearsExperience'], df['Salary'], color='blue', label='Data points')
              plt.plot(df['YearsExperience'], intercept + slope * df['YearsExperience'], color='red', label='Regression line')
              plt.title("Salary by Years of Experience")
              plt.xlabel('Years of Experience')
              plt.ylabel('Salary')
              plt.legend()
              plt.show()
          
          def main():
              csv_location = "salary_dataset.csv"
              df = load_data(csv_location)
          
              # Display basic statistics
              #print(df.describe())
          
              X, y = prepare_data(df)
          
              # Fit the model using sklearn
              sklearn_model = fit_sklearn_model(X, y)
              intercept, slope = sklearn_model.intercept_, sklearn_model.coef_[0]
              
              print("Calculation of Regression Line:\n")
              print(f"Intercept is: {intercept}")
              print(f"Slope is: {slope}")
          
              # Fit the model using statsmodels to get p-values and R-squared
              statsmodels_model = fit_statsmodels_ols(X, y)
            #  print(statsmodels_model.summary())
          
              # Extract R-squared and p-values
              r_squared = statsmodels_model.rsquared
              p_values = statsmodels_model.pvalues
          
              print(f"R-squared: {r_squared}")
              #print(f"P-values: {p_values}")
          
              # Extracting specific p-values by index
              intercept_p_value = p_values[0]  # First p-value (intercept)
              slope_p_value = p_values[1]  # Second p-value (YearsExperience)
          
              #print(f"Intercept p-value: {intercept_p_value}")
              print(f"p-value (YearsExperience): {slope_p_value}")
          
              print("\nThe p-value is the probability of observing a t-statistic as extreme as, or more extreme than, the one calculated from your sample data, under the assumption that the null hypothesis is true.") 
              print("This is obtained from the t-distribution with n−2 degrees of freedom ")
              print("where n is the number of observations\n")
          
    if slope_p_value > 0.05:
        print("P-value is not significant and therefore we fail to reject the null hypothesis")
              if slope_p_value < 0.05:
                  print("P-value is less than 0.05 and therefore we reject the null hypothesis. This means there is strong evidence that the predictor 𝑋 has a statistically significant effect on the outcome 𝑌")
              # Plotting the regression line
              plot_regression_line(df, intercept, slope)
          
              # Fit a linear regression line using scipy.stats (for comparison)
              slope, intercept, r_value, p_value, std_err = stats.linregress(df['YearsExperience'], df['Salary'])
             # plt.text(df['YearsExperience'].min(), df['Salary'].max(), f'y = {slope:.2f}x + {intercept:.2f}', ha='left')
          
          if __name__ == "__main__":
              main()
          



        18. Python Notes: Pandas Code Examples

          Pandas Quick Guide

          Modules

          import pandas as pd #pandas module
          import scipy.stats as stats #stats sub-module
          from bs4 import BeautifulSoup #scrape module
          import requests #API module
          import whois as whois #whois module


          Files
df = pd.read_csv("CookieData.csv") #import CSV file to dataframe df

#define index column and convert dates to date format

air_quality = pd.read_csv("air_quality_long.csv", index_col="date.utc", parse_dates=True)

air_quality.to_csv("checkcombine.csv") #save dataframe to csv file
tests.to_excel("tests.xlsx", sheet_name="expt", index=False)



          Dataframe info

          df.describe() #output descriptive statistics of dataframe (numeric columns)

          df.info() #display dataframe columns datatypes No-Null
          df.dtypes #output datatypes of dataframe

          df.shape #outputs rows and columns
          df.sample(5) #output sample of 5 rows

          df.columns #ouput columns
          titanic.head(5) #print head of dataframe (top five rows)
          titanic.tail(5) #print tail of dataframe (bottom five rows)
          type(df) #output object type

          Convert
gg['YearString'] = gg['Year'].astype("string") #convert series to string
gg["YearInterval"] = gg["YearString"].str[:3] #LEFT-type function, get the first 3 characters of the string

gg["YearInterval"] = gg["YearInterval"].astype(int) #convert series to integer



          Delete Columns
del df2["Outlier_Flag"]

          Sub-set

df2 = df[["version", "retention_1"]]

          Combine Sub-sets

#Concatenate sub-sets - creates a union between the 2 tables. The keys argument adds another hierarchical index to show which table the data comes from

air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0, keys=["PM25", "NO2"]) # creates union

Merge the columns from the stations_coord dataframe to the air_quality df. Use location as the key to join the two dataframes; how="left" is a left join with air_quality on the left:
air_quality = pd.merge(air_quality, stations_coord, how="left", on="location")

Merge left join using the parameter column on the left and id on the right:

air_quality = pd.merge(air_quality, air_quality_parameters,
how='left', left_on='parameter', right_on='id')



          Dataframe Aggregate Functions
          titanic[“Age”].mean()
          titanic[[“Age”, “Fare”]].median()


          Sort Dataframe
air_quality = air_quality.sort_values("date.utc")
sort1 = titanic.sort_values(by="Age").head()
sort2 = titanic.sort_values(by=['Pclass', 'Age'], ascending=False)
no2_subset = no2.sort_index().groupby(["location"]).head(2) #group and sort by index


          Filtering Dataframe

          df2 = df[df.Outlier_Flag == 0]

gate_30 = df[df.version == "gate_30"]
no2 = air_quality[air_quality["parameter"] == "no2"]
above_35 = titanic[titanic["Age"] > 35]
class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]

age_no_na = titanic[titanic["Age"].notna()] #return rows where Age is not NA

#Return Names of passengers over 35

adult_names = titanic.loc[titanic["Age"] > 35, "Name"]

titanic.iloc[9:25, 2:5] #Return specified rows and columns
titanic.iloc[0:3, 3] = "anonymous" #Change the 4th column of the first 3 rows to anonymous

          def city_select(country): #function with country as parameter
              df2 = df[df["country"] == country] #slice dataframe where country == the given country, e.g. United States
              df2 = df2.sort_values(by=["population"], ascending=False) #sort dataframe by population descending
              df3 = df2.iloc[0:5] #new dataframe with the top 5 rows by population
              df4 = df3["city"] #select the city series (top 5 cities by population)
              return df4

          def latlong(country, city): #define function
              selection = df[(df["country"] == country) & (df["city"] == city)] #filter dataframe by 2 criteria
              return selection #return filtered dataframe

          def get_lat(selection): #define function
              latitude = selection["lat"] #latitude = series with "lat" as the column
              lat = latitude.iloc[0] #get the first row of the latitude series
              return lat #return the first row
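
          For context, a quick sketch of how these helpers might be called; the source file name and its country/city/population/lat columns are assumptions carried over from the snippets above:

          df = pd.read_csv("worldcities.csv") #assumed file with country, city, population and lat columns
          print(city_select("United States")) #top 5 US cities by population
          selection = latlong("United States", "New York")
          print(get_lat(selection)) #latitude of the first matching row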



          Groupby

          df.groupby("version").count() #count of rows for each version
          df.groupby("version").sum() #sum of numeric columns for each version
          air_quality.groupby([air_quality["datetime"].dt.weekday, "location"])["value"].mean() #average value by weekday and location

          air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(kind="bar", rot=0, ax=axs) #average value by hour, plotted as a bar chart

          gg3 = gg.groupby(["YearInterval", "Month"], as_index=False) #group by YearInterval and Month (aggregate afterwards)
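
          A minimal end-to-end sketch of the split-apply-combine pattern, using a tiny invented dataframe:

          sales = pd.DataFrame({
              "version": ["gate_30", "gate_30", "gate_40"],
              "sum_gamerounds": [12, 7, 20],
          })
          print(sales.groupby("version")["sum_gamerounds"].mean()) #average rounds per version
          print(sales.groupby("version", as_index=False).agg(total=("sum_gamerounds", "sum"))) #named aggregation, version kept as a column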



          Create Pivot Table

          table = pd.pivot_table(gate_30, values="sum_gamerounds", index=["version"], columns=["retention_1"],
          aggfunc="mean", margins=True) #note margins=True adds row/column totals

          no_2 = air_quality.pivot(index="datetime", columns="location", values="value") #reshape long to wide, no aggregation
          air_quality.pivot_table(values="value", index="location", columns="parameter", aggfunc="mean") #wide format with aggregation
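
          A small sketch of the difference between pivot (pure reshape) and pivot_table (reshape plus aggregation); the readings dataframe is invented for illustration:

          readings = pd.DataFrame({
              "datetime": ["2019-05-20", "2019-05-20", "2019-05-21"],
              "location": ["London", "Paris", "London"],
              "value": [23.0, 41.0, 19.5],
          })
          wide = readings.pivot(index="datetime", columns="location", values="value") #one row per datetime, one column per location
          avg = readings.pivot_table(values="value", index="location", aggfunc="mean", margins=True) #mean per location plus an "All" total row
          print(wide)
          print(avg)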


          DateTime
          df["datetime"] = pd.to_datetime(df["datetime"]) #convert the dates in the dataframe to datetime format

          pd.read_csv("../data/df_long.csv", parse_dates=["datetime"]) #does the same at import time
          df["datetime"].min(), df["datetime"].max() #start and end datetime
          df["datetime"].max() - df["datetime"].min() #difference (Timedelta) between the 2 dates
          df["month"] = df["datetime"].dt.month #add a month column derived from the datetime column
          df.index.year, df.index.weekday #return the year and weekday of a datetime index
          df_pivoted = df.pivot(columns="location", values="value").reset_index() #reset pivot table index
          df = df_pivoted.melt(id_vars="date.utc") #unpivot back to long format

          gg["YearMonth"] = pd.to_datetime(dict(year=gg.Year, month=gg.Month, day=1)) #create a date from the Year and Month columns, with day=1 for the first of the month

          gg3 = gg.groupby(["YearInterval", "Month"], as_index=False) #group by YearInterval and Month
          gg4 = gg3.max() #aggregate each group using max
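
          A compact sketch of the datetime round trip above, with invented timestamps:

          events = pd.DataFrame({"datetime": ["2019-05-20 10:00", "2019-05-21 14:30"], "value": [1, 2]})
          events["datetime"] = pd.to_datetime(events["datetime"]) #string column to datetime64
          events["weekday"] = events["datetime"].dt.weekday #0 = Monday
          span = events["datetime"].max() - events["datetime"].min() #Timedelta of 1 days 04:30:00
          print(events, span, sep="\n")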


          Plot

          df["2019-05-20":"2019-05-21"].plot(); #plot this date range (requires a datetime index)
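
          Slicing rows with date strings like this only works when the dataframe has a DatetimeIndex; a minimal sketch with invented data:

          import matplotlib.pyplot as plt

          ts = pd.DataFrame({"value": [3, 5, 4, 6]}, index=pd.date_range("2019-05-19", periods=4, freq="D"))
          ts["2019-05-20":"2019-05-21"].plot() #plot only the rows in that date range
          plt.show()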



          Working with Columns

          air_quality.city.unique() #return the unique cities in the city column
          air_quality = air_quality.rename(columns={"date.utc": "datetime"}) #rename columns

          Split Columns in Dataframe
          titanic["Name"].str.split(",") #split the Name column on the comma
          titanic["Surname"] = titanic["Name"].str.split(",").str.get(0) #split the Name column on the comma and return the first value in the array

          df["Name"].str.lower() #change the Name column to lowercase
          df["Name"].str.contains("Countess") #return True where the column contains "Countess"

          df["Name"].str.len() #return the string length of each Name
          df["Name"].str.len().idxmax() #return the index of the row with the longest name
          df.loc[df["Name"].str.len().idxmax(), "Name"] #return the longest name itself using loc
          #rename the gender column and replace the genders with M and F in the dataframe
          df["gender_short"] = df["gender"].replace({"male": "M", "female": "F"}) #replace values in the gender column with M and F
          titanic["Pclass"].isin([2, 3]) #check whether each Pclass value is 2 or 3
          adult_names = titanic.loc[titanic["Age"] > 35, "Name"] #return the Names of passengers over 35
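
          A tiny worked example of the string accessor calls above, on an invented two-row frame:

          people = pd.DataFrame({"Name": ["Smith, Mr. John", "Jones, Countess Anne"], "gender": ["male", "female"]})
          people["Surname"] = people["Name"].str.split(",").str.get(0) #"Smith", "Jones"
          people["gender_short"] = people["gender"].replace({"male": "M", "female": "F"})
          print(people[people["Name"].str.contains("Countess")]) #rows whose Name contains "Countess"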


          Print

          print("\n1 Day Retention by Version") #\n for a new line
          print(f"You searched for: {domain_name}\n") #f-string lets you include placeholders in the printed text

          Loops

          for link in a: #a is assumed to be a list of <a> tags, e.g. from soup.find_all("a")
              print(link.get("href")) #print the href attribute of each link
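
          Putting the scraping pieces together, a minimal sketch (the URL is a placeholder):

          import requests
          from bs4 import BeautifulSoup

          response = requests.get("https://example.com") #placeholder URL
          soup = BeautifulSoup(response.text, "html.parser")
          a = soup.find_all("a") #every anchor tag on the page
          for link in a:
              print(link.get("href"))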

          Input
          domain = input("Enter domain to search for: ")
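
          Tying the whois import, the input prompt and the f-string print together, a short sketch (whois.whois() is the lookup call in the python-whois package; the domain_name field access is illustrative):

          import whois

          domain = input("Enter domain to search for: ")
          result = whois.whois(domain) #query the WHOIS record for the domain
          domain_name = result.domain_name #may be a string or a list depending on the registrar
          print(f"You searched for: {domain_name}\n")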