Ever wondered how to do a Multi-Column-JOIN in Power Query M?
It is so simple to join TABLE_A to TABLE_B in SQL using multiple conditions on multiple columns:
SELECT
*
FROM TABLE_A A
JOIN TABLE_B B
ON A.ID = B.ID
AND A.DATE BETWEEN B.STARTDATE AND B.ENDDATESQLBut how do we replicate this in Power Query M?
Turns out we can add a column using the Power Query M function Table.SelectRows. Take a look at the image below.
One downside to this approach is that the performance is terrible for large datasets. This is likely due to the fact that as far as I understand, a nested loop is used: each row of TABLE_A is checked against each row of TABLE_B.
In SQL, JOINs like these perform much better for several reasons: indexing, stats, more efficient query plans like hash JOINs, etc. One thing worth exploring in order to improve performance of this method in Power Query M might be to first JOIN TABLE_A to TABLE_B using the ID column, and then apply the second date-based condition. I haven’t explored this option yet, but if you’re limited to Power Query M and need to make a multi column JOIN happen, that is something I would try.
let
Source = ...
etc.
// Start logic multi condition JOIN
#"AddColumnMultiConditionJoin" = Table.AddColumn(
// Giving the fact table the "alias": "c" (which is technically not an alias, but a lambda function parameter representing rows of the fact table. It functions as a loop where each row from the fact table "p" is checked against each row from the dimension table "c")
#"TableFromPreviousStep", "Custom", (c)=>
Table.SelectRows(
// Giving the dimension table containing an ID-column and a start- and enddate column an "alias": "p"
#"DimensionTableWithIdColumnAndStartdateAndEnddate", (p)=>
// Multi condition JOIN using...
c[ID_COLUMN] = p[ID_COLUMN] // ... ID_COLUMN and...
and c[DATE] >= p[STARTDATE] // ... DATE must be larger than STARTDATE and...
and c[DATE] <= p[ENDDATE] // ... DATE must be smaller than ENDDATE
)
)
in
#"AddColumnMultiConditionJoin"Power Query