Multi column or multi condition JOIN using Power Query M

Ever wondered how to do a Multi-Column-JOIN in Power Query M?

It is so simple to join TABLE_A to TABLE_B in SQL using multiple conditions on multiple columns:

SQL
SELECT 
*
FROM TABLE_A A
JOIN TABLE_B B 
  ON A.ID = B.ID
  AND A.DATE BETWEEN B.STARTDATE AND B.ENDDATE
SQL

But how do we replicate this in Power Query M?

Turns out we can add a column using the Power Query M function Table.SelectRows. Take a look at the image below.

One downside to this approach is that the performance is terrible for large datasets. This is likely due to the fact that as far as I understand, a nested loop is used: each row of TABLE_A is checked against each row of TABLE_B.

In SQL, JOINs like these perform much better for several reasons: indexing, stats, more efficient query plans like hash JOINs, etc. One thing worth exploring in order to improve performance of this method in Power Query M might be to first JOIN TABLE_A to TABLE_B using the ID column, and then apply the second date-based condition. I haven’t explored this option yet, but if you’re limited to Power Query M and need to make a multi column JOIN happen, that is something I would try.

Power Query
let
  Source = ...

  etc.

  // Start logic multi condition JOIN
  #"AddColumnMultiConditionJoin" = Table.AddColumn(
    // Giving the fact table the "alias": "c" (which is technically not an alias, but a lambda function parameter representing rows of the fact table. It functions as a loop where each row from the fact table "p" is checked against each row from the dimension table "c")
    #"TableFromPreviousStep", "Custom", (c)=>             
      Table.SelectRows(
        // Giving the dimension table containing an ID-column and a start- and enddate column an "alias": "p"
        #"DimensionTableWithIdColumnAndStartdateAndEnddate", (p)=> 
            // Multi condition JOIN using...
            c[ID_COLUMN] = p[ID_COLUMN]                     // ... ID_COLUMN and...
            and c[DATE] >= p[STARTDATE]                     // ... DATE must be larger than STARTDATE and... 
            and c[DATE] <= p[ENDDATE]                       // ... DATE must be smaller than ENDDATE
            )
          )
in
  #"AddColumnMultiConditionJoin"
Power Query

Leave Comment