Using packages in dbt to automate tedious tasks like writing your own sources.yml

Are you a lazy engineer? Then you are good engineer.

Being lazy/good at automation of processes helps save work, time and money.

In Data Build Tool (dbt) there are several packages that can help automate tedious tasks.

The dbt-codegen package provides you with the macro’s generate_base_model and generate_source, among others.

Macro’s in dbt are essentially functions that make your life easier.

The generate_base_model macro creates a SQL statement {{ including( ‘relevant jinja’ ) }} (see what I did there) that includes all the column names.

In staging layers, it is best practise to explicitly write out all the column names that you want to include in the resulting model (meaning, no “select *…” but “select column1, column2, etc…”). This is because whenever there are schema changes, like extra columns added, they get picked up automatically by “select *…”, whether you want it or not. This can result in errors downstream, among other things.

Writing out all column names can become really annoying, real fast though, when you’re dealing with tables with hundreds of columns.

Having to write all those columns yourself or jumping through hoops like Snowflake > add columns in editor > format > copy > etc. = BIG sad.

The generate_base_model macro does this for you. It creates a SQL statement with all the columns.

Then there is the generate_source macro. This macro generates sources.yml, when you have not created it yourself yet. Also very handy and efficient. Moral of the story is, keep being lazy.