TIL how to generate a dbt staging model description from snowflake table
⋅ 2 minute read
In my team we use dbt to create, document, and test data models in our data warehouse, snowflake . We use different layers of models that start from the raw data and in each layer increase complexity and specialization for the target use case:
- source data in a raw table
- thin staging models (standardization of columns, no joins, no aggregations)
- intermediate models (reoccurring building blocks that are defined once and used across models in the next layer)
- mart models (most detailed models organised by department / business context / application)
Problem
The creation of staging models can sometimes be a bit mind-numbing, especially when the raw table has many columns. Let’s take a small example table USERS
:
id | first_name | last_name | country | is_active |
---|---|---|---|---|
1 | Peter | Clark | US | True |
2 | Sarah | Svenson | Sweden | False |
3 | Roberto | Gonzales | Spain | True |
and dbt staging model stg_users.sql
:
1select
2id::number as id,
3first_name::varchar as first_name,
4last_name::varchar as last_name,
5country::varchar as country,
6is_active::boolean as is_active
7from {{ source('RAW_DATA', 'USERS') }}
The staging model definition typically follows the pattern: raw-table-column-name::type as staging-model-column-name
. I have used vim macros in the past to quickly write the staging model definition, but often end up copy-pasting and looking up the inferred types in snowflake for ten minutes.
Solution
We can run two SQL queries in snowflake to generate most of the dbt model defintion:
1desc table "RAW_DATA"."DATA_SCHEMA"."USERS";
2select CONCAT("name", '::', "type", ' as ', "name", ',') as dbt_model_def from table(result_scan(LAST_QUERY_ID()))
The first line (DESCRIBE TABLE
) outputs the table definition. The second command takes the column name and type information about the table and concatenates them into the right format:
LAST_QUERY_ID()
: returns the id of the desc table queryRESULT_SCAN()
: returns the result of the last commandTABLE()
: converts the output to a queryable table
The output of the query is a table with each row formatted with the above pattern. We can copy-paste the output in one step into our staging model file.
If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.