staticnotes.org

TIL how to generate a dbt staging model description from snowflake table

⋅ 2 minute read

In my team we use dbt external link to create, document, and test data models in our data warehouse, snowflake external link . We use different layers of models that start from the raw data and in each layer increase complexity and specialization for the target use case:

  1. source data in a raw table
  2. thin staging models (standardization of columns, no joins, no aggregations)
  3. intermediate models (reoccurring building blocks that are defined once and used across models in the next layer)
  4. mart models (most detailed models organised by department / business context / application)

Problem

The creation of staging models can sometimes be a bit mind-numbing, especially when the raw table has many columns. Let’s take a small example table USERS:

id first_name last_name country is_active
1 Peter Clark US True
2 Sarah Svenson Sweden False
3 Roberto Gonzales Spain True

and dbt staging model stg_users.sql:

1select 
2id::number as id, 
3first_name::varchar as first_name, 
4last_name::varchar as last_name, 
5country::varchar as country, 
6is_active::boolean as is_active
7from {{ source('RAW_DATA', 'USERS') }}

The staging model definition typically follows the pattern: raw-table-column-name::type as staging-model-column-name. I have used vim macros in the past to quickly write the staging model definition, but often end up copy-pasting and looking up the inferred types in snowflake for ten minutes.

Solution

We can run two SQL queries in snowflake to generate most of the dbt model defintion:

1desc table "RAW_DATA"."DATA_SCHEMA"."USERS";
2select CONCAT("name", '::', "type", ' as ', "name", ',') as dbt_model_def from table(result_scan(LAST_QUERY_ID()))

The first line (DESCRIBE TABLE) outputs the table definition. The second command takes the column name and type information about the table and concatenates them into the right format:

The output of the query is a table with each row formatted with the above pattern. We can copy-paste the output in one step into our staging model file.

If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.

#dbt   #snowflake   #sql