TIL how to generate a dbt staging model description from snowflake table

⋅ 2 minute read


In my team we use dbt to create, document, and test data models in our data warehouse, snowflake . We use different layers of models that start from the raw data and in each layer increase complexity and specialization for the target use case:

  1. source data in a raw table
  2. thin staging models (standardization of columns, no joins, no aggregations)
  3. intermediate models (reoccurring building blocks that are defined once and used across models in the next layer)
  4. mart models (most detailed models organised by department / business context / application)

Problem

The creation of staging models can sometimes be a bit mind-numbing, especially when the raw table has many columns. Let’s take a small example table USERS:

id first_name last_name country is_active
1 Peter Clark US True
2 Sarah Svenson Sweden False
3 Roberto Gonzales Spain True

and dbt staging model stg_users.sql:

select 
id::number as id, 
first_name::varchar as first_name, 
last_name::varchar as last_name, 
country::varchar as country, 
is_active::boolean as is_active
from {{ source('RAW_DATA', 'USERS') }}

The staging model definition typically follows the pattern: raw-table-column-name::type as staging-model-column-name. I have used vim macros in the past to quickly write the staging model definition, but often end up copy-pasting and looking up the inferred types in snowflake for ten minutes.

Solution

We can run two SQL queries in snowflake to generate most of the dbt model defintion:

desc table "RAW_DATA"."DATA_SCHEMA"."USERS";
select CONCAT("name", '::', "type", ' as ', "name", ',') as dbt_model_def from table(result_scan(LAST_QUERY_ID()))

The first line (DESCRIBE TABLE) outputs the table definition. The second command takes the column name and type information about the table and concatenates them into the right format:

  • LAST_QUERY_ID(): returns the id of the desc table query
  • RESULT_SCAN(): returns the result of the last command
  • TABLE(): converts the output to a queryable table

The output of the query is a table with each row formatted with the above pattern. We can copy-paste the output in one step into our staging model file.


If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.

Tags:
#dbt   #snowflake   #sql  

Related: