TIL about the QUALIFY statement in SQL

2024-09-16 ⋅ 2 minute read

Today I came across the QUALIFY clause which is supported in some SQL dialects. It’s not part of the SQL standard but supported by main analytical databases like BigQuery, Snowflake, Oracle, Databricks, DuckDB, etc. It’s part of Snowflake’s dialect which is the data warehouse that I use at work. The QUALIFY statement lets me filter the result of a query based on the result of a window function.

The order of execution is the following:

-- SELECT <columns>
--   FROM <table>
--   [GROUP BY ...]
--   [HAVING ...]
--   QUALIFY <window function predicate>

QUALIFY is for window functions what HAVING is for group-by statements. It allows to filter based on the outputs of these operations without having to use a subquery or CTE. Therefore I can write more concise SQL.

I often deduplicate data using the row_number() window function. Sometimes I just want the latest event for a particular user in a dataset.

An example would be:

-- logs with some user events and timestamps
with logs_annotated as (
    select
        logs.*,
        row_number() over (partition by user_id order by event_timestamp desc) as row_number
    from logs
)
select * from logs_annotated where row_number = 1

With QUALIFY I can write one statement instead:

select * from logs
qualify row_number() over (partition by user_id order by event_timestamp desc) = 1

This works because the QUALIFY statement is evaluated after the window function result has been computed and can therefore be used in filtering.

If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.

Tags:
#sql #snowflake