TIL about the QUALIFY statement in SQL
⋅ 2 minute read
Today I came across the QUALIFY
clause which is supported in some SQL dialects. It’s not part of the SQL standard but supported by main analytical databases like BigQuery, Snowflake, Oracle, Databricks, DuckDB, etc. It’s part of
Snowflake’s
dialect which is the data warehouse that I use at work.
The QUALIFY
statement lets me filter the result of a query based on the result of a window function.
The order of execution is the following:
-- SELECT <columns>
-- FROM <table>
-- [GROUP BY ...]
-- [HAVING ...]
-- QUALIFY <window function predicate>
QUALIFY
is for window functions what HAVING
is for group-by statements. It allows to filter based on the outputs of these operations without having to use a subquery or CTE. Therefore I can write more concise SQL.
I often deduplicate data using the row_number()
window function. Sometimes I just want the latest event for a particular user in a dataset.
An example would be:
1-- logs with some user events and timestamps
2with logs_annotated as (
3 select
4 logs.*,
5 row_number() over (partition by user_id order by event_timestamp desc) as row_number
6 from logs
7)
8select * from logs_annotated where row_number = 1
With QUALIFY
I can write one statement instead:
1select * from logs
2qualify row_number() over (partition by user_id order by event_timestamp desc) = 1
This works because the QUALIFY
statement is evaluated after the window function result has been computed and can therefore be used in filtering.
If you have any thoughts, questions, or feedback about this post, I would love to hear it. Please reach out to me via email.