Clustering depth of tables in Snowflake: a potential indicator of clustering health of a table

Clustering depth indicates if micro-partitions are overlapping with one another. Ideally, micro-partitions are not overlapping and are contiguous. Being contiguous means that one micro-partition seamlessly connects with another micro-partition.

Sometimes however, micro-partitions are overlapping. A reason for this could be a less than optimal clustering key. Whatever the reason, this is not ideal, because now you run the risk of unnecessarily scanning the same data twice when scanning two micro-partitions. A possible solution for a high clustering depth is choosing a clustering key that is better suited for the job.

Below are some numbers to give you an idea of how clustering depth is defined in Snowflake.

When you have a table with no micro-partitions (a table with no data), the clustering depth is 0.

When you have a table that contains data and the micro-partitions are not overlapping and contiguous, the clustering depth is 1.

You can find out the clustering depth of a table by using the system function below.

SELECT SYSTEM$CLUSTERING_DEPTH(‘DATABASE.SCHEMA.TABLE’)

For the record, having a clustering depth of higher than 1 does not mean that you have to burn the entire database down, drop every table in existence and live a life as a potato farmer devoid of any computers from this moment on. Query performance is a better performance indicator.

The Snowflake documentation has a great picture that assists in visualizing clustering depth. See the link below.

https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions#clustering-depth-illustrated

Another link with some additional interesting info:

https://www.linkedin.com/pulse/demystifying-clustering-snowflake-guide-optimize-your-roshan-patil/

Data Dandies

Business Intelligence & Data Oplossingen

Clustering depth of tables in Snowflake: a potential indicator of clustering health of a table

Clustering depth of tables in Snowflake: a potential indicator of clustering health of a table

Leave Comment Reactie annuleren