Micro-partitions and clustering in Snowflake

Snowflake stores tables as micro-partitions. Put simply, micro-partitions are horizontal slices of the table: each slice holds a subset of the rows. How the rows are distributed across these micro-partitions is called “clustering”.

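Good clustering lets Snowflake skip micro-partitions that cannot contain rows needed for the query. Snowflake keeps metadata about the range of values in each micro-partition, so a filter only has to read the micro-partitions whose range overlaps it. Reading fewer micro-partitions makes queries faster and cheaper.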
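To make the idea of skipping micro-partitions concrete, here is a minimal Python sketch. It is a toy model, not Snowflake's actual implementation: the class MicroPartition and the functions build_partitions and scan are hypothetical names made up for this illustration, and real micro-partitions are far larger than ten rows.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Toy stand-in for a micro-partition: the values of one column."""
    rows: list

    @property
    def min_value(self):
        return min(self.rows)

    @property
    def max_value(self):
        return max(self.rows)


def build_partitions(values, rows_per_partition):
    """Cut the values into consecutive slices of rows_per_partition rows."""
    return [
        MicroPartition(values[i:i + rows_per_partition])
        for i in range(0, len(values), rows_per_partition)
    ]


def scan(partitions, lo, hi):
    """Return (matching rows, number of partitions actually read)."""
    matches = []
    scanned = 0
    for partition in partitions:
        # The min/max range does not overlap the filter: skip this slice.
        if partition.max_value < lo or partition.min_value > hi:
            continue
        scanned += 1
        matches.extend(v for v in partition.rows if lo <= v <= hi)
    return matches, scanned


# 100 rows stored in sorted order, 10 rows per slice: 10 slices in total.
partitions = build_partitions(list(range(1, 101)), 10)
matches, scanned = scan(partitions, 25, 34)
print(len(matches), "rows found,", scanned, "of", len(partitions), "partitions read")
# prints: 10 rows found, 2 of 10 partitions read
```

Because the rows are stored in order, only the slices covering 21 to 30 and 31 to 40 overlap the filter; the other eight are never read.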

A picture says more than a thousand words, and the Snowflake documentation has one handy to help visualize clustering. The picture also shows that, within each micro-partition, the data is stored in a columnar fashion.

https://docs.snowflake.com/en/_images/tables-clustered1.png

Using the Query Profile you can compare the number of partitions scanned for a query with the total number of partitions in the table. If your query uses a filter and returns only a small number of records while a large share of the partitions is scanned, you might consider defining an explicit clustering key.
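The effect of a clustering key on those two Query Profile numbers can be imitated with a small Python sketch. This is only an illustration under simplifying assumptions: the function partitions_scanned is a hypothetical helper, the 50-row slices are tiny compared to real micro-partitions, and sorting the rows stands in for what a clustering key on that column would aim for.

```python
def partitions_scanned(values, rows_per_partition, lo, hi):
    """Return (partitions scanned, partitions total) for a filter lo..hi."""
    scanned = 0
    total = 0
    for i in range(0, len(values), rows_per_partition):
        chunk = values[i:i + rows_per_partition]
        total += 1
        # A slice must be read if its min/max range overlaps the filter.
        if min(chunk) <= hi and max(chunk) >= lo:
            scanned += 1
    return scanned, total


# Same 1000 values, two different physical orderings.
# Scattered: every slice of 50 rows spans almost the whole value range.
scattered = sorted(range(1, 1001), key=lambda v: (v % 20, v))
# Clustered: rows are ordered by the column used in the filter.
clustered = sorted(scattered)

# The filter matches only 21 of the 1000 rows (values 100 to 120).
print(partitions_scanned(scattered, 50, 100, 120))  # prints: (20, 20)
print(partitions_scanned(clustered, 50, 100, 120))  # prints: (2, 20)
```

In the scattered layout every partition has to be read to return 21 rows, which is the pattern to look for in the Query Profile. In Snowflake itself a clustering key is defined with a statement of the form ALTER TABLE ... CLUSTER BY (...); which column to choose depends on the filters your queries use.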
