{"id":619,"date":"2025-02-01T07:30:48","date_gmt":"2025-02-01T07:30:48","guid":{"rendered":"https:\/\/datadandies.nl\/?p=619"},"modified":"2025-02-01T07:30:48","modified_gmt":"2025-02-01T07:30:48","slug":"reduce-storage-costs-compression-techniques-run-length-encoding-dictionary-encoding","status":"publish","type":"post","link":"https:\/\/datadandies.nl\/index.php\/2025\/02\/01\/reduce-storage-costs-compression-techniques-run-length-encoding-dictionary-encoding\/","title":{"rendered":"Reduce storage costs: compression techniques Run Length Encoding &#038; Dictionary Encoding"},"content":{"rendered":"\n<p>\ud835\udc02\ud835\udc28\ud835\udc26\ud835\udc29\ud835\udc2b\ud835\udc1e\ud835\udc2c\ud835\udc2c\ud835\udc22\ud835\udc27\ud835\udc20 \ud835\udc1d\ud835\udc1a\ud835\udc2d\ud835\udc1a \ud835\udc2b\ud835\udc1e\ud835\udc1d\ud835\udc2e\ud835\udc1c\ud835\udc1e\ud835\udc2c \ud835\udc2c\ud835\udc2d\ud835\udc28\ud835\udc2b\ud835\udc1a\ud835\udc20\ud835\udc1e (\ud835\udc1c\ud835\udc28\ud835\udc2c\ud835\udc2d\ud835\udc2c).<\/p>\n\n\n\n<p>Two widely used compression techniques are Run Length Encoding (RLE) and Dictionary Encoding (DE).<\/p>\n\n\n\n<p>Both are used by the Parquet file format and by the VertiPaq engine in Power BI.<\/p>\n\n\n\n<p>Knowing how these techniques work helps you understand why a file gets smaller when you convert it from, for example, CSV to Parquet.
<\/p>\n\n\n\n<p>\ud835\udc11\ud835\udc2e\ud835\udc27 \ud835\udc0b\ud835\udc1e\ud835\udc27\ud835\udc20\ud835\udc2d\ud835\udc21 \ud835\udc04\ud835\udc27\ud835\udc1c\ud835\udc28\ud835\udc1d\ud835\udc22\ud835\udc27\ud835\udc20:<\/p>\n\n\n\n<p>Each run of consecutive identical values is stored once, together with the length of that run.<\/p>\n\n\n\n<table><thead><tr><th>Original Column<\/th><th>Run Length Encoded Column<\/th><th>Frequency<\/th><\/tr><\/thead><tbody><tr><td>A<\/td><td>A<\/td><td>1<\/td><\/tr><tr><td>B<\/td><td>B<\/td><td>1<\/td><\/tr><tr><td>A<\/td><td>A<\/td><td>2<\/td><\/tr><tr><td>A<\/td><td>B<\/td><td>1<\/td><\/tr><tr><td>B<\/td><td>C<\/td><td>2<\/td><\/tr><tr><td>C<\/td><td><\/td><td><\/td><\/tr><tr><td>C<\/td><td><\/td><td><\/td><\/tr><\/tbody><\/table>\n\n\n\n<p>\ud835\udc03\ud835\udc22\ud835\udc1c\ud835\udc2d\ud835\udc22\ud835\udc28\ud835\udc27\ud835\udc1a\ud835\udc2b\ud835\udc32 \ud835\udc04\ud835\udc27\ud835\udc1c\ud835\udc28\ud835\udc1d\ud835\udc22\ud835\udc27\ud835\udc20:<\/p>\n\n\n\n<p>Each distinct value is stored once in a dictionary, and the column then holds only a short reference to the matching dictionary entry.<\/p>\n\n\n\n<table><thead><tr><th>Original Column<\/th><th>Dictionary Encoded Column<\/th><th>Frequency<\/th><\/tr><\/thead><tbody><tr><td>Membership sales<\/td><td>Memb<\/td><td>2<\/td><\/tr><tr><td>Membership sales<\/td><td>Car<\/td><td>1<\/td><\/tr><tr><td>Car sales<\/td><td>Bike<\/td><td>1<\/td><\/tr><tr><td>Bike sales<\/td><td><\/td><td><\/td><\/tr><\/tbody><\/table>\n","protected":false},"excerpt":{"rendered":"<p>\ud835\udc02\ud835\udc28\ud835\udc26\ud835\udc29\ud835\udc2b\ud835\udc1e\ud835\udc2c\ud835\udc2c\ud835\udc22\ud835\udc27\ud835\udc20 \ud835\udc1d\ud835\udc1a\ud835\udc2d\ud835\udc1a \ud835\udc2b\ud835\udc1e\ud835\udc1d\ud835\udc2e\ud835\udc1c\ud835\udc1e\ud835\udc2c \ud835\udc2c\ud835\udc2d\ud835\udc28\ud835\udc2b\ud835\udc1a\ud835\udc20\ud835\udc1e (\ud835\udc1c\ud835\udc28\ud835\udc2c\ud835\udc2d\ud835\udc2c). Two widely used compression techniques are Run Length Encoding (RLE) and Dictionary Encoding (DE). Both are used by the Parquet file format and by the VertiPaq engine in Power BI. 
Knowing about these compression techniques helps you better understand why the file size&hellip;<\/p>\n<p class=\"more-link\"><a href=\"https:\/\/datadandies.nl\/index.php\/2025\/02\/01\/reduce-storage-costs-compression-techniques-run-length-encoding-dictionary-encoding\/\" class=\"themebutton\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[60,3],"class_list":["post-619","post","type-post","status-publish","format-standard","hentry","category-blog","tag-dataengineering","tag-powerbi"],"_links":{"self":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/619","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/comments?post=619"}],"version-history":[{"count":1,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/619\/revisions"}],"predecessor-version":[{"id":620,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/619\/revisions\/620"}],"wp:attachment":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media?parent=619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/categories?post=619"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/tags?post=619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}