{"id":440,"date":"2024-07-14T05:57:43","date_gmt":"2024-07-14T05:57:43","guid":{"rendered":"https:\/\/datadandies.nl\/?p=440"},"modified":"2024-07-14T05:57:43","modified_gmt":"2024-07-14T05:57:43","slug":"append-files-in-a-folder-to-a-single-table-in-snowflake","status":"publish","type":"post","link":"https:\/\/datadandies.nl\/index.php\/2024\/07\/14\/append-files-in-a-folder-to-a-single-table-in-snowflake\/","title":{"rendered":"Append files in a folder to a single table in Snowflake"},"content":{"rendered":"\n<p>Want to append the data of several files into ONE table?<\/p>\n\n\n\n<p>You could do this using Power BI Dataflows. However, you can also do this in Snowflake using the COPY INTO command.<\/p>\n\n\n\n<p>Imagine you have an AWS S3 bucket with a folder containing a lot of files that need to be appended into a single table. The command below will do exactly that.<\/p>\n\n\n\n<p>COPY INTO database.schema.table<\/p>\n\n\n\n<p>FROM @my_stage\/directory<\/p>\n\n\n\n<p>FILE_FORMAT = &#8216;myfileformat&#8217; If new files arrive in the S3 bucket every day, you could schedule the above command using a Task. Using the metadata from the table in Snowflake and the files in the stage referencing the S3 bucket, Snowflake will not duplicate the data in the table, unless you force it to.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"--Only one file is uploaded to the AWS S3 bucket.\n--First the database and schema where the stage is located\nuse test.test\n;\n\n--Check which files are present in the stage; currently only 1\nlist @my_s3_stage\n;\n\n--Check out contents of the file\nselect $1, $2 from @my_s3_stage\n;\n\n--Create table to move the results to\ncreate table test.test.append_test\n(kolom1 int,\nkolom2 varchar)\n;\n\n--Check content of empty table\nselect * from test.test.append_test\n;\n\n--Create file format \ncreate file format fileformat_csv_comma\n    TYPE = 'CSV'\n    FIELD_DELIMITER = ','\n    SKIP_HEADER = 1\n;\n\ncopy into test.test.append_test\n    from @my_s3_stage\/appendtest \n    file_format = 'fileformat_csv_comma'\n;\n\n--Check content of table filled with first file\nselect * from test.test.append_test\n;\n\n--First add second file that needs to be appended to S3 bucket\n--Then COPY INTO again\n--No extra records; using the metadata of the table (which is available for 64 days), the files have not added to the table again, which would cause duplicate records\n--After 64 days the LOAD_UNCERTAIN_FILES will make sure that already loaded files are not loaded again\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Kopieer\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\">--Only one file is uploaded to the AWS S3 bucket.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--First the database and schema where the stage is located<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">use<\/span><span style=\"color: #D8DEE9FF\"> test.test<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Check which files are present in the stage; currently only 1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">list @my_s3_stage<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Check out contents of the file<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">select<\/span><span style=\"color: #D8DEE9FF\"> $<\/span><span style=\"color: #B48EAD\">1<\/span><span style=\"color: #D8DEE9FF\">, $<\/span><span style=\"color: #B48EAD\">2<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> @my_s3_stage<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Create table to move the results to<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">create<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">table<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">test<\/span><span style=\"color: #D8DEE9FF\">.test.append_test<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">(kolom1 <\/span><span style=\"color: #81A1C1\">int<\/span><span style=\"color: #D8DEE9FF\">,<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">kolom2 <\/span><span style=\"color: #81A1C1\">varchar<\/span><span style=\"color: #D8DEE9FF\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Check content of empty table<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">select<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">*<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> test.test.append_test<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Create file format <\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">create<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">file<\/span><span style=\"color: #D8DEE9FF\"> format fileformat_csv_comma<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">TYPE<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">CSV<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    FIELD_DELIMITER <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">,<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    SKIP_HEADER <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">copy<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">into<\/span><span style=\"color: #D8DEE9FF\"> test.test.append_test<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> @my_s3_stage<\/span><span style=\"color: #81A1C1\">\/<\/span><span style=\"color: #D8DEE9FF\">appendtest <\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">file_format<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">fileformat_csv_comma<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Check content of table filled with first file<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">select<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">*<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">from<\/span><span style=\"color: #D8DEE9FF\"> test.test.append_test<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--First add second file that needs to be appended to S3 bucket<\/span><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--Then COPY INTO again<\/span><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--No extra records; using the metadata of the table (which is available for 64 days), the files have not added to the table again, which would cause duplicate records<\/span><\/span>\n<span class=\"line\"><span style=\"color: #616E88\">--After 64 days the LOAD_UNCERTAIN_FILES will make sure that already loaded files are not loaded again<\/span><\/span><\/code><\/pre><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Want to append the data of several files into ONE table? You could do this using Power BI Dataflows. However, you can also do this in Snowflake using the COPY INTO command. Imagine you have an AWS S3 bucket with a folder containing a lot of files that need to be appended into a single&hellip;<\/p>\n<p class=\"more-link\"><a href=\"https:\/\/datadandies.nl\/index.php\/2024\/07\/14\/append-files-in-a-folder-to-a-single-table-in-snowflake\/\" class=\"themebutton\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":441,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[44,40],"class_list":["post-440","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-etl","tag-snowflake"],"_links":{"self":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/440","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/comments?post=440"}],"version-history":[{"count":1,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/440\/revisions"}],"predecessor-version":[{"id":442,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/440\/revisions\/442"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media\/441"}],"wp:attachment":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media?parent=440"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/categories?post=440"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/tags?post=440"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}