{"id":449,"date":"2024-08-09T09:34:11","date_gmt":"2024-08-09T09:34:11","guid":{"rendered":"https:\/\/datadandies.nl\/?p=449"},"modified":"2024-08-09T09:34:11","modified_gmt":"2024-08-09T09:34:11","slug":"find-each-occurrence-of-a-combination-of-words-in-a-text-and-include-the-subsequent-word-using-python-and-regex","status":"publish","type":"post","link":"https:\/\/datadandies.nl\/index.php\/2024\/08\/09\/find-each-occurrence-of-a-combination-of-words-in-a-text-and-include-the-subsequent-word-using-python-and-regex\/","title":{"rendered":"Find each occurrence of a combination of words in a text and include the subsequent word using Python and Regex"},"content":{"rendered":"\n<p>\u201cTHE DATABASE MOST USED IN YOUR QUERIES WILL BE DELETED!\u201d<\/p>\n\n\n\n<p>\u201cBut don\u2019t you worry pal, we are gonna migrate the tables and views you use to another database!\u201d<\/p>\n\n\n\n<p>\u201cIf you could just give me a list of the objects you use in the soon-to-be-history-database, that would be great.\u201d<\/p>\n\n\n\n<p>Migrations are a common animal in any data ecosystem. So chances are that you have received the question above. And you are more than willing to oblige. However, you dread having to look through hundreds of CREATE-statements.<\/p>\n\n\n\n<p>Fortunately, you are very savvy and laz\u2026 efficient! Efficient is the word.<\/p>\n\n\n\n<p>First, you will get all the CREATE-statements of all the objects that you created in your database in Snowflake.<\/p>\n\n\n\n<p>Then you will you use Python and regex to scoop up all the mentions of the to-be-deleted database including the attached objects.<\/p>\n\n\n\n<p>And because you are extremely \u2026 efficient! You grab this script from the internet \u2026 From this post even! Here we go:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"--In Snowflake\nselect get_ddl('database', 'CERTAIN_DATABASE')\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Kopieer\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\">--In Snowflake<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">select<\/span><span style=\"color: #D8DEE9FF\"> get_ddl(<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">database<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #D8DEE9FF\">, <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">CERTAIN_DATABASE<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #D8DEE9FF\">)<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"# In VS Code\n\nimport re\n\n# Path.\npath = r'C:\\Users\\highlyefficientuser\\result_of_snowflake_ddl.txt'\n\n# Open file.\nfile = open(path)\n\n# Turn it into a string.\nstring = str(file.read())\n\n# Create regex object for database CERTAIN_DATABASE and the subsequent table untill a space. Ignore casing.\nDBregex = re.compile(r'CERTAIN_DATABASE.\\w*', re.IGNORECASE)\n\n# Save all matching occurrences in a list.\nalleVoorkomens = DBregex.findall(string)\n\n# Lowercase all occurrences. \nfor i in range(len(alleVoorkomens)):\n    alleVoorkomens[i] = alleVoorkomens[i].lower()\n\n# Remove any duplicates using set().\nuniekeVoorkomens = list(set(alleVoorkomens))\n\n# Sort the list alphabetically.\nuniekeVoorkomens.sort()\n\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Kopieer\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #616E88\"># In VS Code<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> re<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Path.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">path <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">r<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #EBCB8B\">C:\\Users\\highlyefficientuser\\result_of_snowflake_ddl.txt<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Open file.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">file <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">path<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Turn it into a string.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">string <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">str<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">read<\/span><span style=\"color: #ECEFF4\">())<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Create regex object for database CERTAIN_DATABASE and the subsequent table untill a space. Ignore casing.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">DBregex <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> re<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">compile<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">r<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #EBCB8B\">CERTAIN_DATABASE.\\w<\/span><span style=\"color: #81A1C1\">*<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> re<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">IGNORECASE<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Save all matching occurrences in a list.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">alleVoorkomens <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> DBregex<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">findall<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">string<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Lowercase all occurrences. <\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">for<\/span><span style=\"color: #D8DEE9FF\"> i <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">range<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #88C0D0\">len<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">alleVoorkomens<\/span><span style=\"color: #ECEFF4\">)):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    alleVoorkomens<\/span><span style=\"color: #ECEFF4\">[<\/span><span style=\"color: #D8DEE9FF\">i<\/span><span style=\"color: #ECEFF4\">]<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> alleVoorkomens<\/span><span style=\"color: #ECEFF4\">[<\/span><span style=\"color: #D8DEE9FF\">i<\/span><span style=\"color: #ECEFF4\">].<\/span><span style=\"color: #88C0D0\">lower<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Remove any duplicates using set().<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">uniekeVoorkomens <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">list<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #88C0D0\">set<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">alleVoorkomens<\/span><span style=\"color: #ECEFF4\">))<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Sort the list alphabetically.<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">uniekeVoorkomens<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">sort<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><\/div>\n\n\n\n<p>A word of warning before I wish you good luck: this only works for fully qualified object names, meaning DATABASE_NAME.SCHEMA_NAME.OBJECT_NAME. When all you\u2019re using in your CREATE-statements are single object names without the database, then (depending on the tool you are using) this will not work.<\/p>\n\n\n\n<p>Good luck!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cTHE DATABASE MOST USED IN YOUR QUERIES WILL BE DELETED!\u201d \u201cBut don\u2019t you worry pal, we are gonna migrate the tables and views you use to another database!\u201d \u201cIf you could just give me a list of the objects you use in the soon-to-be-history-database, that would be great.\u201d Migrations are a common animal in any&hellip;<\/p>\n<p class=\"more-link\"><a href=\"https:\/\/datadandies.nl\/index.php\/2024\/08\/09\/find-each-occurrence-of-a-combination-of-words-in-a-text-and-include-the-subsequent-word-using-python-and-regex\/\" class=\"themebutton\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[37,39,4],"class_list":["post-449","post","type-post","status-publish","format-standard","hentry","category-blog","tag-python","tag-regex","tag-sql"],"_links":{"self":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/comments?post=449"}],"version-history":[{"count":1,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/449\/revisions"}],"predecessor-version":[{"id":450,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/posts\/449\/revisions\/450"}],"wp:attachment":[{"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/media?parent=449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/categories?post=449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datadandies.nl\/index.php\/wp-json\/wp\/v2\/tags?post=449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}