Find each occurrence of a combination of words in a text and include the subsequent word using Python and Regex

“THE DATABASE MOST USED IN YOUR QUERIES WILL BE DELETED!”

“But don’t you worry pal, we are gonna migrate the tables and views you use to another database!”

“If you could just give me a list of the objects you use in the soon-to-be-history-database, that would be great.”

Migrations are a common animal in any data ecosystem. So chances are that you have received the question above. And you are more than willing to oblige. However, you dread having to look through hundreds of CREATE-statements.

Fortunately, you are very savvy and laz… efficient! Efficient is the word.

First, you will get all the CREATE-statements of all the objects that you created in your database in Snowflake.

Then you will you use Python and regex to scoop up all the mentions of the to-be-deleted database including the attached objects.

And because you are extremely … efficient! You grab this script from the internet … From this post even! Here we go:

--In Snowflake
select get_ddl('database', 'CERTAIN_DATABASE')
# In VS Code

import re

# Path.
path = r'C:\Users\highlyefficientuser\result_of_snowflake_ddl.txt'

# Open file.
file = open(path)

# Turn it into a string.
string = str(file.read())

# Create regex object for database CERTAIN_DATABASE and the subsequent table untill a space. Ignore casing.
DBregex = re.compile(r'CERTAIN_DATABASE.\w*', re.IGNORECASE)

# Save all matching occurrences in a list.
alleVoorkomens = DBregex.findall(string)

# Lowercase all occurrences. 
for i in range(len(alleVoorkomens)):
    alleVoorkomens[i] = alleVoorkomens[i].lower()

# Remove any duplicates using set().
uniekeVoorkomens = list(set(alleVoorkomens))

# Sort the list alphabetically.
uniekeVoorkomens.sort()

A word of warning before I wish you good luck: this only works for fully qualified object names, meaning DATABASE_NAME.SCHEMA_NAME.OBJECT_NAME. When all you’re using in your CREATE-statements are single object names without the database, then (depending on the tool you are using) this will not work.

Good luck!

Leave Comment

Het e-mailadres wordt niet gepubliceerd. Vereiste velden zijn gemarkeerd met *