Do you have what it takes to be a Data Engineer?

Currently reading “Data Pipelines Pocket Reference” by James Densmore.

Interesting little book.

At the start of the book, James gives a short checklist of what a Data Engineer needs to be successful.

In this post I’m trying to see if I have what it takes to be a Data Engineer according to the prerequisites as listed in the book.

— SQL and Data Warehousing & Data Modeling Fundamentals —

I have developed several data warehouses, including designing their data models. I have the most experience with the Kimball data modeling approach; however, I have also used the “one big flat table” approach when the situation lent itself to it.
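To make the difference between the two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows (dim_customer, fact_sales, sales_flat) are made up for illustration and do not come from the book. The same question is answered once against a Kimball-style star schema and once against one big flat table.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding both modeling styles (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Kimball style: a dimension table plus a fact table that references it
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT,
            country       TEXT
        );
        CREATE TABLE fact_sales (
            sale_id      INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer (customer_key),
            amount       REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Acme', 'NL'), (2, 'Globex', 'US');
        INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

        -- One big flat table: the same data, joined once and stored denormalized
        CREATE TABLE sales_flat AS
        SELECT f.sale_id, d.customer_name, d.country, f.amount
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key;
        """
    )
    return conn


def revenue_by_country_star(conn):
    """Star schema: the join happens at query time."""
    return conn.execute(
        """
        SELECT d.country, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.country
        ORDER BY d.country
        """
    ).fetchall()


def revenue_by_country_flat(conn):
    """Flat table: no join needed, at the cost of repeated customer attributes."""
    return conn.execute(
        "SELECT country, SUM(amount) FROM sales_flat GROUP BY country ORDER BY country"
    ).fetchall()
```

Both functions return the same totals per country. The trade-off is where the join cost and the duplication live, not what the answer is.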

Although I have worked with several data warehouses that were built using a data vault modeling approach, I have never designed a data vault myself. This is a gap in my knowledge that I plan to fill at some point.

— Python and/or Java —

I have experience with Python, but none whatsoever with Java. So far, I have not needed Java. It would be cool to pick up another programming language though, so when the need arises I would not mind learning it.

— Distributed computing —

I know of distributed computing platforms like Hadoop and Apache Spark, but I have never implemented one. This is another gap in my knowledge according to the book.

— Basic system administration (Linux command line) —

The skills the book lists under this heading include:

–         Analyze application logs: I have some experience with this, especially with containers running in the cloud; however, I definitely need to build up my knowledge in this area.
–         Schedule cron jobs: I should be fine here.
–         Troubleshoot firewall and other security settings: I have done this many times but honestly, it barely ever feels easy. Some problems are as simple as not yet having been granted access to a certain object. Other problems are more complex: Entra ID is not synced with Snowflake, so a user has not been granted a certain role yet; the proxy has not been set in the environment variables, so Power BI cannot connect to Snowflake; and so on.
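For the proxy case, a first check I find useful is simply listing which proxy variables are set in the current environment. Below is a minimal sketch in Python. It assumes the conventional variable names HTTP_PROXY, HTTPS_PROXY and NO_PROXY (in upper or lower case); whether a given tool actually reads these variables depends on the tool and how it is configured. The function names are my own.

```python
import os

# Conventional proxy variable names; an assumption, not every tool honors them.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(environ=None):
    """Return the value of each proxy variable, or None when it is not set.

    Both the upper-case and lower-case spelling are checked, since tools differ
    in which one they read.
    """
    env = os.environ if environ is None else environ
    return {name: env.get(name) or env.get(name.lower()) for name in PROXY_VARS}


def missing_proxy_vars(environ=None):
    """List the proxy variables that are unset or empty."""
    return [name for name, value in proxy_settings(environ).items() if not value]


if __name__ == "__main__":
    for name, value in proxy_settings().items():
        print(f"{name}: {value if value else '(not set)'}")
```

Running the script in the same session or service account as the failing connection shows at a glance whether the proxy was ever configured there.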

— Goal-oriented mentality —

The book defines this mainly as soft skills: talking to data analysts, data scientists and stakeholders. I have a lot of experience with this. I started out building reports in Power BI and Excel, which required a lot of communication with stakeholders. I find that talking often and thoroughly with the people for whom you develop a solution helps a lot with building something that is useful and actually used.

How do you score on this checklist?
