Currently reading “Data Pipelines Pocket Reference” by James Densmore.
Interesting little book.
Early in the book, James gives a summarized checklist of what a Data Engineer needs to be successful.
In this post I’m trying to see if I have what it takes to be a Data Engineer according to the prerequisites as listed in the book.
— SQL and Data Warehousing & Modeling Fundamentals —
I have developed several data warehouses. This included designing the data model. I have the most experience with the Kimball data modeling approach; however, I have also used the “one big flat table” approach when the situation lent itself to it.
Although I have worked with several data warehouses that were built using a data vault data modeling approach, I have never designed a data vault myself. This is a gap in my knowledge that I plan to fill at some point.
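To make the difference between the two modeling styles concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not from the book: the table and column names (dim_customer, fact_sales, sales_flat) and the sample rows are made up for illustration. It builds a tiny Kimball-style star schema, derives a "one big flat table" from it, and shows that the same question can be answered from either one.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a star schema and a flat table.

    All table names, column names and rows are hypothetical examples.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Kimball style: one dimension table and one fact table.
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT,
            country       TEXT
        );
        CREATE TABLE fact_sales (
            sale_id      INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer (customer_key),
            amount       REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Acme', 'NL'), (2, 'Globex', 'DE');
        INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);

        -- "One big flat table": the same data with the join done up front.
        CREATE TABLE sales_flat AS
        SELECT f.sale_id, d.customer_name, d.country, f.amount
        FROM fact_sales AS f
        JOIN dim_customer AS d USING (customer_key);
        """
    )
    return con


def revenue_by_country_star(con):
    """Revenue per country from the star schema (needs a join)."""
    return con.execute(
        """
        SELECT d.country, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.country
        ORDER BY d.country
        """
    ).fetchall()


def revenue_by_country_flat(con):
    """Revenue per country from the flat table (no join needed)."""
    return con.execute(
        "SELECT country, SUM(amount) FROM sales_flat "
        "GROUP BY country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    con = build_demo()
    print(revenue_by_country_star(con))
    print(revenue_by_country_flat(con))
```

Both queries return the same totals. The flat table is simpler to query, while the star schema keeps customer attributes in one place, which is roughly the trade-off that decides which approach fits a given situation.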
— Python and/or Java —
I have experience with Python, but no experience whatsoever with Java. So far, I have not needed Java. It would be cool to pick up another programming language though, so whenever the need arises I would not mind learning it.
— Distributed computing —
I know of distributed computing platforms like Hadoop and Apache Spark, but I have never implemented one. This is another gap in my knowledge according to the book.
— Basic system administration (Linux command line) —
The skills the book lists under this header include:
– Analyze application logs: I have some experience with that, especially with containers that run in the cloud; however, I definitely need to increase my knowledge in this area.
– Schedule cron jobs: I should be fine here.
– Troubleshoot firewall and other security settings: I have done this so many times, but honestly, it barely ever feels easy. Some problems can be as simple as not having been granted access to a certain object yet. Other problems are more complex: Entra ID not being synced with Snowflake, so a user has not been granted a certain role yet; a proxy not being set in the environment variables, so Power BI cannot connect to Snowflake; etc.
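For the proxy case, a first check can be as small as the sketch below. It is not from the book and it assumes the conventional variable names HTTP_PROXY, HTTPS_PROXY and NO_PROXY (in upper or lower case); which variables a particular tool actually reads depends on the tool and the environment, so treat the names as an assumption to verify.

```python
import os

# Conventional proxy variable names; an assumption, not a universal standard.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(environ=None):
    """Return the proxy variables found in the environment.

    Looks up each name in upper case first, then lower case. Variables
    that are unset or empty are reported as None.
    """
    env = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        value = env.get(name) or env.get(name.lower())
        found[name] = value or None
    return found


def missing_proxy_vars(environ=None):
    """Return the names of proxy variables that are not set."""
    return [
        name
        for name, value in proxy_settings(environ).items()
        if value is None
    ]


if __name__ == "__main__":
    for name, value in proxy_settings().items():
        print(f"{name}: {value if value is not None else '<not set>'}")
```

Running this in the same session or container as the failing connection shows quickly whether a missing proxy variable is a likely cause.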
— Goal-oriented mentality —
The book defines this mainly as soft skills: talking to data analysts, data scientists and stakeholders. I do have a lot of experience with this particular demand. I started out building reports in Power BI and Excel, which required a lot of communication with stakeholders. I find that talking often and thoroughly with the people for whom you develop a certain solution helps a lot with actually developing something that will be useful & used.
How do you score on this checklist?