Increase development to production speed and ensure that your code meets quality standards before it’s deployed.

Photo by Jeremy Perkins on Unsplash

If you like tests (not writing a lot of them, but their usefulness), then you have come to the right place. I mostly write Spark code using Scala, but I see that PySpark is becoming more and more dominant. Unfortunately, I often see fewer tests when it comes to developing Spark code with Python. I think unit testing PySpark code is even easier than unit testing Spark-Scala code (unit testing Spark-Scala). Let’s go through an example and see how easy it is to unit test DataFrame transformations.

We will imagine a simple dataset, but one so big that it is more…


Photo by JJ Ying on Unsplash

Azure Data Factory is a great platform for solving data-related scenarios, whether you are migrating data from on-premises to the cloud, fetching data from an API, or aggregating some logs. You can even apply transformations while the data is on the move.

I have been using Azure Data Factory mostly for recurring (daily, weekly) batch jobs that load different datasets into the Data Lake. It was easy, as all you need to do is set up a pipeline by specifying the Source and Sink. Then add the necessary triggers and you are done:


Photo by Johannes Groll on Unsplash

As a Data Engineer, I often need to write DataFrame transformations of varying complexity. Often these manipulations get so complex that testing them on a larger dataset can take hours or even longer. Once you realize that this is not feasible, you need to utilize one of Spark’s great traits: Runs Everywhere. So let us try running our tests in our IDE and see how we can test our DataFrame transformations easily.

First, we need to imagine some data that, for example, is too big for us to test on it directly:


Photo by Christopher Burns on Unsplash

In this part (Part 3) I will try to describe the technologies a Data Engineer should know. Please be aware that this is not a definitive guide that will land you a job or give you the title of Data Engineer. This is the path I would take, based on my current experience, if I had to learn everything from scratch. Let’s begin!

Programming languages

In my opinion, if you do not know at least some bits of SQL, the “mother of all query languages”, then things can get challenging at some points. Although I have some good news that you will…


Photo by Christopher Burns on Unsplash

When I started out in my Data Engineering career, I had no idea (Part 1) what Data and Engineering stand for in IT. As this position was the first and only one in my IT career, I had to learn from absolute zero knowledge. I hope I can interpret these words for those who are eyeing this position or are just interested in finding out what they mean.

Data

The first thing that came to my mind when I started to learn Data Engineering was: do you pronounce it “Data” or “Data” (interesting how we automatically pronounce it differently)? To be honest I…


Photo by Christopher Burns on Unsplash

I feel that I first need to explain how I became one out of pure coincidence. The word “engineer”, derived from the Latin words ingeniare (“to create, generate, contrive, devise”) and ingenium (“cleverness”), has always attracted my attention, and I wanted to become one. So I followed my calling, and after finishing my university studies and acquiring a medical device engineer’s degree, I worked in this field for about five years.

What I discovered after working in this field for so long is that in my country, unfortunately, a medical device engineer is not exactly an engineer but…

Eriks Dombrovskis

Sharing concepts, ideas, and codes.
