This post is part of Data Engineering and ETL Series. In this post, I am going to introduce another ETL tool for your Python applications, called Apache Beam. What is Apache Beam? According to Wikipedia: Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing.. Unlike Airflow and Luigi, Apache Beam is not a server. It is rather a programming model that contains a set of APIs. Currently, they are available for Java, Python and Go programming languages. A typical Apache Beam based pipeline looks like below: (Image Source: https://beam.apache.org/images/design-your-pipeline-linear.svg) From the left, the data is being…
-
Create your first ETL Pipeline in Apache Beam and Python