• Create your first ETL Pipeline in Apache Spark and Python

    In this post, I discuss Apache Spark and how you can use it to build simple but robust ETL pipelines. You will learn how Spark provides APIs to load data in different formats into DataFrames that can be queried with SQL for analysis, and how data from one source can be transformed into another without much hassle.

    What is Apache Spark?

    According to Wikipedia: Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

    From the official website: Apache Spark™ is a unified analytics engine for large-scale data processing.

    In short, Apache Spark is a framework which is…