This post is part of the Data Engineering and ETL Series. I am taking a short break from Blockchain-related posts. In the past, I have covered Apache Airflow posts here. In this post, I am discussing how to use the CCXT library to grab BTC/USD data from exchanges and create an ETL for data analysis and visualization. I am using the dockerized version of Airflow. It is easy to set up and use proper different images to run different components instead of a one-machine setup. The docker-compose.yml file is available here but I have made a few changes to install custom libraries. Therefore, it is advised you use the file I have…
-
-
Schedule web scrapers with Apache Airflow
This post is the part of Data Engineering Series. In the previous post, I discussed Apache Airflow and it’s basic concepts, configuration, and usage. In this post, I am going to discuss how can you schedule your web scrapers with help of Apache Airflow. I will be using the same example I used in Apache Kafka and Elastic Search example that is scraping https://allrecipes.com because the purpose is to use Airflow. In case you want to learn about scraping you may check the entire series here. So, we will work on a workflow consist of tasks: parse_recipes: It will parse individual recipes. download_image: It downloads recipe image. store_data: Finally store image…
-
Getting started with Apache Airflow
This post is the part of Data Engineering Series. In this post, I am going to discuss Apache Airflow, a workflow management system developed by Airbnb. Earlier I had discussed writing basic ETL pipelines in Bonobo. Bonobo is cool for write ETL pipelines but the world is not all about writing ETL pipelines to automate things. There are other use cases in which you have to perform tasks in a certain order once or periodically. For instance: Monitoring Cron jobs transferring data from one place to other. Automating your DevOps operations. Periodically fetching data from websites and update the database for your awesome price comparison system. Data processing for recommendation based…