• Schedule web scrapers with Apache Airflow

     This post is the part of Data Engineering Series. In the previous post, I discussed Apache Airflow and it’s basic concepts, configuration, and usage. In this post, I am going to discuss how can you schedule your web scrapers with help of Apache Airflow. I will be using the same example I used in Apache Kafka and Elastic Search example that is scraping https://allrecipes.com  because the purpose is to use Airflow. In case you want to learn about scraping you may check the entire series here. So, we will work on a workflow consist of tasks: parse_recipes: It will parse individual recipes. download_image: It downloads recipe image. store_data: Finally store image…