Getting started with Apache Airflow

This post is the part of Data Engineering Series. In this post, I am going to discuss Apache Airflow, a workflow management system developed by Airbnb. Earlier I had discussed writing basic ETL pipelines in Bonobo. Bonobo is cool for write ETL pipelines but the world is not all about writing ETL pipelines to automate things. […]

Getting started with Elasticsearch in Python

In this post, I am going to discuss Elasticsearch and how you can integrate with different Python apps. What is ElasticSearch? ElasticSearch (ES) is a distributed and highly available open-source search engine that is built on top of Apache Lucene. It’s an open-source which is built in Java thus available for many platforms. You store […]

5 strategies to write unblock-able web scrapers in Python

People who read my posts in scraping series often contacted me to know how could they write scrapers that don’t get blocked. It is very difficult to write a scraper that NEVER gets blocked but yes, you can increase the life of your web scraper by implementing a few strategies. Today I am going to […]

Getting started with Python and IPFS

In this post I am going to discuss how you can use  decentralized IPFS in your Python apps for storing different kind of data. What is IPFS? From Wikipedia: InterPlanetary File System (IPFS) is a protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system. […]

Introduction to Exploratory Data Analysis in Python

  Recently I finished up Python Graph series by using Matplotlib to represent data in different types of charts. In this post I am giving a brief intro of Exploratory data analysis(EDA) in Python with help of pandas and matplotlib. What is Exploratory data analysis? According to Wikipedia: In statistics, exploratory data analysis (EDA) is an approach […]

Implementing beanstalk to create a scaleable web scraper

Image Credit (http://blog.hqc.sk.ca/wp-content/uploads/2012/12/Queue-2012-12-11.jpg) Queues are often used to make applications scaleable by offloading the data and process them later. In this post I am going to use BeansTalk queue management system in Python. Before I get into real task, allow me to give a brief intro of Beanstalk. What is Beanstalk? From the official website: Beanstalk […]

Develop database driven applications in Python with Peewee

It is not uncommon that most of the applications these days are interacting with database these days. Specially with RDBMS based engines( DB engines that support SQL). Like any other languages Python also provides native and 3rd party libraries to interact with database. Normally you have to write SQL queries for CRUD operations. That’s OK […]