In this post I am going to discuss how you can write ETL jobs in Python using the Bonobo library. Before I get into the library itself, allow me to discuss ETL and why it is needed. What is ETL? ETL is short for Extract, Transform and Load, a process in which data is acquired, changed/processed and finally loaded into a data warehouse or database(s). You can extract data from sources like files, websites or a database, transform the acquired data and then load the final version into a database for business usage. You may ask, why ETL? Well, what ETL does, many of you might already be doing…
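To make the three stages concrete, here is a minimal sketch of an extract, transform and load pipeline in plain Python. It deliberately uses only the standard library rather than Bonobo, and the sample CSV text, the `sales` table and the function names are all hypothetical, chosen just for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file.
RAW_CSV = "name,amount\nalice, 10\nbob, 25\n"


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: tidy up the names and convert amounts to integers."""
    for row in rows:
        yield row["name"].strip().title(), int(row["amount"])


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


# An in-memory SQLite database plays the role of the warehouse here.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

rows = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(rows)
```

Each stage is a generator feeding the next one, so rows flow through the pipeline one at a time instead of being held in memory all at once.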


How to set up PHP 7.1 and Apache 2.2 on Amazon Linux
OK, I had no plan to write this post, but I recently spent quite some time figuring this out, so I decided to write it up for myself and for others who run into the same issue with what should be a simple task. Alright, let’s proceed! Below are the details of my Amazon distro: [code listing omitted] Repo setting Before we proceed, first make sure that you are downloading packages from the right repository. Remove all irrelevant repos, especially remi-safe. [code listing omitted] You can now list the available repos; if all goes well you should see something like below: [code listing omitted] Alright, we now have the correct repo setup, so it’s time to install things…

Getting started with Python and IPFS
In this post I am going to discuss how you can use decentralized IPFS in your Python apps for storing different kinds of data. What is IPFS? From Wikipedia: InterPlanetary File System (IPFS) is a protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system. IPFS was initially designed by Juan Benet, and is now an open-source project developed with help from the community. Put simply, it’s Amazon S3 on blockchain. All of your information is available on a decentralized network across nodes, which makes the system not only scalable but reliable as well, since data which is fed in it…
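The idea of content addressing can be sketched in a few lines. This is a toy in-memory store, not the IPFS API: real IPFS identifiers use a different encoding, and the `ContentStore` class and its methods are hypothetical names used only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory store where the key is derived from the content itself."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # The address is the SHA-256 digest of the data, not a chosen name.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Anyone can re-hash the data to check it matches the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr_a = store.add(b"hello ipfs")
addr_b = store.add(b"hello ipfs")

print(addr_a == addr_b)   # identical content maps to the same address
print(store.get(addr_a))
```

Because the address depends only on the bytes, the same file added twice is stored once, and a reader can verify what it received without trusting the node that served it.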

Introduction to Exploratory Data Analysis in Python
Recently I finished the Python graph series, in which I used Matplotlib to represent data in different types of charts. In this post I am giving a brief intro to exploratory data analysis (EDA) in Python with the help of pandas and matplotlib. What is exploratory data analysis? According to Wikipedia: In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. You can say that EDA is a statistician’s way of storytelling where you explore…
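As a small illustration of "summarizing main characteristics", here is a sketch of a first pass over a dataset with pandas. The data frame below is made up for the example; the column names `city` and `temp` are assumptions, not data from the post.

```python
import pandas as pd

# Hypothetical sample data: temperature readings per city.
df = pd.DataFrame(
    {
        "city": ["A", "B", "A", "C", "B", "A"],
        "temp": [21.0, 25.5, 19.0, 30.0, 24.5, 20.0],
    }
)

# describe() reports count, mean, std, min, quartiles and max in one call.
summary = df["temp"].describe()

# Grouping shows how the variable behaves across categories.
by_city = df.groupby("city")["temp"].mean()

print(summary)
print(by_city)
```

From here a typical next step would be plotting the same columns, for example with `df["temp"].plot(kind="hist")`, to see the shape of the distribution rather than just its summary numbers.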

Develop your first Python app integrated with the decentralized Stellar network
In this post I am going to discuss how to create a decentralized blockchain app, aka a dApp, for the Stellar network. I will build a very simple web app, the world’s simplest e-commerce app, called RocketCommerce, where people can buy a single item by paying in Lumens (XLM), Stellar’s currency. The interface of the app looks like below: Before we get into the development of the application itself, allow me to discuss some background on blockchain, decentralized apps and the Stellar network itself. What is Blockchain? From Wikipedia: A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography.[1][6] Each block typically contains a cryptographic hash of the previous block,[6] a timestamp and transaction data.[7] By design, a…
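The block structure described in that definition (previous hash, timestamp, transaction data) can be sketched as a toy hash chain. This is only an illustration of the linking idea and says nothing about how Stellar itself stores its ledger; the helper names `block_hash`, `make_block` and `is_valid` are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """Hash the fields that define a block (everything except its own hash)."""
    payload = {k: block[k] for k in ("prev_hash", "timestamp", "transactions")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def make_block(prev_hash, transactions, timestamp):
    """Create a block that points at the previous block via its hash."""
    block = {
        "prev_hash": prev_hash,
        "timestamp": timestamp,
        "transactions": transactions,
    }
    block["hash"] = block_hash(block)
    return block


def is_valid(chain):
    """A chain is valid if every hash is correct and every link matches."""
    for i, block in enumerate(chain):
        if block_hash(block) != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block("0" * 64, ["genesis"], timestamp=0)
second = make_block(genesis["hash"], ["alice pays bob 5 XLM"], timestamp=1)
chain = [genesis, second]

valid_before = is_valid(chain)
genesis["transactions"] = ["tampered"]  # rewrite history in the first block
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Changing any earlier block changes its hash, which breaks the link stored in the block after it, and that is what makes tampering detectable.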

2017 in numbers
2017 is almost over. This year was very exciting in terms of taking initiatives in different aspects of life. I resumed technical blogging, created more open source work and started making video tutorials. Actually, it all started when I resumed blogging on this very blog and wrote a post that became a massive hit and encouraged me to continue blogging. BTW, I am not new to writing. I had a technical blog back in 2003 where I used to share different things. I had another blog where I used to share the other side of me, that is, my views on politics, religion and current affairs. I am happy to say…

Data Visualization in Python – Subplots in Matplotlib
In this post I am going to discuss subplots, a Matplotlib feature that lets you add multiple plots within a single figure. Subplots are helpful when you want to show different presentations of data in a single view, for instance in dashboards. There are multiple ways to create subplots, but here I am going to discuss the one that lets you place graphs in a grid by using the subplot2grid method. subplot2grid takes two mandatory parameters: the first is the grid size and the second is the location. A typical subplot2grid call will look like below: ax = plt.subplot2grid((2, 2), (0, 0)) The first parameter is the size, that is, a 2 x 2 grid, the 2nd is…
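Here is a minimal sketch of a grid layout built with `subplot2grid`. In Matplotlib's signature the two mandatory parameters are named `shape` and `loc`, and `loc` is a (row, column) pair counted from zero. The data values and the variable names `ax_top`, `ax_left` and `ax_right` are made up for the example, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))

# 2 x 2 grid: one wide plot on the top row, two plots on the bottom row.
ax_top = plt.subplot2grid((2, 2), (0, 0), colspan=2)
ax_left = plt.subplot2grid((2, 2), (1, 0))
ax_right = plt.subplot2grid((2, 2), (1, 1))

# Hypothetical data, just to put something in each cell.
ax_top.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax_left.bar([1, 2, 3], [3, 5, 2])
ax_right.scatter([1, 2, 3], [2, 1, 3])

fig.tight_layout()
print(len(fig.axes))
```

The optional `colspan` and `rowspan` arguments are what make `subplot2grid` handy for dashboards, since one cell can stretch across several columns or rows.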

Data Visualization in Python – Pie charts in Matplotlib
In the last post I discussed scatter plots; today I am going to discuss pie charts. What are Pie Charts? A pie chart (or a circle chart) is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. The earliest known pie chart is generally credited to William Playfair’s Statistical Breviary of 1801.[1][2] Pie charts are good to show proportional…
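The proportionality between a slice's central angle and its quantity is easy to check in code. The sketch below uses made-up market-share numbers; the labels and values are hypothetical, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical shares that add up to 100.
labels = ["Python", "PHP", "Go"]
sizes = [50, 30, 20]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, autopct="%1.0f%%", startangle=90
)
ax.axis("equal")  # keep the pie circular

# Each wedge spans an angle proportional to its share of the total.
angles = [w.theta2 - w.theta1 for w in wedges]
print([round(a) for a in angles])
```

A 50% share should span about half of the 360 degree circle, and the other slices scale the same way.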

Data Visualization in Python – Scatter plots in Matplotlib
In the last post I talked about plotting histograms. In this post we are going to learn how to use scatter plots and why they can be useful. What is a Scatter Plot? From Wikipedia: A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram)[3] is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value…
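A short sketch of a color-coded scatter plot, where two variables set the position and a third sets the color. The study-hours data is invented for the example, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical data: hours studied, exam score, and hours slept.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 83]
sleep = [5, 6, 6, 7, 8, 8]

fig, ax = plt.subplots()

# x and y give the position; c maps the third variable onto a colormap.
points = ax.scatter(hours, scores, c=sleep, cmap="viridis")
fig.colorbar(points, ax=ax, label="Hours slept")

ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")

print(len(points.get_offsets()))
```

The colorbar acts as the legend for the third variable, so the reader can tell which shade corresponds to which value.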

Data Visualization in Python – Histogram in Matplotlib
In the last post I talked about bar graphs and their implementation in Matplotlib. In this post I am going to discuss histograms, a special kind of bar graph. What is a Histogram? From Wikipedia: A histogram is an accurate graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable) and was first introduced by Karl Pearson. It is a kind of bar graph. To construct a histogram, the first step is to “bin” the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each…
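The binning step can be sketched in plain Python before any plotting is involved. The `bin_counts` helper and the list of ages are hypothetical, and the sketch assumes equal-width bins and values that are not all identical.

```python
def bin_counts(values, num_bins):
    """Split the range of values into equal-width bins and count each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical sample: ages of twelve people.
ages = [22, 25, 31, 35, 38, 41, 44, 47, 52, 58, 60, 62]
edges, counts = bin_counts(ages, 4)

print(edges)
print(counts)
```

The bar heights of the histogram are exactly these counts. Calling `plt.hist(ages, bins=4)` in Matplotlib does the same kind of equal-width binning and then draws the bars.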