In this post I am going to discuss how you can write ETL jobs in Python by using Bonobo library. Before I get into the library itself, allow me to discuss about ETL itself and why is it needed? What is ETL? ETL is actually short form of Extract, Transform and Load, a process in which data is acquired, changed/processes and then finally get loaded into data warehouse/database(s). You can extract data from data sources like Files, Website or some Database, transform the acquired data and then load the final version into database for business usage. You may ask, Why ETL?, well, what ETL does, many of you might already been doing…
-
-
Introduction to Exploratory Data Analysis in Python
Recently I finished up Python Graph series by using Matplotlib to represent data in different types of charts. In this post I am giving a brief intro of Exploratory data analysis(EDA) in Python with help of pandas and matplotlib. What is Exploratory data analysis? According to Wikipedia: In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. You can say that EDA is statisticians way of story telling where you explore…
-
How I wrote my first Machine Learning program in 3 days
A few weeks back I was intrigued by Per Harald Borgen’s post Machine Learning in a Week which oversimplified the entire learning and implementing a Machine Learning algorithm in a week on a real dataset. He laid down the framework that how as a programmer one can get into ML thing without getting worried about heavy maths and statistics. It was a good excuse to give this ML thing a chance which I had been trying to do for many years after completing Coursera course. Alright. So on this Monday I started my mission. I had to find out some good dataset to achieve the task. Initially I wanted to use NASA’s meteorites…