• Introduction to Exploratory Data Analysis in Python

      Recently I finished up Python Graph series by using Matplotlib to represent data in different types of charts. In this post I am giving a brief intro of Exploratory data analysis(EDA) in Python with help of pandas and matplotlib. What is Exploratory data analysis? According to Wikipedia: In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. You can say that EDA is statisticians way of story telling where you explore…

  • Implementing beanstalk to create a scaleable web scraper

    Image Credit (http://blog.hqc.sk.ca/wp-content/uploads/2012/12/Queue-2012-12-11.jpg) Queues are often used to make applications scaleable by offloading the data and process them later. In this post I am going to use BeansTalk queue management system in Python. Before I get into real task, allow me to give a brief intro of Beanstalk. What is Beanstalk? From the official website: Beanstalk is a simple, fast work queue. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. It’s demon called beanstalkd which you can run on *nix based machines. Since I am on OSX so I called brew install beanstalkd to install it. Once…

  • Develop database driven applications in Python with Peewee

    It is not uncommon that most of the applications these days are interacting with database these days. Specially with RDBMS based engines( DB engines that support SQL). Like any other languages Python also provides native and 3rd party libraries to interact with database. Normally you have to write SQL queries for CRUD operations. That’s OK but at times it happens that things get messy: The Big Boss decided to move from MySQL to.. MSSQL and you have no choice but to nod at him and making changes in your queries compatible with other Db Engine. You gotta make multiple queries just to retrieve a single piece of info from the…

  • How to automate your deployment and SSH activities with Fabric

    It is not uncommon for developers to interact with remote servers. Beside FTP clients many use terminals or consoles to carry out different tasks. SSH is usually used to connect with remote servers and execute different commands; from running git to initiating a web or db server, almost every thing can be done by using an SSH Client. Recently I came across an awesome tool called Fabric. It’s a Python based tool that automates your command line activities. What is Fabric? From official website:   Fabric is a Python (2.5-2.7) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks. It provides a…

  • Write your first web crawler in Python Scrapy

    The scraping series will not get completed without discussing Scrapy. In this post I am going to write a web crawler that will scrape data from OLX’s Electronics & Appliances’ items. Before I get into the code, how about having a brief intro of Scrapy itself? What is Scrapy? From Wikipedia: Scrapy (/ˈskreɪpi/ skray-pee)[1] is a free and open source web crawling framework, written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general purpose web crawler.[2] It is currently maintained by Scrapinghub Ltd., a web scraping development and services company.   A web crawling framework which has done all…

  • HTML

    How to develop an efficient web scraper in Python

    Last week I was working on a web scraper for a client who needed to get around a million records from a real estate website. After a certain level, the scraper stopped working and the reason was I forgot to put certain checks as I was expecting the client would not go for that route but he DID! A few days back I shared a post about how to write a basic scraper in Python by using Beautifulsoup. In this post, I am going to discuss how to make your scraper more foolproof and user-friendly for non-technical people. 1- Check 200 status code It is always good to check the…

  • Write your first web scraper in Python with Beautifulsoup

    Ok, so I am going to write the simplest web scraper in Python with the help of libraries like requests and BeautifulSoup. Before I move further, allow me to discuss what’s web/HTML scraping. What is Web scraping? According to Wikipedia: Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. This is accomplished by either directly implementing the Hypertext Transfer Protocol (on which the Web is based), or embedding a web browser.   So use scraping technique to access the data from web pages and make it useful for various purposes (e.g: Analysis, aggregation etc). Scraping is not the only way to…

  • Python Scrappers

    For different clients I have written various Python Scrappers BeautifulSoup library to fetch data from sites like NewEgg, Amazon, Rakuten (former Buy.com)

  • XBMC Plugin for Online Videos

    I developed an XBMC plug for a client that is installed on an Android based set-top box. It fetches Greek channels’ videos from various online video hosting sites like Youtube, Vimeo etc. Subscribed users are allowed to view videos after providing their credentials. The add-on itself is connected with remote server via REST APIs to retrieve/store data. The plugin has been built in Python. Below is the screen preview of the plugin.