The scraping series would not be complete without discussing Scrapy. In this post I am going to write a web crawler that scrapes data from the Electronics & Appliances listings on OLX. Before I get into the code, how about a brief intro to Scrapy itself? What is Scrapy? From Wikipedia: Scrapy (/ˈskreɪpi/ skray-pee)[1] is a free and open source web crawling framework, written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general purpose web crawler.[2] It is currently maintained by Scrapinghub Ltd., a web scraping development and services company. A web crawling framework which has done all…
-
How to develop an efficient web scraper in Python
Last week I was working on a web scraper for a client who needed to collect around a million records from a real estate website. After a certain point the scraper stopped working, because I had left out some checks on the assumption that the client would not go that route, but he DID! A few days back I shared a post about how to write a basic scraper in Python using BeautifulSoup. In this post, I am going to discuss how to make your scraper more foolproof and user-friendly for non-technical people. 1- Check 200 status code: It is always good to check the…
-
Write your first web scraper in Python with BeautifulSoup
Ok, so I am going to write the simplest web scraper in Python with the help of libraries like requests and BeautifulSoup. Before I move further, allow me to explain what web/HTML scraping is. What is Web scraping? According to Wikipedia: Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. This is accomplished by either directly implementing the Hypertext Transfer Protocol (on which the Web is based), or embedding a web browser. So scraping is a technique for accessing the data on web pages and making it useful for various purposes (e.g., analysis, aggregation). Scraping is not the only way to…
-
Python Scrapers
For different clients I have written various Python scrapers, using the BeautifulSoup library, to fetch data from sites like NewEgg, Amazon, and Rakuten (formerly Buy.com).
-
XBMC Plugin for Online Videos
I developed an XBMC plugin for a client that is installed on an Android-based set-top box. It fetches Greek channels’ videos from various online video hosting sites such as YouTube and Vimeo. Subscribed users can view videos after providing their credentials. The add-on connects to a remote server via REST APIs to retrieve and store data. The plugin is built in Python. Below is a screen preview of the plugin.