• Write your first web crawler in Python Scrapy

    The scraping series would not be complete without discussing Scrapy. In this post I am going to write a web crawler that scrapes data from OLX’s Electronics & Appliances listings. Before I get into the code, how about a brief introduction to Scrapy itself? What is Scrapy? From Wikipedia: "Scrapy (/ˈskreɪpi/ skray-pee) is a free and open source web crawling framework, written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general purpose web crawler. It is currently maintained by Scrapinghub Ltd., a web scraping development and services company." A web crawling framework which has done all…

  • Write a Gmail autoresponder by using Python Selenium

    In earlier posts (here and here) I discussed how to use the Python requests and BeautifulSoup libraries to access and scrape a website. This time I am going to make a simple Gmail autoresponder that replies to certain emails. Before I discuss how to do it, a few words about Selenium and why it is going to make our lives easier. Advantages of Selenium: what does one gain with Selenium instead of opting for a lightweight solution based on Python requests and BeautifulSoup? Selenium automates browser activity by simulating clicks and other events, which makes it easier to access information that only becomes available after JavaScript has run on the page. Since it’s automating…

  • How to speed up your python web scraper by using multiprocessing

    In earlier posts (here and here) I discussed how to write a scraper and make it secure and foolproof. These things are good to implement but not enough to make it fast and efficient. In this post, I am going to show how changing a few lines of code can make your web scraper X times faster. Keep reading! If you remember the earlier post, I scraped the detail page of OLX. Usually, you end up on this page after going through a listing of such entries. First, I will make a script without multiprocessing, we will see why it is not good, and then a scraper…
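
    A minimal sketch of the sequential-versus-multiprocessing comparison described above, assuming the detail pages can be fetched independently of each other. The URLs and `fetch_detail` are hypothetical stand-ins: `time.sleep` simulates network latency so the example runs offline, and in a real scraper it would be replaced by a request plus parsing step. `multiprocessing.Pool.map` distributes the URLs over worker processes and returns the results in the same order as the input.

```python
import time
from multiprocessing import Pool

# Hypothetical detail-page URLs; in a real scraper these would be
# collected from a listing page first.
URLS = [f"https://example.com/item/{i}" for i in range(8)]


def fetch_detail(url):
    """Hypothetical stand-in for downloading and parsing one detail page.

    time.sleep simulates network latency so the example runs without
    internet access; replace it with a real request + parse step.
    """
    time.sleep(0.2)
    item_id = url.rsplit("/", 1)[-1]
    return {"url": url, "id": int(item_id)}


def scrape_sequential(urls):
    # One page at a time: total time is roughly len(urls) * latency.
    return [fetch_detail(u) for u in urls]


def scrape_parallel(urls, processes=4):
    # Pool.map splits the URLs across worker processes and returns
    # the results in the same order as the input list.
    with Pool(processes=processes) as pool:
        return pool.map(fetch_detail, urls)


if __name__ == "__main__":
    start = time.perf_counter()
    seq = scrape_sequential(URLS)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    par = scrape_parallel(URLS)
    print(f"parallel:   {time.perf_counter() - start:.2f}s")

    assert seq == par  # same data, same order
```

    Because each worker spends most of its time waiting on the network, the speedup should grow roughly with the number of processes until the site or your connection becomes the bottleneck. The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter (e.g. Windows), where the main module is re-imported in every worker.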

  • 6 things to develop an efficient web scraper in Python

    Last week I was working on a web scraper for a client who needed to get around a million records from a real estate website. At a certain point the scraper stopped working because I had forgotten to put in certain checks: I was expecting the client would not go down that route, but he DID! A few days back I shared a post about how to write a basic scraper in Python using BeautifulSoup. In this post I am going to discuss how to make your scraper more foolproof and user-friendly for non-technical people. 1- Check for the 200 status code: It is always good to check…
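
    A minimal sketch of the first check, written with the standard library's `urllib` instead of requests (an assumption made only to keep it dependency-free). `fetch_html` is a hypothetical helper: it returns the page text only for an HTTP 200 response and returns `None` otherwise, so the caller can log and skip the page instead of crashing. Note that `urlopen` raises `HTTPError` for 4xx/5xx responses rather than returning them, so that case is handled in an `except` branch. The `opener` parameter exists only so the check can be exercised without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, opener=urlopen, timeout=10):
    """Hypothetical helper: return the page body as text, or None on failure.

    `opener` defaults to urllib's urlopen; it is injectable so the
    status check can be tested with a fake response.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Only treat an explicit 200 OK as success.
            if response.status != 200:
                print(f"Skipping {url}: HTTP {response.status}")
                return None
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; it must be caught
        # before URLError because HTTPError is a subclass of URLError.
        print(f"Skipping {url}: HTTP {err.code}")
        return None
    except URLError as err:
        # DNS failures, refused connections and similar network errors.
        print(f"Skipping {url}: {err.reason}")
        return None
```

    With requests, the equivalent test is `response.status_code == 200`; the point either way is that a non-200 page is reported and skipped rather than parsed as if it were valid data.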

  • Write your first web scraper in Python with Beautifulsoup

    OK, so I am going to write the simplest web scraper in Python with the help of libraries like requests and BeautifulSoup. Before I move further, allow me to discuss what web/HTML scraping is. What is Web scraping? According to Wikipedia: "Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. This is accomplished by either directly implementing the Hypertext Transfer Protocol (on which the Web is based), or embedding a web browser." So you use scraping techniques to access the data on web pages and make it useful for various purposes (e.g. analysis, aggregation, etc.). Scraping is not the only way to…
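
    To make "extracting information from websites" concrete, here is a minimal sketch of the extraction step. The post itself uses BeautifulSoup; this sketch swaps in the standard library's `html.parser.HTMLParser` so it runs without extra packages, and it parses a hypothetical inline HTML string standing in for a downloaded page. `LinkExtractor` and `extract_links` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a real HTTP response body.
SAMPLE_HTML = """
<html><body>
  <h1>Listings</h1>
  <a href="/item/1">Laptop</a>
  <a href="/item/2"> Phone </a>
  <a name="anchor-without-href">skip me</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
# [('/item/1', 'Laptop'), ('/item/2', 'Phone')]
```

    With BeautifulSoup the same extraction is a one-liner along the lines of `[(a["href"], a.get_text(strip=True)) for a in soup.find_all("a", href=True)]`, which is a large part of why the series relies on it.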

  • A subreddit for web scrapers

    I recently realized that I am in love with data scraping. Thanks to Python and BeautifulSoup for getting me into this. In the past few months I have scraped data from sites like Amazon, Rakuten and NewEgg. I have started a subreddit for developers who love to scrape sites for fun (or for work). I named it Scraping the Web. You can join here and submit your entries. Happy Scraping!