• How to speed up your Python web scraper by using multiprocessing

    In earlier posts (here and here), I discussed how to write a scraper and how to make it secure and foolproof. These things are worth implementing, but they do not make a scraper fast and efficient. In this post, I am going to show how changing a few lines of code can make your web scraper several times faster. Keep reading! If you remember the earlier post, I scraped a detail page on OLX. Usually, you reach such a page after going through a listing of entries. First, I will write a script without multiprocessing, we will see why it is not good, and then a scraper…
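A minimal sketch of the sequential-versus-parallel idea, assuming `requests` and BeautifulSoup as in the earlier posts. The helper names (`parse_detail_links`, `scrape_detail`, `scrape_sequential`, `scrape_parallel`) and the `detail-link` CSS class are hypothetical placeholders, since the real OLX markup is not shown here. `Pool.map` from the standard `multiprocessing` module hands each URL to a worker process and returns the results in input order.

```python
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup


def parse_detail_links(listing_html):
    """Collect detail-page URLs from a listing page.

    Assumption: detail links carry a hypothetical "detail-link" class;
    adjust this to the real markup of the site you scrape.
    """
    soup = BeautifulSoup(listing_html, "html.parser")
    return [a["href"] for a in soup.find_all("a", class_="detail-link", href=True)]


def scrape_detail(url):
    """Fetch one detail page and return (url, page title), or (url, None) on failure."""
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return url, None
    if response.status_code != 200:
        return url, None
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    return url, title


def scrape_sequential(urls):
    # One request at a time: total time is roughly the sum of all response times.
    return [scrape_detail(url) for url in urls]


def scrape_parallel(urls, processes=8):
    # Worker processes fetch their share of URLs concurrently, so waits overlap.
    # Pool.map returns results in the same order as the input URLs.
    with Pool(processes=processes) as pool:
        return pool.map(scrape_detail, urls)
```

Call `scrape_parallel()` from inside an `if __name__ == "__main__":` block: on platforms that start workers with `spawn` (Windows, macOS), each worker re-imports the script and raises an error if it reaches pool creation at module level. Because the work is mostly waiting on the network, `multiprocessing.pool.ThreadPool`, which offers the same `map` interface, is a lighter alternative worth trying.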

  • How to develop an efficient web scraper in Python

    Last week I was working on a web scraper for a client who needed to get around a million records from a real estate website. After a certain point, the scraper stopped working because I had forgotten to add certain checks: I expected the client would not go that route, but he DID! A few days back I shared a post about how to write a basic scraper in Python using BeautifulSoup. In this post, I am going to discuss how to make your scraper more foolproof and user-friendly for non-technical people. 1- Check for a 200 status code: It is always good to check the…
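A minimal sketch of that first check, assuming the `requests` library from the earlier post; `is_ok` and `fetch_page` are hypothetical helper names. The idea is to look at the status code before handing the body to the parser, so an error page or a block page is not silently treated as data.

```python
import requests


def is_ok(response):
    """True only when the server answered 200 OK."""
    return response.status_code == 200


def fetch_page(url, timeout=10):
    """Return the page HTML, or None if the request failed or was not 200 OK."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        # Connection errors, timeouts and malformed URLs all end up here.
        print(f"Skipping {url}: {exc}")
        return None
    if not is_ok(response):
        print(f"Skipping {url}: HTTP {response.status_code}")
        return None
    return response.text
```

Returning `None` instead of raising lets a long run log the bad URL and move on to the next record; `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses, is the alternative if you would rather stop the run.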

  • Write your first web scraper in Python with Beautifulsoup

    OK, so I am going to write the simplest web scraper in Python with the help of libraries like requests and BeautifulSoup. Before I move further, allow me to explain what web/HTML scraping is. What is web scraping? According to Wikipedia: "Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. This is accomplished by either directly implementing the Hypertext Transfer Protocol (on which the Web is based), or embedding a web browser." So we use scraping techniques to access data from web pages and make it useful for various purposes (e.g., analysis, aggregation). Scraping is not the only way to…
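A minimal sketch of such a scraper, assuming `requests` and `beautifulsoup4` are installed; `parse_page` and `scrape` are hypothetical names, and the page title plus links are only an example of what to extract. Keeping the parsing in its own function makes it easy to try on a saved HTML string without hitting the site.

```python
import requests
from bs4 import BeautifulSoup


def parse_page(html):
    """Extract the page title and every (link text, href) pair from raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # href=True skips <a> tags that have no href attribute.
    links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]
    return {"title": title, "links": links}


def scrape(url):
    """Download a page over HTTP and parse it."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return parse_page(response.text)
```

For instance, `scrape("https://example.com")` downloads that page and returns a dictionary with its title and links.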

  • A subreddit for web scrapers

    I recently realized that I am in love with data scraping. Thanks to Python and BeautifulSoup for getting me into this. In the past few months I have scraped data from sites like Amazon, Rakuten and NewEgg. I have started a subreddit for developers who love to scrape sites for fun (or for work). I named it Scraping the Web. You can join here and submit your entries. Happy Scraping!