• Scraping dynamic websites using Scraper API and Python
    Learn how to easily and efficiently scrape modern JavaScript-enabled websites or Single Page Applications without installing a headless browser and Selenium.

      In the last post of the scraping series, I showed you how to use Scraper API, an online data extractor that scrapes websites through proxies, which reduces your chance of getting blocked. Today I am going to show how you can use Scraper API to scrape websites that use AJAX to render data with the help of JavaScript, Single Page Applications (SPAs), or websites built with frameworks like ReactJS, AngularJS, or VueJS. I will be working on the same code I wrote in the introductory post. Let’s work on a simple example. There is a website that tells you your IP, called HttpBin. If you load…

  • Creating an e-commerce bot to buy online items with ScrapingBee and Python

    I wrote about ScrapingBee a couple of years ago, when I gave a brief intro to the service. ScrapingBee is a cloud-based scraping service that provides both headless and lightweight, typical HTTP request-based scraping. Recently I discovered that they provide some cool features that other online services do not. What are those features? I decided to explore and explain them with a real use case. I used Python to automate the Daraz group’s shopping website, a famous e-commerce service in Asian countries like Pakistan, Nepal, Bangladesh, and Sri Lanka. I am automating DarazPK since I am in Pakistan. You can view the demo…

  • Develop Ali Express Scraper in Python with Scraper API

    This is another post in ScrapeTheFamous, in which I parse some famous websites and discuss my development process. The posts use Scraper API for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites since Scraper API takes care of everything. In this post, we are going to scrape AliExpress. AliExpress is a Chinese B2C portal to buy stuff. The script I am going to make consists of two parts, or rather, two functions: fetch and parse. The fetch function will accept a category and return all links of individual items, and parse will parse an individual entry and return a few data points in…

  • Develop Google scraper in Python with Scraper API

    This is another post in ScrapeTheFamous, in which I parse some famous websites and discuss my development process. The posts use Scraper API for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites since Scraper API takes care of everything. This post is about scraping Google search results: the script accepts a keyword and returns results across multiple pages. The data will be stored in a text file in JSON format. The code that parses the results is pretty straightforward and is given below: def google_scraper(query, start=0): records = [] try: URL_TO_SCRAPE = "http://www.google.com/search?q=" + query.replace(' ', '+') +…

  • Create Ebay Scraper in Python using Scraper API
    Learn how to create an eBay data scraper in Python to fetch item details and price.

    In this post of ScrapingTheFamous, I am going to write a scraper that will scrape data from eBay. eBay is an online auction site where people put up listings to sell stuff by auction. Like before, we will be writing two scripts, one to fetch listing URLs and store them in a text file and the other to parse those links. The data will be stored in JSON format for further processing. I will be using the Scraper API service for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites since it takes care of everything. The first script fetches the listings of a category.…
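
    The two-script workflow described in this excerpt (collect listing URLs into a text file, then parse them and store JSON) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the post: the helper names, the file handling, and the "/itm/" URL marker are all hypothetical, and downloading the pages is left out entirely.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects unique href values of <a> tags that contain a marker substring."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href and href not in self.links:
            self.links.append(href)


def extract_listing_urls(category_html, marker="/itm/"):
    """Step 1a: pull listing URLs out of an already-downloaded category page.

    The "/itm/" marker is an assumption made for this sketch.
    """
    collector = LinkCollector(marker)
    collector.feed(category_html)
    return collector.links


def save_urls(urls, path):
    """Step 1b: store one URL per line in a text file."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")


def load_urls(path):
    """Step 2a: read the URLs back, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def save_records(records, path):
    """Step 2b: write the parsed records as JSON for further processing."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
```

    Keeping the URL list in a plain text file between the two steps means the parsing script can be re-run or resumed without fetching the category pages again.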

  • Create Amazon Scraper in Python using Scraper API
    Learn how to create an Amazon scraper in Python to scrape product details like price, ASIN, etc.

    In this post of ScrapingTheFamous, I am going to write a scraper that will scrape data from Amazon. I do not need to tell you what Amazon is. You are here because you already know about it 🙂 So, we are going to write two different scripts: one is fetch.py, which fetches the URLs of individual listings and saves them in a text file. The other is parse.py, which has a function that takes an individual listing URL, scrapes the data, and saves it in JSON format. I will be using the Scraper API service for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites since it…

  • Create your first Web scraper in Go with goQuery
    A beginner's tutorial for writing web scrapers for Yelp in the Go language.

    Planning to write a book about Web Scraping in Python. Click here to give your feedback. I have been covering web scraping on this blog for a long time, but the posts were mostly in Python; be it requests, Selenium, or the Scrapy framework, all were based on Python. Scraping, however, is not limited to a specific language. Any language that provides APIs or libraries for an HTTP client and an HTML parser can be used for web scraping, and Go is one of them. Go is a compiled and statically typed language and could be very beneficial to write efficient and…

  • Create your first web scraper with ScrapingBee API and Python
    Learn how to use a cloud-based scraping API to scrape web pages without getting blocked.

    In this post, I am going to discuss another cloud-based scraping tool that takes care of many of the issues you usually face while scraping websites. The platform is ScrapingBee, a cloud-based scraping tool. What is ScrapingBee? If you visit their website, you will find something like the following: "ScrapingBee API handles headless browsers and rotates proxies for you." As it suggests, it offers everything you need to deal with the issues you usually come across while writing your scrapers, especially the availability of proxies and headless scraping. No installation of web drivers for Selenium, yay! Development: ScrapingBee is based on a REST API, hence it can…

  • Develop AirBnb Parser in Python

    So I am starting a new scraping series called ScrapeTheFamous, in which I will be parsing some famous websites and discussing my development process. The posts will be using Scraper API for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites since Scraper API takes care of everything. Anyway, the first post is about Airbnb. We will be scraping some important data points from it. We will scrape a list of rental URLs, then fetch the data and store it in JSON format. So let’s start! The URL we…

  • Advanced Proxy Use for Web Scraping

    Guest Post by Vytautas Kirjazovas from Oxylabs.io. In the eyes of many, web scraping is an art. It is safe to state that the majority of web scraping enthusiasts have faced bans from websites more than once during their careers. Web scraping is a challenging task, and it’s more common than you think to see your crawlers getting banned by websites. In this article, we’ll talk about more advanced ways to use proxies for web scraping. There are some key components that you should take into account with web scraping to avoid getting banned too quickly: Set browser-like headers: a User-Agent that can be found in real life; a Referer header; other…
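
    As a rough illustration of the browser-like headers mentioned in this excerpt, the sketch below builds a header set with a realistic User-Agent and a Referer using only the standard library. The specific header values, the extra Accept-Language header, and the helper names are assumptions made for this example; they do not come from the guest post.

```python
from urllib.request import Request

# Hypothetical values for illustration only; pick a User-Agent string that
# matches a browser version actually in circulation when you run your scraper.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Referer": "https://www.google.com/",
    "Accept-Language": "en-US,en;q=0.9",  # assumed example of an "other" header
}


def build_headers(referer=None, extra=None):
    """Return a fresh copy of the default headers, optionally overridden."""
    headers = dict(BROWSER_HEADERS)
    if referer is not None:
        headers["Referer"] = referer
    if extra:
        headers.update(extra)
    return headers


def make_request(url, referer=None):
    """Build (but do not send) a urllib Request carrying browser-like headers."""
    return Request(url, headers=build_headers(referer=referer))
```

    Passing the page you would plausibly have come from as `referer` (for example, the category page when requesting an item page) keeps the header consistent with how a real visitor would arrive.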