scraping

Using ScraperAPI to bypass Cloudflare in Python

Introduction Cloudflare’s Captcha solutions are one of the biggest hurdles Python developers usually face while writing a web scraper. Cloudflare offers various solutions like bot detection, CAPTCHA challenges (including the newer Turnstile verification), and IP blocking to prevent automated website access for data retrieval. These protections often result in “Verify You Are Human” checks, 403 Forbidden errors, or the challenging 1020 Access Denied responses. In this post of the scraper series, we will learn about Cloudflare and its service that hinders web scraping and how you can use ScraperAPI’s APIs to bypass Cloudflare’s CAPTCHA/protection techniques. Understanding Cloudflare Protection What is Cloudflare? A global network service provider that offers website security,…

Read More
Build Your Web Scraper with Crawlbase in Python: A Beginner’s Guide

If you’ve ever felt curious about collecting data straight from websites—but instantly thought, “This sounds way too complicated!”—then you’re in for a treat. Web scraping, contrary to popular belief, can be simple, efficient, and even fun… especially if you have the right tools in your arsenal. With just a few lines of Python code and an amazing third-party service like Crawlbase, you can automate the collection of information from the web while bypassing the usual challenges (think CAPTCHA, IP blocks, JavaScript-heavy sites, and more). In this guide, I’m going to walk you step-by-step through the process of building your first web scraper with Python using Crawlbase. By the end, you’ll…

Read More
Amazon Scraper: Get ANY Product Data from Amazon without coding

As the world’s leading e-commerce platform, Amazon not only provides consumers with a seamless and reliable one-stop shopping experience but also serves as a critical data aggregation hub. Its vast repository of product information enables sellers to perform in-depth market research and develop effective sales strategies. How to integrate a substantial amount of data in a structured way to give your e-commerce business a real leg up? It is not a smart move to manually search, paste, and copy the required information directly; while using code for web scraping poses a technical barrier for people lacking a programming background. Fortunately, there is a user-friendly intelligent web scraper available that can…

Read More
Scraping HTML Data with BeautifulSoup [2024 Guide]

Have you ever wondered how to pull out useful information from websites without the hassle? BeautifulSoup is your go-to tool for scraping HTML data effortlessly. In this article, we’ll walk you through the basics of web scraping using BeautifulSoup. No prior experience is needed! With its simple syntax and straightforward approach, you’ll quickly grasp the essentials of parsing HTML and extracting data from web pages. Join us as we explore the world of web scraping in a beginner-friendly way. By the end, you’ll be equipped with the skills to gather valuable insights from any website with ease. Let’s dive in and uncover the magic of BeautifulSoup together! BeautifulSoup Overview You…

Read More
Scraping dynamic websites using Scraper API and Python
Learn how to efficiently and easily scrape modern Javascript enabled websites or Single Page Applications without installing a headless browser and Selenium

In the last post of the scraping series, I showed you how you can use Scraper API, an online data extractor to scrape websites that use proxies hence your chance of getting blocked is reduced. Today I am going to show how you can use Scraper API to scrape websites that are using AJAX to render data with the help of JavaScript, Single Page Applications(SPAs), or scraping websites using frameworks like ReactJS, AngularJS, or VueJS. I will be working on the same code I had written in the introductory post. Let’s work on a simple example. There is a website that tells your IP, called HttpBin. If you load…

Read More
Creating an e-commerce bot to buy online items with ScrapingBee and Python

I wrote about ScrapingBee a couple of years ago where I gave a brief intro about the service. ScrapingBee is a cloud-based scraping service that provides both headless and lightweight typical HTTP request-based scraping services. Recently I discovered that they are providing some cool features which other online services are not providing as such. What are those features? I thought to explore and explain them with a real use case. I used Python language to automate the Daraz group’s shopping website, a famous e-commerce website service in Asian countries like Pakistan, Nepal, Bangladesh, and Sri Lanka. I am automating DarazPK since I am in Pakistan. You can view the demo…

Read More
Develop Ali Express Scraper in Python with Scraper API

This is another post in ScrapeTheFamous, in which I will be parsing some famous websites and will discuss my development process. The posts will be using Scraper API for parsing purposes which makes me free from all worries about blocking and rendering dynamic sites since Scraper API takes care of everything. In this post, we are going to scrape AliExpress. AliExpress is a Chinese B2C portal to buy stuff. The script I am going to make consists of two parts, or I say, two functions: fetch and parse. The fetch will accept a category and return all links of individual items and parse will parse an individual entry and returns a few data points in…

Read More
Develop Google scraper in Python with Scraper API

This is another post in ScrapeTheFamous, in which I will be parsing some famous websites and will discuss my development process. The posts will be using Scraper API for parsing purposes which makes me free from all worries about blocking and rendering dynamic sites since Scraper API takes care of everything. So this post is about scraping Google search results, the script will accept a keyword and would return results across multiple pages. The data will be stored in a text file in JSON format. The code that is parsing the result is pretty straightforward and given below: def google_scraper(query, start=0): records = [] try: URL_TO_SCRAPE = "http://www.google.com/search?q=" + query.replace(' ', '+') +…

Read More
Create Ebay Scraper in Python using Scraper API
Learn how to create an eBay data scraper in Python to fetch item details and price.

In this post of ScrapingTheFamous, I am going o write a scraper that will scrape data from eBay. eBay is an online auction site where people put their listing up for selling stuff based on an auction. Like before, we will be writing the two scripts, one to fetch listing URs and store in a text file and the other to parse those links. The data will be stored in JSON format for further processing. I will be using Scraper API service for parsing purposes which makes me free from all worries blocking and rendering dynamic sites since it takes care of everything. The first script is to fetching listings of a category.…

Read More
Create Amazon Scraper in Python using Scraper API
Learn how to create an Amazon scraper in python to scrape product details like price, ASIN etc

In this post of ScrapingTheFamous, I am going o write a scraper that will scrape data from Amazon. I do not need to tell you what is Amazon. You are here because you already know about it 🙂 So, we are going to write two different scripts: one would be fetch.py that would be fetching URLs of individual listings and save in a text file. Later another script, parse.py that will have a function taking an individual listing URL, scrape data, and save in JSON format. I will be using Scraper API service for parsing purposes which makes me free from all worries blocking and rendering dynamic sites since it…

Read More