Develop a Google Scraper in Python with Scraper API

This is another post in ScrapeTheFamous, a series in which I parse some famous websites and discuss my development process. The posts use Scraper API for parsing, which frees me from all worries about blocking and rendering dynamic sites, since Scraper API takes care of everything.

This post is about scraping Google search results: the script accepts a keyword and returns results across multiple pages. The data is stored in a text file in JSON format.

The code that parses the results is pretty straightforward and is given below:

import requests
from bs4 import BeautifulSoup

API_KEY = 'YOUR_SCRAPER_API_KEY'  # your Scraper API key


def google_scraper(query, start=0):
    records = []
    try:
        # Build the Google search URL; start * 10 skips ahead to the
        # requested page and pws=0 disables personalized results.
        URL_TO_SCRAPE = "http://www.google.com/search?q=" + query.replace(' ', '+') + "&start=" + str(start * 10) \
                        + '&num=10&pws=0'
        print(URL_TO_SCRAPE)
        print("Checking on page# " + str(start + 1))

        # Route the request through Scraper API; render=false because
        # Google search result pages do not need JavaScript rendering.
        payload = {'api_key': API_KEY, 'url': URL_TO_SCRAPE, 'render': 'false'}

        r = requests.get('http://api.scraperapi.com', params=payload, timeout=60)
        soup = BeautifulSoup(r.text, 'lxml')
        # Each organic result link sits inside a div with class yuRUbf.
        results = soup.select('.yuRUbf > a')
        for result in results:
            heading = result.select('h3')
            if heading:  # skip results without a visible title
                records.append({'URL': result['href'], 'TITLE': heading[0].text})
                print(heading[0].text, ' ', result['href'])
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error. Make sure you are connected to the Internet. Technical details given below.\n")
        print(str(e))
    except requests.Timeout as e:
        print("OOPS!! Timeout Error. Technical details given below.\n")
        print(str(e))
    except requests.RequestException as e:
        print("OOPS!! General Error. Technical details given below.\n")
        print(str(e))
    finally:
        return records

It goes from one page to the next, keeping the search keyword intact. For each result it parses the h3 heading and the anchor tag and stores them in a JSON-friendly structure, a list of dictionaries.
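To tie it together, here is a minimal driver sketch that loops over the first few pages and writes the collected records to a text file in JSON format, as described earlier. The keyword, page count, and output filename (google_results.txt) are illustrative assumptions, not part of the original script:

import json

if __name__ == '__main__':
    keyword = 'scraper api'   # example keyword; substitute your own
    all_records = []
    for page in range(0, 3):  # scrape the first 3 result pages; adjust as needed
        all_records += google_scraper(keyword, start=page)

    # Dump everything to a text file as JSON.
    with open('google_results.txt', 'w') as f:
        json.dump(all_records, f, indent=2)

Each stored record is then a dictionary with a URL and a TITLE key, so the output file is easy to load back with json.load for further processing.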

Conclusion

In this post, you learned how to create a Google Search parser in Python using Scraper API. You do not have to worry about proxy IPs, nor do you have to pay hundreds of dollars, especially if you are an individual or working at a startup. The company I worked with spends hundreds of dollars every month just on proxy IPs.

Oh, and if you sign up here with my referral link or enter promo code adnan10, you will get a 10% discount. If you do not get the discount, just let me know via email on my site and I will surely help you out.

The code is available on GitHub.

If you like this post, you should subscribe to my blog for future updates.
