Python

Create your first REST API in Django Rest Framework
A step by step guide creating APIs in Django Rest Framework

In this post, I am going to talk about Django Rest Framework or DRF. DRF is used to create RESTful APIs in Django which later could be consumed by various apps; mobile, web, desktop, etc. We will be discussing how to install DRF on your machine and then will be writing our APIs for a system. Before we discuss DRF, let’s talk a bit about REST itself. What is Rest From Wikipedia Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between…

Read More
ScrapingBee API Python Tutorial: How to Use Render JS, Proxies & Headers
Learn how to use cloud based Scraping API to scrape web pages without getting blocked.

In this post, I am going to discuss another cloud-based scraping tool that takes care of many of the issues you usually face while scraping websites. This platform has been introduced by ScrapingBee, a cloud-based Scraping tool. What is ScrapingBee If you visit their website, you will find something like this: ScrapingBee API handles headless browsers and rotates proxies for you. As it suggests, it is offering you all the things to deal with the issues you usually come across while writing your scrapers, especially the availability of proxies and headless scraping. No installation of web drivers for Selenium, yay! Development ScrapingBee is based on REST API hence it can…

Read More
Python for Bioinformatics: Step-by-Step Sequence Analysis Tutorial with Biopython
A Biopython tutorial about DNA, RNA and other sequence analysis

In this post, I am going to discuss how Python is being used in the field of bioinformatics and how you can use it to analyze sequences of DNA, RNA, and proteins. Yeah, Python is being used by biologists as well. Before I get into coding, I’d like to give a brief background of bioinformatics and related things. What is bioinformatics? According to Wikipedia: Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of…

Read More
Develop AirBnb Parser in Python

Planning to write a book about Web Scraping in Python. Click here to give your feedback So I am starting a new scraping series, called, ScrapeTheFamous, in which I will be parsing some famous websites and will discuss my development process. The posts will be using Scraper API for parsing purposes which makes me free from all worries blocking and rendering dynamic sites since Scraper API takes care of everything. Anyways, the first post is about Airbnb. We will be scraping some important data points from it. We will be scraping a list of rental URL and fetch and store data in JSON format. So let’s start! The URL we…

Read More
Advanced Proxy Use for Web Scraping

Guest Post by Vytautas Kirjazovas from Oxylabs.io In the eyes of many, web scraping is an art. It is safe to state that the majority of web scraping enthusiasts have faced bans from websites more than once during their careers. Web scraping is a challenging task, and it’s more common than you think to see your crawlers getting banned by websites. In this article, we’ll talk about more advanced ways to use proxies for web scraping. There are some key components that you should take into account with web scraping to avoid getting banned too quickly: Set browser-like headers User-Agent that can be found in real life. Referer header. Other…

Read More
Create a simple image search engine in OpenCV and Flask

I recently started playing with OpenCV, an open-source Computer Vision library for image processing. Luckily there are Python bindings available. More luck that the guys like Adrian has done a great service by releasing both book and blog on a similar topic. I have also made a demo which can see below. Conclusion So this was a basic image processing tutorial in OpenCV. It is not a mature product as it can’t tell you about unique colors in a picture. It is just telling you informaiton based on pixel colors rather than performing color segmentation and clustering based on a ML algorithm. So what is it all about? Well, it…

Read More
Create your first web scraper with Scraper API and Python

Recently I come across a tool that takes care of many of the issues you usually face while scraping websites. The tool is called Scraper API which provides an easy to use REST API to scrape a different kind of websites(Simple, JS enabled, Captcha, etc) with quite an ease. Before I proceed further, allow me to introduce Scraper API. What is Scraper API If you visit their website you’d find their mission statement: Scraper API handles proxies, browsers, and CAPTCHAs, so you can get the HTML from any web page with a simple API call! As it suggests, it is offering you all the things to deal with the issues…

Read More
Create your first ETL Pipeline in Apache Spark and Python

In this post, I am going to discuss Apache Spark and how you can create simple but robust ETL pipelines in it. You will learn how Spark provides APIs to transform different data format into Data frames and SQL for analysis purpose and how one data source could be transformed into another without any hassle. What is Apache Spark? According to Wikipedia: Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. From Official Website: Apache Spark™ is a unified analytics engine for large-scale data processing. In short, Apache Spark is a framework which is…

Read More
Getting started with Apache Cassandra and Python

In this post, I am going to talk about Apache Cassandra, its purpose, usage, configuration, and setting up a cluster and in the end, how can you access it in your Python applications. At the end of the post, you should have an idea of it and could start playing it for your next project. What is Apache Cassandra? According to Wikipedia: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication…

Read More
Getting started with Apache Airflow

This post is the part of Data Engineering Series. In this post, I am going to discuss Apache Airflow, a workflow management system developed by Airbnb. Earlier I had discussed writing basic ETL pipelines in Bonobo. Bonobo is cool for write ETL pipelines but the world is not all about writing ETL pipelines to automate things. There are other use cases in which you have to perform tasks in a certain order once or periodically. For instance: Monitoring Cron jobs transferring data from one place to other. Automating your DevOps operations. Periodically fetching data from websites and update the database for your awesome price comparison system. Data processing for recommendation based…

Read More