• Create your first web scraper with ScrapingBee API and Python
    Learn how to use a cloud-based scraping API to scrape web pages without getting blocked.

    In this post, I am going to discuss another cloud-based scraping tool that takes care of many of the issues you usually face while scraping websites. The platform is ScrapingBee, a cloud-based scraping tool. What is ScrapingBee? If you visit their website, you will find something like the following: ScrapingBee API handles headless browsers and rotates proxies for you. As that suggests, it offers everything you need to deal with the issues you usually come across while writing your scrapers, especially the availability of proxies and headless scraping. No installation of web drivers for Selenium, yay! Development: ScrapingBee is based on a REST API, hence it can…
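
    Since the excerpt mentions that ScrapingBee exposes a REST API, here is a minimal sketch of what a call could look like from Python. It assumes the requests library and ScrapingBee's documented GET endpoint; the API key, target URL and parameters are placeholders, so check the current documentation before relying on them.

        # Hedged sketch: calling the ScrapingBee REST endpoint with requests.
        # The endpoint and parameter names follow ScrapingBee's public docs,
        # but the API key and target URL below are placeholders.
        import requests

        response = requests.get(
            "https://app.scrapingbee.com/api/v1/",
            params={
                "api_key": "YOUR_API_KEY",     # placeholder key
                "url": "https://example.com",  # page you want to scrape
                "render_js": "true",           # ask for headless-browser rendering
            },
        )
        print(response.status_code)
        print(response.text[:200])  # first part of the returned HTML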

  • GoCache: LRU Cache Implementation in Go

    I got to know about Golang a year back and wrote some toy programs in it while learning, but then I gave up as I was not really enjoying it despite liking the Go language. It is very much like Python but with better performance because it is compiled. Recently I again wished to do something in Go. This time I did not want to go back to practicing topic by topic; I thought I would rather do a project and learn whatever I need to get the thing done. I have used Memcached in the past in PHP and really liked it, so I thought to come up with…

  • Create your first ETL Pipeline in Apache Beam and Python
    Learn how to use Apache Beam to create efficient pipelines for your applications.

    This post is part of the Data Engineering and ETL Series. In this post, I am going to introduce another ETL tool for your Python applications, called Apache Beam. What is Apache Beam? According to Wikipedia: Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. Unlike Airflow and Luigi, Apache Beam is not a server. It is rather a programming model that contains a set of APIs. Currently, they are available for the Java, Python and Go programming languages. A typical Apache Beam based pipeline looks like below: (Image source: https://beam.apache.org/images/design-your-pipeline-linear.svg) From the left, the data is being…
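
    As a rough illustration of the programming model described above (not the pipeline built later in the post), a minimal linear Beam pipeline in Python could look like the sketch below; the element values are made up for illustration and assume the apache-beam package is installed.

        # Hedged sketch of a linear Apache Beam pipeline in Python:
        # build a PCollection, apply a transform, and print the results.
        import apache_beam as beam

        with beam.Pipeline() as pipeline:
            (
                pipeline
                | "Create" >> beam.Create([1, 2, 3, 4])   # input PCollection
                | "Square" >> beam.Map(lambda x: x * x)   # simple element-wise transform
                | "Print" >> beam.Map(print)              # output step
            )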

  • 13 Tips for Making the Most of the COVID-19 Lockdown
    A few useful tips for those working from home during the coronavirus pandemic

    Like many parts of the world, Pakistan has also been hit by the coronavirus pandemic. As of March 23, more than 800 cases had been recorded, of which 6 had recovered and 6 had died. Many cities around the world have been locked down to curb the spread of COVID-19. Many companies are now asking employees to work from home. People are confined to their homes and cannot roam around. It is not easy to spend so much time at home, especially when you are not used to working from home; men, in particular, are usually not used to staying at home. Below are a few things you can do to keep yourself sane and productive while working…

  • Top 5 Open Source Kubernetes Monitoring Tools

    Monitoring distributed, microservices-based environments like Kubernetes is not an easy task because they require real-time attention and proactive monitoring. To meet this challenge, many companies have developed open-source monitoring tools for Kubernetes. Some tools collect metrics, others collect logs. Some are Kubernetes-native, others are more agnostic in nature. Some are data collectors, while others provide an interface for operating Kubernetes. This article takes a look at five of the more popular Kubernetes monitoring tools out there. What Is Kubernetes? Kubernetes (K8s) is an open-source platform for deploying, automating, managing, and scaling containerized applications. Kubernetes groups containers into clusters for easy discovery and management. You can deploy K8s on-premises, and…

  • Python for bioinformatics: Getting started with sequence analysis in Python
    A Biopython tutorial about DNA, RNA and other sequence analysis

    In this post, I am going to discuss how Python is being used in the field of bioinformatics and how you can use it to analyze sequences of DNA, RNA, and proteins. Yeah, Python is used by biologists as well. Before I get into coding, I’d like to give a brief background on bioinformatics and related topics. What is bioinformatics? According to Wikipedia: Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of…
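
    To give a flavour of what sequence analysis in Biopython looks like, here is a small sketch using its Seq class; the DNA string is an arbitrary example, not data from the post.

        # Hedged sketch with Biopython's Seq object (arbitrary example sequence):
        # complement, transcription (DNA -> RNA) and translation (to protein).
        from Bio.Seq import Seq

        dna = Seq("ATGGCCATTGTAATGGGCCGC")
        print(dna.complement())   # complementary strand
        print(dna.transcribe())   # messenger RNA
        print(dna.translate())    # protein sequence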

  • Develop AirBnb Parser in Python

    Planning to write a book about Web Scraping in Python. Click here to give your feedback. So I am starting a new scraping series, called ScrapeTheFamous, in which I will be parsing some famous websites and discussing my development process. The posts will use Scraper API for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites, since Scraper API takes care of everything. Anyway, the first post is about Airbnb. We will be scraping some important data points from it: we will take a list of rental URLs, then fetch and store the data in JSON format. So let’s start! The URL we…
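
    A hedged sketch of the general pattern (fetch a rental page through Scraper API, store the result as JSON) is shown below; the endpoint and parameters follow Scraper API's public docs, while the listing URL and stored fields are placeholders rather than the actual parsing code from the post.

        # Rough sketch: fetch a page via Scraper API and dump a record to JSON.
        # The listing URL and extracted fields are placeholders.
        import json
        import requests

        payload = {
            "api_key": "YOUR_API_KEY",                    # placeholder key
            "url": "https://www.airbnb.com/rooms/12345",  # hypothetical listing URL
        }
        response = requests.get("http://api.scraperapi.com", params=payload)

        record = {
            "url": payload["url"],
            "status": response.status_code,
            "html_length": len(response.text),  # real code would parse data points here
        }
        with open("listing.json", "w") as f:
            json.dump(record, f, indent=2)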

  • 6 Tools for Optimizing your Media with Java

    Modern web design focuses on displaying media in clear and aesthetic ways. It also places an emphasis on accessibility, across devices and for a variety of users. This is partially due to changing tastes and partially due to advances in technology; advances in media optimization play a particular role. The media you use, and the way it affects the experience of your users, has a huge impact on the effectiveness of your sites. It also affects your page loading speed, which impacts your search rankings and your bottom line. Java is one of the most widely used languages in web and mobile applications. Finding suitable Java tools is key…

  • Advanced Proxy Use for Web Scraping

    Guest post by Vytautas Kirjazovas from Oxylabs.io. In the eyes of many, web scraping is an art. It is safe to say that the majority of web scraping enthusiasts have faced bans from websites more than once during their careers. Web scraping is a challenging task, and it is more common than you think to see your crawlers getting banned by websites. In this article, we’ll talk about more advanced ways to use proxies for web scraping. There are some key components you should take into account to avoid getting banned too quickly: set browser-like headers, including a User-Agent that can be found in real life and a Referer header. Other…
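
    As a minimal illustration of those two points, the sketch below sends browser-like headers through a proxy with Python's requests library; the proxy address and credentials are placeholders, and real setups rotate over a pool of such proxies rather than using a single one.

        # Hedged sketch: browser-like headers plus a (placeholder) proxy with requests.
        import requests

        headers = {
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
            ),
            "Referer": "https://www.google.com/",  # a plausible Referer header
        }
        proxies = {
            "http": "http://user:pass@proxy.example.com:8080",   # placeholder proxy
            "https": "http://user:pass@proxy.example.com:8080",
        }

        response = requests.get("https://example.com", headers=headers, proxies=proxies)
        print(response.status_code)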

  • Create your first sales dashboard in Apache Superset
    Learn how to use Apache Superset to create an interactive e-commerce sales dashboard

    So far you have learned how to acquire data, process it, and visualize it. Today I am going to talk about dashboards: what they are and how you can build your own personal or enterprise dashboard for sales data using Apache Superset. What is a Dashboard? According to Wikipedia: A dashboard is a type of graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process. In other usage, “dashboard” is another name for “progress report” or “report.” Basically, a dashboard is an information management tool that can present information in both text and visual form, such as charts, tables…