• HTML

    Create your first Web scraper in Go with goQuery
    A beginner's tutorial on writing a web scraper for Yelp in Go.

    I have been covering web scraping on this blog for a long time, but the posts were mostly in Python; whether it was requests, Selenium, or the Scrapy framework, everything was Python-based. Scraping, however, is not limited to a specific language. Any language that provides APIs or libraries for an HTTP client and an HTML parser can be used for web scraping. Go also lets you write web scrapers. Go is a compiled, statically typed language and could be very beneficial for writing efficient and…

  • Fehrist – Document Indexing Library in Go
    Fehrist is a document indexing library written in Golang that indexes different kinds of text documents.

    TLDR: Visit the GitHub repo if you are not interested in the internals of the lib. Today I present another library I made in Go, called Fehrist. From the GitHub README: Fehrist is a pure Go library for indexing different types of documents. Currently, it supports only CSV and JSON, but its flexible architecture gives you the liberty to add more document types. Fehrist (فہرست) is an Urdu word for index. Similar terms are used in Arabic (فھرس) and Farsi (فہرست) as well. Fehrist is based on an inverted index data structure for indexing purposes. Why did I make it? It seems I have fallen in love with Golang after Python. Go is an opinionated language…
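
    To make the inverted index idea concrete, here is a minimal sketch in Python (used for all examples added to this page). It is not Fehrist's code or API: the `tokenize`, `build_inverted_index`, and `search` helpers and the sample documents are hypothetical, and the only assumption is the general technique of mapping each word to the set of documents that contain it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first token's postings and intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "Go is a compiled language",
    2: "Python is an interpreted language",
    3: "Fehrist is written in Go",
}
index = build_inverted_index(documents)
print(search(index, "go"))           # [1, 3]
print(search(index, "language is"))  # [1, 2]
```

    A lookup touches only the posting sets of the query words rather than scanning every document, which is why this structure is a common basis for text indexing.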

  • Test-Driven Development: a Cost-Effective Approach to Developing Software

    The main goal of software developers is to create high-quality products with the required functionality. At the same time, creating the product shouldn’t cost a fortune. One effective method for speeding up software creation and reducing its cost is test-driven development, which offers a range of benefits to both programmers and customers. TDD or Test-Driven Development: What Is It? Test-Driven Development is an approach to software or mobile app development that relies on a repeated cycle. It works as follows: a QA engineer writes a test that covers the desired change; developers write code that must pass this test; the code is…
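
    The first two steps of that cycle can be sketched in Python with the standard `unittest` module. This is only an illustration under stated assumptions: the `slugify` function and its expected behavior are hypothetical and do not come from the article.

```python
import unittest


# Step 1: the test is written first and describes the desired change.
class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_ignores_surrounding_whitespace(self):
        self.assertEqual(slugify("  Go Cache "), "go-cache")


# Step 2: the smallest implementation that makes the tests pass.
# (Hypothetical function, used only to illustrate the cycle.)
def slugify(title):
    return "-".join(title.lower().split())


# Run the tests explicitly so the script can inspect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner().run(suite)
print("all tests passed:", result.wasSuccessful())
```

    Running the tests before `slugify` exists would fail with a `NameError`, which is the expected failing state that the implementation in step 2 then turns green.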

  • Setting Up a DevOps Pipeline with a Remote Team

    The global coronavirus pandemic is affecting the majority of the world’s population. People are locked in their homes, and businesses are trying to adopt remote working solutions. Collaboration is difficult even when your team is in the same office. When the team is spread across different home offices, collaboration and communication get even more complex. This article explains key concepts of DevOps pipelines, and then presents a few ways to get around remote work challenges for DevOps teams. What Is a DevOps Pipeline? A software deployment pipeline is a set of solutions and practices that enable you to quickly build, test, and deploy code. Different development methodologies use different pipelines…

  • Create your first web scraper with ScrapingBee API and Python
    Learn how to use a cloud-based scraping API to scrape web pages without getting blocked.

    In this post, I am going to discuss another cloud-based scraping tool that takes care of many of the issues you usually face while scraping websites. This platform has been introduced by ScrapingBee, a cloud-based scraping tool. What is ScrapingBee? If you visit their website, you will find something like this: ScrapingBee API handles headless browsers and rotates proxies for you. As that suggests, it offers everything you need to deal with the issues you usually come across while writing your scrapers, especially the availability of proxies and headless scraping. No installation of web drivers for Selenium, yay! Development ScrapingBee is based on a REST API, hence it can…

  • GoCache: LRU Cache Implementation in Go

    I got to know Golang a year ago and wrote some toy programs in it while learning, but then I gave up because I was not really enjoying it, despite liking the Go language. It is very much like Python but with better performance because it’s compiled. Recently I again wished to do something in Go. This time I did not want to go back to practicing topic by topic. Instead, I decided to do a project and learn whatever I needed to get it done. I have used Memcached in the past in PHP and really liked it, so I thought to come up with…
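
    For readers unfamiliar with the LRU (least recently used) eviction policy named in the title, here is a minimal sketch. It is written in Python, to match the other examples added to this page, and is not GoCache's implementation: the `LRUCache` class and its `get`/`put` methods are hypothetical names built on the standard `collections.OrderedDict`.

```python
from collections import OrderedDict


class LRUCache:
    """A fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are kept in order from least to most recently used.
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Drop the entry at the "least recently used" end.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used key
cache.put("c", 3)      # capacity exceeded, so "b" is evicted
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

    The ordered dictionary keeps both lookups and recency updates at constant time, which is the property an LRU cache needs regardless of the language it is written in.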

  • Create your first ETL Pipeline in Apache Beam and Python
    Learn how to use Apache Beam to create efficient Pipelines for your applications.

    This post is part of the Data Engineering and ETL Series. In this post, I am going to introduce another ETL tool for your Python applications, called Apache Beam. What is Apache Beam? According to Wikipedia: Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. Unlike Airflow and Luigi, Apache Beam is not a server. It is rather a programming model that contains a set of APIs. Currently, they are available for the Java, Python, and Go programming languages. A typical Apache Beam based pipeline looks like this: (Image Source: https://beam.apache.org/images/design-your-pipeline-linear.svg) From the left, the data is being…

  • 13 Tips for Making the Most of the COVID-19 Lockdown
    A few useful tips for those working from home during the coronavirus pandemic

    Like many parts of the world, Pakistan has also been hit by the coronavirus pandemic. As of March 23, 800+ cases had been recorded, of which 6 recovered and 6 died. Many cities of the world have been locked down to avoid the spread of the COVID-19 disease. Many companies around the world are now asking employees to work from home. People are confined to their homes and can’t roam around. It is not easy to spend time at home, especially when you are not used to working from home. Men usually are not used to staying home. Below are a few things you can do to keep yourself sane and productive while working…

  • Top 5 Open Source Kubernetes Monitoring Tools

    Monitoring distributed microservices like Kubernetes is not an easy task because they require real-time attention and proactive monitoring. To overcome this challenge, many companies have developed open-source monitoring tools for Kubernetes. Some tools collect metrics, others collect logs. Some are Kubernetes-native, others are more agnostic in nature. Some are data collectors while others provide an interface for operating Kubernetes. This article takes a look at five of the more popular Kubernetes monitoring tools out there. What Is Kubernetes? Kubernetes (K8s) is an open-source platform for deploying, automating, managing, and scaling containerized applications. Kubernetes groups containers into clusters for easy discovery and management. You can deploy K8s on-premise, and…

  • Python for bioinformatics: Getting started with sequence analysis in Python
    A Biopython tutorial about DNA, RNA and other sequence analysis

    In this post, I am going to discuss how Python is being used in the field of bioinformatics and how you can use it to analyze sequences of DNA, RNA, and proteins. Yeah, Python is being used by biologists as well. Before I get into coding, I’d like to give a brief background of bioinformatics and related things. What is bioinformatics? According to Wikipedia: Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of…