• Getting started with Apache Avro and Python
    Learn how to create and consume Apache Avro-based data for more efficient data transfer.

    In this post, I am going to talk about Apache Avro, an open-source data serialization system used by tools like Spark, Kafka, and others for big data processing. What is Apache Avro? According to Wikipedia: Avro is a row-oriented remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema…

  • Create your first REST API in Django Rest Framework
    A step-by-step guide to creating APIs in Django Rest Framework

    In this post, I am going to talk about Django Rest Framework, or DRF. DRF is used to create RESTful APIs in Django that can later be consumed by various apps: mobile, web, desktop, etc. We will discuss how to install DRF on your machine and then write our APIs for a system. Before we discuss DRF, let’s talk a bit about REST itself. What is REST? From Wikipedia: Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between…

  • Top 6 Tips for Planning a Successful Azure Migration

    Migrating your workloads to Azure can help you leverage the benefits of cloud computing. This includes agility, scalability, lower costs, and easier management. However, the process of migration can sometimes be complicated. You have to select the proper service model for every workload and establish a migration strategy for all workloads. A well-planned migration strategy can help you make the move without impacting your business. The following tips will help your Azure migration go smoothly. Reasons for Migrating to the Cloud: Cloud migration can be risky and expensive, but also rewarding. Here are some of the common drivers for moving workloads and applications to the cloud: Reducing operating costs—the…

  • Create your first Web scraper in Go with goQuery
    A beginner's tutorial for writing web scrapers in Go for Yelp.

    I have been covering web scraping on this blog for a long time, but mostly in Python: be it requests, Selenium, or the Scrapy framework, all were based on Python. Scraping is not limited to a specific language, though. Any language that provides APIs or libraries for an HTTP client and an HTML parser can be used for web scraping, and Go is no exception. Go is a compiled, statically typed language and could be very beneficial to write efficient and…

  • Fehrist – Document Indexing Library in Go
    Fehrist is a document indexing library written in Go that is used to index different kinds of text documents

    TLDR: Visit the GitHub repo if you are not interested in the internals of the lib. Today I present another library I made in Go, called Fehrist. From the GitHub README: Fehrist is a pure Go library for indexing different types of documents. Currently, it supports only CSV and JSON but flexible architecture gives you the liberty to add more documents. Fehrist(فہرست) is an Urdu word for Index. Similar terminologies used in Arabic(فھرس) and Farsi(فہرست) as well. Fehrist is based on an Inverted Index data structure for indexing purposes. Why did I make it? It seems I have fallen in love with Go after Python. Go is an opinionated language…
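
As a rough illustration of the inverted index idea mentioned in the excerpt above, here is a minimal sketch in Python. It is not Fehrist's actual implementation (Fehrist is written in Go, and its internals are not shown here). The function names `tokenize`, `build_inverted_index`, and `search`, the sample documents, and the naive whitespace tokenizer are all assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "Go is a compiled language",
    2: "Python is an interpreted language",
    3: "Fehrist is written in Go",
}
index = build_inverted_index(documents)
```

Calling `search(index, "go language")` intersects the posting sets for both terms, so only documents that contain every query term are returned. A real indexing library would typically add better tokenization and persist the index to disk.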

  • Test-Driven Development: a Cost-Effective Approach to Developing Software

    The main goal of software developers is to create high-quality products with the required functionality. At the same time, creating the product shouldn’t cost a fortune. One effective method for speeding up software development and reducing its cost is test-driven development, which offers a range of benefits to both programmers and customers. TDD or Test-Driven Development: What Is It? Test-Driven Development is an approach to software or mobile app development that relies on short, repeated cycles. It works as follows: a QA engineer writes a test that covers the desired change; developers write code, which must pass this test; the code is…

  • Setting Up a DevOps Pipeline with a Remote Team

    The global coronavirus pandemic is affecting the majority of the world’s population. People are locked in their homes, and businesses are trying to adopt remote working solutions. Collaboration is difficult even when your team is in the same office. When the team is separated across different home offices, collaboration and communication get even more complex. This article explains key concepts of DevOps pipelines, and then presents a few ways to get around remote work challenges for DevOps teams. What Is a DevOps Pipeline? A software deployment pipeline is a set of solutions and practices that enable you to quickly build, test, and deploy code. Different development methodologies use different pipelines…

  • Create your first web scraper with ScrapingBee API and Python
    Learn how to use a cloud-based scraping API to scrape web pages without getting blocked.

    In this post, I am going to discuss another cloud-based scraping tool that takes care of many of the issues you usually face while scraping websites. This platform has been introduced by ScrapingBee, a cloud-based scraping tool. What is ScrapingBee? If you visit their website, you will find something like the following: ScrapingBee API handles headless browsers and rotates proxies for you. As this suggests, it offers what you need to deal with the issues you usually come across while writing your scrapers, especially the availability of proxies and headless scraping. No installation…

  • GoCache: LRU Cache Implementation in Go

    I got to know about Go a year back and wrote some toy programs in it while learning, but then I gave up as I was not really enjoying it, despite liking the Go language. It is very much like Python but with better performance because it’s compiled. Recently I again wished to do something in Go. This time I did not want to go back to practicing topic by topic. Instead, I decided to build a project and learn whatever I needed to get the thing done. I have used Memcached in the past in PHP and really liked it so I thought to come up with…

  • Create your first ETL Pipeline in Apache Beam and Python
    Learn how to use Apache Beam to create efficient Pipelines for your applications.

    This post is part of the Data Engineering and ETL Series. In this post, I am going to introduce another ETL tool for your Python applications, called Apache Beam. What is Apache Beam? According to Wikipedia: Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. Unlike Airflow and Luigi, Apache Beam is not a server. It is rather a programming model that contains a set of APIs. Currently, they are available for the Java, Python, and Go programming languages. A typical Apache Beam based pipeline looks like this: (Image Source: https://beam.apache.org/images/design-your-pipeline-linear.svg) From the left, the data is being…