This post is part of the Data Engineering Series. In this post, I am going to discuss Apache Kafka and how Python programmers can use it for building distributed systems. What is Apache Kafka? Apache Kafka is an open-source streaming platform that was initially built by LinkedIn. It was open-sourced in 2011 and handed over to the Apache Software Foundation. According to Wikipedia: Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue architected…
-
-
Getting started with Elasticsearch in Python
The updated version of this post for Elasticsearch 7.x is available here. In this post, I am going to discuss Elasticsearch and how you can integrate it with different Python apps. What is Elasticsearch? Elasticsearch (ES) is a distributed and highly available open-source search engine that is built on top of Apache Lucene. It is written in Java and is therefore available on many platforms. You store unstructured data in JSON format, which also makes it a NoSQL database. Unlike other NoSQL databases, however, ES also provides search engine capabilities and other related features. Elasticsearch Use Cases You can use ES for multiple purposes; a couple of them are given below: You…
-
How to create a custom token on Stellar network in Python
A few months back I wrote a post about Stellar and how you can use it in your Python applications. In this post, I am going to discuss how you can create your own custom token, a.k.a. a coin, programmatically in Python. Before I get into the code, I’d like to discuss what tokens are and their background, how they differ from altcoins, and some Stellar network concepts. This post is lengthy, so read it when you have ample time. What are Tokens? The term token is not new, and many of us have experienced its application one way or another. Tokens are…
-
5 strategies to write unblockable web scrapers in Python
Introduction People who read the posts in my scraping series have often contacted me to ask how they could write scrapers that don’t get blocked. It is very difficult to write a scraper that NEVER gets blocked, but you can extend the life of your web scraper by implementing a few strategies. Today I am going to discuss them. User-Agent The very first thing you need to take care of is setting the User-Agent. A user agent is a tool that works on behalf of the user and tells the server which web browser the user is using to visit the website. Many websites do not let you view the content…
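As a minimal sketch of the User-Agent point above, the snippet below builds a request that carries a browser-like User-Agent header. It uses only the standard library's urllib; the URL, the header value, and the helper name build_request are placeholders chosen for illustration, not taken from the original post.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent string (an assumption for illustration).
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that announces itself with the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")  # placeholder URL
# urllib stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing req to urllib.request.urlopen would send the request with this header instead of urllib's default one.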
-
Develop your first ETL job in Python using bonobo
In this post I am going to discuss how you can write ETL jobs in Python by using the Bonobo library. Before I get into the library itself, allow me to discuss ETL and why it is needed. What is ETL? ETL is short for Extract, Transform and Load, a process in which data is acquired, changed/processed and then finally loaded into a data warehouse/database(s). You can extract data from data sources like files, websites or a database, transform the acquired data and then load the final version into a database for business usage. You may ask, why ETL? Well, what ETL does, many of you might already be doing…
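To make the three ETL stages concrete, here is a minimal sketch in plain Python. It does not use Bonobo; it relies only on the standard library (csv and sqlite3), and the sample data, table name and function names are assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real source file.
RAW_CSV = "name,price\nwidget, 2.50\ngadget,10\n"


def extract(text):
    """Extract: read rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy up the names and convert prices to floats."""
    return [(row["name"].strip().title(), float(row["price"])) for row in rows]


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, price FROM products").fetchall())
```

Each stage is a separate function so that the source, the cleaning rules, or the destination can be swapped independently.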
-
How to setup PHP7.1, Apache 2.2 on Amazon Linux
OK, I had no plan to write this post, but I recently spent quite a lot of time figuring this out, so I decided to write it up for myself and for others who run into the same issue and would otherwise struggle with such a simple thing. Alright, let’s proceed! Below are the details of my Amazon distro: cat /etc/*-release NAME="Amazon Linux AMI" VERSION="2017.09" ID="amzn" ID_LIKE="rhel fedora" VERSION_ID="2017.09" PRETTY_NAME="Amazon Linux AMI 2017.09" ANSI_COLOR="0;33" CPE_NAME="cpe:/o:amazon:linux:2017.09:ga" HOME_URL="http://aws.amazon.com/amazon-linux-ami/" Amazon Linux AMI release 2017.09 Repo setting Before we proceed, first make sure that you are downloading packages from the right repository. Remove all irrelevant repos, especially remi-safe. wget https://mirror.webtatic.com/yum/el6/latest.rpm sudo yum install latest.rpm sudo vi /etc/yum.repos.d/webtatic.repo 'set…
-
Getting started with Python and IPFS
In this post I am going to discuss how you can use the decentralized IPFS in your Python apps for storing different kinds of data. What is IPFS? From Wikipedia: InterPlanetary File System (IPFS) is a protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system. IPFS was initially designed by Juan Benet, and is now an open-source project developed with help from the community. Put simply, it is like a decentralized Amazon S3. All of your information is available on a decentralized network across nodes, which makes the system not only scalable but reliable as well, since data that is fed into it…
-
Introduction to Exploratory Data Analysis in Python
Recently I finished the Python Graph series, in which I used Matplotlib to represent data in different types of charts. In this post I give a brief introduction to exploratory data analysis (EDA) in Python with the help of pandas and matplotlib. What is Exploratory data analysis? According to Wikipedia: In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. You can say that EDA is a statistician’s way of storytelling where you explore…
-
Develop your first python app integrated with decentralized Stellar network
In this post I am going to discuss how to create a decentralized blockchain app, aka dApp, for the Stellar network. I will build a very simple web app, the world’s simplest e-commerce app, called RocketCommerce, where people can buy a single item by paying Lumens (XLM), Stellar’s currency. The interface of the app looks like below: Before we get into the development of the application itself, allow me to discuss some background on blockchain, decentralized apps and the Stellar network itself. What is Blockchain? From Wikipedia: A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography.[1][6] Each block typically contains a cryptographic hash of the previous block,[6] a timestamp and transaction data.[7] By design, a…
-
2017 in numbers
2017 is almost over. This year was very exciting in terms of taking initiatives in different aspects of life. I resumed technical blogging, created more open source work and started making video tutorials. Actually, it all started when I resumed blogging on this very blog and wrote a post that became a massive hit and made me continue blogging. BTW, I am not new to writing. I had a technical blog back in 2003 where I used to share different things. I had another blog where I used to share the other side of me, that is, my views on politics, religion and current affairs. I am happy to say…