In this post, I am going to discuss Apache Spark and how you can create simple but robust ETL pipelines in it. You will learn how Spark provides APIs to transform different data formats into DataFrames and SQL for analysis, and how one data source can be transformed into another without any hassle. What is Apache Spark? According to Wikipedia: Apache Spark is an open-source distributed general-purpose cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. From the official website: Apache Spark™ is a unified analytics engine for large-scale data processing. In short, Apache Spark is a framework which is…
-
Codelobster IDE – Free PHP, HTML, CSS, JavaScript editor
Guest post by Stas Ustimenko. In this article, we invite you to get acquainted with Codelobster IDE, a free editor for web languages. It has been on the software market for a long time and has won a lot of fans. Codelobster IDE allows you to edit PHP, HTML, CSS and JavaScript files; it highlights the syntax and gives hints for tags, functions and their parameters. The editor easily handles files that contain mixed content. If you insert PHP code in your HTML template, the editor correctly highlights both HTML tags and PHP functions. The same applies to CSS and JavaScript code, which is…
-
Getting started with Apache Cassandra and Python
In this post, I am going to talk about Apache Cassandra: its purpose, usage, configuration, and cluster setup, and in the end, how you can access it in your Python applications. By the end of the post, you should have an idea of it and be able to start playing with it for your next project. What is Apache Cassandra? According to Wikipedia: Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication…
-
How to implement Stellar blockchain on existing ecommerce site
In the last couple of posts, I discussed the technical side of implementing the Stellar blockchain in Python. Those posts stirred interest among many, and I got emails about implementing it on existing web applications, especially e-commerce websites, since the actual purpose got buried between the lines of code in the last two posts. So today, I am making an attempt to explain it in a language that business people, site owners, or managers can understand. Keep reading! I will be discussing a fictitious website that sells tea online, say allthingstea.com. They have a variety of tea products which are loved by thousands of customers. They have different…
-
Create your first ReactJS app
In this tutorial, you are going to learn what ReactJS is and how you can use this JS framework to write maintainable web apps in JavaScript. The project we are going to build will be a clone of Coinmarketcap, a famous website that lists the latest prices of cryptocurrencies. CMC provides JSON APIs to access their data. Before I get into the main project, let’s talk a bit about ReactJS itself. What is ReactJS? From the official website: A JavaScript library for building user interfaces. ReactJS was initially developed by Facebook and is now maintained by the community. ReactJS is: Declarative: By declarative, it means that you don’t tell all the…
-
Deploy your first scalable PHP/MySQL Web application in Kubernetes
Introduction In this post, I am going to talk about Kubernetes: what it is all about, why to use it, and how to use it. By the end of this post you should understand the basic workings of Kubernetes and be able to deploy your app in a Kubernetes cluster. Prerequisite It will be very difficult to understand Kubernetes if you have no idea of a containerization tool like Docker. If you don’t know what Docker is and how to use it for building a modular architecture, then you should visit this two-part series (Part 1, Part 2). In fact, I recommend you read both parts as this…
-
Create your first PHP/MySQL application in Docker
In the first part, I discussed how to install and configure Docker for a PHP-based application. In this post, we will be building a full-fledged PHP application that communicates with MySQL. Docker Compose In the last post you learned how to install Docker itself and create a Dockerfile. Using command-line tools like docker build and docker run can be tedious in real-world scenarios where you have to run multiple containers. You can think of Compose as a batch file that contains a set of instructions and commands to perform operations. Just as you create a Dockerfile to define what your image looks like, in a docker-compose.yml file you can…
-
Getting started with Docker
A step-by-step tutorial to install Docker on your machine and run your first PHP web application in it
-
Schedule web scrapers with Apache Airflow
This post is part of the Data Engineering Series. In the previous post, I discussed Apache Airflow and its basic concepts, configuration, and usage. In this post, I am going to discuss how you can schedule your web scrapers with the help of Apache Airflow. I will be using the same example I used in the Apache Kafka and Elastic Search posts, that is, scraping https://allrecipes.com, because the purpose here is to use Airflow. In case you want to learn about scraping, you may check the entire series here. So, we will work on a workflow consisting of these tasks: parse_recipes: It parses individual recipes. download_image: It downloads the recipe image. store_data: Finally, store image…
-
Getting started with Apache Airflow
This post is part of the Data Engineering Series. In this post, I am going to discuss Apache Airflow, a workflow management system developed by Airbnb. Earlier I discussed writing basic ETL pipelines in Bonobo. Bonobo is cool for writing ETL pipelines, but the world is not all about writing ETL pipelines to automate things. There are other use cases in which you have to perform tasks in a certain order, once or periodically. For instance: Monitoring cron jobs. Transferring data from one place to another. Automating your DevOps operations. Periodically fetching data from websites and updating the database for your awesome price comparison system. Data processing for recommendation based…