This post is part of the Data Engineering Series. In previous posts, I discussed writing ETLs in Bonobo, Spark, and Airflow. In this post, I am introducing another ETL tool, developed by Spotify, called Luigi. Earlier I discussed writing basic ETL pipelines here, here and here. Bonobo is cool for writing ETL pipelines, but the world is not all about writing ETL pipelines to automate things. There are other use cases in which you have to perform tasks in a certain order, once or periodically. For instance: monitoring cron jobs, transferring data from one place to another, automating your DevOps operations, periodically fetching data from websites and…
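To make the "tasks in a certain order" idea concrete, here is a minimal, dependency-free sketch of the pattern Luigi is built around: each task declares what it `requires()`, what it produces via `output()`, and its work in `run()`, and a task counts as done once its output exists. The `Task` base class, the `build()` runner, and the `Extract`/`Transform` tasks are hypothetical toy stand-ins, not Luigi's API. In real Luigi you would subclass `luigi.Task`, return a `luigi.LocalTarget` from `output()`, and start the pipeline with `luigi.build([...], local_scheduler=True)`.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a Luigi task (hypothetical, not Luigi's real class)."""

    def requires(self):
        return []  # tasks that must be complete before this one runs

    def output(self):
        raise NotImplementedError  # path whose existence marks the task as done

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task is treated as done once its output file exists.
        return os.path.exists(self.output())


def build(task, log):
    """Run dependencies depth-first, skipping anything already complete."""
    for dep in task.requires():
        build(dep, log)
    if not task.complete():
        task.run()
        log.append(type(task).__name__)


WORKDIR = tempfile.mkdtemp()  # fresh scratch directory for this demo


class Extract(Task):
    def output(self):
        return os.path.join(WORKDIR, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("alice,30\nbob,25\n")


class Transform(Task):
    def requires(self):
        return [Extract()]

    def output(self):
        return os.path.join(WORKDIR, "names.txt")

    def run(self):
        # Read the upstream output and write upper-cased names.
        with open(self.requires()[0].output()) as src, open(self.output(), "w") as dst:
            for line in src:
                dst.write(line.split(",")[0].upper() + "\n")


first_run, second_run = [], []
build(Transform(), first_run)
build(Transform(), second_run)
print(first_run)   # ['Extract', 'Transform']
print(second_run)  # [] because both outputs already exist
```

Running the pipeline a second time does nothing, which is the property that makes this style safe to schedule periodically, for example from cron.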
-
Create your first ETL in Luigi
-
Promote yourself on my blog
I just realized that this blog has been up for more than five years. I was not very active in the first two years, hence this blog did not generate much traffic. In 2016 I resumed, and I have been writing ever since. Oh, by the way, this blog is not my first attempt, as I have written in the past as well. So far I have written 99% of these posts (quite obvious, right?), but I am someone who loves to share knowledge, and so far I have been successful at it. Today I am (re)opening this platform for everyone who wants to share his/her creativity. This is not…
-
Securing Your Data in Azure: Tips and Tricks
Guest Post by Gilad David Maayan Two-thirds of all businesses are currently using some form of cloud storage, with many turning to Azure as their provider. Cloud services contain a significant amount of data and allow persistent Internet access. This makes clouds an appealing and valuable target for attackers. To make sure that your data doesn’t fall victim to attacks, you should take steps to protect your cloud systems. The first step is learning how your data is vulnerable and what steps you can take to secure it. In this article, you’ll learn about security concerns specific to Azure and some best practices for addressing these vulnerabilities. Azure Security Considerations…
-
ScrapeGen – Tool for generating Python scrapers
A simple Python tool that generates a requests/bs4-based web scraper. OK, so I was kind of bored last week, so I thought of coming up with something anyway, even if it is useless. So, I allocated a few hours and came up with ScrapeGen. What is ScrapeGen? It is a simple Python-based command-line tool that generates Python web scrapers based on rules and details entered in a YAML file. When it runs, it generates a new file. Rules are turned into separate functions, which are then called from the main parsing method. Why is it needed? This kind of tool could be good for companies and individuals who write many parsers and hardcode the rules within the .py files. Imagine the rule…
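ScrapeGen's own source isn't shown in this excerpt, so the following is only a rough sketch of the idea described above, under stated assumptions: a plain dict stands in for the parsed YAML file (with PyYAML you would get one from `yaml.safe_load`), and the rule format (`selector`, optional `attr`) plus the `render_rule`/`generate_scraper` helpers are hypothetical. Each rule becomes its own `parse_<name>(soup)` function, and a main `parse()` function calls them all.

```python
# Stand-in for a parsed YAML file; the keys and structure are hypothetical.
CONFIG = {
    "url": "https://example.com/product/1",
    "rules": {
        "title": {"selector": "h1.product-title"},
        "image": {"selector": "img.main", "attr": "src"},
    },
}


def render_rule(name, rule):
    """Turn one rule into the source of a standalone parse_<name>(soup) function."""
    attr = rule.get("attr")
    # Either read an attribute or fall back to the element's text.
    value = f"node.get({attr!r})" if attr else "node.get_text(strip=True)"
    return (
        f"def parse_{name}(soup):\n"
        f"    node = soup.select_one({rule['selector']!r})\n"
        f"    return {value} if node else None\n"
    )


def generate_scraper(config):
    """Return the source code of a requests/bs4 scraper as a string."""
    names = list(config["rules"])
    parts = ["import requests", "from bs4 import BeautifulSoup", "", ""]
    for name in names:
        parts.append(render_rule(name, config["rules"][name]))
        parts.append("")
    # The main parsing method calls every generated rule function.
    fields = "".join(f"        {n!r}: parse_{n}(soup),\n" for n in names)
    parts.append(
        f"def parse(url={config['url']!r}):\n"
        f"    html = requests.get(url, timeout=10).text\n"
        f"    soup = BeautifulSoup(html, 'html.parser')\n"
        f"    return {{\n{fields}    }}\n"
    )
    return "\n".join(parts)


source = generate_scraper(CONFIG)
compile(source, "scraper.py", "exec")  # syntax check only; nothing is imported
print(source)
```

`compile()` only checks that the generated file is valid Python; actually running it would also need `requests` and `beautifulsoup4` installed.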
-
You don’t need to know everything
Last week I decided to shut off all streams of political news and information. The purpose was to take a hiatus from all kinds of information related to politics or world affairs. I wanted to confirm that knowing everything happening around me is not necessary and that I am not the center of this universe; thus, if I do not know something, things will not stop working. This is not the first time I have done this; in the past, I have taken such breaks, or rather semi-breaks. This time it was different in that I was not only blocking all info from my laptop but also…
-
5 Must-Know Web Application Security Tips
Guest Post by Gilad David Maayan Gone are the days when developers could code a web application, release it, and be done with the project. Agile methodologies are changing the way we work, the way we code, and the way we collaborate. The developers of 2019 are expected to ensure the web apps they deliver are secure. That means your work doesn’t end with quick delivery. You also need to secure the apps. Read on to learn how to do that. 1. Don’t Aim to Fix All Vulnerabilities: Prioritize Them Common website platforms like WordPress, Drupal and Joomla are plagued with thousands of vulnerabilities. If you develop your…
-
Things every developer should know to improve site performance
Introduction Hundreds of websites/web apps are launched every day for different purposes. Poorly optimized websites not only leave a bad impression on visitors but also hurt businesses. As more and more people visit websites on their mobile devices, it is very important to make sites load fast. In this post, I am going to discuss a few things that every developer should know when building or updating a website to make it load fast. I will categorize these issues into two sections: Frontend and Backend. Types of Optimization Frontend: CSS, JS, and at times HTML are refactored and optimized in one way or another. Backend: databases, indices, queries, etc.…
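On the backend side, the effect of a database index is easy to see with SQLite's `EXPLAIN QUERY PLAN`. This is a minimal sketch using Python's built-in `sqlite3` module; the `users` table, its rows, and the `idx_users_email` index are made up for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",)).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY)
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(QUERY)
print(before)  # a SCAN over the whole users table
print(after)   # a SEARCH using idx_users_email
```

Without the index SQLite has to scan every row to find one email; with it, the lookup becomes an index search, which is the kind of backend fix that keeps page response times down as tables grow.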
-
5 Ways EBS Snapshots Can Make Your Life Easier
Guest Post by Gilad David Maayan Amazon Web Services is the leading cloud provider, offering a range of storage options to meet the complex and evolving needs of an organization. Along with the standard cloud storage option, S3, Amazon also offers EBS, which provides flexible storage for the data contained in applications. With Amazon’s high availability, customizability and variety of integrated cloud services, you can pursue a storage tiering strategy for your data, taking advantage of a key Amazon feature: EBS snapshots. Read on to learn how snapshots can make your life on the cloud easier. What Is Amazon EBS? Amazon Elastic Block Store (EBS) is a cloud-based…
-
Create a simple image search engine in OpenCV and Flask
I recently started playing with OpenCV, an open-source Computer Vision library for image processing. Luckily, there are Python bindings available. Even luckier, people like Adrian have done a great service by releasing both a book and a blog on a similar topic. I have also made a demo, which you can see below. Conclusion So this was a basic image processing tutorial in OpenCV. It is not a mature product, as it can’t tell you about the unique colors in a picture. It only gives you information based on pixel colors rather than performing color segmentation and clustering with an ML algorithm. So what is it all about? Well, it…
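Since the demo itself isn't reproduced here, the following is a stdlib-only sketch of one common way a simple color-based image search works (not necessarily the method used in the post): describe each image by a normalized 3D RGB histogram and rank indexed images by chi-squared distance to the query. Images are represented as plain lists of `(r, g, b)` tuples, and the helper names and toy images are hypothetical. With OpenCV, `cv2.calcHist` and `cv2.compareHist` (for example with `cv2.HISTCMP_CHISQR`) cover roughly the same two steps. Like the limitation described above, this only looks at pixel-color statistics, with no segmentation or clustering.

```python
def color_histogram(pixels, bins=4):
    """Normalized 3D RGB histogram of a list of (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel to a bin in 0..bins-1, then flatten (r, g, b) to one index.
        rb, gb, bb = (c * bins // 256 for c in (r, g, b))
        hist[rb * bins * bins + gb * bins + bb] += 1
    total = len(pixels)
    return [h / total for h in hist]


def chi2_distance(a, b, eps=1e-10):
    """Chi-squared distance between histograms: 0 when identical, larger when different."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


def search(query_pixels, index):
    """Rank indexed images by how close their color distribution is to the query."""
    q = color_histogram(query_pixels)
    return sorted((chi2_distance(q, hist), name) for name, hist in index.items())


# Hypothetical toy "images": mostly red and mostly blue pixels.
red = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
blue = [(10, 10, 250)] * 90 + [(250, 10, 10)] * 10
index = {"red.png": color_histogram(red), "blue.png": color_histogram(blue)}

mostly_red = [(240, 20, 20)] * 80 + [(20, 20, 240)] * 20
results = search(mostly_red, index)
print([name for _, name in results])  # ['red.png', 'blue.png']
```

A smaller distance means a closer color distribution, so the mostly red query matches `red.png` first even though none of its pixels are identical to the indexed ones.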
-
Getting Started with AWS Automation: EBS Snapshots, AWS Lambda and AWS Systems Manager
Image source: Pixabay Guest Post by Gilad David Maayan Whether you’ve recently adopted cloud-based services or you’ve been using them for a while, you’re likely interested in reducing the amount of manual work that managing a cloud system requires. Luckily, AWS includes a variety of automation tools and integration options that you can use to accomplish just that. We’ll look at three of your options here, one simple to implement and two a bit more challenging, to give you a better idea of how they work and how you can use them to maximum benefit. EBS Snapshots Automating EBS snapshots is one of the easiest places…
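Much of EBS snapshot automation comes down to deciding which old snapshots to clean up. This is a minimal sketch of that selection step only, assuming a simple retention policy (delete snapshots older than `keep_days`, but always keep the newest `keep_min`); the policy, the `snapshots_to_delete` function, and the sample IDs are hypothetical. The records use the `SnapshotId` and `StartTime` keys that boto3's EC2 `describe_snapshots()` returns, so a Lambda function could feed it `ec2.describe_snapshots(OwnerIds=['self'])['Snapshots']` and pass each returned ID to `ec2.delete_snapshot(SnapshotId=...)`.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, keep_days=7, keep_min=3, now=None):
    """Return IDs of snapshots older than keep_days, always sparing the newest keep_min."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    # The newest keep_min snapshots are never candidates, however old they are.
    return [
        s["SnapshotId"]
        for s in newest_first[keep_min:]
        if s["StartTime"] < cutoff
    ]


# Hypothetical snapshots taken every 3 days, newest first.
now = datetime(2019, 6, 30, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": f"snap-{i:02d}", "StartTime": now - timedelta(days=i * 3)}
    for i in range(6)
]
print(snapshots_to_delete(snaps, now=now))  # ['snap-03', 'snap-04', 'snap-05']
```

Keeping the selection logic free of AWS calls makes it easy to test before letting a scheduled job delete anything.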