  • Develop AirBnb Parser in Python

    Planning to write a book about Web Scraping in Python. So I am starting a new scraping series, called ScrapeTheFamous, in which I will be parsing some famous websites and discussing my development process. The posts will use Scraper API for parsing purposes, which frees me from all worries about blocking and rendering dynamic sites, since Scraper API takes care of everything. Anyway, the first post is about Airbnb. We will be scraping some important data points from it: we will scrape a list of rental URLs, then fetch and store the data in JSON format. So let’s start! The URL we…
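    The excerpt mentions fetching rental pages through Scraper API and storing the results as JSON. As a hedged sketch, the request URL could be assembled like this; the endpoint and parameters follow Scraper API's public docs, while the key, listing URL, and `record` fields are placeholders, not the post's actual code:

    ```python
    import json
    from urllib.parse import urlencode

    def build_scraperapi_url(api_key: str, target_url: str, render: bool = False) -> str:
        """Wrap a target page in a Scraper API request URL."""
        params = {"api_key": api_key, "url": target_url}
        if render:
            params["render"] = "true"  # ask Scraper API to render JavaScript first
        return "http://api.scraperapi.com/?" + urlencode(params)

    # Placeholder values -- substitute a real key and rental listing URL.
    url = build_scraperapi_url("YOUR_API_KEY", "https://www.airbnb.com/rooms/12345")

    # After fetching and parsing, the scraped data points could be stored as JSON;
    # the field names here are illustrative.
    record = {"url": "https://www.airbnb.com/rooms/12345", "title": None, "price": None}
    print(json.dumps(record))
    ```

    Sending a plain `GET` to the built URL then returns the rendered page body, so the parsing code never talks to the target site directly.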

  • 6 Tools for Optimizing your Media with Java

    Modern web design focuses on displaying media in clear and aesthetic ways. It also places an emphasis on accessibility, across devices and for a variety of users. This is partially due to changing tastes and partially due to advances in technology; advances in media optimization play a role in particular. The media you use, and the way it affects your users’ experience, has a huge impact on the effectiveness of your sites. It also affects your page loading speed, which impacts your search rankings and your bottom line. Java is one of the most widely used languages in web and mobile applications. Finding suitable Java tools is key…

  • Advanced Proxy Use for Web Scraping

    Guest Post by Vytautas Kirjazovas from Oxylabs.io In the eyes of many, web scraping is an art. It is safe to say that the majority of web scraping enthusiasts have faced bans from websites more than once in their careers. Web scraping is a challenging task, and it is more common than you think to see your crawlers getting banned by websites. In this article, we’ll talk about more advanced ways to use proxies for web scraping. There are some key components that you should take into account to avoid getting banned too quickly: set browser-like headers, including a User-Agent that can be found in real life and a Referer header. Other…
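    The header checklist above can be sketched as a plain dict handed to whatever HTTP client you use. The exact User-Agent and Referer values below are illustrative examples of "found in real life" values, not prescriptions from the article:

    ```python
    # Headers a real browser would send, assembled for an HTTP client.
    # The User-Agent string is one example of a real-world value.
    BROWSER_HEADERS = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Referer": "https://www.google.com/",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }

    def looks_browser_like(headers: dict) -> bool:
        """Sanity check that the header set covers the basics listed above."""
        return {"User-Agent", "Referer"} <= set(headers)
    ```

    With `requests`, this dict would be passed as `requests.get(url, headers=BROWSER_HEADERS)` so every request carries the same browser-like fingerprint.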

  • Create your first sales dashboard in Apache Superset
    Learn how to use Apache Superset to create an interactive e-commerce sales dashboard

    So far you have learned how to acquire data, process it, and visualize it. Today I am going to talk about dashboards: what they are and how you can build your own personal or enterprise dashboard for sales data using Apache Superset. What is a Dashboard? According to Wikipedia: A dashboard is a type of graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process. In other usage, “dashboard” is another name for “progress report” or “report.” Basically, a dashboard is an information management tool that can show information in both text and visual format, in the form of charts, tables…

  • Agile Project Planning: A Step-by-Step Guide

    Guest Post by Gilad David Maayan Software teams have been adopting agile project management methodologies for nearly a decade. As a result, development teams have increased their velocity and communication, and they are quicker to react to market trends. However, to be truly effective, agile methodologies rely on solid planning. This article offers a guideline for planning your agile project. What Is Agile Project Planning? Agile project planning is a project management approach that breaks big projects into independent units called sprints. Each sprint is a short work cycle that runs from one week to one month. Teams use sprints as small, manageable units of work, which they strive to…

  • Create your first ETL in Luigi
    An introductory tutorial covering the basics of Luigi and an example ETL application.

    This post is part of the Data Engineering series. In previous posts, I discussed writing ETLs in Bonobo, Spark, and Airflow. In this post, I am introducing another ETL tool, developed by Spotify, called Luigi. Earlier I had discussed here, here and here about writing basic ETL pipelines. Bonobo is cool for writing ETL pipelines, but the world is not all about writing ETL pipelines to automate things. There are other use cases in which you have to perform tasks in a certain order, once or periodically. For instance: monitoring cron jobs, transferring data from one place to another, automating your DevOps operations, periodically fetching data from websites and…
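    Luigi models a pipeline as tasks that declare their upstream dependencies and get run in dependency order. As a stdlib-only toy sketch of that idea (not the real `luigi` API, which also handles outputs, targets, and scheduling), the ordering behind an Extract-Transform-Load chain might look like:

    ```python
    # Toy model of Luigi's task idea: each task names what it requires,
    # and a runner executes dependencies before the task itself.
    class Task:
        def requires(self):
            return []          # upstream tasks that must run first
        def run(self, log):
            log.append(type(self).__name__)

    class Extract(Task):
        pass

    class Transform(Task):
        def requires(self):
            return [Extract()]

    class Load(Task):
        def requires(self):
            return [Transform()]

    def build(task, log):
        """Run a task's dependencies depth-first, then the task itself."""
        for dep in task.requires():
            build(dep, log)
        task.run(log)
    ```

    Calling `build(Load(), log)` runs Extract, then Transform, then Load; real Luigi adds the crucial extra that completed tasks (those whose output already exists) are skipped on re-runs.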

  • Promote yourself on my blog

    I just realized that this blog has been up for more than five years. I was not so active in the first two years, so this blog did not generate much traffic. In 2016 I resumed, and I have been writing ever since. Oh, by the way, this blog is not my first attempt, as I have written in the past as well. So far I have written 99% of these posts (quite obvious, right?), but I am a person who loves to share the knowledge I have, and so far I have been successful. Today I am (re)opening this platform for everyone who wants to share his/her creativity. This is not…

  • Securing Your Data in Azure: Tips and Tricks

    Guest Post by Gilad David Maayan Two-thirds of all businesses are currently using some form of cloud storage, with many turning to Azure as their provider. Cloud services contain a significant amount of data and allow persistent Internet access. This makes clouds an appealing and valuable target for attackers.  To make sure that your data doesn’t fall victim to attacks, you should take steps to protect your cloud systems. The first step is learning how your data is vulnerable and what steps you can take to secure it. In this article, you’ll learn about security concerns specific to Azure and some best practices for addressing these vulnerabilities. Azure Security Considerations…

  • ScrapeGen – Tool for generating Python scrapers
    A simple Python tool that generates a requests/bs4-based web scraper

    OK, so I was kind of bored last week, so I thought of coming up with something, even if it is useless. So, I allocated a few hours and came up with ScrapeGen. What is ScrapeGen? It is a simple Python-based command-line tool that generates Python web scrapers based on rules and details entered in a YAML file. When it runs, it generates a new file. Rules are turned into separate functions, which are then called from the main parsing method. View the Demo: Why is it needed? Such a tool could be useful for companies and individuals who write many parsers and hardcode the rules within the .py files. Imagine the rule…
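    The rules-to-functions idea can be sketched as a tiny generator: take a rules mapping (what the YAML file would hold) and emit the source of one parsing function per rule. The field names and CSS selectors below are invented for illustration and are not ScrapeGen's actual format:

    ```python
    # Hypothetical rules mapping: field name -> CSS selector (illustrative only).
    RULES = {"title": "h1.listing-title", "price": "span.price"}

    def generate_parser_source(rules: dict) -> str:
        """Emit the source of a requests/bs4-style scraper: one function per rule."""
        lines = ["from bs4 import BeautifulSoup", ""]
        for field, selector in rules.items():
            lines += [
                f"def parse_{field}(soup):",
                f"    el = soup.select_one({selector!r})",
                "    return el.get_text(strip=True) if el else None",
                "",
            ]
        return "\n".join(lines)
    ```

    Writing `generate_parser_source(RULES)` to a `.py` file gives a scraper whose rules live in one editable mapping instead of being hardcoded across the parsing code.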

  • You don’t need to know everything

    Last week I decided to shut off all streams of political news and information. The purpose was to take a hiatus from all kinds of information related to politics or world affairs. I wanted to confirm that knowing everything happening around me is not necessary, and that I am not the center of this universe; the world will not stop working if I do not know something. This is not the first time I have done this; in the past, I have taken such breaks, or rather, semi-breaks. This time it was different in a way that I was not only blocking all info from my laptop but also…