In this post I am going to discuss how you can write ETL jobs in Python by using Bonobo library. Before I get into the library itself, allow me to discuss about ETL itself and why is it needed? What is ETL? ETL is actually short form of Extract, Transform and Load, a process in which data is acquired, changed/processes and then finally get loaded into data warehouse/database(s). You can extract data from data sources like Files, Website or some Database, transform the acquired data and then load the final version into database for business usage. You may ask, Why ETL?, well, what ETL does, many of you might already been doing…
I recently realized that I am in love with Data Scraping. Thanks to Python and Beautifulsoup to get me into this. In past few months I have scraped data from sites like Amazon, Rakuten and NewEgg. I have started a subreddit for developers who love to scrap sites for fun( or for work). I named it Scraping the Web. You can join here and submit your entries. Happy Scraping!