Combining Pig and Python to explore raw datasets via Pydoop

This post takes the raw data processing capabilities of Hadoop and integrates them into an IPython notebook, to allow some exploratory data analysis on my raw electricity logs.

What we do here is take raw data directly from my utility sensor log files, process it on the Hadoop cluster using Pig, and perform some basic exploratory data analysis using Python and the matplotlib library.
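To give a rough idea of the Python half of that pipeline, here is a minimal sketch rather than the notebook's actual code. It assumes the Pig job stored its result with the default PigStorage tab delimiter as two columns, a timestamp and a reading; that layout, the timestamp format, the function names and the sample rows are all made up for illustration. On the cluster the lines would come from a file opened on HDFS (Pydoop's `pydoop.hdfs.open` is one way to do that), but the sketch takes any iterable of lines so it runs without Hadoop.

```python
from collections import defaultdict
from datetime import datetime


def parse_pig_output(lines):
    """Parse tab-separated (timestamp, reading) rows into typed tuples.

    Assumed layout (hypothetical): 'YYYY-MM-DD HH:MM:SS<TAB>reading'.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        ts_text, value_text = line.split("\t")
        timestamp = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
        records.append((timestamp, float(value_text)))
    return records


def mean_by_hour(records):
    """Average the readings for each hour of the day."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum, count]
    for timestamp, value in records:
        bucket = totals[timestamp.hour]
        bucket[0] += value
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(totals.items())}


def plot_hourly(hourly):
    """Bar chart of the hourly means; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    hours = list(hourly)
    plt.bar(hours, [hourly[h] for h in hours])
    plt.xlabel("Hour of day")
    plt.ylabel("Mean reading")
    plt.show()


# Made-up sample rows standing in for a Pig output part file.
sample = [
    "2013-01-01 00:15:00\t0.5",
    "2013-01-01 00:45:00\t1.5",
    "2013-01-01 13:00:00\t2.0",
]
hourly = mean_by_hour(parse_pig_output(sample))
```

Calling `plot_hourly(hourly)` in a notebook cell would draw the chart inline. Keeping the parsing and aggregation separate from the plotting means they can be checked on a handful of rows before pointing them at the real output.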

I’ve been a little lazy and just embedded the whole notebook below.
