MichiPUG: using Python to run reports in Hadoop clusters

Zattoo’s Marshall Weir will be talking at this week’s MichiPUG (Thursday evening at 7PM at SRT Solutions in downtown Ann Arbor). In his own words:

I’ve been working on a Python module for running reports in Hadoop. It’s sort of a wrapper around the Pig data processing language, plus some smarts for running reports on a Hadoop cluster and pushing and pulling data to and from it. It’s designed primarily to make it easier and more efficient to run complex sets of interdependent reports – I’ve been using it to do business reporting on our customer behavior at Zattoo.
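For readers who want a feel for what "wrapping Pig" and "interdependent reports" might look like, here is a minimal sketch. It is not Marshall's module: the function names, report names, and the `date` parameter are all made up for illustration. It only assumes the standard `pig` command-line flags `-x local`, `-param`, and `-f`, and uses Python's standard-library `graphlib` (Python 3.9+) to order reports so that each one runs after the reports it depends on.

```python
from graphlib import TopologicalSorter


def build_pig_command(script, params=None, local=False):
    """Build the argv list for one Pig run (it does not execute anything)."""
    cmd = ["pig"]
    if local:
        # Run in Pig's local mode instead of against the cluster.
        cmd += ["-x", "local"]
    for key, value in sorted((params or {}).items()):
        # Each -param substitutes $key inside the Pig script.
        cmd += ["-param", f"{key}={value}"]
    cmd += ["-f", script]
    return cmd


def run_order(dependencies):
    """Return report names so each report follows the reports it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical reports: each key depends on the reports in its value set.
REPORTS = {
    "sessions": set(),
    "daily_viewers": {"sessions"},
    "churn": {"sessions", "daily_viewers"},
}

if __name__ == "__main__":
    for name in run_order(REPORTS):
        # Hypothetical script naming and date value, for illustration only.
        print(" ".join(build_pig_command(f"{name}.pig", {"date": "2009-01-01"})))
```

To actually launch the jobs you could hand each command list to `subprocess.run(cmd, check=True)`. A real tool would presumably also handle moving data on and off the cluster, which this sketch leaves out.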

This should be very interesting for folks like me who have never seen Hadoop in action!

2 thoughts on “MichiPUG: using Python to run reports in Hadoop clusters”

  1. At the meeting now. Marshall mentioned another module called “dumbo”. Both dumbo and happy let you write map/reduce jobs in Python, but Marshall’s module wraps Pig execution, which makes it much higher level for reporting than writing map/reduce directly.
