Yahoo! Distribution of Hadoop Released on GitHub

Posted by Alin Irimie on June 11, 2009

The Yahoo! Distribution of Hadoop is tested and deployed on Yahoo!’s clusters, which are the largest Hadoop clusters in the world. The Yahoo! Distribution of Hadoop is a source distribution that is based entirely on code found in the Apache Hadoop project.

Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
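For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any of its APIs. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the sample input is invented. It only shows the map, group-by-key, and reduce steps that a framework like Hadoop would spread across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in one process (illustrative only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Invented sample records standing in for input splits.
    print(word_count(["big data big clusters", "data nodes"]))
```

In a real cluster the map and reduce calls would run on different machines and the grouping would happen over the network; this sketch keeps everything in memory to show the data flow.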

A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop users wiki page.

Amazon announced in April the beta release of a new service called Amazon Elastic MapReduce, which it describes as “a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).”

Weekly Cloud Application: Panda Stream

Posted by Alin Irimie on December 29, 2008

Panda is an open source solution for video uploading, encoding and streaming that runs entirely within Amazon Web Services, using customized EC2 instances, S3 and SimpleDB. It supports the encoding profiles that FFmpeg supports, including FLV for Flash and H.264 for the iPhone.

The service is easy to integrate with your application. The EC2 instance will provide a simple REST API (supporting both YAML and XML formats) for listing, creating, editing and deleting videos. When a new video is created on your site, the actual file upload takes place in a popup or iframe. Doing so means that the large video file is uploaded directly to your Panda EC2 instance, so you don’t have to handle it within your application. The server is also configured to support an upload progress bar so users can see the video upload in progress. It cannot get any easier than this.
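To make the REST description a little more concrete, here is a rough Python sketch of how a client might map the four operations (list, create, edit, delete) to HTTP methods and URLs. Everything specific in it is an assumption for illustration: the host name, the `videos` path, the use of a `.yaml`/`.xml` suffix to pick the format, and the helper name `video_request` are hypothetical and are not taken from Panda's actual API. Check the Panda documentation for the real routes.

```python
# Hypothetical mapping of operations to HTTP method and path template.
# These routes are assumptions, not Panda's documented API.
ACTIONS = {
    "list": ("GET", "videos"),
    "create": ("POST", "videos"),
    "edit": ("PUT", "videos/{id}"),
    "delete": ("DELETE", "videos/{id}"),
}


def video_request(base_url, action, video_id=None, fmt="yaml"):
    """Return the (method, url) pair for one operation on the video resource."""
    if fmt not in ("yaml", "xml"):
        raise ValueError("format must be 'yaml' or 'xml'")
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    method, template = ACTIONS[action]
    if "{id}" in template:
        if video_id is None:
            raise ValueError("%s requires a video_id" % action)
        template = template.replace("{id}", str(video_id))
    # Hypothetical convention: the response format is chosen by file suffix.
    url = "%s/%s.%s" % (base_url.rstrip("/"), template, fmt)
    return method, url


if __name__ == "__main__":
    host = "http://panda.example.com"  # placeholder host
    print(video_request(host, "list"))
    print(video_request(host, "delete", video_id="abc", fmt="xml"))
```

The sketch only builds requests and never sends them, so it can be adapted to whatever HTTP client your application already uses.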

The range of encoding support does not depend on Panda itself, but on the encoding capability of FFmpeg and libavcodec (the open source encoder/decoder tools and libraries Panda uses underneath). The Wikipedia page has a list of implemented video codecs. For Panda AMI setup, see this and this Google Groups thread.

Python wrapper for Windows Azure Storage

Posted by Alin Irimie on November 14, 2008

Sriram Krishnan, Azure Program Manager at Microsoft, published a Python wrapper on top of the Azure Storage APIs. The code is hosted on GitHub here.

Slowly but surely, Microsoft is releasing a lot of free code to the open-source community in order to promote its new services. Not too long ago, the SQL Services team released a Ruby library for accessing the SQL Services. That code is also hosted on GitHub.