Posted by Alin Irimie
on June 11, 2009
The Yahoo! Distribution of Hadoop is tested and deployed on Yahoo!’s clusters, which are the largest Hadoop clusters in the world. The Yahoo! Distribution of Hadoop is a source distribution that is based entirely on code found in the Apache Hadoop project.
Hadoop is a free Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.
A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop users wiki page.
In April, Amazon announced the beta release of a new service called Amazon Elastic MapReduce, which they describe as “a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).” Continue reading…
Posted by Alin Irimie
on June 10, 2009
I wrote previously about the potential of SQL Data Services, the SQL-in-the-cloud offering from Microsoft. Today, Shankar Pal, the Principal Program Manager for the SQL Data Services (SDS) team, mentioned on the SQL Data Services blog that Microsoft Exchange Hosted Archive (EHA) is built in the cloud, using SDS. He writes about how well the SDS relational database service platform scales and provides some simple design principles for achieving maximum scalability.
Make no mistake about it, this is a big deal. Continue reading…
Posted by Alin Irimie
on April 02, 2009
Amazon announced today the public beta of Amazon Elastic MapReduce, a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. Amazon Elastic MapReduce lets you focus on crunching or analyzing your data without having to worry about the time-consuming setup, management, or tuning of Hadoop clusters or the compute capacity they run on. Continue reading…