Thursday, May 10, 2012

Hadoop : The Definitive Guide


Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of MapReduce, the algorithm on which Google built its empire. This comprehensive resource shows how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

- Use the Hadoop Distributed File System (HDFS) to store large datasets, and run distributed computations over that data with MapReduce
- Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
- Take advantage of HBase, Hadoop's database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have a lot of data, whether gigabytes or petabytes, Hadoop is the perfect solution.
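The map, shuffle, and reduce phases mentioned above can be sketched with a tiny in-memory word-count simulation. This is a hedged illustration of the MapReduce pattern in plain Python, not the actual Hadoop API; the function names here are invented for the example:

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a (word, 1) pair for every word in each input line.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key,
    # as the Hadoop framework does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop stores data in HDFS",
         "MapReduce processes data in parallel"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["data"])  # "data" appears once in each input line -> 2
```

In a real Hadoop job, the mapper and reducer run as distributed tasks over HDFS blocks and the shuffle happens over the network; the data flow, however, is the same as in this sketch.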

Download Link
