Apache Hadoop, a framework ideal for processing large datasets, is an open source implementation of MapReduce, the algorithm on which Google built its empire. This comprehensive resource shows you how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
- Use the Hadoop Distributed File System (HDFS) to store large datasets, and run distributed computations over those datasets with MapReduce (a minimal sketch of such a job appears below)
- Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level language for large-scale data processing
- Take advantage of HBase, Hadoop's database for structured and semi-structured data
- Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have a lot of data - whether it's gigabytes or petabytes - Hadoop is the perfect solution.
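To give a flavor of the MapReduce model the book teaches, here is a minimal word-count sketch against the Hadoop Java MapReduce API - the canonical introductory example, not material from the book itself. It assumes the input and output HDFS paths arrive as command-line arguments:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner cuts shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a JAR, a job like this would typically be submitted with something along the lines of `hadoop jar wordcount.jar WordCount /input /output` (paths here are placeholders).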