Optimizing Hadoop for MapReduce.

Author: Khaled Tannir
Publisher: Packt Publishing, 2014.
Edition/Format: eBook : Document : English

Details

Genre/Form: Electronic books
Additional Physical Format: Print version:
Material Type: Document, Internet resource
Document Type: Internet Resource, Computer File
All Authors / Contributors: Khaled Tannir
ISBN: 130646367X 9781306463676 9781783285662 1783285664 1783285656 9781783285655
OCLC Number: 871189870
Language Note: English.
Description: 1 online resource
Contents: Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Understanding Hadoop MapReduce; The MapReduce model; Overview of Hadoop MapReduce; Hadoop MapReduce internals; Factors affecting the performance of MapReduce; Summary; Chapter 2: An Overview of the Hadoop Parameters; Investigating the Hadoop parameters; The mapred-site.xml configuration file; The CPU-related parameters; The disk I/O related parameters; The memory-related parameters; The network-related parameters; The hdfs-site.xml configuration file; The core-site.xml configuration file; Hadoop MapReduce metrics; Performance monitoring tools; Using Chukwa to monitor Hadoop; Using Ganglia to monitor Hadoop; Using Nagios to monitor Hadoop; Using Apache Ambari to monitor Hadoop; Summary; Chapter 3: Detecting System Bottlenecks; Performance tuning; Creating a performance baseline; Identifying resource bottlenecks; Identifying RAM bottlenecks; Identifying CPU bottlenecks; Identifying storage bottlenecks; Identifying network bandwidth bottlenecks; Summary; Chapter 4: Identifying Resource Weaknesses; Identifying cluster weakness; Checking the Hadoop cluster node's health; Checking the input data size; Checking massive I/O and network traffic; Checking for insufficient concurrent tasks; Checking for CPU contention; Sizing your Hadoop cluster; Configuring your cluster correctly; Summary; Chapter 5: Enhancement of Map and Reduce Tasks; Enhancing Map tasks; Input data and block size impact; Dealing with small and unsplittable files; Reducing spilled records during the Map phase; Calculating map tasks' throughput; Enhancing Reduce tasks; Calculating reduce task throughput; Improving Reduce execution phase; Tuning map and reduce parameters; Summary; Chapter 6: Optimizing MapReduce Tasks; Using Combiners; Using compression; Using appropriate Writable types; Reusing types smartly; Optimizing mappers and reducers code; Summary; Chapter 7: Best Practices and Recommendations; Hardware tuning and OS recommendations; Hadoop cluster checklists; The Bios tuning checklist; OS configuration recommendations; Hadoop best practices and recommendations; Deploying Hadoop; Hadoop tuning recommendations; Using a MapReduce template class code; Summary; Index.

Abstract:

In Detail: MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by working in parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation. This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, it details the Hadoop MapReduce job performance optimization process through a number of clear and practical steps, and will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally. Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression. The book ends with best practices and recommendations on how to use your Hadoop cluster optimally.

Approach: This book is an example-based tutorial that deals with optimizing Hadoop for MapReduce job performance.

Who this book is for: If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Prior knowledge of creating MapReduce applications is not necessary, but will help you better understand the concepts and the snippets of MapReduce class template code.


Linked Data


Primary Entity

<http://www.worldcat.org/oclc/871189870> # Optimizing Hadoop for MapReduce.
    a schema:Book, schema:CreativeWork, schema:MediaObject ;
    library:oclcnum "871189870" ;
    schema:about <http://experiment.worldcat.org/entity/work/data/1821509877#CreativeWork/apache_hadoop> ; # Apache Hadoop.
    schema:about <http://experiment.worldcat.org/entity/work/data/1821509877#Topic/cluster_analysis_data_processing> ; # Cluster analysis--Data processing
    schema:about <http://dewey.info/class/005.74/> ;
    schema:about <http://experiment.worldcat.org/entity/work/data/1821509877#CreativeWork/mapreduce_computer_file> ; # MapReduce (Computer file)
    schema:about <http://experiment.worldcat.org/entity/work/data/1821509877#Topic/open_source_software> ; # Open source software
    schema:about <http://experiment.worldcat.org/entity/work/data/1821509877#Topic/electronic_data_processing_distributed_processing> ; # Electronic data processing--Distributed processing
    schema:bookFormat schema:EBook ;
    schema:creator <http://experiment.worldcat.org/entity/work/data/1821509877#Person/tannir_khaled> ; # Khaled Tannir
    schema:datePublished "2014" ;
    schema:description "In Detail MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by working parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, and statistical machine translation. This book introduces you to advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, this book will help you to fully utilize your cluster's node resources to run MapReduce jobs optimally. This book details the Hadoop MapReduce job performance optimization process. Through a number of clear and practical steps, it will help you to fully utilize your cluster's node resources. Starting with how MapReduce works and the factors that affect MapReduce performance, you will be given an overview of Hadoop metrics and several performance monitoring tools. Further on, you will explore performance counters that help you identify resource bottlenecks, check cluster health, and size your Hadoop cluster. You will also learn about optimizing map and reduce tasks by using Combiners and compression. The book ends with best practices and recommendations on how to use your Hadoop cluster optimally. Approach This book is an example-based tutorial that deals with Optimizing Hadoop for MapReduce job performance. Who this book is for If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Having prior knowledge of creating MapReduce applications is not necessary, but will help you better understand the concepts and snippets of MapReduce class template code."@en ;
    schema:description "Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Understanding Hadoop MapReduce; The MapReduce model; Overview of Hadoop MapReduce; Hadoop MapReduce internals; Factors affecting the performance of MapReduce; Summary; Chapter 2: An Overview of the Hadoop Parameters; Investigating the Hadoop parameters; The mapred-site.xml configuration file; The CPU-related parameters; The disk I/O related parameters; The memory-related parameters; The network-related parameters; The hdfs-site.xml configuration file."@en ;
    schema:exampleOfWork <http://worldcat.org/entity/work/id/1821509877> ;
    schema:genre "Electronic books"@en ;
    schema:inLanguage "en" ;
    schema:isSimilarTo <http://worldcat.org/entity/work/data/1821509877#CreativeWork/> ;
    schema:name "Optimizing Hadoop for MapReduce."@en ;
    schema:productID "871189870" ;
    schema:publication <http://www.worldcat.org/title/-/oclc/871189870#PublicationEvent/packt_publishing_2014> ;
    schema:publisher <http://experiment.worldcat.org/entity/work/data/1821509877#Agent/packt_publishing> ; # Packt Publishing
    schema:url <http://www.myilibrary.com?id=577618> ;
    schema:url <http://cdn.totalboox.com/static/covers/PT/399c15d551a5a60e-b.jpg> ;
    schema:url <http://public.eblib.com/choice/publicfullrecord.aspx?p=1644025> ;
    schema:url <http://www.totalboox.com/book/id-4151216962470782478> ;
    schema:url <http://public.ebookcentral.proquest.com/choice/publicfullrecord.aspx?p=1644025> ;
    schema:url <http://ebookcentral.proquest.com/lib/columbia/detail.action?docID=1644025> ;
    schema:workExample <http://worldcat.org/isbn/9781306463676> ;
    schema:workExample <http://worldcat.org/isbn/9781783285662> ;
    schema:workExample <http://worldcat.org/isbn/9781783285655> ;
    wdrs:describedby <http://www.worldcat.org/title/-/oclc/871189870> ;
    .


Related Entities

<http://experiment.worldcat.org/entity/work/data/1821509877#Agent/packt_publishing> # Packt Publishing
    a bgn:Agent ;
    schema:name "Packt Publishing" ;
    .

<http://experiment.worldcat.org/entity/work/data/1821509877#CreativeWork/mapreduce_computer_file> # MapReduce (Computer file)
    a schema:CreativeWork ;
    schema:name "MapReduce (Computer file)" ;
    .

<http://experiment.worldcat.org/entity/work/data/1821509877#Person/tannir_khaled> # Khaled Tannir
    a schema:Person ;
    schema:familyName "Tannir" ;
    schema:givenName "Khaled" ;
    schema:name "Khaled Tannir" ;
    .

<http://experiment.worldcat.org/entity/work/data/1821509877#Topic/cluster_analysis_data_processing> # Cluster analysis--Data processing
    a schema:Intangible ;
    schema:name "Cluster analysis--Data processing"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/1821509877#Topic/electronic_data_processing_distributed_processing> # Electronic data processing--Distributed processing
    a schema:Intangible ;
    schema:name "Electronic data processing--Distributed processing"@en ;
    .

<http://experiment.worldcat.org/entity/work/data/1821509877#Topic/open_source_software> # Open source software
    a schema:Intangible ;
    schema:name "Open source software"@en ;
    .

<http://worldcat.org/isbn/9781306463676>
    a schema:ProductModel ;
    schema:isbn "130646367X" ;
    schema:isbn "9781306463676" ;
    .

<http://worldcat.org/isbn/9781783285655>
    a schema:ProductModel ;
    schema:isbn "1783285656" ;
    schema:isbn "9781783285655" ;
    .

<http://worldcat.org/isbn/9781783285662>
    a schema:ProductModel ;
    schema:isbn "1783285664" ;
    schema:isbn "9781783285662" ;
    .

