Title:
Hadoop MapReduce Cookbook.
Author:
Perera, Srinath.
ISBN:
9781849517294
Personal Author:
Perera, Srinath.
Physical Description:
1 online resource (369 pages)
Contents:
Hadoop MapReduce Cookbook -- Table of Contents -- Hadoop MapReduce Cookbook -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions -- 1. Getting Hadoop Up and Running in a Cluster -- Introduction -- Setting up Hadoop on your machine -- Getting ready -- How to do it... -- How it works... -- Writing a WordCount MapReduce sample, bundling it, and running it using standalone Hadoop -- Getting ready -- How to do it... -- How it works... -- There's more... -- Adding the combiner step to the WordCount MapReduce program -- How to do it... -- How it works... -- There's more... -- Setting up HDFS -- Getting ready -- How to do it... -- How it works... -- Using HDFS monitoring UI -- Getting ready -- How to do it... -- HDFS basic command-line file operations -- Getting ready -- How to do it... -- How it works... -- There's more... -- Setting Hadoop in a distributed cluster environment -- Getting ready -- How to do it... -- How it works... -- There's more... -- Running the WordCount program in a distributed cluster environment -- Getting ready -- How to do it... -- How it works... -- There's more... -- Using MapReduce monitoring UI -- How to do it... -- How it works... -- 2. Advanced HDFS -- Introduction -- Benchmarking HDFS -- Getting ready -- How to do it... -- How it works... -- There's more... -- See also -- Adding a new DataNode -- Getting ready -- How to do it... -- There's more... -- Rebalancing HDFS -- See also -- Decommissioning DataNodes -- How to do it... -- How it works... -- See also.

Using multiple disks/volumes and limiting HDFS disk usage -- How to do it... -- Setting HDFS block size -- How to do it... -- There's more... -- See also -- Setting the file replication factor -- How to do it... -- How it works... -- There's more... -- See also -- Using HDFS Java API -- Getting ready -- How to do it... -- How it works... -- There's more... -- Configuring the FileSystem object -- Retrieving the list of data blocks of a file -- See also -- Using HDFS C API (libhdfs) -- Getting ready -- How to do it... -- How it works... -- There's more... -- Configuring using HDFS configuration files -- See also -- Mounting HDFS (Fuse-DFS) -- Getting ready -- How to do it... -- How it works... -- There's more... -- Building libhdfs -- See also -- Merging files in HDFS -- How to do it... -- How it works... -- 3. Advanced Hadoop MapReduce Administration -- Introduction -- Tuning Hadoop configurations for cluster deployments -- Getting ready -- How to do it... -- How it works... -- There's more... -- Running benchmarks to verify the Hadoop installation -- Getting ready -- How to do it... -- How it works... -- There's more... -- Reusing Java VMs to improve the performance -- How to do it... -- How it works... -- Fault tolerance and speculative execution -- How to do it... -- How it works... -- Debug scripts - analyzing task failures -- Getting ready -- How to do it... -- How it works... -- Setting failure percentages and skipping bad records -- Getting ready -- How to do it... -- How it works... -- There's more... -- Shared-user Hadoop clusters - using fair and other schedulers -- Getting ready -- How to do it... -- How it works... -- There's more... -- Hadoop security - integrating with Kerberos -- Getting ready -- How to do it... -- How it works... -- Using the Hadoop Tool interface -- How to do it... -- How it works...

4. Developing Complex Hadoop MapReduce Applications -- Introduction -- Choosing appropriate Hadoop data types -- How to do it... -- There's more... -- See also -- Implementing a custom Hadoop Writable data type -- How to do it... -- How it works... -- There's more... -- See also -- Implementing a custom Hadoop key type -- How to do it... -- How it works... -- See also -- Emitting data of different value types from a mapper -- How to do it... -- How it works... -- There's more... -- See also -- Choosing a suitable Hadoop InputFormat for your input data format -- How to do it... -- How it works... -- There's more... -- Using multiple input data types and multiple mapper implementations in a single MapReduce application -- See also -- Adding support for new input data formats - implementing a custom InputFormat -- How to do it... -- How it works... -- There's more... -- See also -- Formatting the results of MapReduce computations - using Hadoop OutputFormats -- How to do it... -- How it works... -- There's more... -- Hadoop intermediate (map to reduce) data partitioning -- How to do it... -- How it works... -- There's more... -- TotalOrderPartitioner -- KeyFieldBasedPartitioner -- Broadcasting and distributing shared resources to tasks in a MapReduce job - Hadoop DistributedCache -- How to do it... -- How it works... -- There's more... -- Distributing archives using the DistributedCache -- Adding resources to the DistributedCache from the command line -- Adding resources to the classpath using DistributedCache -- See also -- Using Hadoop with legacy applications - Hadoop Streaming -- How to do it... -- How it works... -- There's more... -- See also -- Adding dependencies between MapReduce jobs -- How to do it... -- How it works... -- There's more... -- Hadoop counters for reporting custom metrics -- How to do it... -- How it works...

5. Hadoop Ecosystem -- Introduction -- Installing HBase -- How to do it... -- How it works... -- There's more... -- Data random access using Java client APIs -- Getting ready -- How to do it... -- How it works... -- Running MapReduce jobs on HBase (table input/output) -- Getting ready -- How to do it... -- How it works... -- Installing Pig -- How to do it... -- How it works... -- There's more... -- Running your first Pig command -- How to do it... -- How it works... -- Set operations (join, union) and sorting with Pig -- Getting ready -- How to do it... -- How it works... -- There's more... -- Installing Hive -- Getting ready -- How to do it... -- How it works... -- Running a SQL-style query with Hive -- Getting ready -- How to do it... -- How it works... -- Performing a join with Hive -- Getting ready -- How to do it... -- How it works... -- There's more... -- Installing Mahout -- How to do it... -- How it works... -- Running K-means with Mahout -- Getting ready -- How to do it... -- How it works... -- Visualizing K-means results -- Getting ready -- How to do it... -- How it works... -- 6. Analytics -- Introduction -- Simple analytics using MapReduce -- Getting ready -- How to do it... -- How it works... -- There's more... -- Performing Group-By using MapReduce -- Getting ready -- How to do it... -- How it works... -- Calculating frequency distributions and sorting using MapReduce -- Getting ready -- How to do it... -- How it works... -- Plotting the Hadoop results using GNU Plot -- Getting ready -- How to do it... -- How it works... -- There's more... -- Calculating histograms using MapReduce -- Getting ready -- How to do it... -- How it works... -- Calculating scatter plots using MapReduce -- Getting ready -- How to do it... -- How it works... -- Parsing a complex dataset with Hadoop -- Getting ready -- How to do it... -- How it works...

Joining two datasets using MapReduce -- Getting ready -- How to do it... -- How it works... -- 7. Searching and Indexing -- Introduction -- Generating an inverted index using Hadoop MapReduce -- Getting ready -- How to do it... -- How it works... -- There's more... -- See also -- Intra-domain web crawling using Apache Nutch -- Getting ready -- How to do it... -- See also -- Indexing and searching web documents using Apache Solr -- Getting ready -- How to do it... -- How it works... -- See also -- Configuring Apache HBase as the backend data store for Apache Nutch -- Getting ready -- How to do it... -- How it works... -- See also -- Deploying Apache HBase on a Hadoop cluster -- Getting ready -- How to do it... -- How it works... -- See also -- Whole web crawling with Apache Nutch using a Hadoop/HBase cluster -- Getting ready -- How to do it... -- How it works... -- See also -- ElasticSearch for indexing and searching -- Getting ready -- How to do it... -- How it works... -- See also -- Generating the in-links graph for crawled web pages -- Getting ready -- How to do it... -- How it works... -- See also -- 8. Classifications, Recommendations, and Finding Relationships -- Introduction -- Content-based recommendations -- Getting ready -- How to do it... -- How it works... -- There's more... -- Hierarchical clustering -- Getting ready -- How to do it... -- How it works... -- There's more... -- Clustering an Amazon sales dataset -- Getting ready -- How to do it... -- How it works... -- There's more... -- Collaborative filtering-based recommendations -- Getting ready -- How to do it... -- How it works... -- Classification using Naive Bayes Classifier -- Getting ready -- How to do it... -- How it works... -- Assigning advertisements to keywords using the Adwords balance algorithm -- Getting ready -- How to do it... -- How it works... -- There's more... -- 9. Mass Text Data Processing.

Introduction.
Abstract:
Individual, self-contained code recipes: solve specific problems using individual recipes, or work through the book to develop your capabilities. If you are a big data enthusiast striving to use Hadoop to solve your problems, this book is for you. Aimed at Java programmers with some knowledge of Hadoop MapReduce, it is also a comprehensive reference for developers and system administrators who want to get up to speed with Hadoop.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Added Author:
Gunarathne, Thilina.