Professional Hadoop Solutions.
Title:
Professional Hadoop Solutions.
Author:
Lublinsky, Boris.
ISBN:
9781118612545
Edition:
1st ed.
Physical Description:
1 online resource (506 pages)
Contents:
Professional Hadoop® Solutions -- Copyright -- Credits -- About the Authors -- About the Technical Editors -- Acknowledgments -- Contents -- Introduction -- Who This Book Is For -- What This Book Covers -- How This Book Is Structured -- What You Need to Use This Book -- Conventions -- Source Code -- Errata -- P2P.Wrox.Com -- Chapter 1: Big Data and the Hadoop Ecosystem -- Big Data Meets Hadoop -- Hadoop: Meeting the Big Data Challenge -- Data Science in the Business World -- The Hadoop Ecosystem -- Hadoop Core Components -- Hadoop Distributions -- Developing Enterprise Applications with Hadoop -- Summary -- Chapter 2: Storing Data in Hadoop -- HDFS -- HDFS Architecture -- Using HDFS Files -- Hadoop-Specific File Types -- HDFS Federation and High Availability -- HBase -- HBase Architecture -- HBase Schema Design -- Programming for HBase -- New HBase Features -- Combining HDFS and HBase for Effective Data Storage -- Using Apache Avro -- Managing Metadata with HCatalog -- Choosing an Appropriate Hadoop Data Organization for Your Applications -- Summary -- Chapter 3: Processing Your Data with MapReduce -- Getting to Know MapReduce -- MapReduce Execution Pipeline -- Runtime Coordination and Task Management in MapReduce -- Your First MapReduce Application -- Building and Executing MapReduce Programs -- Designing MapReduce Implementations -- Using MapReduce as a Framework for Parallel Processing -- Simple Data Processing with MapReduce -- Building Joins with MapReduce -- Building Iterative MapReduce Applications -- To MapReduce or Not to MapReduce? -- Common MapReduce Design Gotchas -- Summary -- Chapter 4: Customizing MapReduce Execution -- Controlling MapReduce Execution with InputFormat -- Implementing InputFormat for Compute-Intensive Applications -- Implementing InputFormat to Control the Number of Maps.

Implementing InputFormat for Multiple HBase Tables -- Reading Data Your Way with Custom RecordReaders -- Implementing a Queue-Based RecordReader -- Implementing RecordReader for XML Data -- Organizing Output Data with Custom Output Formats -- Implementing OutputFormat for Splitting MapReduce Job's Output into Multiple Directories -- Writing Data Your Way with Custom RecordWriters -- Implementing a RecordWriter to Produce Output tar Files -- Optimizing Your MapReduce Execution with a Combiner -- Controlling Reducer Execution with Partitioners -- Implementing a Custom Partitioner for One-to-Many Joins -- Using Non-Java Code with Hadoop -- Pipes -- Hadoop Streaming -- Using JNI -- Summary -- Chapter 5: Building Reliable MapReduce Apps -- Unit Testing MapReduce Applications -- Testing Mappers -- Testing Reducers -- Integration Testing -- Local Application Testing with Eclipse -- Using Logging for Hadoop Testing -- Processing Applications Logs -- Reporting Metrics with Job Counters -- Defensive Programming in MapReduce -- Summary -- Chapter 6: Automating Data Processing with Oozie -- Getting to Know Oozie -- Oozie Workflow -- Executing Asynchronous Activities in Oozie Workflow -- Oozie Recovery Capabilities -- Oozie Workflow Job Life Cycle -- Oozie Coordinator -- Oozie Bundle -- Oozie Parameterization with Expression Language -- Workflow Functions -- Coordinator Functions -- Bundle Functions -- Other EL Functions -- Oozie Job Execution Model -- Accessing Oozie -- Oozie SLA -- Summary -- Chapter 7: Using Oozie -- Validating Information about Places Using Probes -- Designing Place Validation Based on Probes -- Designing Oozie Workflows -- Implementing Oozie Workflow Applications -- Implementing the Data Preparation Workflow -- Implementing Attendance Index and Cluster Strands Workflows -- Implementing Workflow Activities.

Populating the Execution Context from a java Action -- Using MapReduce Jobs in Oozie Workflows -- Implementing Oozie Coordinator Applications -- Implementing Oozie Bundle Applications -- Deploying, Testing, and Executing Oozie Applications -- Deploying Oozie Applications -- Using the Oozie CLI for Execution of an Oozie Application -- Passing Arguments to Oozie Jobs -- Using the Oozie Console to Get Information about Oozie Applications -- Getting to Know the Oozie Console Screens -- Getting Information about a Coordinator Job -- Summary -- Chapter 8: Advanced Oozie Features -- Building Custom Oozie Workflow Actions -- Implementing a Custom Oozie Workflow Action -- Deploying Oozie Custom Workflow Actions -- Adding Dynamic Execution to Oozie Workflows -- Overall Implementation Approach -- A Machine Learning Model, Parameters, and Algorithm -- Defining a Workflow for an Iterative Process -- Dynamic Workflow Generation -- Using the Oozie Java API -- Using Uber Jars with Oozie Applications -- Data Ingestion Conveyer -- Summary -- Chapter 9: Real-Time Hadoop -- Real-Time Applications in the Real World -- Using HBase for Implementing Real-Time Applications -- Using HBase as a Picture Management System -- Using HBase as a Lucene Back End -- Using Specialized Real-Time Hadoop Query Systems -- Apache Drill -- Impala -- Comparing Real-Time Queries to MapReduce -- Using Hadoop-Based Event-Processing Systems -- HFlame -- Storm -- Comparing Event Processing to MapReduce -- Summary -- Chapter 10: Hadoop Security -- A Brief History: Understanding Hadoop Security Challenges -- Authentication -- Kerberos Authentication -- Delegated Security Credentials -- Authorization -- HDFS File Permissions -- Service-Level Authorization -- Job Authorization -- Oozie Authentication and Authorization -- Network Encryption -- Security Enhancements with Project Rhino.

HDFS Disk-Level Encryption -- Token-Based Authentication and Unified Authorization Framework -- HBase Cell-Level Security -- Putting it All Together - Best Practices for Securing Hadoop -- Authentication -- Authorization -- Network Encryption -- Stay Tuned for Hadoop Enhancements -- Summary -- Chapter 11: Running Hadoop Applications on AWS -- Getting to Know AWS -- Options for Running Hadoop on AWS -- Custom Installation using EC2 Instances -- Elastic MapReduce -- Additional Considerations before Making Your Choice -- Understanding the EMR-Hadoop Relationship -- EMR Architecture -- Using S3 Storage -- Maximizing Your Use of EMR -- Utilizing CloudWatch and Other AWS Components -- Accessing and Using EMR -- Using AWS S3 -- Understanding the Use of Buckets -- Content Browsing with the Console -- Programmatically Accessing Files in S3 -- Using MapReduce to Upload Multiple Files to S3 -- Automating EMR Job Flow Creation and Job Execution -- Orchestrating Job Execution in EMR -- Using Oozie on an EMR Cluster -- AWS Simple Workflow -- AWS Data Pipeline -- Summary -- Chapter 12: Building Enterprise Security Solutions for Hadoop Implementations -- Security Concerns for Enterprise Applications -- Authentication -- Authorization -- Confidentiality -- Integrity -- Auditing -- What Hadoop Security Doesn't Natively Provide for Enterprise Applications -- Data-Oriented Access Control -- Differential Privacy -- Encrypted Data at Rest -- Enterprise Security Integration -- Approaches for Securing Enterprise Applications Using Hadoop -- Access Control Protection with Accumulo -- Encryption at Rest -- Network Isolation and Separation Approaches -- Summary -- Chapter 13: Hadoop's Future -- Simplifying MapReduce Programming with DSLs -- What Are DSLs? -- DSLs for Hadoop -- Faster, More Scalable Processing -- Apache YARN -- Tez -- Security Enhancements -- Emerging Trends.

Summary -- Appendix: Useful Reading -- Storing and Accessing Hadoop Data -- MapReduce -- Oozie -- Real-Time Hadoop -- AWS -- Hadoop DSLs -- Hadoop and Big Data Security -- Index -- Advertisement.
Abstract:
The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop processes in real time are also covered in depth. With in-depth code examples in Java and XML and the latest on recent additions to the Hadoop ecosystem, this complete resource also covers the use of APIs, exposing their inner workings and allowing architects and developers to better leverage and customize them.
The ultimate guide for developers, designers, and architects who need to build and deploy Hadoop applications.
Covers storing and processing data with various technologies, automating data processing, Hadoop security, and delivering real-time solutions.
Includes detailed, real-world examples and code-level guidelines.
Explains when, why, and how to use these tools effectively.
Written by a team of Hadoop experts in the programmer-to-programmer Wrox style.
Professional Hadoop Solutions is the reference enterprise architects and developers need to maximize the power of Hadoop.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.