Hadoop For Dummies.

Title:

Hadoop For Dummies.

Author:

deRoos, Dirk.

ISBN:

9781118705032

Personal Author:

deRoos, Dirk.

Edition:

1st ed.

Physical Description:

1 online resource (411 pages)

Contents:

Title Page -- Copyright Page -- Contents at a Glance -- Table of Contents -- Introduction -- About this Book -- Foolish Assumptions -- How This Book Is Organized -- Icons Used in This Book -- Beyond the Book -- Where to Go from Here -- Part I: Getting Started with Hadoop -- Chapter 1: Introducing Hadoop and Seeing What It's Good For -- Big Data and the Need for Hadoop -- The Origin and Design of Hadoop -- Examining the Various Hadoop Offerings -- Chapter 2: Common Use Cases for Big Data in Hadoop -- The Keys to Successfully Adopting Hadoop (Or, "Please, Can We Keep Him?") -- Log Data Analysis -- Data Warehouse Modernization -- Fraud Detection -- Risk Modeling -- Social Sentiment Analysis -- Image Classification -- Graph Analysis -- To Infinity and Beyond -- Chapter 3: Setting Up Your Hadoop Environment -- Choosing a Hadoop Distribution -- Choosing a Hadoop Cluster Architecture -- The Hadoop For Dummies Environment -- Your First Hadoop Program: Hello Hadoop! -- Part II: How Hadoop Works -- Chapter 4: Storing Data in Hadoop: The Hadoop Distributed File System -- Data Storage in HDFS -- Sketching Out the HDFS Architecture -- HDFS Federation -- HDFS High Availability -- Chapter 5: Reading and Writing Data -- Compressing Data -- Managing Files with the Hadoop File System Commands -- Ingesting Log Data with Flume -- Chapter 6: MapReduce Programming -- Thinking in Parallel -- Seeing the Importance of MapReduce -- Doing Things in Parallel: Breaking Big Problems into Many Bite-Size Pieces -- Writing MapReduce Applications -- Getting Your Feet Wet: Writing a Simple MapReduce Application -- Chapter 7: Frameworks for Processing Data in Hadoop: YARN and MapReduce -- Running Applications Before Hadoop 2 -- Seeing a World beyond MapReduce -- Real-Time and Streaming Applications -- Chapter 8: Pig: Hadoop Programming Made Easier -- Admiring the Pig Architecture.

Going with the Pig Latin Application Flow -- Working through the ABCs of Pig Latin -- Evaluating Local and Distributed Modes of Running Pig scripts -- Checking Out the Pig Script Interfaces -- Scripting with Pig Latin -- Chapter 9: Statistical Analysis in Hadoop -- Pumping Up Your Statistical Analysis -- Machine Learning with Mahout -- R on Hadoop -- Chapter 10: Developing and Scheduling Application Workflows with Oozie -- Getting Oozie in Place -- Developing and Running an Oozie Workflow -- Scheduling and Coordinating Oozie Workflows -- Part III: Hadoop and Structured Data -- Chapter 11: Hadoop and the Data Warehouse: Friends or Foes? -- Comparing and Contrasting Hadoop with Relational Databases -- Modernizing the Warehouse with Hadoop -- Chapter 12: Extremely Big Tables: Storing Data in HBase -- Say Hello to HBase -- Understanding the HBase Data Model -- Understanding the HBase Architecture -- Taking HBase for a Test Run -- Getting Things Done with HBase -- HBase and the RDBMS world -- Deploying and Tuning HBase -- Chapter 13: Applying Structure to Hadoop Data with Hive -- Saying Hello to Hive -- Seeing How the Hive is Put Together -- Getting Started with Apache Hive -- Examining the Hive Clients -- Working with Hive Data Types -- Creating and Managing Databases and Tables -- Seeing How the Hive Data Manipulation Language Works -- Querying and Analyzing Data -- Chapter 14: Integrating Hadoop with Relational Databases Using Sqoop -- The Principles of Sqoop Design -- Scooping Up Data with Sqoop -- Sending Data Elsewhere with Sqoop -- Looking at Your Sqoop Input and Output Formatting Options -- Sqoop 2.0 Preview -- Chapter 15: The Holy Grail: Native SQL Access to Hadoop Data -- SQL's Importance for Hadoop -- Looking at What SQL Access Actually Means -- SQL Access and Apache Hive -- Solutions Inspired by Google Dremel -- IBM Big SQL -- Pivotal HAWQ.

Hadapt -- The SQL Access Big Picture -- Part IV: Administering and Configuring Hadoop -- Chapter 16: Deploying Hadoop -- Working with Hadoop Cluster Components -- Hadoop Cluster Configurations -- Alternate Deployment Form Factors -- Sizing Your Hadoop Cluster -- Chapter 17: Administering Your Hadoop Cluster -- Achieving Balance: A Big Factor in Cluster Health -- Mastering the Hadoop Administration Commands -- Understanding Factors for Performance -- Tolerating Faults and Data Reliability -- Putting Apache Hadoop's Capacity Scheduler to Good Use -- Setting Security: The Kerberos Protocol -- Expanding Your Toolset Options -- Basic Hadoop Configuration Details -- Part V: The Part of Tens -- Chapter 18: Ten Hadoop Resources Worthy of a Bookmark -- Central Nervous System: Apache.org -- Tweet This -- Hortonworks University -- Cloudera University -- BigDataUniversity.com -- planet Big Data Blog Aggregator -- Quora's Apache Hadoop Forum -- The IBM Big Data Hub -- Conferences Not to Be Missed -- The Google Papers That Started It All -- The Bonus Resource: What Did We Ever Do B.G.? -- Chapter 19: Ten Reasons to Adopt Hadoop -- Hadoop Is Relatively Inexpensive -- Hadoop Has an Active Open Source Community -- Hadoop Is Being Widely Adopted in Every Industry -- Hadoop Can Easily Scale Out As Your Data Grows -- Traditional Tools Are Integrating with Hadoop -- Hadoop Can Store Data in Any Format -- Hadoop Is Designed to Run Complex Analytics -- Hadoop Can Process a Full Data Set (As Opposed to Sampling) -- Hardware Is Being Optimized for Hadoop -- Hadoop Can Increasingly Handle Flexible Workloads (No Longer Just Batch) -- Index -- About the Authors.

Abstract:

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. The book explains the origins of Hadoop, its economic benefits, and its functionality and practical applications. It helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily. It details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving. It shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster. From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Apache Hadoop.

Electronic data processing -- Distributed processing.

File organization (Computer science) -- Computer programs.
