
HDInsight Essentials.
Title:
HDInsight Essentials.
Author:
Nadipalli, Rajesh.
ISBN:
9781849695374
Physical Description:
1 online resource (147 pages)
Contents:
HDInsight Essentials -- Table of Contents -- HDInsight Essentials -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions -- 1. Hadoop and HDInsight in a Heartbeat -- Big Data - hype or real? -- Apache Hadoop concepts -- Core components -- Hadoop cluster layout -- The Hadoop ecosystem -- Data access -- Data processing -- The Hadoop data store -- Management and integration -- Hadoop distributions -- HDInsight distribution differentiator -- End-to-end solution using HDInsight -- Key phases of a Hadoop project -- Stage 1 - collect data -- Stage 2a - process your data (build MapReduce) -- Stage 2b - process your data (execute MapReduce) -- Stage 3 - analyze data using JavaScript and Pig -- Stage 4 - report data using JavaScript charts -- Summary -- 2. Deploying HDInsight on Premise -- HDInsight and Hadoop relationship -- Deployment options for on-premise -- Windows HDInsight server -- Hortonworks Data Platform (HDP for Windows) -- Supported platforms for on-premise install -- Single-node install -- Downloading the software -- Running the install wizard -- Validating the install -- Multinode planning and preparation -- Setting up the network -- Setting common time on all nodes -- Setting up remote scripting -- Configuring firewall ports -- Multinode installation -- Downloading the software -- Configuring the multinode install -- Running the installer -- Validating the install -- Managing HDInsight services -- Uninstalling HDInsight -- Summary -- 3. HDInsight Azure Cloud Service -- HDInsight Service on Azure.
Considerations for Azure HDInsight Service -- Provision your cluster -- HDInsight management dashboard -- Verify the cluster and run sample jobs -- Access HDFS -- Deploy and execute the sample MapReduce job -- View job results -- Monitor your cluster -- Azure storage integration -- Remove your cluster -- Delete your cluster -- Delete your storage -- Restore your cluster -- Summary -- 4. Administering Your HDInsight Cluster -- Cluster status -- Distributed filesystem health -- NameNode URL -- Browsing HDFS -- MapReduce health -- MapReduce summary -- MapReduce Job History -- Key files -- Backing up NameNode content -- Summary -- 5. Ingesting Data to Your Cluster -- Loading data using Hadoop commands -- Step 1 - connect to a Hadoop client -- Step 2 - get your files on local storage -- Step 3 - upload to HDFS -- Loading data using Azure Storage Vault (ASV) -- Storage access keys -- Storage tools -- Azure Storage Explorer -- Registering your storage account -- Uploading files to your blob storage -- Loading data using interactive JavaScript -- Shipping data to Azure -- Loading data using Sqoop -- Key benefits -- Two modes of using Sqoop -- Using Sqoop to import (SQL to Hadoop) -- Summary -- 6. Transforming Data in Cluster -- Transformation scenario -- Scenario -- Transformation objective -- File organization -- MapReduce solution -- Design -- Map code -- Reduce code -- Driver code -- Compiling and packaging the code -- Executing MapReduce -- Results verification -- Hive solution -- Overview of Hive -- Starting Hive in the HDInsight node -- Step 1 - table creation -- Step 2 - table loading -- Step 3 - summary table creation -- Step 4 - verifying the summary table -- Pig solution -- Pig architecture -- Pig or Hive? -- Starting Pig in the HDInsight node -- Pig Grunt script -- Code -- Code explanation -- Execution -- Verification -- Summary.
7. Analyzing and Reporting Your Data -- Analyzing and reporting using Excel -- Step 1 - installing the Hive ODBC driver -- Step 2 - creating Hive ODBC data source -- Step 3 - importing data to Excel -- Hive for ad hoc queries -- Creating reference tables -- Ad hoc queries -- Analytic functions in HiveQL -- Interactive JavaScript for analysis and reporting -- Other business intelligence tools -- Summary -- 8. Project Planning Tips and Resources -- Architectural considerations -- Extensible and modular -- Metadata-driven solution -- Integration strategy -- Security -- Project planning -- Proof of Concept -- Production implementation -- Reference sites and blogs -- Summary -- Index.
Abstract:
This book is a fast-paced guide full of step-by-step instructions on how to build a multi-node Hadoop cluster on Windows servers. If you are a data architect or developer who wants to understand how to transform your data using open source software, such as MapReduce, Hive, Pig, and JavaScript, and also leverage the Windows infrastructure, this book is perfect for you. It is also ideal if you are part of a team that is starting or planning a Hadoop implementation and you want to understand the key components of Hadoop and how HDInsight provides added value in administration and reporting.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.