Fast Data Processing with Spark.

Title:

Fast Data Processing with Spark.

Author:

Karau, Holden.

ISBN:

9781782167075

Personal Author:

Karau, Holden.

Physical Description:

1 online resource (151 pages)

Contents:

Fast Data Processing with Spark -- Table of Contents -- Fast Data Processing with Spark -- Credits -- About the Author -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Disclaimer -- Errata -- Piracy -- Questions --
1. Installing Spark and Setting Up Your Cluster -- Running Spark on a single machine -- Running Spark on EC2 -- Running Spark on EC2 with the scripts -- Deploying Spark on Elastic MapReduce -- Deploying Spark with Chef (Opscode) -- Deploying Spark on Mesos -- Deploying Spark on YARN -- Deploying set of machines over SSH -- Links and references -- Summary --
2. Using the Spark Shell -- Loading a simple text file -- Using the Spark shell to run logistic regression -- Interactively loading data from S3 -- Summary --
3. Building and Running a Spark Application -- Building your Spark project with sbt -- Building your Spark job with Maven -- Building your Spark job with something else -- Summary --
4. Creating a SparkContext -- Scala -- Java -- Shared Java and Scala APIs -- Python -- Links and references -- Summary --
5. Loading and Saving Data in Spark -- RDDs -- Loading data into an RDD -- Saving your data -- Links and references -- Summary --
6. Manipulating Your RDD -- Manipulating your RDD in Scala and Java -- Scala RDD functions -- Functions for joining PairRDD functions -- Other PairRDD functions -- DoubleRDD functions -- General RDD functions -- Java RDD functions -- Spark Java function classes -- Common Java RDD functions -- Methods for combining JavaPairRDD functions -- JavaPairRDD functions -- Manipulating your RDD in Python -- Standard RDD functions -- PairRDD functions -- Links and references -- Summary --
7. Shark - Using Spark with Hive -- Why Hive/Shark? -- Installing Shark -- Running Shark -- Loading data -- Using Hive queries in a Spark program -- Links and references -- Summary --
8. Testing -- Testing in Java and Scala -- Refactoring your code for testability -- Testing interactions with SparkContext -- Testing in Python -- Links and references -- Summary --
9. Tips and Tricks -- Where to find logs? -- Concurrency limitations -- Memory usage and garbage collection -- Serialization -- IDE integration -- Using Spark with other languages -- A quick note on security -- Mailing lists -- Links and references -- Summary --
Index.

Abstract:

This book is a basic, step-by-step tutorial that will help readers take advantage of all that Spark has to offer. Fast Data Processing with Spark is for software developers who want to learn how to write distributed programs with Spark. It will help developers who have run into problems too large to handle on a single computer. No previous experience with distributed programming is necessary. This book assumes knowledge of Java, Scala, or Python.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

Subject Term:

Big data.

Data mining -- Computer programs.

SPARK (Electronic resource).

Genre:

Electronic books.
