Storm Blueprints : Patterns for Distributed Real-time Computation.
Title:
Storm Blueprints : Patterns for Distributed Real-time Computation.
Author:
Goetz, P. Taylor.
ISBN:
9781782168300
Physical Description:
1 online resource (374 pages)
Contents:
Storm Blueprints: Patterns for Distributed Real-time Computation -- Table of Contents -- Storm Blueprints: Patterns for Distributed Real-time Computation -- Credits -- About the Authors -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Preface -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions -- 1. Distributed Word Count -- Introducing elements of a Storm topology - streams, spouts, and bolts -- Streams -- Spouts -- Bolts -- Introducing the word count topology data flow -- Sentence spout -- Introducing the split sentence bolt -- Introducing the word count bolt -- Introducing the report bolt -- Implementing the word count topology -- Setting up a development environment -- Implementing the sentence spout -- Implementing the split sentence bolt -- Implementing the word count bolt -- Implementing the report bolt -- Implementing the word count topology -- Introducing parallelism in Storm -- WordCountTopology parallelism -- Adding workers to a topology -- Configuring executors and tasks -- Understanding stream groupings -- Guaranteed processing -- Reliability in spouts -- Reliability in bolts -- Reliable word count -- Summary -- 2. Configuring Storm Clusters -- Introducing the anatomy of a Storm cluster -- Understanding the nimbus daemon -- Working with the supervisor daemon -- Introducing Apache ZooKeeper -- Working with Storm's DRPC server -- Introducing the Storm UI -- Introducing the Storm technology stack -- Java and Clojure -- Python -- Installing Storm on Linux -- Installing the base operating system -- Installing Java -- ZooKeeper installation -- Storm installation -- Running the Storm daemons.

Configuring Storm -- Mandatory settings -- Optional settings -- The Storm executable -- Setting up the Storm executable on a workstation -- The daemon commands -- Nimbus -- Supervisor -- UI -- DRPC -- The management commands -- Jar -- Kill -- Deactivate -- Activate -- Rebalance -- Remoteconfvalue -- Local debug/development commands -- REPL -- Classpath -- Localconfvalue -- Submitting topologies to a Storm cluster -- Automating the cluster configuration -- A rapid introduction to Puppet -- Puppet manifests -- Puppet classes and modules -- Puppet templates -- Managing environments with Puppet Hiera -- Introducing Hiera -- Summary -- 3. Trident Topologies and Sensor Data -- Examining our use case -- Introducing Trident topologies -- Introducing Trident spouts -- Introducing Trident operations - filters and functions -- Introducing Trident filters -- Introducing Trident functions -- Introducing Trident aggregators - Combiners and Reducers -- CombinerAggregator -- ReducerAggregator -- Aggregator -- Introducing the Trident state -- The Repeat Transactional state -- The Opaque state -- Executing the topology -- Summary -- 4. Real-time Trend Analysis -- Use case -- Architecture -- The source application -- The logback Kafka appender -- Apache Kafka -- Kafka spout -- The XMPP server -- Installing the required software -- Installing Kafka -- Installing OpenFire -- Introducing the sample application -- Sending log messages to Kafka -- Introducing the log analysis topology -- Kafka spout -- The JSON project function -- Calculating a moving average -- Adding a sliding window -- Implementing the moving average function -- Filtering on thresholds -- Sending notifications with XMPP -- The final topology -- Running the log analysis topology -- Summary -- 5. Real-time Graph Analysis -- Use case -- Architecture -- The Twitter client -- Kafka spout.

A Titan-distributed graph database -- A brief introduction to graph databases -- Accessing the graph - the TinkerPop stack -- Manipulating the graph with the Blueprints API -- Manipulating the graph with the Gremlin shell -- Software installation -- Titan installation -- Setting up Titan to use the Cassandra storage backend -- Installing Cassandra -- Starting Titan with the Cassandra backend -- Graph data model -- Connecting to the Twitter stream -- Setting up the Twitter4J client -- The OAuth configuration -- The TwitterStreamConsumer class -- The TwitterStatusListener class -- Twitter graph topology -- The JSONProjectFunction class -- Implementing GraphState -- GraphFactory -- GraphTupleProcessor -- GraphStateFactory -- GraphState -- GraphUpdater -- Implementing GraphFactory -- Implementing GraphTupleProcessor -- Putting it all together - the TwitterGraphTopology class -- The TwitterGraphTopology class -- Querying the graph with Gremlin -- Summary -- 6. Artificial Intelligence -- Designing for our use case -- Establishing the architecture -- Examining the design challenges -- Implementing the recursion -- Accessing the function's return values -- Immutable tuple field values -- Upfront field declaration -- Tuple acknowledgement in recursion -- Output to multiple streams -- Read-before-write -- Solving the challenges -- Implementing the architecture -- The data model -- Examining the recursive topology -- The queue interaction -- Functions and filters -- Examining the Scoring Topology -- Addressing read-before-write -- Distributed locking -- Retry when stale -- Executing the topology -- Enumerating the game tree -- Distributed Remote Procedure Call (DRPC) -- Remote deployment -- Summary -- 7. Integrating Druid for Financial Analytics -- Use case -- Integrating a non-transactional system -- The topology -- The spout -- The filter -- The state design.

Implementing the architecture -- DruidState -- Implementing the StormFirehose object -- Implementing the partition status in ZooKeeper -- Executing the implementation -- Examining the analytics -- Summary -- 8. Natural Language Processing -- Motivating a Lambda architecture -- Examining our use case -- Realizing a Lambda architecture -- Designing the topology for our use case -- Implementing the design -- TwitterSpout/TweetEmitter -- Functions -- TweetSplitterFunction -- WordFrequencyFunction -- PersistenceFunction -- Examining the analytics -- Batch processing / historical analysis -- Hadoop -- An overview of MapReduce -- The Druid setup -- HadoopDruidIndexer -- Summary -- 9. Deploying Storm on Hadoop for Advertising Analysis -- Examining the use case -- Establishing the architecture -- Examining HDFS -- Examining YARN -- Configuring the infrastructure -- The Hadoop infrastructure -- Configuring HDFS -- Configuring the NameNode -- Configuring the DataNode -- Configuring YARN -- Configuring the ResourceManager -- Configuring the NodeManager -- Deploying the analytics -- Performing a batch analysis with the Pig infrastructure -- Performing a real-time analysis with the Storm-YARN infrastructure -- Performing the analytics -- Executing the batch analysis -- Executing real-time analysis -- Deploying the topology -- Executing the topology -- Summary -- 10. Storm in the Cloud -- Introducing Amazon Elastic Compute Cloud (EC2) -- Setting up an AWS account -- The AWS Management Console -- Creating an SSH key pair -- Launching an EC2 instance manually -- Logging in to the EC2 instance -- Introducing Apache Whirr -- Installing Whirr -- Configuring a Storm cluster with Whirr -- Launching the cluster -- Introducing Whirr Storm -- Setting up Whirr Storm -- Cluster configuration -- Customizing Storm's configuration -- Customizing firewall rules.

Introducing Vagrant -- Installing Vagrant -- Launching your first virtual machine -- The Vagrantfile and shared filesystem -- Vagrant provisioning -- Configuring multimachine clusters with Vagrant -- Creating Storm-provisioning scripts -- ZooKeeper -- Storm -- Supervisord -- The Storm Vagrantfile -- Launching the Storm cluster -- Summary -- Index.
Abstract:
A blueprints book with ten projects built across ten chapters that demonstrate various use cases of Storm for both beginner and intermediate users, grounded in real-world example applications. Although the book focuses primarily on Java development with Storm, the patterns are more broadly applicable, and the tips, techniques, and approaches described apply to architects, developers, and operations staff alike. The book should also provoke and inspire applications of distributed computing in other industries and domains. Hadoop enthusiasts will find this book a good introduction to Storm, providing a potential migration path from batch processing to the world of real-time analytics.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.