Title:

Data Warehousing in the Age of Big Data.

Author:

Krishnan, Krish.

ISBN:

9780124059207

Personal Author:

Krishnan, Krish.

Physical Description:

1 online resource (371 pages)

Series:

The Morgan Kaufmann Series on Business Intelligence

Contents:

Front Cover -- Data Warehousing in the Age of Big Data -- Copyright Page -- Contents -- Acknowledgments -- About the Author -- Introduction -- Part 1: Big Data -- Part 2: The Data Warehousing -- Part 3: Building the Big Data - Data Warehouse -- Appendixes -- Companion website -- 1 BIG DATA -- 1 Introduction to Big Data -- Introduction -- Big Data -- Defining Big Data -- Why Big Data and why now? -- Big Data example -- Social Media posts -- Survey data analysis -- Survey data -- Weather data -- Twitter data -- Integration and analysis -- Additional data types -- Summary -- Further reading -- 2 Working with Big Data -- Introduction -- Data explosion -- Data volume -- Machine data -- Application log -- Clickstream logs -- External or third-party data -- Emails -- Contracts -- Geographic information systems and geo-spatial data -- Example: Funshots, Inc. -- Data velocity -- Amazon, Facebook, Yahoo, and Google -- Sensor data -- Mobile networks -- Social media -- Data variety -- Summary -- 3 Big Data Processing Architectures -- Introduction -- Data processing revisited -- Data processing techniques -- Data processing infrastructure challenges -- Storage -- Transportation -- Processing -- Speed or throughput -- Shared-everything and shared-nothing architectures -- Shared-everything architecture -- Shared-nothing architecture -- OLTP versus data warehousing -- Big Data processing -- Infrastructure explained -- Data processing explained -- Telco Big Data study -- Infrastructure -- Data processing -- 4 Introducing Big Data Technologies -- Introduction -- Distributed data processing -- Big Data processing requirements -- Technologies for Big Data processing -- Google file system -- Hadoop -- Hadoop core components -- HDFS -- HDFS architecture -- NameNode -- DataNodes -- Image -- Journal -- Checkpoint -- HDFS startup -- Block allocation and storage in HDFS.

HDFS client -- Replication and recovery -- Communication and management -- Heartbeats -- CheckpointNode and BackupNode -- CheckpointNode -- BackupNode -- File system snapshots -- JobTracker and TaskTracker -- MapReduce -- MapReduce programming model -- MapReduce program design -- MapReduce implementation architecture -- MapReduce job processing and management -- MapReduce limitations (Version 1, Hadoop MapReduce) -- MapReduce v2 (YARN) -- YARN scalability -- Comparison between MapReduce v1 and v2 -- SQL/MapReduce -- Zookeeper -- Zookeeper features -- Locks and processing -- Failure and recovery -- Pig -- Programming with pig latin -- Pig data types -- Running pig programs -- Pig program flow -- Common pig command -- HBase -- HBase architecture -- HBase components -- Write-ahead log -- Hive -- Hive architecture -- Infrastructure -- Execution: how does hive process queries? -- Hive data types -- Hive query language (HiveQL) -- Chukwa -- Flume -- Oozie -- HCatalog -- Sqoop -- Sqoop1 -- Sqoop2 -- Hadoop summary -- NoSQL -- CAP theorem -- Key-value pair: Voldemort -- Column family store: Cassandra -- Data model -- Data partitioning -- Data sorting -- Consistency management -- Write consistency -- Read consistency -- Specifying client consistency levels -- Built-in consistency repair features -- Cassandra ring architecture -- Data placement -- Data partitioning -- Peer-to-Peer: simple scalability -- Gossip protocol: node management -- Document database: Riak -- Graph databases -- NoSQL summary -- Textual ETL processing -- Further reading -- 5 Big Data Driving Business Value -- Introduction -- Case study 1: Sensor data -- Summary -- Vestas -- Overview -- Producing electricity from wind -- Turning climate into capital -- Tackling Big Data challenges -- Maintaining energy efficiency in its data center -- Case study 2: Streaming data -- Summary.

Surveillance and security: TerraEchos -- The need -- The solution -- The benefit -- Advanced fiber optics combine with real-time streaming data -- Solution components -- Extending the security perimeter creates a strategic advantage -- Correlating sensor data delivers a zero false-positive rate -- Case study 3: The right prescription: improving patient outcomes with Big Data analytics -- Summary -- Business objective -- Challenges -- Overview: giving practitioners new insights to guide patient care -- Challenges: blending traditional data warehouse ecosystems with Big Data -- Solution: getting ready for Big Data analytics -- Results: eliminating the "Data Trap" -- Why aster? -- About aurora -- Case study 4: University of Ontario, institute of technology: leveraging key data to provide proactive patient care -- Summary -- Overview -- Business benefits -- Making better use of the data resource -- Smarter healthcare -- Solution components -- Merging human knowledge and technology -- Broadening the impact of artemis -- Case study 5: Microsoft SQL server customer solution -- Customer profile -- Solution spotlight -- Business needs -- Solution -- Benefits -- Speed efficiency and cut costs -- Increases insight and advantage -- Facilitates innovation -- Case study 6: Customer-centric data integration -- Overview -- Solution design -- Enabling a better cross-sell and upsell opportunity -- Example -- Summary -- 2 THE DATA WAREHOUSING -- 6 Data Warehousing Revisited -- Introduction -- Traditional data warehousing, or data warehousing 1.0 -- Data architecture -- Infrastructure -- Pitfalls of data warehousing -- Performance -- Scalability -- Architecture approaches to building a data warehouse -- Pros and cons of information factory approach -- Pros and cons of datamart BUS architecture approach -- Data warehouse 2.0 -- Overview of Inmon's DW 2.0.

Overview of DSS 2.0 -- Summary -- Further reading -- 7 Reengineering the Data Warehouse -- Introduction -- Enterprise data warehouse platform -- Transactional systems -- Operational data store -- Staging area -- Data warehouse -- Datamarts -- Analytical databases -- Issues with the data warehouse -- Choices for reengineering the data warehouse -- Replatforming -- Platform engineering -- Data engineering -- Modernizing the data warehouse -- Case study of data warehouse modernization -- Current-state analysis -- Recommendations -- Business benefits of modernization -- The appliance selection process -- Request For Information/Request For Proposal (RFI/RFP) -- Vendor information -- Product information -- Scorecard -- Proof of concept process -- Program roadmap -- Modernization ROI -- Additional benefits -- Summary -- 8 Workload Management in the Data Warehouse -- Introduction -- Current state -- Defining workloads -- Understanding workloads -- Data warehouse outbound -- End-user application -- Data outbound to users -- Data inbound from users -- Datamarts -- Data outbound to users -- Data inbound from users -- Analytical databases -- Data warehouse inbound -- Data warehouse processing overheads -- Query classification -- Wide/Wide -- Wide/Narrow -- Narrow/Wide -- Narrow/Narrow -- Unstructured/semi-structured data -- ETL and CDC workloads -- Measurement -- Current system design limitations -- New workloads and Big Data -- Big Data workloads -- Technology choices -- Summary -- 9 New Technologies Applied to Data Warehousing -- Introduction -- Data warehouse challenges revisited -- Data loading -- Availability -- Data volumes -- Storage performance -- Query performance -- Data transport -- Data warehouse appliance -- Appliance architecture -- Data distribution in the appliance -- Key best practices for deploying a data warehouse appliance.

Big Data appliances -- Cloud computing -- Infrastructure as a service -- Platform as a service -- Software as a service -- Cloud infrastructure -- Benefits of cloud computing for data warehouse -- Issues facing cloud computing for data warehouse -- Data virtualization -- What is data virtualization? -- Increasing business intelligence performance -- Workload distribution -- Implementing a data virtualization program -- Pitfalls to avoid when using data virtualization -- In-memory technologies -- Benefits of in-memory architectures -- Summary -- Further reading -- 3 BUILDING THE BIG DATA - DATA WAREHOUSE -- 10 Integration of Big Data and Data Warehousing -- Introduction -- Components of the new data warehouse -- Data layer -- Algorithms -- Technology layer -- Integration strategies -- Data-driven integration -- Data classification -- Architecture -- Workload -- Analytics -- Physical component integration and architecture -- Data loading -- Data availability -- Data volumes -- Storage performance -- Operational costs -- External data integration -- Hadoop & RDBMS -- Big Data appliances -- Data virtualization -- Semantic framework -- Lexical processing -- Clustering -- Semantic knowledge processing -- Information extraction -- Visualization -- Summary -- 11 Data-Driven Architecture for Big Data -- Introduction -- Metadata -- Technical metadata -- Business metadata -- Contextual metadata -- Process design-level metadata -- Program-level metadata -- Infrastructure metadata -- Core business metadata -- Operational metadata -- Business intelligence metadata -- Master data management -- Processing data in the data warehouse -- Processing complexity of Big Data -- Processing limitations -- Processing Big Data -- Gather stage -- Analysis stage -- Process stage -- Context processing -- Metadata, master data, and semantic linkage -- Types of probabilistic links.

Standardize.

Abstract:

Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan helps you make sense of how Big Data fits into the world of data warehousing in clear and concise detail. The book is presented in three distinct parts. Part 1 discusses Big Data, its technologies and use cases from early adopters. Part 2 addresses data warehousing, its shortcomings, and new architecture options, workloads, and integration techniques for Big Data and the data warehouse. Part 3 deals with data governance, data visualization, information life-cycle management, data scientists, and implementing a Big Data-ready data warehouse. Extensive appendixes include case studies from vendor implementations and a special segment on how we can build a healthcare information factory. Ultimately, this book will help you navigate through the complex layers of Big Data and data warehousing while providing you with information on how to think effectively about using all these technologies and architectures to design the next-generation data warehouse. Learn how to leverage Big Data by effectively integrating it into your data warehouse. Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies. Understand how to optimize and tune your current data warehouse infrastructure and integrate newer infrastructure matching data processing workloads and requirements.

Local Note:

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
