Cover image for High Performance Parallel Database Processing and Grid Databases.
High Performance Parallel Database Processing and Grid Databases.
Title:
High Performance Parallel Database Processing and Grid Databases.
Author:
Leung, Clement H. C.
ISBN:
9780470391358
Personal Author:
Edition:
1st ed.
Physical Description:
1 online resource (574 pages)
Series:
Wiley Series on Parallel and Distributed Computing Ser. ; v.67

Wiley Series on Parallel and Distributed Computing Ser.
Contents:
High-Performance Parallel Database Processing and Grid Databases -- Contents -- Preface -- Part I Introduction -- 1. Introduction -- 1.1. A Brief Overview: Parallel Databases and Grid Databases -- 1.2. Parallel Query Processing: Motivations -- 1.3. Parallel Query Processing: Objectives -- 1.3.1. Speed Up -- 1.3.2. Scale Up -- 1.3.3. Parallel Obstacles -- 1.4. Forms of Parallelism -- 1.4.1. Interquery Parallelism -- 1.4.2. Intraquery Parallelism -- 1.4.3. Intraoperation Parallelism -- 1.4.4. Interoperation Parallelism -- 1.4.5. Mixed Parallelism-A More Practical Solution -- 1.5. Parallel Database Architectures -- 1.5.1. Shared-Memory and Shared-Disk Architectures -- 1.5.2. Shared-Nothing Architecture -- 1.5.3. Shared-Something Architecture -- 1.5.4. Interconnection Networks -- 1.6. Grid Database Architecture -- 1.7. Structure of this Book -- 1.8. Summary -- 1.9. Bibliographical Notes -- 1.10. Exercises -- 2. Analytical Models -- 2.1. Cost Models -- 2.2. Cost Notations -- 2.2.1. Data Parameters -- 2.2.2. Systems Parameters -- 2.2.3. Query Parameters -- 2.2.4. Time Unit Costs -- 2.2.5. Communication Costs -- 2.3. Skew Model -- 2.4. Basic Operations in Parallel Databases -- 2.4.1. Disk Operations -- 2.4.2. Main Memory Operations -- 2.4.3. Data Computation and Data Distribution -- 2.5. Summary -- 2.6. Bibliographical Notes -- 2.7. Exercises -- Part II Basic Query Parallelism -- 3. Parallel Search -- 3.1. Search Queries -- 3.1.1. Exact-Match Search -- 3.1.2. Range Search Query -- 3.1.3. Multiattribute Search Query -- 3.2. Data Partitioning -- 3.2.1. Basic Data Partitioning -- 3.2.2. Complex Data Partitioning -- 3.3. Search Algorithms -- 3.3.1. Serial Search Algorithms -- 3.3.2. Parallel Search Algorithms -- 3.4. Summary -- 3.5. Bibliographical Notes -- 3.6. Exercises -- 4. Parallel Sort and GroupBy.

4.1. Sorting, Duplicate Removal, and Aggregate Queries -- 4.1.1. Sorting and Duplicate Removal -- 4.1.2. Scalar Aggregate -- 4.1.3. GroupBy -- 4.2. Serial External Sorting Method -- 4.3. Algorithms for Parallel External Sort -- 4.3.1. Parallel Merge-All Sort -- 4.3.2. Parallel Binary-Merge Sort -- 4.3.3. Parallel Redistribution Binary-Merge Sort -- 4.3.4. Parallel Redistribution Merge-All Sort -- 4.3.5. Parallel Partitioned Sort -- 4.4. Parallel Algorithms for GroupBy Queries -- 4.4.1. Traditional Methods (Merge-All and Hierarchical Merging) -- 4.4.2. Two-Phase Method -- 4.4.3. Redistribution Method -- 4.5. Cost Models for Parallel Sort -- 4.5.1. Cost Models for Serial External Merge-Sort -- 4.5.2. Cost Models for Parallel Merge-All Sort -- 4.5.3. Cost Models for Parallel Binary-Merge Sort -- 4.5.4. Cost Models for Parallel Redistribution Binary-Merge Sort -- 4.5.5. Cost Models for Parallel Redistribution Merge-All Sort -- 4.5.6. Cost Models for Parallel Partitioned Sort -- 4.6. Cost Models for Parallel GroupBy -- 4.6.1. Cost Models for Parallel Two-Phase Method -- 4.6.2. Cost Models for Parallel Redistribution Method -- 4.7. Summary -- 4.8. Bibliographical Notes -- 4.9. Exercises -- 5. Parallel Join -- 5.1. Join Operations -- 5.2. Serial Join Algorithms -- 5.2.1. Nested-Loop Join Algorithm -- 5.2.2. Sort-Merge Join Algorithm -- 5.2.3. Hash-Based Join Algorithm -- 5.2.4. Comparison -- 5.3. Parallel Join Algorithms -- 5.3.1. Divide and Broadcast-Based Parallel Join Algorithms -- 5.3.2. Disjoint Partitioning-Based Parallel Join Algorithms -- 5.4. Cost Models -- 5.4.1. Cost Models for Divide and Broadcast -- 5.4.2. Cost Models for Disjoint Partitioning -- 5.4.3. Cost Models for Local Join -- 5.5. Parallel Join Optimization -- 5.5.1. Optimizing Main Memory -- 5.5.2. Load Balancing -- 5.6. Summary -- 5.7. Bibliographical Notes -- 5.8. Exercises.

Part III Advanced Parallel Query Processing -- 6. Parallel GroupBy-Join -- 6.1. Groupby-Join Queries -- 6.1.1. Groupby Before Join -- 6.1.2. Groupby After Join -- 6.2. Parallel Algorithms for Groupby-Before-Join Query Processing -- 6.2.1. Early Distribution Scheme -- 6.2.2. Early GroupBy with Partitioning Scheme -- 6.2.3. Early GroupBy with Replication Scheme -- 6.3. Parallel Algorithms for Groupby-After-Join Query Processing -- 6.3.1. Join Partitioning Scheme -- 6.3.2. GroupBy Partitioning Scheme -- 6.4. Cost Model Notations -- 6.5. Cost Model for Groupby-Before-Join Query Processing -- 6.5.1. Cost Models for the Early Distribution Scheme -- 6.5.2. Cost Models for the Early GroupBy with Partitioning Scheme -- 6.5.3. Cost Models for the Early GroupBy with Replication Scheme -- 6.6. Cost Model for "Groupby-After-Join" Query Processing -- 6.6.1. Cost Models for the Join Partitioning Scheme -- 6.6.2. Cost Models for the GroupBy Partitioning Scheme -- 6.7. Summary -- 6.8. Bibliographical Notes -- 6.9. Exercises -- 7. Parallel Indexing -- 7.1. Parallel Indexing-an Internal Perspective on Parallel Indexing Structures -- 7.2. Parallel Indexing Structures -- 7.2.1. Nonreplicated Indexing (NRI) Structures -- 7.2.2. Partially Replicated Indexing (PRI) Structures -- 7.2.3. Fully Replicated Indexing (FRI) Structures -- 7.3. Index Maintenance -- 7.3.1. Maintaining a Parallel Nonreplicated Index -- 7.3.2. Maintaining a Parallel Partially Replicated Index -- 7.3.3. Maintaining a Parallel Fully Replicated Index -- 7.3.4. Complexity Degree of Index Maintenance -- 7.4. Index Storage Analysis -- 7.4.1. Storage Cost Models for Uniprocessors -- 7.4.2. Storage Cost Models for Parallel Processors -- 7.5. Parallel Processing of Search Queries using Index -- 7.5.1. Parallel One-Index Search Query Processing -- 7.5.2. Parallel Multi-Index Search Query Processing.

7.6. Parallel Index Join Algorithms -- 7.6.1. Parallel One-Index Join -- 7.6.2. Parallel Two-Index Join -- 7.7. Comparative Analysis -- 7.7.1. Comparative Analysis of Parallel Search Index -- 7.7.2. Comparative Analysis of Parallel Index Join -- 7.8. Summary -- 7.9. Bibliographical Notes -- 7.10. Exercises -- 8. Parallel Universal Qualification-Collection Join Queries -- 8.1. Universal Quantification and Collection Join -- 8.2. Collection Types and Collection Join Queries -- 8.2.1. Collection-Equi Join Queries -- 8.2.2. Collection-Intersect Join Queries -- 8.2.3. Subcollection Join Queries -- 8.3. Parallel Algorithms for Collection Join Queries -- 8.4. Parallel Collection-Equi Join Algorithms -- 8.4.1. Disjoint Data Partitioning -- 8.4.2. Parallel Double Sort-Merge Collection-Equi Join Algorithm -- 8.4.3. Parallel Sort-Hash Collection-Equi Join Algorithm -- 8.4.4. Parallel Hash Collection-Equi Join Algorithm -- 8.5. Parallel Collection-Intersect Join Algorithms -- 8.5.1. Non-Disjoint Data Partitioning -- 8.5.2. Parallel Sort-Merge Nested-Loop Collection-Intersect Join Algorithm -- 8.5.3. Parallel Sort-Hash Collection-Intersect Join Algorithm -- 8.5.4. Parallel Hash Collection-Intersect Join Algorithm -- 8.6. Parallel Subcollection Join Algorithms -- 8.6.1. Data Partitioning -- 8.6.2. Parallel Sort-Merge Nested-Loop Subcollection Join Algorithm -- 8.6.3. Parallel Sort-Hash Subcollection Join Algorithm -- 8.6.4. Parallel Hash Subcollection Join Algorithm -- 8.7. Summary -- 8.8. Bibliographical Notes -- 8.9. Exercises -- 9. Parallel Query Scheduling and Optimization -- 9.1. Query Execution Plan -- 9.2. Subqueries Execution Scheduling Strategies -- 9.2.1. Serial Execution Among Subqueries -- 9.2.2. Parallel Execution Among Subqueries -- 9.3. Serial vs. Parallel Execution Scheduling -- 9.3.1. Nonskewed Subqueries -- 9.3.2. Skewed Subqueries.

9.3.3. Skewed and Nonskewed Subqueries -- 9.4. Scheduling Rules -- 9.5. Cluster Query Processing Model -- 9.5.1. Overview of Dynamic Query Processing -- 9.5.2. A Cluster Query Processing Architecture -- 9.5.3. Load Information Exchange -- 9.6. Dynamic Cluster Query Optimization -- 9.6.1. Correction -- 9.6.2. Migration -- 9.6.3. Partition -- 9.7. Other Approaches to Dynamic Query Optimization -- 9.8. Summary -- 9.9. Bibliographical Notes -- 9.10. Exercises -- Part IV Grid Databases -- 10. Transactions in Distributed and Grid Databases -- 10.1. Grid Database Challenges -- 10.2. Distributed Database Systems and Multidatabase Systems -- 10.2.1. Distributed Database Systems -- 10.2.2. Multidatabase Systems -- 10.3. Basic Definitions on Transaction Management -- 10.4. Acid Properties of Transactions -- 10.5. Transaction Management in Various Database Systems -- 10.5.1. Transaction Management in Centralized and Homogeneous Distributed Database Systems -- 10.5.2. Transaction Management in Heterogeneous Distributed Database Systems -- 10.6. Requirements in Grid Database Systems -- 10.7. Concurrency Control Protocols -- 10.8. Atomic Commit Protocols -- 10.8.1. Homogeneous Distributed Database Systems -- 10.8.2. Heterogeneous Distributed Database Systems -- 10.9. Replica Synchronization Protocols -- 10.9.1. Network Partitioning -- 10.9.2. Replica Synchronization Protocols -- 10.10. Summary -- 10.11. Bibliographical Notes -- 10.12. Exercises -- 11. Grid Concurrency Control -- 11.1. A Grid Database Environment -- 11.2. An Example -- 11.3. Grid Concurrency Control -- 11.3.1. Basic Functions Required by GCC -- 11.3.2. Grid Serializability Theorem -- 11.3.3. Grid Concurrency Control Protocol -- 11.3.4. Revisiting the Earlier Example -- 11.3.5. Comparison with Traditional Concurrency Control Protocols -- 11.4. Correctness of GCC Protocol.

11.5. Features of GCC Protocol.
Abstract:
David Taniar, PhD, lectures in information technology at Monash University, Australia. Dr. Taniar has published extensively in the field of high- performance parallel databases and is the Editor in Chief of the International Journal of Data Warehousing and Mining. Clement H. C. Leung, PhD, is Foundation Chair in Computer Science at Victoria University, Australia. Dr. Leung previously held the Established Chair in Computer Science at the University of London. Wenny Rahayu, PhD, is Associate Professor at La Trobe University, Australia, and actively works in the areas of database design and implementation, covering object-relational databases and Web databases. Sushant Goel, PhD, is a software consultant and holds a PhD in computer systems engineering from RMIT University, Australia. His research interests are in grid transaction management and software development processes, such as agile computing.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Electronic Access:
Click to View
Holds: Copies: