
Heterogeneous Computing with OpenCL : Revised OpenCL 1.2 Edition.
Title:
Heterogeneous Computing with OpenCL : Revised OpenCL 1.2 Edition.
Author:
Gaster, Benedict.
ISBN:
9780124055209
Edition:
2nd ed.
Physical Description:
1 online resource (309 pages)
Contents:
Front Cover -- Heterogeneous Computing with OpenCL -- Copyright -- Contents -- Foreword to the Revised OpenCL 1.2 Edition -- Foreword to the First Edition -- Preface -- Our Heterogeneous World -- OpenCL -- This Text -- Acknowledgments -- About the Authors -- Chapter 1: Introduction to Parallel Programming -- Introduction -- OpenCL -- The Goals of This Book -- Thinking Parallel -- Concurrency and Parallel Programming Models -- Threads and Shared Memory -- Message-Passing Communication -- Different Grains of Parallelism -- Data Sharing and Synchronization -- Structure -- Reference -- Further Reading and Relevant Websites -- Chapter 2: Introduction to OpenCL -- Introduction -- The OpenCL Standard -- The OpenCL Specification -- Kernels and the OpenCL Execution Model -- Platform and Devices -- Host-Device Interaction -- The Execution Environment -- Contexts -- Command Queues -- Events -- Memory Objects -- Buffers -- Images -- Flush and Finish -- Creating an OpenCL Program Object -- The OpenCL Kernel -- Memory Model -- Writing Kernels -- Full Source Code Example for Vector Addition -- Vector Addition with C++ Wrapper -- Summary -- Reference -- Chapter 3: OpenCL Device Architectures -- Introduction -- Hardware trade-offs -- Performance Increase by Frequency, and Its Limitations -- Superscalar Execution -- VLIW -- SIMD and Vector Processing -- Hardware Multithreading -- Multi-Core Architectures -- Integration: Systems-on-Chip and the APU -- Cache Hierarchies and Memory Systems -- The architectural design space -- CPU Designs -- Low-Power CPUs -- Mainstream Desktop CPUs -- Intel Itanium 2 -- Niagara -- GPU Architectures -- Handheld GPUs -- At the High End: AMD Radeon HD7970 and NVIDIA GTX580 -- APU and APU-Like Designs -- Summary -- References -- Chapter 4: Basic OpenCL Examples -- Introduction -- Example Applications -- Simple Matrix Multiplication Example.
Step 1: Set Up Environment -- Step 2: Declare Buffers and Move Data -- Step 3: Runtime Kernel Compilation -- Step 4: Run the Program -- Step 5: Return Results to Host -- Image Rotation Example -- Step 1: Set Up Environment -- Step 2: Declare Buffers and Move Data -- Step 3: Runtime Kernel Compilation -- Step 4: Run the Program -- Step 5: Read Result Back to Host -- Image Convolution Example -- Step 1: Create Image and Buffer Objects -- Step 2: Write the Input Data -- Step 3: Create Sampler Object -- Step 4: Compile and Execute the Kernel -- Step 5: Read the Result -- The Convolution Kernel -- Compiling OpenCL Host Applications -- Summary -- Chapter 5: Understanding OpenCL's Concurrency and Execution Model -- Introduction -- Kernels, Work-Items, Workgroups, and the Execution Domain -- OpenCL Synchronization: Kernels, Fences, and Barriers -- Queuing and Global Synchronization -- Memory Consistency in OpenCL -- Events -- Command Queues to Multiple Devices -- Event Uses beyond Synchronization -- User Events -- Event Callbacks -- Native Kernels -- Command Barriers and Markers -- The Host-Side Memory Model -- Buffers -- Manipulating Buffer Objects -- Images -- The Device-Side Memory Model -- Device-Side Relaxed Consistency -- Global Memory -- Local Memory -- Constant Memory -- Private Memory -- Summary -- Chapter 6: Dissecting a CPU/GPU OpenCL Implementation -- Introduction -- OpenCL on an AMD Bulldozer CPU -- OpenCL on the AMD Radeon HD7970 GPU -- Threading and the Memory System -- Instruction Execution on the HD7970 Architecture -- The Shift from VLIW Execution -- Resource Allocation -- Memory Performance Considerations in OpenCL -- OpenCL Global Memory -- Local Memory as a Software-Managed Cache -- Summary -- References -- Chapter 7: Data Management -- Memory management -- Data transfer in a discrete environment -- Optimizations -- Zero-Copy Buffers.
Data placement in a shared-memory environment -- Local Memory -- Cacheable System Memory -- Uncached System Memory -- Example application-work group reduction -- Using a Discrete GPU Device -- Case 1 Using device buffers -- Case 2 Using pinned staging buffers -- Case 3 Using zero-copy buffers -- Case 4 Combination -- Using an APU -- Case 1 Using local memory buffers -- Case 2 Using pinned staging buffers -- Case 3 Using zero-copy buffers -- References -- Chapter 8: OpenCL Case Study -- Introduction -- Convolution Kernel -- Selecting Workgroup Sizes -- Caching Data to Local Memory -- Aligning for Memory Accesses -- Improving Efficiency with Vector Reads -- Performing the Convolution -- Improving Performance with Loop Unrolling -- Conclusions -- Code Listings -- Host Code -- Kernel Code -- Reference -- Chapter 9: OpenCL Case Study -- Introduction -- Choosing the Number of Workgroups -- Choosing the Optimal Workgroup Size -- Optimizing Global Memory Data Access Patterns -- Using Atomics to Perform Local Histogram -- Optimizing Local Memory Access -- Local Histogram Reduction -- The Global Reduction -- Full Kernel Code -- Performance and Summary -- Chapter 10: OpenCL Case Study -- Introduction -- Overview of the Computation -- GPU Implementation -- Buffer Creation -- Building the Acceleration Structure -- Computing Collisions -- Integration -- CPU Implementation -- Load Balancing -- Performance and Summary -- Kernel for Uniform Grid Creation -- Kernels for Simulation -- Chapter 11: OpenCL Extensions -- Introduction -- Overview of Extension Mechanism -- Device Fission -- Double Precision -- References -- Chapter 12: Foreign Lands -- Introduction -- Beyond C and C++ -- Haskell OpenCL -- Module Structure -- Environments -- Reference Counting -- Platform and Devices -- The Execution Environment -- Contexts -- Command Queues -- Buffers.
Creating an OpenCL Program Object -- The OpenCL Kernel -- Full Source Code Example for Vector Addition -- Summary -- References -- Chapter 13: OpenCL Profiling and Debugging -- Introduction -- Profiling with Events -- AMD Accelerated Parallel Processing Profiler -- Collecting OpenCL Application Trace -- Summary Pages View -- API Trace View -- Collecting OpenCL GPU Kernel Performance Counters -- AMD Accelerated Parallel Processing KernelAnalyzer -- Walking through the AMD APP Profiler -- Starting the AMD APP Profiler -- Using the Application Trace to Find the Application Bottleneck -- Using the GPU Performance Counters to Find the Bottleneck in the Kernel -- Debugging OpenCL Applications -- Overview of gDEBugger -- Debugging Parallel OpenCL Applications with gDEBugger -- API-Level Debugging -- Kernel Debugging -- AMD Printf Extension -- Conclusion -- Chapter 14: Performance Optimization of an Image Analysis Application -- Introduction -- Description of the algorithm -- Migrating multithreaded CPU implementation to OpenCL -- Hotspot Analysis -- Kernel Development and Static Analysis -- Performance optimization -- Kernel Occupancy -- Kernel Occupancy for AMD Radeon HD5000/6000 Series -- Kernel Occupancy for AMD Radeon HD 7000 -- Impact of Workgroup Size -- Impact of VGPR and LDS -- Power and performance analysis -- Conclusion -- References.
Abstract:
Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials. Explains principles and strategies to learn parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications. Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more. Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures. Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.