GPU Computing Gems Emerald Edition.
Title:
GPU Computing Gems Emerald Edition.
Author:
Hwu, Wen-mei W.
ISBN:
9780123849892
Physical Description:
1 online resource (889 pages)
Series:
Applications of GPU Computing Series
Contents:
Front Cover -- GPU Computing Gems -- Copyright -- Table of Contents -- Editors, Reviewers, and Authors -- Introduction -- Section 1: Scientific Simulation -- Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals -- 1.1. Introduction, Problem Statement, and Context -- 1.2. Core Method -- 1.3. Algorithms, Implementations, and Evaluations -- 1.4. Final Evaluation -- 1.5. Future Directions -- References -- Chapter 2. Large-Scale Chemical Informatics on GPUs -- 2.1. Introduction, Problem Statement, and Context -- 2.2. Core Methods -- 2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization -- 2.4. LINGO: Algorithmic Transformation and Memory Optimization -- 2.5. Final Evaluation -- 2.6. Future Directions -- Acknowledgments -- References -- Chapter 3. Dynamical Quadrature Grids: Applications in Density Functional Calculations -- 3.1. Introduction -- 3.2. Core Method -- 3.3. Implementation -- 3.4. Performance Improvement -- 3.5. Future Work -- References -- Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs -- 4.1. Introduction, Problem Statement, and Context -- 4.2. Core Method -- 4.3. Algorithms, Implementations, and Evaluations -- 4.4. Final Evaluation -- 4.5. Future Directions -- References -- Chapter 5. Quantum Chemistry: Propagation of Electronic Structure on a GPU -- 5.1. Problem Statement -- 5.2. Core Technology and Algorithm -- 5.3. The Key Insight on the Implementation-the Choice of Building Blocks -- 5.4. Final Evaluation and Benefits -- 5.5. Conclusions and Future Directions -- Acknowledgments -- References -- Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm -- 6.1. Introduction, Problem Statement, and Context -- 6.2. Core Methods -- 6.3. Algorithms and Implementations -- 6.4. Evaluation and Validation of Results, Total Benefits, and Limitations.

6.5. Future Directions -- Acknowledgments -- References -- Chapter 7. Leveraging the Untapped Computation Power of GPUs: Fast Spectral Synthesis Using Texture Interpolation -- 7.1. Background and Problem Statement -- 7.2. Flux Calculation and Aggregation -- 7.3. The GRASSY Platform -- 7.4. Initial Testing -- 7.5. Impact and Future Directions -- Acknowledgments -- References -- Chapter 8. Black Hole Simulations with CUDA -- 8.1. Introduction -- 8.2. The Post-Newtonian Approximation -- 8.3. Numerical Algorithm -- 8.4. GPU Implementation -- 8.5. Performance Results -- 8.6. GPU Supercomputing Clusters -- 8.7. Statistical Results for Black Hole Inspirals -- 8.8. Conclusion -- Acknowledgments -- References -- Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA -- 9.1. Introduction -- 9.2. Fast N-Body Simulation -- 9.3. CUDA Implementation of the Fast N-Body Algorithms -- 9.4. Improvements of Performance -- 9.5. Detailed Description of the GPU Kernels -- 9.6. Overview of Advanced Techniques -- 9.7. Conclusions -- References -- Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures -- 10.1. Introduction, Problem Statement, and Context -- 10.2. Core Method -- 10.3. Algorithms, Implementations, and Evaluations -- 10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations -- 10.5. Conclusions and Future Directions -- References -- Section 2: Life Sciences -- Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm -- 11.1. Introduction, Problem Statement, and Context -- 11.2. Core Method -- 11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins -- 11.4. Discussion -- 11.5. Final Evaluation -- References -- Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching.

12.1. Introduction, Problem Statement, and Context -- 12.2. Core Methods -- 12.3. Algorithms, Implementations, and Evaluations -- 12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations -- 12.5. Future Directions -- References -- Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching -- 13.1. Introduction, Problem Statement, and Context -- 13.2. Core Method -- 13.3. Algorithms, Implementations, and Evaluations -- 13.4. Final Evaluation -- 13.5. Future Direction -- Acknowledgments -- Appendix -- References -- Chapter 14. GPU Accelerated RNA Folding Algorithm -- 14.1. Problem Statement -- 14.2. Core Method -- 14.3. Algorithms, Implementations, and Evaluations -- 14.4. Final Evaluation -- 14.5. Future Directions -- References -- Chapter 15. Temporal Data Mining for Neuroscience -- 15.1. Introduction -- 15.2. Core Methodology -- 15.3. GPU Parallelization: Algorithms and Implementations -- 15.4. Experimental Results -- 15.5. Discussion -- References -- Section 3: Statistical Modeling -- Chapter 16. Parallelization Techniques for Random Number Generators -- 16.1. Introduction -- 16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a -- 16.3. Sobol Generator -- 16.4. Mersenne Twister MT19937 -- 16.5. Performance Benchmarks -- Acknowledgments -- References -- Chapter 17. Monte Carlo Photon Transport on the GPU -- 17.1. Physics of Photon Transport -- 17.2. Photon Transport on the GPU -- 17.3. The Complete System -- 17.4. Results and Evaluation -- 17.5. Future Directions -- References -- Chapter 18. High-Performance Iterated Function Systems -- 18.1. Problem Statement and Mathematical Background -- 18.2. Core Technology -- 18.3. Implementation -- 18.4. Final Evaluation -- 18.5. Conclusion -- References -- Section 4: Emerging Data-Intensive Applications -- Chapter 19. Large-Scale Machine Learning -- 19.1. Introduction.

19.2. Core Technology -- 19.3. GPU Algorithm and Implementation -- 19.4. Improvements of Performance -- 19.5. Conclusions and Future Work -- Acknowledgments -- References -- Chapter 20. Multiclass Support Vector Machine -- 20.1. Introduction, Problem Statement, and Context -- 20.2. Core Method -- 20.3. Algorithms, Implementations, and Evaluations -- 20.4. Final Evaluation -- 20.5. Future Direction -- References -- Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA -- 21.1. Introduction, Problem Statement, and Context -- 21.2. Final Evaluation and Validation of Results -- 21.3. Conclusions, Benefits and Limitations, and Future Work -- References -- Chapter 22. GPU-Accelerated Ant Colony Optimization -- 22.1. Introduction, Problem Statement, and Context -- 22.2. Core Method -- 22.3. Algorithms, Implementations, and Evaluations -- 22.4. Final Evaluation -- 22.5. Future Direction -- Acknowledgments -- References -- Section 5: Electronic Design Automation -- Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs -- 23.1. Introduction -- 23.2. Simulator Overview -- 23.3. Compilation and Simulation -- 23.4. Experimental Results -- 23.5. Future Directions -- Related Work -- References -- Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization -- 24.1. Introduction, Problem Statement, and Context -- 24.2. Core Method -- 24.3. Algorithms, Implementations, and Evaluations -- 24.4. Final Evaluation -- 24.5. Future Direction -- References -- Section 6: Ray Tracing and Rendering -- Chapter 25. Lattice Boltzmann Lighting Models -- 25.1. Introduction, Problem Statement, and Context -- 25.2. Core Methods -- 25.3. Algorithms, Implementation, and Evaluation -- 25.4. Final Evaluation -- 25.5. Future Directions -- 25.6. Derivation of the Diffusion Equation -- Acknowledgments -- References.

Chapter 26. Path Regeneration for Random Walks -- 26.1. Introduction -- 26.2. Path Tracing as Case Study -- 26.3. Random Walks in Path Tracing -- 26.4. Implementation Details -- 26.5. Results -- 26.6. Discussion -- Acknowledgments -- References -- Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation -- 27.1. System Overview -- 27.2. Background -- 27.3. Core Technology and Algorithms -- 27.4. Future Directions -- Acknowledgments -- References -- Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency -- 28.1. Introduction, Problem Statement, and Context -- 28.2. Core Method -- 28.3. Algorithms, Implementations, and Evaluations -- 28.4. Final Evaluation -- 28.5. Future Direction -- References -- Section 7: Computer Vision -- Chapter 29. Fast Graph Cuts for Computer Vision -- 29.1. Introduction, Problem Statement, and Context -- 29.2. Core Method -- 29.3. Algorithms, Implementations, and Evaluations -- 29.4. Final evaluation and validation of results -- 29.5. Multilabel Graph Cuts -- References -- Chapter 30. Visual Saliency Model on Multi-GPU -- 30.1. Introduction -- 30.2. Visual Saliency Model -- 30.3. GPU Implementation -- 30.4. Results -- 30.5. Conclusion -- References -- Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows -- 31.1. Introduction, Problem Statement, and Context -- 31.2. Core Method -- References -- Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU -- 32.1. Introduction -- 32.2. Methods -- 32.3. Implementation -- 32.4. Results and Discussion -- 32.5. Conclusion and Future Work -- References -- Chapter 33. Haar Classifiers for Object Detection with CUDA -- 33.1. Introduction -- 33.2. Viola-Jones Object Detection Retrospective -- 33.3. Object Detection Pipeline with NVIDIA CUDA.

33.4. Benchmarking and Implementation Details.
Abstract:
"...the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk." -Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010

Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally intensive experiments across a range of disciplines. GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including: black hole simulations with CUDA; GPU-accelerated computation and interactive display of molecular orbitals; temporal data mining for neuroscience; GPU-based parallelization for fast circuit optimization; fast graph cuts for computer vision; real-time stereo on GPGPU using progressive multi-resolution adaptive windows; GPU image demosaicing; tomographic image reconstruction from unordered lines with CUDA; medical image processing using GPU-accelerated ITK image filters; and 41 more chapters of innovative GPU computing ideas, written to be accessible to researchers from any domain.

GPU Computing Gems: Emerald Edition is the first volume in Morgan Kaufmann's Applications of GPU Computing Series, offering the latest insights and research in computer vision, electronic design automation, emerging data-intensive applications, life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video/image processing.

Covers the breadth of industry from scientific simulation and electronic design automation to audio/video processing, medical imaging, computer vision, and more. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use.
Local Note:
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2017. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.