Programming Massively Parallel Processors: A Hands-on Approach 3rd Edition
Publisher: Morgan Kaufmann
Author: Kirk
ISBN: 9780128119860
Publication date: 2016-12
Book type: Foreign book
Language: English
Pages: 576
Price: 69,000 KRW
Availability: Please inquire about stock
Table of Contents
Dedication
Preface
Target Audience
How to Use the Book
Illinois–NVIDIA GPU Teaching Kit
Online Supplements
Acknowledgements
Chapter 1. Introduction
Abstract
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
Chapter 2. Data parallel computing
Abstract
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
References
Chapter 3. Scalable parallel execution
Abstract
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
Chapter 4. Memory and data locality
Abstract
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
Chapter 5. Performance considerations
Abstract
5.1 Global Memory Bandwidth
5.2 More on Memory Parallelism
5.3 Warps and SIMD Hardware
5.4 Dynamic Partitioning of Resources
5.5 Thread Granularity
5.6 Summary
References
Chapter 6. Numerical considerations
Abstract
6.1 Floating-Point Data Representation
6.2 Representable Numbers
6.3 Special Bit Patterns and Precision in IEEE Format
6.4 Arithmetic Accuracy and Rounding
6.5 Algorithm Considerations
6.6 Linear Solvers and Numerical Stability
6.7 Summary
References
Chapter 7. Parallel patterns: convolution: An introduction to stencil computation
Abstract
7.1 Background
7.2 1D Parallel Convolution—A Basic Algorithm
7.3 Constant Memory and Caching
7.4 Tiled 1D Convolution with Halo Cells
7.5 A Simpler Tiled 1D Convolution—General Caching
7.6 Tiled 2D Convolution With Halo Cells
7.7 Summary
7.8 Exercises
Chapter 8. Parallel patterns: prefix sum: An introduction to work efficiency in parallel algorithms
Abstract
8.1 Background
8.2 A Simple Parallel Scan
8.3 Speed and Work Efficiency
8.4 A More Work-Efficient Parallel Scan
8.5 An Even More Work-Efficient Parallel Scan
8.6 Hierarchical Parallel Scan for Arbitrary-Length Inputs
8.7 Single-Pass Scan for Memory Access Efficiency
8.8 Summary
8.9 Exercises
References
Chapter 9. Parallel patterns—parallel histogram computation: An introduction to atomic operations and privatization
Abstract
9.1 Background
9.2 Use of Atomic Operations
9.3 Block versus Interleaved Partitioning
9.4 Latency versus Throughput of Atomic Operations
9.5 Atomic Operation in Cache Memory
9.6 Privatization
9.7 Aggregation
9.8 Summary
Reference
Chapter 10. Parallel patterns: sparse matrix computation: An introduction to data compression and regularization
Abstract
10.1 Background
10.2 Parallel SpMV Using CSR
10.3 Padding and Transposition
10.4 Using a Hybrid Approach to Regulate Padding
10.5 Sorting and Partitioning for Regularization
10.6 Summary
References
Chapter 11. Parallel patterns: merge sort: An introduction to tiling with dynamic input data identification
Abstract
11.1 Background
11.2 A Sequential Merge Algorithm
11.3 A Parallelization Approach
11.4 Co-Rank Function Implementation
11.5 A Basic Parallel Merge Kernel
11.6 A Tiled Merge Kernel
11.7 A Circular-Buffer Merge Kernel
11.8 Summary
Reference
Chapter 12. Parallel patterns: graph search
Abstract
12.1 Background
12.2 Breadth-First Search
12.3 A Sequential BFS Function
12.4 A Parallel BFS Function
12.5 Optimizations
12.6 Summary
References
Chapter 13. CUDA dynamic parallelism
Abstract
13.1 Background
13.2 Dynamic Parallelism Overview
13.3 A Simple Example
13.4 Memory Data Visibility
13.5 Configurations and Memory Management
13.6 Synchronization, Streams, and Events
13.7 A More Complex Example
13.8 A Recursive Example
13.9 Summary
References
A13.1 Code Appendix
Chapter 14. Application case study—non-Cartesian magnetic resonance imaging: An introduction to statistical estimation methods
Abstract
14.1 Background
14.2 Iterative Reconstruction
14.3 Computing FHD
14.4 Final Evaluation
References
Chapter 15. Application case study—molecular visualization and analysis
Abstract
15.1 Background
15.2 A Simple Kernel Implementation
15.3 Thread Granularity Adjustment
15.4 Memory Coalescing
15.5 Summary
References
Chapter 16. Application case study—machine learning
Abstract
16.1 Background
16.2 Convolutional Neural Networks
16.3 Convolutional Layer: A Basic CUDA Implementation of Forward Propagation
16.4 Reduction of Convolutional Layer to Matrix Multiplication
16.5 cuDNN Library
References
Chapter 17. Parallel programming and computational thinking
Abstract
17.1 Goals of Parallel Computing
17.2 Problem Decomposition
17.3 Algorithm Selection
17.4 Computational Thinking
17.5 Single Program, Multiple Data, Shared Memory and Locality
17.6 Strategies for Computational Thinking
17.7 A Hypothetical Example: Sodium Map of the Brain
17.8 Summary
References
Chapter 18. Programming a heterogeneous computing cluster
Abstract
18.1 Background
18.2 A Running Example
18.3 Message Passing Interface Basics
18.4 Message Passing Interface Point-to-Point Communication
18.5 Overlapping Computation and Communication
18.6 Message Passing Interface Collective Communication
18.7 CUDA-Aware Message Passing Interface
18.8 Summary
Reference
Chapter 19. Parallel programming with OpenACC
Abstract
19.1 The OpenACC Execution Model
19.2 OpenACC Directive Format
19.3 OpenACC by Example
19.4 Comparing OpenACC and CUDA
19.5 Interoperability with CUDA and Libraries
19.6 The Future of OpenACC
Chapter 20. More on CUDA and graphics processing unit computing
Abstract
20.1 Model of Host/Device Interaction
20.2 Kernel Execution Control
20.3 Memory Bandwidth and Compute Throughput
20.4 Programming Environment
20.5 Future Outlook
References
Chapter 21. Conclusion and outlook
Abstract
21.1 Goals Revisited
21.2 Future Outlook
Appendix A. An introduction to OpenCL
A.1 Background
A.2 Data Parallelism Model
A.3 Device Architecture
A.4 Kernel Functions
A.5 Device Management and Kernel Launch
A.6 Electrostatic Potential Map in OpenCL
A.7 Summary
Appendix B. THRUST: a productivity-oriented library for CUDA
B.1 Background
B.2 Motivation
B.3 Basic Thrust Features
B.4 Generic Programming
B.5 Benefits of Abstraction
B.6 Best Practices
Appendix C. CUDA Fortran
C.1 CUDA Fortran and CUDA C Differences
C.2 A First CUDA Fortran Program
C.3 Multidimensional Array in CUDA Fortran
C.4 Overloading Host/Device Routines with Generic Interfaces
C.5 Calling CUDA C via ISO_C_Binding
C.6 Kernel Loop Directives and Reduction Operations
C.7 Dynamic Shared Memory
C.8 Asynchronous Data Transfers
C.9 Compilation and Profiling
C.10 Calling Thrust from CUDA Fortran
Appendix D. An introduction to C++ AMP
D.1 Core C++ AMP Features
D.2 Details of the C++ AMP Execution Model
D.3 Managing Accelerators
D.4 Tiled Execution
D.5 C++ AMP Graphics Features
D.6 Summary
Reference
Index
Book Description


Programming Massively Parallel Processors: A Hands-on Approach, Third Edition introduces students and professionals alike to the basic concepts of parallel programming and GPU architecture, and explores in detail various techniques for constructing parallel programs.

Case studies demonstrate the development process, detailing computational thinking and ending with effective and efficient parallel programs. Performance, floating-point format, parallel patterns, and dynamic parallelism are covered in depth.

For this new edition, the authors have updated their coverage of CUDA, including newer libraries such as cuDNN; moved content that has become less important to appendices; added two new chapters on parallel patterns; and updated the case studies to reflect current industry practices.

Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing
Utilizes CUDA version 7.5, NVIDIA's software development tool created specifically for massively parallel environments
Contains new and updated case studies
Includes coverage of newer libraries, such as cuDNN for deep learning
