Session Description
Title: Advanced CUDA Training
Track: Technical Conference
Schedule: August 26, 2008 04:00 PM - 06:00 PM
SJCC, Room J1&4

- and -

August 27, 2008 11:00 AM - 01:00 PM
SJCC, Room J3
Learn to optimize CUDA code and fully exploit the power of NVIDIA's new double-precision, teraflop GPU. Optimized code can run over 100x faster on this chip than on a CPU, giving a workstation the power of a compute cluster and a compute cluster the power of a supercomputer.

In this session, we will look at the architecture of the T10 GPU, then work through specific examples of optimizing particle simulations and finite difference methods on that architecture. Optimization topics will include general rules for optimization, measuring performance with the Visual Profiler, optimizing host-to-device memory transfers, making efficient use of shared memory, coalescing memory accesses, optimizing the execution configuration, and maximizing arithmetic intensity.



Speakers
Brent Oster (NVIDIA)
 
Greg Ruetsch (NVIDIA)
 
