TotalView for OpenPOWER, CUDA, and OpenMP
This talk from ScicomP 2015 aims to help you debug numerical simulations more effectively. It reviews best practices for debugging CUDA- and OpenACC-accelerated applications and discusses the development of OpenMP-specific tracing and debugging interfaces, including OMPD, the OpenMP interface for debuggers.