CUDA Toolkit 12.6 Instant
The bundled Nsight Systems 2024.5 is excellent. Its new "Kernel Fusion Candidate" detection helps identify naive kernel launches that could be fused by hand. The memory pool allocator in the CUDA Driver API is also less chatty with the OS, cutting allocation overhead by roughly 15% in dynamic-shape workloads.
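For readers who haven't used the pooled allocator, here is a minimal sketch of the stream-ordered allocation pattern it serves. It assumes the CUDA runtime API (`cudaMallocAsync`, `cudaFreeAsync`, `cudaMemPoolSetAttribute`), which wraps the same memory pools the driver API exposes through `cuMemAllocAsync` and `cuMemPoolSetAttribute`. Raising the default pool's release threshold lets freed blocks stay cached in the pool instead of being returned to the OS at each synchronization. The helper names `set_pool_release_threshold` and `run_dynamic_shapes`, the `fill` kernel, and the buffer sizes are hypothetical illustrations. This sketch does not reproduce the ~15% figure.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): keep up to `threshold_bytes` of freed
// memory cached in the device's default pool instead of releasing it to the
// OS at each synchronization (the default threshold is 0).
cudaError_t set_pool_release_threshold(int device, uint64_t threshold_bytes) {
    cudaMemPool_t pool;
    cudaError_t err = cudaDeviceGetDefaultMemPool(&pool, device);
    if (err != cudaSuccess) return err;
    return cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold,
                                   &threshold_bytes);
}

// Trivial kernel so that each allocation is actually touched.
__global__ void fill(float* data, size_t n, float value) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical stand-in for a dynamic-shape workload: allocate a buffer of a
// different size each iteration, use it, and free it, all in stream order.
cudaError_t run_dynamic_shapes(cudaStream_t stream, const size_t* sizes, int count) {
    for (int k = 0; k < count; ++k) {
        if (sizes[k] == 0) continue;  // a zero-block launch would be an error
        float* buf = nullptr;
        cudaError_t err = cudaMallocAsync(reinterpret_cast<void**>(&buf),
                                          sizes[k] * sizeof(float), stream);
        if (err != cudaSuccess) return err;
        const unsigned threads = 256;
        const unsigned blocks =
            static_cast<unsigned>((sizes[k] + threads - 1) / threads);
        fill<<<blocks, threads, 0, stream>>>(buf, sizes[k], 1.0f);
        cudaError_t launch_err = cudaGetLastError();
        // Freed memory goes back to the pool; with a high release threshold it
        // can be reused by the next cudaMallocAsync rather than returned to the OS.
        err = cudaFreeAsync(buf, stream);
        if (launch_err != cudaSuccess) return launch_err;
        if (err != cudaSuccess) return err;
    }
    return cudaStreamSynchronize(stream);
}
```

Passing `UINT64_MAX` as the threshold tells the pool not to trim on synchronization. A finite value caps how much idle memory the process keeps reserved.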
Finally, official support for Clang 18 and GCC 13.2. This is a lifesaver for developers using modern C++ features (C++20/23) in scientific computing. The NVCC frontend also feels noticeably more robust with complex template metaprogramming.
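The following sketch shows the kind of modern C++ this enables: a C++20 concept-constrained kernel template. It assumes a build along the lines of `nvcc -std=c++20 -ccbin g++-13 axpy.cu`. The `g++-13` binary name and the file name depend on your local setup, and `axpy` and `launch_axpy` are hypothetical names used only for illustration.

```cuda
#include <concepts>
#include <cuda_runtime.h>

// Hypothetical kernel: the C++20 constraint rejects non-floating-point types
// at compile time (axpy<int> would not compile).
template <std::floating_point T>
__global__ void axpy(T a, const T* x, T* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical host-side launcher carrying the same constraint.
template <std::floating_point T>
cudaError_t launch_axpy(T a, const T* d_x, T* d_y, int n, cudaStream_t stream) {
    if (n <= 0) return cudaSuccess;  // avoid a zero-block launch
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    axpy<<<blocks, threads, 0, stream>>>(a, d_x, d_y, n);
    return cudaGetLastError();
}
```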
Rating: 4.5/5