CUDA programming can be difficult because of its unfamiliar hardware architecture. CUDA is still new enough that few people have the experience to say definitively what the best approach to writing performant code is. In this post, we will revisit Vasily Volkov’s talk, Better Performance at Lower Occupancy, to show the importance of instruction-level parallelism (ILP).
Since the last NumbaPro blog post, there have been significant enhancements to the CUDA-Python compiler. For better GPU support, NumbaPro 0.12 uses CUDA 5.5, which contains the first official release of NVVM. The new NVVM includes the libdevice math function library, allowing NumbaPro to provide the same math function support as CUDA-C. We will highlight a few important changes in this post.
The GPU revolution of the past few years provides inexpensive access to hundreds of specialized computational units on a single silicon die. The challenge is accessing these units efficiently and developing or adapting algorithms that can harness their power. Here, I’ll show how the NumbaPro module from Anaconda Accelerate can be used to parallelize a standard option pricing algorithm on a GPU, giving a 14x speedup with only a few extra lines of code.
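The post's exact algorithm isn't reproduced here, but a common example of this kind of workload is Monte Carlo pricing of a European call option: simulate many terminal prices under geometric Brownian motion, average the discounted payoffs, and the per-path loop parallelizes naturally onto GPU threads. Below is a minimal pure-Python CPU sketch (the function name and parameters are illustrative, not from the original post):

```python
import math
import random

def monte_carlo_call_price(spot, strike, rate, vol, maturity, n_paths, seed=42):
    """Estimate a European call price by averaging discounted payoffs
    over simulated geometric-Brownian-motion terminal prices.
    Illustrative sketch; not the code from the original post."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    vol_sqrt_t = vol * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                       # standard normal draw
        terminal = spot * math.exp(drift + vol_sqrt_t * z)
        total_payoff += max(terminal - strike, 0.0)   # call payoff
    return math.exp(-rate * maturity) * total_payoff / n_paths

# With 100,000 paths this lands close to the Black-Scholes
# analytic value (about 10.45 for these parameters).
price = monte_carlo_call_price(100.0, 100.0, 0.05, 0.2, 1.0, 100_000)
```

Each simulated path is independent, which is exactly the property that lets NumbaPro map the loop body onto thousands of GPU threads.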
Python is a great language for writing experimental code quickly. With PyGame, one can add graphics to visualize the experiments effortlessly. I always enjoy using PyGame for rendering simple physics simulations in real time. However, as the simulation scales up, Python can become a performance bottleneck.
In Peter Wang’s SciPy Evangelism 101 talk, he discussed “The Truth About Good Programmers.” He says good programmers are