Jianbin Fang, Delft University of Technology

Using “local memory” is commonly perceived as synonymous with a performance improvement on GPUs and a performance penalty on CPUs. However, with increasing architectural diversity, neither can be taken for granted: the mapping of local memory to hardware, as well as its interaction with caches, may lead to divergent performance behaviour for applications using local memory. This is especially important when applications are migrated from GPUs to CPUs by relying on OpenCL’s portability.

In this work, we present a systematic evaluation of the impact of local memory on the performance of different classes of applications, as represented by their memory access patterns (MAPs). We further show how these results can be used to derive performance estimates. Finally, we discuss the overall impact of local memory on different classes of platforms.