Research Paper

Presented by Dr John Cavazos, University of Delaware
Until 2010, fewer than ten systems in the TOP500 contained hardware accelerators. Today, more than fifty systems contain hardware accelerators, and they account for 35% of the computing capability of all the systems in the TOP500. This trend began with NVIDIA’s CUDA (Compute Unified Device Architecture). With CUDA, developers have been able to leverage the computational capabilities of GPUs (Graphics Processing Units).

In 2011, Intel introduced its own accelerator, the Xeon Phi, based on Intel’s Many Integrated Core (MIC) architecture. Following the introduction of the Xeon Phi, the Open Computing Language (OpenCL) has been gaining popularity. While OpenCL is similar to CUDA, it is not restricted to NVIDIA accelerators: it has been implemented by many vendors, including Intel (CPU, GPU, Xeon Phi), AMD (APU), NVIDIA (GPU), and Altera (FPGA).

However, both CUDA and OpenCL require expert programmers and significant modifications of the source code to achieve peak performance. Languages based on compiler directives are easier to use; OpenMP (Open Multi-Processing) is one such example. The same technique is used by OpenACC, a directive-based extension for C, C++, and Fortran. Whereas OpenMP targets shared-memory systems, OpenACC targets hardware accelerators.
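
As a minimal illustration (not one of the paper's benchmarks), a vector addition can be offloaded to an accelerator by annotating an ordinary C loop with OpenACC directives; the loop body itself is left unchanged:

    /* Hypothetical OpenACC example: the 'parallel loop' directive asks the
       compiler to turn the loop into an accelerator kernel, while the
       copyin/copyout clauses describe the required data movement. */
    void vec_add(int n, const float *a, const float *b, float *c)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }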
In this paper, we present the generation of OpenCL C kernels from OpenACC-annotated code. This kernel generator is one module of our open-source OpenACC compiler, which we developed using ROSE, a source-to-source compiler framework. First, we will look at both the OpenACC and OpenCL execution models. Next, we will present our method, which leverages the inexact mapping of OpenACC to OpenCL to generate multiple kernels. In our experiments, we generated OpenCL C kernels for some linear algebra computations. We will execute these kernels on different accelerators and discuss the performance of the different generated kernels on each accelerator.
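
For the vector-addition loop sketched above, one plausible OpenCL C kernel maps each loop iteration to one work-item; this is only an illustrative sketch, and the kernels actually produced by the generator may differ (for instance, several kernel versions per loop nest):

    /* Illustrative OpenCL C sketch: each work-item computes one element.
       The bound check handles global sizes rounded up by the host runtime. */
    __kernel void vec_add_kernel(const int n,
                                 __global const float *a,
                                 __global const float *b,
                                 __global float *c)
    {
        int i = get_global_id(0);   /* work-item index plays the role of 'i' */
        if (i < n)
            c[i] = a[i] + b[i];
    }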