Programming heterogeneous platforms requires thorough analysis of applications at the design stage to determine the best data and work decomposition between the CPU and the accelerating hardware. In many cases the application already exists in a conventional CPU programming language such as C++, and the main problem is to determine which parts of the application would benefit from being offloaded to an accelerating device. An even bigger problem is to estimate how much performance increase one should expect from acceleration on a particular heterogeneous platform. Each platform has its own limitations that affect the performance of offloaded compute tasks, e.g. data transfer cost, task initialization overhead, and memory latency and bandwidth constraints. To take these constraints into account, software architects and developers need tooling for collecting the right information and producing recommendations that support the best design and optimization decisions.
In this presentation we will introduce the basics of offload performance estimation analysis and the Offload Advisor tool, which is intended to help with the application design process. Offload Advisor is an extended version of Intel® Advisor, a code modernization, programming guidance, and performance estimation tool that supports the OpenCL and SYCL/Data Parallel C++ languages on CPU and GPU. It provides co-design, performance modeling, analysis, and characterization features for industry-size applications written in C, C++, Fortran, and mixed Python*.
Offload Advisor analysis helps determine which sections of code can be offloaded to a GPU, accelerating the performance of a CPU-based application. It provides metrics and performance data such as projected speedup and a call tree showing offloaded and accelerated regions, identifies key bottlenecks (algorithmic, compute, caches, memory, throughput/latency), and more. It considers not only compute and memory limitations but also the time required to transfer data and to execute the code on the target hardware.
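As a sketch of what this workflow can look like in practice, the following commands assume a oneAPI-era Advisor installation with an `advisor` command-line tool; the flag names, the application name `./myapp`, and the project directory `./advi_results` are illustrative and may differ between Advisor versions.

```shell
# Hypothetical offload-modeling workflow with the Intel Advisor CLI.
# Flag names vary between versions; treat this as a sketch, not a reference.

# 1. Survey: profile the application running on the CPU baseline.
advisor --collect=survey --project-dir=./advi_results -- ./myapp

# 2. Characterization: collect trip counts and FLOP data needed to
#    model compute intensity and data transfer for the accelerator.
advisor --collect=tripcounts --flop --project-dir=./advi_results -- ./myapp

# 3. Projection: model execution on the target GPU and produce the
#    speedup estimate and per-region offload recommendations.
advisor --collect=projection --project-dir=./advi_results
```

The three-step structure (baseline profiling, characterization, projection) reflects the analysis described above: the projected speedup is computed from the measured CPU profile combined with the accelerator performance model.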
The performance estimates provided by the tool are not limited to GPUs. It uses Accelerator Performance Models (APMs) to model a target accelerator. Although to date APMs are available only for the Intel Gen architecture (the graphics module integrated into the CPU) and the Intel Xe architecture (a discrete GPU board), the approach is extendable to future architectures.
The tool is also flexible enough to analyze applications that are already written in a language dedicated to heterogeneous platforms, such as SYCL, but currently running on the CPU. In this case the result of the analysis is a projection of the performance increase if the application were executed on CPU+GPU.
If an application is already designed for a heterogeneous platform, i.e. written in OpenCL and executing its compute tasks on an integrated GPU, Intel Advisor offers GPU Roofline analysis. GPU Roofline analysis helps estimate and visualize the actual performance of GPU kernels, using benchmarks and hardware metric profiling against hardware-imposed performance ceilings, and helps determine the main limiting factor. With GPU profiling it collects OpenCL™ kernel timings and memory data, measures hardware limitations, and collects floating-point and integer operation counts, similarly to Intel Advisor for CPU.
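A GPU Roofline collection can be sketched along the same lines. As before, the `advisor` command, its flags, and the names `./myapp` and `./advi_gpu` are assumptions that may not match a given Advisor version exactly.

```shell
# Hypothetical GPU Roofline collection for an application that already
# runs OpenCL kernels on the integrated GPU (flags may differ by version).

# Collect kernel timings plus FLOP/byte data with GPU profiling enabled.
advisor --collect=survey --profile-gpu --project-dir=./advi_gpu -- ./myapp
advisor --collect=tripcounts --flop --profile-gpu --project-dir=./advi_gpu -- ./myapp

# Generate an interactive Roofline chart plotting each GPU kernel
# against the hardware-imposed compute and memory ceilings.
advisor --report=roofline --gpu --project-dir=./advi_gpu --report-output=./roofline.html
```

The resulting chart places each kernel by its arithmetic intensity and achieved throughput, which is how the analysis exposes whether a kernel is compute-bound or memory-bound.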
Offload Advisor is a new tool that is being actively developed alongside new acceleration architectures at Intel.