Unlock Intel GPUs for High Performance Compute, Media and Computer Vision Capabilities with Intel OpenCL Extensions

Jeff McAllister, Biju George, Adam Herr and Ben Ashbaugh from Intel

Tutorial Overview

The keys to unlock the full performance potential of Intel GPUs for emerging workloads in general compute, media, computer vision, and machine learning are in the rich suite of Intel OpenCL extensions. These give developers direct access to unique Intel hardware capabilities, which until now have been difficult to master.
This tutorial builds up the material step by step through multiple examples, covering:

  • How to write high-performance general compute applications based on the core concept of OpenCL subgroups.
  • How to use the additional subgroup operations described in the Intel subgroups and media block read/write extensions.
  • How, building on the subgroup framework, the device-side motion estimation extension leverages the unique Intel GPU media sampler to accelerate motion estimation operations from OpenCL kernels.
  • How the Video Enhancement (VEBOX) extension, an OpenCL host-level API extension, uses a powerful fixed-function media unit to accelerate many frame-level video enhancement operations.

Together, these concepts provide a recipe for disruptive performance gains and improved quality and accuracy, meeting the difficult demands of today’s marketplace.

OpenCL Extension Background

Historically, specialized hardware capabilities in Intel GPUs, such as native support for explicit SIMD programming, high-performance block reads/writes, the media sampler and the VEBOX engine, were available only to internal developers. External developers could reach only a limited subset of these capabilities, through high-level interfaces such as DXAPI, VAAPI or the Media SDK. Over the years Intel has added many OpenCL extensions to give developers access to these hidden capabilities. The extensions have become rich and full-featured, but a lack of documentation and examples has limited their use. This tutorial explains how the extensions fit together architecturally, so that developers of a wide range of applications can make fuller use of Intel GPU hardware.

For reference, here is a list of Intel extensions with short summaries:

This presentation distills the deep knowledge of the Intel OpenCL architects, the world’s top experts on how the Intel OpenCL extensions are designed to make hardware capabilities accessible, into one simplified flow of ideas and examples, giving developers the information they need to make full use of Intel GPU hardware through OpenCL.