Propel with OpenCL – A Deep Dive Workshop to Create, Debug, Analyze and Optimize OpenCL Applications using Intel Tools

Anita Banerjee and Uri Levy, Intel.

This workshop and tutorial will allow developers to learn about the underlying architecture relevant to running OpenCL applications on Intel Processor Graphics, and to use Intel tools such as Intel OpenCL Code Builder and Intel VTune Amplifier to create, develop and analyze OpenCL applications for Intel Processor Graphics. Developers will also learn how to achieve better performance in their OpenCL programs by applying detailed optimization techniques and best known methods for addressing performance issues. All the steps will be demonstrated with supporting examples and real data. Developers will also explore ways to use the new concepts of OpenCL 2.0, together with the power of the tools, to create more complex and better-performing OpenCL programs. During the session, developers will walk through every stage of the lifecycle, from creating their first OpenCL application to seeing the debugging, analysis and optimization capabilities of Intel OpenCL Code Builder and Intel VTune Amplifier in action. Topics that will be covered include:

  • Intel Processor Graphics Architecture Overview.
  • Tools workshop to create, debug, analyze and apply optimization techniques for OpenCL programs using:
    – Intel OpenCL Code Builder
    – Intel VTune Amplifier
  • Optimization guides and best known methods with supporting samples and data.
  • Tuesday, 12 May
  • 09:30
    Duration: 3 hours
  • Track #1 Room
    Li Ka Shing Center, Stanford

Khronos SYCL for OpenCL

Ronan Keryell (AMD), Ruyman Reyes (Codeplay) and Lee Howes (Qualcomm).

SYCL ([sɪkəl] as in sickle) is a royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of modern C++11. For example, SYCL enables single-source development, where C++ template functions can contain both host and device code to construct complex algorithms that use OpenCL acceleration, and then re-use them throughout their source code on different types of data.

This half-day tutorial will provide an opportunity to learn about the latest developments in SYCL from leading members of the Khronos SYCL subgroup. Tutorial speakers include the SYCL specification editor and the developers of two prototype SYCL implementations.

Tutorial Program

  • An introduction to SYCL for OpenCL
    Lee Howes, Qualcomm
  • Modern C++, heterogeneous computing and SYCL for OpenCL
    Ronan Keryell, AMD
  • SYCL for Parallel STL
    Ruyman Reyes, Codeplay
  • triSYCL: experiments around SYCL with an open-source implementation
    Ronan Keryell, AMD
  • Hands on SYCL using Codeplay’s SYCL implementation
    Ruyman Reyes and Maria Rovatsou, Codeplay

Attendees of the last session are encouraged to install the open-source, CPU-only implementation of SYCL and code along on a laptop or tablet.

  • Tuesday, 12 May
  • 09:30
    Duration: 3 hours
  • Track #2 Room
    Li Ka Shing Center, Stanford

A Framework for Visualization of OpenCL Application Execution

Amir Kavyan Ziabari, Rafael Ubal Tena, Dana Schaa and David Kaeli

Evaluating parallel and heterogeneous programs written in OpenCL can be challenging. Simulators are commonly used to aid the programmer in this regard. One of the fundamental requirements of any simulator is to provide traces, reports and debugging information in a coherent and unambiguous format. Although these traces and reports contain a great deal of detailed information about the logical and physical transactions within a simulated structure, they are usually extremely large and hard to analyze. What is needed is an appropriate visualization tool to accompany the simulator and make the OpenCL execution process easier to understand and analyze.

In this tutorial, we present M2S-Visual, an interactive, cycle-by-cycle, trace-driven visualization tool that complements Multi2Sim (M2S). M2S is an established simulator designed with an emphasis on running OpenCL applications without any source code modifications. The simulation of a complete OpenCL application occurs seamlessly by launching vendor-compliant host and device binaries. The Multi2Sim emulator provides traces of Intel x86 CPU instructions and AMD Southern Islands (as well as AMD Evergreen) GPU instructions, while the detailed simulator tracks the execution time and state of architectural components in both the host and the device. M2S-Visual complements the simulator by providing a visual representation of the running instructions and the state of the architectural components through a user-friendly GUI.

During the execution of an OpenCL application, M2S-Visual captures and represents the state of CPU and GPU software entities (i.e., contexts, work-groups, wavefronts, and work-items), memory entities (i.e., accesses, sharers, and owners), and network entities (i.e., messages and packets), along with the state of CPU and GPU hardware resources (i.e., cores and compute units), the memory hierarchy (i.e., L1 cache, L2 cache, and main memory), and network resources (i.e., nodes, buses, links, and buffers).

We designed the M2S-Visual tool to support the research community by providing deep insight into the performance of OpenCL programs. We also introduce other new visualization options in M2S, in the form of statistical graphs, that provide further detail on OpenCL application characteristics and the utilization of system resources. These include plots that reveal the occupancy of compute units based on static and run-time characteristics of the executed OpenCL kernels, histograms that present the memory access patterns of OpenCL applications, plots that characterize the network traffic generated by transactions between memory modules during OpenCL application execution, and plots that reveal the utilization of network resources (such as links and buses) after the application has completed.

The tutorial is organized in two parts, covering full-system visualization of OpenCL application execution via M2S-Visual, and characterization of the impact of OpenCL applications on system resources using the generated statistical graphs. Each part is accompanied by simulation examples presented as working demos.

  • Tuesday, 12 May
  • 13:30
    Duration: 1.5 hours
  • Track #1 Room
    Li Ka Shing Center, Stanford

Developing Optimized Libraries for Scalable OpenCL Acceleration on FPGAs

Fernando Martinez Vallina, Spenser Gilliland, Devadas Varma, and Vinay Singh, Xilinx

Software libraries are the backbone of programming projects, providing scalable implementations of commonly used functions that accelerate project completion. Until now, this design paradigm has been limited to CPU- and GPU-based systems, because FPGAs have traditionally been programmed at the register-transfer level (RTL). The adoption of OpenCL as a programming standard for heterogeneous computing enables the software library design flow to be extended to FPGAs.

The Xilinx SDAccel Development Environment enables a software library development flow in which code written in OpenCL C, C, and C++ can be encapsulated and reused across multiple projects. One of the key advantages of the FPGA fabric that a library can capture is the trade-off between latency, throughput, and power consumption. Depending on the application workload, library functions can be instantiated with different user-provided parameters to generate custom hardware best suited to the current task. This workshop describes how users can leverage the SDAccel environment to create FPGA-optimized, OpenCL-compatible libraries in the language of their choice.

  • Tuesday, 12 May
  • 13:30
    Duration: 1.5 hours
  • Track #2 Room
    Li Ka Shing Center, Stanford

OpenCL.next Overview

Ben Ashbaugh and Adam Lake

This technical presentation will describe the new features and changes anticipated for the next version of OpenCL, which has been the primary focus of the Khronos OpenCL working group for the past year. The presentation will cover:

  • Design Philosophy – What are the main motivations for the next version of OpenCL?
  • OpenCL Kernel Language – New Features and Changes
  • SPIR/IR Updates
  • Execution Model Enhancements
  • New API Features
  • Call for Feedback – How to influence the next version of OpenCL and beyond?

This session will also include:

Developer Feedback Session on OpenCL 2.1, SYCL and SPIR-V, including Panel Discussion

Chaired by Simon McIntosh-Smith.

Here is your opportunity, as developers, to help drive the future development of the OpenCL-related APIs by providing feedback to key members of the Khronos Group and a select group of vendor panellists. Your contributions can relate directly to the details of the latest draft specifications, or you can share your thoughts on wider issues relating to OpenCL, its ecosystem and heterogeneous computing in general. We encourage participation and look forward to some lively debates and friendly discussions!

  • Tuesday, 12 May
  • 15:30
    Duration: 1.5+ hours
  • Combined Track Room
    Li Ka Shing Center, Stanford