IWOCL 2016 Sessions

Tue 19th April

Advanced ‘Hands-on-OpenCL’ Tutorial

Attend with a One-Day ‘Hands-on OpenCL’ pass or Three-Day Pass

Advanced ‘Hands-on-OpenCL’ Tutorial

Simon McIntosh-Smith (University of Bristol and Conference Chairman)

The tutorial format is a 50/50 split between lectures and exercises and uses a mix of OpenCL C and C++ host APIs. Attendees will require their own laptop to log onto a server running OpenCL 1.0 through OpenCL 2.0. Alternatively, students can run the exercises on their laptops using their preferred OpenCL SDK.

TUESDAY 19th APRIL
09:00 – 17:00
PLENARY

Wednesday 20th April

Tutorials and Conference Sessions

Attend with a Two-Day or Three-Day Pass

C++ for OpenCL

Maria Rovatsou (Codeplay) and Adam Stanski (Intel).

This workshop will present the efforts of the Khronos OpenCL working group to support modern C++ features, along with the different programming models and paradigms that are going to be supported as core features or as additional specifications. It will introduce the new developments in the OpenCL language and the engagement with the C++ community as C++ evolves towards parallel and heterogeneous platforms. It is a great opportunity for the OpenCL community to discuss these developments and provide feedback to the Khronos OpenCL group on its direction and proposed standards.

There will be one coffee break at around 11:00.

WEDNESDAY 20th APRIL
09:00 – 13:00
TRACK A
DOWNLOAD SLIDES #1
DOWNLOAD SLIDES #2

Best Practices and Tools to Debug and Optimize OpenCL Applications

Yuval Eshkol (Intel).

While OpenCL provides a convenient abstraction layer, hardware multiplicity brings the danger of having to re-tune a program for each specific device. A significant part of this performance-oriented tutorial therefore emphasizes general debugging techniques and optimization tips that span multiple compute architectures (we focus mostly on CPU and GPU). We also describe common best practices for OpenCL-enabled applications, such as zero-copy data flows and OS-specific tips (e.g. OpenGL and DirectX interoperability topics for Linux and Windows respectively).

There will be one coffee break at around 11:00.

WEDNESDAY 20th APRIL
09:00 – 13:00
TRACK B
DOWNLOAD SLIDES

Lunch Break, Demos and Posters

13:00 – 14:00

Welcome Address

Simon McIntosh-Smith (Univ. Bristol).

WEDNESDAY 20th APRIL
09:00 – 09:10
TRACK A
DOWNLOAD SLIDES

KEYNOTE: OpenCL – A State of the Union

Neil Trevett (Khronos President, OpenCL Working Group Chair and VP at NVIDIA).

Processor architectures employ ever more parallelism to increase performance and power efficiency, and OpenCL continues to evolve to provide a pragmatic language and run-time stack to enable that parallelism to be tapped on an ever widening variety of platforms. However, the use cases demanding parallel processing are rapidly evolving, while the open API landscape is exploring new generation, low-level, explicit graphics and compute APIs such as Vulkan. At the same time, standard languages such as C++ are evolving to natively describe parallelism. This session explores the state of the language and API landscape for parallel computation, and looks forward to how OpenCL may respond to these industry needs and dynamics to continue to play a central role in bringing heterogeneous parallel processing to the computing mainstream.

WEDNESDAY 20th APRIL
14:00 – 14:30
PLENARY
THE FUTURE & ECOSYSTEM
DOWNLOAD SLIDES

Envisioning the Future – Using SYCL to Develop Vision Tools

Luke Iwanski and Mehdi Goli (Codeplay).

Although high-level libraries like OpenCV abstract both the system-level and kernel-level optimisations of built-in operations over heterogeneous platforms, it can still be difficult for a programmer to develop a custom vision operation across different platforms. In this presentation, we propose a high-level CV framework called VisionCpp that supports the performance portability of CV applications across different OpenCL-enabled platforms. VisionCpp supports compile-time construction and optimisation of OpenCL kernels by using SYCL as a back-end architecture, and is ideal for embedded platforms as it avoids the unpredictable run-time construction and memory usage required for the OpenCL kernels of vision applications. Taking advantage of SYCL's “single-source programming style”, VisionCpp allows programmers to easily develop custom vision operations in C++. The generated application can then be used on different platforms with no modification to the source code.

WEDNESDAY 20th APRIL
14:30 – 15:00
PLENARY
THE FUTURE & ECOSYSTEM
DOWNLOAD SLIDES

hiCL: An OpenCL Abstraction Layer for Scientific Computing, Application to Depth Imaging on GPU and APU

Issam Said (Lip6), Pierre Fortin, Jean-Luc Lamotte (Université Pierre et Marie Curie) and Henri Calandra (Total EP)

Deploying legacy scientific applications on hardware accelerators with OpenCL requires extensive programming effort and often changes to a large number of lines in the original code. Moreover, APUs feature a unified CPU-GPU memory which, while helping to alleviate the impact of the PCI bus on GPU application performance, adds OpenCL programming complexity because different memory access modes are introduced. This presentation introduces hiCL, a C/C++ and Fortran 90 abstraction layer developed on top of OpenCL to help reduce the programming burden by simplifying memory management and kernel execution. With the help of hiCL, plugging OpenCL kernels into existing algorithms becomes easier, which encourages the use of OpenCL to accelerate industrial codes. The tool will be illustrated using an example of integrating OpenCL kernels, via hiCL, into a Fortran 90 code for Reverse Time Migration (RTM), a depth imaging algorithm widely used by the Oil & Gas industry to prospect for new deposits. RTM performance numbers on GPUs and APUs with respect to the frequency of data retrieval from the GPU (in order to construct the subsurface images) will also be shown.

WEDNESDAY 20th APRIL
15:00 – 15:30
PLENARY
THE FUTURE & ECOSYSTEM
DOWNLOAD SLIDES

Afternoon Break, Demos and Posters

15:30 – 16:00

The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development

Tyler Sorensen and Alastair Donaldson (Imperial College London)

One of the benefits to programming in OpenCL is platform portability. That is, an OpenCL program that follows the OpenCL specification should, in principle, execute reliably on any platform that supports OpenCL. To assess the current state of OpenCL portability, we provide an experience report examining two sets of open source benchmarks which we attempted to execute across a variety of GPU platforms, via OpenCL. We report on the portability issues we encountered, where applications would execute successfully on one platform but fail on another. We classify issues into three groups: (1) framework bugs, where the vendor-provided OpenCL framework fails; (2) specification limitations, where the OpenCL specification is unclear and where different GPU platforms exhibit different behaviours; and (3) programming bugs, where non-portability arises due to the program exercising behaviours that are incorrect or undefined according to the OpenCL specification. The issues we encountered slowed the development process associated with our sets of applications, but we view the issues as providing exciting motivation for future testing and verification efforts to improve the state of OpenCL portability; we conclude with a discussion of these.

WEDNESDAY 20th APRIL
16:00 – 16:30
PLENARY
THE FUTURE & ECOSYSTEM
DOWNLOAD SLIDES

PANEL DISCUSSION: What Next for OpenCL?

Khronos OpenCL Working Group Members and OpenCL Community Members

If you have any questions you would like us to put to the panel please Contact Us.

WEDNESDAY 20th APRIL
16:30 – 17:30
PLENARY
THE FUTURE & ECOSYSTEM
NO SLIDES

Conference Dinner

Conference Delegates and Guests

All Two- and Three-Day pass holders may attend the conference dinner. Tickets for the guests of delegates may be purchased. The one-day ‘Hands-on OpenCL’ pass does not include the conference dinner; however, delegates are welcome to attend by purchasing a conference dinner pass.

WEDNESDAY 20th APRIL
19:30 – 22:00
LOCAL RESTAURANT

Thursday 21st April

Conference Sessions (Cont’d)

Attend with a Two-Day or Three-Day Pass

KEYNOTE: The OpenCL Library Ecosystem: Current Status and Future Perspectives

Karl Rupp (Freelance Computational Scientist)

OpenCL as an open standard for parallel programming of heterogeneous systems seems to be an attractive choice for software library implementations. Indeed, iwocl.org lists 83 OpenCL-enabled libraries as of February 12, 2016, suggesting a healthy library ecosystem. On closer inspection, however, a significant share of these libraries are either OpenCL bindings for other languages, libraries with OpenCL features in experimental state at best, or orphaned. Clearly, there is room for improvement; but what is required to improve the state of the OpenCL-enabled library ecosystem? Which future extensions to OpenCL can make library development easier? This talk aims to stimulate discussion by sharing lessons learnt in the area of high performance computing through the development of ViennaCL.

THURSDAY 21st APRIL
09:00 – 09:30
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

clSPARSE: A Vendor-Optimized Open Source Sparse BLAS Library

Joseph Greathouse, Kent Knox, Kiran Varaganti and Mayank Daga (AMD) and Jakub Poła (University of Wrocław and Vratis, Ltd.)

clSPARSE is a high-performance open source sparse BLAS library developed by AMD and Vratis, Ltd. This presentation discusses the benefits of clSPARSE and the algorithms it contains.

THURSDAY 21st APRIL
09:30 – 10:00
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

OpenCL FFT Optimized for Intel Processor Graphics

Dan Petre, Adam Lake and Allen Hux (Intel)

This technical presentation discusses the OpenCL implementation and optimization of 1D FFTs for the Intel Processor Graphics. The presentation will examine the motivation for the development, details of the implementation and conclude with the lessons learned and the next steps, including work on 2D FFTs.

THURSDAY 21st APRIL
10:00 – 10:30
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

Nearly Everything You Need to Know About Optimizing Convolutional Neural Networks on Embedded Platforms with OpenCL

Anton Lokhmotov and Grigori Fursin (Dividiti)

A Convolutional Neural Network (CNN) comprises one or more convolutional layers followed by one or more fully connected layers. CNNs fall into the class of “deep learning” techniques. The architecture of a CNN is designed to take advantage of the 2D structure in digital signals such as images. Therefore, CNNs are increasingly being used for image classification, localization and detection. Due to the computational intensity of training and tuning CNNs, this is typically done on clusters with NVIDIA GPUs. In this technical talk, we will present the other, less known side of the story, namely, deploying CNNs on embedded platforms that support OpenCL. First, we will describe the computational kernels and provide results of profiling several popular CNNs to focus optimization efforts. Then, we will describe our experience with optimizing the kernels and overall computation on several embedded platforms. Finally, we will share our insights on designing CNNs in a way that trades off performance and accuracy to make deployment possible on a range of form factors, from sensors to self-driving cars. We aim to provide the audience with sufficient information to optimize CNNs on their embedded platforms of choice and perhaps even participate in the Low-Power Image Recognition Challenge (http://lpirc.net). We hope to inspire the community to share their experience later and contribute to an open-source framework comprising optimized CNN implementations.

THURSDAY 21st APRIL
10:30 – 11:00
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

Morning Break, Demos and Posters

11:00 – 11:30

Boost.Compute: A Parallel Computing Library for C++ Based on OpenCL

Jakub Szuppe (Warsaw University of Technology)

Boost.Compute has been accepted for integration with the official Boost C++ libraries. With this step, and considering the large number of Boost users, usage of Boost.Compute and visibility of OpenCL among C++ developers is likely to increase. This technical presentation is therefore intended as a comprehensive overview of Boost.Compute for current and prospective users of the library and covers the library’s overall architecture, its low-level and high-level functionality and advanced topics such as custom functions, closures and lambda expressions. The presentation also describes how a custom template-based OpenCL library can be designed on top of Boost.Compute. Examples are included throughout the presentation to aid in a better understanding. Among others, I will demonstrate how advanced features of the library can lead to a simple and efficient C++-only solution for BLAS calculations. The architectural presentation of the library will be followed by a presentation of current performance results of the library and a comparison with competing solutions. I will conclude the presentation with insights that I gained during Google Summer of Code ’15 and my overall experience in contributing to Boost.Compute, which I hope to be of interest to the wider developer community.

THURSDAY 21st APRIL
11:30 – 12:00
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

Threading Building Blocks (Intel TBB) Flow Graph as a Software Infrastructure Layer for OpenCL-based Computations

Alexei Katranov and Alexey Kukanov (Intel)

Modern computing systems are becoming heterogeneous, with a variety of programmable units: CPU, GPU, FPGA, domain-specific accelerators, etc. OpenCL is evolving as a cross-platform programming model for a wide range of computing devices; however, utilizing these resources in a complex heterogeneous system remains a challenge. Intel® Threading Building Blocks (Intel® TBB) is a widely used C++ library for shared-memory parallel programming that provides flow graph functionality to express unstructured parallelism and asynchronous computations. Our presentation shows how OpenCL and the Intel TBB flow graph can be used in conjunction to simplify programming for complex heterogeneous systems.

THURSDAY 21st APRIL
12:00 – 12:30
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

OpenCL caffe: Accelerating and Enabling Cross-Platform Machine Learning Frameworks

Yibing Liu (Tsinghua University), Maohua Zhu (UCSB), Hugh Perkins (ASAPP), Junli Gu and Yuan Gao

This paper presents OpenCL caffe, which aims to port the popular CUDA-based caffe framework to an open-standard OpenCL backend. OpenCL caffe aims to provide a DNN framework that is compatible with heterogeneous platforms and achieves competitive performance based on the OpenCL tool chain. As DNNs are highly complex algorithms, we use a two-phase strategy: first we introduce the OpenCL porting strategies that guarantee algorithm convergence; then we summarize OpenCL’s performance bottlenecks in the DNN domain and propose a few optimization techniques, including data layout expansion in a batched manner and multiple command queues, to better map the problem size onto existing clBLAS libraries, improve hardware resource utilization and boost OpenCL runtime efficiency. We verify OpenCL caffe’s successful offline training and online recognition on both high-end GPU cards and fused CPU + GPU APUs. Experimental results show that the phase-two optimized OpenCL caffe achieved a 5x speedup without modifying the clBLAS library. The user can directly run mainstream DNN models and achieve the best performance for a specific processor by choosing the optimal batch number depending on H/W properties and input data size.

THURSDAY 21st APRIL
12:30 – 13:00
PLENARY
LIBRARIES & APPS
DOWNLOAD SLIDES

Lunch Break, Demos and Posters

13:00 – 14:00

OpenCL-Based Mobile GPGPU Benchmarking: Methods and Challenges

Rotem Aviv and Guohui Wang (Qualcomm)

Benchmarking general-purpose computing on graphics processing units (GPGPU) aims to profile and compare performance across different devices. Due to the low-level nature of most GPGPU APIs, GPGPU benchmarks are also useful for architectural exploration and program optimization. This can be challenging on mobile devices due to the lack of underlying hardware details and the limited profiling capabilities of some platforms. Measuring the performance of a mobile GPU by executing benchmarks covering major hardware and software features can reveal the strengths and weaknesses of a GPGPU system, enable better program optimization and make automatic performance tuning possible. In this presentation, we will describe several design methods for OpenCL-based mobile GPGPU benchmarking, and discuss key issues that one may encounter during development. We will also present design tips and guidelines to achieve more “fair” and accurate benchmarking results.

THURSDAY 21st APRIL
14:00 – 14:30
PLENARY
MOBILE & FPGA
DOWNLOAD SLIDES

Optimizing OpenCL Applications on Xilinx FPGA

Jeff Fifield, Ronan Keryell, Hervé Ratigner, Henry Styles, and Jim Wu (Xilinx)

In this presentation we focus on current Xilinx FPGA (Field-Programmable Gate Array) platforms with the SDAccel OpenCL environment. FPGAs have the unique feature of a reconfigurable architecture, as opposed to CPUs, GPUs or DSPs, which have a fixed architecture and are only programmable. For example, the elementary functions in an FPGA can be configured according to an addressable memory, as can the interconnections among them, the internal memory organization, and even the ultra-high-speed input/output of the chip that interfaces with the outside world. This fine-grained configurability allows high performance and power efficiency. We introduce the architecture of modern FPGAs with their main building blocks and show how functional operations can be expressed. The translation of imperative languages down to the hardware level is done through High-Level Synthesis. It can be done in several ways with different time/area trade-offs, for example by adjusting parallelism and pipelining.

THURSDAY 21st APRIL
14:30 – 15:00
PLENARY
MOBILE & FPGA
DOWNLOAD SLIDES

OpenCL Compiler Tools for FPGAs

Dmitry Denisenko (Altera)

Compiling OpenCL kernels to FPGAs presents a new set of usability challenges. Many OpenCL developers are not hardware experts but are creating state-of-the-art hardware with the help of OpenCL compilers for FPGAs. To get great performance, the compiler has to provide clear and actionable feedback on the generated hardware in terms that the user can understand and relate back to the source code, and it has to enable the developer to make modifications. The challenge is even greater because the feedback information is unfamiliar to users of CPUs and GPUs. In this Technical Presentation we will describe the usability tools available in the Altera OpenCL SDK for FPGAs and how they allow quick iterations to get high-performance code. We will first briefly describe loop pipelining, things that can go wrong, and how our optimization report helps you diagnose most loop performance issues. Then we will show how the hardware report can help you diagnose inefficient resource usage as well as global, constant, local and private memory configurations. Finally, we’ll show how the Dynamic Profiler can help diagnose dynamic inefficiencies in hardware.

THURSDAY 21st APRIL
15:00 – 15:30
PLENARY
MOBILE & FPGA
DOWNLOAD SLIDES

Afternoon Break, Demos and Posters

15:30 – 16:00

Automatic Test Case Reduction for OpenCL

Moritz Pflanzer, Alastair Donaldson and Andrei Lascu (Imperial College London)

We report on an extension to the C-Reduce tool, for automatic reduction of C test cases, to handle OpenCL programs. This enables an automated method for detecting bugs in OpenCL compilers, by generating large random kernels using the CLsmith generator, identifying kernels that yield result differences across OpenCL platforms and optimisation levels, and using our novel extension to C-Reduce to automatically reduce such kernels to minimal forms that can be filed as bug reports. A major part of our effort involved the design of ShadowKeeper, a new plugin for the Oclgrind simulator that provides accurate detection of accesses to uninitialised data. We present experimental results showing the effectiveness of our method for finding bugs in a number of OpenCL compilers.

THURSDAY 21st APRIL
16:00 – 16:30
PLENARY
INFRASTRUCTURE
DOWNLOAD SLIDES
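
The reduction step described in the abstract above (shrinking a kernel while it still triggers a result difference) can be illustrated with a minimal sketch. This is not the C-Reduce algorithm or the authors' extension; it is a simple greedy line-removal loop, and `reduce_lines` and `is_interesting` are hypothetical names. In the setting of the talk, the predicate would presumably run the candidate kernel on several OpenCL platforms or optimisation levels and report whether the results still differ; here it is any caller-supplied function.

```python
def reduce_lines(lines, is_interesting):
    """Greedily drop chunks of lines while is_interesting(candidate) stays true.

    `lines` is the test case as a list of source lines; `is_interesting` is a
    hypothetical predicate returning True if the candidate still shows the bug.
    """
    assert is_interesting(lines), "the starting test case must trigger the behaviour"
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and is_interesting(candidate):
                lines = candidate   # keep the smaller test case, retry at the same index
            else:
                i += chunk          # this chunk is needed, move past it
        chunk //= 2                 # try finer-grained removals next
    return lines


# Toy usage: the "bug" needs both line "b" and line "e" to be present.
kernel = ["a", "b", "c", "d", "e", "f"]
reduced = reduce_lines(kernel, lambda ls: "b" in ls and "e" in ls)
```

A real predicate would also need to reject candidates that exhibit undefined behaviour, such as reads of uninitialised data, which is the kind of check the abstract attributes to the Oclgrind plugin.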

GPU daemon – Road to Zero Cost Submission

Michal Mrozek and Zbigniew Zdanowicz (Intel)

One of the biggest obstacles to efficient OpenCL usage is submission latency. The time needed to pass through the driver stack is so significant that it limits the use of OpenCL on GPUs in applications requiring low latency. In this presentation we present a novel approach utilizing new features of OpenCL 2.0, Fine-Grained SVM and device-side enqueue_kernel, that allow completely new usage models. We will present the idea of a GPU daemon that operates using different modes (polling, enqueue_kernel and monitored_fence) and offers various levels of flexibility for the end user application. Part of the presentation will show the data and code samples for each approach and will also compare each mode with the traditional submission model.

THURSDAY 21st APRIL
16:30 – 17:00
PLENARY
INFRASTRUCTURE
DOWNLOAD SLIDES

Employing Out Of Order Queues for Better GPU Utilization in OpenCL

Pavan Lanka and Krzysztof Laskowski (Intel)

GPUs (Graphics Processing Units) are highly parallel architectures. They can process large sets of data in a very efficient manner. To use the available compute power on these modern GPUs and get better performance per watt, efficient scheduling of work is important. The driver schedules efficiently by taking into consideration both workload and HW (Hardware)/OS (Operating System) characteristics. The paper describes various optimizations and new support that the Intel OpenCL GPU driver implemented to significantly improve hardware utilization. When multiple jobs are submitted to the GPU, the driver, which sits between the application and the GPU HW, packages them into batches and submits work to the HW. When submitting multiple jobs it is very important that the driver does not introduce any bubbles in the pipeline where the HW is left under-utilized or unutilized. In this optimization the driver looks at the various dependencies specified by the application across multiple jobs and schedules the work to the GPU by eliminating any unnecessary serialization events. This optimization is implemented in the Intel Graphics Driver’s OpenCL Driver Stack.

THURSDAY 21st APRIL
17:00 – 17:30
PLENARY
INFRASTRUCTURE
DOWNLOAD SLIDES

Closing Remarks and IWOCL 2017

Simon McIntosh-Smith (University of Bristol and Conference Chairman)

THURSDAY 21st APRIL
17:30
CLOSING PLENARY
DOWNLOAD SLIDES

Wed 20th – Thu 21st April

Posters

Attend with a Two-Day or Three-Day Pass

Runtime Comparison Solving Two Dimensional Gray-Scott Equation on Different Devices That Support OpenCL

Michael Quell (TUWien)

The Gray-Scott equation is an example of a reaction-diffusion equation with chaotic solutions. You can expect patterns to emerge from chaos. A uniform discretization in space and periodic boundary conditions allow the Fast Fourier Transform to be used, so that when coupled with a suitable time stepping scheme a numerical method that suits the parallelism of OpenCL is obtained. The code was benchmarked on various CPU and GPU devices. Performance results for various problem sizes are shown. Example programs can be found at: https://github.com/MichaelQuell/GrayScott-OpenCl

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Employing Out Of Order Queues for Better GPU Utilization in OpenCL

Pavan Lanka and Krzysztof Laskowski (Intel)

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Towards Visual Exploration of Parallel Programs Using a Domain-specific Language

Tobias Klein, Eduard Gröller and Markus Hadwiger (KAUST) and Stefan Bruckner and Eduard Gröller (University of Bergen)

The use of GPUs and the massively parallel computing paradigm have become widespread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program’s function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Extending Paralldroid for the Automatic Generation of OpenCL Code

Sergio Afonso, Alejandro Acosta and Francisco Almeida (University of La Laguna)

The popularity of handheld systems (smartphones, tablets, …) and their increasing computational capabilities open a new era in parallel computing. The efficient use of such devices is still a challenge. The heterogeneity of SoCs and MPSoCs demands very specific knowledge of the devices, which represents a very steep learning curve for general purpose programmers. To ease the development task we present the Paralldroid extension for OpenCL, a development framework for mobile devices oriented to general purpose programmers. Paralldroid presents a programming model that unifies the different programming models of Android and allows for the automatic generation of parallel code. The developer just implements an object oriented Java application and introduces a set of Paralldroid annotations in the sections of code to be optimized. The annotations used are based on the OpenMP 4.0 specification. The Paralldroid system then automatically generates the native C, Renderscript or OpenCL code required to take advantage of the underlying platform. The generated Renderscript and OpenCL code can execute on the GPU. Our computational experiments show that the results are quite promising. The code generated by Paralldroid takes advantage of the GPU and offers good performance at a very low development cost, so it helps increase productivity when developing efficient code.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Introduction of an OpenCL-based Design Pattern for Model-transformation

Tamás Fekete and Gergely Mezei (Budapest University of Technology and Economics)

Model-driven engineering (MDE) is a widely applied development methodology in the software industry. In MDE, one of the main concepts is model-transformation. We are building a complex solution for model-transformation, which is based on the OpenCL framework in order to achieve hardware independent and highly parallel computation. The solution is referred to as the GPGPU-based Engine for Model Processing (GEMP). Model-transformation can be divided into two main steps. Firstly, the user defined patterns must be found in the input domain model, which is represented as a graph. The second step is to apply the changes, namely rewriting the model. The complexity of the two steps differs: the first step has a computational complexity several orders of magnitude larger. Currently, we are focusing on the first step only. In real life, GEMP must handle large input and output models, which are challenging in terms of computation speed and memory usage. Therefore, during the design of GEMP, we defined the following three main requirements: (i) Fast computation time is required by applying highly parallel computation. (ii) Low memory usage is mandatory both on the host side and on the GPU device. (iii) GEMP must be highly scalable to handle large models without significant time delay. To achieve the defined goals, we applied several techniques which are introduced and described. The main novelty of the current paper is that the OpenCL kernel code is created at runtime. The kernel code is built using run-time configuration data (e.g. GPU device type, pattern size) to achieve the best performance. Our design decisions, implementation technique and measurements are presented in the poster.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
REGISTER NOW

Benchmarking, Autotuning and Crowdtuning OpenCL programs using the Collective Knowledge Framework

Anton Lokhmotov (Dividiti)

This poster will present work on Collective Knowledge, an open framework for reproducible and collaborative optimization. Viewers of this poster can participate in crowdtuning with a small prize for the best optimization parameters found.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

C++ Classes and Templates for OpenCL Kernels with PATOS

Franz Richter-Gottfried, Patrick Kreutzer, Alexander Ditter and Dietmar Fey (FAU Erlangen-Nürnberg)

This poster presents PATOS, a Clang-based source-to-source compiler that extends the OpenCL kernel language with C++ classes, template types for classes and functions, and C++ functor templates. The generated code is standard-conforming OpenCL C which is usable with unmodified OpenCL drivers. With PATOS, type-agnostic host libraries can directly use OpenCL without having to deal with manual type matching. First, PATOS has to determine which types are actually used, either by analyzing the host code or by manually instantiating the required types directly inside the kernel file. Classes are flattened into structs and functions operating on their data. Templates are separately instantiated with all templates mapped to the requested types. Type-aware name mangling prevents name clashes. The generated header and implementation can be used by the OpenCL driver without modification.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Intel’s Fixed-Function Media Extensions for OpenCL

Adam Herr (Intel)

This poster presents Intel’s fixed function silicon blocks and how these are exposed through the OpenCL API with the objective to provide a high-level overview, ignoring many of the intricacies of hardware and extension APIs. A secondary objective is to demonstrate the suitability of OpenCL as a platform for synthesizing programmable kernels and fixed function hardware into coherent workloads that utilize all available hardware. The structure of the poster consists of an introduction and subsequent sections for each of the Video Motion Estimator, Video Enhancement Engine and Video Encode/Decode Engine blocks and their respective OpenCL extensions.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
REGISTER NOW

Analysis of Algorithms for Exact Pattern-Matching Problem Using OpenCL

Andrii Rozumnyi and Dmytro Chasovskyi (University of Tartu)

The exact pattern-matching problem, which means finding all the occurrences of a pattern inside a given text, has many different applications such as parsers, word processors, spam filters, and DNA applications in computational molecular biology. In some cases, when the string length is relatively small, the problem can be efficiently solved using classical algorithms with linear time complexity. However, in some areas such as bioinformatics, this task is still a problem: due to the huge length of the genomic data (for instance, the human genome is around three billion characters) and the many patterns (there can be millions of them), processing is time consuming. As the time complexity cannot be better than linear in such cases, another approach is needed to increase efficiency. Parallel computing can significantly reduce the time needed to solve the exact pattern-matching problem. As OpenCL allows the use of a wide range of devices for parallel computing, it is worthwhile to find the algorithm and device configuration that give the best results. In this work, we implement the most widely used algorithms for the exact pattern-matching problem and compare them with the same algorithms adapted for concurrent processing using the power of OpenCL. In addition, performance on different hardware configurations will be measured.
Source code for the project can be found at: https://github.com/JaakTree/pattern_matching.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
REGISTER NOW
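
As an illustration of the "classical algorithms with linear time complexity" mentioned in the abstract above, here is a minimal sequential sketch of one such algorithm, Knuth-Morris-Pratt. It is not taken from the authors' repository, and the abstract does not say which algorithms they chose; the function names `build_failure` and `find_all` are made up for this example.

```python
def build_failure(pattern):
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def find_all(text, pattern):
    """Return the start index of every (possibly overlapping)
    occurrence of pattern in text, scanning the text once."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # continue, allowing overlapping matches
    return matches


hits = find_all("abracadabra", "abra")
```

One common way to parallelise a search like this, whether or not it is the one used in the poster, is to split the text into chunks that overlap by `len(pattern) - 1` characters and search each chunk independently.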

OpenCL Meets Open Source Streaming Analytics

Robin Grosman (Huawei)

Analytics is no longer just an off-line problem that can be easily solved with large clusters of computers. Today, businesses want to make decisions and take actions based on the input that was just received. Accelerators like FPGAs are great for time-sensitive processing with high volume, but are traditionally focused on a more fixed-function role. We integrated an FPGA using OpenCL into very flexible streaming software. This solution allows complex topologies to be built from basic building blocks with accelerators using a familiar interface for software developers.

20th & 21st APRIL
POSTER SESSION
BREAK-OUT AREA
ACM DIGITAL – SOON

Note: The above sessions are subject to change without notice.