Technical Articles

Stand on the shoulders of giants.

IPMACC: Open Source OpenACC to CUDA/OpenCL Translator

In this paper we introduce IPMACC, a framework for translating OpenACC applications to CUDA or OpenCL. IPMACC is composed of a set of translators that translate OpenACC for C applications to CUDA or OpenCL. The framework uses the system compiler (e.g. nvcc) to generate the final accelerator binary. The framework can be used for extending the OpenACC API, [...]

December 10th, 2014 | Technical Articles

Portable OpenCL Out-of-Order Execution Framework for Heterogeneous Platforms

Heterogeneous computing has become a viable option in seeking computing performance, alongside conventional homogeneous multi-/single-processor approaches. The advantage of heterogeneity is the possibility to choose the best device on the platform for each distinct workload in the application, to gain performance and/or to lower power consumption. The drawback of heterogeneity is the [...]

December 10th, 2014 | Papers, Technical Articles

High-Level Programming Framework for Executing Streaming Applications on Heterogeneous OpenCL Platforms

Abstract: As the computer industry runs into more and more limits on processor speed and transistor size, it has to come up with complex new architectures and make more efficient use of the available processing power. For application developers this can be a difficult task, because they have to be aware of low-level hardware properties and [...]

July 2nd, 2014 | Technical Articles

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that [...]

June 26th, 2014 | Technical Articles

Aristotle: A performance impact indicator for the OpenCL kernels using local memory

Abstract: Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on [...]

June 24th, 2014 | Technical Articles

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures

Abstract: With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel’s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area (structured grid codes) and investigated techniques for ensuring performance portability [...]

June 24th, 2014 | Technical Articles

Toward OpenCL Automatic Multi-Device Support

Abstract: To fully tap into the potential of today's heterogeneous machines, offloading parts of an application to accelerators is no longer sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing [...]

June 23rd, 2014 | Technical Articles

Targeting multiple heterogeneous hardware platforms with OpenCL

Abstract: The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations [...]

June 23rd, 2014 | Technical Articles

Efficient all-against-all protein similarity matrix computation using OpenCL

Abstract: Today, it is the amount of available data rather than its acquisition that poses a significant challenge to computer science. The issue is one of extracting valuable and useful information from increasing data volumes, and it has manifested itself throughout most modern global digital frameworks, like economic analysis, weather forecasting, or, indeed, national [...]

June 12th, 2014 | Technical Articles

3D Skeleton Extraction Method using Potential Field on OpenCL

Abstract: For 3D skeleton extraction, the algorithm based on generalized potential fields, known as an outstandingly flexible and robust method, suffers from a heavy computational burden. In this paper, we put forward a parallel algorithm based on the OpenCL heterogeneous parallel framework, which can make full use of the great computing power provided by heterogeneous [...]

June 10th, 2014 | Technical Articles