Abstract: Since the advent of heterogeneous computing, a large number of applications have been ported to heterogeneous systems. Data-parallel applications have mapped their computation onto the large numbers of cores in these systems and reported large performance improvements. However, the programming APIs targeting heterogeneous systems require explicit data movement and thread management by the application developer. Low-level programming models such as OpenCL complicate the performance optimization of closely coupled applications executing on heterogeneous systems.

Closely coupled applications are computational problems with frequent communication between the host and the device, where the computations carried out on the host and on the device affect each other. Developing architectural support for efficient host-device interaction in closely coupled applications is an open problem in heterogeneous systems; addressing it can enable new application spaces for heterogeneous systems, simplify development, and improve performance. This thesis proposes architectural enhancements to the profiling and workgroup scheduling subsystems of heterogeneous devices. These subsystems are augmented with a resource known as the Offload Control Unit (OCU). The OCU enables performance monitoring of compute units with throughput counters. Throughput counters provide utilization information for compute units, and the performance knowledge they generate is used to improve execution performance for priority and data-driven workloads. Together, the throughput counters and the software profiling subsystems yield a runtime that allows performance monitoring, profiling, and specialization of applications built using heterogeneous computational pipelines. The proposed scheduling capabilities enable the use of heterogeneous systems for workloads with quality-of-service (QoS) requirements and non-homogeneous workgroup distributions. This thesis also proposes a benchmark suite for heterogeneous systems in which flexibility in behaviour is a primary guiding design choice. The benchmark suite has led to the construction of application features that model benchmark behaviour. These features have been applied to study application behaviour on different heterogeneous systems at different layers of abstraction. The benchmarks and the application classification methodology enable the study of the behaviour of heterogeneous architectures across applications, or across different behaviours seen within the same application.

Author/s: Perhaad Mistry
Institution/s: Northeastern University, Boston, Massachusetts
Source/Type: Thesis