Node-wide Asynchronous Message Progression for Efficient & Scalable Communication in High Performance Computing
Abstract
High Performance Computing (HPC) has served as the enabler of several scientific and engineering accomplishments. Consequently, there has been an ever-increasing demand for larger and faster high performance systems. To utilize HPC resources efficiently, parallel applications rely on software that abstracts the cluster hardware. Currently, the most prominent such abstraction standard in HPC is the Message Passing Interface (MPI). Parallel applications require communication to synchronize and to share intermediate results. To minimize application execution time, it is crucial to overlap such communication with computation. The MPI standard specifies three messaging semantics, namely point-to-point communication, one-sided communication and collectives. In point-to-point communication, overlapping the transfer of large payloads has been a persistent problem, and several proposals have sought to address it. Among these approaches, asynchronous message progression is the most readily adoptable because it deals with a wider range of inefficiencies and does not rely on specialized hardware. Traditional asynchronous message progression approaches have relied on either polling-based or interrupt-based threads. The polling-based approach is more responsive but resource-intensive; the interrupt-based approach is resource-efficient but incurs overheads. This thesis proposes a node-wide asynchronous message progression technique that offers the advantages of both the polling and interrupt based approaches, while minimizing or eliminating their adverse effects. The approach was found to be scalable, to incur negligible overheads, to induce the ideal amount of overlap in most point-to-point communication scenarios and to leave only a small memory footprint. It was also found to improve the overlap of certain collectives.
One-sided MPI communication offers the ability to transfer messages with few or no synchronizations, regardless of payload size. This scheme promotes overlap, but several scenarios inhibit it. This thesis proposes a similar asynchronous message progression technique to address such scenarios. The one-sided implementation achieved overlap in these inefficient scenarios, with negligible overheads and only a small addition to the memory footprint.
