46. TASK-BASED EXECUTION OF PARALLEL APPLICATIONS
Formulating parallel algorithms to mask communication delays can be crucial to maintaining scalability and to adapting to different parallel architectures, such as multi-core chips and petascale systems. However, programming efforts may fail to realize this overlap in practice due to inherent limitations in the underlying computational model. For example, the standard SPMD model provides no support for dynamic scheduling, which is useful in managing overlap. We have implemented a small library, called Thyme, that employs a task-based model of execution and represents a program as a task precedence (dependence) graph. This graph serves as application meta-data and facilitates dynamic task scheduling as well as other performance optimizations. We show how this meta-data may be used to explore different scheduling policies, and hence to improve performance without affecting correctness, in three relevant applications: Jacobi relaxation in three dimensions, parallel matrix multiplication, and the Fast Fourier Transform.
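The core mechanism described above, executing a program as a task precedence graph in which a task runs as soon as its predecessors complete, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the actual Thyme API; the function `run_task_graph` and its argument layout are assumptions made for exposition.

```python
# Hypothetical sketch of a task-precedence-graph executor (not the
# actual Thyme API). A task becomes ready the moment all of its
# predecessors have completed, which is what enables dynamic
# scheduling and adaptive overlap of communication with computation.
from collections import defaultdict, deque

def run_task_graph(tasks, deps):
    """tasks: {name: callable}; deps: {name: set of predecessor names}."""
    # Count of unfinished predecessors for each task.
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    successors = defaultdict(list)
    for t, preds in deps.items():
        for p in preds:
            successors[p].append(t)
    # Tasks with no predecessors are runnable immediately.
    ready = deque(t for t, n in remaining.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()  # a real runtime would apply a scheduling policy here
        tasks[t]()
        order.append(t)
        for s in successors[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)  # ready as soon as its last dependence resolves
    return order
```

Because the graph only constrains *which* tasks must precede which, any policy for choosing among the ready tasks (the `popleft` above) yields a correct execution, which is why scheduling policies can be varied without affecting application correctness.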
Our approach is novel among graph-based programming models because the task dependence graphs represent a partitioning of an application's loop iteration spaces, reducing the unnecessary synchronization found in statically scheduled code and allowing tasks from multiple iterations to execute simultaneously. Communication can thus be overlapped adaptively, and tasks may execute as soon as their dependences are satisfied, shortening the critical path that determines performance. Our work is especially useful for parallel applications that have a high ratio of communication to computation and whose optimal schedule is not easily determined before run time. Applications running on high-end petascale systems are expected to be prone to high communication overheads, especially those employing many-way multicore processors.
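The idea of partitioning a loop iteration space so that tasks from multiple outer iterations can run concurrently can be sketched for a 1-D stencil such as Jacobi relaxation. The function below is a hypothetical illustration (not Thyme code): each tile of sweep k depends only on its own and neighboring tiles of sweep k-1, so a tile of sweep k may start before the rest of sweep k-1 finishes, instead of waiting at a global barrier between sweeps.

```python
# Hypothetical sketch: dependence graph for a tiled 1-D Jacobi stencil.
# Tile t of sweep k reads only tiles t-1, t, t+1 of sweep k-1, so its
# dependences are local rather than a full-sweep barrier; tasks from
# consecutive sweeps can therefore overlap.
def jacobi_tile_deps(num_tiles, num_sweeps):
    """Return {(sweep, tile): set of predecessor (sweep, tile) keys}."""
    deps = {}
    for k in range(num_sweeps):
        for t in range(num_tiles):
            if k == 0:
                deps[(k, t)] = set()  # first sweep has no predecessors
            else:
                nbrs = {t - 1, t, t + 1} & set(range(num_tiles))
                deps[(k, t)] = {(k - 1, n) for n in nbrs}
    return deps
```

With such per-tile dependences, the only cross-sweep ordering enforced is the one the stencil actually requires, which is exactly the reduction of unnecessary synchronization described above.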