HPX-5

Revision as of 17:18, April 28, 2016

Announcement

HPX-5 version 2.0 Runtime System Release Announced by Indiana University!

The Center for Research in Extreme Scale Technologies (CREST) at Indiana University is pleased to announce the release of version 2.0 of the HPX-5 runtime system for petascale/exascale computing. Building on CREST’s commitment to developing new approaches for achieving the highest levels of performance on current and next-generation supercomputing platforms, HPX-5 is provided to support the international high-performance computing community in addressing significant challenges involved in achieving exascale computing.

HPX-5 is a reduction to practice of the revolutionary ParalleX execution model, which establishes roles and responsibilities between layers in an exascale system. It is implemented in portable C99 and is organized around a cooperative lightweight thread scheduler, a global address space, an active-message parcel transport, and a group of globally addressable local synchronization object classes. Internally, the infrastructure is built on scalable concurrent data structures to minimize shared-memory synchronization overhead. The global address space and parcel transport are based on the innovative Photon network transport library, which supports low-level access to network hardware and provides RDMA with remote completion events for low-overhead signaling. An alternative ISend/IRecv network layer is included for portability, along with a reference MPI implementation. HPX-5 is compatible with Linux running on Intel x86 and Xeon Phi processors and various ARM core platforms (including both ARMv7 and ARMv8/AArch64). A pre-release version of HPX-5 v2.0 is available for Mac OS X 10.10+.
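To make the idea of an active-message parcel transport more concrete, here is a minimal single-process sketch in C99. It assumes that a parcel carries a target, an action id, an argument, and a continuation action that receives the result. The names `parcel_t`, `parcel_send`, `progress`, the action table, and the in-memory queue standing in for the network are all hypothetical and are not the HPX-5 API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch (not the HPX-5 API): a parcel is an active message that
 * names an action to run at a target, its argument, and a continuation action
 * that receives the result. */

typedef int64_t (*action_fn)(int64_t arg);

enum { ACT_NONE = -1, ACT_SQUARE = 0, ACT_RECORD = 1, NUM_ACTIONS = 2 };

typedef struct {
    int     target;       /* destination locality (unused in one process) */
    int     action;       /* index into the action table */
    int64_t arg;          /* payload */
    int     cont_action;  /* action to run on the result, or ACT_NONE */
} parcel_t;

static int64_t last_recorded = 0;

static int64_t square(int64_t x) { return x * x; }
static int64_t record(int64_t x) { last_recorded = x; return x; }

/* Every locality is assumed to register the same actions in the same order,
 * so an integer id can travel on the wire instead of a function pointer. */
static const action_fn actions[NUM_ACTIONS] = { square, record };

/* A FIFO standing in for the network: send enqueues, progress delivers. */
#define QCAP 64
static parcel_t queue[QCAP];
static int q_head = 0, q_tail = 0;

static void parcel_send(parcel_t p) {
    assert(q_tail - q_head < QCAP);
    queue[q_tail++ % QCAP] = p;
}

/* Deliver all pending parcels: run each action, then forward its result to
 * the continuation as a new parcel (same target here, for brevity).
 * Returns the number of parcels delivered. */
static int progress(void) {
    int delivered = 0;
    while (q_head < q_tail) {
        parcel_t p = queue[q_head++ % QCAP];
        int64_t result = actions[p.action](p.arg);
        if (p.cont_action != ACT_NONE) {
            parcel_t next = { p.target, p.cont_action, result, ACT_NONE };
            parcel_send(next);
        }
        delivered++;
    }
    return delivered;
}
```

For example, sending `{ 1, ACT_SQUARE, 7, ACT_RECORD }` and calling `progress()` runs `square(7)` and then delivers 49 to `record` as a second parcel. The continuation is what lets the sender avoid waiting for a reply: the result flows onward as another message.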

“HPX-5 is a useful environment for exploring dynamic adaptive execution for high scalability computation as well as critical support for truly dynamic end-user science and engineering problems,” said Thomas Sterling, professor at Indiana University and creator of the ParalleX execution model.

Introduction

HPX-5 v2.0 (High Performance ParalleX) is the latest version of the HPX-5 runtime system that provides a unified programming model for parallel and distributed applications, allowing programs to run unmodified on systems from a single SMP to large clusters and supercomputers with thousands of nodes. HPX-5 is a realization of ParalleX, an abstract cross-cutting exascale execution model, which establishes roles and responsibilities between system layers.

The HPX-5 interface and library implementation are guided by the ParalleX execution model and are freely available, open source, portable, and performance-oriented. HPX-5 is a general-purpose runtime system for applications, targeted at conventional, widely available architectures. As a dynamic adaptive runtime system, it is event-driven and embodies the principles of multi-threaded computing while also providing a global name and address space and advanced synchronization constructs. For communication between nodes, it uses the Photon network layer, which has been tuned for efficient single-sided communication as an alternative to the Message Passing Interface (MPI).
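Single-sided communication of the kind Photon provides can be illustrated with a "put with remote completion": the initiator writes directly into the target's exposed memory, and the target learns that the data has arrived from a completion event rather than from a matching receive. The sketch below simulates two localities in one address space; `locality_t`, `put_with_completion`, and `poll_completion` are hypothetical names, not the Photon API, and the sketch ignores the memory-ordering and registration issues a real RDMA layer must handle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not the Photon API): "put with remote completion"
 * between two simulated localities sharing one address space. */

#define BUF_SIZE 64
#define CQ_CAP   16

typedef struct {
    uint8_t  buffer[BUF_SIZE];  /* memory the target exposes for remote writes */
    uint64_t cq[CQ_CAP];        /* remote completion events */
    int      cq_head, cq_tail;
} locality_t;

/* Initiator side: copy data straight into the target buffer (the target posts
 * no receive), then push a completion id onto the target's event queue. */
static void put_with_completion(locality_t *target, size_t offset,
                                const void *src, size_t len, uint64_t id) {
    assert(offset + len <= BUF_SIZE);
    assert(target->cq_tail - target->cq_head < CQ_CAP);
    memcpy(target->buffer + offset, src, len);   /* the one-sided write */
    target->cq[target->cq_tail++ % CQ_CAP] = id; /* the remote event */
}

/* Target side: returns 1 and stores the id if a completion is pending,
 * otherwise returns 0 without blocking. */
static int poll_completion(locality_t *self, uint64_t *id) {
    if (self->cq_head == self->cq_tail)
        return 0;
    *id = self->cq[self->cq_head++ % CQ_CAP];
    return 1;
}
```

A target that polls before any put sees nothing pending; after `put_with_completion(&t, 8, "hpx", 4, 42)` the poll returns the id 42 and the bytes are already in `t.buffer`. The completion id is the low-overhead signal: the target needs no extra message to know the write finished.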

[Figure: HPX-5 Architecture]

Team members: Thomas Sterling, Andrew Lumsdaine, Kelsey Shephard, Jayashree Ajay Candadai, Matt Anderson, Luke Dalessandro, Daniel Kogler, Abhishek Kulkarni
PI: Ron Brightwell, Sandia
Co-PI: Andrew Lumsdaine
Website: http://hpx.crest.iu.edu
Download: http://hpx.crest.iu.edu/download

The ParalleX execution model itself is experimental and continues to evolve, driven by quantitative insights from the Starvation-Latency-Overhead-Waiting for contention (SLOW) performance model. It aims to address the adverse effects of asynchrony on extreme-scale machines and to improve the efficiency and scalability of scaling-constrained applications. ParalleX uses lightweight concurrent threads managed by synchronization primitives such as dataflow and futures to change the application's control flow from message-passing to message-driven. It also includes an advanced global address space model instead of relying on more conventional distributed memory structures.
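The shift from message-passing to message-driven control flow can be sketched with a future that stores continuations instead of blocking a thread, plus a small dataflow node that fires once all of its inputs have arrived. This is a conceptual sketch under the assumption of a single thread; `future_t`, `future_then`, `future_set`, and `add_node_t` are hypothetical names, not HPX-5 functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch (not the HPX-5 API): a single-assignment future whose
 * consumers register continuations instead of blocking. */

#define MAX_WAITERS 8

typedef void (*cont_fn)(void *env, long value);

typedef struct {
    bool    ready;
    long    value;
    int     nwaiters;
    cont_fn waiters[MAX_WAITERS];
    void   *envs[MAX_WAITERS];
} future_t;

static void future_init(future_t *f) {
    f->ready = false;
    f->value = 0;
    f->nwaiters = 0;
}

/* If the value is present, run the continuation now; otherwise store it.
 * A stored continuation occupies no thread until the future is set. */
static void future_then(future_t *f, cont_fn k, void *env) {
    if (f->ready) { k(env, f->value); return; }
    assert(f->nwaiters < MAX_WAITERS);
    f->waiters[f->nwaiters] = k;
    f->envs[f->nwaiters] = env;
    f->nwaiters++;
}

/* Setting the value triggers every registered continuation exactly once. */
static void future_set(future_t *f, long value) {
    assert(!f->ready);
    f->ready = true;
    f->value = value;
    for (int i = 0; i < f->nwaiters; i++)
        f->waiters[i](f->envs[i], value);
    f->nwaiters = 0;
}

/* Dataflow node: sets `out` to the sum of its inputs once all have arrived. */
typedef struct {
    int       pending;  /* inputs still missing */
    long      sum;
    future_t *out;
} add_node_t;

static void add_input(void *env, long value) {
    add_node_t *n = env;
    n->sum += value;
    if (--n->pending == 0)
        future_set(n->out, n->sum);
}
```

With a node created as `{ 2, 0, &out }` and `add_input` registered on futures `a` and `b`, setting `b` to 40 leaves `out` unset; setting `a` to 2 then fires the node and `out` becomes 42. The arrival of data, not a blocking receive, is what drives execution forward.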

Audience

HPX-5 targets a broad range of scientific applications, helping scientists and developers write code that performs better on irregular applications and at scale than code written with more conventional programming models such as MPI. For the application developer, it provides dynamic adaptive resource management and task scheduling to reach efficiencies in time, energy, and scalability that are otherwise unachievable. HPX-5 supports such applications by implementing features such as the Active Global Address Space (AGAS), ParalleX Processes, Complexes (ParalleX Threads and Thread Management), Parcel Transport and Parcel Management, Local Control Objects (LCOs), and Localities. Fine-grained computation is expressed using actions. Computation is logically grouped into processes, which provide quiescence and termination detection. LCOs are globally addressable synchronization objects that manage local and distributed control flow. At the heart of HPX-5 is a lightweight thread scheduler that directly schedules lightweight actions by multiplexing them onto a set of heavyweight scheduler threads.
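The combination of lightweight actions, a scheduler that multiplexes them, and process-level termination detection can be sketched as a run queue drained by one loop (standing in for a single heavyweight scheduler thread) and a counter of outstanding tasks. The counter scheme is one simple way to detect termination, assumed here for illustration; `process_t`, `spawn`, `schedule`, and `tree` are hypothetical names, not HPX-5 functions.

```c
#include <assert.h>

/* Hypothetical sketch (not the HPX-5 API): lightweight tasks run to completion
 * on one scheduler loop; a process counts outstanding tasks so that
 * termination is detected when the count reaches zero. */

typedef struct process process_t;
typedef void (*task_fn)(process_t *proc, int arg);

typedef struct { task_fn fn; int arg; } task_t;

struct process {
    int  outstanding;  /* tasks spawned but not yet finished */
    int  terminated;   /* set once outstanding drops to zero */
    long work_done;    /* example payload: number of tasks executed */
};

#define RQ_CAP 256
static task_t run_queue[RQ_CAP];
static int rq_head = 0, rq_tail = 0;

/* Spawning is cheap: enqueue the task and bump the process counter. */
static void spawn(process_t *p, task_fn fn, int arg) {
    assert(rq_tail - rq_head < RQ_CAP);
    p->outstanding++;
    run_queue[rq_tail++ % RQ_CAP] = (task_t){ fn, arg };
}

/* Multiplex all queued tasks onto this one loop. A task's children are
 * counted before the task itself is retired, so the counter cannot reach
 * zero while work remains. */
static void schedule(process_t *p) {
    while (rq_head < rq_tail) {
        task_t t = run_queue[rq_head++ % RQ_CAP];
        t.fn(p, t.arg);
        if (--p->outstanding == 0)
            p->terminated = 1;
    }
}

/* Example action: spawns a binary tree of tasks of the given depth. */
static void tree(process_t *p, int depth) {
    p->work_done++;
    if (depth > 0) {
        spawn(p, tree, depth - 1);
        spawn(p, tree, depth - 1);
    }
}
```

Spawning `tree` with depth 4 into a fresh process and calling `schedule` executes 2^5 - 1 = 31 tasks and only then marks the process terminated. A real runtime would run several scheduler threads and need an atomic or distributed counter; the single loop keeps the invariant easy to see.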

Features in HPX-5

Fine grained execution through cooperative lightweight threads and unified access to a global address space.
High-performance PGAS implementation which supports low-level one-sided operations and two-sided active messages with continuations, and an experimental AGAS option that allows the binding of global to physical addresses to vary dynamically.
Makes concurrency manageable through synchronization based on globally allocated lightweight control objects (LCOs) such as futures, gates, and reductions, allowing thread execution or parcel instantiation to wait for events without consuming execution resources.
Higher level abstractions including asynchronous remote-procedure-call options, data parallel loop constructs, and system abstractions.
Early implementation of ParalleX processes providing programmers with termination detection and per-process collectives.
Photon networking library synthesizing RDMA-with-remote-completion directly on top of uGNI, IB verbs, or libfabric. For portability and legacy support, HPX-5 emulates RDMA-with-remote-completion using MPI point-to-point messaging.
Experimental OpenCL support for distributed GPUs and co-processors (Intel Xeon Phi).
PAPI support for profiling. APEX policy engine (Autonomic Performance Environment for eXascale) support for runtime adaptation.
Migration of legacy applications through easy interoperability and co-existence with traditional runtimes like MPI. HPX-5 2.0 also ships with several applications: LULESH, Wavelet AMR, HPCG, and the ParalleX Graph Library.
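The difference between the PGAS implementation and the experimental AGAS option listed above can be sketched through address resolution. In this sketch a global address packs a block id and an offset; under PGAS the owning locality is computed from the address, while under AGAS it is looked up in a directory that migration can update without changing the address itself. The bit layout, the cyclic initial distribution, and all names (`gaddr_t`, `pgas_owner`, `agas_owner`, `agas_migrate`) are assumptions for illustration, and a real AGAS directory would be distributed and cached rather than a single array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: a global address packs a block id (upper 32 bits) and
 * a byte offset (lower 32 bits). Layout and names are illustrative only. */

typedef uint64_t gaddr_t;

#define NUM_LOCALITIES 4
#define MAX_BLOCKS     1024

static gaddr_t gaddr_make(uint32_t block, uint32_t offset) {
    return ((uint64_t)block << 32) | offset;
}
static uint32_t gaddr_block(gaddr_t a)  { return (uint32_t)(a >> 32); }
static uint32_t gaddr_offset(gaddr_t a) { return (uint32_t)a; }

/* PGAS: the binding is fixed, here assumed to be cyclic over localities. */
static int pgas_owner(gaddr_t a) {
    return (int)(gaddr_block(a) % NUM_LOCALITIES);
}

/* AGAS: a directory records the current locality of each block. */
static int agas_directory[MAX_BLOCKS];

static void agas_init(void) {
    for (int b = 0; b < MAX_BLOCKS; b++)
        agas_directory[b] = b % NUM_LOCALITIES;  /* same initial layout */
}

static int agas_owner(gaddr_t a) {
    uint32_t b = gaddr_block(a);
    assert(b < MAX_BLOCKS);
    return agas_directory[b];
}

/* Migration rebinds the block; the global address itself never changes. */
static void agas_migrate(gaddr_t a, int new_owner) {
    uint32_t b = gaddr_block(a);
    assert(b < MAX_BLOCKS);
    assert(new_owner >= 0 && new_owner < NUM_LOCALITIES);
    agas_directory[b] = new_owner;
}
```

For an address built from block 6 and offset 128, both schemes initially resolve to locality 2. After `agas_migrate(a, 3)`, AGAS resolves the same address to locality 3 while the fixed PGAS mapping still yields 2; this is the "binding of global to physical addresses" varying dynamically.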

Timeline

The HPX-5 source code is distributed with a liberal open-source license.

v1.0.0 released on 3rd May 2015.
v2.0.0 released on 17th November 2015.
v3.0.0 is scheduled to be released in April/May 2016.

HPX-5 is developed with an agile process that includes continuous integration, regular point releases, and frequent regression tests for correctness and performance. Users can submit issue reports to the development team through the HPX-5 issue tracker (https://gitlab.crest.iu.edu/extreme/hpx/issues). Future major releases of HPX-5 will be delivered semi-annually, although bug fixes will be made available between major releases.

The HPX-5 source code is released under the BSD open-source license and is distributed with a complete set of tests along with selected sample applications. HPX-5 is funded and supported by the DoD, DoE, and NSF, and is used actively in projects such as PSAAP, XPRESS, and XSEDE. Further information and downloads for HPX-5 can be found at http://hpx.crest.iu.edu.

Quick Start Instructions

If you plan to use HPX-5, we suggest starting with the latest released version (currently HPX-5 v2.0.0), which can be downloaded from https://hpx.crest.iu.edu/download. Follow the installation directions in hpx/INSTALL.

Links

Webpage: http://hpx.crest.iu.edu
HPX-5 Documentation (http://hpx.crest.iu.edu/documentation). The user guide and developer guide are online, along with the HPX-5 developers policy.
HPX-5 tutorial (http://hpx.crest.iu.edu/tutorials).
HPX-5 source code (http://hpx.crest.iu.edu/git_repository). The HPX-5 source code is available as a git repository that can be cloned.
Applications (http://hpx.crest.iu.edu/applications).
Frequently asked questions (http://hpx.crest.iu.edu/faqs_and_tutorials).
Mailing lists are available here: http://hpx.crest.iu.edu/contact.
