HPX-5

From Modelado Foundation
== Announcement ==


'''''Indiana University announces HPX-5 version 4.0 runtime system software!'''''


== Introduction ==
The Center for Research in Extreme Scale Technologies (CREST) at Indiana University is pleased to announce the release of version 4.0 of HPX-5, a state-of-the-art runtime system for extreme-scale computing. Version 4.0 represents a significant maturation of the HPX-5 release series for efficient, scalable, general-purpose high-performance computing. It incorporates new optimizations for performance, features associated with the ParalleX execution model, and programmer services including C++ bindings and collectives.


HPX-5 is a realization of the ParalleX execution model, which establishes the runtime's roles and responsibilities with respect to other interoperating system layers, and explicitly includes a performance model that provides an analytic framework for performance and optimization. As an Asynchronous Multi-Tasking (AMT) software system, HPX-5 is event-driven, enabling the migration of continuations and the movement of work to data, when appropriate, based on sophisticated local control synchronization objects (e.g., futures, dataflow) and active messages. ParalleX compute complexes, embodied as lightweight, first-class threads, can block, perform global mutable side-effects, employ non-strict firing rules, and serve as continuations. HPX-5 employs an active global address space in which virtually addressed objects can migrate across the physical system without changing address. First-class named processes can span and share nodes.
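The parcel-plus-continuation style of event-driven execution described above can be illustrated with a toy sketch (plain Python, not the HPX-5 API; `Parcel` and `run_network` are hypothetical names): each parcel names a target locality, an action to run there, and a continuation, so results flow onward to the next action instead of returning to the sender.

```python
from queue import Queue

# Toy model of a ParalleX-style parcel: a target locality, an action to
# run there, and a continuation that receives the action's result.
# All names here are illustrative; this is not the HPX-5 API.
class Parcel:
    def __init__(self, target, action, args=(), continuation=None):
        self.target = target              # destination locality id
        self.action = action              # work moved to the data
        self.args = args
        self.continuation = continuation  # where the result flows next

def run_network(localities, first):
    """Deliver parcels until quiescence, forwarding each result to the
    parcel's continuation instead of returning it to the sender."""
    queues = {lid: Queue() for lid in localities}
    queues[first.target].put(first)
    results, pending = [], 1
    while pending:
        for q in queues.values():
            while not q.empty():
                p = q.get()
                pending -= 1
                value = p.action(*p.args)
                if p.continuation is not None:
                    c = p.continuation
                    queues[c.target].put(
                        Parcel(c.target, c.action, (value,) + c.args))
                    pending += 1
                else:
                    results.append(value)
    return results

# Square 6 on locality 0, then continue into an add-10 action on locality 1.
cont = Parcel(target=1, action=lambda x: x + 10)
result = run_network([0, 1], Parcel(0, lambda x: x * x, (6,), cont))
```

The continuation is what makes the execution message-driven rather than message-passing: control never returns to the originating locality unless a continuation explicitly sends it there.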


HPX-5 is an evolving runtime system used both to enable dynamic adaptive parallel applications and to conduct path-finding experimentation to quantify effects of latency, overhead, contention, and parallelism of its integral mechanisms. These performance parameters determine a trade-off space within which dynamic control is performed for best performance. It is an area of active research driven by complex applications and advances in HPC architecture. HPX-5 employs dynamic and adaptive resource management and task scheduling to achieve the significant improvements in efficiency and scalability necessary to deploy many classes of parallel applications on the largest (current and future) supercomputers in the nation and world. Although still under development, HPX-5 is portable to a diverse set of systems, is reliable and programmable, scales across multi-core and multi-node systems, and delivers efficiency improvements for irregular, time-varying problems.  


HPX-5 is written primarily in portable C99 and is released under an open source BSD license. Future major releases will be delivered semi-annually, and correctness and performance bug fixes will be made available as required. To support active engagement with the larger developer community, active development branches are available. HPX-5 will also be disseminated through the OpenHPC consortium led by the Linux Foundation.
 
{{Infobox project
| title = HPX-5 Architecture
| image = [[File:Architecture.png|320x300px]]
| website = http://hpx.crest.iu.edu
| imagecaption =
| download = http://hpx.crest.iu.edu/download
| team-members = Thomas Sterling, Andrew Lumsdaine, Kelsey Shephard, Jayashree Candadai, Matt Anderson, Luke Dalessandro, Daniel Kogler, Abhishek Kulkarni
| pi = Ron Brightwell, Sandia
| co-pi = Andrew Lumsdaine
}}


== Audience ==

HPX-5 is used for a broad range of scientific applications, helping scientists and developers write code that performs better on irregular applications and at scale than more conventional programming models such as MPI. For the application developer, it provides dynamic, adaptive resource management and task scheduling to reach otherwise unachievable efficiencies in time, energy, and scalability. HPX-5 supports such applications with features like an Active Global Address Space (AGAS), ParalleX Processes, Complexes (ParalleX threads and thread management), parcel transport and parcel management, Local Control Objects (LCOs), and Localities. Fine-grained computation is expressed using actions. Computation is logically grouped into processes to provide quiescence and termination detection. LCOs are synchronization objects that manage local and distributed control flow and have a global address. The heart of HPX-5 is a lightweight thread scheduler that directly schedules lightweight actions by multiplexing them on a set of heavyweight scheduler threads.
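The blocking semantics of LCOs such as futures can be sketched in a few lines (a toy Python illustration of the concept; libhpx exposes futures through its C API, and the `Future` class below is not part of it):

```python
import threading

# Toy future with LCO-style semantics: consumers block until a producer
# sets the value. Purely illustrative; this class is not part of HPX-5.
class Future:
    def __init__(self):
        self._set = threading.Event()
        self._value = None

    def set(self, value):      # producer satisfies the LCO
        self._value = value
        self._set.set()

    def get(self):             # consumer blocks until the LCO is set
        self._set.wait()
        return self._value

f = Future()
got = []
consumer = threading.Thread(target=lambda: got.append(f.get()))
consumer.start()               # blocks inside get() ...
f.set(42)                      # ... until the producer fires the event
consumer.join()
```

In HPX-5 the blocked entity is a lightweight thread (or a pending parcel send), not an OS thread, so waiting consumes no execution resources.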
== Features in HPX-5 ==


The HPX-5 C library implementation, libhpx, provides a high-performance, portable implementation of the API that runs on both SMP and distributed systems. It is designed around a cooperative lightweight thread scheduler and unified access to a global address space. Event-driven programs invoke remote actions on global addresses using HPX-5's parcel abstraction (an active message plus a continuation), or read and write global data directly through one-sided network operations. Globally addressable lightweight control objects (LCOs) provide control and data synchronization, allowing thread execution or parcel instantiation to wait for events without consuming execution resources. Finally, HPX-5's implementation of ParalleX processes gives programmers the powerful abstraction of termination groups for parcel and thread execution. In addition to this core programming model, HPX-5 provides a number of higher-level abstractions, including asynchronous remote-procedure-call options, data-parallel loop constructs, global memory allocation, and system abstractions like timers.

*  Fine-grained execution through blockable lightweight threads and unified access to a global address space.
*  High-performance PGAS implementation which supports low-level one-sided operations and two-sided active messages with continuations, and an experimental AGAS option with active load balancing that allows the binding of global to physical addresses to vary dynamically.
*  Manageable concurrency through synchronization based on globally allocated lightweight control objects (LCOs) (futures, gates, reductions, dataflow), allowing thread execution or parcel instantiation to wait for events without consuming execution resources.
*  Higher-level abstractions including asynchronous remote-procedure-call options, data-parallel loop constructs, and system abstractions like timers.
*  Implementation of ParalleX processes providing programmers with termination detection and per-process collectives.
*  Photon networking library synthesizing RDMA-with-remote-completion directly on top of uGNI, IB verbs, or libfabric. For portability and legacy support, HPX-5 emulates RDMA-with-remote-completion using MPI point-to-point messaging.
*  Programmer services including C++ bindings and collectives (prototype non-blocking network collectives for hierarchical process collective operations).
*  Support for distributed GPUs and co-processors (Intel Xeon Phi) through experimental OpenCL support.
*  PAPI support for profiling.
*  Integration with the APEX policy engine (Autonomic Performance Environment for eXascale) for runtime adaptation, along with RCR and the LXK OS.
*  Migration of legacy applications through easy interoperability and co-existence with traditional runtimes like MPI. HPX-5 4.0 is also released along with several applications: LULESH, Wavelet AMR, HPCG, CoMD, and the ParalleX Graph Library.

'''''Global Address Space:''''' The global address space is modeled as a linear, byte-addressable virtual space, where the binding between global addresses and local virtual addresses may change at runtime. Global addresses serve as the targets of active-message parcels and also support put/get operations directly. HPX-5 ships with two implementations of the global address space: a traditional, high-performance partitioned global address space (PGAS) allows low-latency, one-sided put/get operations in addition to parcel access but does not support remapping, while an experimental active global address space (AGAS) emulates put/get with parcels while allowing the binding of global to physical addresses to vary dynamically.

'''''Memory Management:''''' Memory management within the global address space leverages features of the high-performance jemalloc allocator to provide scalable local allocation of globally addressable data, as well as cyclic distributed arrays.

'''''Lightweight Thread Scheduling:''''' The HPX-5 lightweight-threading system enables massively threaded computation by minimizing the cost of thread creation and context switching. HPX-5 threads have unrestricted behavior and can wait for asynchronous events using lightweight control objects. Local load balancing is performed via work stealing, while distributed load balancing is managed by remapping in the active global address space. Threading overheads rival those of existing modern packages with compatible semantics.

'''''Parcels:''''' HPX-5 parcels encode event-driven execution by extending a traditional active-message abstraction with a continuation, enabling complex chains of computation in addition to traditional remote procedure calls. Parcels are sent to addresses within the global address space and embody two-sided network programming. Parcel instantiation is isomorphic with lightweight thread creation; in fact, the parcel and the thread descriptor are the same structure.

'''''Local Control Objects (LCOs):''''' LCOs represent control state in memory. Both threads and parcels interact with LCOs: threads may block until an LCO is ready and/or its data is available, while parcel send operations may wait until an LCO is ready. HPX-5 is distributed with a number of built-in LCO types, including futures, gates, generation counters, parameterizable commutative-associative reductions, and semaphores, and supports user-defined LCOs as well.

'''''Processes:''''' HPX-5 groups parcels and threads into processes, which may optionally have termination detection enabled. Termination detection triggers an event when the process has reached quiescence. This powerful mechanism can be useful in algorithms that do not contain a natural "join" point.
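The quiescence-based termination detection described above can be sketched as a toy counter (plain Python, purely illustrative; `TerminationGroup` is a hypothetical name, not the HPX-5 API): every spawn increments a count of outstanding tasks, every completion decrements it, and the termination event fires when the count reaches zero.

```python
import threading

# Toy quiescence detector in the spirit of HPX-5 process termination
# detection: track outstanding tasks and signal when the count drops to
# zero. Names are illustrative, not the libhpx API.
class TerminationGroup:
    def __init__(self):
        self._outstanding = 0
        self._cv = threading.Condition()

    def spawned(self):
        with self._cv:
            self._outstanding += 1

    def finished(self):
        with self._cv:
            self._outstanding -= 1
            if self._outstanding == 0:
                self._cv.notify_all()

    def wait_quiescence(self):
        with self._cv:
            while self._outstanding:
                self._cv.wait()

group = TerminationGroup()
total = []

def task(n, depth):
    # Tasks may spawn subtasks, so there is no natural "join" point.
    if depth:
        for _ in range(2):
            group.spawned()   # count the child BEFORE it starts
            threading.Thread(target=task, args=(n + 1, depth - 1)).start()
    total.append(n)
    group.finished()

group.spawned()
threading.Thread(target=task, args=(0, 2)).start()
group.wait_quiescence()       # fires only when the whole task tree is done
```

Counting a child before starting it is what keeps the counter from transiently hitting zero while work is still being created, which is the essential invariant of this style of detector.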
 
'''''Networking:''''' In addition to active message parcel operations, HPX-5 provides direct access to the global address space through asynchronous put/get operations. Generic support for both parcel transport and put/get operations is designed around a novel networking abstraction, put-with-remote-notification. HPX-5 provides two implementations of this abstraction. The PWC network uses Photon’s put-with-remote-completion support directly and implements a parcel emulation layer for parcel transport. The ISIR network provides parcel transport on top of the traditional MPI Isend/Irecv point-to-point messaging, and emulates put-with-remote-notification with parcels.
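The put-with-remote-notification abstraction can be modeled in a toy form (illustrative Python; not the Photon or HPX-5 API): copy data into the target's memory, then raise a completion event on the target side so it can react to the arrival.

```python
import threading

# Toy put-with-remote-notification: copy bytes into the target's memory,
# then signal a completion event at the target side, in the spirit of
# the PWC network built on Photon's put-with-remote-completion.
# Purely illustrative; none of these names come from HPX-5.
class Node:
    def __init__(self, size):
        self.memory = bytearray(size)           # remotely writable memory
        self.completion = threading.Event()     # remote-side notification

def put_with_notification(src_buf, dst_node, offset):
    dst_node.memory[offset:offset + len(src_buf)] = src_buf  # one-sided put
    dst_node.completion.set()                                # remote notify

target = Node(16)
waiter = threading.Thread(target=target.completion.wait)
waiter.start()                                  # target waits for arrival
put_with_notification(b"hello", target, 3)      # initiator writes + notifies
waiter.join()
```

The point of the abstraction is that both parcel transport and direct put/get can be expressed on top of this single primitive, whether it is provided natively (Photon) or emulated (MPI point-to-point).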
 
'''''Convenience Features:''''' Development of HPX-5 was guided by codesign with computational scientists. This resulted in a number of convenience features that encapsulate lower level behavior in more convenient and familiar form. These include a suite of remote-procedure style calls, data parallel loops for local and distributed data, and numerous smaller features.
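A data-parallel loop of the kind these convenience features provide can be sketched as follows (illustrative Python using a thread pool; `par_for` is a hypothetical name, not the libhpx interface):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data-parallel loop in the spirit of HPX-5's par-for constructs:
# run the loop body over every index using a pool of workers. Purely
# illustrative; the real libhpx interface is a C API and differs.
def par_for(body, n, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(n)))   # drain to propagate exceptions

data = [1, 2, 3, 4]

def square_in_place(i):
    data[i] = data[i] * data[i]          # each index is touched exactly once

par_for(square_in_place, len(data))
```

Because each iteration touches a distinct index, the body needs no locking; the runtime's job is only to distribute the index range across workers.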


== Timeline ==

The HPX-5 source code is distributed with a liberal open-source license.


*  v1.0.0 released on 3rd May 2015.
*  v2.0.0 released on 17th November 2015.
*  v3.0.0 released on 5th May 2016.
*  v4.0.0 released on 11th November 2016.
*  v5.0.0 is scheduled to be released in May 2017.


HPX-5 is developed with an agile process that includes continuous integration, regular point releases, and frequent regression tests for correctness and performance. Users can submit issue reports to the development team through the HPX-5 web site (https://gitlab.crest.iu.edu/extreme/hpx/issues). Future major releases of HPX-5 will be delivered semi-annually, although bug fixes will be made available between major releases. Users may download official releases as well as the latest development branch of the code.
 
The HPX-5 source code is released under the BSD open-source license and is distributed with a complete set of tests along with selected sample applications. HPX-5 is funded and supported by the DoD, DoE, and NSF, and is used actively in projects such as PSAAP, XPRESS, and XSEDE. Further information and downloads for HPX-5 can be found at http://hpx.crest.iu.edu.


== Quick Start Instructions ==


If you plan to use HPX-5, we suggest starting with the latest released version (currently HPX-5 v4.0.0), which can be downloaded from https://hpx.crest.iu.edu/download.

Alternatively, clone the main HPX-5 Git repository at GitLab. The main development work occurs on the “develop” branch of this repository, so you can easily keep up with the latest source code using normal Git commands. You will need a Git client; if the command “git” is not in your path, download and install Git first.
 
<code bash>
shell$ git clone https://gitlab.crest.iu.edu/extreme/hpx.git
</code>
 
Follow the installation directions located in hpx/INSTALL.



Latest revision as of 19:42, November 28, 2016
