Communications: Difference between revisions
From Modelado Foundation
imported>Schulzm No edit summary |
imported>Rbbrigh No edit summary |
Revision as of 20:15, May 7, 2014
QUESTIONS | XPRESS | TG X-Stack | DEGAS | D-TEC | DynAX | X-TUNE | GVR | CORVETTE | SLEEC | PIPER |
---|---|---|---|---|---|---|---|---|---|---|
PI | Ron Brightwell | Shekhar Borkar | Katherine Yelick | Daniel Quinlan | Guang Gao | Mary Hall | Andrew Chien | Koushik Sen | Milind Kulkarni | Martin Schulz |
What are the communication "primitives" that you expect to emphasize within your project? (e.g. two-sided vs one-sided, collectives, topologies, groups) Do we need to define extensions to the traditional application-level interfaces which now emphasize only data transfers and collective operations? Do we need atomics, remote invocation interfaces, or should these be provided ad-hoc by clients? | The communication primitive is based on the “parcel” protocol, an expanded form of active messages that operates within a global address space distributed across “localities” (approximately nodes). Logical destinations are hierarchical global names. Actions include instantiation of threads and ParalleX processes (which may span multiple localities), data movement, compound atomic operations, and OS calls. Continuations determine follow-on actions. The payload conveys data operands and block data for moves. Parcels are an integral part of the semantics of the ParalleX execution model: they give asynchronous distributed processing semantics that are symmetric to those of local synchronous processing on a locality (node). | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | SLEEC | Communication will be out of band and needs to be isolated; the emphasis is on streaming communication |
Traditional communication libraries (e.g. MPI or GASNet) have been developed without tight integration with the computation "model". What is your strategy for integrating communication and computation to address the needs of non-SPMD execution? | It is important for performance portability that most communication-related code consist of invariants that always hold. Aggregation, routing, order sensitivity, time to arrival, and error management should be transparent to the user. Destinations should, in most cases, be expressed relative to the placement of first-class objects, to allow adaptive placement and routing. Scheduling should tolerate the asynchrony and uncertainty of message delivery without forfeiting performance, assuming sufficient parallelism. | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | N/A | N/A |
What type of optimizations should be transparently provided by a communication layer and what should be delegated to compilers or application developers? What is the primary performance metric for your runtime? | Time to solution of application workload, with minimum energy cost within that scope. | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | N/A | unclear |
What is your strategy towards resilient communication libraries? | To first order, the runtime system assumes correct operation of the communication libraries, as is being pursued by Portals-4 and the experimental Photon communication fabric. Under NNSA PSAAP-2, the Micro-checkpoint Compute-Validate-Commit cycle will detect errors, including those due to communication failures. | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | N/A | ability to drop and reroute around failed processes |
What can a communication layer do to help with power and energy optimizations, and how? | Energy waste on unused channels needs to be prevented. Delays due to contention for hotspots need to be mitigated through dynamic routing. Information on message traffic, granularity, and power needs to be provided to OSR. | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | N/A | N/A |
Congestion management and flow control mechanisms are of particular concern at very large scale. How much can we rely on "vendor" mechanisms and how much do we need to address in higher-level layers? | Vendor systems can help with redundant paths and dynamic routing. The runtime system's data and task placement can attempt to maximize locality in order to reduce message-traffic contention. | (TG) | (DEGAS) | (D-TEC) | (DynAX) | (X-TUNE) | (GVR) | (CORVETTE) | N/A | N/A |
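
The XPRESS answer to the first question describes a parcel as carrying a destination (a hierarchical global name), an action, a payload, and a continuation. The following is a minimal Python sketch of that structure, not the ParalleX or HPX interface. Every name in it (`Parcel`, `Locality`, `GlobalAddressSpace`, `fetch_add`, `store`, and the `/locality/object` naming scheme) is a hypothetical chosen for illustration, and delivery is synchronous here for simplicity, whereas real parcels are delivered asynchronously.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Parcel:
    destination: str                          # hierarchical global name (assumed form: "/<locality>/<object>")
    action: str                               # name of the action to run at the destination
    payload: Tuple[Any, ...] = ()             # data operands for the action
    continuation: Optional["Parcel"] = None   # follow-on parcel that receives the result


class Locality:
    """Hypothetical stand-in for a node: named objects plus an action table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, Any] = {}
        self.actions: Dict[str, Callable[..., Any]] = {}


class GlobalAddressSpace:
    """Hypothetical registry that maps global names onto localities."""

    def __init__(self) -> None:
        self.localities: Dict[str, Locality] = {}

    def add(self, locality: Locality) -> None:
        self.localities[locality.name] = locality

    def send(self, parcel: Parcel) -> None:
        # Resolve "/<locality>/<object>" to the owning locality and the local object name.
        _, loc_name, obj_name = parcel.destination.split("/", 2)
        locality = self.localities[loc_name]
        # Run the named action where the data lives.
        result = locality.actions[parcel.action](locality.objects, obj_name, *parcel.payload)
        # The continuation decides what happens next; the result is appended to its payload.
        if parcel.continuation is not None:
            nxt = parcel.continuation
            self.send(Parcel(nxt.destination, nxt.action, nxt.payload + (result,), nxt.continuation))


def fetch_add(objects: Dict[str, Any], name: str, amount: int) -> int:
    """Compound operation: add to a value and return the old value."""
    old = objects.get(name, 0)
    objects[name] = old + amount
    return old


def store(objects: Dict[str, Any], name: str, value: Any) -> Any:
    """Data movement: write a value under a local name."""
    objects[name] = value
    return value


def demo() -> Tuple[int, int]:
    gas = GlobalAddressSpace()
    loc0, loc1 = Locality("loc0"), Locality("loc1")
    for loc in (loc0, loc1):
        loc.actions.update({"fetch_add": fetch_add, "store": store})
        gas.add(loc)
    loc1.objects["counter"] = 10
    # Add 5 to the counter on loc1, then store the previous value on loc0.
    gas.send(Parcel("/loc1/counter", "fetch_add", (5,),
                    continuation=Parcel("/loc0/seen", "store")))
    return loc1.objects["counter"], loc0.objects["seen"]
```

In this sketch the sender never waits for a reply: the follow-on work is named up front in the continuation, which is the property that distinguishes this style from a two-sided send/receive pair.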