
Resilience Research Questions

From Modelado Foundation

Latest revision as of 23:27, May 29, 2014

The main question is where resilience fits into the X-stack runtime abstract architecture. The guiding questions are:

1) What features of other levels of the stack (algorithm, programming model, compiler, runtime, and hardware) should resilience depend on?

  • Carbin: All of them. Uncertainty will be a first-class concern in future systems.

2) How can resilience schemes best exploit application, runtime, or programming model semantics?

  • Carbin: Developers expose application-specific flexibility via the programming model. The runtime has different capabilities. Resilience schemes coordinate across these layers.

3) What are the biggest missing pieces needed from the various layers to make resilience schemes succeed?

  • Carbin: Coordinated understanding of uncertainty across the stack (uncertainty quantification, UQ?). We’ve explored uncertainty/approximation only in limited scopes.

4) What is the impact on resilience of the wide range of expected operating scenarios, with respect to dynamically changing resources, application characteristics, and widely varying error and failure rates?

  • Not obvious. State of the art: explore a variety of reasoning approaches and mechanisms. Open problem: balancing complexity and benefit.

Please add your comments (with your name) after each question.

Presentations:

  • Michael Carbin