Correctness Tools

Revision as of 20:19, May 7, 2014

Survey questions with answers from the X-Stack projects: XPRESS, TG X-Stack, DEGAS, D-TEC, DynAX, X-TUNE, GVR, CORVETTE, SLEEC, and PIPER. Answers below are labeled by the responding project; projects not listed for a question left it blank.
Q: What kind of runtime overhead can you accept when running your application with a dynamic analysis tool such as a data race detector? 1.1X, 1.5X, 2X, 5X, 10X, 100X.

XPRESS: Runtime overhead determines task granularity and therefore available parallelism (strong scaling), which in turn determines time to solution. Only fractional overheads can be tolerated at runtime.

DynAX: This should really be a question for the DOE application writers. We imagine that when running in debug mode, performance degradations up to a few X are acceptable. Most likely, though, if a dynamic tool changes a program's execution time significantly, it also changes the way the program is executed (dynamic task schedules, non-determinism) and hence may analyze irrelevant behavior (see the example below).

PIPER: N/A
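As a concrete illustration of what such a dynamic tool is paying for (our sketch, not part of the survey): a minimal racy C++ program. Built with a race detector such as ThreadSanitizer enabled (-fsanitize=thread in clang/gcc), the conflicting accesses are reported at runtime, typically at a several-fold slowdown.

<syntaxhighlight lang="cpp">
// Minimal data-race example (illustrative, not from the survey).
// Compile: g++ -std=c++11 -pthread race.cpp                    (no detection)
//          g++ -std=c++11 -pthread -fsanitize=thread race.cpp  (TSan reports the race)
#include <iostream>
#include <thread>

long counter = 0;  // shared, unsynchronized

void work() {
    for (int i = 0; i < 100000; ++i)
        ++counter;  // racy read-modify-write
}

int main() {
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    // Result is nondeterministic; a race detector flags the conflicting accesses.
    std::cout << counter << "\n";
}
</syntaxhighlight>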
Q: What types of bugs do you want correctness tools to detect? Data races, deadlocks, non-determinism.

XPRESS: Hardware errors, runtime errors, and discrepancies with respect to expectations (e.g., task time to completion, satisfying a condition) are among the classes of bugs that detection mechanisms should address. These probably have to be built into the runtime/compile-time system and be consistent with the overall execution model.

DynAX: SWARM proposes mechanisms (such as tags) to help avoid data races. However, SWARM is compatible with traditional synchronization and data-access mechanisms, so the programmer can still create data races.

Deadlocks can appear with any programming model based on task graphs, including the one supported by the SWARM runtime; they arise as soon as a cycle is created among tasks. ETI's tracer can help detect deadlocks by showing which tasks were running when the deadlock happened. (A cycle-detection sketch follows this question.)

Non-determinism is often a feature. It may be desired in parts of a program (for instance, in a parallel reduction) but not in others, so a tool for detecting non-determinism per se would not be sufficient; it would need an API for specifying where non-determinism is unexpected.

PIPER: N/A
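Since the deadlock criterion above is a dependence cycle among tasks, a runtime or tracer can test for it directly. A minimal sketch (hypothetical, not the SWARM implementation) using depth-first search:

<syntaxhighlight lang="cpp">
// Sketch: detect a dependence cycle (potential deadlock) in a task graph.
// Hypothetical illustration; not the SWARM/DynAX implementation.
#include <iostream>
#include <vector>

// deps[t] lists the tasks that task t waits on.
bool hasCycle(int t, const std::vector<std::vector<int>>& deps,
              std::vector<int>& state) {   // 0=unvisited, 1=in progress, 2=done
    state[t] = 1;
    for (int d : deps[t]) {
        if (state[d] == 1) return true;    // back edge: cycle found
        if (state[d] == 0 && hasCycle(d, deps, state)) return true;
    }
    state[t] = 2;
    return false;
}

int main() {
    // Task 0 waits on 1, 1 waits on 2, 2 waits on 0: a deadlock cycle.
    std::vector<std::vector<int>> deps = {{1}, {2}, {0}};
    std::vector<int> state(deps.size(), 0);
    bool cycle = false;
    for (int t = 0; t < (int)deps.size() && !cycle; ++t)
        if (state[t] == 0) cycle = hasCycle(t, deps, state);
    std::cout << (cycle ? "cycle (potential deadlock)\n" : "acyclic\n");
}
</syntaxhighlight>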
Q: What kind of correctness tools can help you with interactive debugging in a debugger? Do you want those tools to discover bugs for you?

XPRESS: Beyond a certain level of scaling, debugging is almost indistinguishable from fault-tolerance methods. Detecting errors not found at compile time requires equivalent mechanisms; diagnosis needs to add code as a possible source of error; and an additional kind of user interface is required, perhaps to provide a patch at runtime permitting continued execution from the point of error. This is supported under the DOE OS/R Hobbes project.

DynAX: Some bugs, such as deadlocks and livelocks, can be discovered automatically (a watchdog sketch follows this question). For the others, tools need to reduce the time required to find the source of the bug.

PIPER: N/A
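One simple form of automatic discovery for deadlocks and livelocks is a progress watchdog. A minimal sketch, assuming (hypothetically) that the application exports a progress counter:

<syntaxhighlight lang="cpp">
// Sketch: a progress watchdog that flags a possible deadlock/livelock when
// a monitored counter stops advancing. Hypothetical illustration only.
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<long> progress{0};
std::atomic<bool> done{false};

void watchdog(std::chrono::seconds interval) {
    long last = progress.load();
    while (!done.load()) {
        std::this_thread::sleep_for(interval);
        long now = progress.load();
        if (now == last)
            std::cerr << "warning: no progress in " << interval.count()
                      << "s (possible deadlock/livelock)\n";
        last = now;
    }
}

int main() {
    std::thread w(watchdog, std::chrono::seconds(1));
    for (int i = 0; i < 3; ++i) {                          // simulated work
        std::this_thread::sleep_for(std::chrono::milliseconds(400));
        ++progress;
    }
    std::this_thread::sleep_for(std::chrono::seconds(2));  // simulated stall
    done = true;
    w.join();
}
</syntaxhighlight>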
Q: Current auto-tuning and performance/precision debugging tools treat the program as a black box. What kind of white-box program analysis techniques can help in auto-tuning your application with floating-point precision and alternative algorithms?

XPRESS: XPRESS does not address floating-point precision issues and would benefit from other projects' solutions. XPRESS does incorporate the APEX and RCR components for introspective performance optimization, with the runtime system controlling load balancing and task scheduling. It measures progress toward goals in order to adjust ordering, especially for critical-path tasks.

DynAX: Compilers can instrument the code to let auto-tuners focus on particular parts of the code and on particular execution characteristics (e.g., by selecting hardware counters). Please define "alternative algorithms". (A variant-selection sketch follows this question.)

PIPER: N/A
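For reference, a minimal sketch of the black-box baseline under discussion: an auto-tuner that times alternative implementations of the same kernel and keeps the fastest. White-box analysis, as suggested above, would let the compiler expose such variant choice points and the relevant counters instead of leaving the tuner to guess. Variant names and kernels here are purely illustrative.

<syntaxhighlight lang="cpp">
// Sketch: a tiny auto-tuner that times alternative implementations of the
// same reduction and picks the fastest. Hypothetical illustration.
#include <chrono>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

using Kernel = std::function<double(const std::vector<double>&)>;

double timeIt(const Kernel& k, const std::vector<double>& data) {
    auto t0 = std::chrono::steady_clock::now();
    volatile double sink = k(data);                 // keep the call alive
    (void)sink;
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<double> data(1 << 20, 1.0);
    std::vector<std::pair<const char*, Kernel>> variants = {
        {"accumulate", [](const std::vector<double>& v) {
             return std::accumulate(v.begin(), v.end(), 0.0); }},
        {"two accumulators", [](const std::vector<double>& v) {
             double a = 0, b = 0;                   // two partial sums
             for (size_t i = 0; i + 1 < v.size(); i += 2) { a += v[i]; b += v[i + 1]; }
             return a + b; }},
    };
    for (auto& v : variants)                        // pick the fastest by measurement
        std::cout << v.first << ": " << timeIt(v.second, data) << " s\n";
}
</syntaxhighlight>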
Q: What kind of testing strategy do you normally apply while developing your software component? Do you write tests before or after the development process? What kind of coverage do you try to achieve? Do you write small unit tests or large system tests to achieve coverage? Are your tests non-deterministic?

XPRESS: Tests will be written after the code and incorporated as part of the Microcheckpointing Compute-Validate-Commit cycle for fault tolerance and debugging (sketched below). Tests will be hierarchical, for better but still incomplete coverage. Phased checkpointing will provide a very coarse-grained fallback in case of unrecoverable errors.

DynAX: Unit tests are written for each piece of software, and massive system tests are run every day. Tests tolerate some amount of non-determinism, which results in bounded numerical variations in the results. Test coverage reflects expected use: often-used components get tested more intensively. System tests are often available before the development process, while unit tests are usually written during and after each code contribution.

PIPER: tbd.
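A minimal sketch of one Compute-Validate-Commit step as described above (our reading of the methodology, not the XPRESS/HPX code): compute into a scratch state, validate a cheap invariant, and either commit or fall back to the checkpointed state.

<syntaxhighlight lang="cpp">
// Sketch of a Compute-Validate-Commit step. Hypothetical illustration;
// not the XPRESS/HPX implementation.
#include <cmath>
#include <iostream>
#include <vector>

using State = std::vector<double>;

State compute(const State& s) {            // the guarded computation
    State out(s);
    for (double& x : out) x = std::sqrt(x + 1.0);
    return out;
}

bool validate(const State& s) {            // cheap invariant check
    for (double x : s)
        if (!std::isfinite(x) || x < 0.0) return false;
    return true;
}

int main() {
    State committed(4, 2.0);               // last known-good state (checkpoint)
    State candidate = compute(committed);  // Compute
    if (validate(candidate))               // Validate
        committed = std::move(candidate);  // Commit
    else
        std::cerr << "validation failed: retaining checkpointed state\n";
    std::cout << "state[0] = " << committed[0] << "\n";
}
</syntaxhighlight>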
Q: After a bug is discovered, do you want automated support to create a simplified test that will reproduce the bug?

XPRESS: Yes.

DynAX: This sounds useful. The more simplified, the better. (A reduction sketch follows this question.)

PIPER: N/A
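A minimal sketch of such automated simplification, in the spirit of delta debugging: greedily drop chunks of a failing input while the bug still reproduces. The failure oracle here is a stand-in for a real reproducer.

<syntaxhighlight lang="cpp">
// Sketch: greedy test-case reduction in the spirit of delta debugging.
// Hypothetical illustration; the "bug" oracle is a stand-in.
#include <iostream>
#include <vector>

// Stand-in oracle: the "bug" triggers whenever both 3 and 7 are present.
bool failing(const std::vector<int>& in) {
    bool has3 = false, has7 = false;
    for (int x : in) { has3 |= (x == 3); has7 |= (x == 7); }
    return has3 && has7;
}

std::vector<int> reduce(std::vector<int> in) {
    size_t chunk = in.size() / 2;
    while (chunk >= 1) {
        bool shrunk = false;
        for (size_t i = 0; i + chunk <= in.size(); ) {
            std::vector<int> trial(in);
            trial.erase(trial.begin() + i, trial.begin() + i + chunk);
            if (failing(trial)) { in = trial; shrunk = true; }  // keep smaller repro
            else i += chunk;                                    // chunk was needed
        }
        if (!shrunk) chunk /= 2;
    }
    return in;
}

int main() {
    std::vector<int> repro = reduce({1, 3, 5, 9, 2, 7, 8, 4});
    for (int x : repro) std::cout << x << ' ';   // prints the minimal repro: 3 7
    std::cout << '\n';
}
</syntaxhighlight>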
Q: What is your strategy to debug a DSL and its runtime? What kind of multi-level debugging support do you want for DSLs?

XPRESS: XPRESS provides the HPX runtime, which will incorporate its own set of test and correctness mechanisms under the Microcheckpointing methodology, using the Compute-Validate-Commit cycle and built-in testing. It is expected that DSLs will support the generation of reverse test cases to support this.

DynAX: N/A

PIPER: N/A
Q: What kind of visualization and presentation support for bugs do you want from correctness tools? What kind of IDE integration do you want for the correctness tools?

XPRESS: Visualization and presentation support that correlates a detected error with its physical location, point in code, virtual user-thread instantiation, and the exact instruction causing the error would be of great value, especially if combined with a control framework for manipulating the execution stream at the offending point for diagnosis and correction. (A sketch of such a correlation record follows this question.)

DynAX: Any tool that allows both big-picture and detailed views, as in ETI's tracer for SWARM. Tools that don't integrate with a particular IDE end up being compatible with all IDEs, including vi and emacs, which is desirable.

PIPER: Integration with performance analysis tools (the domains to represent are similar, and users shouldn't be burdened with multiple different tools).
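A minimal sketch of the correlation record described above; all field names are hypothetical.

<syntaxhighlight lang="cpp">
// Sketch: the record a correctness tool might attach to each detected error,
// tying it to machine, code, and task context. Field names are hypothetical.
#include <cstdint>
#include <iostream>
#include <string>

struct ErrorRecord {
    std::string    node;    // physical location (e.g., hostname)
    std::string    file;    // point in code
    int            line;
    std::uint64_t  taskId;  // virtual user-thread instantiation
    std::uintptr_t pc;      // exact instruction (program counter)
};

void report(const ErrorRecord& e) {
    std::cout << "error on " << e.node << " at " << e.file << ":" << e.line
              << " task " << e.taskId << " pc=0x" << std::hex << e.pc << "\n";
}

int main() {
    report({"node042", "solver.cpp", 128, 7, 0x40123a});  // illustrative values
}
</syntaxhighlight>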
Q: When combining various languages/runtimes, can your language/runtime be debugged in isolation from the rest of the system?

XPRESS: Yes.

DynAX: The entire execution stack is visible to the programmer at debug time, including the SWARM parts. We do not think that opacity across system parts would bring clarity to the debugging process.

PIPER: N/A
Q: List the testing and debugging challenges that you think the next generation of programming languages and models will face.

XPRESS: The principal challenges are detection and isolation.

DynAX: We expect the main challenges to be the tractability of debugging and testing applications that run on millions of cores, and the reproducibility of non-deterministic applications on unreliable hardware platforms. (A record-and-replay sketch follows this question.)

PIPER: N/A
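A minimal sketch of one standard attack on the reproducibility challenge: record the sources of non-determinism (here reduced to a single RNG seed driving scheduling choices) so that a failing run can be replayed deterministically. Purely illustrative.

<syntaxhighlight lang="cpp">
// Sketch: record-and-replay of non-deterministic scheduling decisions.
// Hypothetical illustration; real runtimes must also capture message
// orders, timers, and other non-deterministic inputs.
#include <iostream>
#include <random>
#include <vector>

// "Schedule": the order in which ready tasks were picked.
std::vector<int> run(unsigned seed, int tasks) {
    std::mt19937 gen(seed);
    std::vector<int> order, ready;
    for (int t = 0; t < tasks; ++t) ready.push_back(t);
    while (!ready.empty()) {
        std::uniform_int_distribution<size_t> pick(0, ready.size() - 1);
        size_t i = pick(gen);                 // the non-deterministic choice
        order.push_back(ready[i]);
        ready.erase(ready.begin() + i);
    }
    return order;
}

int main() {
    unsigned seed = std::random_device{}();   // record this with the failure
    auto first  = run(seed, 5);               // original (failing) run
    auto replay = run(seed, 5);               // deterministic replay
    std::cout << (first == replay ? "replay matches\n" : "replay diverges\n");
}
</syntaxhighlight>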
Q: How can correctness tools help with reasoning about energy? How can correctness tools help with resilience?

XPRESS: With respect to energy, tools that can determine the critical path of execution would provide the basis for energy/power scheduling (a sketch follows this question). For reliability, as above, see 10) and the other answers.

DynAX: Correctness tools may be able to help the programmer make design choices that impact energy consumption, such as enabling non-determinism in parts of their application. Lightweight tools that detect faults are presumably directly applicable to resilience, for instance by coupling them with checkpointing.

PIPER: Energy/power monitoring is a performance problem, particularly in power-constrained scenarios.
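A minimal sketch of the critical-path analysis suggested above: the longest-weighted path through a task DAG identifies the tasks that bound time to solution, and hence the ones that must run at full power while the rest can be slowed to save energy. The task graph and costs here are illustrative.

<syntaxhighlight lang="cpp">
// Sketch: critical-path length of a task DAG, as a basis for energy/power
// scheduling. Hypothetical illustration with made-up tasks and costs.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // succ[t]: tasks that depend on t; cost[t]: estimated time/energy weight.
    std::vector<std::vector<int>> succ = {{1, 2}, {3}, {3}, {}};
    std::vector<double> cost = {1.0, 4.0, 2.0, 1.0};

    // Longest path to completion from each task, computed in reverse
    // topological order (tasks here are already topologically numbered).
    std::vector<double> longest(cost.size(), 0.0);
    for (int t = (int)cost.size() - 1; t >= 0; --t) {
        double tail = 0.0;
        for (int s : succ[t]) tail = std::max(tail, longest[s]);
        longest[t] = cost[t] + tail;
    }
    // Tasks with zero slack lie on the critical path and should get priority
    // (and full power); off-path tasks can be slowed to save energy.
    std::cout << "critical path length: " << longest[0] << "\n";  // 1+4+1 = 6
}
</syntaxhighlight>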