Correctness Tools

From Modelado Foundation

Revision as of 21:31, May 9, 2014 by imported>MarkusSchordan
QUESTIONS XPRESS TG X-Stack DEGAS D-TEC DynAX X-TUNE GVR CORVETTE SLEEC PIPER
What kind of runtime overhead can you accept when running your application with a dynamic analysis tool such as a data race detector? 1.1X, 1.5X, 2X, 5X, 10X, 100X. Runtime overhead determines task granularity and therefore the available parallelism (strong scaling), which in turn determines time to solution. Only fractional overheads can be tolerated at runtime.

In debug mode 10X or even 100X for selected small input data sets. We consider running the app with dynamic analysis tools as part of testing. In release mode we do not expect to run with dynamic analysis tools.

This should really be a question for the DoE app writers. I imagine that if run in debug mode, performance degradations are acceptable up to a few X. But most likely, if a dynamic tool modifies a program's execution time significantly, it also changes the way the program is executed (dynamic task schedules, non-determinism) and hence may analyze irrelevant things.

N/A
What types of bugs do you want correctness tools to detect? Data races, deadlocks, non-determinism. Hardware errors, runtime errors, and discrepancies with respect to expectations (e.g., task time to completion, satisfying a condition) are among the classes of bugs that detection mechanisms would address. These probably have to be built into the runtime/compile-time system and be consistent with the overall execution model.

It is important to distinguish when an error is detected: at compile time by static analysis; on the fly at run time, which requires less storage space than post-mortem detection since much information can be discarded as execution progresses; or after execution by analyzing traces. Non-determinism can be the result of data races; hence, detecting data races is most important, as it may help identify sources of deadlocks and non-determinism. One might also run analyses at several of these points in time and combine the results. We consider the different approaches to complement each other, since they offer different trade-offs in performance impact and memory consumption.

SWARM proposes mechanisms (such as tags) to help avoid data races. However, SWARM is compatible with traditional synchronization and data access mechanisms, and hence the programmer can still create data races.

Deadlocks can appear with any programming model based on task graphs, including the one supported by the SWARM runtime. They happen as soon as a cycle is created among tasks. ETI's tracer can help detect deadlocks by showing which tasks were running when the deadlock happened.

Non-determinism is often a feature. It may be desired in some parts of a program (for instance when running a parallel reduction) and not in others. So a tool for detecting non-determinism per se wouldn't be sufficient; it would also require an API to specify where non-determinism is unexpected.

N/A
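One answer above notes that deadlocks in task-graph models appear as soon as a cycle is created among tasks. A minimal sketch of how a tool might check for this, assuming the dependencies are available as a plain mapping from each task to the tasks it waits on. The function name `find_cycle` and the data layout are hypothetical; this does not reflect SWARM's API or ETI's tracer.

```python
from typing import Dict, List, Optional


def find_cycle(waits_on: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of task names, or None.

    waits_on maps a task to the tasks it is waiting for (hypothetical layout).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    for task, deps in waits_on.items():
        color.setdefault(task, WHITE)
        for dep in deps:
            color.setdefault(dep, WHITE)

    path: List[str] = []

    def visit(task: str) -> Optional[List[str]]:
        color[task] = GREY
        path.append(task)
        for dep in waits_on.get(task, []):
            if color[dep] == GREY:
                # dep is already on the current path: the path closes a cycle.
                return path[path.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        path.pop()
        color[task] = BLACK
        return None

    for task in list(color):
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle is not None:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

Reporting the tasks on the cycle, rather than a yes/no answer, matches the point made above that a tool should show which tasks were involved when the deadlock happened.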
What kind of correctness tools can help you with interactive debugging in a debugger? Do you want those tools to discover bugs for you? Beyond a certain level of scaling, debugging is almost indistinguishable from fault tolerance methods. Detection of errors (not found at compile time) requires equivalent mechanisms; diagnosis needs to add code as a possible source of error, and an additional kind of user interface is required, perhaps to provide a patch at runtime permitting continued execution from the point of error. This is supported under the DOE OS/R Hobbes project. A tool that can identify which assertions it cannot statically verify, applied in combination with a slicing tool that can compute a backward slice for a given set of concrete variable values provided by the debugger.

Some bugs can be discovered automatically, such as deadlocks and livelocks. For the others, tools need to reduce the time required to find the source of the bug.

N/A
Current auto-tuning and performance/precision debugging tools treat the program as a black box. What kind of white-box program analysis techniques can help in auto-tuning your application with floating-point precision and alternative algorithms? XPRESS does not address floating-point precision issues and will benefit from possible solutions developed in other programs. XPRESS does incorporate the APEX and RCR components for introspective performance optimization in the runtime system, controlling load balancing and task scheduling. It measures progress towards goals to adjust ordering, especially for critical path tasks. Precision analysis that can verify assertions at different points in a program, where the assertions specify the expected precision and value ranges of variables, of subsets of an array, or of all values of an array at a point in the execution of a program.

Compilers can instrument the code to let auto-tuners focus on particular parts of the code and on particular execution characteristics (e.g., by selecting hardware counters).

Please define "alternative algorithms".

N/A
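The precision assertions requested above (expected precision and value ranges for variables or array elements at a given program point) could look roughly like the following at run time. This is only an illustrative sketch: the helper names `assert_in_range` and `assert_relative_error` are hypothetical, and `math.fsum` is used here as a stand-in for a higher-precision reference computation.

```python
import math
import sys
from typing import Iterable


def assert_in_range(values: Iterable[float], lo: float, hi: float,
                    label: str = "values") -> None:
    """Check that every element lies in [lo, hi] and is not NaN."""
    for i, v in enumerate(values):
        if math.isnan(v) or not (lo <= v <= hi):
            raise AssertionError(f"{label}[{i}] = {v!r} is outside [{lo}, {hi}]")


def assert_relative_error(computed: float, reference: float,
                          max_rel_err: float, label: str = "value") -> None:
    """Check that computed matches reference to the expected precision."""
    # Guard against division by zero when the reference is exactly 0.
    scale = max(abs(reference), sys.float_info.min)
    rel_err = abs(computed - reference) / scale
    if rel_err > max_rel_err:
        raise AssertionError(
            f"{label}: relative error {rel_err:.3e} exceeds {max_rel_err:.3e}")


if __name__ == "__main__":
    data = [0.1] * 10
    assert_in_range(data, 0.0, 1.0, label="data")
    # Compare a plain sum against an accurately rounded reference sum.
    assert_relative_error(sum(data), math.fsum(data), 1e-12, label="sum(data)")
    print("precision assertions hold")
```

A static precision analysis would aim to prove such assertions ahead of time; the run-time form shown here only checks them for the inputs actually executed.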
What kind of testing strategy do you normally apply while developing your software component? Do you write tests before or after the development process? What kind of coverage do you try to achieve? Do you write small unit tests or large system tests to achieve coverage? Are your tests non-deterministic? Tests will be written after code and incorporated as part of the Microcheckpointing Compute-Validate-Commit cycle for fault tolerance and debugging. Tests will be hierarchical for better but incomplete coverage. Phased checkpointing will provide a very coarse-grained fallback in case of unrecoverable errors. Writing tests is part of the development process. System and unit tests are run as part of an incremental integration process. Any bugs found are distilled into test cases that become part of the integration tests.

Unit tests are written for each piece of software, and massive system tests are run every day. Tests support some amount of non-determinism, which results in bounded numerical variations in the results. Coverage of the tests reflects expected use: often-used components get tested more intensively. System tests are often available before the development process, while unit tests are usually written during and after each code contribution.

tbd.
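One answer above describes tests that tolerate non-determinism as long as it produces only bounded numerical variations. A minimal sketch of such a test, assuming the source of variation is the order in which a floating-point reduction combines its operands. The function names and the tolerance values are illustrative assumptions, not taken from any of the projects.

```python
import math
import random
from typing import List, Sequence


def reduce_in_order(values: Sequence[float], order: Sequence[int]) -> float:
    """Sum values in the given index order, as one possible schedule would."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def results_within_bound(values: Sequence[float], trials: int = 20,
                         rel_tol: float = 1e-9, seed: int = 0) -> bool:
    """Return True if every sampled summation order stays within rel_tol
    of an accurately rounded reference sum."""
    rng = random.Random(seed)          # fixed seed keeps the test repeatable
    reference = math.fsum(values)
    indices: List[int] = list(range(len(values)))
    for _ in range(trials):
        rng.shuffle(indices)           # simulate a different schedule
        result = reduce_in_order(values, indices)
        if not math.isclose(result, reference, rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True


if __name__ == "__main__":
    harmonic = [1.0 / (k + 1) for k in range(1000)]
    print(results_within_bound(harmonic))
```

The bound matters: for ill-conditioned inputs such as `[1e16, 1.0, -1e16]`, different orders give 0.0 or 1.0, so a test like this would flag the variation as a real problem rather than accept it.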
After a bug is discovered, do you want automated support to create a simplified test that will reproduce the bug? Yes. Yes. We consider SMT solvers in this respect most promising.

This would sound useful. The more simplified, the better.

N/A
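To make the idea of an automatically simplified reproducer concrete, here is a small sketch of greedy input reduction in the spirit of delta debugging. Note that this is a different technique from the SMT-solver approach mentioned in one of the answers above; it is shown only because it fits in a few lines. The names `minimize` and `still_fails` are hypothetical, and the predicate stands in for re-running the program on a candidate input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def minimize(failing_input: Sequence[T],
             still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink failing_input by deleting chunks while the failure persists.

    Greedy simplification: chunk sizes are halved from len/2 down to 1.
    The result is smaller, but not guaranteed to be globally minimal.
    """
    current = list(failing_input)
    if not still_fails(current):
        raise ValueError("the initial input does not reproduce the failure")
    chunk = len(current) // 2
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate    # chunk was irrelevant; drop it
            else:
                i += chunk             # chunk is needed; move past it
        chunk //= 2
    return current


if __name__ == "__main__":
    # Hypothetical bug: the failure needs both 3 and 7 in the input.
    def bug(xs: List[int]) -> bool:
        return 3 in xs and 7 in xs

    print(minimize(list(range(10)), bug))
```

For non-deterministic programs the predicate would have to re-run each candidate several times, which is one reason the answers above treat reproducibility as a challenge of its own.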
What is your strategy to debug a DSL and its runtime? What kind of multi-level debugging support do you want for DSLs? XPRESS provides the HPX runtime, which will incorporate its own set of test and correctness mechanisms under the Microcheckpointing methodology using the Compute-Validate-Commit cycle and built-in testing. It is expected that DSLs will support the generation of reverse test cases to support this. For DSLs that are translated and/or lowered to a general-purpose language, we expect assertions present in the DSL to be generated in the lower-level code as well, so that any failing assertion can be related back to a point in the original DSL. We expect DSLs to allow for domain-specific assertions in combination with user-provided domain-specific knowledge in the form of properties that can then be checked in the generated lower-level code.

N/A

N/A
What kind of visualization and presentation support for bugs do you want from correctness tools? What kind of IDE integration do you want for the correctness tools? Visualization and presentation support to correlate a detected error in terms of physical location, point in code, virtual user thread instantiation, and exact instruction causing the error will be of great value, especially if combined with a control framework for manipulating the execution stream at the offending point for diagnosis and correction. We want a query language for mining any collected data, together with scalable visualization of that data showing dependencies and attached properties. This could support debugging at multiple levels.

Any tool that allows big-picture and detailed views, as in ETI's tracer for SWARM. Tools that don't integrate with a particular IDE end up being compatible with all IDEs, including vi and emacs, which is desirable.

Integration with performance analysis tools (the domains to represent are similar, and users should not be burdened with multiple different tools).
When combining various languages/runtimes, can your language/runtime be debugged in isolation from the rest of the system? Yes. This depends on whether the DSL is integrated into a general-purpose language or is a separate language that interfaces with the run-time of the host language. In the latter case 'yes', otherwise 'no'.

The entire execution stack is visible to the programmer at debug time, including the SWARM parts. We do not think that opacity across system parts would bring clarity in the debugging process.

N/A
List the testing and debugging challenges that you think the next generation of programming languages and models will face. The principal challenges are detection and isolation. For the next generation of programming languages (and the resiliency issues of next-generation exascale hardware), we expect that avoiding errors by verifying properties at compile time will become increasingly important, because resiliency issues will make debugging at run time even more challenging in the future.

We expect the main challenges to be related to tractability of debugging and testing applications that run on millions of cores, and reproducibility of non-deterministic applications on a non-reliable hardware platform.

N/A
How can correctness tools help with reasoning about energy? How can correctness tools help with resilience? With respect to energy, tools that can determine the critical path of execution would provide the basis for energy/power scheduling. As above for reliability, see 10) and other answers. Correctness tools can identify erroneous paths and can therefore help to remove those paths, which otherwise could have a severe impact on reasoning about power. Any run-time violation of a property that a correctness tool has previously established at compile time to hold for a given execution path can be attributed to a resilience issue and can trigger appropriate recovery actions at run time.

Correctness tools may be able to help the programmer make design choices which impact energy consumption, such as enabling non-determinism in parts of their application.

Lightweight tools that detect faults are presumably directly applicable to resiliency, for instance by coupling them with checkpointing.

Energy/power monitoring is a performance problem, in particular in power constrained scenarios