CORVETTE: Difference between revisions
From Modelado Foundation
imported>Cdenny (Created page with "{{Infobox project | title = CORVETTE | image = 180px | imagecaption = | team-members = List of team members | pi = Lead PI (Institute) | co-pi = C...") |
imported>Cdenny No edit summary |
||
Line 1: | Line 1: | ||
{{Infobox project | {{Infobox project | ||
| title = CORVETTE | | title = CORVETTE | ||
| image = [[File: | | image = [[File:CORVETTE-Logos.png|300px]] | ||
| imagecaption = | | imagecaption = | ||
| team-members = | | team-members = [http://www.berkeley.edu/ UC Berkeley], [http://www.lbl.gov/ LBNL] | ||
| pi = | | pi = Koushik Sen (UC Berkeley) | ||
| co-pi = | | co-pi = James W. Demmel (UC Berkeley), Costin Iancu (LBNL) | ||
| website = team website | | website = team website | ||
}} | }} | ||
'' | '''Program ''Cor''rectness, ''Ve''rification, and ''T''es''t''ing for ''E''xascale''' or '''CORVETTE''' | ||
== Team Members == | == Team Members == | ||
* [http://www.berkeley.edu/ University of California, Berkeley (UC Berkeley)] | |||
* [http://www.lbl.gov/ Lawrence Berkeley National Laboratory (LBNL)] | |||
== Motivation == | |||
*High performance scientific computing | |||
** Exascale: O(106) nodes, O(103) cores per node | |||
** Requires asynchrony and “relaxed” memory consistency | |||
** Shared memory with dynamic task parallelism | |||
** Languages allow remote memory modification | |||
* Correctness challenges | |||
** Non-deterministic causes hard to diagnose correctness and performance bugs | |||
*** Data races, atomicity violations, deadlocks … | |||
** Bugs in DSL | |||
** Scientific applications use floating-points: non-determinism leads to non-reproducible results | |||
** Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix | |||
== Goals == | |||
Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism | |||
I. Testing and Verification | |||
* Identify sources of non-determinism in executions | |||
* Data races, atomicity violations, non‐reproducible floating point results | |||
* Explore state-of-the-art techniques that use dynamic analysis | |||
* Develop precise and scalable tools: < 2X overhead | |||
II. Debugging | |||
* Use minimal amount of concurrency to reproduce bug | |||
* Support two-level debugging of high-level abstractions | |||
* Detect causes of floating-point anomalies and determine the minimum precision needed to fix them | |||
== Testing and Verification Tools == | |||
=== Scalable Testing of Parallel Programs === | |||
* Concurrent Programming is hard | |||
** Bugs happen non‐deterministically | |||
** Data races, deadlocks, atomicity violations, etc. | |||
* Goals: build a tool to test and debug concurrent and parallel programs | |||
** Efficient: reduce overhead from 10x‐100x to 2x | |||
** Precise | |||
** Reproducible | |||
** Scalable | |||
* Active random testing | |||
=== Active Testing === | |||
* Phase 1: Static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, atomicity violations | |||
** Data races: Eraser or lockset based [PLDI’08] | |||
** Atomicity violations: cycle in transactions and happens‐before relation [FSE’08] | |||
** Deadlocks: cycle in resource acquisition graph [PLDI’09] | |||
** Publicly available tool for Java/Pthreads/UPC [CAV’09] | |||
** Memory model bugs: cycle in happens‐before graph [ISSTA’11] | |||
** For UPC programs running on thousands of cores [SC’11] | |||
* Phase 2: “Direct” testing (or model checking) based on the bug patterns obtained from phase 1 | |||
** Confirm bugs | |||
'''Goals''' | |||
* Goal 1. Nice to have a trace exhibiting the data race | |||
* Goal 2. Nice to have a trace exhibiting the assertion failure | |||
* Goal 3. Nice to have a trace with fewer threads | |||
* Goal 4. Nice to have a trace with fewer context switches | |||
=== Challenges for Exascale === | |||
* Java and pthreads programs | |||
** Synchronization with locks and condition variables | |||
** Single node | |||
* Exascale has different programming models | |||
** Large scale | |||
** Bulk communication | |||
** Collective operations with data movement | |||
** Memory consistency | |||
** Distributed shared memory | |||
* Cannot use centralized dynamic analyses | |||
* Cannot instrument and track every statement | |||
'''Further Challenges''' | |||
* Targeted a simple programming paradigm | |||
** Threads and shared memory | |||
* Similar techniques are available for MPI and CUDA | |||
** ISP, DAMPI, MARMOT, Umpire, MessageChecker | |||
** TASS uses symbolic execu7on | |||
** PUG for CUDA | |||
* Analyze programs that mix different paradigms | |||
** OpenMP, MPI, Shared Distributed Memory | |||
** Need to correlate non‐determinism across paradigms | |||
=== How Well Does it Scale? === | |||
[[File:CORVETTE-Franklin.png|right|150px]] | |||
* Maximum 8% slowdown at 8K cores | |||
** Franklin Cray XT4 Supercomputer at NERSC | |||
** Quad‐core 2.x3GHz CPU and 8GB RAM per node | |||
** Portals interconnect | |||
* Optimizations for scalability | |||
** Efficient Data Structures | |||
** Minimize Communication | |||
** Sampling with Exponential Backoff | |||
[[File:CORVETTE-Optimization.png|700px]] | |||
== Debugging Tools == | |||
=== Debugging Project I === | |||
* Detect bug with fewer threads and fewer context switches | |||
'''Experience with C/PThreads''': Over 90% of simplified traces were within 2 context switches of optimal | |||
[[File:CORVETTE-Context-Switch-Results.png|600px]] | |||
'''Small model hypothesis for parallel programs''' | |||
* Most bugs can be found with few threads | |||
** 2‐3 threads | |||
** No need to run on thousands of nodes | |||
* Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07] | |||
** Helps in sequential debugging | |||
=== Debugging Project II === | |||
* Two‐level debugging of DSLs | |||
* Correlate program state across program versions | |||
[[File:CORVETTE-Debugging-DSL.png|600px]] | |||
=== Debugging Project III === | |||
* Find floating point anomalies | |||
* Recommend safe reduction of precision | |||
== Floating Point Debugging == | |||
=== Why do we care? === | |||
* Usage of floating point programs has been growing rapidly | |||
** HPC | |||
** Cloud, games, graphics, finance, speech, signal processing | |||
* Most programmers are not expert in floating‐point! | |||
** Why not use highest precision everywhere | |||
* High precision wastes | |||
** Energy | |||
** Time | |||
** Storage | |||
=== Floating Point Debugging Problem 1: Reduce unnecessary precision === | |||
* Consider the problem of finding the arc length of the function | |||
'''How can we find a minimal set of code fragments whose precision must be high?''' | |||
[[File:CORVETTE-Debugging-Problem-1.png|600px]] | |||
=== Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly === | |||
[[File:CORVETTE-Debugging-Problem-2.png|500px]] | |||
=== What can we do? === | |||
* We can reduce precision “safely” | |||
** reduce power, improve performance, get better answer | |||
* Automated testing and debugging techniques | |||
** To recommend “precision reduction” | |||
** Formal proof of “safety” can be replaced by Concolic testing | |||
* Approach: automate previously hand-made debugging | |||
** Concolic testing | |||
** Delta debugging [Zeller et al.] | |||
== Implementation == | |||
* Prototype implementation for C programs | |||
** Uses CIL compiler framework | |||
** http://perso.univ-perp.fr/guillaume.revy/index.php?page=debugging | |||
* Future plans | |||
** Build on top of LLVM compiler framework | |||
== Potential Collaboration == | |||
* Dynamic analyses to find bugs ‐ dynamic parallelism, unstructured parallelism, shared memory | |||
** DEGAS, XPRESS, Traleika Glacier | |||
* Floating point debugging | |||
** Co‐design centers | |||
* 2‐level debugging | |||
** D-TEC | |||
== Summary == | |||
[[File:CORVETTE-Summary.png|600px]] | |||
* Build testing tools | |||
** Close to what programmers use | |||
** Hide formal methods and program analysis under testing | |||
* If you are not obsessed with formal correctness | |||
** Testing and debugging can help you solve these problems with high confidence |
Revision as of 17:59, February 12, 2013
CORVETTE | |
---|---|
Team Members | UC Berkeley, LBNL |
PI | Koushik Sen (UC Berkeley) |
Co-PIs | James W. Demmel (UC Berkeley), Costin Iancu (LBNL) |
Website | team website |
Download | {{{download}}} |
Program Correctness, Verification, and Testing for Exascale or CORVETTE
Team Members
Motivation
- High performance scientific computing
- Exascale: O(106) nodes, O(103) cores per node
- Requires asynchrony and “relaxed” memory consistency
- Shared memory with dynamic task parallelism
- Languages allow remote memory modification
- Correctness challenges
- Non-deterministic causes hard to diagnose correctness and performance bugs
- Data races, atomicity violations, deadlocks …
- Bugs in DSL
- Scientific applications use floating-points: non-determinism leads to non-reproducible results
- Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix
- Non-deterministic causes hard to diagnose correctness and performance bugs
Goals
Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism
I. Testing and Verification
- Identify sources of non-determinism in executions
- Data races, atomicity violations, non‐reproducible floating point results
- Explore state-of-the-art techniques that use dynamic analysis
- Develop precise and scalable tools: < 2X overhead
II. Debugging
- Use minimal amount of concurrency to reproduce bug
- Support two-level debugging of high-level abstractions
- Detect causes of floating-point anomalies and determine the minimum precision needed to fix them
Testing and Verification Tools
Scalable Testing of Parallel Programs
- Concurrent Programming is hard
- Bugs happen non‐deterministically
- Data races, deadlocks, atomicity violations, etc.
- Goals: build a tool to test and debug concurrent and parallel programs
- Efficient: reduce overhead from 10x‐100x to 2x
- Precise
- Reproducible
- Scalable
- Active random testing
Active Testing
- Phase 1: Static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, atomicity violations
- Data races: Eraser or lockset based [PLDI’08]
- Atomicity violations: cycle in transactions and happens‐before relation [FSE’08]
- Deadlocks: cycle in resource acquisition graph [PLDI’09]
- Publicly available tool for Java/Pthreads/UPC [CAV’09]
- Memory model bugs: cycle in happens‐before graph [ISSTA’11]
- For UPC programs running on thousands of cores [SC’11]
- Phase 2: “Direct” testing (or model checking) based on the bug patterns obtained from phase 1
- Confirm bugs
Goals
- Goal 1. Nice to have a trace exhibiting the data race
- Goal 2. Nice to have a trace exhibiting the assertion failure
- Goal 3. Nice to have a trace with fewer threads
- Goal 4. Nice to have a trace with fewer context switches
Challenges for Exascale
- Java and pthreads programs
- Synchronization with locks and condition variables
- Single node
- Exascale has different programming models
- Large scale
- Bulk communication
- Collective operations with data movement
- Memory consistency
- Distributed shared memory
- Cannot use centralized dynamic analyses
- Cannot instrument and track every statement
Further Challenges
- Targeted a simple programming paradigm
- Threads and shared memory
- Similar techniques are available for MPI and CUDA
- ISP, DAMPI, MARMOT, Umpire, MessageChecker
- TASS uses symbolic execu7on
- PUG for CUDA
- Analyze programs that mix different paradigms
- OpenMP, MPI, Shared Distributed Memory
- Need to correlate non‐determinism across paradigms
How Well Does it Scale?
- Maximum 8% slowdown at 8K cores
- Franklin Cray XT4 Supercomputer at NERSC
- Quad‐core 2.x3GHz CPU and 8GB RAM per node
- Portals interconnect
- Optimizations for scalability
- Efficient Data Structures
- Minimize Communication
- Sampling with Exponential Backoff
Debugging Tools
Debugging Project I
- Detect bug with fewer threads and fewer context switches
Experience with C/PThreads: Over 90% of simplified traces were within 2 context switches of optimal
Small model hypothesis for parallel programs
- Most bugs can be found with few threads
- 2‐3 threads
- No need to run on thousands of nodes
- Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07]
- Helps in sequential debugging
Debugging Project II
- Two‐level debugging of DSLs
- Correlate program state across program versions
Debugging Project III
- Find floating point anomalies
- Recommend safe reduction of precision
Floating Point Debugging
Why do we care?
- Usage of floating point programs has been growing rapidly
- HPC
- Cloud, games, graphics, finance, speech, signal processing
- Most programmers are not expert in floating‐point!
- Why not use highest precision everywhere
- High precision wastes
- Energy
- Time
- Storage
Floating Point Debugging Problem 1: Reduce unnecessary precision
- Consider the problem of finding the arc length of the function
How can we find a minimal set of code fragments whose precision must be high?
Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly
What can we do?
- We can reduce precision “safely”
- reduce power, improve performance, get better answer
- Automated testing and debugging techniques
- To recommend “precision reduction”
- Formal proof of “safety” can be replaced by Concolic testing
- Approach: automate previously hand-made debugging
- Concolic testing
- Delta debugging [Zeller et al.]
Implementation
- Prototype implementation for C programs
- Uses CIL compiler framework
- http://perso.univ-perp.fr/guillaume.revy/index.php?page=debugging
- Future plans
- Build on top of LLVM compiler framework
Potential Collaboration
- Dynamic analyses to find bugs ‐ dynamic parallelism, unstructured parallelism, shared memory
- DEGAS, XPRESS, Traleika Glacier
- Floating point debugging
- Co‐design centers
- 2‐level debugging
- D-TEC
Summary
- Build testing tools
- Close to what programmers use
- Hide formal methods and program analysis under testing
- If you are not obsessed with formal correctness
- Testing and debugging can help you solve these problems with high confidence