CORVETTE

Team Members: UC Berkeley, LBNL
PI: Koushik Sen (UC Berkeley)
Co-PIs: James W. Demmel (UC Berkeley), Costin Iancu (LBNL)
Website: CORVETTE

Program Correctness, Verification, and Testing for Exascale (CORVETTE)


Team Members

Researchers

Postdoctoral Researchers

Graduate Student Researcher

Motivation

  • High performance scientific computing
    • Exascale: O(10^6) nodes, O(10^3) cores per node
    • Requires asynchrony and “relaxed” memory consistency
    • Shared memory with dynamic task parallelism
    • Languages allow remote memory modification
  • Correctness challenges
    • Non-determinism causes hard-to-diagnose correctness and performance bugs
      • Data races, atomicity violations, deadlocks, …
    • Bugs in DSLs (domain-specific languages)
    • Scientific applications use floating-point arithmetic: non-determinism leads to non-reproducible results
    • Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix

Goals

Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism

I. Testing and Verification

  • Identify sources of non-determinism in executions
  • Data races, atomicity violations, non-reproducible floating-point results (see the sketch below)
  • Explore state-of-the-art techniques that use dynamic analysis
  • Develop precise and scalable tools: < 2x overhead
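A minimal illustration of the first two bullets, assuming a plain C/Pthreads setting (illustrative code, not from the CORVETTE tools): two threads increment a shared counter without synchronization, so the result depends on the thread interleaving and changes from run to run.

    /* Data race: two threads update a shared counter with no lock. */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;            /* shared, unprotected */

    static void *worker(void *arg) {
        for (int i = 0; i < 1000000; i++)
            counter++;                  /* read-modify-write, not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Expected 2000000; typically prints less, differently each run. */
        printf("counter = %ld\n", counter);
        return 0;
    }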

II. Debugging

  • Use a minimal amount of concurrency to reproduce the bug
  • Support two-level debugging of high-level abstractions
  • Detect causes of floating-point anomalies and determine the minimum precision needed to fix them

Publications

  • Fast Reproducible Floating-Point Summation. J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
  • Numerical Accuracy and Reproducibility at ExaScale. J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
  • Scaling Data Race Detection for Partitioned Global Address Space Programs. C. Park, K. Sen, C. Iancu. In Proceedings of the International Conference on Supercomputing (ICS'13), Eugene, OR, June 2013.
  • Precimonious: Tuning Assistant for Floating-Point Precision. C. Rubio-González, C. Nguyen, H.D. Nguyen, J. Demmel, W. Kahan, K. Sen, D.H. Bailey, C. Iancu, and D. Hough. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver, CO, November 2013.
  • Parallel Reproducible Summation. J. Demmel and H.D. Nguyen. To appear in IEEE Transactions on Computers, Special Section on Computer Arithmetic, 2014.

Technical Presentations

Software Releases

Testing and Verification Tools

Scalable Testing of Parallel Programs

  • Concurrent programming is hard
    • Bugs happen non-deterministically
    • Data races, deadlocks, atomicity violations, etc.
  • Goal: build a tool to test and debug concurrent and parallel programs
    • Efficient: reduce overhead from 10x-100x to 2x
    • Precise
    • Reproducible
    • Scalable
  • Active random testing

Active Testing

  • Phase 1: static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, and atomicity violations
    • Data races: Eraser-style lockset analysis [PLDI’08] (a schematic sketch follows this list)
    • Atomicity violations: cycles in transactions and the happens-before relation [FSE’08]
    • Deadlocks: cycles in the resource acquisition graph [PLDI’09]
    • Publicly available tool for Java/Pthreads/UPC [CAV’09]
    • Memory model bugs: cycles in the happens-before graph [ISSTA’11]
    • For UPC programs running on thousands of cores [SC’11]
  • Phase 2: “direct” testing (or model checking) based on the bug patterns obtained from phase 1
    • Confirm bugs
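The phase 1 race detection above builds on lockset analysis. Below is a schematic sketch of the Eraser-style discipline: each shared variable keeps a candidate set of locks that protected every access so far, refined by intersection; an empty set flags a potential race for phase 2 to confirm. The names and the bitmask encoding are illustrative, not the tool's actual data structures.

    /* Eraser-style lockset refinement (schematic). */
    #include <stdint.h>
    #include <stdio.h>

    #define NVARS 4

    static uint32_t candidate[NVARS];    /* per-variable candidate locksets */
    static int      seen[NVARS];         /* first-access flag */

    /* Conceptually called by instrumentation on every shared access. */
    static void on_access(int var, uint32_t locks_held) {
        if (!seen[var]) {                /* first access: all held locks */
            candidate[var] = locks_held;
            seen[var] = 1;
            return;
        }
        candidate[var] &= locks_held;    /* refine by intersection */
        if (candidate[var] == 0)
            printf("potential race on var %d: no common lock\n", var);
    }

    int main(void) {
        on_access(0, 0x1);   /* thread A accesses var 0 holding lock 0 */
        on_access(0, 0x2);   /* thread B holds only lock 1: flagged */
        return 0;
    }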

Goals

  • Goal 1: produce a trace exhibiting the data race
  • Goal 2: produce a trace exhibiting the assertion failure
  • Goal 3: produce a trace with fewer threads
  • Goal 4: produce a trace with fewer context switches

Challenges for Exascale

  • Previous tools targeted Java and Pthreads programs
    • Synchronization with locks and condition variables
    • Single node
  • Exascale has different programming models
    • Large scale
    • Bulk communication
    • Collective operations with data movement
    • Memory consistency
    • Distributed shared memory
  • Cannot use centralized dynamic analyses
  • Cannot instrument and track every statement

Further Challenges

  • Prior work targeted a simple programming paradigm
    • Threads and shared memory
  • Similar techniques are available for MPI and CUDA
    • ISP, DAMPI, MARMOT, Umpire, MessageChecker
    • TASS uses symbolic execution
    • PUG for CUDA
  • Analyze programs that mix different paradigms
    • OpenMP, MPI, distributed shared memory
    • Need to correlate non-determinism across paradigms

How Well Does it Scale?

CORVETTE-Franklin.png
  • Maximum 8% slowdown at 8K cores
    • Franklin Cray XT4 Supercomputer at NERSC
    • Quad-core 2.3 GHz CPUs and 8 GB RAM per node
    • Portals interconnect
  • Optimizations for scalability
    • Efficient data structures
    • Minimized communication
    • Sampling with exponential backoff (sketched below)
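Of these, sampling with exponential backoff is easy to sketch: an instrumented program point runs the full analysis on its first hits, then exponentially less often, so hot loops stop paying the instrumentation cost. The policy below is a hedged sketch of the general technique, not the exact CORVETTE implementation.

    /* Per-site sampling with exponential backoff (sketch). */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t hits;   /* times this site has executed */
        uint64_t next;   /* hit count at which to sample again */
    } sample_site;

    static bool should_sample(sample_site *s) {
        s->hits++;
        if (s->hits < s->next)
            return false;        /* fast path: skip analysis */
        s->next *= 2;            /* back off exponentially */
        return true;             /* run the full dynamic analysis */
    }

    int main(void) {
        sample_site site = { .hits = 0, .next = 1 };
        for (int i = 1; i <= 100; i++)
            if (should_sample(&site))
                printf("sampled at hit %d\n", i);  /* 1, 2, 4, 8, ... */
        return 0;
    }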

CORVETTE-Optimization.png


Debugging Tools

Debugging Project I

  • Detect bugs with fewer threads and fewer context switches


Experience with C/PThreads: Over 90% of simplified traces were within 2 context switches of optimal

CORVETTE-Context-Switch-Results.png


Small model hypothesis for parallel programs

  • Most bugs can be found with few threads
    • 2-3 threads
    • No need to run on thousands of nodes
  • Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07]
    • Helps in sequential debugging

Debugging Project II

  • Two-level debugging of DSLs
  • Correlate program state across program versions

CORVETTE-Debugging-DSL.png

Debugging Project III

  • Find floating point anomalies
  • Recommend safe reduction of precision


Floating Point Debugging

Why do we care?

  • Use of floating-point programs has been growing rapidly
    • HPC
    • Cloud, games, graphics, finance, speech, signal processing
  • Most programmers are not experts in floating point! (see the example below)
    • Why not use the highest precision everywhere?
  • High precision wastes
    • Energy
    • Time
    • Storage
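Part of the answer to “why care”: floating-point addition is not associative, so the same sum computed in a different order (as any parallel reduction will do) can give a different result. A minimal single-node illustration:

    /* Floating-point addition is not associative. */
    #include <stdio.h>

    int main(void) {
        float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
        printf("(a + b) + c = %.1f\n", (a + b) + c);  /* prints 1.0 */
        printf("a + (b + c) = %.1f\n", a + (b + c));  /* prints 0.0: c is absorbed */
        return 0;
    }

This order dependence is exactly why reproducible summation (see the publications above) is a research problem at scale.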

Floating Point Debugging Problem 1: Reduce unnecessary precision

  • Consider the problem of computing the arc length of a function (shown in the figure below)

How can we find a minimal set of code fragments whose precision must be high?

CORVETTE-Debugging-Problem-1.png
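A sketch of the arc-length computation behind this problem, assuming the function g(x) = x + Σ_{k=1..5} 2^-k sin(2^k x) from the Precimonious work (the exact function in the figure is an assumption here). Every double below is a candidate for demotion to float; the tuning question is which ones must stay in high precision for the summed result to remain accurate.

    /* Arc length of g over [0, pi] by summing small segment lengths. */
    #include <math.h>
    #include <stdio.h>

    /* g(x) = x + sum_{k=1..5} 2^-k * sin(2^k * x) */
    static double g(double x) {
        double s = x, p = 1.0;
        for (int k = 1; k <= 5; k++) {
            p *= 2.0;                 /* p = 2^k */
            s += sin(p * x) / p;
        }
        return s;
    }

    int main(void) {
        const double pi = 3.141592653589793;
        const int n = 1000000;
        const double h = pi / n;      /* step size */
        double sum = 0.0;             /* accumulator: a key tuning candidate */
        double y0 = g(0.0);
        for (int i = 1; i <= n; i++) {
            double y1 = g(i * h);
            sum += sqrt(h * h + (y1 - y0) * (y1 - y0));
            y0 = y1;
        }
        printf("arc length = %.15f\n", sum);
        return 0;
    }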

Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly

CORVETTE-Debugging-Problem-2.png
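Problem 2 in miniature: catastrophic cancellation. Subtracting two nearly equal values destroys most significant digits, an anomaly a dynamic analysis can expose by comparing each result against a higher-precision “shadow” value (shadow execution is a standard technique here, used for illustration):

    /* Cancellation: float result vs. a double shadow computation. */
    #include <stdio.h>

    int main(void) {
        float a = 1.000001f, b = 1.000000f;
        float  diff   = a - b;                   /* few correct digits survive */
        double shadow = 1.000001 - 1.000000;     /* higher-precision reference */
        printf("float  a - b = %.10f\n", diff);
        printf("double a - b = %.10f\n", shadow);
        /* A large relative gap between the two flags an accuracy anomaly. */
        return 0;
    }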

What can we do?

  • We can reduce precision “safely”
    • Reduce power, improve performance, get a better answer
  • Automated testing and debugging techniques
    • To recommend “precision reduction”
    • A formal proof of “safety” can be replaced by concolic testing
  • Approach: automate previously manual debugging (a simplified search loop is sketched below)
    • Concolic testing
    • Delta debugging [Zeller et al.]
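A simplified sketch of the recommendation loop: greedily demote each variable from double to float and keep the demotion only if an accuracy test still passes. Precimonious actually searches type configurations with a delta-debugging-style algorithm; this greedy loop, and the check_accuracy() stand-in for re-running the program against an error threshold, are assumptions made for brevity.

    /* Greedy precision-lowering search (simplified sketch). */
    #include <stdbool.h>
    #include <stdio.h>

    #define NVARS 3

    typedef enum { PREC_FLOAT, PREC_DOUBLE } prec;

    /* Stand-in: re-run the program under `config` and compare against
     * a high-precision reference result. Here we pretend only variable
     * 0 truly needs double precision. */
    static bool check_accuracy(const prec config[NVARS]) {
        return config[0] == PREC_DOUBLE;
    }

    int main(void) {
        prec config[NVARS] = { PREC_DOUBLE, PREC_DOUBLE, PREC_DOUBLE };
        for (int v = 0; v < NVARS; v++) {
            config[v] = PREC_FLOAT;        /* tentatively lower precision */
            if (!check_accuracy(config))
                config[v] = PREC_DOUBLE;   /* accuracy broke: undo */
        }
        for (int v = 0; v < NVARS; v++)
            printf("var %d: %s\n", v,
                   config[v] == PREC_DOUBLE ? "double" : "float");
        return 0;
    }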


Implementation

  • Future plans
    • Build on top of LLVM compiler framework


Potential Collaboration

  • Dynamic analyses to find bugs: dynamic parallelism, unstructured parallelism, shared memory
    • DEGAS, XPRESS, Traleika Glacier
  • Floating point debugging
    • Co-design centers
  • Two-level debugging
    • D-TEC


Summary

CORVETTE-Summary.png

  • Build testing tools
    • Close to what programmers use
    • Hide formal methods and program analysis under testing
  • If you are not obsessed with formal correctness
    • Testing and debugging can help you solve these problems with high confidence