Actions

CORVETTE: Difference between revisions

From Modelado Foundation

imported>Ksen
imported>Ksen
Line 64: Line 64:


== Technical Presentations ==
== Technical Presentations ==
* 06/2014. ''Testing, Debugging, and Precision-tuning of Large-scale Parallel and Floating-point Programs''. Presenter: Koushik Sen. Invited R. Narasimhan Lecture Award, Tata Institute of Fundamental Research, India.
* 04/2014 \indent University of California, Irvine, Irvine, CA. \\
* 03/2014 \indent University of Texas at Dallas, Dallas, TX. \\
* 03/2014 \indent  University of California, Davis, Davis, CA. \\
* 02/2014 \indent The University of New York at Buffalo, Buffalo, NY. \\
* 02/2014 \indent College of William and Mary, Williamsburg, VA. \\
* 02/2014 \indent Invited talk at SRI International, Menlo Park, CA. \\
* 01/2014 \indent UC Berkeley ASPIRE Winter Retreat, Tahoe, CA. \\
* 12/2013 \indent Invited talk at Bay Area Scientific Computing Day (BASCD'13), LBNL, Berkeley, CA. \\
* 11/2013 \indent Supercomputing Conference (SC'13), Denver, CO. \\
* 11/2013 \indent Massachusetts Institute of Technology, Cambridge, MA. \\
* 11/2013 \indent Rising Stars in EECS Workshop, Poster Session, MIT, Cambridge, MA. \\
* 08/2013 \indent Invited talk at Oracle, Compiler Technical Talks Series, Santa Clara, CA. \\
* 06/2013 \indent Lawrence Berkeley National Laboratory DEGAS Retreat, Santa Cruz, CA. \\
* 11/2013 \indent Supercomputing Conference (SC'13), Denver, CO. \\
* 06/2013 \indent SIAM Annual Meeting 2013  (AN13), San Diego, CA. \\
* 04/2013 \indent 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX. \\


== Software Releases ==
== Software Releases ==

Revision as of 15:57, April 25, 2014

CORVETTE
CORVETTE-Logos.png
Team Members UC Berkeley, LBNL
PI Koushik Sen (UC Berkeley)
Co-PIs James W. Demmel (UC Berkeley), Costin Iancu (LBNL)
Website CORVETTE
Download {{{download}}}

Program Correctness, Verification, and Testing for Exascale or CORVETTE


Team Members

Researchers

Postdoctoral Researchers

Graduate Student Researcher

Motivation

  • High performance scientific computing
    • Exascale: O(106) nodes, O(103) cores per node
    • Requires asynchrony and “relaxed” memory consistency
    • Shared memory with dynamic task parallelism
    • Languages allow remote memory modification
  • Correctness challenges
    • Non-deterministic causes hard to diagnose correctness and performance bugs
      • Data races, atomicity violations, deadlocks …
    • Bugs in DSL
    • Scientific applications use floating-points: non-determinism leads to non-reproducible results
    • Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix

Goals

Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism

I. Testing and Verification

  • Identify sources of non-determinism in executions
  • Data races, atomicity violations, non‐reproducible floating point results
  • Explore state-of-the-art techniques that use dynamic analysis
  • Develop precise and scalable tools: < 2X overhead

II. Debugging

  • Use minimal amount of concurrency to reproduce bug
  • Support two-level debugging of high-level abstractions
  • Detect causes of floating-point anomalies and determine the minimum precision needed to fix them

Publications

  • Fast Reproducible Floating-Point Summation. J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
  • Numerical Accuracy and Reproducibility at ExaScale. J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
  • Scaling Data Race Detection for Partitioned Global Address Space Programs. C. Park, K. Sen, C. Iancu. In Proceedings of the International Conference on Supercomputing (ICS'13), Eugene, OR, June 2013.
  • Precimonious: Tuning Assistant for Floating-Point Precision. C. Rubio-González, C. Nguyen, H.D. Nguyen, J. Demmel, W. Kahan, K. Sen, D.H. Bailey, C. Iancu, and D. Hough. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver, CO, November 2013.
  • Parallel Reproducible Summation. J. Demmel and H.D. Nguyen. To appear in IEEE Transaction on Computer, Special Section on Computer Arithmetic 2014.

Technical Presentations

  • 06/2014. Testing, Debugging, and Precision-tuning of Large-scale Parallel and Floating-point Programs. Presenter: Koushik Sen. Invited R. Narasimhan Lecture Award, Tata Institute of Fundamental Research, India.
  • 04/2014 \indent University of California, Irvine, Irvine, CA. \\
  • 03/2014 \indent University of Texas at Dallas, Dallas, TX. \\
  • 03/2014 \indent University of California, Davis, Davis, CA. \\
  • 02/2014 \indent The University of New York at Buffalo, Buffalo, NY. \\
  • 02/2014 \indent College of William and Mary, Williamsburg, VA. \\
  • 02/2014 \indent Invited talk at SRI International, Menlo Park, CA. \\
  • 01/2014 \indent UC Berkeley ASPIRE Winter Retreat, Tahoe, CA. \\
  • 12/2013 \indent Invited talk at Bay Area Scientific Computing Day (BASCD'13), LBNL, Berkeley, CA. \\
  • 11/2013 \indent Supercomputing Conference (SC'13), Denver, CO. \\
  • 11/2013 \indent Massachusetts Institute of Technology, Cambridge, MA. \\
  • 11/2013 \indent Rising Stars in EECS Workshop, Poster Session, MIT, Cambridge, MA. \\
  • 08/2013 \indent Invited talk at Oracle, Compiler Technical Talks Series, Santa Clara, CA. \\
  • 06/2013 \indent Lawrence Berkeley National Laboratory DEGAS Retreat, Santa Cruz, CA. \\
  • 11/2013 \indent Supercomputing Conference (SC'13), Denver, CO. \\
  • 06/2013 \indent SIAM Annual Meeting 2013 (AN13), San Diego, CA. \\
  • 04/2013 \indent 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX. \\

Software Releases

Testing and Verification Tools

Scalable Testing of Parallel Programs

  • Concurrent Programming is hard
    • Bugs happen non­‐deterministically
    • Data races, deadlocks, atomicity violations, etc.
  • Goals: build a tool to test and debug concurrent and parallel programs
    • Efficient: reduce overhead from 10x‐100x to 2x
    • Precise
    • Reproducible
    • Scalable
  • Active random testing

Active Testing

  • Phase 1: Static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, atomicity violations
    • Data races: Eraser or lockset based [PLDI’08]
    • Atomicity violations: cycle in transactions and happens‐before relation [FSE’08]
    • Deadlocks: cycle in resource acquisition graph [PLDI’09]
    • Publicly available tool for Java/Pthreads/UPC [CAV’09]
    • Memory model bugs: cycle in happens­‐before graph [ISSTA’11]
    • For UPC programs running on thousands of cores [SC’11]
  • Phase 2: “Direct” testing (or model checking) based on the bug patterns obtained from phase 1
    • Confirm bugs

Goals

  • Goal 1. Nice to have a trace exhibiting the data race
  • Goal 2. Nice to have a trace exhibiting the assertion failure
  • Goal 3. Nice to have a trace with fewer threads
  • Goal 4. Nice to have a trace with fewer context switches

Challenges for Exascale

  • Java and pthreads programs
    • Synchronization with locks and condition variables
    • Single node
  • Exascale has different programming models
    • Large scale
    • Bulk communication
    • Collective operations with data movement
    • Memory consistency
    • Distributed shared memory
  • Cannot use centralized dynamic analyses
  • Cannot instrument and track every statement

Further Challenges

  • Targeted a simple programming paradigm
    • Threads and shared memory
  • Similar techniques are available for MPI and CUDA
    • ISP, DAMPI, MARMOT, Umpire, MessageChecker
    • TASS uses symbolic execu7on
    • PUG for CUDA
  • Analyze programs that mix different paradigms
    • OpenMP, MPI, Shared Distributed Memory
    • Need to correlate non‐determinism across paradigms

How Well Does it Scale?

CORVETTE-Franklin.png
  • Maximum 8% slowdown at 8K cores
    • Franklin Cray XT4 Supercomputer at NERSC
    • Quad­‐core 2.x3GHz CPU and 8GB RAM per node
    • Portals interconnect
  • Optimizations for scalability
    • Efficient Data Structures
    • Minimize Communication
    • Sampling with Exponential Backoff

CORVETTE-Optimization.png


Debugging Tools

Debugging Project I

  • Detect bug with fewer threads and fewer context switches


Experience with C/PThreads: Over 90% of simplified traces were within 2 context switches of optimal

CORVETTE-Context-Switch-Results.png


Small model hypothesis for parallel programs

  • Most bugs can be found with few threads
    • 2‐3 threads
    • No need to run on thousands of nodes
  • Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07]
    • Helps in sequential debugging

Debugging Project II

  • Two‐level debugging of DSLs
  • Correlate program state across program versions

CORVETTE-Debugging-DSL.png

Debugging Project III

  • Find floating point anomalies
  • Recommend safe reduction of precision


Floating Point Debugging

Why do we care?

  • Usage of floating point programs has been growing rapidly
    • HPC
    • Cloud, games, graphics, finance, speech, signal processing
  • Most programmers are not expert in floating‐point!
    • Why not use highest precision everywhere
  • High precision wastes
    • Energy
    • Time
    • Storage

Floating Point Debugging Problem 1: Reduce unnecessary precision

  • Consider the problem of finding the arc length of the function

How can we find a minimal set of code fragments whose precision must be high?

CORVETTE-Debugging-Problem-1.png

Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly

CORVETTE-Debugging-Problem-2.png

What can we do?

  • We can reduce precision “safely”
    • reduce power, improve performance, get better answer
  • Automated testing and debugging techniques
    • To recommend “precision reduction”
    • Formal proof of “safety” can be replaced by Concolic testing
  • Approach: automate previously hand-made debugging
    • Concolic testing
    • Delta debugging [Zeller et al.]


Implementation

  • Future plans
    • Build on top of LLVM compiler framework


Potential Collaboration

  • Dynamic analyses to find bugs ‐ dynamic parallelism, unstructured parallelism, shared memory
    • DEGAS, XPRESS, Traleika Glacier
  • Floating point debugging
    • Co‐design centers
  • 2‐level debugging
    • D-TEC


Summary

CORVETTE-Summary.png

  • Build testing tools
    • Close to what programmers use
    • Hide formal methods and program analysis under testing
  • If you are not obsessed with formal correctness
    • Testing and debugging can help you solve these problems with high confidence