Actions

CORVETTE: Difference between revisions

From Modelado Foundation

imported>Ksen
No edit summary
No edit summary
 
(39 intermediate revisions by 2 users not shown)
Line 4: Line 4:
| imagecaption =  
| imagecaption =  
| team-members = [http://www.berkeley.edu/ UC Berkeley], [http://www.lbl.gov/ LBNL]
| team-members = [http://www.berkeley.edu/ UC Berkeley], [http://www.lbl.gov/ LBNL]
| pi = Koushik Sen (UC Berkeley)
| pi = [[Koushik Sen]]
| co-pi = James W. Demmel (UC Berkeley), Costin Iancu (LBNL)
| co-pi = James W. Demmel (UC Berkeley), Costin Iancu (LBNL)
| website = [http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/corvette/]
| website = [http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/corvette/ CORVETTE]
}}
}}


Line 55: Line 55:
* Detect causes of floating-point anomalies and determine the minimum precision needed to fix them
* Detect causes of floating-point anomalies and determine the minimum precision needed to fix them


==Impact Summary ==
* [https://xstackwiki.modelado.org/images/4/4c/Corvette-highlight_summary.pdf Corvette Impact Summary]
== List of Products ==
=== Publications ===
* [http://crd.lbl.gov/assets/pubs_presos/main2.pdf ''Floating Point Precision Tuning Using Blame Analysis''] Cindy Rubio-Gonz ́alez, Cuong Nguyen, James Demmel, William Kahan, Koushik Sen, Wim Lavrijsen, Costin Iancu, LBNL TR, April 17, 2015.
* [http://crd.lbl.gov/assets/pubs_presos/main3.pdf ''OPR: Partial Deterministic Record and Replay for One-Sided Communication''] Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu, LBNL TR, April 17, 2015.
* [http://crd.lbl.gov/assets/pubs_presos/nwbar.pdf ''Barrier Elision for Production Parallel Programs''] Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor Crummey, Costin Iancu, PPOPP 2015, February 5, 2015.
* "Parallel Reproducible Summation James Demmel", Hong-Diep Nguyen, In IEEE Transactions on Computers, Special Section on Computer Arithmetic 2014, April 30, 2014
* [http://www.eecs.berkeley.edu/~rubio/includes/sc13.pdf ''Precimonious: Tuning Assistant for Floating-Point Precision.''] C. Rubio-González, C. Nguyen, H.D. Nguyen, J. Demmel, W. Kahan, K. Sen, D.H. Bailey, C. Iancu, and D. Hough. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'13), Denver, CO, November 2013.
* [http://www.eecs.berkeley.edu/~hdnguyen/public/papers/ARITH21_Fast_Sum.pdf ''Fast Reproducible Floating-Point Summation''.] J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
* [http://www.eecs.berkeley.edu/~hdnguyen/public/papers/ARITH21_ExaScale.pdf ''Numerical Accuracy and Reproducibility at ExaScale''.] J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH'13), Austin, TX, April 2013.
* [http://srl.cs.berkeley.edu/~ksen/papers/thrille-exp.pdf ''Scaling Data Race Detection for Partitioned Global Address Space Programs''.] C. Park, K. Sen, C. Iancu. In Proceedings of the International Conference on Supercomputing (ICS'13), Eugene, OR, June 2013.
=== Technical Presentations ===
* 06/2014. ''Testing, Debugging, and Precision-tuning of Large-scale Parallel and Floating-point Programs''. Presenter: Koushik Sen. Invited R. Narasimhan Lecture Award, TIFR, India.
* 05/2014. ''Precision Tuning of Floating-Point Programs''. Presenters: Cindy Rubio-González and Cuong Nguyen. Berkeley Programming Systems Retreat, Santa Cruz, CA.
* 04/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. University of California, Irvine, Irvine, CA.
* 03/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. University of Texas at Dallas, Dallas, TX.
* 03/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. University of California, Davis, Davis, CA.
* 02/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. The University of New York at Buffalo, Buffalo, NY.
* 02/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. College of William and Mary, Williamsburg, VA.
* 02/2014. ''Improving Software Reliability and Performance Using Program Analysis''. Presenter: Cindy Rubio-González. Invited talk at SRI International, Menlo Park, CA.
* 01/2014. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. UC Berkeley ASPIRE Winter Retreat, Tahoe, CA.
* 12/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Invited talk at Bay Area Scientific Computing Day (BASCD'13), LBNL, Berkeley, CA.
* 11/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Supercomputing Conference (SC'13), Denver, CO.
* 11/2013. ''ReproBLAS: Reproducible BLAS''. Presenter: Hong Diep Nguyen. Supercomputing Conference (SC'13), Denver, CO.
* 11/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Massachusetts Institute of Technology, Cambridge, MA.
* 11/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Rising Stars in EECS Workshop, Poster Session, MIT, Cambridge, MA.
* 08/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Invited talk at Oracle, Compiler Technical Talks Series, Santa Clara, CA.
* 06/2013. ''Scaling Data Race Detection for Partitioned Global Address Space Programs''. Presenter: Costin Iancu. International Supercomputing Conference (ICS'13), Eugene, OR.
* 06/2013. ''Efficient Reproducible Floating-Point Reduction Operations on Large Scale Systems''. Presenter: Hong Diep Nguyen. SIAM Annual Meeting 2013  (AN13), San Diego, CA.
* 06/2013. ''Precimonious: Tuning Assistant for Floating-Point Precision''. Presenter: Cindy Rubio-González. Lawrence Berkeley National Laboratory DEGAS Retreat, Santa Cruz, CA.
* 04/2013. ''Fast Reproducible Floating-Point Summation''. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.
* 04/2013. ''Numerical Reproducibility and Accuracy at ExaScale''. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.
* 11/2012. ''Reproducible Floating Point Computation: Motivation, Algorithms, Diagnostics''. Presenter: James Demmel. Birds of a Feather (BOF), Supercomputing Conference (SC'12), Salt Lake City, UT.
=== Software Releases ===
* [http://upc.lbl.gov/thrille.html/ UPC-Thrille]
* [https://github.com/corvette-berkeley/precimonious/ Precimonious]
* [http://bebop.cs.berkeley.edu/reproblas/ ReproBLAS]


== Testing and Verification Tools ==
== Testing and Verification Tools ==

Latest revision as of 04:46, July 10, 2023

CORVETTE
CORVETTE-Logos.png
Team Members UC Berkeley, LBNL
PI Koushik Sen
Co-PIs James W. Demmel (UC Berkeley), Costin Iancu (LBNL)
Website CORVETTE
Download {{{download}}}

Program Correctness, Verification, and Testing for Exascale or CORVETTE


Team Members

Researchers

Postdoctoral Researchers

Graduate Student Researcher

Motivation

  • High performance scientific computing
    • Exascale: O(106) nodes, O(103) cores per node
    • Requires asynchrony and “relaxed” memory consistency
    • Shared memory with dynamic task parallelism
    • Languages allow remote memory modification
  • Correctness challenges
    • Non-deterministic causes hard to diagnose correctness and performance bugs
      • Data races, atomicity violations, deadlocks …
    • Bugs in DSL
    • Scientific applications use floating-points: non-determinism leads to non-reproducible results
    • Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix

Goals

Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism

I. Testing and Verification

  • Identify sources of non-determinism in executions
  • Data races, atomicity violations, non‐reproducible floating point results
  • Explore state-of-the-art techniques that use dynamic analysis
  • Develop precise and scalable tools: < 2X overhead

II. Debugging

  • Use minimal amount of concurrency to reproduce bug
  • Support two-level debugging of high-level abstractions
  • Detect causes of floating-point anomalies and determine the minimum precision needed to fix them

Impact Summary

List of Products

Publications

Technical Presentations

  • 06/2014. Testing, Debugging, and Precision-tuning of Large-scale Parallel and Floating-point Programs. Presenter: Koushik Sen. Invited R. Narasimhan Lecture Award, TIFR, India.
  • 05/2014. Precision Tuning of Floating-Point Programs. Presenters: Cindy Rubio-González and Cuong Nguyen. Berkeley Programming Systems Retreat, Santa Cruz, CA.
  • 04/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. University of California, Irvine, Irvine, CA.
  • 03/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. University of Texas at Dallas, Dallas, TX.
  • 03/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. University of California, Davis, Davis, CA.
  • 02/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. The University of New York at Buffalo, Buffalo, NY.
  • 02/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. College of William and Mary, Williamsburg, VA.
  • 02/2014. Improving Software Reliability and Performance Using Program Analysis. Presenter: Cindy Rubio-González. Invited talk at SRI International, Menlo Park, CA.
  • 01/2014. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. UC Berkeley ASPIRE Winter Retreat, Tahoe, CA.
  • 12/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Invited talk at Bay Area Scientific Computing Day (BASCD'13), LBNL, Berkeley, CA.
  • 11/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Supercomputing Conference (SC'13), Denver, CO.
  • 11/2013. ReproBLAS: Reproducible BLAS. Presenter: Hong Diep Nguyen. Supercomputing Conference (SC'13), Denver, CO.
  • 11/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Massachusetts Institute of Technology, Cambridge, MA.
  • 11/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Rising Stars in EECS Workshop, Poster Session, MIT, Cambridge, MA.
  • 08/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Invited talk at Oracle, Compiler Technical Talks Series, Santa Clara, CA.
  • 06/2013. Scaling Data Race Detection for Partitioned Global Address Space Programs. Presenter: Costin Iancu. International Supercomputing Conference (ICS'13), Eugene, OR.
  • 06/2013. Efficient Reproducible Floating-Point Reduction Operations on Large Scale Systems. Presenter: Hong Diep Nguyen. SIAM Annual Meeting 2013 (AN13), San Diego, CA.
  • 06/2013. Precimonious: Tuning Assistant for Floating-Point Precision. Presenter: Cindy Rubio-González. Lawrence Berkeley National Laboratory DEGAS Retreat, Santa Cruz, CA.
  • 04/2013. Fast Reproducible Floating-Point Summation. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.
  • 04/2013. Numerical Reproducibility and Accuracy at ExaScale. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.
  • 11/2012. Reproducible Floating Point Computation: Motivation, Algorithms, Diagnostics. Presenter: James Demmel. Birds of a Feather (BOF), Supercomputing Conference (SC'12), Salt Lake City, UT.

Software Releases

Testing and Verification Tools

Scalable Testing of Parallel Programs

  • Concurrent Programming is hard
    • Bugs happen non­‐deterministically
    • Data races, deadlocks, atomicity violations, etc.
  • Goals: build a tool to test and debug concurrent and parallel programs
    • Efficient: reduce overhead from 10x‐100x to 2x
    • Precise
    • Reproducible
    • Scalable
  • Active random testing

Active Testing

  • Phase 1: Static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, atomicity violations
    • Data races: Eraser or lockset based [PLDI’08]
    • Atomicity violations: cycle in transactions and happens‐before relation [FSE’08]
    • Deadlocks: cycle in resource acquisition graph [PLDI’09]
    • Publicly available tool for Java/Pthreads/UPC [CAV’09]
    • Memory model bugs: cycle in happens­‐before graph [ISSTA’11]
    • For UPC programs running on thousands of cores [SC’11]
  • Phase 2: “Direct” testing (or model checking) based on the bug patterns obtained from phase 1
    • Confirm bugs

Goals

  • Goal 1. Nice to have a trace exhibiting the data race
  • Goal 2. Nice to have a trace exhibiting the assertion failure
  • Goal 3. Nice to have a trace with fewer threads
  • Goal 4. Nice to have a trace with fewer context switches

Challenges for Exascale

  • Java and pthreads programs
    • Synchronization with locks and condition variables
    • Single node
  • Exascale has different programming models
    • Large scale
    • Bulk communication
    • Collective operations with data movement
    • Memory consistency
    • Distributed shared memory
  • Cannot use centralized dynamic analyses
  • Cannot instrument and track every statement

Further Challenges

  • Targeted a simple programming paradigm
    • Threads and shared memory
  • Similar techniques are available for MPI and CUDA
    • ISP, DAMPI, MARMOT, Umpire, MessageChecker
    • TASS uses symbolic execu7on
    • PUG for CUDA
  • Analyze programs that mix different paradigms
    • OpenMP, MPI, Shared Distributed Memory
    • Need to correlate non‐determinism across paradigms

How Well Does it Scale?

CORVETTE-Franklin.png
  • Maximum 8% slowdown at 8K cores
    • Franklin Cray XT4 Supercomputer at NERSC
    • Quad­‐core 2.x3GHz CPU and 8GB RAM per node
    • Portals interconnect
  • Optimizations for scalability
    • Efficient Data Structures
    • Minimize Communication
    • Sampling with Exponential Backoff

CORVETTE-Optimization.png


Debugging Tools

Debugging Project I

  • Detect bug with fewer threads and fewer context switches


Experience with C/PThreads: Over 90% of simplified traces were within 2 context switches of optimal

CORVETTE-Context-Switch-Results.png


Small model hypothesis for parallel programs

  • Most bugs can be found with few threads
    • 2‐3 threads
    • No need to run on thousands of nodes
  • Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07]
    • Helps in sequential debugging

Debugging Project II

  • Two‐level debugging of DSLs
  • Correlate program state across program versions

CORVETTE-Debugging-DSL.png

Debugging Project III

  • Find floating point anomalies
  • Recommend safe reduction of precision


Floating Point Debugging

Why do we care?

  • Usage of floating point programs has been growing rapidly
    • HPC
    • Cloud, games, graphics, finance, speech, signal processing
  • Most programmers are not expert in floating‐point!
    • Why not use highest precision everywhere
  • High precision wastes
    • Energy
    • Time
    • Storage

Floating Point Debugging Problem 1: Reduce unnecessary precision

  • Consider the problem of finding the arc length of the function

How can we find a minimal set of code fragments whose precision must be high?

CORVETTE-Debugging-Problem-1.png

Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly

CORVETTE-Debugging-Problem-2.png

What can we do?

  • We can reduce precision “safely”
    • reduce power, improve performance, get better answer
  • Automated testing and debugging techniques
    • To recommend “precision reduction”
    • Formal proof of “safety” can be replaced by Concolic testing
  • Approach: automate previously hand-made debugging
    • Concolic testing
    • Delta debugging [Zeller et al.]


Implementation

  • Future plans
    • Build on top of LLVM compiler framework


Potential Collaboration

  • Dynamic analyses to find bugs ‐ dynamic parallelism, unstructured parallelism, shared memory
    • DEGAS, XPRESS, Traleika Glacier
  • Floating point debugging
    • Co‐design centers
  • 2‐level debugging
    • D-TEC


Summary

CORVETTE-Summary.png

  • Build testing tools
    • Close to what programmers use
    • Hide formal methods and program analysis under testing
  • If you are not obsessed with formal correctness
    • Testing and debugging can help you solve these problems with high confidence