DEGAS: Difference between revisions

From Modelado Foundation

Latest revision as of 04:50, July 10, 2023

DEGAS
DEGAS-Logos.png
Team Members LBNL, Rice U., UC Berkeley, UT Austin, LLNL, NCSU
PI Katherine Yelick
Co-PIs Vivek Sarkar (Rice U.), James Demmel (UC Berkeley), Mattan Erez (UT Austin), Dan Quinlan (LLNL)
Website DEGAS (http://crd.lbl.gov/departments/computer-science/CLaSS/research/DEGAS/)

Dynamic Exascale Global Address Space or DEGAS


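Team Members

  • Lawrence Berkeley National Laboratory (LBNL): http://www.lbl.gov/
  • Rice University: http://www.rice.edu/
  • University of California, Berkeley: http://www.berkeley.edu/
  • University of Texas at Austin: https://www.utexas.edu/
  • Lawrence Livermore National Laboratory (LLNL): https://www.llnl.gov/
  • North Carolina State University (NCSU): http://www.ncsu.edu/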

Project Impact

  • DEGAS Project Impact: https://xstackwiki.modelado.org/images/0/09/DEGAS-Highlight_Summary.pdf

Mission

Mission Statement: To ensure the broad success of Exascale systems through a unified programming model that is productive, scalable, portable, and interoperable, and meets the unique Exascale demands of energy efficiency and resilience.

DEGAS-Mission.png


Goals & Objectives

  • Scalability: Billion‐way concurrency, thousand‐way on chip with new architectures
  • Programmability: Convenient programming through a global address space and high‐level abstractions for parallelism, data movement and resilience
  • Performance Portability: Ensure applications can be moved across diverse machines using implicit (automatic) compiler optimizations and runtime adaptation
  • Resilience: Integrated language support for capturing state and recovering from faults
  • Energy Efficiency: Avoid communication, which will dominate energy costs, and adapt to performance heterogeneity due to system‐level energy management
  • Interoperability: Encourage use of languages and features through incremental adoption


Programming Models

Two Distinct Parallel Programming Questions

  • What is the parallel control model?

DEGAS-Parallel-Control-Model.png


  • What is the model for sharing/communication?

DEGAS-Sharing-Model.png


Applications Drive New Programming Models

DEGAS-Message-Passing.png

  • Message Passing Programming
    • Divide the domain into pieces
    • Compute one piece and exchange
    • MPI and many libraries

DEGAS-Global-Address-Space.png

  • Global Address Space Programming
    • Each process starts computing
    • Grab whatever/whenever
    • UPC, CAF, X10, Chapel, Fortress, Titanium, GlobalArrays
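
The contrast between the two styles can be sketched in a few lines of Python. This is a toy, single-process model written for illustration only: it uses neither MPI nor any PGAS language, and the names `split_domain`, `GlobalArray`, and both `neighbor_sums_*` functions are hypothetical. It assumes the domain length is divisible by the number of ranks. Each element is replaced by the sum of its two neighbors, with 0 beyond the domain edges.

```python
# Toy model of the two programming styles; no real MPI or UPC is involved.

def split_domain(data, nranks):
    """Divide the domain into nranks contiguous pieces (len(data) % nranks == 0)."""
    size = len(data) // nranks
    return [data[r * size:(r + 1) * size] for r in range(nranks)]


def neighbor_sums_message_passing(pieces):
    """Message passing: exchange edge values first, then compute on local data only."""
    nranks = len(pieces)
    # Exchange phase: every rank "sends" its edge values to its neighbors.
    inbox = [{} for _ in range(nranks)]
    for r, piece in enumerate(pieces):
        if r > 0:
            inbox[r - 1]["right"] = piece[0]
        if r < nranks - 1:
            inbox[r + 1]["left"] = piece[-1]
    # Compute phase: each rank touches only its piece plus the received halo.
    out = []
    for r, piece in enumerate(pieces):
        padded = [inbox[r].get("left", 0)] + piece + [inbox[r].get("right", 0)]
        out.append([padded[i - 1] + padded[i + 1]
                    for i in range(1, len(padded) - 1)])
    return out


class GlobalArray:
    """Hypothetical global view over the distributed pieces."""

    def __init__(self, pieces):
        self.pieces = pieces
        self.size = len(pieces[0])

    def get(self, i):
        """One-sided read of global index i; reads outside the domain return 0."""
        if i < 0 or i >= self.size * len(self.pieces):
            return 0
        return self.pieces[i // self.size][i % self.size]


def neighbor_sums_global_address_space(pieces):
    """Global address space: each rank reads whatever it needs, when it needs it."""
    ga = GlobalArray(pieces)
    out = []
    for r, piece in enumerate(pieces):
        base = r * ga.size
        out.append([ga.get(base + i - 1) + ga.get(base + i + 1)
                    for i in range(len(piece))])
    return out
```

Both functions compute the same result. The difference lies in who coordinates the data movement: the message-passing version needs a matching exchange step on both sides, while the global-address-space version lets the reader fetch remote values on its own.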


Hierarchical Programming Model

DEGAS-Hierarchical-PM.png
  • Goal: Programmability of exascale applications while providing scalability, locality, energy efficiency, resilience, and portability
    • Implicit constructs: parallel multidimensional loops, global distributed data structures, adaptation for performance heterogeneity
    • Explicit constructs: asynchronous tasks, phaser synchronization, locality
  • Built on scalability, performance, and asynchrony of PGAS models
    • Language experience from UPC, Habanero‐C, Co‐Array Fortran, Titanium
  • Both intra‐ and inter‐node; focus is on node model
  • Languages demonstrate DEGAS programming model
    • Habanero‐UPC: Habanero’s intra‐node model with UPC’s inter‐node model
    • Hierarchical Co‐Array Fortran (CAF): CAF for on‐chip scaling and more
    • Exploration of high level languages: E.g., Python extended with H‐PGAS
  • Language‐independent H‐PGAS Features:
    • Hierarchical distributed arrays, asynchronous tasks, and compiler specialization for hybrid (task/loop) parallelism and heterogeneity
    • Semantic guarantees for deadlock avoidance, determinism, etc.
    • Asynchronous collectives, function shipping, and hierarchical places
    • End‐to‐end support for asynchrony (messaging, tasking, bandwidth utilization through concurrency)
    • Early concept exploration for applications and benchmarks
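
As a rough illustration of the explicit constructs listed above (asynchronous tasks plus phaser synchronization), the following Python sketch launches several tasks and synchronizes them at the end of each phase. It is not Habanero‐C or Habanero‐UPC code. Two assumptions are made: `threading.Barrier` stands in for a phaser, although real phasers are more flexible (for example, tasks can register dynamically), and the `with ThreadPoolExecutor(...)` scope plays the role of a construct that waits for all spawned tasks. The function name `run_phases` is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_phases(num_tasks, num_phases):
    """Run num_tasks asynchronous tasks that advance through phases in lockstep."""
    barrier = threading.Barrier(num_tasks)  # assumption: stand-in for a phaser
    log = []
    lock = threading.Lock()

    def task(tid):
        for phase in range(num_phases):
            with lock:
                log.append((phase, tid))  # the "work" of this phase
            barrier.wait()                # nobody enters phase+1 until all arrive

    # All tasks must run concurrently, so the pool has one worker per task.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        futures = [pool.submit(task, t) for t in range(num_tasks)]
        for f in futures:
            f.result()  # propagate any exception raised inside a task
    return log
```

Because every task records its work before waiting, the phase numbers in the returned log never decrease, even though the order of tasks within a phase is nondeterministic.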


Communication-Avoiding Compilers

DEGAS-Communication-Node.png
  • Goal: massive parallelism, deep memory and network hierarchies, plus functional and performance heterogeneity
    • Fine‐grained task and data parallelism: enable performance portability
    • Heterogeneity: guided by functional, energy and performance characteristics
    • Energy efficiency: minimize data movement and hooks to runtime adaptation
    • Programmability: manage details of memory, heterogeneity, and containment
    • Scalability: communication and synchronization hiding through asynchrony
  • H-PGAS into the Node
    • Communication is all data movement
  • Build on code‐generation infrastructure
    • ROSE for H‐CAF and Communication‐Avoidance optimizations
    • BUPC and Habanero‐C; Zoltan
    • Additional theory of CA code generation


Exascale Programming: Support for Future Algorithms

DEGAS-Algorithm.png

  • Approach: “Rethink” algorithms to optimize for data movement
    • New class of communication‐optimal algorithms
    • Most codes are not bandwidth limited, but many should be
  • Challenges: How general are these algorithms?
    • Can they be automated and for what types of loops?
    • How much benefit is there in practice?
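
To make "optimize for data movement" concrete, here is a minimal sketch based on textbook loop tiling for matrix multiplication. It is not one of the DEGAS communication‐optimal algorithms. The traffic formulas use a deliberately simplified two‐level memory model (an assumption): the naive version re-reads a row of A and a column of B for every output element, while the blocked version stages b‐by‐b tiles in fast memory. The block size b is assumed to divide n, and all function names are hypothetical.

```python
def naive_traffic(n):
    """Words moved when each output element re-reads n words of A and n of B."""
    return n * n * (2 * n + 1)  # the +1 is the store of the C element


def blocked_traffic(n, b):
    """Words moved when b-by-b tiles are staged in fast memory (b divides n)."""
    tiles = n // b
    # Per output tile: load `tiles` pairs of input tiles, then store one tile.
    return tiles * tiles * (tiles * 2 * b * b + b * b)


def blocked_matmul(A, B, b):
    """Tiled product of two n-by-n matrices given as lists of lists."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                # Everything inside this tile triple fits in "fast memory".
                for i in range(i0, i0 + b):
                    for j in range(j0, j0 + b):
                        s = C[i][j]
                        for k in range(k0, k0 + b):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

Under this model, an 8‐by‐8 product moves 1088 words in the naive schedule and 320 words with 4‐by‐4 tiles, while the arithmetic performed is identical.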


Adaptive Runtime Systems (ARTS)

DEGAS-Infiniband-Throughput.png
  • Goal: Adaptive runtime for manycore systems that are hierarchical, heterogeneous and provide asymmetric performance
    • Reactive and proactive control: for utilization and energy efficiency
    • Integrated tasking and communication: for hybrid programming
    • Sharing of hardware threads: required for library interoperability
  • Novelty: Scalable control; integrated tasking with communication
    • Adaptation: Runtime annotated with performance history/intentions
    • Performance models: Guide runtime optimizations, specialization
    • Hierarchical: Resource/energy
    • Tunable control: Locality/load balance
  • Leverages: Existing runtimes
    • Lithe scheduler composition; Juggle
    • BUPC and Habanero‐C runtimes


Synchronization Avoidance vs Resource Management

DEGAS-Resource-Mgmt.png


  • Management of critical resources will be more important:
    • Memory and network bandwidth limited by cost and energy
    • Capacity limited at many levels: network buffers at interfaces, internal network congestion are real and growing problems
  • Can runtimes manage these or do users need to help?
    • Adaptation based on history and (user‐supplied) intent?
    • Where will bottlenecks be for a given architecture and application?


Lithe Scheduling Abstraction: "Harts" (Hardware Threads)

DEGAS-Harts.png


Lightweight Communication (GASNet-EX)

DEGAS-GASNet.png
  • Goal: Maximize bandwidth use with lightweight communication
    • One‐sided communication: to avoid over‐synchronization
    • Active‐Messages: for productivity and portability
    • Interoperability: with MPI and threading layers
  • Novelty:
    • Congestion management: for 1‐sided communication with ARTS
    • Hierarchical: communication management for H‐PGAS
    • Resilience: globally consistent states and fine‐grained fault recovery
    • Progress: new models for scalability and interoperability
  • Leverage GASNet (redesigned):
    • Major changes for on‐chip interconnects
    • Each network has unique opportunities
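
The two communication styles named above, one‐sided transfers and active messages, can be sketched with a toy in-memory model. Nothing here is the GASNet or GASNet‐EX API: `ToyNode`, `put`, `get`, `send_am`, and `add_handler` are hypothetical names chosen for this illustration, and "remote" memory is just a Python list.

```python
from collections import deque


class ToyNode:
    """A node with a remotely accessible memory segment and an AM queue."""

    def __init__(self, rank, segment_size):
        self.rank = rank
        self.segment = [0] * segment_size
        self.handlers = {}
        self.am_queue = deque()

    def register_handler(self, name, fn):
        self.handlers[name] = fn

    def poll(self):
        """Run all queued active-message handlers; return how many ran."""
        ran = 0
        while self.am_queue:
            name, args = self.am_queue.popleft()
            self.handlers[name](self, *args)
            ran += 1
        return ran


def put(target, offset, values):
    """One-sided write: the target does not participate."""
    target.segment[offset:offset + len(values)] = values


def get(target, offset, count):
    """One-sided read: the target does not participate."""
    return target.segment[offset:offset + count]


def send_am(target, name, *args):
    """Active message: ship a handler name and arguments to the target."""
    target.am_queue.append((name, args))


def add_handler(node, offset, delta):
    """Example handler that updates the target's own segment."""
    node.segment[offset] += delta
```

In this sketch, `put` and `get` take effect immediately without any action by the target, whereas an active message only has an effect once the target makes progress by calling `poll`. That distinction is one reason progress models matter for scalability and interoperability.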


Resilience through Containment Domains

DEGAS-Resilience.png
  • Goal: Provide a resilient runtime for PGAS applications
    • Applications should be able to customize resilience to their needs
    • Resilient runtime that provides easy‐to‐use mechanisms
  • Novelty: Single analyzable abstraction for resilience
    • PGAS Resilience consistency model
    • Directed and hierarchical preservation
    • Global or localized recovery
    • Algorithm and system‐specific detection, elision, and recovery
  • Leverage: Combined superset of prior approaches
    • Fast checkpoints for large bulk updates
    • Journal for small frequent updates
    • Hierarchical checkpoint‐restart
    • OS‐level save and restore
    • Distributed recovery
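
The preserve, detect, and recover cycle behind these bullets can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the Containment Domains API: `run_contained` and `Escalate` are hypothetical names, state is a plain dictionary, preservation is a deep copy, and detection is a user-supplied check.

```python
import copy


class Escalate(Exception):
    """Raised when a domain cannot recover locally and defers to its parent."""


def run_contained(state, body, check, max_retries=2):
    """Run body(state) inside a domain; return the number of retries needed."""
    preserved = copy.deepcopy(state)           # preservation on entry
    for attempt in range(max_retries + 1):
        try:
            body(state)
            if check(state):                   # application-specific detection
                return attempt
        except Exception:
            pass                               # a crash or a child's Escalate
        state.clear()                          # localized recovery:
        state.update(copy.deepcopy(preserved)) # roll back to preserved state
    raise Escalate("retries exhausted; parent domain must recover")
```

Nesting falls out naturally in this sketch: if `body` itself calls `run_contained` and the inner call raises `Escalate`, the outer domain catches it, restores its own preserved state, and re-executes, which mirrors the hierarchical preservation and recovery described above.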


Resilience: Research Questions

1. How to define consistent (i.e. allowable) states in the PGAS model?

  • Theory well understood for fail‐stop message‐passing, but not PGAS.

2. How do we discover consistent states once we've defined them?

  • Containment domains offer a new approach, beyond conventional sync‐and‐stop algorithms.

3. How do we reconstruct consistent states after a failure?

  • Explore low‐overhead techniques that minimize the effort required of application programmers.
  • Leverage BLCR, GASNet, and Berkeley UPC for development, and use Containment Domains as a prototype API for requirements discovery.

DEGAS-Resilience-Research-Area.png


Energy and Performance Feedback

DEGAS-Nvidia-graph.png
  • Goal: Monitoring and feedback of performance and energy for online and offline optimization
    • Collect and distill: performance/energy/timing data
    • Identify and report bottlenecks: through summarization/visualization
    • Provide mechanisms: for autonomous runtime adaptation
  • Novelty: Automated runtime introspection
    • Provide monitoring: power/network utilization
    • Machine Learning: identify common characteristics
    • Resource management: including dark silicon
  • Leverage: Performance/energy counters
    • Integrated Performance Monitoring (IPM)
    • Roofline formalism
    • Performance/energy counters
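
The Roofline formalism mentioned above bounds attainable performance by the smaller of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch follows; the function names and the sample numbers in the usage note are illustrative assumptions, not measurements from DEGAS.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def is_bandwidth_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """True when the memory roof, not the compute roof, limits the kernel."""
    return bandwidth_gbs * flops_per_byte < peak_gflops
```

For a hypothetical machine with a 100 GFLOP/s peak and 10 GB/s of memory bandwidth, a kernel performing 2 flops per byte is limited to 20 GFLOP/s and is bandwidth bound, while a kernel at 50 flops per byte reaches the 100 GFLOP/s compute roof.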


Software Stack

DEGAS-Software-Stack.png


DEGAS Pieces of the Puzzle

DEGAS-Puzzle.png


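Products from DEGAS research (as of 04/2016): http://crd.lbl.gov/assets/Uploads/FTG/Projects/DEGAS/DEGAS-products-April2016.pdf

Software Releases: http://crd.lbl.gov/departments/computer-science/CLaSS/research/DEGAS/degas-software-releases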