Actions

ExaCT: Difference between revisions

From Modelado Foundation

imported>Cdenny
(Blanked the page)
imported>Cdenny
No edit summary
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{Infobox  Co-design
|name = ExaCT
|image = [[File:ExaCTWebBanner.jpg|400px]]
|imagecaption =
|developer = [http://www.lbl.gov/ LBNL], [http://www.sandia.gov/ SNL], [http://www.lanl.gov/ LANL], [http://www.ornl.gov/ ORNL], [https://www.llnl.gov/ LLNL], [http://www.nrel.gov/ NREL], [http://www.rutgers.edu/ Rutgers U.], [https://www.utexas.edu/ UT Austin], [http://www.gatech.edu/ Georgia Tech], [http://www.stanford.edu/ Standford U.], [http://www.utah.edu/ U. of Utah]
|latest_release_version = version x.y.z
|latest_release_date = Latest Release Date here
|operating_system = Linux, Unix, etc.
|genre = Computational Chemistry?
|license = Open Source or else?
|website = [http://exactcodesign.org http://exactcodesign.org]
}}


== Introduction ==
=== Physics of Gas-Phase Combustion represented by PDE’s ===
[[File:ExaCT-Gas-Phase-Combustion.png|right|250px]]
* Focus on gas phase combustion in both compressible and low-Mach limits
* Fluid mechanics
** Conservation of mass
** Conservation of momentum
** Conservation of energy
* Thermodynamics
** Pressure, density, temperature relationships for multicomponent mixtures
* Chemistry
** Reaction kinetics
* Species transport
** Diffusive transport of different chemical species within the flame
=== Code Base ===
* S3D
** Fully compressible Navier Stokes
** Eighth-order in space, fourth order in time
** Fully explicit, uniform grid
** Time step limited by acoustics / chemical time scales
** Hybrid implementation with MPI + OpenMP
** Implemented for Titan at ORNL using OpenACC
* LMC
** Low Mach number formulation
** Projection-based discretization strategy
** Second-order in space and time
** Semi-implicit treatment of advection and diffusion
** Time step based on advection velocity
** Stiff ODE integration methodology for chemical kinetics
** Incorporates block-structured adaptive mesh refinement
** Hybrid implementation with MPI + OpenMP
* Target is computational model that supports compressible and low Mach number AMR simulation with integrated UQ
=== Adaptive Mesh Refinement ===
[[File:ExaCT-AMR.png|right|300px]]
* Need for AMR
** Reduce memory
** Scaling analysis – For explicit schemes flops scale with memory ^ 4/3
* Block-structured AMR
** Data organized into logically-rectangular structured grids
** Amortize irregular work
** Good match for multicore architectures
* AMR introduces extra algorithm issues not found in static codes
** Metadata manipulation
** Regridding operations
** Communications patterns
=== Preliminary Observations ===
* Need to rethink how we approach PDE discretization methods for multiphysics applications
** Exploit relationship between scales
** More concurrency
** More locality with reduced synchronization
** Less memory / FLOP
** Analysis of algorithms has typically been based on a performance = FLOPS paradigm – can we analyze algorithms in terms of a more realistic performance model
* Need to integrate analysis with simulation
** Combustion simulations are data rich
** Writing data to disk for subsequent analysis is currently near infeasibility
** Makes simulation look much more like physical experiments in terms of methodology
* Current programming models are inadequate for the task
** We describe algorithms serially and add things to express parallelism at different levels of the algorithm
** We express codes in terms of FLOPS and let the compiler figure out the data movement
** Non-uniform memory access is already an issue but programmers can’t easily control data layout
* Need to evaluate tradeoffs in terms of potential architectural features
=== How Core Numerics Will Change ===
* Core numerics
** Higher-order for low Mach number formulations
** Improved coupling methodologies for multiphysics problems
** Asynchronous treatment of physical processes
* Refactoring AMR for the exascale
** Current AMR characteristics
*** Global flat metadata
*** Load-balancing based on floating point work
*** Sequential treatment of levels of refinement
** For next generation
*** Hierarchical, distributed metadata
*** Consider communication cost as part of load balancing for more realistic estimate of work (topology aware)
*** Regridding includes cost of data motion
*** Statistical performance models
*** Alternative time-stepping algorithm – treat levels simultaneously
=== Data Analysis ===
[[File:ExaCT-Data-Analysis.png|right|300px]]
* Current simulations produce 1.5 Tbytes of data for analysis at each time step (Checkpoint data is 3.2 Tbytes)
** Archiving data for subsequent analysis is currently at limit of what can be done
** Extrapolating to the exascale, this becomes completely infeasible
* Need to integrate analysis with simulation
** Design the analysis to be run as part of the simulation definition
*** Visualizations
*** Topological analysis
*** Lagrangian tracer particles
*** Local flame coordinates
*** Etc.
* Approach based on hybrid staging concept
** Incorporate computing to reduce data volume at different stages along the path from memory to permanent file storage
== Co-design Process ==
[[File:ExaCT-Co-design_Process.png|right|400px]]
* Identify key simulation element
** Algorithmic
** Software
** Hardware
* Define representative code (proxy app)
* Analytic performance model
** Algorithm variations
** Architectural features
** Identify critical parameters
* Validate performance with hardware simulators/measurements
* Document tradeoffs
** Input to vendors
** Helps define programming model requirements
* Refine and iterate
== Applications ==
=== Proxy Applications ===
* Caveat
** Proxy apps are designed to address a specific co-design issue.
** Union of proxy apps is not a complete characterization of application
** Anticipated methodology for exascale not fully captured by current full applications
* Proxies
** Compressible Navier Stokes without species
*** Basic test for stencil operations, primarily at node level
*** Coming soon – generalization to multispecies with reactions (minimalist full application)
** Multigrid algorithm – 7 point stencil
*** Basic test for network issues
*** Coming soon – denser stencils
** Chemical integration
*** Kernel test for local, computationally intense kernel
** Others coming soon
*** Integrated UQ kernels
*** Skeletal model of full workflow
*** Visualization / analysis proxy apps
=== Visualization/Topology/Statistics Proxy Applications ===
* Proxies are algorithms with flexibility to explore multiple execution models
** Multiple strategies for local computation algorithms
** Support for various merge/broadcast communication patterns
* Topological analysis
** Three phases (local compute/communication/feature-based statistics)
** Low/no flops, highly branching code
** Compute complexity is data dependent
** Communication load is data dependent
** Requires gather/scatter of data
* Visualization
** Two phases (local compute/image compositing)
** Moderate FLOPS
** Compute complexity is data dependent
** Communication load is data dependent
** Requires gather
* Statistics
** Two phases (local compute/aggregation)
** Compute is all FLOPs
** Communication load is constant and small
** Requires gather, optional scatter of data
== Kernel Use ==
== Description ==
== Download ==

Latest revision as of 16:47, February 13, 2013

ExaCT
ExaCTWebBanner.jpg
Developer(s) LBNL, SNL, LANL, ORNL, LLNL, NREL, Rutgers U., UT Austin, Georgia Tech, Standford U., U. of Utah
Stable Release version x.y.z/Latest Release Date here
Operating Systems Linux, Unix, etc.
Type Computational Chemistry?
License Open Source or else?
Website http://exactcodesign.org

Introduction

Physics of Gas-Phase Combustion represented by PDE’s

ExaCT-Gas-Phase-Combustion.png
  • Focus on gas phase combustion in both compressible and low-Mach limits
  • Fluid mechanics
    • Conservation of mass
    • Conservation of momentum
    • Conservation of energy
  • Thermodynamics
    • Pressure, density, temperature relationships for multicomponent mixtures
  • Chemistry
    • Reaction kinetics
  • Species transport
    • Diffusive transport of different chemical species within the flame

Code Base

  • S3D
    • Fully compressible Navier Stokes
    • Eighth-order in space, fourth order in time
    • Fully explicit, uniform grid
    • Time step limited by acoustics / chemical time scales
    • Hybrid implementation with MPI + OpenMP
    • Implemented for Titan at ORNL using OpenACC
  • LMC
    • Low Mach number formulation
    • Projection-based discretization strategy
    • Second-order in space and time
    • Semi-implicit treatment of advection and diffusion
    • Time step based on advection velocity
    • Stiff ODE integration methodology for chemical kinetics
    • Incorporates block-structured adaptive mesh refinement
    • Hybrid implementation with MPI + OpenMP
  • Target is computational model that supports compressible and low Mach number AMR simulation with integrated UQ

Adaptive Mesh Refinement

ExaCT-AMR.png
  • Need for AMR
    • Reduce memory
    • Scaling analysis – For explicit schemes flops scale with memory ^ 4/3
  • Block-structured AMR
    • Data organized into logically-rectangular structured grids
    • Amortize irregular work
    • Good match for multicore architectures
  • AMR introduces extra algorithm issues not found in static codes
    • Metadata manipulation
    • Regridding operations
    • Communications patterns

Preliminary Observations

  • Need to rethink how we approach PDE discretization methods for multiphysics applications
    • Exploit relationship between scales
    • More concurrency
    • More locality with reduced synchronization
    • Less memory / FLOP
    • Analysis of algorithms has typically been based on a performance = FLOPS paradigm – can we analyze algorithms in terms of a more realistic performance model
  • Need to integrate analysis with simulation
    • Combustion simulations are data rich
    • Writing data to disk for subsequent analysis is currently near infeasibility
    • Makes simulation look much more like physical experiments in terms of methodology
  • Current programming models are inadequate for the task
    • We describe algorithms serially and add things to express parallelism at different levels of the algorithm
    • We express codes in terms of FLOPS and let the compiler figure out the data movement
    • Non-uniform memory access is already an issue but programmers can’t easily control data layout
  • Need to evaluate tradeoffs in terms of potential architectural features

How Core Numerics Will Change

  • Core numerics
    • Higher-order for low Mach number formulations
    • Improved coupling methodologies for multiphysics problems
    • Asynchronous treatment of physical processes
  • Refactoring AMR for the exascale
    • Current AMR characteristics
      • Global flat metadata
      • Load-balancing based on floating point work
      • Sequential treatment of levels of refinement
    • For next generation
      • Hierarchical, distributed metadata
      • Consider communication cost as part of load balancing for more realistic estimate of work (topology aware)
      • Regridding includes cost of data motion
      • Statistical performance models
      • Alternative time-stepping algorithm – treat levels simultaneously

Data Analysis

ExaCT-Data-Analysis.png
  • Current simulations produce 1.5 Tbytes of data for analysis at each time step (Checkpoint data is 3.2 Tbytes)
    • Archiving data for subsequent analysis is currently at limit of what can be done
    • Extrapolating to the exascale, this becomes completely infeasible
  • Need to integrate analysis with simulation
    • Design the analysis to be run as part of the simulation definition
      • Visualizations
      • Topological analysis
      • Lagrangian tracer particles
      • Local flame coordinates
      • Etc.
  • Approach based on hybrid staging concept
    • Incorporate computing to reduce data volume at different stages along the path from memory to permanent file storage


Co-design Process

ExaCT-Co-design Process.png
  • Identify key simulation element
    • Algorithmic
    • Software
    • Hardware
  • Define representative code (proxy app)
  • Analytic performance model
    • Algorithm variations
    • Architectural features
    • Identify critical parameters
  • Validate performance with hardware simulators/measurements
  • Document tradeoffs
    • Input to vendors
    • Helps define programming model requirements
  • Refine and iterate


Applications

Proxy Applications

  • Caveat
    • Proxy apps are designed to address a specific co-design issue.
    • Union of proxy apps is not a complete characterization of application
    • Anticipated methodology for exascale not fully captured by current full applications
  • Proxies
    • Compressible Navier Stokes without species
      • Basic test for stencil operations, primarily at node level
      • Coming soon – generalization to multispecies with reactions (minimalist full application)
    • Multigrid algorithm – 7 point stencil
      • Basic test for network issues
      • Coming soon – denser stencils
    • Chemical integration
      • Kernel test for local, computationally intense kernel
    • Others coming soon
      • Integrated UQ kernels
      • Skeletal model of full workflow
      • Visualization / analysis proxy apps

Visualization/Topology/Statistics Proxy Applications

  • Proxies are algorithms with flexibility to explore multiple execution models
    • Multiple strategies for local computation algorithms
    • Support for various merge/broadcast communication patterns
  • Topological analysis
    • Three phases (local compute/communication/feature-based statistics)
    • Low/no flops, highly branching code
    • Compute complexity is data dependent
    • Communication load is data dependent
    • Requires gather/scatter of data
  • Visualization
    • Two phases (local compute/image compositing)
    • Moderate FLOPS
    • Compute complexity is data dependent
    • Communication load is data dependent
    • Requires gather
  • Statistics
    • Two phases (local compute/aggregation)
    • Compute is all FLOPs
    • Communication load is constant and small
    • Requires gather, optional scatter of data

Kernel Use

Description

Download