ExaCT
| ExaCT | |
|---|---|
| Developer(s) | LBNL, SNL, LANL, ORNL, LLNL, NREL, Rutgers U., U. of Texas at Austin, Georgia Tech, Stanford U., U. of Utah |
| Operating Systems | Linux, Unix, etc. |
| Type | Combustion simulation (exascale co-design) |
| Website | http://exactcodesign.org |
Introduction
Physics of gas-phase combustion represented by PDEs (a standard form is written out after this list)
- Focus on gas-phase combustion in both the compressible and low-Mach-number limits
- Fluid mechanics
- Conservation of mass
- Conservation of momentum
- Conservation of energy
- Thermodynamics
- Pressure, density, temperature relationships for multicomponent mixtures
- Chemistry
- Reaction kinetics
- Species transport
- Diffusive transport of different chemical species within the flame
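As a reference point, these bullets correspond to a standard textbook form of the compressible reacting-flow conservation laws (this generic form is assumed here; it is not necessarily the exact formulation used in the project's codes). With $\rho$ the density, $\mathbf{u}$ the velocity, $p$ the pressure, $E$ the total energy, and $Y_k$, $\mathbf{F}_k$, $\dot{\omega}_k$ the mass fraction, diffusive flux, and chemical production rate of species $k$:

```latex
\begin{aligned}
&\text{Mass:}     && \partial_t \rho + \nabla\cdot(\rho\mathbf{u}) = 0 \\
&\text{Momentum:} && \partial_t(\rho\mathbf{u}) + \nabla\cdot(\rho\mathbf{u}\mathbf{u} + p\,\mathbb{I}) = \nabla\cdot\boldsymbol{\tau} \\
&\text{Energy:}   && \partial_t(\rho E) + \nabla\cdot\left[(\rho E + p)\,\mathbf{u}\right] = \nabla\cdot(\boldsymbol{\tau}\cdot\mathbf{u} - \mathbf{q}) \\
&\text{Species:}  && \partial_t(\rho Y_k) + \nabla\cdot(\rho Y_k \mathbf{u}) = -\nabla\cdot\mathbf{F}_k + \dot{\omega}_k
\end{aligned}
```

Here $\boldsymbol{\tau}$ is the viscous stress tensor and $\mathbf{q}$ the heat flux; the thermodynamics bullets close the system through an equation of state $p = p(\rho, T, Y_k)$ for the multicomponent mixture.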
Code Base
- S3D
- Fully compressible Navier-Stokes equations
- Eighth-order in space, fourth-order in time (a minimal stencil sketch follows this list)
- Fully explicit, uniform grid
- Time step limited by acoustic and chemical time scales
- Hybrid implementation with MPI + OpenMP
- Implemented for Titan at ORNL using OpenACC
- LMC
- Low Mach number formulation
- Projection-based discretization strategy
- Second-order in space and time
- Semi-implicit treatment of advection and diffusion
- Time step based on advection velocity
- Stiff ODE integration methodology for chemical kinetics
- Incorporates block-structured adaptive mesh refinement
- Hybrid implementation with MPI + OpenMP
- Target is a computational model that supports both compressible and low-Mach-number AMR simulation with integrated UQ
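To make the S3D-style discretization concrete, here is a minimal sketch, assuming a 1D linear-advection stand-in for the compressible system (illustrative code, not S3D itself): eighth-order central differences on a uniform periodic grid, advanced with classic fourth-order Runge-Kutta, fully explicit.

```python
import numpy as np

# Standard eighth-order central-difference coefficients for d/dx
# at offsets +/-1, +/-2, +/-3, +/-4 on a uniform grid.
C = (4/5, -1/5, 4/105, -1/280)

def ddx(f, h):
    """Eighth-order first derivative on a periodic uniform grid."""
    d = np.zeros_like(f)
    for k, c in enumerate(C, start=1):
        d += c * (np.roll(f, -k) - np.roll(f, k))
    return d / h

def rhs(f, h, a=1.0):
    """Semi-discrete right-hand side of u_t = -a u_x (linear advection)."""
    return -a * ddx(f, h)

def rk4_step(f, dt, h):
    """Classic fourth-order Runge-Kutta update, fully explicit."""
    k1 = rhs(f, h)
    k2 = rhs(f + 0.5 * dt * k1, h)
    k3 = rhs(f + 0.5 * dt * k2, h)
    k4 = rhs(f + dt * k3, h)
    return f + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
h = x[1] - x[0]
u = np.exp(-200.0 * (x - 0.5) ** 2)    # smooth initial pulse
dt = 0.4 * h                           # explicit step, CFL-limited
for _ in range(int(round(1.0 / dt))):  # advect once around the domain
    u = rk4_step(u, dt, h)
```

Because the scheme is fully explicit, dt is tied to the grid spacing through a CFL condition, which is the "time step limited by acoustic and chemical time scales" constraint noted above.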
Adaptive Mesh Refinement
- Need for AMR
- Reduce memory
- Scaling analysis: for explicit schemes, FLOPs scale as memory^(4/3); in 3D, memory grows as N^3 while the CFL condition forces O(N) time steps, so total work grows as N^4 = (N^3)^(4/3)
- Block-structured AMR
- Data organized into logically rectangular structured grids (see the sketch after this list)
- Amortize irregular work
- Good match for multicore architectures
- AMR introduces extra algorithmic issues not found in static-grid codes
- Metadata manipulation
- Regridding operations
- Communication patterns
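The data layout described above can be pictured in a few lines of Python. This is a toy sketch with invented names (Box, refine), not the AMR library the project actually builds on:

```python
from dataclasses import dataclass
from math import prod

@dataclass(frozen=True)
class Box:
    """One structured grid patch: a logically rectangular range of cell indices."""
    lo: tuple  # inclusive lower index corner, e.g. (0, 0, 0)
    hi: tuple  # inclusive upper index corner

    def ncells(self):
        return prod(h - l + 1 for l, h in zip(self.lo, self.hi))

def refine(box, ratio):
    """Map a box into the index space of the next finer level."""
    return Box(tuple(ratio * l for l in box.lo),
               tuple(ratio * (h + 1) - 1 for h in box.hi))

# A hierarchy is a list of levels, each a list of boxes. Regridding rebuilds
# a level's box list around error-tagged regions (metadata work) and then
# fills the new boxes with data (communication work) -- the extra costs the
# bullets above call out.
hierarchy = [
    [Box((0, 0, 0), (63, 63, 63))],                # level 0: whole domain
    [refine(Box((16, 16, 16), (31, 31, 31)), 2)],  # level 1: refined patch
]
```

Operating on whole rectangular patches rather than individual cells is what amortizes the irregular work and maps well onto multicore nodes.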
Preliminary Observations
- Need to rethink how we approach PDE discretization methods for multiphysics applications
- Exploit relationship between scales
- More concurrency
- More locality with reduced synchronization
- Less memory per FLOP
- Analysis of algorithms has typically been based on a performance = FLOPs paradigm; can we analyze algorithms in terms of a more realistic performance model (see the sketch after this list)?
- Need to integrate analysis with simulation
- Combustion simulations are data rich
- Writing data to disk for subsequent analysis is already close to infeasible
- This makes simulation look much more like a physical experiment in terms of methodology
- Current programming models are inadequate for the task
- We describe algorithms serially and then add constructs to express parallelism at different levels of the algorithm
- We express codes in terms of FLOPs and let the compiler figure out the data movement
- Non-uniform memory access is already an issue but programmers can’t easily control data layout
- Need to evaluate tradeoffs in terms of potential architectural features
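As one example of a "more realistic" performance model than raw FLOP counts, a roofline-style estimate bounds runtime by both compute rate and memory traffic. All hardware numbers and per-cell costs below are assumptions chosen for illustration:

```python
# Roofline-style estimate: runtime is bounded below by both the compute
# time and the memory-traffic time, so low arithmetic intensity -- typical
# of stencil kernels -- makes bandwidth, not FLOPs, the binding constraint.
PEAK_FLOPS = 1.0e12  # peak FLOP/s (assumed machine)
PEAK_BW    = 1.0e11  # peak memory bandwidth, bytes/s (assumed machine)

def kernel_time(flops, bytes_moved):
    """Lower bound on runtime under a simple roofline model."""
    return max(flops / PEAK_FLOPS, bytes_moved / PEAK_BW)

# Example: a stencil sweep over n cells, few flops per byte moved.
n = 512 ** 3
flops = 50 * n           # ~50 flops per cell (assumed)
bytes_moved = 8 * 2 * n  # read + write one double per cell (assumed)
t = kernel_time(flops, bytes_moved)
intensity = flops / bytes_moved  # flops per byte
bound = "bandwidth" if flops / PEAK_FLOPS < bytes_moved / PEAK_BW else "compute"
print(f"arithmetic intensity {intensity:.1f} flop/byte, {bound}-bound, {t:.3f} s")
```

Under these assumed numbers the kernel is bandwidth-bound, so a pure FLOP count would badly mispredict its runtime.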
How Core Numerics Will Change
- Core numerics
- Higher-order for low Mach number formulations
- Improved coupling methodologies for multiphysics problems
- Asynchronous treatment of physical processes
- Refactoring AMR for the exascale
- Current AMR characteristics
- Global flat metadata
- Load-balancing based on floating point work
- Sequential treatment of levels of refinement
- For next generation
- Hierarchical, distributed metadata
- Consider communication cost as part of load balancing for a more realistic estimate of work (topology-aware; see the sketch after this list)
- Regridding includes cost of data motion
- Statistical performance models
- Alternative time-stepping algorithms that treat refinement levels simultaneously
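A sketch of what communication-aware load balancing might look like, under assumed cost weights and a simple greedy assignment (neither is the center's actual algorithm):

```python
def box_cost(ncells, ghost_cells, flop_weight=1.0, comm_weight=4.0):
    """Estimated cost of one box: interior compute plus ghost-cell exchange.
    Both weights are illustrative assumptions."""
    return flop_weight * ncells + comm_weight * ghost_cells

def balance(costs, nranks):
    """Greedy bin packing: place each box on the currently lightest rank."""
    loads = [0.0] * nranks
    owner = [None] * len(costs)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        r = loads.index(min(loads))
        owner[i] = r
        loads[r] += costs[i]
    return owner, loads

# Two boxes with the same volume (same FLOPs) but different surface area:
# a pure FLOP count treats them as equal work; a communication-aware cost
# does not, because the slab exchanges more ghost cells.
fat  = box_cost(32 * 32 * 32, 4 * 6 * 32 * 32)                     # cube
thin = box_cost(128 * 16 * 16, 4 * 2 * (128*16 + 128*16 + 16*16))  # slab
owner, loads = balance([fat, thin] * 8, nranks=4)
```

A topology-aware version would further weight ghost-cell traffic by the network distance between the ranks that share each face.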
Data Analysis
- Current simulations produce 1.5 Tbytes of data for analysis at each time step (Checkpoint data is 3.2 Tbytes)
- Archiving data for subsequent analysis is currently at limit of what can be done
- Extrapolating to the exascale, this becomes completely infeasible
- Need to integrate analysis with simulation
- Design the analysis to be run as part of the simulation definition
- Visualizations
- Topological analysis
- Lagrangian tracer particles
- Local flame coordinates
- Etc.
- Approach based on a hybrid staging concept (sketched after this list)
- Incorporate computing to reduce data volume at different stages along the path from memory to permanent file storage
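A cartoon of the hybrid staging idea, with hypothetical stage names and reductions: each hop on the path from simulation memory toward permanent storage runs some analysis and forwards a much smaller result:

```python
import numpy as np

def on_node_stage(field):
    """Stage 1 (simulation node): downsample and extract local statistics."""
    coarse = field[::4, ::4, ::4]  # 64x data reduction
    stats = {"min": field.min(), "max": field.max(), "mean": field.mean()}
    return coarse, stats

def staging_node_stage(coarse, stats):
    """Stage 2 (staging node): keep only cells near the flame (T > 1500 K,
    an illustrative threshold) as a sparse feature list."""
    hot = np.argwhere(coarse > 1500.0)
    return {"features": hot, "stats": stats}

def archive_stage(summary):
    """Stage 3 (storage): persist the small summary, not the raw field."""
    np.savez("step_summary.npz", features=summary["features"],
             **{k: np.asarray(v) for k, v in summary["stats"].items()})

# Stand-in temperature field; a real run would stage every analysis step.
T = 300.0 + 1400.0 * np.random.rand(128, 128, 128)
archive_stage(staging_node_stage(*on_node_stage(T)))
```

The point is that the data volume shrinks at every stage, so only kilobytes rather than terabytes reach the file system per step.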
Co-design Process
- Identify key simulation elements
- Algorithmic
- Software
- Hardware
- Define representative code (proxy app)
- Analytic performance model
- Algorithm variations
- Architectural features
- Identify critical parameters
- Validate performance with hardware simulators/measurements
- Document tradeoffs
- Input to vendors
- Helps define programming model requirements
- Refine and iterate
Applications
Proxy Applications
- Caveat
- Proxy apps are designed to address a specific co-design issue.
- The union of the proxy apps is not a complete characterization of the application
- The methodology anticipated for the exascale is not fully captured by current full applications
- Proxies
- Compressible Navier-Stokes without species
- Basic test for stencil operations, primarily at node level
- Coming soon – generalization to multispecies with reactions (minimalist full application)
- Multigrid algorithm with a 7-point stencil
- Basic test for network issues
- Coming soon – denser stencils
- Chemical integration
- Kernel test for a local, computationally intensive operation (a stiff-ODE sketch follows this list)
- Others coming soon
- Integrated UQ kernels
- Skeletal model of full workflow
- Visualization / analysis proxy apps
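The chemical-integration kernel can be sketched as advancing a stiff kinetics ODE system per cell with an implicit integrator. The two-species mechanism and tolerances below are invented, and SciPy's BDF integrator stands in for whatever stiff solver the kernel actually uses; production runs integrate detailed mechanisms with tens to hundreds of species:

```python
from scipy.integrate import solve_ivp

def kinetics(t, y, k_fast=1.0e4, k_slow=1.0):
    """dy/dt for a toy fast/slow reaction pair; stiffness comes from k_fast."""
    a, b = y
    return [-k_fast * a + k_slow * b,
             k_fast * a - k_slow * b]

def advance_cell(y0, dt):
    """Integrate one cell's composition over a fluid time step of length dt."""
    sol = solve_ivp(kinetics, (0.0, dt), y0, method="BDF",
                    rtol=1e-8, atol=1e-12)
    return sol.y[:, -1]

# Time scales differ by ~1e4, so an explicit method would need ~1e4 more
# sub-steps; the implicit BDF method takes large steps once the fast mode
# has relaxed. This per-cell work is dense and local, hence a good kernel
# test for on-node compute.
y = advance_cell([1.0, 0.0], dt=1.0)
```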
Visualization/Topology/Statistics Proxy Applications
- Proxies are algorithms with flexibility to explore multiple execution models
- Multiple strategies for local computation algorithms
- Support for various merge/broadcast communication patterns
- Topological analysis
- Three phases (local compute/communication/feature-based statistics)
- Low/no FLOPs, highly branching code
- Compute complexity is data dependent
- Communication load is data dependent
- Requires gather/scatter of data
- Visualization
- Two phases (local compute/image compositing)
- Moderate FLOPs
- Compute complexity is data dependent
- Communication load is data dependent
- Requires gather
- Statistics (sketched after this list)
- Two phases (local compute/aggregation)
- Compute is all FLOPs
- Communication load is constant and small
- Requires gather, optional scatter of data
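As one concrete instance, the two-phase statistics pattern can be sketched with mpi4py; the particular moments computed here are an assumption for illustration (run under mpiexec):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Phase 1: local compute -- all FLOPs, a pass over this rank's field slab
# accumulating partial sums for count, mean, and second moment.
local = np.random.rand(1_000_000)  # stand-in for this rank's field data
partial = np.array([local.size, local.sum(), (local ** 2).sum()])

# Phase 2: aggregation -- the communication load is constant and small
# (three doubles per rank), independent of the field size.
total = np.zeros_like(partial)
comm.Reduce(partial, total, op=MPI.SUM, root=0)

if comm.rank == 0:
    n, s, s2 = total
    mean = s / n
    var = s2 / n - mean ** 2
    print(f"global mean {mean:.4f}, variance {var:.4f}")
```

The fixed-size reduce is what distinguishes this proxy from the topology and visualization proxies, whose communication loads depend on the data.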