SLEEC: Difference between revisions
From Modelado Foundation
imported>Cdenny No edit summary |
No edit summary |
||
(8 intermediate revisions by 2 users not shown) | |||
Line 4: | Line 4: | ||
| imagecaption = | | imagecaption = | ||
| team-members = [http://www.purdue.edu/ Purdue U.], [http://www.sandia.gov/ SNL] | | team-members = [http://www.purdue.edu/ Purdue U.], [http://www.sandia.gov/ SNL] | ||
| pi = Milind Kulkarni | | pi = [[Milind Kulkarni]] | ||
| co-pi = Arun Prakash (Purdue), Michael Parks (SNL) | | co-pi = Arun Prakash (Purdue), Michael Parks (SNL) | ||
| website = https://engineering.purdue.edu/SLEEC | | website = https://engineering.purdue.edu/SLEEC | ||
Line 77: | Line 77: | ||
* Needed: ''a generic infrastructure for incorporating domain knowledge'' | * Needed: ''a generic infrastructure for incorporating domain knowledge'' | ||
== Project Impact == | |||
* [https://xstackwiki.modelado.org/images/b/b9/SLEEC-Highlight_summary.pdf SLEEC Project Impact] | |||
== Principles == | == Principles == | ||
Line 209: | Line 212: | ||
* Generic compiler infrastructure | * Generic compiler infrastructure | ||
* “Showcase” annotated libraries | * “Showcase” annotated libraries | ||
== Products == | |||
=== Software Releases === | |||
* '''SemCache''' Will provide annotated, concrete, domain-specific libraries that use SLEEC technology to automatically manage communication between CPU and GPU for codes using Trilinos/Kokkos. Integration in progress. | |||
=== Presentations/Papers === | |||
* "Exploiting Domain Knowledge to Optimize Parallel Computational Mechanics Codes." ICS 2013 [https://engineering.purdue.edu/~milind/docs/ics13b.pdf (PDF)] | |||
* "SemCache: Semantics-aware Caching for Efficient GPU Offloading." ICS 2013. [https://engineering.purdue.edu/~milind/docs/ics13a.pdf (PDF)] | |||
* SLEEC 2014 Progress Presentation [https://engineering.purdue.edu/SLEEC/SLEEC-2014.pdf (PDF)] | |||
* "SemCache++: Semantics-aware Caching for Efficient Multi-GPU Offloading." ICS 2015. [https://engineering.purdue.edu/~milind/docs/ics15.pdf (PDF)] | |||
* "Exploiting Domain Knowledge to Optimize Mesh Partitioning for Multiscale Methods." Supercomputing 2015 (Poster). [https://engineering.purdue.edu/~milind/docs/sc15.pdf (PDF)] | |||
* "Optimizing the LULESH Stencil Code using Concurrent Collections." WolfHPC 2015. [https://engineering.purdue.edu/~milind/docs/wolfhpc15.pdf (PDF)] |
Latest revision as of 05:03, July 10, 2023
SLEEC | |
---|---|
Team Members | Purdue U., SNL |
PI | Milind Kulkarni |
Co-PIs | Arun Prakash (Purdue), Michael Parks (SNL) |
Website | https://engineering.purdue.edu/SLEEC |
Download | {{{download}}} |
Semantics-rich Libraries for Effective Exascale Computation or SLEEC
Team Members
- Purdue University: Milind Kulkarni, Arun Prakash, Vijay Pai, Sam Midkiff
- Sandia National Laboratory: Michael Parks
Motivations
- Modern computational science applications composed of many different libraries
- Computational libraries, communication libraries, data structure libraries, etc.
- Peridigm, developed by co-PI Mike Parks, builds on 10 different Trilinos libraries
- Each library has its own idioms and expected usage
- Determining right way to compose and use libraries to solve a problem is difficult
Compositional complexity
- Consider loosely-coupled multi-scale computational mechanics problem (developed by co-PI Arun Prakash)
- Must determine right way to decompose problem, couple separate solutions, etc.
- Simple case: fixed number of subdomains, only consider how to couple them together
- Vast space of configurations: 8 subdomains → 135K possible schedules
- Large variation in performance of different orders
- Exploration of different variants requires knowledge of domain semantics, cost estimates
Difficult interaction between libraries
- Peridigm: computational peridynamics code
- Allows modeling of materials under stress without explicit accounting for discontinuities (fractures, etc.)
- Built on Trilinos components
- Set of computation and communication libraries
- Requires careful coordination of data movement operations to manage shadow data, etc. needed by solvers
- But data movement requirements can be directly inferred from which equations are being solved
Prior Results
- Exploiting library semantics to improve lock placement
- Exploiting library semantics to improve parallelism and locality
Why not compilers
- Compilers do not understand library calls as abstractions
- Option one: see them as black boxes which give no information → no opportunity for optimization
- Option two: break abstraction boundaries and try to optimize → many transformation opportunities are only possible by understanding semantics of abstractions
- Needed: a way for compilers to understand abstractions
- Broadway project attempted this, but focused on analyzing across abstractions, not semantics-driven transformations
Why not domain-specific languages?
- DSLs are a great fit for this
- Bake abstractions into the language
- Optimize code at high level of abstraction based on semantic properties
- Shown to be effective in various domains
- SPL/Spiral for digital signal processing, Tensor contraction engine, etc.
- But they are not generalizable
- New domain? New DSL!
- What about applications that span domains? (e.g., multiphysics codes)
- Needed: a generic infrastructure for incorporating domain knowledge
Project Impact
Principles
- Abstractions carried by domain libraries
- Domain experts encode semantics, not compiler writers
- Need effective annotation language for capturing semantics
- Compiler should be domain agnostic
- Same infrastructure used for optimization and transformation regardless of domain
- Need common IR for capturing abstractions
- Compiler should be able to optimize for various objectives
- Do not want to focus solely on performance
- Need generic optimization ability and cost models
Components
- Annotation language for capturing semantic properties of domain libraries
- High-level intermediate representation to represent programs that use annotated domain libraries
- Transformation strategies that leverage annotations to perform semantics-driven code transformations
- Optimization heuristics that use domain-specific cost models to find more efficient program variants
- Iterative refinement techniques that let the compiler work with incomplete information and infer missing information when possible
Example
- Consider annotated linear algebra library that supports two methods
- Matrix multiply
- Equation solve
- Operations have mathematical properties that establish equivalence
- Can solve ABx = b in two ways:
- C = AB followed by Cx = b
- Az = b followed by Bx = z
- Latter may be more effective if A & B have special properties (e.g., triangular)
- Can solve ABx = b in two ways:
Abstract
- Abstract into high level representation
- Expression tree to capture flow of data
- Library methods represented as high level operations
- Operands can be subtrees, too, to support composition
Transform
- Transformations expressed as rewrite rules on expression trees
- Rewrites match operation types (domain specific) but compiler applies them without understanding domain semantics
Concretize
- Re-materialize back to source code, or transform to other, lower-level IR
Annotation Language
- Domain libraries annotated by domain experts to interface with compiler infrastructure
- Questions
- How to abstract libraries into IR
- What kinds of transformations are legal
- Represent as rewrite rules
- How to verify? Can we synthesize?
- How to concretize
- Can this be inferred?
Cost models
- Most annotations deal with library interface
- Semantic properties are associated with library specification, not implementation
- Can also provide cost estimates for library methods
- Implementation and architecture specific
- Can express other properties of implementation
- Energy estimates
- Accuracy information
Compiler Infrastructure
- Compiler does not explicitly understand domains
- But is extensible, allowing IR to be extended as new domains are added
- Transformations are just pattern-matched rewrite rules
- Can use domain-specific information such as domain-specific equivalences, domain-specific properties
- Can also substitute equivalent implementations of same method
- Generic compiler + annotated domain library = domain-specific compiler
Cost-drive optimization
- Applying transformations to program generates semantically equivalent program variants
- No “best” variant: different implementations will work better in different situations or optimize for different metrics
- Compilation as optimization problem
- Minimize objective function
- FLOPs, energy efficiency, etc.
- Subject to constraints
- Semantically equivalent to original program, meets accuracy constraints, etc.
- Minimize objective function
- Same infrastructure can be used to optimize for a variety of metrics
Iterative Refinement
- Typical problem with domain-specific languages or annotation approaches: what if program is incompletely annotated?
- Want compiler to still produce useful results
- Key property: compilation process is about optimization, not correctness
- Lack of information does not raise correctness issues
- As more annotations are provided, compilation results improve
Inference
- Can we infer missing information?
- Transformation annotations
- Can we use synthesis techniques to infer legal transformations?
- Cost models
- Can we use machine learning techniques to build cost models automatically?
Potential Impacts
- Programmability: Programmers can focus on developing methods, using high level libraries, without worrying about careful optimization
- Performance portability: Ability to select between library variants automatically eases transition to new architectures
- Scalability: Cost models can incorporate parallelism, locality, communication to enhance scalability
- Energy efficiency: Parameterized compilation can optimize for energy use instead of performance without rewriting infrastructure
- Resilience: Cost models can incorporate resilience information (e.g., algorithmic fault tolerance information), compilation can choose variants based on resilience properties
Implementation Plan
- Work driven by “challenge” applications and domains
- Computational mechanics and multiscale techniques (lead: Arun Prakash)
- Peridynamics and Trilinos libraries (lead: Michael Parks)
- Build compiler infrastructure in ROSE
- Compiler infrastructure and optimization strategies (leads: Milind Kulkarni and Sam Midkiff)
- Annotation language and IR (leads: Milind Kulkarni and Sam Midkiff)
- Cost models and performance modeling (lead: Vijay Pai)
Concrete Deliverables
- Annotation language
- Common IR
- Generic compiler infrastructure
- “Showcase” annotated libraries
Products
Software Releases
- SemCache Will provide annotated, concrete, domain-specific libraries that use SLEEC technology to automatically manage communication between CPU and GPU for codes using Trilinos/Kokkos. Integration in progress.
Presentations/Papers
- "Exploiting Domain Knowledge to Optimize Parallel Computational Mechanics Codes." ICS 2013 (PDF)
- "SemCache: Semantics-aware Caching for Efficient GPU Offloading." ICS 2013. (PDF)
- SLEEC 2014 Progress Presentation (PDF)
- "SemCache++: Semantics-aware Caching for Efficient Multi-GPU Offloading." ICS 2015. (PDF)
- "Exploiting Domain Knowledge to Optimize Mesh Partitioning for Multiscale Methods." Supercomputing 2015 (Poster). (PDF)
- "Optimizing the LULESH Stencil Code using Concurrent Collections." WolfHPC 2015. (PDF)