<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://modelado.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pinfold</id>
	<title>Modelado Foundation - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://modelado.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Pinfold"/>
	<link rel="alternate" type="text/html" href="https://modelado.org/Special:Contributions/Pinfold"/>
	<updated>2026-04-08T01:21:17Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>https://modelado.org/index.php?title=Main_Page&amp;diff=5435</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Main_Page&amp;diff=5435"/>
		<updated>2025-06-09T00:14:03Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}&lt;br /&gt;
====Empowering Collaboration for Advanced Computing and Smart City Technology====&lt;br /&gt;
The Modelado Foundation, also known as OpenCommons, is a 501c6 non-profit organization that serves as a neutral platform for companies, universities, and government agencies to collaborate on advanced, extreme-scale, and distributed computing. With a rich history rooted in fostering innovation and collaboration, the foundation has become a trusted facilitator for cutting-edge initiatives in the technology sector.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&#039;text-align: center;&#039;&amp;gt;&lt;br /&gt;
{{#ask:&lt;br /&gt;
 [[Category:Simimage]]&lt;br /&gt;
 |order=random&lt;br /&gt;
 |?=#&lt;br /&gt;
 |?Has image#=2&lt;br /&gt;
 |?Has author#=3&lt;br /&gt;
 |?Has description#=4&lt;br /&gt;
&lt;br /&gt;
 |format=slideshow&lt;br /&gt;
 |template=Single image&lt;br /&gt;
 |nav controls=no&lt;br /&gt;
 |delay=5&lt;br /&gt;
 |height=800px&lt;br /&gt;
 |width=1170px&lt;br /&gt;
 |effect=none&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&amp;lt;gallery mode=&amp;quot;slideshow&amp;quot;&amp;gt;&lt;br /&gt;
Image:NS binary merger simulation 192.tif|&#039;&#039;[[wikipedia:Neutron_star|Simulation of a pair of neutron stars colliding]]&#039;&#039; (NASA/AEI/ZIB/M. Koppitz and L. Rezzolla)&lt;br /&gt;
Image:M15-162b-EarthAtmosphere-CarbonDioxide-FutureRoleInGlobalWarming-Simulation-20151109.jpg|&#039;&#039;[[wikipedia:Global Warming|NASA is advancing new tools like the supercomputer model that created this simulation of carbon dioxide in the atmosphere]]&#039;&#039; (NASA/GSFC)&lt;br /&gt;
Image:Science at Exascale- Simulating Small Modular Reactor Operations - 49749891898.png|&#039;&#039;[[wikipedia:Small Modular Reactor|Coupled Monte Carlo Neutronics and Fluid Flow Simulation of Small Modular Reactor]]&#039;&#039; (OLCF)&lt;br /&gt;
Image:Titan propels GE beyond the limits of gas turbine testing (27376088085).jpg|&#039;&#039;[[wikipedia:Gas turbine|Simulation of combustion within two adjacent gas turbine combustors]]&#039;&#039; (Oak Ridge National Laboratory)&lt;br /&gt;
Image:Binary Black Hole.jpg|&#039;&#039;[[wikipedia:Binary_black_hole|This computer generated image shows the warped view of a pair of supermassive black holes orbiting each other]]&#039;&#039; (Jeremy Schnittman)&lt;br /&gt;
Image:When Black Holes Collide.jpg|&#039;&#039;[[wikipedia:Black Holes|Black Holes]]&#039;&#039;(NASA Blueshift)&lt;br /&gt;
Image:U.S. Department of Energy - Science - 477 014 010 (9954329085).jpg|&#039;&#039;[[wikipedia:National Ignition Facility|Computer Simulation of isodensity surfaces of a NIF ignition capsule bounding shell]]&#039;&#039; (U.S. Department of Energy)&lt;br /&gt;
&amp;lt;/gallery&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{#ask: &lt;br /&gt;
 [[Category:Simimage]]&lt;br /&gt;
 |sort=Has name&lt;br /&gt;
 |order=random&lt;br /&gt;
 |limit=1&lt;br /&gt;
 |?=#&lt;br /&gt;
 |?Has image#=2&lt;br /&gt;
 |?Has author#=3&lt;br /&gt;
 |format=plainlist&lt;br /&gt;
 |named args=yes&lt;br /&gt;
 |template=Single image&lt;br /&gt;
 |searchlabel=&lt;br /&gt;
}}--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{#seo:&lt;br /&gt;
|title=Modelado Foundation – Empowering Collaborative Development&lt;br /&gt;
|description=Meet the growing need for collaboration and knowledge sharing in the fields of advanced and distributed computing.&lt;br /&gt;
|keywords=Modelado, HPC, Supercomputing&lt;br /&gt;
|image=https://cullyclc.opencommons.org/images/2/2c/Cully_Logo.png&lt;br /&gt;
|image_alt=https://modelado.org/File:modeladoZ.jpg&lt;br /&gt;
|type=website&lt;br /&gt;
|site_name=Modelado Foundation&lt;br /&gt;
|og:type=website&lt;br /&gt;
|og:locale=en_US&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Sandbox&amp;diff=5434</id>
		<title>Sandbox</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Sandbox&amp;diff=5434"/>
		<updated>2025-06-08T22:36:24Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&#039;text-align: center;&#039;&amp;gt;&lt;br /&gt;
{{#ask:&lt;br /&gt;
 [[Category:Simimage]]&lt;br /&gt;
 |order=random&lt;br /&gt;
 |?=#&lt;br /&gt;
 |?Has image#=2&lt;br /&gt;
 |?Has author#=3&lt;br /&gt;
 |?Has description#=4&lt;br /&gt;
&lt;br /&gt;
 |format=slideshow&lt;br /&gt;
 |template=Single image&lt;br /&gt;
 |nav controls=no&lt;br /&gt;
 |delay=5&lt;br /&gt;
 |height=800px&lt;br /&gt;
 |width=1170px&lt;br /&gt;
 |effect=none&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Sandbox&amp;diff=5433</id>
		<title>Sandbox</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Sandbox&amp;diff=5433"/>
		<updated>2025-06-08T22:35:09Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;;Zoom&lt;br /&gt;
:{{#ev:youtube|cIzilZcemzQ}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&#039;text-align: center;&#039;&amp;gt;&lt;br /&gt;
{{#ask:&lt;br /&gt;
 [[Category:Simimage]]&lt;br /&gt;
 |order=random&lt;br /&gt;
 |?=#&lt;br /&gt;
 |?Has image#=2&lt;br /&gt;
 |?Has author#=3&lt;br /&gt;
 |?Has description#=4&lt;br /&gt;
&lt;br /&gt;
 |format=slideshow&lt;br /&gt;
 |template=Single image&lt;br /&gt;
 |nav controls=no&lt;br /&gt;
 |delay=5&lt;br /&gt;
 |height=800px&lt;br /&gt;
 |width=1170px&lt;br /&gt;
 |effect=none&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Sandbox&amp;diff=5432</id>
		<title>Sandbox</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Sandbox&amp;diff=5432"/>
		<updated>2025-06-08T22:34:53Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;;Zoom&lt;br /&gt;
:{{#ev:YouTube|cIzilZcemzQ}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div style=&#039;text-align: center;&#039;&amp;gt;&lt;br /&gt;
{{#ask:&lt;br /&gt;
 [[Category:Simimage]]&lt;br /&gt;
 |order=random&lt;br /&gt;
 |?=#&lt;br /&gt;
 |?Has image#=2&lt;br /&gt;
 |?Has author#=3&lt;br /&gt;
 |?Has description#=4&lt;br /&gt;
&lt;br /&gt;
 |format=slideshow&lt;br /&gt;
 |template=Single image&lt;br /&gt;
 |nav controls=no&lt;br /&gt;
 |delay=5&lt;br /&gt;
 |height=800px&lt;br /&gt;
 |width=1170px&lt;br /&gt;
 |effect=none&lt;br /&gt;
&lt;br /&gt;
}}&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Welcome&amp;diff=5429</id>
		<title>Welcome</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Welcome&amp;diff=5429"/>
		<updated>2023-08-31T05:50:44Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Changed redirect target from Pagename to Main Page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Main_Page]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--== Please choose the following program landing page: ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Extreme Scale Software Stack]]&#039;&#039;&#039; (ESS)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[https://hihat.opencommons.org/Main_Page Hierarchical Heterogeneous Asynchronous Tasking]&#039;&#039;&#039;  (HiHat)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Global City Teams Challenge Super Action Cluster Summit]]&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Smart Citizen Communities]]&#039;&#039;&#039; (SCC)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Reproducible Computational Science]]&#039;&#039;&#039; (RCS)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Dynamic Runtime Community Meeting]]&#039;&#039;&#039; July 25-26, 2017--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Welcome&amp;diff=5428</id>
		<title>Welcome</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Welcome&amp;diff=5428"/>
		<updated>2023-08-31T05:50:04Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Redirected page to Pagename&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[pagename]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--== Please choose the following program landing page: ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Extreme Scale Software Stack]]&#039;&#039;&#039; (ESS)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[https://hihat.opencommons.org/Main_Page Hierarchical Heterogeneous Asynchronous Tasking]&#039;&#039;&#039;  (HiHat)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Global City Teams Challenge Super Action Cluster Summit]]&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Smart Citizen Communities]]&#039;&#039;&#039; (SCC)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Reproducible Computational Science]]&#039;&#039;&#039; (RCS)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Dynamic Runtime Community Meeting]]&#039;&#039;&#039; July 25-26, 2017--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5427</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5427"/>
		<updated>2023-08-31T05:46:35Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|350px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|350px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
[[File:sc.svg|350px|link=https://modelado.org/Reproducible_Computational_Science|Reproducible Computational Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page|The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5426</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5426"/>
		<updated>2023-08-31T05:46:12Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|400px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|400px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
[[File:sc.svg|400px|link=https://modelado.org/Reproducible_Computational_Science|Reproducible Computational Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page|The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5425</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5425"/>
		<updated>2023-08-31T05:45:56Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|300px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|300px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
[[File:sc.svg|300px|link=https://modelado.org/Reproducible_Computational_Science|Reproducible Computational Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page|The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:Sc.svg&amp;diff=5424</id>
		<title>File:Sc.svg</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:Sc.svg&amp;diff=5424"/>
		<updated>2023-08-31T05:45:26Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5423</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5423"/>
		<updated>2023-08-31T05:43:36Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|500px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|500px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
[[File:sc.svg|500px|link=https://modelado.org/Reproducible_Computational_Science|Reproducible Computational Science]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page|The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5422</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5422"/>
		<updated>2023-08-31T05:12:24Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|500px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|500px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page|The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Groups&amp;diff=5421</id>
		<title>Groups</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Groups&amp;diff=5421"/>
		<updated>2023-08-31T05:11:19Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}__NOTOC__&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;There are several groups that do business under the 501c6 structure of the Modelado Foundation. Two are public:&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:OpenCommons.png|500px|link=https://opencommons.org/Main_Page|OpenCommons]]&lt;br /&gt;
[[File:HiHAT Logo.png|500px|link=https://hihat.opencommons.org/Main_Page|HiHAT]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;font-size:150%;&amp;quot;&amp;gt;One is Private&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:TheCirrusGroupC.png|500px|link=https://thecirrusgroup.co/Main_Page The Cirrus Group]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Welcome&amp;diff=5420</id>
		<title>Welcome</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Welcome&amp;diff=5420"/>
		<updated>2023-08-31T05:09:10Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Please choose the following program landing page: ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Extreme Scale Software Stack]]&#039;&#039;&#039; (ESS)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[https://hihat.opencommons.org/Main_Page Hierarchical Heterogeneous Asynchronous Tasking]&#039;&#039;&#039;  (HiHat)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Global City Teams Challenge Super Action Cluster Summit]]&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Smart Citizen Communities]]&#039;&#039;&#039; (SCC)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Reproducible Computational Science]]&#039;&#039;&#039; (RCS)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;[[Dynamic Runtime Community Meeting]]&#039;&#039;&#039; July 25-26, 2017&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Typhoon_Mawar_2005_Computer_Simulation&amp;diff=5419</id>
		<title>Typhoon Mawar 2005 Computer Simulation</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Typhoon_Mawar_2005_Computer_Simulation&amp;diff=5419"/>
		<updated>2023-07-28T18:55:18Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Simimage |description=A 48-hour computer simulation of Typhoon Mawar using the Weather Research and Forecasting model |location=wikipedia:Small Modular Reactor |image=Typhoo...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Simimage&lt;br /&gt;
|description=A 48-hour computer simulation of Typhoon Mawar using the Weather Research and Forecasting model&lt;br /&gt;
|location=wikipedia:Small Modular Reactor&lt;br /&gt;
|image=Typhoon_Mawar_2005_computer_simulation_thumbnail.gif&lt;br /&gt;
|author=High Performance Computer Modernization Program&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Binary_Black_Hole&amp;diff=5418</id>
		<title>Binary Black Hole</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Binary_Black_Hole&amp;diff=5418"/>
		<updated>2023-07-28T18:46:37Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Simimage&lt;br /&gt;
|description=This computer generated image shows the warped view of a pair of supermassive black holes orbiting each other&lt;br /&gt;
|location=wikipedia:Binary_black_hole&lt;br /&gt;
|image=Binary_Black_Hole.jpg&lt;br /&gt;
|author=Jeremy Schnittman&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Ionring_Blackhole&amp;diff=5417</id>
		<title>Ionring Blackhole</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Ionring_Blackhole&amp;diff=5417"/>
		<updated>2023-07-28T18:44:14Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Simimage&lt;br /&gt;
|description=Simulation of a side view of a black hole with a transparent toroidal ring of ionized matter, according to a proposed model for Sgr A*. This image shows the result of bending of light from behind the black hole, and it also shows the asymmetry arising from the Doppler effect due to the extremely high orbital speed of the matter in the ring.&lt;br /&gt;
|location=wikipedia:Supermassive_black_hole&lt;br /&gt;
|image=IonringBlackhole.jpeg&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Ionring_Blackhole&amp;diff=5416</id>
		<title>Ionring Blackhole</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Ionring_Blackhole&amp;diff=5416"/>
		<updated>2023-07-28T18:40:45Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Simimage |description=Simulation of a side view of a black hole with transparent toroidal ring of ionized matter |image=IonringBlackhole.jpeg }} Simulation of a side view of...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Simimage&lt;br /&gt;
|description=Simulation of a side view of a black hole with transparent toroidal ring of ionized matter&lt;br /&gt;
|image=IonringBlackhole.jpeg&lt;br /&gt;
}}&lt;br /&gt;
Simulation of a side view of a black hole with a transparent toroidal ring of ionized matter, according to a proposed model for Sgr A*. This image shows the result of bending of light from behind the black hole, and it also shows the asymmetry arising from the Doppler effect due to the extremely high orbital speed of the matter in the ring.&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Team&amp;diff=5415</id>
		<title>Team</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Team&amp;diff=5415"/>
		<updated>2023-07-24T15:08:28Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOEDITSECTION__&lt;br /&gt;
{{DISPLAYTITLE:&amp;lt;span style=&amp;quot;position: absolute; clip: rect(1px 1px 1px 1px); clip: rect(1px, 1px, 1px, 1px);&amp;quot;&amp;gt;{{FULLPAGENAME}}&amp;lt;/span&amp;gt;}}&lt;br /&gt;
{|style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[About|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;About&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Team|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Team&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Groups|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;Groups&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[Extreme Scale Software Stack|&amp;lt;div class=&amp;quot;navButton&amp;quot;&amp;gt;ESS Content&amp;lt;/div&amp;gt;]]&lt;br /&gt;
| style=&amp;quot;width: 20%&amp;quot;|[[File:modeladoZ.jpg|100px|link=https://modelado.org|center]]&lt;br /&gt;
|}&lt;br /&gt;
=Wilfred Pinfold, President=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|style=&amp;quot;width:500px&amp;quot;|[[File:WilfredPinfold864.png|Wilfred Pinfold|350px|left]]&lt;br /&gt;
|Dr. Pinfold has a passion for emerging technologies enabled by leading-edge research in computational and data science, which has allowed him to contribute to major advancements across a wide range of industries, including healthcare, environment, engineering, energy, packaged goods, and entertainment.&lt;br /&gt;
&lt;br /&gt;
His background includes experience delivering advanced computing platforms, from small embedded technologies to large data centers, and combining them into Cyber-Physical Systems. In a 23-year career at Intel he built experience in engineering, research, strategy, business planning, account management, and marketing. He has held academic positions in schools of engineering and business in both the US and UK, has authored numerous technical reports and papers, and has participated in project and thesis reviews up to and including the PhD level.&lt;br /&gt;
&lt;br /&gt;
He is a qualified Naval Architect and Structural Engineer, the Portland Mayor’s representative on the Technology Oversight Committee (TOC), and a respected, highly sought-after spokesperson and contributor to the Smart City community.&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Eli Lamb, Chief Financial Officer=&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|style=&amp;quot;width:500px&amp;quot;|[[File:Eli Lamb.jpg|Eli Lamb|350px|left]]&lt;br /&gt;
|Eli Lamb comes to Modelado and OpenCommons after multiple careers in system software (Bell Labs, UNIX Europe, Sun Microsystems, and Intel), renewable energy startups (Ridgeline, wind; Hydrovolts, hydro), and project management (State of Oregon). In between, he co-founded the Portland chapter of Social Venture Partners (a non-profit practicing venture philanthropy) and joined the board of Green Empowerment (a non-profit dedicated to providing clean energy (hydro, solar) and water to remote villages in developing countries).&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Template:PITable&amp;diff=5414</id>
		<title>Template:PITable</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Template:PITable&amp;diff=5414"/>
		<updated>2023-07-10T05:08:21Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;lt;span id=&amp;quot;1&amp;quot;&amp;gt; &#039;&#039;&#039;PI&#039;&#039;&#039; &amp;lt;/span&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| XPRESS  || [[Ron Brightwell | Ron Brightwell]]&lt;br /&gt;
|-&lt;br /&gt;
| TG || [[Shekhar Borkar | Shekhar Borkar]]	&lt;br /&gt;
|-&lt;br /&gt;
| DEGAS || [[Katherine Yelick | Katherine Yelick]]&lt;br /&gt;
|-&lt;br /&gt;
| D-TEC || [[Dan Quinlan | Daniel Quinlan]]&lt;br /&gt;
|-&lt;br /&gt;
| DynAX || [[Guang Gao]]&lt;br /&gt;
|-&lt;br /&gt;
| X-TUNE || [[Mary Hall | Mary Hall]]&lt;br /&gt;
|-&lt;br /&gt;
| GVR || [[Andrew Chien | Andrew Chien]]&lt;br /&gt;
|-&lt;br /&gt;
| CORVETTE || [[Koushik Sen]]&lt;br /&gt;
|-&lt;br /&gt;
| SLEEC || [[Milind Kulkarni | Milind Kulkarni]]&lt;br /&gt;
|-&lt;br /&gt;
| PIPER || [[ media:2014-05-xstack-resiliencepanel-ms.pdf  | Martin Schulz ]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Template:PITable&amp;diff=5413</id>
		<title>Template:PITable</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Template:PITable&amp;diff=5413"/>
		<updated>2023-07-10T05:07:44Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
| colspan=&amp;quot;2&amp;quot; | &amp;lt;span id=&amp;quot;1&amp;quot;&amp;gt; &#039;&#039;&#039;PI&#039;&#039;&#039; &amp;lt;/span&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| XPRESS  || [[Ron Brightwell | Ron Brightwell]]&lt;br /&gt;
|-&lt;br /&gt;
| TG || [[Shekhar Borkar | Shekhar Borkar]]	&lt;br /&gt;
|-&lt;br /&gt;
| DEGAS || [[Katherine Yelic | Katherine Yelick]]&lt;br /&gt;
|-&lt;br /&gt;
| D-TEC || [[Dan Quinlan | Daniel Quinlan]]&lt;br /&gt;
|-&lt;br /&gt;
| DynAX || [[Guang Gao]]&lt;br /&gt;
|-&lt;br /&gt;
| X-TUNE || [[Mhall | Mary Hall]]&lt;br /&gt;
|-&lt;br /&gt;
| GVR || [[Andrew Chien | Andrew Chien]]&lt;br /&gt;
|-&lt;br /&gt;
| CORVETTE || [[Koushik Sen]]&lt;br /&gt;
|-&lt;br /&gt;
| SLEEC || [[Milind Kulkarni | Milind Kulkarni]]&lt;br /&gt;
|-&lt;br /&gt;
| PIPER || [[ media:2014-05-xstack-resiliencepanel-ms.pdf  | Martin Schulz ]]&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=X-TUNE&amp;diff=5412</id>
		<title>X-TUNE</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=X-TUNE&amp;diff=5412"/>
		<updated>2023-07-10T05:04:05Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = X-TUNE&lt;br /&gt;
| image = [[File:XTUNE-logos.png|400px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.utah.edu/ U. of Utah], [http://www.anl.gov/ ANL], [http://www.lbl.gov/ LBNL], [http://www.isi.edu/ USC/ISI]&lt;br /&gt;
| pi = [[Mary Hall]]&lt;br /&gt;
| co-pi = Paul Hovland (ANL), Samuel Williams (LBNL), Jacqueline Chame (USC/ISI)&lt;br /&gt;
| website = [http://ctop.cs.utah.edu/x-tune/ http://ctop.cs.utah.edu/x-tune/]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Autotuning for Exascale&#039;&#039;&#039; or &#039;&#039;&#039;X-TUNE&#039;&#039;&#039; - &#039;&#039;Self-Tuning Software to manage Heterogeneity&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.utah.edu/ University of Utah]: Mary Hall (Lead PI)&lt;br /&gt;
* [http://www.anl.gov/ Argonne National Laboratory (ANL)]: Paul Hovland, Stefan Wild, Krishna Narayanan, Jeff Hammond&lt;br /&gt;
* [http://www.lbl.gov/ Lawrence Berkeley National Laboratory (LBNL)]: Sam Williams, Lenny Oliker, Brian van Straalen&lt;br /&gt;
* [http://www.isi.edu/ University of Southern California: Information Science Institute (USC/ISI)]: Jacqueline Chame&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== What is Autotuning? ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Definition:&#039;&#039;&#039;&lt;br /&gt;
* Automatically generate a “search space” of possible implementations of a computation&lt;br /&gt;
** A &#039;&#039;code variant&#039;&#039; represents a unique implementation of a computation, among many&lt;br /&gt;
** A &#039;&#039;parameter&#039;&#039; represents a discrete set of values that govern code generation or execution of a variant&lt;br /&gt;
* Measure execution time and compare&lt;br /&gt;
* Select the best-performing implementation (for exascale, tradeoff between performance/energy/reliability)&lt;br /&gt;
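The generate/measure/select loop above can be sketched as a minimal autotuner. This is an illustrative Python sketch, not X-TUNE code: the two variants, the `block` parameter, and the timing harness are all invented to show the shape of the search.

```python
import itertools
import timeit

# Two illustrative code variants of the same computation (sum of squares),
# plus a tunable "block" parameter that governs execution of one variant.
def variant_loop(data, block):
    total = 0.0
    for i in range(0, len(data), block):
        for x in data[i:i + block]:
            total += x * x
    return total

def variant_builtin(data, block):  # ignores the parameter
    return sum(x * x for x in data)

def autotune(variants, params, data):
    """Exhaustively time every (variant, parameter) point and keep the best."""
    best = None
    for fn, p in itertools.product(variants, params):
        t = timeit.timeit(lambda: fn(data, p), number=5)
        if best is None or t < best[0]:
            best = (t, fn, p)
    return best

data = [float(i) for i in range(2000)]
t, fn, p = autotune([variant_loop, variant_builtin], [64, 256, 1024], data)
print(fn.__name__, p)
```

A real system would also prune this exhaustive product, which is exactly the "Key Issues" listed below it in the page.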
&lt;br /&gt;
&#039;&#039;&#039;Key Issues:&#039;&#039;&#039;&lt;br /&gt;
* Identifying the search space&lt;br /&gt;
* Pruning the search space to manage costs&lt;br /&gt;
* Off-line vs. on-line search&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Three Types of Autotuning Systems&#039;&#039;&#039;&lt;br /&gt;
* Autotuning libraries&lt;br /&gt;
** Library that encapsulates knowledge of its performance under different execution environments&lt;br /&gt;
** Dense linear algebra: &#039;&#039;ATLAS, PhiPAC&#039;&#039;&lt;br /&gt;
** Sparse linear algebra: &#039;&#039;OSKI&#039;&#039;&lt;br /&gt;
** Signal processing: &#039;&#039;SPIRAL, FFTW&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Application-specific autotuning&lt;br /&gt;
** &#039;&#039;Active Harmony&#039;&#039; provides parallel rank order search for tunable parameters and variants&lt;br /&gt;
** &#039;&#039;Sequoia&#039;&#039; and &#039;&#039;PetaBricks&#039;&#039; provide language mechanism for expressing tunable parameters and variants&lt;br /&gt;
&lt;br /&gt;
* Compiler-based autotuning&lt;br /&gt;
** Other examples: Saday et al., Swany et al., Eigenmann et al.&lt;br /&gt;
** Related concepts: iterative compilation, learning-based compilation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== X-TUNE Goals ==&lt;br /&gt;
A unified autotuning framework that seamlessly integrates programmer-directed and compiler-directed autotuning&lt;br /&gt;
* Expert programmer and compiler work collaboratively to tune a code&lt;br /&gt;
** Unlike previous systems that place the burden on either programmer or compiler&lt;br /&gt;
** Provides access to compiler optimizations, offering expert programmers the control over optimization they so often desire&lt;br /&gt;
&lt;br /&gt;
* Design autotuning to be encapsulated in domain-specific tools&lt;br /&gt;
** Enables less-sophisticated users of the software to reap the benefit of the expert programmers’ efforts&lt;br /&gt;
&lt;br /&gt;
* Focus on Adaptive Mesh Refinement Multigrid (Combustion Co-Design Center, BoxLib, Chombo) and tensor contractions (TCE)&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/d/de/Xtune-impact_summary.pdf X-TUNE Project Impact]&lt;br /&gt;
&lt;br /&gt;
== X-TUNE Structure ==&lt;br /&gt;
[[File:XTUNE-Structure.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Autotuning Language Extensions ==&lt;br /&gt;
* Tunable Variables&lt;br /&gt;
** An annotation on the type of a variable (as in Sequoia)&lt;br /&gt;
** Additionally, specify range, constraints and a default value&lt;br /&gt;
&lt;br /&gt;
* Computation Variants&lt;br /&gt;
** An annotation on the type of a function (as in PetaBricks)&lt;br /&gt;
** Additionally, specify (partial) selection criteria&lt;br /&gt;
** Multiple variants may be composed in the same execution&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Separate mapping description captures architecture-specific aspects of autotuning.&#039;&#039;&lt;br /&gt;
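The two proposed annotation kinds can be emulated in plain Python to make them concrete. This is a hypothetical sketch, not the actual extension syntax: `Tunable`, `variant`, and all function names here are invented; only the concepts (a tunable with range/constraint/default as in Sequoia, composable variants with selection criteria as in PetaBricks) come from the text.

```python
# A tunable variable carries its range, a constraint, and a default value.
class Tunable:
    def __init__(self, lo, hi, default, constraint=lambda v: True):
        self.space = [v for v in range(lo, hi + 1) if constraint(v)]
        self.default = default

# tile must be a multiple of 8, defaulting to 32
TILE = Tunable(8, 64, default=32, constraint=lambda v: v % 8 == 0)

# Computation variants: annotated functions plus a (partial) selection
# predicate saying when each variant is eligible.
variants = {}
def variant(name, selector):
    def register(fn):
        variants[name] = (selector, fn)
        return fn
    return register

@variant("small", selector=lambda n: n <= 256)
def dot_small(a, b):
    return sum(x * y for x, y in zip(a, b))

@variant("blocked", selector=lambda n: n > 256)
def dot_blocked(a, b, tile=TILE.default):
    total = 0.0
    for i in range(0, len(a), tile):
        total += sum(x * y for x, y in zip(a[i:i + tile], b[i:i + tile]))
    return total

def dispatch(a, b):
    """Select the first variant whose criterion admits this input."""
    n = len(a)
    for selector, fn in variants.values():
        if selector(n):
            return fn(a, b)
```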
&lt;br /&gt;
&lt;br /&gt;
== Compiler-Based Autotuning ==&lt;br /&gt;
* Foundational Concepts&lt;br /&gt;
** Identify search space through a high-level description that captures a large space of possible implementations&lt;br /&gt;
** Prune space through compiler domain knowledge and architecture features&lt;br /&gt;
** Provide programmers access through &#039;&#039;transformation recipes&#039;&#039;, or recipes generated automatically by a compiler decision algorithm&lt;br /&gt;
** Use source-to-source transformation for portability, and to leverage vendor code generation&lt;br /&gt;
** Requires &#039;&#039;restructuring of the compiler&#039;&#039;&lt;br /&gt;
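A transformation recipe can be made concrete with a toy model, in the spirit of (but not using) CHiLL's actual recipe language: the loop nest is a small data structure, each recipe step rewrites it, and the rewritten nest would then drive code generation. The `tile` step and the recipe syntax below are invented for illustration.

```python
# A loop nest is a list of (loop variable, extent) pairs, outermost first.
def tile(nest, loop, size):
    """Split one loop of the nest into a (tile, intra-tile) pair."""
    out = []
    for var, extent in nest:
        if var == loop:
            out.append((var + "_tile", (extent + size - 1) // size))  # tile loop
            out.append((var, size))                                   # intra-tile loop
        else:
            out.append((var, extent))
    return out

def apply_recipe(nest, recipe):
    """A recipe is an ordered list of (transformation, args) steps."""
    steps = {"tile": tile}
    for name, args in recipe:
        nest = steps[name](nest, *args)
    return nest

nest = [("i", 1024), ("j", 1024)]
recipe = [("tile", ("i", 32)), ("tile", ("j", 64))]
print(apply_recipe(nest, recipe))
# [('i_tile', 32), ('i', 32), ('j_tile', 16), ('j', 64)]
```

Autotuning then amounts to searching over the recipe's numeric arguments (the tile sizes) rather than over raw source code.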
&lt;br /&gt;
&lt;br /&gt;
== CHiLL Implementation ==&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-CHiLL.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Transformation Recipes for Autotuning ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Incorporate the Best Ideas from Manual Tuning&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Transformation-Recipes.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compiler + Autotuning can yield comparable and even better performance than manually-tuned libraries&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Results.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Pbound: Performance Modeling for Autotuning ==&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Pbound.png|right|200px]]&lt;br /&gt;
&lt;br /&gt;
* Performance modeling increases the automation in autotuning&lt;br /&gt;
** Manual transformation recipe generation is tedious and error-prone&lt;br /&gt;
** Implicit models are not portable across platforms&lt;br /&gt;
&lt;br /&gt;
* Models can unify programmer guidance and compiler analysis&lt;br /&gt;
** Programmer can invoke integrated models to guide autotuning from application code&lt;br /&gt;
** Compiler can invoke models during decision algorithms&lt;br /&gt;
&lt;br /&gt;
* Models optimize autotuning search&lt;br /&gt;
** Identify starting points&lt;br /&gt;
** Prune search space to focus on most promising solutions&lt;br /&gt;
** Provide feedback from updates in response to code modifications&lt;br /&gt;
&lt;br /&gt;
=== Reuse Distance and Cache Miss Prediction ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Reuse distance&#039;&#039;&#039;&lt;br /&gt;
* For regular (affine) array references&lt;br /&gt;
** Compute reuse distance, to predict data footprints in memory hierarchy&lt;br /&gt;
** Guides transformation and data placement decisions&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Pbound-Reuse-Distance.png]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Cache miss prediction&#039;&#039;&#039;&lt;br /&gt;
* Use reuse distance to predict misses&lt;br /&gt;
* Assuming a fully associative cache with n lines (the optimistic case), a reference will hit if its reuse distance d &amp;lt; n&lt;br /&gt;
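The reuse-distance definition and the hit rule above can be computed directly on a reference trace. This is an illustrative sketch (a deliberately simple quadratic implementation; production tools use tree-based algorithms):

```python
# Reuse distance: number of DISTINCT lines touched since the last access to
# the same line. Under a fully associative cache with n lines, a reference
# hits iff its reuse distance is < n; first touches are compulsory misses.
def reuse_distances(trace):
    """Return each reference's reuse distance (None on first touch)."""
    out = []
    for i, addr in enumerate(trace):
        try:
            last = max(j for j in range(i) if trace[j] == addr)
        except ValueError:
            out.append(None)                       # cold (compulsory) miss
            continue
        out.append(len(set(trace[last + 1:i])))    # distinct lines in between
    return out

def predicted_hits(trace, n_lines):
    return [d is not None and d < n_lines for d in reuse_distances(trace)]

trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(trace))      # [None, None, None, 2, 2, 0]
print(predicted_hits(trace, 2))    # only the immediate re-reference of b hits
```

For regular affine array references, the same distances can be derived symbolically at compile time, which is what makes this usable for guiding transformations.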
&lt;br /&gt;
[[File:XTUNE-Pbound-Cache-Miss.png]]&lt;br /&gt;
&lt;br /&gt;
=== Application Signatures + Architecture ===&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Pbound-Example.png|600px]]&lt;br /&gt;
&lt;br /&gt;
== How will modeling be used? ==&lt;br /&gt;
* Single-core and multicore models for application performance will combine architectural information, user-guidance, and application analysis&lt;br /&gt;
* Models will be coupled with decision algorithms to automatically generate CHiLL transformation recipes&lt;br /&gt;
** Input: Reuse Information, Loop Information etc.&lt;br /&gt;
** Output: Set of transformation scripts to be used by empirical search&lt;br /&gt;
* Feedback to be used to refine model parameters and behavior&lt;br /&gt;
* Small and large application execution times will be considered&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Example: Stencils and Multigrid&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Stencil performance bound, when bandwidth limited:&lt;br /&gt;
  &#039;&#039;&#039;&#039;&#039;Performance (GFlop/s) &amp;lt;= stencil flops per sweep * STREAM bandwidth / grid size (bytes)&#039;&#039;&#039;&#039;&#039;&lt;br /&gt;
* Multigrid solves Au=f by calculating a number of corrections to an initial solution at varying grid coarsenings (“V-cycle”)&lt;br /&gt;
** Each level in the v-cycle: perform 1-4 relaxes (~stencil sweeps)&lt;br /&gt;
** Repeat multiple v-cycles reducing the norm of the residual by an order of magnitude each cycle&lt;br /&gt;
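Reading "stencil flops" as flops per sweep and "grid size" as the bytes the sweep must move, the bound above is arithmetic intensity times memory bandwidth (the standard bandwidth roofline). Evaluated with illustrative numbers, not measured values:

```python
# Bandwidth bound for a memory-limited stencil sweep, with made-up numbers:
# 8 flops per grid point, one 8-byte read + one 8-byte write per point
# (ideal caching), 100 GB/s of STREAM bandwidth.
flops_per_point = 8.0
bytes_per_point = 16.0      # 8 B read + 8 B write, assuming perfect reuse
stream_bw = 100e9           # bytes / second

# (flops/sweep * bandwidth) / (bytes/sweep) == (flops/byte) * bandwidth
bound = flops_per_point / bytes_per_point * stream_bw
print(bound / 1e9, "GFlop/s")   # 50.0 GFlop/s
```

Shallow V-cycle levels have tiny grids and are latency-limited instead, which is one reason multigrid is a hard autotuning target.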
&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Example.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Multigrid and Adaptive Mesh Refinement&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:XTUNE-Multigrid.jpg|right|200px]]&lt;br /&gt;
* Some regions of the domain may require finer fidelity than others&lt;br /&gt;
* In Adaptive Mesh Refinement, we refine those regions to a higher resolution in time and space&lt;br /&gt;
* Typically, one performs a multigrid “level solve” for one level (green, blue, red) at a time&lt;br /&gt;
* Coarse-fine boundaries (neighboring points can be at different resolutions) complicate the calculation of the RHS and ghost zones for the level&lt;br /&gt;
* Each level is a collection of small (32&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt; or 64&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt;) boxes to minimize unnecessary work&lt;br /&gt;
* These boxes will be distributed across the machine for load balancing (neighbors are not obvious/implicit)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Autotuning for AMR Multigrid ==&lt;br /&gt;
* Focus is addressing data movement, multifaceted:&lt;br /&gt;
** Automate fusion of stencils within an operator. Doing so may entail aggregation of communication (deeper ghost zones)&lt;br /&gt;
** Extend and automate the communication-avoiding techniques developed in CACHE&lt;br /&gt;
** Automate application of data movement-friendly coarse-fine boundary conditions&lt;br /&gt;
** Automate hierarchical parallelism within a node to AMR MG codes&lt;br /&gt;
** Explore alternate data structures&lt;br /&gt;
** Explore alternate stencil algorithms (higher order, …)&lt;br /&gt;
* Proxy architectures: MIC, BG/Q, GPUs&lt;br /&gt;
* Encapsulate into an embedded DSL approach&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary and Leverage ==&lt;br /&gt;
* Build integrated end-to-end autotuning, focused on AMR multigrid and tensor contractions&lt;br /&gt;
** Language and compiler guidance of autotuning&lt;br /&gt;
** Programmer and compiler collaborate to tune a code&lt;br /&gt;
** Modeling assists programmer, compiler writer, and search space pruning&lt;br /&gt;
&lt;br /&gt;
* Leverage and integrate with other X-Stack teams&lt;br /&gt;
** Our compiler technology is all based on ROSE, so it can both leverage capabilities from and provide capabilities to ROSE&lt;br /&gt;
** Domain-specific technology to facilitate encapsulating our autotuning strategies&lt;br /&gt;
** Collaborate with MIT on autotuning interface&lt;br /&gt;
** Common run-time for a variety of platforms (e.g., GPUs and MIC), and supports a large number of potentially hierarchical threads&lt;br /&gt;
&lt;br /&gt;
== Products ==&lt;br /&gt;
* Publications&lt;br /&gt;
** H. Zhang, A. Venkat, P. Basu and M. Hall, &amp;quot;On Combining Polyhedral and AST Transformations,&amp;quot; International Workshop on Polyhedral Compilation Techniques, Jan. 2016.&lt;br /&gt;
** P. Basu,  &amp;quot;Compiler Optimizations and Autotuning for Stencil Computations and Geometric Multigrid,&amp;quot; PhD dissertation, University of Utah, December 2015.&lt;br /&gt;
** T. Nelson, &amp;quot;DSLs and Search for Linear Algebra Performance Optimization,&amp;quot; PhD dissertation, University of Colorado, December 2015.&lt;br /&gt;
** T. Nelson, A. Rivera, M. Hall, P.D. Hovland, E. Jessup and B. Norris, &amp;quot;Generating Efficient Tensor Contractions for GPUs,&amp;quot; International Conference on Parallel Processing, Sept. 2015.&lt;br /&gt;
** P. Basu, S. Williams, B. V. Straalen, M. Hall, L. Oliker, P. Colella, &amp;quot;Compiler-Directed Transformation for Higher-Order Stencils,&amp;quot; International Parallel and Distributed Processing Symposium (IPDPS), 2015.&lt;br /&gt;
** A. Rivera, &amp;quot;Using Autotuning for Accelerating Tensor-Contraction on GPUs,&amp;quot; Master&#039;s thesis, University of Utah, December 2014.&lt;br /&gt;
** P. Basu, S. Williams, B. V. Straalen, L. Oliker, M. Hall, &amp;quot;Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid,&amp;quot; Workshop on Stencil Computations (WOSC), 2014.&lt;br /&gt;
** P. Basu, A. Venkat, M. Hall, S. Williams, B. V. Straalen, L. Oliker, &amp;quot;Compiler generation and autotuning of communication-avoiding operators for geometric multigrid,&amp;quot; 20th International Conference on High Performance Computing (HiPC), 2013.&lt;br /&gt;
** P. Basu, M. Hall, M. Khan, S. Maindola, S. Muralidharan, S. Ramalingam, A. Rivera, M. Shantharam, A. Venkat, &amp;quot;Towards Making Autotuning Mainstream,&amp;quot; International Journal of High Performance Computing Applications, 27(4), November 2013.&lt;br /&gt;
** P. Basu, S. Williams, A. Venkat, B. Van Straalen, M. Hall, and L. Oliker, &amp;quot;Compiler generation and autotuning of communication-avoiding operators for geometric multigrid,&amp;quot; Workshop on Optimizing Stencil Computations (WOSC), 2013.&lt;br /&gt;
&lt;br /&gt;
* Software Releases&lt;br /&gt;
** CHiLL and CUDA-CHiLL provide the autotuning compiler technology, and are available from http://github.com/CtopCsUtahEdu/chill-dev&lt;br /&gt;
** Orio manages navigation of the autotuning search space and is available from http://brnorris03.github.io/Orio&lt;br /&gt;
** Orio-CHiLL: new module in Orio provides integration with CHiLL and CUDA-CHiLL, and is available from http://github.com/brnorris03/Orio/tree/master/orio/module/chill&lt;br /&gt;
** SURF: new search algorithm module in Orio available from http://github.com/brnorris03/Orio/tree/master/orio/main/tuner/search/mlsearch&lt;br /&gt;
** TCR: domain-specific tensor contraction code generation and decision algorithm for GPUs available from http://github.com/axelyamel/tcg-autotuning&lt;br /&gt;
&lt;br /&gt;
* Other Software Products&lt;br /&gt;
** miniGMG application proxy: extended to include high‐order stencil implementations&lt;br /&gt;
** CHiLL and CUDA-CHiLL: new domain-specific transformations incorporated&lt;br /&gt;
** Orio: new search algorithm incorporated and integration with CHiLL and CUDA-CHiLL&lt;br /&gt;
** NWCHEM kernels: representative tensor computations released&lt;br /&gt;
** PBound: new decision algorithm and integration with CHiLL&lt;br /&gt;
** OCTOPI: tensor contraction domain-specific framework&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=SLEEC&amp;diff=5411</id>
		<title>SLEEC</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=SLEEC&amp;diff=5411"/>
		<updated>2023-07-10T05:03:42Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = SLEEC&lt;br /&gt;
| image = [[File:SLEEC-Logos.png|300px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.purdue.edu/ Purdue U.], [http://www.sandia.gov/ SNL]&lt;br /&gt;
| pi = [[Milind Kulkarni]]&lt;br /&gt;
| co-pi = Arun Prakash (Purdue), Michael Parks (SNL)&lt;br /&gt;
| website = https://engineering.purdue.edu/SLEEC&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semantics-rich Libraries for Effective Exascale Computation&#039;&#039;&#039; or &#039;&#039;&#039;SLEEC&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.purdue.edu/ Purdue University]: Milind Kulkarni, Arun Prakash, Vijay Pai, Sam Midkiff&lt;br /&gt;
* [http://www.sandia.gov/ Sandia National Laboratories]: Michael Parks&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Motivations ==&lt;br /&gt;
* Modern computational science applications composed of many different libraries&lt;br /&gt;
* Computational libraries, communication libraries, data structure libraries, etc.&lt;br /&gt;
* Peridigm, developed by co-PI Mike Parks, builds on 10 different Trilinos libraries&lt;br /&gt;
* Each library has its own idioms and expected usage&lt;br /&gt;
* Determining right way to compose and use libraries to solve a problem is difficult&lt;br /&gt;
&lt;br /&gt;
=== Compositional complexity ===&lt;br /&gt;
* Consider loosely-coupled multi-scale computational mechanics problem (developed by co-PI Arun Prakash)&lt;br /&gt;
* Must determine right way to decompose problem, couple separate solutions, etc.&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Compositional-Complexity.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Simple case: fixed number of subdomains, only consider how to couple them together&lt;br /&gt;
* Vast space of configurations: 8 subdomains → 135K possible schedules&lt;br /&gt;
* Large variation in performance of different orders&lt;br /&gt;
* Exploration of different variants requires knowledge of domain semantics, cost estimates&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Runtime-Graph.png|400px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Difficult interaction between libraries ===&lt;br /&gt;
&lt;br /&gt;
* Peridigm: computational peridynamics code&lt;br /&gt;
** Allows modeling of materials under stress without explicit accounting for discontinuities (fractures, etc.)&lt;br /&gt;
* Built on Trilinos components&lt;br /&gt;
** Set of computation and communication libraries&lt;br /&gt;
* Requires careful coordination of data movement operations to manage shadow data, etc. needed by solvers&lt;br /&gt;
** But data movement requirements can be directly inferred from which equations are being solved&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prior Results&#039;&#039;&#039;&lt;br /&gt;
* Exploiting library semantics to improve lock placement&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Prior-Results-1.png|700px]]&lt;br /&gt;
&lt;br /&gt;
* Exploiting library semantics to improve parallelism and locality&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Prior-Results-2.png|700px]]&lt;br /&gt;
&lt;br /&gt;
=== Why not compilers ===&lt;br /&gt;
* Compilers do not understand library calls as abstractions&lt;br /&gt;
** Option one: see them as black boxes which give no information → no opportunity for optimization&lt;br /&gt;
** Option two: break abstraction boundaries and try to optimize → many transformation opportunities are only possible by understanding semantics of abstractions&lt;br /&gt;
&lt;br /&gt;
* Needed: &#039;&#039;a way for compilers to understand abstractions&#039;&#039;&lt;br /&gt;
** The Broadway project attempted this, but focused on analyzing across abstractions, not on semantics-driven transformations&lt;br /&gt;
&lt;br /&gt;
=== Why not domain-specific languages? ===&lt;br /&gt;
* DSLs are a great fit for this&lt;br /&gt;
** Bake abstractions into the language&lt;br /&gt;
** Optimize code at high level of abstraction based on semantic properties&lt;br /&gt;
** Shown to be effective in various domains&lt;br /&gt;
*** SPL/Spiral for digital signal processing, Tensor contraction engine, etc.&lt;br /&gt;
&lt;br /&gt;
* But they are not generalizable&lt;br /&gt;
** New domain? New DSL!&lt;br /&gt;
** What about applications that span domains? (e.g., multiphysics codes)&lt;br /&gt;
&lt;br /&gt;
* Needed: &#039;&#039;a generic infrastructure for incorporating domain knowledge&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/b/b9/SLEEC-Highlight_summary.pdf SLEEC Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Principles ==&lt;br /&gt;
* Abstractions carried by domain libraries&lt;br /&gt;
** Domain experts encode semantics, not compiler writers&lt;br /&gt;
** &#039;&#039;Need effective annotation language for capturing semantics&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Compiler should be domain agnostic&lt;br /&gt;
** Same infrastructure used for optimization and transformation regardless of domain&lt;br /&gt;
** &#039;&#039;Need common IR for capturing abstractions&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Compiler should be able to optimize for various objectives&lt;br /&gt;
** Do not want to focus solely on performance&lt;br /&gt;
** &#039;&#039;Need generic optimization ability and cost models&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Components ==&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Overview.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Annotation language&#039;&#039;&#039; for capturing semantic properties of domain libraries&lt;br /&gt;
* &#039;&#039;&#039;High-level intermediate representation&#039;&#039;&#039; to represent programs that use annotated domain libraries&lt;br /&gt;
* &#039;&#039;&#039;Transformation strategies&#039;&#039;&#039; that leverage annotations to perform semantics-driven code transformations&lt;br /&gt;
* &#039;&#039;&#039;Optimization heuristics&#039;&#039;&#039; that use domain-specific cost models to find more efficient program variants&lt;br /&gt;
* &#039;&#039;&#039;Iterative refinement techniques&#039;&#039;&#039; that let the compiler work with incomplete information and infer missing information when possible&lt;br /&gt;
&lt;br /&gt;
=== Example ===&lt;br /&gt;
* Consider annotated linear algebra library that supports two methods&lt;br /&gt;
** Matrix multiply&lt;br /&gt;
** Equation solve&lt;br /&gt;
&lt;br /&gt;
* Operations have mathematical properties that establish equivalence&lt;br /&gt;
** Can solve ABx = b in two ways:&lt;br /&gt;
*** C = AB followed by Cx = b&lt;br /&gt;
*** Az = b followed by Bx = z&lt;br /&gt;
** Latter may be more effective if A &amp;amp; B have special properties (e.g., triangular)&lt;br /&gt;
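The two solution orders can be checked numerically. This sketch uses exact rationals so the equivalence is exact; the matrices are invented, and are lower triangular to illustrate the case the text mentions, where two triangular solves (O(n^2) each) beat forming C = AB (O(n^3)) first.

```python
from fractions import Fraction as F

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Gaussian elimination without pivoting (fine for these matrices)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):                       # forward elimination
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [F(0)] * n
    for r in range(n - 1, -1, -1):           # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

A = [[F(2), F(0)], [F(1), F(3)]]   # lower triangular
B = [[F(1), F(0)], [F(4), F(5)]]   # lower triangular
b = [F(6), F(7)]

x1 = solve(matmul(A, B), b)        # way 1: C = AB, then Cx = b
x2 = solve(B, solve(A, b))         # way 2: Az = b, then Bx = z
assert x1 == x2                    # same answer either way
```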
&lt;br /&gt;
&#039;&#039;&#039;Program Code&#039;&#039;&#039;&lt;br /&gt;
[[File:SLEEC-Program-Code.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Abstract&#039;&#039;&#039;&lt;br /&gt;
* Abstract into high level representation&lt;br /&gt;
* Expression tree to capture flow of data&lt;br /&gt;
* Library methods represented as high level operations&lt;br /&gt;
* Operands can be subtrees, too, to support composition&lt;br /&gt;
&lt;br /&gt;
[[File:SLEEC-Abstract-Transform.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Transform&#039;&#039;&#039;&lt;br /&gt;
* Transformations expressed as rewrite rules on expression trees&lt;br /&gt;
* Rewrites match operation types (domain specific) but compiler applies them without understanding domain semantics&lt;br /&gt;
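The abstract/transform steps can be sketched as a tiny rewrite engine. In SLEEC's model the rule would come from the library annotation, not the compiler; here everything (the tuple encoding of trees, the rule, the names) is invented for illustration. Note the engine only matches shapes; the domain fact, that solve(mul(A, B), b) equals solve(B, solve(A, b)), lives entirely in the rule.

```python
# Expression trees are nested tuples: (op, operand, ...). Leaves are strings.
def rewrite(tree, rules):
    """Apply rules bottom-up to a fixed point; the engine is domain-agnostic."""
    if not isinstance(tree, tuple):
        return tree
    tree = tuple(rewrite(t, rules) for t in tree)   # rewrite subtrees first
    for rule in rules:
        new = rule(tree)
        if new is not None:
            return rewrite(new, rules)
    return tree

def solve_of_mul(t):
    # pattern: ("solve", ("mul", A, B), b)  ->  ("solve", B, ("solve", A, b))
    if t[0] == "solve" and isinstance(t[1], tuple) and t[1][0] == "mul":
        _, (_, A, B), b = t
        return ("solve", B, ("solve", A, b))
    return None          # pattern did not match

expr = ("solve", ("mul", "A", "B"), "b")
print(rewrite(expr, [solve_of_mul]))
# ('solve', 'B', ('solve', 'A', 'b'))
```

With per-operation cost annotations attached, the compiler could generate both variants and keep whichever minimizes the cost model, which is the cost-driven optimization described later on this page.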
&lt;br /&gt;
&#039;&#039;&#039;Concretize&#039;&#039;&#039;&lt;br /&gt;
* Re-materialize back to source code, or transform to other, lower-level IR&lt;br /&gt;
[[File:SLEEC-Concretize.png]]&lt;br /&gt;
&lt;br /&gt;
=== Annotation Language ===&lt;br /&gt;
* Domain libraries annotated by domain experts to interface with compiler infrastructure&lt;br /&gt;
* Questions&lt;br /&gt;
** How to abstract libraries into IR&lt;br /&gt;
** What kinds of transformations are legal&lt;br /&gt;
*** Represent as rewrite rules&lt;br /&gt;
*** How to verify? Can we synthesize?&lt;br /&gt;
** How to concretize&lt;br /&gt;
*** Can this be inferred?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Cost models&#039;&#039;&#039;&lt;br /&gt;
* Most annotations deal with library interface&lt;br /&gt;
** Semantic properties are associated with library specification, not implementation&lt;br /&gt;
* Can also provide cost estimates for library methods&lt;br /&gt;
** Implementation and architecture specific&lt;br /&gt;
* Can express other properties of implementation&lt;br /&gt;
** Energy estimates&lt;br /&gt;
** Accuracy information&lt;br /&gt;
&lt;br /&gt;
=== Compiler Infrastructure ===&lt;br /&gt;
* Compiler does not explicitly understand domains&lt;br /&gt;
** But is extensible, allowing IR to be extended as new domains are added&lt;br /&gt;
* Transformations are just pattern-matched rewrite rules&lt;br /&gt;
** Can use domain-specific information such as domain-specific equivalences, domain-specific properties&lt;br /&gt;
** Can also substitute equivalent implementations of same method&lt;br /&gt;
* Generic compiler + annotated domain library = domain-specific compiler&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Cost-driven optimization&#039;&#039;&#039;&lt;br /&gt;
* Applying transformations to program generates semantically equivalent program variants&lt;br /&gt;
** No “best” variant: different implementations will work better in different situations or optimize for different metrics&lt;br /&gt;
* Compilation as optimization problem&lt;br /&gt;
** Minimize objective function&lt;br /&gt;
*** FLOPs, energy efficiency, etc.&lt;br /&gt;
** Subject to constraints&lt;br /&gt;
*** Semantically equivalent to original program, meets accuracy constraints, etc.&lt;br /&gt;
* Same infrastructure can be used to optimize for a variety of metrics&lt;br /&gt;
&lt;br /&gt;
=== Iterative Refinement ===&lt;br /&gt;
* Typical problem with domain-specific languages or annotation approaches: what if program is incompletely annotated?&lt;br /&gt;
* Want compiler to still produce useful results&lt;br /&gt;
* Key property: compilation process is about optimization, not correctness&lt;br /&gt;
** Lack of information does not raise correctness issues&lt;br /&gt;
** As more annotations are provided, compilation results improve&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inference&#039;&#039;&#039;&lt;br /&gt;
* Can we infer missing information?&lt;br /&gt;
* Transformation annotations&lt;br /&gt;
** Can we use synthesis techniques to infer legal transformations?&lt;br /&gt;
* Cost models&lt;br /&gt;
** Can we use machine learning techniques to build cost models automatically?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Impacts ==&lt;br /&gt;
* &#039;&#039;&#039;Programmability:&#039;&#039;&#039; Programmers can focus on developing methods, using high level libraries, without worrying about careful optimization&lt;br /&gt;
* &#039;&#039;&#039;Performance portability:&#039;&#039;&#039; Ability to select between library variants automatically eases transition to new architectures&lt;br /&gt;
* &#039;&#039;&#039;Scalability:&#039;&#039;&#039; Cost models can incorporate parallelism, locality, communication to enhance scalability&lt;br /&gt;
* &#039;&#039;&#039;Energy efficiency:&#039;&#039;&#039; Parameterized compilation can optimize for energy use instead of performance without rewriting infrastructure&lt;br /&gt;
* &#039;&#039;&#039;Resilience:&#039;&#039;&#039; Cost models can incorporate resilience information (e.g., algorithmic fault tolerance information), compilation can choose variants based on resilience properties&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Implementation Plan ==&lt;br /&gt;
* Work driven by “challenge” applications and domains&lt;br /&gt;
** Computational mechanics and multiscale techniques (lead: Arun Prakash)&lt;br /&gt;
** Peridynamics and Trilinos libraries (lead: Michael Parks)&lt;br /&gt;
* Build compiler infrastructure in ROSE&lt;br /&gt;
** Compiler infrastructure and optimization strategies (leads: Milind Kulkarni and Sam Midkiff)&lt;br /&gt;
** Annotation language and IR (leads: Milind Kulkarni and Sam Midkiff)&lt;br /&gt;
** Cost models and performance modeling (lead: Vijay Pai)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Concrete Deliverables ==&lt;br /&gt;
* Annotation language&lt;br /&gt;
* Common IR&lt;br /&gt;
* Generic compiler infrastructure&lt;br /&gt;
* “Showcase” annotated libraries&lt;br /&gt;
&lt;br /&gt;
== Products ==&lt;br /&gt;
&lt;br /&gt;
=== Software Releases ===&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;SemCache&#039;&#039;&#039; will provide annotated, concrete, domain-specific libraries that use SLEEC technology to automatically manage communication between the CPU and GPU for codes using Trilinos/Kokkos. Integration is in progress.&lt;br /&gt;
&lt;br /&gt;
=== Presentations/Papers ===&lt;br /&gt;
&lt;br /&gt;
* &amp;quot;Exploiting Domain Knowledge to Optimize Parallel Computational Mechanics Codes.&amp;quot; ICS 2013 [https://engineering.purdue.edu/~milind/docs/ics13b.pdf (PDF)]&lt;br /&gt;
* &amp;quot;SemCache: Semantics-aware Caching for Efficient GPU Offloading.&amp;quot; ICS 2013. [https://engineering.purdue.edu/~milind/docs/ics13a.pdf (PDF)]&lt;br /&gt;
* SLEEC 2014 Progress Presentation [https://engineering.purdue.edu/SLEEC/SLEEC-2014.pdf (PDF)]&lt;br /&gt;
* &amp;quot;SemCache++: Semantics-aware Caching for Efficient Multi-GPU Offloading.&amp;quot; ICS 2015. [https://engineering.purdue.edu/~milind/docs/ics15.pdf (PDF)]&lt;br /&gt;
* &amp;quot;Exploiting Domain Knowledge to Optimize Mesh Partitioning for Multiscale Methods.&amp;quot; Supercomputing 2015 (Poster). [https://engineering.purdue.edu/~milind/docs/sc15.pdf (PDF)]&lt;br /&gt;
* &amp;quot;Optimizing the LULESH Stencil Code using Concurrent Collections.&amp;quot; WolfHPC 2015. [https://engineering.purdue.edu/~milind/docs/wolfhpc15.pdf (PDF)]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=PIPER&amp;diff=5410</id>
		<title>PIPER</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=PIPER&amp;diff=5410"/>
		<updated>2023-07-10T05:03:18Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = PIPER: Performance Insight for Programmers and Exascale Runtimes&lt;br /&gt;
| image = [[File:Screen_Shot_2013-09-27_at_11.17.54_PM.png]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| website = &lt;br /&gt;
| team-members = LLNL, PNNL, Rice Uni., U. of Maryland, U. of Utah, U. of Wisconsin&lt;br /&gt;
| pi = [[Martin Schulz]]&lt;br /&gt;
| co-pi = Peer-Timo Bremer, Todd Gamblin, Jeff Hollingsworth, John Mellor-Crummey, Barton Miller, Valerio Pascucci, Nathan Tallent}}&lt;br /&gt;
&lt;br /&gt;
The PIPER (Performance Insight for Programmers and Exascale Runtimes) project is developing new techniques for measuring, analyzing, attributing, and presenting performance data on exascale systems.&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.llnl.gov/ Lawrence Livermore National Laboratory] ([https://scalability.llnl.gov/ team pages])&lt;br /&gt;
* [http://www.pnnl.gov Pacific Northwest National Laboratory] ([http://hpc.pnnl.gov/people/tallent/ team pages])&lt;br /&gt;
* [http://www.rice.edu Rice University] ([http://hpctoolkit.org/ team pages])&lt;br /&gt;
* [http://www.umd.edu University of Maryland] ([http://www.dyninst.org/harmony team pages])&lt;br /&gt;
* [http://www.utah.edu University of Utah] ([http://www.sci.utah.edu team pages])&lt;br /&gt;
* [http://www.wisc.edu University of Wisconsin] ([http://www.paradyn.org team pages])&lt;br /&gt;
&lt;br /&gt;
== Objectives ==&lt;br /&gt;
&lt;br /&gt;
Exascale architectures and applications will be much more complex than today&#039;s systems, and to achieve high performance, radical changes will be required in high performance computing (HPC) applications and in the system software stack. In such a variable environment, performance tools are essential to enable users to optimize application and system code. Tools must provide online performance feedback to runtime systems to guide online adaptation, and they must output intuitive summaries and visualizations to help developers identify performance problems.&lt;br /&gt;
&lt;br /&gt;
To provide this essential functionality for the extreme-scale software stack, we are developing new abstractions, techniques, and novel tools for data measurement, analysis, attribution, diagnostic feedback, and visualization. This enables Performance Insights for Programmers and Exascale Runtimes (PIPER). This project cuts across the entire software stack by collecting data in all system components through novel abstractions and integrated introspection, providing data attribution and analysis in a system-wide context and across programming models, enabling global correlations of performance data from independent sources in different domains, and delivering dynamic feedback to run-time systems and applications through auto-tuning as well as interactive visualizations.&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/b/b8/PIPER-Impact_summary.pdf PIPER Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Approach ==&lt;br /&gt;
&lt;br /&gt;
PIPER consists of four thrust areas, organized into three phases.&lt;br /&gt;
The following figure gives an overall view of our approach:&lt;br /&gt;
&lt;br /&gt;
[[File:columns.png]]&lt;br /&gt;
&lt;br /&gt;
* Thrust 1: We design and implement a series of new scalable measurement techniques to pinpoint and quantify the main roadblocks on the way to exascale, including lack of parallelism, energy consumption, and load imbalance.&lt;br /&gt;
* Thrust 2: We combine a broad range of stack-wide metrics and measurements to gain a global picture of the application&#039;s execution running on top of the highly complex and possibly adaptive exascale system architecture.&lt;br /&gt;
* Thrust 3: We exploit the stack-wide correlated data and develop a suite of new feature-based analysis and visualization techniques that allow us to gain true insight into a code&#039;s behavior and relay this information back to the user in an intuitive fashion.&lt;br /&gt;
* Thrust 4: We apply the analysis results to feed back into the system stack, enabling autonomic optimization loops for both high- and low-level adaptations.&lt;br /&gt;
&lt;br /&gt;
We are implementing our research in a set of modular components that can be deployed&lt;br /&gt;
across various execution and programming models. Wherever possible, we leverage the extensive&lt;br /&gt;
tool infrastructures available through prior work in our project team and integrate&lt;br /&gt;
the results of our research back into these existing production-level tool sets.&lt;br /&gt;
&lt;br /&gt;
== Architecture and Interaction with the Software Stack ==&lt;br /&gt;
&lt;br /&gt;
Our modular components cover both legacy (MPI+X) models and new models developed in other X-Stack2 projects.&lt;br /&gt;
Furthermore, we make the results of our research available to a broad audience&lt;br /&gt;
and work with the larger tools community to achieve wider adoption.&lt;br /&gt;
The figure below provides an initial high-level sketch of our envisioned architecture&lt;br /&gt;
that will provide the PIPER functionality:&lt;br /&gt;
&lt;br /&gt;
[[File:Piper_architecture.png]]&lt;br /&gt;
&lt;br /&gt;
We target measurements from the entire hardware/software stack. That is,&lt;br /&gt;
we expect to use measurements from both the underlying system hardware as well&lt;br /&gt;
as custom measurements derived from the application. The measurements themselves&lt;br /&gt;
leverage a series of adaptive instrumentation techniques. As part of the&lt;br /&gt;
measurement operation, we associate the measurements with a local&lt;br /&gt;
call stack. The correlated local stack/performance measurement data feeds&lt;br /&gt;
an analysis pipeline consisting of both node-local analysis methods, and&lt;br /&gt;
distributed, wider-context analysis methods.&lt;br /&gt;
The resulting data&lt;br /&gt;
store supports a high-level query interface used by visualization and data&lt;br /&gt;
analysis reporting tools that inform the user. Such a system also enables dynamic tuning&lt;br /&gt;
and feedback-directed optimization.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Released Software ==&lt;br /&gt;
&lt;br /&gt;
=== Infrastructure Elements ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.paradyn.org/html/dyninst9.0.3-features.html Dyninst 9.0 - Dynamic Instrumentation Library]&lt;br /&gt;
* [http://www.paradyn.org/html/mrnet5.0.0-features.html MRNet 5.0 - Tree-based Overlay Network]&lt;br /&gt;
* [https://github.com/OpenMPToolsInterface OMPT/OMPD - Tool Interfaces for OpenMP]&lt;br /&gt;
&lt;br /&gt;
=== Bottleneck Detection / Analysis ===&lt;br /&gt;
&lt;br /&gt;
* [http://hpctoolkit.org/ HPCToolkit - Sampling centric analysis and blame shifting]&lt;br /&gt;
&lt;br /&gt;
=== Performance Visualization ===&lt;br /&gt;
&lt;br /&gt;
* [https://computation.llnl.gov/project/performance-analysis-through-visualization/software.php Boxfish - Visual performance analysis through data centric mappings]&lt;br /&gt;
* [https://github.com/scalability-llnl/ravel Ravel - MPI trace visualization using logical timelines]&lt;br /&gt;
* [https://github.com/scalability-llnl/MemAxes MemAxes/Mitos - Visualization of on-node memory traffic]&lt;br /&gt;
&lt;br /&gt;
=== Auto-Tuning ===&lt;br /&gt;
&lt;br /&gt;
* [http://www.dyninst.org/harmony Active Harmony 4.5 - Multiparameter Tuning System]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Ron_Brightwell&amp;diff=5409</id>
		<title>Ron Brightwell</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Ron_Brightwell&amp;diff=5409"/>
		<updated>2023-07-10T05:02:44Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Ron Brightwell.jpeg&lt;br /&gt;
|firstname=Ron&lt;br /&gt;
|lastname=Brightwell&lt;br /&gt;
|company=Sandia National Laboratories&lt;br /&gt;
|position=Department Manager&lt;br /&gt;
|location=Albuquerque NM&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=https://www.linkedin.com/in/ron-brightwell-797bb74/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:Ron_Brightwell.jpeg&amp;diff=5408</id>
		<title>File:Ron Brightwell.jpeg</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:Ron_Brightwell.jpeg&amp;diff=5408"/>
		<updated>2023-07-10T05:02:35Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Ron_Brightwell&amp;diff=5407</id>
		<title>Ron Brightwell</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Ron_Brightwell&amp;diff=5407"/>
		<updated>2023-07-10T05:02:07Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Person |portrait=Ron Brightwell.jpeg |firstname=Ron |middlename= |lastname=Brightwell |company=Sandia National Laboratories |position=Department Manager |location=Albuquerqu...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=Ron Brightwell.jpeg |firstname=Ron |middlename= |lastname=Brightwell |company=Sandia National Laboratories |position=Department Manager |location=Albuquerque NM |country=United States |sector= |linkedin=https://www.linkedin.com/in/ron-brightwell-797bb74/ }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=XPRESS&amp;diff=5406</id>
		<title>XPRESS</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=XPRESS&amp;diff=5406"/>
		<updated>2023-07-10T05:01:37Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox2 project&lt;br /&gt;
| title = XPRESS&lt;br /&gt;
| image = [[File:XPRESS-Logos.png|350px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.sandia.gov/ SNL], [http://www.indiana.edu/ IU], [http://www.lbl.gov/ LBNL], [http://www.lsu.edu/ LSU], [http://www.ornl.gov/ ORNL], [http://www.renci.org/ UNC/RENCI], [http://www.uh.edu/ UH], [https://www.uoregon.edu/ UO] &lt;br /&gt;
| pi = [[Ron Brightwell]]&lt;br /&gt;
| chief-scientist = Thomas Sterling (IU)&lt;br /&gt;
| co-pi = Andrew Lumsdaine (IU), Hartmut Kaiser (LSU), Barbara Chapman (UH), Allen Malony (UO), Chris Baker (ORNL), Allan Porterfield (UNC/RENCI), Alice Koniges (LBNL)&lt;br /&gt;
| website = http://xstack.sandia.gov/xpress&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;eXascale Programming Environment and System Software&#039;&#039;&#039; or &#039;&#039;&#039;XPRESS&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.sandia.gov/ Sandia National Laboratories (SNL)]: LXK operating system, OpenX software architecture, Applications&lt;br /&gt;
* [http://www.indiana.edu/ Indiana University (IU)]: ParalleX Execution Model, OpenX software architecture, HPX‐4 runtime system, XPI&lt;br /&gt;
* [http://www.lbl.gov/ Lawrence Berkeley National Laboratory (LBNL)]: Applications&lt;br /&gt;
* [http://www.lsu.edu/ Louisiana State University (LSU)]: HPX‐4 runtime system&lt;br /&gt;
* [http://www.ornl.gov/ Oak Ridge National Laboratory (ORNL)]: Applications&lt;br /&gt;
* [http://www.uh.edu/ University of Houston (UH)]: Application migration&lt;br /&gt;
* [http://renci.org/research/xpress/ University of North Carolina at Chapel Hill/RENCI (UNC/RENCI)]: XPI, HPX‐4, APEX&lt;br /&gt;
* [https://www.uoregon.edu/ University of Oregon (UO)]: Performance instrumentation (APEX)&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
*[https://xstackwiki.modelado.org/images/7/78/XPRESS-Highlights_Summary.pdf XPRESS Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Products == &lt;br /&gt;
&lt;br /&gt;
* [[List of products]]&lt;br /&gt;
&lt;br /&gt;
This [[Media:XPRESS-Products-2015.pdf|document]] lists all of the products for the XPRESS project.&lt;br /&gt;
&lt;br /&gt;
* Publications&lt;br /&gt;
** A complete list of publications is available [http://xstack.sandia.gov/xpress/publications.html here].&lt;br /&gt;
&lt;br /&gt;
*Software&lt;br /&gt;
** A list of software for this project is available [http://xstack.sandia.gov/xpress/software.html here].&lt;br /&gt;
&lt;br /&gt;
[[File:New-openx.jpg|400x400px|frame|The OpenX Software Architecture]]&lt;br /&gt;
&lt;br /&gt;
== Goals, Objectives, and Approach ==&lt;br /&gt;
* &#039;&#039;&#039;Goals:&#039;&#039;&#039;&lt;br /&gt;
** Enable exascale performance capability for current and future DOE applications&lt;br /&gt;
** Develop and deliver a practical computing system software X‐stack, “OpenX”, for future DOE computing systems&lt;br /&gt;
** Provide programming methods, environments, languages, and tools for effective means of expressing application and system software for portable exascale system execution&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Objectives:&#039;&#039;&#039;&lt;br /&gt;
** Derive a dynamic adaptive introspective strategy for exploiting opportunities and addressing critical exascale technology challenges in the form of an abstract execution model&lt;br /&gt;
** Devise a software architecture as a framework for future exascale system design and implementation&lt;br /&gt;
** Implement core interrelated and interoperable components of the software architecture to realize a fully working and usable system&lt;br /&gt;
** Test, evaluate, validate, and demonstrate correctness, performance, resiliency, and energy efficiency&lt;br /&gt;
** Provide technology transfer through cooperative engagement of industry hardware and software vendors and national labs via documentation and training&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Approach:&#039;&#039;&#039;&lt;br /&gt;
** Research, develop, and deploy a software stack to exploit the ParalleX execution model&lt;br /&gt;
&lt;br /&gt;
== Software Stack ==&lt;br /&gt;
&lt;br /&gt;
=== ParalleX Execution Model ===&lt;br /&gt;
* An execution model that provides the governing principles of computation, guiding system co-design, interoperability of the software component layers, and portability across system classes &lt;br /&gt;
&lt;br /&gt;
* The goal is to provide a conceptual foundation that dramatically increases efficiency and scalability through a transition from static to dynamic resource management and task scheduling, and through the exploitation of new sources of parallelism &lt;br /&gt;
&lt;br /&gt;
* Key semantic constructs &lt;br /&gt;
** Active Global Address Space (“AGAS”) for single system image &lt;br /&gt;
** First-class lightweight user threads for medium-grain parallelism &lt;br /&gt;
** “Parcels” for message-driven computing and latency mitigation &lt;br /&gt;
** Local Control Objects (“LCO”) for powerful system synchronization &lt;br /&gt;
&lt;br /&gt;
* Performance strategy&lt;br /&gt;
** Scalability through lightweight thread-level parallelism, overlapping successive phases of computation with powerful synchronization and elimination of global barriers, automatic exposure/exploitation of intrinsic metadata parallelism, and effective use of finer-grain parallelism through reduction of the overhead that bounds granularity &lt;br /&gt;
** Latency mitigation through parcels by reducing the number of long-distance actions (split-phase transactions), localizing remote data and work, migrating continuations to change the locus of continued execution with the data structure, lightweight thread context switching, direct in-memory parcel generation without thread instantiation, and locality semantics &lt;br /&gt;
** Overhead reduction derived from powerful synchronization semantics for minimum work and optimized thread control &lt;br /&gt;
** Contention amelioration through dynamic resource management&lt;br /&gt;
&lt;br /&gt;
=== LXK: Lightweight eXascale Kernel ===&lt;br /&gt;
* Based on Sandia’s Kitten lightweight kernel &lt;br /&gt;
* Boots identically to Linux &lt;br /&gt;
** Repurposes basic Linux functionality (PCI, NUMA, ACPI, etc.) &lt;br /&gt;
** Supports POSIX threads (NPTL) and OpenMP &lt;br /&gt;
** Allows innovation in key areas &lt;br /&gt;
*** Memory management &lt;br /&gt;
*** Multicore messaging &lt;br /&gt;
*** Network stack optimizations &lt;br /&gt;
*** Fully tick‐less operation &lt;br /&gt;
** 20K LOC &lt;br /&gt;
&lt;br /&gt;
* Allows for re‐thinking OS structure and implementation of &lt;br /&gt;
** Lightweight asynchronous system services &lt;br /&gt;
** Dynamic composability and modularity &lt;br /&gt;
** Adaptive resource policy enforcement mechanisms &lt;br /&gt;
** Interface to runtime system(s) &lt;br /&gt;
** Integrated instrumentation and monitoring&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Kitten Implementation&#039;&#039;&#039;&lt;br /&gt;
* Monolithic, C code, GNU toolchain, Kbuild configuration &lt;br /&gt;
* Supports the x86-64 architecture only; considering a port to ARM &lt;br /&gt;
** Boots on standard PC architecture, Cray XT, and in virtual machines &lt;br /&gt;
** Boots identically to Linux (Kitten bzImage and init_task) &lt;br /&gt;
&lt;br /&gt;
* Repurposes basic functionality from Linux &lt;br /&gt;
** Hardware bootstrap &lt;br /&gt;
** Basic OS kernel primitives (lists, locks, wait queues, etc.) &lt;br /&gt;
** PCI, NUMA, ACPI, IOMMU, … &lt;br /&gt;
** Directory structure similar to Linux, arch dependent/independent directories &lt;br /&gt;
&lt;br /&gt;
* Custom address space management and task management &lt;br /&gt;
** User‐level API for managing physical memory, building virtual address spaces &lt;br /&gt;
** User‐level API for creating tasks, which run in virtual address spaces &lt;br /&gt;
** User‐level API for migrating tasks between cores&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Kitten Thread Support&#039;&#039;&#039;&lt;br /&gt;
* Kitten user‐applications link with standard GNU C library (Glibc) and other system libraries installed on the Linux build host &lt;br /&gt;
&lt;br /&gt;
* Functionality added to Kitten to support Glibc NPTL POSIX threads implementation &lt;br /&gt;
** Futex() system call (fast user‐level locking) &lt;br /&gt;
** Basic support for signals &lt;br /&gt;
** Match Linux implementation of thread local storage &lt;br /&gt;
** Support for multiple threads per CPU core, preemptively scheduled &lt;br /&gt;
&lt;br /&gt;
* Kitten supports runtimes that work on top of POSIX threads &lt;br /&gt;
** Glibc GOMP OpenMP implementation &lt;br /&gt;
** Sandia Qthreads &lt;br /&gt;
** Probably others with a little effort&lt;br /&gt;
&lt;br /&gt;
=== HPX Runtime System ===&lt;br /&gt;
* A next‐generation runtime system software layer that supports the semantics of the ParalleX execution model for significant increase in efficiency and scalability &lt;br /&gt;
* HPX-3 provides &lt;br /&gt;
** Existing early proof-of-concept software: dynamic adaptive resource management, task scheduling, a global name space, efficient and powerful synchronization, and Parcel message-driven computation; interfaces with a conventional Unix-like OS &lt;br /&gt;
* HPX-5 provides &lt;br /&gt;
** A modular system software architecture with specified functionality, interfaces, and protocols for intra-operability and interfaces to the OS and programming environment &lt;br /&gt;
** An introspection closed-loop system for &lt;br /&gt;
*** Resiliency through microcheckpointing &lt;br /&gt;
*** Dynamic load balancing &lt;br /&gt;
*** Power monitoring and control&lt;br /&gt;
&lt;br /&gt;
=== XPI: Low‐Level Intermediate Form ===&lt;br /&gt;
* User programming syntax and source‐to‐source compiler target for high‐level programming languages &lt;br /&gt;
* Provides a stable application platform based on HPX, which is expected to change underneath throughout the project &lt;br /&gt;
* A library of C bindings representing the lowest-level semantics, policies, and mechanisms of the ParalleX execution model &lt;br /&gt;
* XPI construct classes&lt;br /&gt;
** Process&lt;br /&gt;
** Thread&lt;br /&gt;
** Locality&lt;br /&gt;
** Parcel&lt;br /&gt;
** Future&lt;br /&gt;
** Dataflow&lt;br /&gt;
** Housekeeping&lt;br /&gt;
&lt;br /&gt;
=== XPRESS Information Flow ===&lt;br /&gt;
* Passing information between layers will be critical &lt;br /&gt;
** In current systems, information flows in one direction &lt;br /&gt;
&lt;br /&gt;
* For exascale, static scheduling decisions will not work &lt;br /&gt;
** Dynamic environment &lt;br /&gt;
*** Billion‐way parallelism &lt;br /&gt;
*** Resilience &lt;br /&gt;
*** Reliability &lt;br /&gt;
*** Energy &lt;br /&gt;
*** Shared resource contention &lt;br /&gt;
&lt;br /&gt;
* Feedback will be required&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Performance Information as Glue&#039;&#039;&#039;&lt;br /&gt;
* Performance information &lt;br /&gt;
** Current: post‐execution performance tools &lt;br /&gt;
** Exascale: dynamic application introspection &lt;br /&gt;
&lt;br /&gt;
* For performance and reliability thread/core/node/system knowledge will be critical throughout the software stack &lt;br /&gt;
** Interfaces designed to enable information flow &lt;br /&gt;
*** Utilities need to know current system performance &lt;br /&gt;
*** Utilities need to know how other utilities are reacting&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Exascale Performance Observability&#039;&#039;&#039;&lt;br /&gt;
* Exascale requires a fundamentally new observability paradigm &lt;br /&gt;
** Reflects translation of application and mapping of computation model to execution model &lt;br /&gt;
** Designed specifically to support introspection of runtime performance for adaptation and control &lt;br /&gt;
** Aware of multiple objectives &lt;br /&gt;
*** System‐level resource utilization data and analysis, energy consumption, and health information &lt;br /&gt;
&lt;br /&gt;
* Exascale observability abstraction &lt;br /&gt;
** Inherent state of exascale execution is dynamic &lt;br /&gt;
** Embodies non‐stationarity of performance, energy, resilience during application execution &lt;br /&gt;
** Constantly shaped by the adaptation of resources to meet computational needs and optimize execution objectives&lt;br /&gt;
&lt;br /&gt;
=== OpenX Software Stack and APEX ===&lt;br /&gt;
&lt;br /&gt;
=== APEX ===&lt;br /&gt;
* XPRESS performance measurement/analysis/introspection &lt;br /&gt;
** Observation and runtime analysis of performance, energy, and reliability &lt;br /&gt;
** Online introspection of resource utilization, energy consumption, and health information &lt;br /&gt;
** Coupling of introspection with the OpenX software stack for self-adaptive control &lt;br /&gt;
&lt;br /&gt;
* APEX: Autonomic Performance Environment for eXascale &lt;br /&gt;
** Support performance awareness and performance reactivity &lt;br /&gt;
** Couple with application and execution knowledge &lt;br /&gt;
** Serve both top-down and bottom-up requirements of OpenX &lt;br /&gt;
** Performance overlay on OpenX&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;APEX’s Role for Top-Down OpenX Requirements&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Top-down requirements are driven by: &lt;br /&gt;
** Mapping of applications to the ParalleX model &lt;br /&gt;
** Translation through the programming models and the language compilers into runtime operations and execution &lt;br /&gt;
** Performance abstractions (PA) at each level define: &lt;br /&gt;
*** Set of parameters to be observed by the next levels down &lt;br /&gt;
*** Performance model to be evaluated and provide basis for control &lt;br /&gt;
** Performance abstractions are coupled with observations through APEX’s hierarchical performance framework &lt;br /&gt;
** Realization of control mechanisms (reactive to PA actualization) &lt;br /&gt;
&lt;br /&gt;
* The top-down view sees APEX functionality as part of the application’s execution, specialized with observability and introspection support built into each OpenX layer: &lt;br /&gt;
** LXK OS – system resource, utilization, job contention, overhead &lt;br /&gt;
** HPX – threads, queues, concurrency, remote, parcels, memory &lt;br /&gt;
** XPI/Legacy – language‐level performance semantics&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;APEX’s Role for Bottom-Up OpenX Requirements&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Bottom-up requirements are driven by: &lt;br /&gt;
** Performance introspection across the OpenX layers &lt;br /&gt;
** Enable dynamic, adaptive operation, and decision control &lt;br /&gt;
** Online access and analysis of observations at different levels &lt;br /&gt;
** Working model is multi‐parameter system optimization &lt;br /&gt;
&lt;br /&gt;
* APEX creates the performance feedback mechanisms and builds an efficient hierarchical infrastructure for connecting subscribers to runtime performance state &lt;br /&gt;
** Intra‐level performance awareness for HPX and LXK &lt;br /&gt;
** Interplay with the overall application dynamics (top-down) &lt;br /&gt;
*** Top-down requirements implement constraints &lt;br /&gt;
** Performance information is the result of runtime analysis&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Top-Down Development Approach&#039;&#039;&#039; &lt;br /&gt;
* Define performance abstraction (focusing on higher level) &lt;br /&gt;
** Specify observability requirements, and results semantics &lt;br /&gt;
** Provide application context for association &lt;br /&gt;
** What are the performance models, attributes, factors? &lt;br /&gt;
&lt;br /&gt;
* Build into legacy languages and XPI &lt;br /&gt;
** APEX programming of PA infrastructure &lt;br /&gt;
** Invokes HPX performance measurement API &lt;br /&gt;
&lt;br /&gt;
* Integrate TAU capabilities for APEX realization &lt;br /&gt;
** Instrumentation &lt;br /&gt;
*** legacy programming (MPI, OpenMP, OpenACC) &lt;br /&gt;
*** imperative API (XPI) for ParalleX programming &lt;br /&gt;
** TAU mapping for measurement contextualization &lt;br /&gt;
** Wrapper interposition to intercept HPX runtime layer &lt;br /&gt;
&lt;br /&gt;
* Create introspection API for feedback&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Legacy Application Migration and Interoperability ==&lt;br /&gt;
&lt;br /&gt;
* Seamless migration, no modification needed for apps &lt;br /&gt;
** Retargeting OpenMP compiler to OpenX &lt;br /&gt;
** Adapting MPI to OpenX requires MPI system-level adjustment and changes to the configuration logic &lt;br /&gt;
&lt;br /&gt;
* Two approaches for retargeting OpenMP via XPI: &lt;br /&gt;
** Through a POSIX-compliant interface using XPI, e.g., pthreads on top of XPI &lt;br /&gt;
*** +: No (or minor) modification needed to the OpenMP compiler &lt;br /&gt;
*** -: May use only a limited subset of OpenX features &lt;br /&gt;
** Rewrite OpenMP compiler and runtime to XPI &lt;br /&gt;
*** +: Explore OpenX advanced features &lt;br /&gt;
&lt;br /&gt;
* OpenACC integration with OpenX &lt;br /&gt;
** Adapt the OpenACC data movement mechanism to OpenX communication parcels in the AGAS &lt;br /&gt;
** Integrate accelerator kernel execution of OpenACC with the OpenX execution model &lt;br /&gt;
&lt;br /&gt;
* Evaluation: &lt;br /&gt;
** Applications: Mini-apps with MPI+OpenMP/OpenACC &lt;br /&gt;
** Performance and productivity feedback to the OpenX implementation team&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5405</id>
		<title>Wilfred Pinfold</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5405"/>
		<updated>2023-07-10T05:00:37Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=WilfredPinfold864.png&lt;br /&gt;
|firstname=Wilfred&lt;br /&gt;
|middlename=Robert&lt;br /&gt;
|lastname=Pinfold&lt;br /&gt;
|company=OpenCommons&lt;br /&gt;
|position=President&lt;br /&gt;
|location=Maupin OR&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=https://www.linkedin.com/in/wilfred-pinfold-aa6b68/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5404</id>
		<title>Wilfred Pinfold</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5404"/>
		<updated>2023-07-10T05:00:14Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=WilfredPinfold864.png&lt;br /&gt;
|firstname=Wilfred&lt;br /&gt;
|middlename=Robert&lt;br /&gt;
|lastname=Pinfold&lt;br /&gt;
|company=OpenCommons&lt;br /&gt;
|position=President&lt;br /&gt;
|location=Maupin OR&lt;br /&gt;
|country=United States&lt;br /&gt;
|sector=Transportation&lt;br /&gt;
|linkedin=https://www.linkedin.com/in/wilfred-pinfold-aa6b68/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5403</id>
		<title>Wilfred Pinfold</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5403"/>
		<updated>2023-07-10T04:57:43Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=WilfredPinfold.png |firstname=Wilfred |middlename=Robert |lastname=Pinfold |company=OpenCommons |position=President |location=Maupin OR |country=United States |sector=Transportation |linkedin=https://www.linkedin.com/in/wilfred-pinfold-aa6b68/ }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:WilfredPinfold.png&amp;diff=5402</id>
		<title>File:WilfredPinfold.png</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:WilfredPinfold.png&amp;diff=5402"/>
		<updated>2023-07-10T04:55:15Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5401</id>
		<title>Wilfred Pinfold</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Wilfred_Pinfold&amp;diff=5401"/>
		<updated>2023-07-10T04:53:10Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Person |portrait=Pinfold200.jpg |firstname=Wilfred |middlename=Robert |lastname=Pinfold |company=urban.systems Inc. |position=CEO |location=Maupin OR |country=United States...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=Pinfold200.jpg |firstname=Wilfred |middlename=Robert |lastname=Pinfold |company=urban.systems Inc. |position=CEO |location=Maupin OR |country=United States |sector=Transportation |linkedin=www.wilfredpinfold.com }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Traleika_Glacier&amp;diff=5400</id>
		<title>Traleika Glacier</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Traleika_Glacier&amp;diff=5400"/>
		<updated>2023-07-10T04:52:42Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = Traleika Glacier X-Stack&lt;br /&gt;
| image = [[File:Traleikaglacier.jpg]] &amp;lt;!--[[File:TG-Logos.png|350px]]--&amp;gt;&lt;br /&gt;
| imagecaption =  &lt;br /&gt;
| team-members = [http://www.intel.com/ Intel], [https://www.reservoir.com/ Reservoir Labs], [http://www.etinternational.com/ ETI], [http://www.udel.edu/ UDEL], [http://www.ucsd.edu/ UC San Diego], [http://www.rice.edu/ Rice U.], [http://cs.illinois.edu/ UIUC], [http://www.pnnl.gov/ PNNL]&lt;br /&gt;
| pi = [[Shekhar Borkar]]&lt;br /&gt;
| co-pi = [[Wilfred Pinfold]], Richard Lethin (Reservoir Labs), Laura Carrington (UC San Diego), Vivek Sarkar (Rice U.), David Padua (UIUC), Josep Torrellas (UIUC), Andres Marquez (PNNL)&lt;br /&gt;
| website = [https://xstack.exascale-tech.com/wiki/index.php/Main_Page https://xstack.exascale-tech.com/wiki/index.php/Main_Page]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.intel.com/ Intel:] Shekhar Borkar (PI); Hardware guidance, HW/SW co-design, resiliency, technical management&lt;br /&gt;
* [https://www.reservoir.com/ Reservoir Labs:] Richard Lethin (PI); Programming system, R-Stream, tools, optimization&lt;br /&gt;
&amp;lt;!---------------------------------------------------------------------------------------------------------------------------------------------------&lt;br /&gt;
* [http://www.etinternational.com/ ET International (ETI):] PI TBD ; Simulators, execution model and runtime support&lt;br /&gt;
* [http://www.udel.edu/ University of Delaware (UDEL):] Guang Gao (PI); Execution model research&lt;br /&gt;
-----------------------------------------------------------------------------------------------------------------------------------------------------&amp;gt;&lt;br /&gt;
* [http://www.ucsd.edu/ University of California, San Diego (UC San Diego):] Laura Carrington (PI); Applications&lt;br /&gt;
* [http://www.rice.edu/ Rice University:] Vivek Sarkar (PI); Programming system, runtime system&lt;br /&gt;
* [http://cs.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC):] David Padua, Josep Torrellas (PIs); Programming system, Hierarchical Tiles Arrays (HTA), architecture, system architecture evaluation&lt;br /&gt;
* [http://www.pnnl.gov/ Pacific Northwest National Laboratory (PNNL):] Andres Marquez (PI); Kernels and proxy apps for evaluation&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
*[https://xstackwiki.modelado.org/images/2/21/Traleika_Glacier_Impacts.pdf Traleika Glacier Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Goals and Objectives ==&lt;br /&gt;
&#039;&#039;&#039;Goal:&#039;&#039;&#039; &lt;br /&gt;
The Traleika Glacier X-Stack program will develop X-Stack software components in close collaboration with application specialists at the DOE co-design centers and with the best available knowledge of the Exascale systems we anticipate will be available in 2018/2020.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Description:&#039;&#039;&#039;&lt;br /&gt;
Intel has built a straw-man hardware platform that embodies potential technology solutions to well-understood challenges. This straw-man is implemented as a simulator that will be used to test software components under investigation by Traleika team members. Co-design will be achieved by developing representative application components that stress software components and platform technologies, then using these stress tests to refine platform and software elements iteratively toward an optimal solution. All software and simulator components will be developed in open source, facilitating open cross-team collaboration. The interface between the software components and the simulator will be built to facilitate back-end replacement with current production architectures (MIC and Xeon), providing a broadly available software development vehicle and facilitating the integration of new tools and compilers conceived and developed under this proposal with existing environments such as MPI, OpenMP, and OpenCL.&lt;br /&gt;
&lt;br /&gt;
The Traleika Glacier X-Stack team brings together strong technical expertise from across the exascale software stack. It couples applications of high interest to the DoE from five National Labs with software systems expertise from Reservoir Labs, ET International, the University of Illinois, the University of California San Diego, the University of Delaware, and Rice University, on a foundation of platform excellence from Intel. The project builds collaboration among many of the partners, making this team uniquely capable of rapid progress. The research is expected not only to advance the art in system software for high performance computing but also to provide invaluable feedback through the co-design loop for hardware design and application development. By breaking down research and development barriers between layers in the solution stack, this collaboration and the open tools it produces will spur innovation for the next generation of high performance computing systems.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objectives:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Energy efficiency:&#039;&#039;&#039; SW components interoperate, harmonize, exploit HW features, and optimize the system for energy efficiency&lt;br /&gt;
* &#039;&#039;&#039;Data locality:&#039;&#039;&#039; PGM system &amp;amp; system SW optimize to reduce data movement&lt;br /&gt;
* &#039;&#039;&#039;Scalability:&#039;&#039;&#039; SW components scalable, portable to O(10^9)—extreme parallelism&lt;br /&gt;
* &#039;&#039;&#039;Programmability:&#039;&#039;&#039; New (Asynchronous) &amp;amp; legacy (MPI+OpenMP), with gentle slope for productivity&lt;br /&gt;
* &#039;&#039;&#039;Execution model:&#039;&#039;&#039; Objective function based, dynamic, global system optimization&lt;br /&gt;
* &#039;&#039;&#039;Self-awareness:&#039;&#039;&#039; Dynamically respond to changing conditions and demands&lt;br /&gt;
* &#039;&#039;&#039;Resiliency:&#039;&#039;&#039; Asymptotically provide reliability of N-modular redundancy using HW/SW co-design; HW detection, SW correction&lt;br /&gt;
&lt;br /&gt;
== Status Reports ==&lt;br /&gt;
* [[media:TG_X-Stack_Review_Top_2_20140401.pdf| Traleika Glacier X-Stack Highlights]], April 1, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_Status_Review_20140325.pdf| Traleika Glacier X-Stack Status Review]], March 25, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_M5+6_Status_Report_20140312_Redacted.pdf|Traleika Glacier Year 2 Interim Status Report]], March 12, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_progress_report_20140523_Redacted.pdf|Traleika Glacier Year 2 Progress Report]], May 30, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_M8_Status_Report_20140907.pdf|Traleika Glacier Milestone 8 Report]], September 1, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_M9_Status_Report_20141208.pdf|Traleika Glacier Milestone 9 Report]], December 8, 2014&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_M10_Status_Report.pdf|Traleika Glacier Milestone 10 Report]], March 2, 2015&lt;br /&gt;
* [[media:DE-SC0008717_TG_X-Stack_M11_Status_Report_20150602.pdf|Traleika Glacier Milestone 11 Report]], June 2, 2015&lt;br /&gt;
&lt;br /&gt;
== Meetings and Presentations ==&lt;br /&gt;
=== Weekly Extreme Scale Deep-Dive: Schedule and Archive ===&lt;br /&gt;
* [https://eci.exascale-tech.com/wiki/index.php/Weekly_Technical_Review_Meeting Weekly Technical Review Meeting] (Tuesdays, 10-12 Pacific Time)&lt;br /&gt;
&lt;br /&gt;
=== Co-Design Workshops (Newest to Oldest) ===&lt;br /&gt;
* [http://www.modelado.org/dynamic-runtime-community-project-review-fall-2016/ 7th Co-Design Project Review - September 27 - 29, 2016]&lt;br /&gt;
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_5_-_September_29,_2015_-_October_1,_2015 Application Workshop 5 - September 29, 2015 - October 1, 2015]&lt;br /&gt;
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_4_-_April_7-8,_2015 Application Workshop 4 - April 7-8, 2015]&lt;br /&gt;
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_3_-_September_30,_2014_-_October_2,_2014 Application Workshop 3 - September 30, 2014 - October 2, 2014]&lt;br /&gt;
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_2_-_January_21-23,_2014 Application Workshop 2 - January 21-23, 2014]&lt;br /&gt;
&lt;br /&gt;
== Traleika Glacier products ==&lt;br /&gt;
=== Research Products ===&lt;br /&gt;
* [https://xstack.exascale-tech.com/wiki/index.php/Traleika_Glacier_Research_Products Research Products]&lt;br /&gt;
=== Software Releases ===&lt;br /&gt;
* [https://xstack.exascale-tech.com/wiki/index.php/Traleika_Glacier_Software_Releases Software Releases]&lt;br /&gt;
&amp;lt;!---commented out - replaced with links to xstack public site  ******************************************&lt;br /&gt;
== Publications ==&lt;br /&gt;
&lt;br /&gt;
=== Intel ===&lt;br /&gt;
* &#039;&#039;Programmer Obliviousness is Bliss: Ideas for Runtime-Managed Granularity&#039;&#039;, Romain Cledat, Sagnak Tasirlar (Rice University) and Rob Knauerhase (Intel). To be published at HotPar ’13, June 24, 2013, San Jose, CA - https://www.usenix.org/conference/hotpar13 &lt;br /&gt;
* &#039;&#039;How to stop interconnects from hindering the future of computing!&#039;&#039;, Shekhar Borkar, Optical interconnects Conference, May 2013&lt;br /&gt;
* &#039;&#039;Exascale Computing—a fact or a fiction?&#039;&#039;, Shekhar Borkar, IPDPS, May 2013&lt;br /&gt;
* &#039;&#039;Functional Simulator for Exascale System Research&#039;&#039;, Romain Cledat (Intel), Josh Fryman (Intel), Ivan Ganev (Intel), Sam Kaplan (ETI), Rishi Khan (ETI), Asit Mishra (Intel), Bala Seshasayee (Intel), Ganesh Venkatesh (Intel), Dave Dunning (Intel), Shekhar Borkar (Intel), Workshop on Modeling &amp;amp; Simulation of Exascale Systems &amp;amp; Applications, September 18th-19th, 2013, University of Washington, Seattle, WA - http://hpc.pnl.gov/modsim/2013/&lt;br /&gt;
&lt;br /&gt;
=== Reservoir Labs ===&lt;br /&gt;
* &#039;&#039;A Tale of Three Runtimes&#039;&#039;, Nicolas Vasilache, Muthu Baskaran, Tom Henretty, Benoit Meister, M. Harper Langston, and Richard Lethin, submitted 5-Sep-14 in [http://arxiv.org/abs/1409.1914 arXiv.org]&lt;br /&gt;
&lt;br /&gt;
=== Rice University ===&lt;br /&gt;
* &#039;&#039;Integrating Asynchronous Task Parallelism with MPI&#039;&#039;. Sanjay Chatterjee, Sağnak Taşırlar, Zoran Budimlić, Vincent Cavé, Millind Chabbi, Max Grossman, Yonghong Yan and Vivek Sarkar. 27th IEEE International Parallel &amp;amp; Distributed Processing Symposium (IPDPS 2013), May 2013, Boston, MA. &lt;br /&gt;
* &#039;&#039;Compiler Optimization of an Application-specific Runtime&#039;&#039;, Kathleen Knobe (Intel) and Zoran Budimlić (Rice), CPC 2013: 17th Workshop on Compilers for Parallel Computing, July 3-5, 2013, Lyon, France. (to appear).&lt;br /&gt;
* &#039;&#039;Compiler Optimization of an Application-specific Runtime&#039;&#039;. Kathleen Knobe (Intel) and Zoran Budimlic (Rice). In Compilers for Parallel Computers (CPC), July 2013.&lt;br /&gt;
* &#039;&#039;Compiler Optimization of an Application-specific Runtime&#039;&#039;. Kathleen Knobe (Intel) and Zoran Budimlic (Rice). Abstract to appear in CnC&#039;13 workshop, September 2013.&lt;br /&gt;
* &#039;&#039;Automatic Selection of Distribution Functions for Distributed CnC&#039;&#039;, Kamal Sharma (Rice), Kathleen Knobe (Intel), Frank Schlimbach (Intel), Vivek Sarkar (Rice).  Abstract to appear in CnC&#039;13 workshop, September 2013. &lt;br /&gt;
* &#039;&#039;CnC on Open Community Runtime&#039;&#039;, Alina Sbirlea (Rice) and Zoran Budimlic (Rice). Abstract to appear in CnC&#039;13 workshop, September 2013.&lt;br /&gt;
* &#039;&#039;Bounded Memory Scheduling of CnC Programs&#039;&#039;, Dragos Sbirlea (Rice), Zoran Budimlic (Rice) and Vivek Sarkar (Rice). Abstract to appear in CnC&#039;13 workshop, September 2013.&lt;br /&gt;
* &#039;&#039;CDSC-GL: A CnC-inspired Graph Language&#039;&#039;, Zoran Budimlic (Rice), Jason Cong (UCLA), Zhou Li (UCLA), Louis-Noel Pouchet (UCLA), Vivek Sarkar (Rice), Alina Sbirlea (Rice), Mo Xu (UCLA), Pen Zhang (UCLA). Abstract to appear in CnC&#039;13 workshop, September 2013.&lt;br /&gt;
* &#039;&#039;Bounded memory scheduling of dynamic task graphs&#039;&#039;, Dragos Sbirlea, Zoran Budimlić, Vivek Sarkar, submitted to IPDPS 2014.&lt;br /&gt;
* &#039;&#039;Isolation for Nested Task Parallelism&#039;&#039;, Jisheng Zhao, Roberto Lublinerman, Zoran Budimlic, Swarat Chaudhuri, Vivek Sarkar, International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), October 2013.&lt;br /&gt;
* &#039;&#039;Bounded memory scheduling of dynamic task graphs&#039;&#039;, Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar, to appear in The 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT 2014).&lt;br /&gt;
* &#039;&#039;Expressing DOACROSS Loop Dependencies in OpenMP&#039;&#039;, Jun Shirako, Priya Unnikrishnan, Sanjay Chatterjee, Kelvin Li, Vivek Sarkar, 9th International Workshop on OpenMP (IWOMP), September 2013.&lt;br /&gt;
* &#039;&#039;The Flexible Preconditions Model for Macro-Dataflow Execution&#039;&#039;, Dragoș Sbîrlea, Alina Sbîrlea, Kyle B. Wheeler, Vivek Sarkar, The 3rd Data-Flow Execution Models for Extreme Scale Computing (DFM), September 2013.&lt;br /&gt;
&lt;br /&gt;
=== Pacific Northwest National Lab ===&lt;br /&gt;
* &#039;&#039;ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution&#039;&#039;, Marquez, A., et al., submitted to the 23rd International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2014), Vancouver, Canada.&lt;br /&gt;
&lt;br /&gt;
=== University of Delaware ===&lt;br /&gt;
* &#039;&#039;An Implementation of the Codelet Model&#039;&#039;, Joshua Suetterlein, Stephane Zuckerman, and Guang R. Gao, to be published in the proceedings of the 19th International European Conference on Parallel and Distributed Computing (Euro-Par 2013), August 26-30, Aachen, Germany. &lt;br /&gt;
* &#039;&#039;Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models&#039;&#039;, Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao, to be published in Proceedings of the 2013 Workshop on Multithreaded Architectures and Applications (MTAAP 2013), 27th IEEE International Parallel &amp;amp; Distributed Processing Symposium, May 24, Boston, MA, USA. &lt;br /&gt;
* [[media:Toward_a_Self-Aware_System_for_Exascale_Architectures_20130614.pdf|&#039;&#039;Toward a Self-Aware System for Exascale Architectures&#039;&#039;]][http://www.capsl.udel.edu/pub/doc/papers/rome13.pdf], Aaron Myles Landwehr, Stephane Zuckerman, Guang R. Gao, CAPSL Technical Memo 123, June 2013. &lt;br /&gt;
* [http://www.capsl.udel.edu/pub/doc/papers/LCPC2013.pdf &#039;&#039;Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture&#039;&#039;], Elkin Garcia, Jaime Arteaga, Robert Pavel, and Guang R. Gao, in Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Santa Clara, CA, September 25-27, 2013.&lt;br /&gt;
* &#039;&#039;Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading&#039;&#039;, Jaime Arteaga, Stephane Zuckerman, Elkin Garcia, and Guang R. Gao, in Proceedings of Workshop on Multi-Threaded Architectures and Applications (MTAAP 2014), May 2014, Accepted.&lt;br /&gt;
* &#039;&#039;Runtime Systems for Extreme Scale Platforms&#039;&#039;, Sanjay Chatterjee, PhD Thesis, December 2013.&lt;br /&gt;
&lt;br /&gt;
=== Joint Publications ===&lt;br /&gt;
* &#039;&#039;Compiler Support for Software Cache Coherence&#039;&#039;, Sanket Tavarageri, Wooil Kim, Josep Torrellas, and P. Sadayappan, with Pacific Northwest National Laboratory (John Feo, Andres Marquez), submitted for publication.&lt;br /&gt;
* &#039;&#039;A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations&#039;&#039;, Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston, and Guang Gao, in Proceedings of the 2013 IEEE International Conference on High Performance Computing (HiPC 2013), Hyderabad, India, December 18 - 21, 2013.&lt;br /&gt;
* &#039;&#039;ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks&#039;&#039;, Tom St. John, Benoit Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, and Xiaoming Li, in Proceedings of the 4th International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT&#039;14); 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC&#039;14), Vienna, Austria. January 20-22, 2014. Best Paper Award.&lt;br /&gt;
&lt;br /&gt;
==Presentations and Other Collateral==&lt;br /&gt;
&lt;br /&gt;
* [[media:Ocr-bof-slides_20121114.pdf|&#039;&#039;Birds-of-a-Feather&#039;&#039;]] session at SuperComputing12, November 14, 2012. See the OCR homepage at https://01.org/projects/open-community-runtime.&lt;br /&gt;
* [[media:TG_Overview_Carrington.pdf|&#039;&#039;Traleika Glacier X-Stack Overview&#039;&#039;]], presented by Laura Carrington (UCSD) at the Fourth ExaCT All Hands Meeting, Sandia National Laboratories, May 14, 2013&lt;br /&gt;
* [[media:OCR-SC13-BOF.pdf|&#039;&#039;The Open Community Runtime Framework for Exascale Systems&#039;&#039;]], Birds of a Feather Session, SC13, Denver, November 19, 2013, Vivek Sarkar (Rice), Rob Knauerhase (Intel), Rich Lethin (Reservoir Labs)&lt;br /&gt;
* [[media:OnePage_LULESH_to_CnC.pdf|&#039;&#039;Experience developing CnC versions of DOE Applications&#039;&#039;]] - Ellen Porter (PNNL), Kath Knobe (Intel), John Feo (PNNL) - 4/15/14&lt;br /&gt;
* [[media:2014-05-23-XS-PI-meeting-v4.pdf|&#039;&#039;May 2014 PI Meeting&#039;&#039;]]&lt;br /&gt;
* [[media:OCR-SC14-BOF.pdf|&#039;&#039;Open Community Runtime (OCR) Framework for Extreme Scale Systems&#039;&#039;]], Birds of a Feather Session, SC14, New Orleans, November 20, 2014, Vivek Sarkar (Rice), Barbara Chapman (U. Houston), William Gropp (U. Illinois)&lt;br /&gt;
&lt;br /&gt;
=== CnC’13 workshop September, 2013 ===&lt;br /&gt;
The following were presented at the CnC’13 workshop in September 2013. This was the fifth annual CnC workshop, co-located with the Workshop on Languages and Compilers for Parallel Computing (LCPC) in Santa Clara, CA.&lt;br /&gt;
* &#039;&#039;Compiler Optimization of an Application-Specific Runtime&#039;&#039;. Kathleen Knobe (Intel) and Zoran Budimlic (Rice). * &lt;br /&gt;
* &#039;&#039;The CnC tuning capability&#039;&#039;, Sanjay Chatterjee (Rice), Zoran Budimlic (Rice), Vivek Sarkar (Rice), Kathleen Knobe (Intel).  &lt;br /&gt;
* &#039;&#039;Automatic Selection of Distribution Functions for Distributed CnC&#039;&#039;, Kamal Sharma (Rice), Kathleen Knobe (Intel), Frank Schlimbach (Intel), Vivek Sarkar (Rice)*.&lt;br /&gt;
* &#039;&#039;CnC on Open Community Runtime&#039;&#039;, Alina Sbirlea (Rice) and Zoran Budimlic (Rice). &lt;br /&gt;
* &#039;&#039;Bounded Memory Scheduling of CnC Programs&#039;&#039;, Dragos Sbirlea (Rice), Zoran Budimlic (Rice) and Vivek Sarkar (Rice). *&lt;br /&gt;
* &#039;&#039;CDSC-GL: A CnC-inspired Graph Language&#039;&#039;, Zoran Budimlic (Rice), Jason Cong (UCLA), Zhou Li (UCLA), Louis-Noel Pouchet (UCLA), Vivek Sarkar (Rice), Alina Sbirlea (Rice), Mo Xu (UCLA), Pen Zhang (UCLA).*&lt;br /&gt;
* &#039;&#039;Implementing Asynchronous Checkpoint/Restart for CnC&#039;&#039;, Nick Vrvilo and Vivek Sarkar (Rice University) Kath Knobe and Frank Schlimbach(Intel)&lt;br /&gt;
* &#039;&#039;Automatic CnC generation from a sequential specification&#039;&#039;, Nicolas Vasilache (Reservoir Labs, Inc.)&lt;br /&gt;
&lt;br /&gt;
Note: Asterisked (*) presentations are supportive of the Traleika Glacier X-Stack strategic aims and objectives but not directly under the statement of work. &lt;br /&gt;
&lt;br /&gt;
=== About OCR - Open Community Runtime ===&lt;br /&gt;
* [[OCR Module General organization]] - snapshot as of May 1, 2014. For most recent version, visit https://xstack.exascale-tech.com/wiki/index.php/OCR_Module_General_organization&lt;br /&gt;
* [[OCR Module Policy Domain]] - snapshot as of May 1, 2014. For most recent version visit https://xstack.exascale-tech.com/wiki/index.php/OCR_Module_Policy_Domain&lt;br /&gt;
&lt;br /&gt;
---------------------------&amp;gt;&lt;br /&gt;
== Scope of the Project ==&lt;br /&gt;
[[File:TG-Scope.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Roadmap ==&lt;br /&gt;
[[File:TG-Roadmap.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Architecture ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Straw-man System Architecture and Evaluation&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Strawman-System.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Data-locality and BW Tapering, Why So Important?&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Data-Locality.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Programming and Execution Models ==&lt;br /&gt;
[[File:TG-Programming-Model.png]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Programming model&#039;&#039;&#039;&lt;br /&gt;
* Separation of concerns: Domain specification &amp;amp; HW mapping&lt;br /&gt;
* Express data locality with hierarchical tiling&lt;br /&gt;
* Global, shared, non-coherent address space&lt;br /&gt;
* Optimization and auto generation of codelets (HW specific)&lt;br /&gt;
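The "express data locality with hierarchical tiling" point above can be made concrete with a small sketch. This is illustrative only; the two-level decomposition and the `tiles`/`hierarchical_tiles` helpers are invented for the example and are not taken from R-Stream or the HTA library:

```python
# Hypothetical sketch of hierarchical tiling: split an N x N index space
# into outer tiles (e.g. mapped to nodes) and inner tiles (mapped to cores).
def tiles(n, size):
    """Yield (start, stop) ranges covering 0..n in chunks of `size`."""
    for start in range(0, n, size):
        yield start, min(start + size, n)

def hierarchical_tiles(n, outer, inner):
    """Two-level tiling: each outer tile is subdivided into inner tiles."""
    for oi0, oi1 in tiles(n, outer):
        for oj0, oj1 in tiles(n, outer):
            for ii0, ii1 in tiles(oi1 - oi0, inner):
                for ij0, ij1 in tiles(oj1 - oj0, inner):
                    yield (oi0 + ii0, oi0 + ii1, oj0 + ij0, oj0 + ij1)

# Sanity check: every element of a 256x256 space is covered exactly once.
cover = set()
for r0, r1, c0, c1 in hierarchical_tiles(256, 64, 16):
    for r in range(r0, r1):
        for c in range(c0, c1):
            cover.add((r, c))
assert len(cover) == 256 * 256
```

Iterating inner tiles within an outer tile keeps successive codelets working on nearby data, which is the locality property the programming model aims to expose.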
&lt;br /&gt;
&#039;&#039;&#039;Execution model&#039;&#039;&#039;&lt;br /&gt;
* Dataflow-inspired, tiny, self-contained codelets&lt;br /&gt;
* Dynamic, event-driven scheduling, non-blocking&lt;br /&gt;
* Dynamic decision to move computation to data&lt;br /&gt;
* Observation-based adaptation (self-awareness)&lt;br /&gt;
* Implemented in the runtime environment&lt;br /&gt;
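The event-driven execution model in the bullets above can be sketched in a few lines. This is an illustrative toy, not the OCR or IRR API; the `Codelet`, `Scheduler`, and `satisfy` names are invented for the example:

```python
# Toy sketch of event-driven, non-blocking codelet scheduling: a codelet
# becomes runnable only once all of its input events have been satisfied,
# so nothing ever blocks waiting for data.
from collections import deque

class Codelet:
    """A tiny, self-contained unit of work with input-event dependencies."""
    def __init__(self, name, fn, deps):
        self.name, self.fn, self.pending = name, fn, set(deps)

class Scheduler:
    def __init__(self):
        self.waiting, self.ready, self.log = [], deque(), []

    def add(self, codelet):
        if codelet.pending:
            self.waiting.append(codelet)
        else:
            self.ready.append(codelet)

    def satisfy(self, event):
        """An event fires: move newly enabled codelets to the ready queue."""
        for c in list(self.waiting):
            c.pending.discard(event)
            if not c.pending:
                self.waiting.remove(c)
                self.ready.append(c)

    def run(self):
        while self.ready:
            c = self.ready.popleft()
            self.log.append(c.name)
            for event in c.fn() or ():   # a codelet may emit output events
                self.satisfy(event)

s = Scheduler()
s.add(Codelet("produce", lambda: ["data_ready"], deps=[]))
s.add(Codelet("consume", lambda: [], deps=["data_ready"]))
s.run()
assert s.log == ["produce", "consume"]
```

The `run` loop is where a real runtime would also apply the dynamic decisions listed above, such as moving a codelet to the node holding its data.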
&lt;br /&gt;
&#039;&#039;&#039;Separation of concerns&#039;&#039;&#039;&lt;br /&gt;
* User application, control, and resource management&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Programming System Components ===&lt;br /&gt;
&lt;br /&gt;
[[File:TG-System-Components.png|600px]]&lt;br /&gt;
&lt;br /&gt;
=== Runtime ===&lt;br /&gt;
* Different runtimes target different aspects&lt;br /&gt;
** IRR: targeted for Intel Straw-man architecture&lt;br /&gt;
** SWARM: runtime for a wide range of parallel machines&lt;br /&gt;
** DAR3TS: explore codelet PXM using portable C++&lt;br /&gt;
** Habanero-C: interfaces IRR, tie-in to CnC&lt;br /&gt;
&lt;br /&gt;
* All explore related aspects of the codelet Program Execution Model (PXM)&lt;br /&gt;
&lt;br /&gt;
* Goal: Converge towards the Open Community Runtime (OCR)&lt;br /&gt;
** Enabling technology development for codelet execution &lt;br /&gt;
** Model systems, foster novel runtime systems research&lt;br /&gt;
&lt;br /&gt;
* Greater visibility through SW stack -&amp;gt; efficient computing&lt;br /&gt;
** Break OS/Runtime information firewall&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Some Promising Results:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Runtime-Results.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Runtime Research Agenda&#039;&#039;&#039;&lt;br /&gt;
* Locality-aware scheduling—heuristics for locality/energy efficiency&lt;br /&gt;
** Extensions to standard Habanero-C runtime&lt;br /&gt;
&lt;br /&gt;
* Adaptive boosting and idling of hardware&lt;br /&gt;
** Avoid energy-expensive unsuccessful steals that perform no work&lt;br /&gt;
** Turbo mode for a core executing serial code&lt;br /&gt;
** Fine grain resource (including energy) management&lt;br /&gt;
&lt;br /&gt;
* Dynamic data-block movement&lt;br /&gt;
** Co-locate codelets and data&lt;br /&gt;
** Move codelets to data&lt;br /&gt;
&lt;br /&gt;
* Introspection and dynamic optimization&lt;br /&gt;
** Performance counters, sensors provide real time information&lt;br /&gt;
** Optimization of the system for user defined objective&lt;br /&gt;
** (Go beyond energy proportional computing)&lt;br /&gt;
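One way to make the first two agenda items concrete is a small sketch of locality-aware stealing with backoff. The victim-selection rule, the `distance` function, and the backoff constants below are assumptions for illustration, not taken from the Habanero-C runtime:

```python
# Illustrative heuristic: a worker prefers to steal from topologically
# near victims, and backs off exponentially after unsuccessful steals so
# an idle core stops burning energy polling empty queues.
def pick_victim(worker, queues, distance):
    """Prefer the nearest non-empty queue; None if nothing to steal."""
    candidates = [w for w in queues if w != worker and queues[w]]
    if not candidates:
        return None
    return min(candidates, key=lambda w: distance(worker, w))

def steal_loop(worker, queues, distance, max_backoff=64):
    backoff, stolen = 1, []
    while True:
        victim = pick_victim(worker, queues, distance)
        if victim is None:
            if backoff >= max_backoff:   # give up: let HW idle the core
                return stolen
            backoff *= 2                 # exponential backoff between polls
        else:
            stolen.append(queues[victim].pop())
            backoff = 1                  # successful steal: poll eagerly again

queues = {0: [], 1: ["t1", "t2"], 2: ["t3"]}
dist = lambda a, b: abs(a - b)
got = steal_loop(0, queues, dist)
assert got == ["t2", "t1", "t3"]   # nearest victim (rank 1) drained first
```

A real runtime would couple the back-off path to the hardware idle/turbo controls mentioned above rather than simply returning.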
&lt;br /&gt;
&lt;br /&gt;
=== Simulators and Tools ===&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Simulators-Tools.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulators—what to expect and not&#039;&#039;&#039;&lt;br /&gt;
* Evaluation of architecture features for PGM and EXE models&lt;br /&gt;
* Relative comparison of performance, energy&lt;br /&gt;
* Data movement patterns to memory and interconnect&lt;br /&gt;
* Relative evaluation of resource management techniques&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Simulator-Expect-Not.png|400px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Results Using Simulators&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:TG-Simulator-Results.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Applications and HW-SW Codesign ==&lt;br /&gt;
[[File:TG-App-HW-Co-design.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== X-Stack Components ==&lt;br /&gt;
[[File:TG-XStack-Components.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
[[File:TG-Metrics.png|600px]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Shekhar_Borkar&amp;diff=5399</id>
		<title>Shekhar Borkar</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Shekhar_Borkar&amp;diff=5399"/>
		<updated>2023-07-10T04:51:42Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Shekhar Borkar.jpeg&lt;br /&gt;
|firstname=Shekhar&lt;br /&gt;
|middlename=Y&lt;br /&gt;
|lastname=Borkar&lt;br /&gt;
|company=Qualcomm&lt;br /&gt;
|position=Intel Fellow&lt;br /&gt;
|location=Hillsboro OR&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=https://www.linkedin.com/in/shekhar-borkar-9068097b/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:Shekhar_Borkar.jpeg&amp;diff=5398</id>
		<title>File:Shekhar Borkar.jpeg</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:Shekhar_Borkar.jpeg&amp;diff=5398"/>
		<updated>2023-07-10T04:51:29Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Shekhar_Borkar&amp;diff=5397</id>
		<title>Shekhar Borkar</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Shekhar_Borkar&amp;diff=5397"/>
		<updated>2023-07-10T04:51:00Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Person |portrait=Shekhar Borkar.jpeg |firstname=Shekhar |middlename=Y |lastname=Borkar |company=Qualcomm |position=Intel Fellow |location=Hillsboro OR |country=United States...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=Shekhar Borkar.jpeg |firstname=Shekhar |middlename=Y |lastname=Borkar |company=Qualcomm |position=Intel Fellow |location=Hillsboro OR |country=United States |sector= |linkedin=https://www.linkedin.com/in/shekhar-borkar-9068097b/ }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=DEGAS&amp;diff=5396</id>
		<title>DEGAS</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=DEGAS&amp;diff=5396"/>
		<updated>2023-07-10T04:50:06Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = DEGAS&lt;br /&gt;
| image = [[File:DEGAS-Logos.png|350px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.lbl.gov/ LBNL], [http://www.rice.edu/ Rice U.], [http://www.berkeley.edu/ UC Berkeley], [https://www.utexas.edu/ UT Austin], [https://www.llnl.gov/ LLNL], [http://www.ncsu.edu/ NCSU]&lt;br /&gt;
| pi = [[Katherine Yelick]]&lt;br /&gt;
| co-pi = Vivek Sarkar (Rice U.), James Demmel (UC Berkeley), Mattan Erez (UT Austin), Dan Quinlan (LLNL)&lt;br /&gt;
| website = [http://crd.lbl.gov/departments/computer-science/CLaSS/research/DEGAS/ DEGAS]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Dynamic Exascale Global Address Space&#039;&#039;&#039; or &#039;&#039;&#039;DEGAS&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.lbl.gov/ Lawrence Berkeley National Laboratory (LBNL)]&lt;br /&gt;
* [http://www.rice.edu/ Rice University]&lt;br /&gt;
* [http://www.berkeley.edu/ University of California, Berkeley]&lt;br /&gt;
* [https://www.utexas.edu/ University of Texas at Austin]&lt;br /&gt;
* [https://www.llnl.gov/ Lawrence Livermore National Laboratory (LLNL)]&lt;br /&gt;
* [http://www.ncsu.edu/ North Carolina State University (NCSU)]&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/0/09/DEGAS-Highlight_Summary.pdf DEGAS Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Mission ==&lt;br /&gt;
&#039;&#039;&#039;Mission Statement:&#039;&#039;&#039; To ensure the broad success of Exascale systems through a unified programming model that is productive, scalable, portable, and interoperable, and meets the unique Exascale demands of energy efficiency and resilience.&lt;br /&gt;
&lt;br /&gt;
[[File:DEGAS-Mission.png]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Goals &amp;amp; Objectives ==&lt;br /&gt;
* &#039;&#039;&#039;Scalability:&#039;&#039;&#039; Billion-way concurrency, thousand-way on chip with new architectures&lt;br /&gt;
* &#039;&#039;&#039;Programmability:&#039;&#039;&#039; Convenient programming through a global address space and high-level abstractions for parallelism, data movement, and resilience&lt;br /&gt;
* &#039;&#039;&#039;Performance Portability:&#039;&#039;&#039; Ensure applications can be moved across diverse machines using implicit (automatic) compiler optimizations and runtime adaptation&lt;br /&gt;
* &#039;&#039;&#039;Resilience:&#039;&#039;&#039; Integrated language support for capturing state and recovering from faults&lt;br /&gt;
* &#039;&#039;&#039;Energy Efficiency:&#039;&#039;&#039; Avoid communication, which will dominate energy costs, and adapt to performance heterogeneity due to system-level energy management&lt;br /&gt;
* &#039;&#039;&#039;Interoperability:&#039;&#039;&#039; Encourage use of languages and features through incremental adoption&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Programming Models ==&lt;br /&gt;
=== Two Distinct Parallel Programming Questions ===&lt;br /&gt;
* What is the parallel control model?&lt;br /&gt;
&lt;br /&gt;
[[File:DEGAS-Parallel-Control-Model.png|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* What is the model for sharing/communication?&lt;br /&gt;
&lt;br /&gt;
[[File:DEGAS-Sharing-Model.png|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Applications Drive New Programming Models ===&lt;br /&gt;
[[File:DEGAS-Message-Passing.png]]&lt;br /&gt;
&lt;br /&gt;
* Message Passing Programming&lt;br /&gt;
** Divide up domain in pieces&lt;br /&gt;
** Compute one piece and exchange&lt;br /&gt;
** &#039;&#039;&#039;MPI and many libraries&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DEGAS-Global-Address-Space.png]]&lt;br /&gt;
&lt;br /&gt;
* Global Address Space Programming&lt;br /&gt;
** Each thread starts computing&lt;br /&gt;
** Grab whatever/whenever&lt;br /&gt;
** &#039;&#039;&#039;UPC, CAF, X10, Chapel, Fortress, Titanium, GlobalArrays&#039;&#039;&#039;&lt;br /&gt;
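The contrast between the two models can be sketched as follows; this is an illustrative Python analogy (threads standing in for processes), not code from any of the listed systems:

```python
# Hypothetical sketch: two-sided message passing vs. one-sided
# global-address-space access, with Python threads standing in for ranks.
import threading, queue

# Message passing: each rank owns its piece and must explicitly
# send/receive to exchange boundary data.
def mpi_style(data_a, data_b):
    to_b, to_a = queue.Queue(), queue.Queue()
    result = {}
    def rank_a():
        to_b.put(data_a[-1])          # send my right boundary
        result["a"] = to_a.get()      # receive neighbour's boundary
    def rank_b():
        to_a.put(data_b[0])
        result["b"] = to_b.get()
    ta = threading.Thread(target=rank_a)
    tb = threading.Thread(target=rank_b)
    ta.start(); tb.start(); ta.join(); tb.join()
    return result

# Global address space: any rank may read ("get") remote data directly,
# whenever it needs it, with no matching send from the owner.
def pgas_style(global_array, my_lo, my_hi):
    left = global_array[my_lo - 1]    # one-sided get from left neighbour
    right = global_array[my_hi]       # one-sided get from right neighbour
    return left, right

shared = list(range(8))               # logically partitioned among ranks
print(mpi_style(shared[0:4], shared[4:8]))
print(pgas_style(shared, 2, 6))       # middle rank grabs whatever/whenever
```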
&lt;br /&gt;
&lt;br /&gt;
=== Hierarchical Programming Model ===&lt;br /&gt;
[[File:DEGAS-Hierarchical-PM.png|right|400px]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; Programmability of exascale applications while providing scalability, locality, energy efficiency, resilience, and portability&lt;br /&gt;
** &#039;&#039;Implicit constructs:&#039;&#039; parallel multidimensional loops, global distributed data structures, adaptation for performance heterogeneity&lt;br /&gt;
** &#039;&#039;Explicit constructs:&#039;&#039; asynchronous tasks, phaser synchronization, locality &lt;br /&gt;
&lt;br /&gt;
* Built on scalability, performance, and asynchrony of PGAS models&lt;br /&gt;
** Language experience from UPC, Habanero‐C, Co‐Array Fortran, Titanium&lt;br /&gt;
&lt;br /&gt;
* Both intra and inter‐node; focus is on node model&lt;br /&gt;
&lt;br /&gt;
* Languages demonstrate DEGAS programming model&lt;br /&gt;
** &#039;&#039;Habanero‐UPC:&#039;&#039; Habanero’s intra‐node model with UPC’s inter‐node model&lt;br /&gt;
** &#039;&#039;Hierarchical Co‐Array Fortran (CAF):&#039;&#039; CAF for on‐chip scaling and more&lt;br /&gt;
** &#039;&#039;Exploration of high level languages:&#039;&#039; E.g., Python extended with H‐PGAS&lt;br /&gt;
&lt;br /&gt;
* Language‐independent H‐PGAS Features:&lt;br /&gt;
** Hierarchical distributed arrays, asynchronous tasks, and compiler specialization for hybrid (task/loop) parallelism and heterogeneity&lt;br /&gt;
** Semantic guarantees for deadlock avoidance, determinism, etc.&lt;br /&gt;
** Asynchronous collectives, function shipping, and hierarchical places&lt;br /&gt;
** End‐to‐end support for asynchrony (messaging, tasking, bandwidth utilization through concurrency)&lt;br /&gt;
** Early concept exploration for applications and benchmarks&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Communication-Avoiding Compilers ===&lt;br /&gt;
[[File:DEGAS-Communication-Node.png|300px|right]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; massive parallelism, deep memory and network hierarchies, plus functional and performance heterogeneity&lt;br /&gt;
** &#039;&#039;&#039;Fine‐grained task and data parallelism:&#039;&#039;&#039; enable performance portability&lt;br /&gt;
** &#039;&#039;&#039;Heterogeneity:&#039;&#039;&#039; guided by functional, energy and performance characteristics&lt;br /&gt;
** &#039;&#039;&#039;Energy efficiency:&#039;&#039;&#039; minimize data movement and hooks to runtime adaptation&lt;br /&gt;
** &#039;&#039;&#039;Programmability:&#039;&#039;&#039; manage details of memory, heterogeneity, and containment&lt;br /&gt;
** &#039;&#039;&#039;Scalability:&#039;&#039;&#039; communication and synchronization hiding through asynchrony &lt;br /&gt;
&lt;br /&gt;
* H-PGAS into the Node&lt;br /&gt;
** Communication is all data movement&lt;br /&gt;
&lt;br /&gt;
* Build on code‐generation infrastructure&lt;br /&gt;
** ROSE for H‐CAF and Communication‐Avoidance optimizations&lt;br /&gt;
** BUPC and Habanero‐C; Zoltan&lt;br /&gt;
** Additional theory of CA code generation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Exascale Programming: Support for Future Algorithms ===&lt;br /&gt;
[[File:DEGAS-Algorithm.png|600px]]&lt;br /&gt;
* &#039;&#039;&#039;Approach:&#039;&#039;&#039; “Rethink” algorithms to optimize for data movement&lt;br /&gt;
** New class of communication‐optimal algorithms&lt;br /&gt;
** Most codes are not bandwidth limited, but many should be&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Challenges:&#039;&#039;&#039; How general are these algorithms?&lt;br /&gt;
** Can they be automated and for what types of loops?&lt;br /&gt;
** How much benefit is there in practice?&lt;br /&gt;
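The payoff of communication-optimal rethinking can be illustrated with a back-of-the-envelope model for matrix multiply; the machine parameters below are made-up assumptions, not project measurements:

```python
import math

# Illustrative model: words moved between slow and fast memory for an
# n x n matrix multiply, given a fast memory ("cache") of M words.

def words_naive(n):
    # ijk loop with no reuse in fast memory: every A/B element is
    # reloaded for each use -> about 2*n**3 reads plus n**2 writes.
    return 2 * n**3 + n**2

def words_blocked(n, M):
    # Classic blocking with b x b tiles, three tiles resident at once:
    # b = sqrt(M/3). Each of the (n/b)**3 tile-multiplies loads ~3*b**2
    # words, giving Theta(n**3 / sqrt(M)) traffic -- the known lower bound.
    b = max(1, int(math.sqrt(M / 3)))
    tiles = math.ceil(n / b)
    return 3 * b * b * tiles**3

n, M = 512, 3 * 64 * 64   # 512x512 matrices, 12K-word fast memory
print(words_naive(n))      # ~2.7e8 words
print(words_blocked(n, M)) # ~6.3e6 words -- roughly 40x less traffic
```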
&lt;br /&gt;
&lt;br /&gt;
=== Adaptive Runtime Systems (ARTS) ===&lt;br /&gt;
[[File:DEGAS-Infiniband-Throughput.png|right|400px]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; Adaptive runtime for manycore systems that are hierarchical, heterogeneous and provide asymmetric performance&lt;br /&gt;
** &#039;&#039;&#039;Reactive and proactive control:&#039;&#039;&#039; for utilization and energy efficiency&lt;br /&gt;
** &#039;&#039;&#039;Integrated tasking and communication:&#039;&#039;&#039; for hybrid programming&lt;br /&gt;
** &#039;&#039;&#039;Sharing of hardware threads:&#039;&#039;&#039; required for library interoperability&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Novelty:&#039;&#039;&#039; Scalable control; integrated tasking with communication&lt;br /&gt;
** &#039;&#039;&#039;Adaptation:&#039;&#039;&#039; Runtime annotated with performance history/intentions&lt;br /&gt;
** &#039;&#039;&#039;Performance models:&#039;&#039;&#039; Guide runtime optimizations, specialization&lt;br /&gt;
** &#039;&#039;&#039;Hierarchical:&#039;&#039;&#039; Resource/energy&lt;br /&gt;
** &#039;&#039;&#039;Tunable control:&#039;&#039;&#039; Locality/load balance&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Leverages:&#039;&#039;&#039; Existing runtimes&lt;br /&gt;
** &#039;&#039;&#039;Lithe&#039;&#039;&#039; scheduler composition; &#039;&#039;&#039;Juggle&#039;&#039;&#039;&lt;br /&gt;
** &#039;&#039;&#039;BUPC and Habanero‐C&#039;&#039;&#039; runtimes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Synchronization Avoidance vs Resource Management ===&lt;br /&gt;
&lt;br /&gt;
[[File:DEGAS-Resource-Mgmt.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Management of critical resources will be more important:&lt;br /&gt;
** &#039;&#039;Memory and network bandwidth limited&#039;&#039; by cost and energy&lt;br /&gt;
** &#039;&#039;Capacity limited at many levels:&#039;&#039; network buffers at interfaces, internal network congestion are real and growing problems&lt;br /&gt;
&lt;br /&gt;
* Can runtimes manage these or do users need to help?&lt;br /&gt;
** Adaptation based on history and (user‐supplied) intent?&lt;br /&gt;
** Where will bottlenecks be for a given architecture and application?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Lithe Scheduling Abstraction: &amp;quot;Harts&amp;quot; (Hardware Threads) ===&lt;br /&gt;
[[File:DEGAS-Harts.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Lightweight Communication (GASNet-EX) ===&lt;br /&gt;
[[File:DEGAS-GASNet.png|right]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; Maximize bandwidth use with lightweight communication&lt;br /&gt;
** &#039;&#039;&#039;One‐sided communication:&#039;&#039;&#039; to avoid over‐synchronization&lt;br /&gt;
** &#039;&#039;&#039;Active‐Messages:&#039;&#039;&#039; for productivity and portability&lt;br /&gt;
** &#039;&#039;&#039;Interoperability:&#039;&#039;&#039; with MPI and threading layers&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Novelty:&#039;&#039;&#039;&lt;br /&gt;
** Congestion management: for 1‐sided communication with ARTS&lt;br /&gt;
** Hierarchical: communication management for H‐PGAS&lt;br /&gt;
** Resilience: globally consistent states and fine‐grained fault recovery&lt;br /&gt;
** Progress: new models for scalability and interoperability&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Leverage GASNet&#039;&#039;&#039; (redesigned):&lt;br /&gt;
** Major changes for on‐chip interconnects&lt;br /&gt;
** Each network has unique opportunities&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Resilience through Containment Domains ===&lt;br /&gt;
[[File:DEGAS-Resilience.png|right]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; Provide a resilient runtime for PGAS applications&lt;br /&gt;
** Applications should be able to customize resilience to their needs&lt;br /&gt;
** Resilient runtime that provides easy‐to‐use mechanisms&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Novelty:&#039;&#039;&#039; Single analyzable abstraction for resilience&lt;br /&gt;
** PGAS Resilience consistency model&lt;br /&gt;
** Directed and hierarchical preservation&lt;br /&gt;
** Global or localized recovery&lt;br /&gt;
** Algorithm and system‐specific detection, elision, and recovery&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Leverage:&#039;&#039;&#039; Combined superset of prior approaches&lt;br /&gt;
** Fast checkpoints for large bulk updates&lt;br /&gt;
** Journal for small frequent updates&lt;br /&gt;
** Hierarchical checkpoint‐restart&lt;br /&gt;
** OS‐level save and restore&lt;br /&gt;
** Distributed recovery&lt;br /&gt;
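The containment-domain pattern can be sketched as follows; the class and method names here are our own illustration, not the project's API:

```python
# Hypothetical sketch of the containment-domain pattern: preserve state
# on entry, detect errors on exit, and recover *locally* by re-executing
# only the failed domain rather than rolling back globally.
import copy

class ContainmentDomain:
    def __init__(self, name, state, body, check, max_retries=3):
        self.name, self.state = name, state
        self.body, self.check = body, check
        self.max_retries = max_retries

    def run(self):
        preserved = copy.deepcopy(self.state)      # cheap local preservation
        for attempt in range(self.max_retries):
            try:
                result = self.body(self.state)
                if self.check(result):             # domain-specific detection
                    return result
            except RuntimeError:
                pass                               # fault detected
            self.state = copy.deepcopy(preserved)  # localized recovery:
                                                   # restore, retry this domain
        raise RuntimeError(self.name + ": unrecoverable after retries")

# Demo: a body that "fails" on its first execution (one transient fault).
faults = {"left": 1}
def body(state):
    if faults.get("left", 0) > 0:
        faults["left"] -= 1
        raise RuntimeError("transient fault")
    state["sum"] = sum(state["data"])
    return state["sum"]

cd = ContainmentDomain("left", {"data": [1, 2, 3]}, body, lambda r: r == 6)
print(cd.run())  # recovered without a global rollback
```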
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Resilience: Research Questions&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1. How to define consistent (i.e. allowable) states in the PGAS model?&lt;br /&gt;
* Theory well understood for fail‐stop message‐passing, but not PGAS.&lt;br /&gt;
&lt;br /&gt;
2. How do we discover consistent states once we&#039;ve defined them?&lt;br /&gt;
* Containment domains offer a new approach, beyond conventional sync-and‐stop algorithms.&lt;br /&gt;
&lt;br /&gt;
3. How do we reconstruct consistent states after a failure?&lt;br /&gt;
* Explore low overhead techniques that minimize effort required by applications programmers.&lt;br /&gt;
* Leverage BLCR, GASnet, Berkeley UPC for development, and use Containment Domains as prototype API for requirements discovery&lt;br /&gt;
[[File:DEGAS-Resilience-Research-Area.png|300px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Energy and Performance Feedback ===&lt;br /&gt;
[[File:DEGAS-Nvidia-graph.png|right|300px]]&lt;br /&gt;
* &#039;&#039;&#039;Goal:&#039;&#039;&#039; Monitoring and feedback of performance and energy for online and offline optimization&lt;br /&gt;
** Collect and distill: performance/energy/timing data&lt;br /&gt;
** Identify and report bottlenecks: through summarization/visualization&lt;br /&gt;
** Provide mechanisms: for autonomous runtime adaptation&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Novelty:&#039;&#039;&#039; Automated runtime introspection&lt;br /&gt;
** Provide monitoring: power/network utilization&lt;br /&gt;
** Machine Learning: identify common characteristics&lt;br /&gt;
** Resource management: including dark silicon&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Leverage:&#039;&#039;&#039; Performance/energy counters&lt;br /&gt;
** Integrated Performance Monitoring (IPM)&lt;br /&gt;
** Roofline formalism&lt;br /&gt;
** Performance/energy counters&lt;br /&gt;
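The Roofline formalism cited above reduces to a one-line model; the peak numbers below are hypothetical machine parameters used only for illustration:

```python
# Minimal sketch of the Roofline model: attainable performance is capped
# either by peak compute or by memory bandwidth times arithmetic intensity.
def roofline(peak_gflops, peak_bw_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s = min(peak compute, bandwidth * intensity)."""
    return min(peak_gflops, peak_bw_gbs * intensity_flops_per_byte)

PEAK, BW = 100.0, 50.0          # 100 GFLOP/s, 50 GB/s (hypothetical node)
ridge = PEAK / BW               # intensity where the two ceilings meet
print(roofline(PEAK, BW, 0.5))  # bandwidth-bound kernel
print(roofline(PEAK, BW, 8.0))  # compute-bound kernel
```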
&lt;br /&gt;
&lt;br /&gt;
== Software Stack ==&lt;br /&gt;
[[File:DEGAS-Software-Stack.png|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== DEGAS Pieces of the Puzzle ==&lt;br /&gt;
[[File:DEGAS-Puzzle.png|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== [http://crd.lbl.gov/assets/Uploads/FTG/Projects/DEGAS/DEGAS-products-April2016.pdf Products] from DEGAS research (as of 04/2016) ==&lt;br /&gt;
&lt;br /&gt;
== [http://crd.lbl.gov/departments/computer-science/CLaSS/research/DEGAS/degas-software-releases Software Releases] ==&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=D-%C2%ADTEC&amp;diff=5395</id>
		<title>D-TEC</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=D-%C2%ADTEC&amp;diff=5395"/>
		<updated>2023-07-10T04:49:37Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = D-TEC&lt;br /&gt;
| image = [[File:DTEC-Logos.png|350px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [https://www.llnl.gov/ LLNL], [http://www.mit.edu/ MIT], [http://www.rice.edu/ Rice U.], [http://www.ibm.com/ IBM], [http://www.osu.edu/ OSU], [http://www.berkeley.edu/ UC Berkeley], [https://www.uoregon.edu/ U. of Oregon], [http://www.lbl.gov/ LBNL], [http://www.ucsd.edu/ UC San Diego]&lt;br /&gt;
| pi = [[Dan Quinlan]]&lt;br /&gt;
| co-pi = ???&lt;br /&gt;
| website = http://dtec-xstack.org/&lt;br /&gt;
| download = http://dtec-xstack.org/?page_id=4&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;DSL Technology for Exascale Computing&#039;&#039;&#039; or &#039;&#039;&#039;D-TEC&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Domain Specific Languages (DSLs)&#039;&#039;&#039; are a transformational technology that captures expert knowledge about application domains. For the domain scientist, the DSL provides a view of the high‐level programming model. The DSL compiler captures expert knowledge about how to map high‐level abstractions to different architectures. The DSL compiler’s analysis and transformations are complemented by the general compiler analysis and transformations shared by general purpose languages.&lt;br /&gt;
&lt;br /&gt;
* There are different types of DSLs:&lt;br /&gt;
** Embedded DSLs: Have custom compiler support for high level abstractions defined in a host language (abstractions defined via a library, for example)&lt;br /&gt;
** General DSLs (syntax extended): Have their own syntax and grammar; can be full languages, but defined to address a narrowly defined domain&lt;br /&gt;
&lt;br /&gt;
* DSL design is a responsibility shared between application domain and algorithm scientists&lt;br /&gt;
&lt;br /&gt;
* Extraction of abstractions requires significant application and algorithm expertise&lt;br /&gt;
&lt;br /&gt;
* We have an application team to:&lt;br /&gt;
** provide expertise that will ground our DSL research&lt;br /&gt;
** ensure its relevance to DOE and enable impact by the end of three years&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [https://www.llnl.gov/ Lawrence Livermore National Laboratory (LLNL)]: Daniel J. Quinlan (Lead PI)&lt;br /&gt;
* [http://www.mit.edu/ Massachusetts Institute of Technology (MIT)]: Saman Amarasinghe, Armando Solar‐Lezama, Adam Chlipala, Srinivas Devadas, Una‐May O’Reilly, Nir Shavit, Youssef Marzouk&lt;br /&gt;
* [http://www.rice.edu/ Rice University]: John Mellor‐Crummey, Vivek Sarkar&lt;br /&gt;
* [http://www.ibm.com/ IBM]: Vijay Saraswat, David Grove&lt;br /&gt;
* [http://www.osu.edu/ Ohio State University (OSU)]: P. Sadayappan, Atanas Rountev&lt;br /&gt;
* [http://www.berkeley.edu/ University of California, Berkeley]: Ras Bodik&lt;br /&gt;
* [https://www.uoregon.edu/ University of Oregon]: Craig Rasmussen&lt;br /&gt;
* [http://www.lbl.gov/ Lawrence Berkeley National Laboratory (LBNL)]: Phil Colella&lt;br /&gt;
* [http://www.ucsd.edu/ University of California, San Diego]: Scott Baden&lt;br /&gt;
&lt;br /&gt;
See [[Media:dtec-contacts.xlsx]] for team member contact information.&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
*[https://xstackwiki.modelado.org/images/9/99/D-TEC_Summary_Highlight.pdf D-TEC Project Impact]&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
=== Goals and Objectives ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;D‐TEC Goal: Making DSLs Effective for Exascale&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* We address all parts of the Exascale Stack:&lt;br /&gt;
** &#039;&#039;&#039;Languages (DSLs):&#039;&#039;&#039; define and build several DSLs economically&lt;br /&gt;
** &#039;&#039;&#039;Compilers:&#039;&#039;&#039; define and demonstrate the analysis and optimizations required to build DSLs&lt;br /&gt;
** &#039;&#039;&#039;Parameterized Abstract Machine:&#039;&#039;&#039; define how the hardware is evaluated to provide inputs to the compiler and runtime&lt;br /&gt;
** &#039;&#039;&#039;Runtime System:&#039;&#039;&#039; define a runtime system and resource management support for DSLs&lt;br /&gt;
** &#039;&#039;&#039;Tools:&#039;&#039;&#039; design and use tools to communicate to specific levels of abstraction in the DSLs&lt;br /&gt;
&lt;br /&gt;
* We will provide effective performance by addressing exascale challenges:&lt;br /&gt;
** &#039;&#039;&#039;Scalability:&#039;&#039;&#039; deeply integrated with state‐of‐art X10 scaling framework&lt;br /&gt;
** &#039;&#039;&#039;Programmability:&#039;&#039;&#039; build DSLs around high levels of abstraction for specific domains&lt;br /&gt;
** &#039;&#039;&#039;Performance Portability:&#039;&#039;&#039; DSL compilers give greater flexibility to the code generation for diverse architectures&lt;br /&gt;
** &#039;&#039;&#039;Resilience:&#039;&#039;&#039; define compiler and runtime technology to make code resilient&lt;br /&gt;
** &#039;&#039;&#039;Energy Efficiency:&#039;&#039;&#039; machine learning and autotuning will drive energy efficiency&lt;br /&gt;
** &#039;&#039;&#039;Correctness:&#039;&#039;&#039; formal methods technologies required to verify DSL transformations&lt;br /&gt;
** &#039;&#039;&#039;Heterogeneity:&#039;&#039;&#039; demonstrate how to automatically generate lower level multi‐ISA code&lt;br /&gt;
&lt;br /&gt;
* Our approach includes interoperability and a migration strategy:&lt;br /&gt;
** &#039;&#039;&#039;Interoperability with MPI + X:&#039;&#039;&#039; demonstrate embedding of DSLs into MPI + X applications&lt;br /&gt;
** &#039;&#039;&#039;Migration for Existing Code:&#039;&#039;&#039; demonstrate source‐to-source technology to migrate existing code&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Poster ===&lt;br /&gt;
&lt;br /&gt;
[[File:D-TEC_poster_X-Stack_kickoff_v3.png|border|500px]]&lt;br /&gt;
&lt;br /&gt;
=== Quad Chart ===&lt;br /&gt;
&lt;br /&gt;
Download [[Media:DTEC-Quad Chart_and_Highlight_v6.pdf]].&lt;br /&gt;
&lt;br /&gt;
=== Two Pager ===&lt;br /&gt;
&lt;br /&gt;
Download [[Media:D-TEC_2013_TwoPager_v1.pdf]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== The D‐TEC approach addresses the full Exascale workflow ==&lt;br /&gt;
[[File:DTEC-workflow.png|600px]]&lt;br /&gt;
&lt;br /&gt;
* Discovery of domain specific abstractions from proxy‐apps by application and algorithm experts&lt;br /&gt;
&lt;br /&gt;
* (C1 &amp;amp; C2) Defining Domain Specific Languages (DSLs)&lt;br /&gt;
** The role of the DSL is to encapsulate expert knowledge&lt;br /&gt;
*** About the problem domain&lt;br /&gt;
*** The DSL compiler encapsulates how to optimize code for that domain on new architectures&lt;br /&gt;
** Rosebud used to define DSLs &#039;&#039;(a novel framework for joint optimization of mixed DSLs)&#039;&#039;&lt;br /&gt;
*** DSL specification is used to generate a &amp;quot;DSL plug‐in” for Rosebud&#039;s DSL compiler&lt;br /&gt;
*** Supports both embedded and general DSLs and multiple DSLs in one host‐language source file&lt;br /&gt;
*** DSL optimization is done via cost‐based search over the space of possible rewritings&lt;br /&gt;
*** Costs are domain‐specific, based on shared abstract machine model + ROSE analysis results&lt;br /&gt;
*** Cross‐DSL optimization occurs naturally via search of combined rewriting space&lt;br /&gt;
** Sketching used to define DSLs &#039;&#039;(cutting‐edge aspect of our proposal)&#039;&#039;&lt;br /&gt;
*** A series of manual refinement steps (code rewrites) defines the transformations&lt;br /&gt;
*** Equivalence checking between steps to verify correctness&lt;br /&gt;
*** The series of transformations define the DSL compiler using ROSE&lt;br /&gt;
*** Machine learning is used to drive optimizations&lt;br /&gt;
** Both approaches will leverage the common ROSE infrastructure&lt;br /&gt;
** Both approaches will leverage the SEEC enhanced X10 runtime system&lt;br /&gt;
&lt;br /&gt;
* (C3) DSL Compiler&lt;br /&gt;
** Leverages ROSE compiler throughout&lt;br /&gt;
&lt;br /&gt;
* (C4) Parameterized Abstract Machine&lt;br /&gt;
** Extraction of machine characteristics&lt;br /&gt;
&lt;br /&gt;
* (C5) Runtime System&lt;br /&gt;
** Leverages X10 and extends it with SEEC support&lt;br /&gt;
&lt;br /&gt;
* (C6) Tools&lt;br /&gt;
** We will define source‐to‐source migration tools&lt;br /&gt;
** We will define the mappings between DSL layers to support future tools&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Software Stack ==&lt;br /&gt;
&lt;br /&gt;
=== Rosebud ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Rosebud Overview&#039;&#039;&#039;&lt;br /&gt;
* Unified framework for DSL implementation&lt;br /&gt;
** all aspects: parsing, analysis, optimization, code generation&lt;br /&gt;
** all types: embedded, custom‐syntax, standalone&lt;br /&gt;
&lt;br /&gt;
* Modular development and use of DSLs&lt;br /&gt;
** textual DSL description =&amp;gt; plug‐in to ROSE DSL Compiler&lt;br /&gt;
** plug‐ins developed separately from ROSE and each other&lt;br /&gt;
&lt;br /&gt;
* Knowledge‐based optimization of DSL programs&lt;br /&gt;
** plug‐in encapsulates expert optimization knowledge&lt;br /&gt;
** ROSE supplies conventional compiler optimizations&lt;br /&gt;
&lt;br /&gt;
* Flexible code generation&lt;br /&gt;
** DSL lowered to any ROSE host language&lt;br /&gt;
** DSL compiled directly to (portable) machine code via LLVM&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Rosebud Implementation&#039;&#039;&#039;&lt;br /&gt;
* DSL front end&lt;br /&gt;
** SGLR parser + predefined host‐language grammars&lt;br /&gt;
** attribute grammar + ROSE extensible AST and analysis&lt;br /&gt;
&lt;br /&gt;
* DSL optimizer&lt;br /&gt;
** declarative rewriting system + procedural hooks to ROSE&lt;br /&gt;
** cost‐based heuristic search of implementation space&lt;br /&gt;
** domain‐specific costs based on abstract machine model&lt;br /&gt;
** cross‐DSL optimization arises naturally from joint search space&lt;br /&gt;
&lt;br /&gt;
* DSL code generator&lt;br /&gt;
** ROSE host language unparsers&lt;br /&gt;
** ROSE AST =&amp;gt; LLVM SSA code&lt;br /&gt;
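The cost-based heuristic search at the heart of the optimizer can be sketched with a toy rewriting system; the rules and costs below are our own invented example, not Rosebud's:

```python
# Toy illustration of cost-based rewrite search: greedily apply rewrite
# rules while the abstract cost model says the program gets cheaper.

# A "program" is a tuple of ops; each rule maps one op sequence to another.
RULES = [
    (("load", "load", "add"), ("fused_load_add",)),  # fuse loads with add
    (("mul", "add"), ("fma",)),                      # contract to an FMA
]
COST = {"load": 4, "add": 1, "mul": 2, "fma": 2, "fused_load_add": 5}

def cost(prog):
    return sum(COST[op] for op in prog)

def rewrite_once(prog):
    """Yield every program reachable by one rule application."""
    for lhs, rhs in RULES:
        for i in range(len(prog) - len(lhs) + 1):
            if prog[i:i + len(lhs)] == lhs:
                yield prog[:i] + rhs + prog[i + len(lhs):]

def optimize(prog):
    """Greedy descent over the rewriting space, guided by the cost model."""
    while True:
        best = min(rewrite_once(prog), key=cost, default=None)
        if best is None or cost(best) >= cost(prog):
            return prog
        prog = best

p = ("load", "load", "add", "mul", "add")
print(optimize(p), cost(optimize(p)))
```

A production system would search more of the space (and fold in ROSE analysis results as the text describes), but the cost-guided descent is the same shape.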
&lt;br /&gt;
&#039;&#039;&#039;Rosebud Plug-ins&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Rosebud-Plug-ins.png|600px]]&lt;br /&gt;
&lt;br /&gt;
* Plug‐ins developed separately from ROSE and each other&lt;br /&gt;
* Plug‐ins distributed in source or object form&lt;br /&gt;
* Selected plug‐ins supplied to Rosebud DSL Compiler to compile mixed DSLs in a host language source file&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Rosebud DSL Compiler&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Rosebud-DSL-Compiler.png]]&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Rosebud-Parsing.png|right]]&lt;br /&gt;
&#039;&#039;&#039;Two‐phase parsing for DSL language support&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Host language + multiple DSLs in the same source file&lt;br /&gt;
** expressive custom notations&lt;br /&gt;
** familiar general-purpose language&lt;br /&gt;
&lt;br /&gt;
* Phase 1: extract and parse DSLs&lt;br /&gt;
** via Stratego SDF parsing system&lt;br /&gt;
&lt;br /&gt;
* Phase 2: parse host language&lt;br /&gt;
** via existing ROSE front ends&lt;br /&gt;
&lt;br /&gt;
* Merge DSL tree fragments into host language AST&lt;br /&gt;
** DSL plug-ins provide custom tree nodes and semantic analysis&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== LOPe Programming Model ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;The LOPe programming model is easily expressed in Fortran because of its array syntax&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Halo attribute added to arrays&lt;br /&gt;
** HALO(1:*:1, 1:*:1)&lt;br /&gt;
** specifies one border cell on each side of a two‐dimensional array&lt;br /&gt;
** * implies &amp;quot;stuff&amp;quot; in the middle&lt;br /&gt;
[[File:DTEC-Halo-1.png|100px]]&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Halo-2.png|right]]&lt;br /&gt;
* Halos are logical cells not necessarily physically part of the array&lt;br /&gt;
&lt;br /&gt;
* Halos can be communicated with coarrays&lt;br /&gt;
** DIMENSION(:,:)[:,:]&lt;br /&gt;
** halo region in pink&lt;br /&gt;
** logically extends to neighbor processors&lt;br /&gt;
** exchange_halo(Array)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;The LOPe programming model is easily expressed in Fortran because of its concurrency syntax&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Concurrent attribute added to procedures&lt;br /&gt;
** restricted semantics for array element access to avoid race conditions&lt;br /&gt;
** copy halo in, write single element out (visible after all threads exit)&lt;br /&gt;
[[File:DTEC-Code-1.png]]&lt;br /&gt;
&lt;br /&gt;
* Called from within a &#039;&#039;&#039;DO CONCURRENT&#039;&#039;&#039; loop&lt;br /&gt;
[[File:DTEC-Code-2.png]]&lt;br /&gt;
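The halo-plus-concurrent-kernel pattern above can be sketched in Python for readers without Fortran at hand; this is our analogy, not LOPe itself:

```python
# Sketch of the LOPe pattern: a 2D array carries a one-cell halo
# (as in HALO(1:*:1, 1:*:1)); the "concurrent" kernel may read its halo
# but writes only its own single element, so iterations are race-free.

def make_grid(n, fill=0.0):
    # interior is n x n; indices 0 and n+1 hold the halo cells
    return [[fill] * (n + 2) for _ in range(n + 2)]

def exchange_halo(grid):
    """Periodic halo exchange (stand-in for coarray communication)."""
    n = len(grid) - 2
    for i in range(n + 2):
        grid[i][0], grid[i][n + 1] = grid[i][n], grid[i][1]
    grid[0][:] = grid[n][:]
    grid[n + 1][:] = grid[1][:]

def laplace_step(grid):
    """The DO CONCURRENT body: read halo, write one element per iteration."""
    n = len(grid) - 2
    out = make_grid(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):       # each (i, j) is independent
            out[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j]
                                + grid[i][j-1] + grid[i][j+1])
    return out

g = make_grid(4)
g[2][2] = 4.0
exchange_halo(g)
g = laplace_step(g)
print(g[1][2], g[2][1], g[2][3], g[3][2])  # the point diffuses outward
```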
&lt;br /&gt;
&#039;&#039;&#039;Transformation (via ROSE) of a LOPe program to OpenCL allows execution on a GPU&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Transformation.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compiler Research is essential for DSLs (C3)&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The DSL compiler captures expert knowledge about how to map high‐level abstractions to different architectures, and is complemented by the general compiler analyses and transformations shared with general‐purpose language compilers. Architecture‐specific features are reasoned about through machine learning and/or a parameterized abstract machine model that can be tailored to different machines.&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-C3.png|right]]&lt;br /&gt;
&lt;br /&gt;
* We will leverage existing technologies:&lt;br /&gt;
** Source‐to‐source technology in ROSE (LLNL and Rice)&lt;br /&gt;
** X10 front‐end for connection to ROSE (IBM) &lt;br /&gt;
** LLVM as low level IR in ROSE (LLNL and Rice)&lt;br /&gt;
** Polyhedral analysis to support optimizations (OSU)&lt;br /&gt;
** Machine learning to drive optimizations (MIT)&lt;br /&gt;
** Correctness checking (MIT and UCB)&lt;br /&gt;
&lt;br /&gt;
*We will develop new technologies:&lt;br /&gt;
** Rosebud DSL specification &lt;br /&gt;
** DSL specific analysis and optimizations&lt;br /&gt;
** Automated DSL compiler generation&lt;br /&gt;
** X10 support in ROSE&lt;br /&gt;
** Define mappings between DSL layers to compiler analysis&lt;br /&gt;
** Refinement using equivalence checking&lt;br /&gt;
** Verification for transformations&lt;br /&gt;
&lt;br /&gt;
* We will advance the state‐of‐the‐art:&lt;br /&gt;
** Formal methods use for HPC&lt;br /&gt;
** Generation of DSLs for productivity and performance portability&lt;br /&gt;
** Extending/Using polyhedral analysis to drive code generation for heterogeneous architectures&lt;br /&gt;
&lt;br /&gt;
* Exascale challenges:&lt;br /&gt;
** Scalability: code generation for X10/SEEC and Scalable Data Structures, program synthesis&lt;br /&gt;
** Programmability: two approaches to DSL construction, automated equivalence checking&lt;br /&gt;
** Performance Portability: Using parameterized abstract machines, machine learning, auto‐tuning of refinement search spaces&lt;br /&gt;
** Resilience: Compiler‐based software TMR&lt;br /&gt;
** Energy Efficiency: using machine learning&lt;br /&gt;
&lt;br /&gt;
* Interoperability and Migration Plans:&lt;br /&gt;
** Interoperability: A single compiler IR supports reusing analysis and transformations&lt;br /&gt;
** Migration Plan: Using source‐to‐source technology permits leveraging the vendor’s compiler&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Preliminary Experimental Results (Habanero Hierarchical Place Tree)&#039;&#039;&#039;&lt;br /&gt;
* Actual hardware: four quad-core Xeon sockets; each socket contains two core-pairs; each core-pair shares an L2 cache&lt;br /&gt;
* Possible abstract machine models:&lt;br /&gt;
** Use the Habanero Hierarchical Place Tree (HPT) abstraction for these results&lt;br /&gt;
** Experiment with three HPT abstractions of same hardware:&lt;br /&gt;
*** 1x16 --- one root place with 16 leaf places &amp;lt;This model focuses on L1 Cache locality&amp;gt;&lt;br /&gt;
*** 8x2 --- 8 non-leaf places, each of which has 2 leaf places &amp;lt;This model focuses on the L2 cache shared by a core-pair&amp;gt;&lt;br /&gt;
*** 16x1 --- like 1x16, except that it ignores the root place&lt;br /&gt;
* Preliminary execution times for SOR2D (size C) on above hardware underscore the importance of selecting the right abstraction for a given application-platform combination&lt;br /&gt;
** 1x16 --- 1.14 seconds&lt;br /&gt;
** 8x2 --- 0.61 seconds&lt;br /&gt;
** 16x1 --- 1.90 seconds&lt;br /&gt;
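The gaps are large; computing the ratios from the times quoted above:

```python
# Relative slowdown of each HPT abstraction, from the SOR2D (size C)
# timings quoted in the text.
times = {"1x16": 1.14, "8x2": 0.61, "16x1": 1.90}
best = min(times, key=times.get)
for hpt, t in times.items():
    print("{}: {:.2f}x relative to {}".format(hpt, t / times[best], best))
```

The worst choice (16x1) is more than 3x slower than the best (8x2), which matches the L2-sharing structure of the hardware.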
&lt;br /&gt;
&lt;br /&gt;
=== X10 ===&lt;br /&gt;
&#039;&#039;&#039;Current X10 Runtime Software Stack&#039;&#039;&#039;&lt;br /&gt;
[[File:DTEC-X10-1.png|right|300px]]&lt;br /&gt;
* Core Class Libraries&lt;br /&gt;
** Fundamental classes &amp;amp; primitives, Arrays, core I/O, collections, etc&lt;br /&gt;
** Written in X10; compiled to C++ or Java&lt;br /&gt;
&lt;br /&gt;
* XRX (X10 Runtime in X10)&lt;br /&gt;
** APGAS functionality&lt;br /&gt;
*** Concurrency: async/finish&lt;br /&gt;
*** Distribution: Places/at&lt;br /&gt;
** Written in X10; compiled to C++ or Java&lt;br /&gt;
&lt;br /&gt;
* X10 Language Native Runtime&lt;br /&gt;
** Runtime support for core sequential X10 language features&lt;br /&gt;
** Two versions: C++ and Java&lt;br /&gt;
&lt;br /&gt;
* X10RT&lt;br /&gt;
** Active messages, collectives, bulk data transfer&lt;br /&gt;
** Implemented in C&lt;br /&gt;
** Abstracts/unifies network layers (PAMI, DCMF, MPI, etc) to enable X10 on a range of transports&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Leveraging X10 Runtime for Native Applications&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-X10-2.png|right|300px]]&lt;br /&gt;
&#039;&#039;Native APGAS API&#039;&#039;&lt;br /&gt;
* Provides C++/C APIs to APGAS functionality of X10 Runtime&lt;br /&gt;
** Concurrency: async/finish&lt;br /&gt;
** Distribution: Places/at&lt;br /&gt;
&lt;br /&gt;
* Additionally exposes subset of X10RT APIs for use by native applications&lt;br /&gt;
** Collective operations&lt;br /&gt;
** One‐sided active messages&lt;br /&gt;
&lt;br /&gt;
* Allows non‐X10 applications to leverage X10 runtime facilities via a library interface&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Scalability of X10 Runtime&#039;&#039;&#039;&lt;br /&gt;
* Scalability&lt;br /&gt;
** X10 programs have achieved good scaling at &amp;gt; 32k cores on P7IH (PERCS) and up to 8k cores on BlueGene/P&lt;br /&gt;
&lt;br /&gt;
* Support for Intra‐node scalability&lt;br /&gt;
** async/finish enable high‐level programming of fine‐grained concurrency&lt;br /&gt;
** Advanced features (clocks, collecting finish) support determinate programming of common concurrency idioms&lt;br /&gt;
** Workstealing implementation: both Fork/Join &amp;amp; Cilk‐style&lt;br /&gt;
** APGAS programming model extended to GPUs&lt;br /&gt;
*** X10 kernels can be compiled to CUDA&lt;br /&gt;
*** compiler‐mediated data/control transfer between CPU/GPU&lt;br /&gt;
&lt;br /&gt;
* Support for Inter‐node scalability&lt;br /&gt;
** Places/at; collectives; one‐sided active messages; asynchronous bulk data transfer APIs&lt;br /&gt;
** Utilizes available transports (PAMI, DCMF, MPI)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SEEC Runtime ===&lt;br /&gt;
* Understands high‐level goals&lt;br /&gt;
** E.g., performance, accuracy, power&lt;br /&gt;
&lt;br /&gt;
* Makes observations&lt;br /&gt;
** Is app on current machine meeting goals?&lt;br /&gt;
&lt;br /&gt;
* Understands actions&lt;br /&gt;
** Provided by optimization management, the machine, and uncertainty quantification&lt;br /&gt;
&lt;br /&gt;
* Makes decisions about how to take action given goals and current observations&lt;br /&gt;
** Uses control theory, machine learning, and possibly game theory&lt;br /&gt;
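The observe-decide-act loop above can be sketched with a toy proportional controller; names and the linear toy system are illustrative assumptions, not SEEC's actual implementation.

```python
# Hypothetical observe/decide/act loop in the spirit of SEEC (all names illustrative).
def seec_loop(goal, measure, actuate, gain=0.5, steps=20):
    """Each iteration: observe progress, compare to the goal, adjust an actuator."""
    setting = 1.0
    for _ in range(steps):
        observed = measure(setting)      # observation: is the goal being met?
        error = goal - observed          # deviation from the high-level goal
        setting += gain * error          # simple proportional control decision
        setting = actuate(setting)       # action: clamp to feasible settings
    return setting

# Toy system: performance scales linearly with the setting (e.g. core count).
final = seec_loop(goal=8.0,
                  measure=lambda s: 2.0 * s,
                  actuate=lambda s: max(0.0, min(s, 16.0)))
# final converges to the setting that meets the goal (4.0 for this toy system)
```

SEEC's decision engine draws on control theory and machine learning rather than a fixed gain, but the observe/decide/act structure is the same.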
&lt;br /&gt;
[[File:DTEC-SEEC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Polyhedral Compiler Transformations ===&lt;br /&gt;
[[File:DTEC-Polyhedral.png|right|300px]]&lt;br /&gt;
&lt;br /&gt;
* Advantages over standard AST‐based compiler frameworks&lt;br /&gt;
** Seamless support of imperfectly nested loops&lt;br /&gt;
** Handle symbolic loop bounds&lt;br /&gt;
** Powerful uniform model for composition of transformations&lt;br /&gt;
** Model‐driven optimization using the power of integer linear programming&lt;br /&gt;
&lt;br /&gt;
* Work planned on D‐TEC project&lt;br /&gt;
** Leverage/integrate DSL properties in the optimization process&lt;br /&gt;
** Expose API for analysis and semantics‐preserving transformations of programs&lt;br /&gt;
** Multi‐target code generation using domain semantics and architecture characteristics&lt;br /&gt;
** Communication optimization using high‐level semantic information&lt;br /&gt;
** Address challenges in applying polyhedral transformations to complex DOE application codes&lt;br /&gt;
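As a concrete flavor of the semantics-preserving transformations involved, here is a hand-written tiling of a 1-D stencil loop with symbolic bounds; a polyhedral framework such as PolyOpt derives schedules like this automatically from the iteration domain rather than by hand.

```python
# Same iteration domain [1, n-2], different schedule, same result.

def smooth(a):
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):                # original schedule
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return b

def smooth_tiled(a, tile=4):
    n = len(a)
    b = [0.0] * n
    for t in range(1, n - 1, tile):          # tile loop over the same domain
        for i in range(t, min(t + tile, n - 1)):
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return b

data = [float(i * i) for i in range(10)]
assert smooth(data) == smooth_tiled(data)    # transformation preserves semantics
```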
&lt;br /&gt;
&#039;&#039;&#039;Multi‐target Domain‐specialized Code Generation&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Code-Generation.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== MIT Sketch ===&lt;br /&gt;
&#039;&#039;&#039;MIT Sketch: how does it work&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Synthesis engine works by elimination&lt;br /&gt;
** Partial implementation defines space of possible solutions&lt;br /&gt;
** Classes of incorrect solutions are eliminated by analyzing why particular incorrect solutions failed&lt;br /&gt;
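The elimination idea can be sketched as a toy counterexample-guided loop (illustrative only; Sketch's engine works symbolically over much larger spaces). A partial implementation with a hole defines the candidate space; each failing input eliminates every candidate that fails on it.

```python
# Partial implementation: f(x) = x * ??, with an unknown constant hole.
# Spec: f(x) should equal x + x for all x in the domain.

def synthesize(holes, spec, domain):
    counterexamples = []
    for hole in holes:                        # candidate space from the partial impl
        # Skip candidates already refuted by an accumulated counterexample.
        if all(spec(hole, x) for x in counterexamples):
            cex = next((x for x in domain if not spec(hole, x)), None)
            if cex is None:
                return hole                   # no input refutes this candidate
            counterexamples.append(cex)       # eliminates a class of candidates
    return None

spec = lambda h, x: x * h == x + x
hole = synthesize(holes=range(10), spec=spec, domain=range(-5, 6))
# hole is the constant 2, the only value satisfying the spec on the domain
```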
&lt;br /&gt;
[[File:DTEC-MIT-Sketch-1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sketching Enhanced Refinement: Low‐Level&#039;&#039;&#039;&lt;br /&gt;
* Synthesis simplifies manual refinement&lt;br /&gt;
** Sophisticated implementation is simple if we can elide low‐level details&lt;br /&gt;
** Automated equivalence checking helps avoid bugs in the refinement process&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-MIT-Sketch-2.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Sketching Enhanced Refinement: High‐Level&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-MIT-Sketch-3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== The role of constraints on types ==&lt;br /&gt;
* Constraints can appear on classes or functions&lt;br /&gt;
* Constraints allow locality of reasoning and simplify synthesis&lt;br /&gt;
* Examples:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Multiplication.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:DTEC-Fault-Tolerance.png|right|300px]]&lt;br /&gt;
== Fault Tolerance ==&lt;br /&gt;
&lt;br /&gt;
* User defines a bound on the expected error&lt;br /&gt;
* Uncertainty quantification (UQ) determines how faults contribute to the total error&lt;br /&gt;
* Total error is represented as a function of the errors introduced by transient faults in individual tasks&lt;br /&gt;
* Errors due to faults are modeled as random noise&lt;br /&gt;
* Each random quantity &#039;&#039;ε_i&#039;&#039; captures the transient-fault influence on task &#039;&#039;i&#039;&#039;&lt;br /&gt;
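A sketch of this error model (the additive aggregation and Gaussian noise are assumptions for illustration; the actual UQ machinery is not shown): per-task noise terms ε_i combine into a total error, and Monte Carlo sampling yields an empirical bound to compare against the user's bound.

```python
import random

def total_error(eps):
    """Assumed aggregation: total error as an additive function of per-task errors."""
    return sum(eps)

def estimate_error_bound(num_tasks, fault_sigma, trials=20000, quantile=0.95):
    random.seed(0)                            # deterministic for reproducibility
    samples = []
    for _ in range(trials):
        # Each eps_i models transient-fault influence on one task as random noise.
        eps = [random.gauss(0.0, fault_sigma) for _ in range(num_tasks)]
        samples.append(abs(total_error(eps)))
    samples.sort()
    return samples[int(quantile * trials)]    # empirical 95% bound on total error

bound = estimate_error_bound(num_tasks=16, fault_sigma=0.01)
```

For independent Gaussian ε_i the bound scales as sqrt(num_tasks) times the per-task noise, which is what makes a per-task fault model composable into a whole-program error bound.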
&lt;br /&gt;
&lt;br /&gt;
== Tools for Legacy Code Modernization ==&lt;br /&gt;
* Incrementally add DSL constructs to legacy codes&lt;br /&gt;
** Replace performance‐critical sections by DSLs&lt;br /&gt;
** Our “mixed‐DSLs + host language” architecture supports this&lt;br /&gt;
&lt;br /&gt;
* Manual addition of DSL constructs is low risk&lt;br /&gt;
&lt;br /&gt;
* Semi‐automatic addition of DSL constructs is promising&lt;br /&gt;
** Recognize opportunities for DSL constructs using same pattern‐matching as in rewriting system&lt;br /&gt;
** Human could direct, assist, verify, or veto&lt;br /&gt;
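The recognition step can be sketched with a toy AST pattern matcher (illustrative; not D-TEC's rewriting system): it spots an accumulation loop that a DSL construct could replace and reports the location, leaving the human to direct, verify, or veto the rewrite.

```python
import ast

# Hypothetical legacy fragment containing a replaceable accumulation loop.
PATTERN = """
total = 0
for x in data:
    total = total + x
"""

def find_sum_loops(source):
    """Return line numbers of `for` loops whose body is a single additive update."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For) and len(node.body) == 1:
            stmt = node.body[0]
            if (isinstance(stmt, ast.Assign)
                    and isinstance(stmt.value, ast.BinOp)
                    and isinstance(stmt.value.op, ast.Add)):
                hits.append(node.lineno)      # candidate site for a DSL construct
    return hits

opportunities = find_sum_loops(PATTERN)       # line numbers flagged for review
```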
&lt;br /&gt;
* Fully automatic rewriting of fragments to DSL constructs may be possible&lt;br /&gt;
&lt;br /&gt;
* Benefits&lt;br /&gt;
** Higher performance using aggressive DSL optimization&lt;br /&gt;
** Performance portability without a complete rewrite&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Tools for Understanding DSL Performance ==&lt;br /&gt;
[[File:DTEC-DSL-Performance.png|right|300px]]&lt;br /&gt;
* Challenges&lt;br /&gt;
** Huge semantic gap between embedded DSL and generated code&lt;br /&gt;
** Code generation for DSLs is opaque, debugging is hard, and fine‐grain performance attribution is unavailable&lt;br /&gt;
&lt;br /&gt;
* Goal: Bridge semantic gap for debugging and performance tuning&lt;br /&gt;
&lt;br /&gt;
* Approach&lt;br /&gt;
** Record information during program compilation&lt;br /&gt;
*** two‐way mappings between every token in source and generated code&lt;br /&gt;
*** transformation options, domain knowledge, cost models, and choices&lt;br /&gt;
** Monitor and attribute execution characteristics with instrumentation and sampling&lt;br /&gt;
*** e.g., parallelism, resource consumption, contention, failure, scalability&lt;br /&gt;
** Map performance back to source, transformations, and domain knowledge&lt;br /&gt;
** Compensate for approximate cost models with empirical autotuning&lt;br /&gt;
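The mapping-and-attribution step above can be sketched as follows (the table structure is an assumption for illustration, not D-TEC's actual format): the compiler records which DSL source line each generated line came from, so flat profile samples on generated code fold back onto DSL constructs.

```python
# Generated line -> DSL source line, recorded at compile time by the DSL compiler.
gen_to_src = {
    10: 1, 11: 1, 12: 2, 13: 2, 14: 2, 15: 3,
}

# Profiler samples: line numbers observed in the generated code.
samples = [10, 12, 12, 14, 15, 11, 12]

def attribute(samples, gen_to_src):
    """Fold flat profile samples back onto DSL source lines."""
    cost = {}
    for line in samples:
        src = gen_to_src[line]
        cost[src] = cost.get(src, 0) + 1
    return cost

profile = attribute(samples, gen_to_src)   # {1: 2, 2: 4, 3: 1}
```

A real implementation would keep the mapping per token rather than per line and track it through every transformation pass, which is exactly the burden the planned strategies aim to minimize for DSL implementers.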
&lt;br /&gt;
* Technologies to be developed&lt;br /&gt;
** Strategies for maintaining mappings without overly burdening DSL implementers&lt;br /&gt;
** Strategies for tracking transformations, knowledge, and costs through compilation&lt;br /&gt;
** Techniques for exploring and explaining the roles of transformations and knowledge&lt;br /&gt;
** Algorithms for refining cost estimates with observed costs to support autotuning&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Migrating Existing Codes ==&lt;br /&gt;
&#039;&#039;&#039; Benefits of custom, source to source translation&#039;&#039;&#039;&lt;br /&gt;
* Automatically restructure conventional code using a custom source‐to‐source translator that captures semantic knowledge of the application domain, thereby improving performance&lt;br /&gt;
* Embedded Domain Specific Languages&lt;br /&gt;
** Automatically tolerate communication delays&lt;br /&gt;
** Squeeze out library overheads&lt;br /&gt;
* Library primitives → primitive language objects&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Management Plan and Collaboration Paths with Advisory Board and Outside Community ==&lt;br /&gt;
[[File:DTEC-Plan.png|600px]]&lt;br /&gt;
== D-TEC Products and Publications ==&lt;br /&gt;
Product List&lt;br /&gt;
# D-TEC compiler and verification research work, X10-ROSE support, Embedded DSL compilers, and ROSE connection to OpenTuner: http://www.rosecompiler.org/  &lt;br /&gt;
# The Open Fortran Project’s (OFP) parser:  https://github.com/OpenFortranProject/ofp-sdf . &lt;br /&gt;
# Rosebud: https://svn.rice.edu/r/rosebud . &lt;br /&gt;
# Halide:  http://halide-lang.org &lt;br /&gt;
# OpenTuner:  http://opentuner.org/&lt;br /&gt;
# The X10 toolchain, X10/APGAS runtime system, and the X10 proxy applications: http://x10-lang.org/&lt;br /&gt;
# Bamboo: http://cseweb.ucsd.edu/groups/hpcl/scg/BambooWebsite/&lt;br /&gt;
# PolyOpt is a ROSE plug-in distributed with ROSE and also available at: http://hpcrl.cse.ohio-state.edu/wiki/index.php/Polyhedral_Compilation &lt;br /&gt;
# Simit: http://groups.csail.mit.edu/commit/ . &lt;br /&gt;
# Sketch: http://people.csail.mit.edu/asolar/. &lt;br /&gt;
# Cloverleaf auto-converted using synthesis: http://people.csail.mit.edu/asolar/. &lt;br /&gt;
# Lulesh built using synthesis:  http://people.csail.mit.edu/asolar/. &lt;br /&gt;
# MSL, the distributed synthesis language: http://people.csail.mit.edu/asolar/. &lt;br /&gt;
# Rely: http://people.csail.mit.edu/mcarbin/. &lt;br /&gt;
# Rosette, a Racket-based language for hosting solver-aided DSLs: http://homes.cs.washington.edu/~emina/rosette/.&lt;br /&gt;
# Chlorophyll: synthesis-aided DSL and compiler for spatial parallel architectures: http://pl.eecs.berkeley.edu/projects/chlorophyll/.&lt;br /&gt;
&lt;br /&gt;
Publication List&lt;br /&gt;
# Markus Schordan, Pei-Hung Lin, Dan Quinlan, and Louis-Noël Pouchet. Verification of polyhedral optimizations with constant loop bounds in finite state space computations. In Tiziana Margaria and Bernhard Steffen, editors, Leveraging Applications of Formal Methods, Verification and Validation. Specialized Techniques and Applications, volume 8803 of Lecture Notes in Computer Science, pages 493-508. Springer Berlin Heidelberg, 2014.&lt;br /&gt;
# Chunhua Liao, Daniel J. Quinlan, Thomas Panas, and Bronis R. de Supinski. A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries. In Mitsuhisa Sato, Toshihiro Hanawa, Matthias S. Müller, Barbara M. Chapman, and Bronis R. de Supinski, editors, IWOMP, volume 6132 of Lecture Notes in Computer Science, pages 15-28. Springer, 2010.&lt;br /&gt;
# Chunhua Liao, Yonghong Yan, Bronis R. de Supinski, Daniel J. Quinlan, and Barbara Chapman. Early experiences with the OpenMP accelerator model. In OpenMP in the Era of Low Power Devices and Accelerators, pages 84-98. Springer, 2013.&lt;br /&gt;
#  Dan Quinlan and Chunhua Liao. The ROSE source-to-source compiler infrastructure. In Cetus Users and Compiler Infrastructure Workshop, Galveston Island, TX, USA, October 2011.&lt;br /&gt;
# Yonghong Yan, Pei-Hung Lin, Chunhua Liao, Bronis R. de Supinski, and Daniel J. Quinlan. Supporting multiple accelerators in high-level programming models. In Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM &#039;15, pages 170-180, New York, NY, USA, 2015. ACM.&lt;br /&gt;
# Pei-Hung Lin, Chunhua Liao, Daniel J. Quinlan, and Stephen Guzik. Experiences of using the OpenMP accelerator model to port DOE stencil applications, 2014. Poster presented at the Workshop on Accelerator Programming Using Directives, Nov. 17, 2014, New Orleans, LA.&lt;br /&gt;
# Markus Schordan, Pei-Hung Lin, Dan Quinlan, and Louis-Noël Pouchet. Verification of parallel polyhedral transformations with arbitrary constant loop bounds, 2015. In review for Euro-Par 2015.&lt;br /&gt;
# Jonathan Ragan-Kelley. Decoupling Algorithms from the Organization of Computation for High Performance Image Processing. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2014.&lt;br /&gt;
# Jason Ansel. Autotuning Programs with Algorithmic Choice. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, February 2014.&lt;br /&gt;
# Jeffrey Bosboom. StreamJIT: A commensal compiler for high-performance stream programming. S.M. thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2014.&lt;br /&gt;
# Eric Wong. Optimizations in stream programming for multimedia applications. M.Eng. thesis, Massachusetts Institute of Technology, Cambridge, MA, Aug 2012.&lt;br /&gt;
# Phumpong Watanaprakornkul. Distributed data as a choice in PetaBricks. M.Eng. thesis, Massachusetts Institute of Technology, Cambridge, MA, Jun 2012.&lt;br /&gt;
# Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. Helium: Lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015.&lt;br /&gt;
# Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O&#039;Reilly, and Saman Amarasinghe. OpenTuner: An extensible framework for program autotuning. In International Conference on Parallel Architectures and Compilation Techniques, Edmonton, Canada, August 2014.&lt;br /&gt;
# Jeffrey Bosboom, Sumanaruban Rajadurai, Weng-Fai Wong, and Saman Amarasinghe. StreamJIT: A commensal compiler for high-performance stream programming. In ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, Portland, OR, October 2014.&lt;br /&gt;
# Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In ACM SIGPLAN Conference on Programming Language Design and Implementation, Seattle, WA, June 2013.&lt;br /&gt;
# Phitchaya Mangpo Phothilimthana, Jason Ansel, Jonathan Ragan-Kelley, and Saman Amarasinghe. Portable performance on heterogeneous architectures. In The International Conference on Architectural Support for Programming Languages and Operating Systems, Houston, TX, March 2013.&lt;br /&gt;
# Maciej Pacula, Jason Ansel, Saman Amarasinghe, and Una-May O&#039;Reilly. Hyperparameter tuning in bandit-based adaptive operator selection. In European Conference on the Applications of Evolutionary Computation, Malaga, Spain, Apr 2012.&lt;br /&gt;
# Jason Ansel, Maciej Pacula, Yee Lok Wong, Cy Chan, Marek Olszewski, Una-May O&#039;Reilly, and Saman Amarasinghe. SiblingRivalry: Online autotuning through local competitions. In International Conference on Compilers Architecture and Synthesis for Embedded Systems, Tampere, Finland, Oct 2012.&lt;br /&gt;
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. Decoupling algorithms from schedules for easy optimization of image processing pipelines. ACM Transactions on Graphics, 31(4), July 2012.&lt;br /&gt;
# Dan Alistarh, Patrick Eugster, Maurice Herlihy, Alexander Matveev, and Nir Shavit. StackTrack: An automated transactional approach to concurrent memory reclamation. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys &#039;14, pages 25:1-25:14, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Una-May O&#039;Reilly, and Saman Amarasinghe. OpenTuner: An extensible framework for program autotuning. Technical Report MIT-CSAIL-TR-2013-026, Massachusetts Institute of Technology, Cambridge, MA, Nov 2013.&lt;br /&gt;
# Alexander Matveev and Nir Shavit. Reduced hardware NOrec: A safe and scalable hybrid transactional memory. In 20th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2015, Istanbul, Turkey, 2015. ACM.&lt;br /&gt;
# Sasa Misailovic, Michael Carbin, Sara Achour, Zichao Qi, and Martin C. Rinard. Chisel: Reliability- and accuracy-aware optimization of approximate computational kernels. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages &amp;amp; Applications, OOPSLA &#039;14, pages 309-328, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Zhilei Xu, Shoaib Kamil, and Armando Solar-Lezama. MSL: A synthesis enabled language for distributed implementations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC &#039;14, pages 311-322, Piscataway, NJ, USA, 2014. IEEE Press.&lt;br /&gt;
# F. Augustin and Y. M. Marzouk. Uncertainty quantification in high performance computing (invited position paper). SIGPLAN Workshop on Probabilistic and Approximate Computing (APPROX), 2014.&lt;br /&gt;
# David Grove, Josh Milthorpe, and Olivier Tardieu. Supporting array programming in X10. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY &#039;14, pages 38:38-38:43, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Wei Zhang, Olivier Tardieu, David Grove, Benjamin Herta, Tomio Kamada, Vijay Saraswat, and Mikio Takeuchi. GLB: Lifeline-based global load balancing library in X10. In Proceedings of the First Workshop on Parallel Programming for Analytics Applications, PPAA &#039;14, pages 31-40, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Olivier Tardieu, David Grove, Benjamin Herta, Tomio Kamada, Vijay Saraswat, Mikio Takeuchi, and Wei Zhang. X10 for Productivity and Performance at Scale: A Submission to the 2013 HPC Class II Challenge, October 2013.&lt;br /&gt;
# Craig Rasmussen, Matthew Sottile, Daniel Nagle, and Soren Rasmussen. Locally-oriented programming: A simple programming model for stencil-based computations on multi-level distributed memory architectures. In Proceedings of Euro-Par 2015 Parallel Processing, Lecture Notes in Computer Science. Springer International Publishing, 2015. Submitted, February 2015.&lt;br /&gt;
# Thomas Steel Henretty. Performance Optimization of Stencil Computations on Modern SIMD Architectures. PhD thesis, The Ohio State University, 2014.&lt;br /&gt;
# Justin Andrew Holewinski. Automatic Code Generation for Stencil Computations on GPU Architectures. PhD thesis, The Ohio State University, 2012.&lt;br /&gt;
# Mahesh Ravishankar. Automatic parallelization of loops with data dependent control flow and array access patterns. PhD thesis, The Ohio State University, 2014.&lt;br /&gt;
# Kevin Alan Stock. Vectorization and Register Reuse in High Performance Computing. PhD thesis, The Ohio State University, 2014.&lt;br /&gt;
# Tom Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. A stencil compiler for short-vector SIMD architectures. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, ICS &#039;13, pages 13-24, New York, NY, USA, 2013. ACM.&lt;br /&gt;
# Justin Holewinski, Louis-Noël Pouchet, and P. Sadayappan. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the 26th ACM International Conference on Supercomputing, ICS &#039;12, pages 311-320, New York, NY, USA, 2012. ACM.&lt;br /&gt;
# Louis-Noël Pouchet, Peng Zhang, P. Sadayappan, and Jason Cong. Polyhedral-based data reuse optimization for configurable computing. In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA &#039;13, pages 29-38, New York, NY, USA, 2013. ACM.&lt;br /&gt;
# S. Rajbhandari, A. Nikam, Pai-Wei Lai, K. Stock, S. Krishnamoorthy, and P. Sadayappan. CAST: Contraction algorithm for symmetric tensors. In Parallel Processing (ICPP), 2014 43rd International Conference on, pages 261-272, Sept 2014.&lt;br /&gt;
# Samyam Rajbhandari, Akshay Nikam, Pai-Wei Lai, Kevin Stock, Sriram Krishnamoorthy, and P. Sadayappan. A communication-optimal framework for contracting distributed tensors. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC &#039;14, pages 375-386, Piscataway, NJ, USA, 2014. IEEE Press.&lt;br /&gt;
# Mahesh Ravishankar, John Eisenlohr, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Code generation for parallel execution of a class of irregular loops on distributed memory systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC &#039;12, pages 72:1-72:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.&lt;br /&gt;
# Mahesh Ravishankar, John Eisenlohr, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Automatic parallelization of a class of irregular loops for distributed memory systems. ACM Transactions on Parallel Computing, 1(1):7:1-7:37, September 2014.&lt;br /&gt;
# Mahesh Ravishankar, Roshan Dathathri, Venmugil Elango, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Distributed memory code generation for mixed irregular/regular computations. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 65-75. ACM, 2015.&lt;br /&gt;
# Kevin Stock, Martin Kong, Tobias Grosser, Louis-Noël Pouchet, Fabrice Rastello, J. Ramanujam, and P. Sadayappan. A framework for enhancing data reuse via associative reordering. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI &#039;14, pages 65-76, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Venmugil Elango, Fabrice Rastello, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. On characterizing the data movement complexity of computational DAGs for parallel execution. In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA &#039;14, pages 296-306, New York, NY, USA, 2014. ACM.&lt;br /&gt;
# Naznin Fauzia, Venmugil Elango, Mahesh Ravishankar, J. Ramanujam, Fabrice Rastello, Atanas Rountev, Louis-Noël Pouchet, and P. Sadayappan. Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential. ACM Trans. Archit. Code Optim., 10(4):53:1-53:29, Dec. 2013.&lt;br /&gt;
# Martin Kong, Richard Veras, Kevin Stock, Franz Franchetti, Louis-Noël Pouchet, and P. Sadayappan. When polyhedral transformations meet SIMD code generation. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI &#039;13, pages 127-138, New York, NY, USA, 2013. ACM.&lt;br /&gt;
# Lai Wei and John Mellor-Crummey. Autotuning tensor transposition. In Proceedings of the 19th International Workshop on High-level Parallel Programming Models and Supportive Environments, May 2014.&lt;br /&gt;
# Jun Shirako, Louis-Noël Pouchet, and Vivek Sarkar. Oil and water can mix: An integration of polyhedral and AST-based transformations. In IEEE Conference on High Performance Computing, Networking, Storage and Analysis (SC &#039;14). IEEE, 2014.&lt;br /&gt;
# Prasanth Chatarasi, Jun Shirako, and Vivek Sarkar. Polyhedral transformations of explicitly parallel programs. In 5th International Workshop on Polyhedral Compilation Techniques (IMPACT 2015). IEEE, 2015.&lt;br /&gt;
# Kamal Sharma. Locality Transformations of Computation and Data for Portable Performance. PhD thesis, Rice University, August 2014.&lt;br /&gt;
# Jun Shirako and Vivek Sarkar. Oil and water can mix! Experiences with integrating polyhedral and AST-based Transformations. In 17th Workshop on Compilers for Parallel Programming, July 2013.&lt;br /&gt;
# Jisheng Zhao, Michael Burke, and Vivek Sarkar. Rice ROSE Compositional Analysis and Transformation Framework (R2CAT). Technical report, LLNL Technical Report 590233, October 2012.&lt;br /&gt;
# Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah Chasins, and Rastislav Bodik. Chlorophyll: synthesis-aided compiler for low-power spatial architectures. In O&#039;Boyle and Pingali [58], page 42.&lt;br /&gt;
# Emina Torlak and Rastislav Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In O&#039;Boyle and Pingali [58], page 54.&lt;br /&gt;
# Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1-8. IEEE, 2013.&lt;br /&gt;
# Emina Torlak and Rastislav Bodik. Growing solver-aided languages with Rosette. In Antony L. Hosking, Patrick Th. Eugster, and Robert Hirschfeld, editors, ACM Symposium on New Ideas in Programming and Reflections on Software, Onward! 2013, part of SPLASH &#039;13, Indianapolis, IN, USA, October 26-31, 2013, pages 135-152. ACM, 2013.&lt;br /&gt;
# Leo A. Meyerovich, Matthew E. Torok, Eric Atkinson, and Rastislav Bodik. Parallel schedule synthesis for attribute grammars. In Alex Nicolau, Xiaowei Shen, Saman P. Amarasinghe, and Richard W. Vuduc, editors, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP &#039;13, Shenzhen, China, February 23-27, 2013, pages 187-196. ACM, 2013.&lt;br /&gt;
# Michael F. P. O&#039;Boyle and Keshav Pingali, editors. ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI &#039;14, Edinburgh, United Kingdom - June 09- 11, 2014. ACM, 2014.&lt;br /&gt;
# Tan Nguyen, Pietro Cicotti, Eric Bylaska, Dan Quinlan, and Scott B. Baden. Bamboo: Translating MPI applications to a latency-tolerant, data-driven form. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC &#039;12, pages 39:1-39:11, Los Alamitos, CA, USA, 2012. IEEE Computer Society Press.&lt;br /&gt;
# Tan Nguyen and Scott B. Baden. Bamboo: Preliminary scaling results on multiple hybrid nodes of Knights Corner and Sandy Bridge processors. In Proc. WOLFHPC: Workshop on Domain-Specific Languages and High-Level Frameworks for HPC, SC13, The International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, 2013.&lt;br /&gt;
&lt;br /&gt;
== DTEC Usage ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! #&lt;br /&gt;
! Project&lt;br /&gt;
! Technology Readiness&lt;br /&gt;
! TRL Level (1-9)&lt;br /&gt;
! Downloads (Last 14 Days)&lt;br /&gt;
! Visitors (Last 14 Days)&lt;br /&gt;
! Commits (Last 12 Months)&lt;br /&gt;
! Contributors (Last 12 Months)&lt;br /&gt;
! OpenHub.net Analysis&lt;br /&gt;
! Institutional Usage&lt;br /&gt;
! University Usage&lt;br /&gt;
! DTEC Deliverables&lt;br /&gt;
! URL&lt;br /&gt;
|-&lt;br /&gt;
| 1&lt;br /&gt;
| ROSE&lt;br /&gt;
| Open Source, used by LLNL ASC application teams and both internal and external research to DOE&lt;br /&gt;
| 7&lt;br /&gt;
| 21&lt;br /&gt;
| 434&lt;br /&gt;
| 1808&lt;br /&gt;
| 16&lt;br /&gt;
| 28,464 total commits by 123 developers and 5M lines of code: https://www.openhub.net/p/rose-compiler&lt;br /&gt;
| IBM, Intel, DOE LLNL, DOE LBL, DoD&lt;br /&gt;
| MIT, University of Utah, Texas A&amp;amp;M, Rice, UCSD, University of Oregon, University of Colorado, Caltech, UIUC&lt;br /&gt;
| Source-to-source compiler Framework, including Rosetta IR node support, specific DSL tools, CodeThorn verification tools, and X10 language support&lt;br /&gt;
| http://www.roseCompiler.org&lt;br /&gt;
|-&lt;br /&gt;
| 2&lt;br /&gt;
| Halide&lt;br /&gt;
| &lt;br /&gt;
| 7&lt;br /&gt;
| 1251&lt;br /&gt;
| 595&lt;br /&gt;
| 10597&lt;br /&gt;
| 61&lt;br /&gt;
| &lt;br /&gt;
| Google, Adobe, Intel, Qualcomm, Facebook + more&lt;br /&gt;
| Many...&lt;br /&gt;
| &lt;br /&gt;
| http://halide-lang.org/&lt;br /&gt;
|-&lt;br /&gt;
| 3&lt;br /&gt;
| OpenTuner&lt;br /&gt;
| &lt;br /&gt;
| 6&lt;br /&gt;
| 51&lt;br /&gt;
| 59&lt;br /&gt;
| 330&lt;br /&gt;
| 11&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| http://opentuner.org/&lt;br /&gt;
|-&lt;br /&gt;
| 4&lt;br /&gt;
| Rely Language&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 5&lt;br /&gt;
| MSL Synthesis Language / Sketch Synthesizer&lt;br /&gt;
| Open source, Released as part of the Sketch language.&lt;br /&gt;
| &lt;br /&gt;
| ?&lt;br /&gt;
| ?&lt;br /&gt;
| 210&lt;br /&gt;
| 5&lt;br /&gt;
| &lt;br /&gt;
| Adobe&lt;br /&gt;
| MIT, UW, Rice, Berkeley&lt;br /&gt;
| Synthesis framework for SPMD kernels&lt;br /&gt;
| http://people.csail.mit.edu/asolar&lt;br /&gt;
|-&lt;br /&gt;
| 6&lt;br /&gt;
| Simit Language&lt;br /&gt;
| Language for physical simulations and sparse systems&lt;br /&gt;
| 5&lt;br /&gt;
| 70&lt;br /&gt;
| 1284&lt;br /&gt;
| 2418&lt;br /&gt;
| 11&lt;br /&gt;
| &lt;br /&gt;
| Adobe&lt;br /&gt;
| MIT, Berkeley, Stanford, UT Austin, U Toronto, Texas A&amp;amp;M&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| X10 Runtime&lt;br /&gt;
| Open source language with compiler, runtime and IDE; used by research teams within IBM and international research institutes/universities; demonstrated at petascale in HPC challenge; substantial test suite and libraries and applications (see http://x10-lang.org/x10-community/applications.html)&lt;br /&gt;
| 7&lt;br /&gt;
| 243 (SourceForge)&lt;br /&gt;
| 164 (Github repo)&lt;br /&gt;
| 606&lt;br /&gt;
| 14&lt;br /&gt;
| 27K commits made by 100 contributors and 700K lines of code: https://www.openhub.net/p/x10&lt;br /&gt;
| IBM, RIKEN, INRIA&lt;br /&gt;
| http://x10-lang.org/x10-community/universities-using-x10.html University of Alberta, Australian National University, UCLA, Carnegie Mellon University, Columbia University, TU Dresden, University of Erlangen-Nuremberg, Georgetown University, Imperial College London, IMT Institute for Advanced Studies Lucca, IIT Madras, Karlsruhe Institute of Technology, University of Kassel, Kobe University, McGill University, University of Paris-1 Sorbonne/CRI, Universidad Nacional Autónoma de México, TU Munich, Tianjin University, Tokyo Institute of Technology, University of Tokyo&lt;br /&gt;
| APGAS libraries for C++, Java and Scala; X10 compiler support for user-defined control constructs, and improvements to X10DT IDE, X10RT transport layer over MPI User-Level Fault Mitigation (ULFM), X10RT transport layer over MPI-3; proxy applications LULESH, MCCK, CoMD&lt;br /&gt;
| http://x10-lang.org/&lt;br /&gt;
|-&lt;br /&gt;
| 8&lt;br /&gt;
| Rosebud&lt;br /&gt;
| Incomplete research prototype&lt;br /&gt;
| 3&lt;br /&gt;
| 0&lt;br /&gt;
| 0&lt;br /&gt;
| 200&lt;br /&gt;
| 1&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| https://svn.rice.edu/r/rosebud&lt;br /&gt;
|-&lt;br /&gt;
| 9&lt;br /&gt;
| PolyOpt&lt;br /&gt;
| Open source&lt;br /&gt;
| 4&lt;br /&gt;
| 5&lt;br /&gt;
| 27&lt;br /&gt;
| 103&lt;br /&gt;
| 3&lt;br /&gt;
| N/A&lt;br /&gt;
| &lt;br /&gt;
| OSU, INRIA, UCLA, UIUC, CSU,...&lt;br /&gt;
| Polyhedral optimizer for ROSE, inc. high-order stencil optimizations for CPUs.&lt;br /&gt;
| http://hpcrl.cse.ohio-state.edu/wiki/index.php/Polyhedral_Compilation&lt;br /&gt;
|-&lt;br /&gt;
| 10&lt;br /&gt;
| Verification Tools (CodeThorn)&lt;br /&gt;
| Released within the ROSE distribution; used in multiple international verification competitions, placing 3rd, 2nd, 1st, and 1st over the last four years.&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| http://www.roseCompiler.org&lt;br /&gt;
|-&lt;br /&gt;
| 11&lt;br /&gt;
| Bamboo&lt;br /&gt;
| Open source translator and runtime for converting MPI source to a data-driven formulation that hides communication&lt;br /&gt;
| 4&lt;br /&gt;
| 1&lt;br /&gt;
| 1&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| NYU&lt;br /&gt;
| Bamboo Translator for C++&lt;br /&gt;
| http://cseweb.ucsd.edu/groups/hpcl/scg/BambooWebsite&lt;br /&gt;
|-&lt;br /&gt;
| 12&lt;br /&gt;
| Open Fortran Project&lt;br /&gt;
| Open source Fortran compiler front end. Used by: 1. ROSE (Fortran parser); 2. Open MPI (tool generation); 3. TAU Performance Tools (Fortran parser); 4. OpenCoarrays (coarray to library-call transformations); 5. University of Oregon (MATLAB MEX file generator); 6. EDX (Fortran to C# conversions).&lt;br /&gt;
| 7&lt;br /&gt;
| 1&lt;br /&gt;
| 18&lt;br /&gt;
| 318&lt;br /&gt;
| 4&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| https://github.com/OpenFortranProject/ofp-sdf&lt;br /&gt;
|-&lt;br /&gt;
| 13&lt;br /&gt;
| http://people.csail.mit.edu/asolar&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| http://homes.cs.washington.edu/~emina/rosette&lt;br /&gt;
|-&lt;br /&gt;
| 14&lt;br /&gt;
| DTEC Web Site&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| http://portal.nersc.gov/project/rosecompiler/dtec/wordpress/&lt;br /&gt;
|-&lt;br /&gt;
| 15&lt;br /&gt;
| Formal Verification&lt;br /&gt;
| Coq proofs of stencil algorithms and programs&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| 1&lt;br /&gt;
| 2&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 16&lt;br /&gt;
| AMRStencil&lt;br /&gt;
| C++11 Embedded Domain Specific Language for Stencil-based Adaptive Mesh Refinement algorithms.&lt;br /&gt;
| &lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 20&lt;br /&gt;
| 5&lt;br /&gt;
| &lt;br /&gt;
| LBNL, LLNL&lt;br /&gt;
| &lt;br /&gt;
| AMRStencil API. Reference Manual. Multigrid Example. Euler&#039;s Equation example.&lt;br /&gt;
| svn repo https://anag-repo.lbl.gov/svn/AMRStencil&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=DynAX&amp;diff=5394</id>
		<title>DynAX</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=DynAX&amp;diff=5394"/>
		<updated>2023-07-10T04:48:33Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = DynAX&lt;br /&gt;
| image = [[File:BrandywineTeam-logo.png|180px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| website = [http://www.etinternational.com/xstack/ www.etinternational.com/xstack]&lt;br /&gt;
| team-members = [http://www.etinternational.com/ ETI], [https://www.reservoir.com/ Reservoir Labs], [http://cs.illinois.edu/ UIUC], [http://www.pnnl.gov/ PNNL]&lt;br /&gt;
| pi = [[Guang Gao]]&lt;br /&gt;
| co-pi = Benoit Meister (Reservoir Labs), &lt;br /&gt;
David Padua (UIUC), &lt;br /&gt;
John Feo (PNNL)&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Dynamically Adaptive X-Stack&#039;&#039;&#039; or &#039;&#039;&#039;DynAX&#039;&#039;&#039; is a team led by ET International conducting research on runtime software for exascale computing. &lt;br /&gt;
&lt;br /&gt;
Moving forward, exascale software will be unable to rely on minimally invasive system interfaces to provide an execution environment.  Instead, a software runtime layer is necessary to mediate between an application and the underlying hardware and software.  This proposal describes a model of execution based on codelets, which are small pieces of work that are sequenced by expressing their interdependencies to runtime software instead of relying on the implicit sequencing of a software thread.  In addition, this document describes interactions between the runtime layer, compiler, and programming language.&lt;br /&gt;
&lt;br /&gt;
The runtime software for exascale computing must be able to deal with a very large amount of outstanding work at any given time and manage enormous amounts of data, some of which may be highly volatile.  The relationship between work and the data it acts upon or generates is crucial to maintaining high performance and low power usage. A poor understanding of data locality may lead to a much higher amount of communication, which is extremely undesirable in an exascale environment.  To assist it in associating work with data and facilitating the migration of work to data and vice versa, such a runtime may impose a hierarchy on regions of the system, dividing it up along address space and privacy boundaries to allow it to guess at the cost inherent in communicating between regions.  Furthermore, tying data and work to locations in the hierarchy creates a construct by which transparent work stealing and sharing may be applied, helping to keep stolen work near its data and allowing shared work to be issued to specific regions.&lt;br /&gt;
&lt;br /&gt;
Compilers also need to reflect the requirements of exascale computing systems.  A compiler that supports a codelet execution model must be able to determine appropriate boundaries for codelets in software and generate the codelets and code to interface with both the runtime and a transformed version of the input program.  We propose that a three-step compilation process be used, wherein program code is compiled down to a high-level-language-independent internal representation, which can then be compiled down to C code that makes API calls into runtime software.  This C code can then be compiled down to a platform-specific binary for execution on the target system, using existing C compilers for the generated sequential code.  Higher-level analysis of the relationship between codelets and data can be performed in earlier steps, and this can enable the compiler to emit static hints to the runtime to assist in making decisions on scheduling and placement.  Compilers can also assist in providing for fault tolerance by supporting containment domains, which can be used by the runtime software to assist in program checkpointing.&lt;br /&gt;
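The codelet model sketched above can be made concrete with a small scheduler. This is an illustrative sketch only, not the SWARM or SCALE API: the `Codelet` and `Runtime` names are hypothetical, and the example simply shows work units firing once their precedent constraints (dependence counts) are satisfied.

```python
# Minimal sketch of a codelet execution model: small nonblocking units of
# work sequenced by explicit dependence counts instead of thread ordering.
# (Hypothetical names; not the SWARM/SCALE runtime API.)
from collections import deque

class Codelet:
    def __init__(self, name, action, deps=0):
        self.name = name          # label for tracing
        self.action = action      # small, nonblocking unit of work
        self.deps = deps          # unsatisfied precedent constraints
        self.successors = []      # codelets waiting on this one

class Runtime:
    def __init__(self):
        self.ready = deque()
        self.trace = []

    def add_edge(self, pred, succ):
        pred.successors.append(succ)

    def spawn(self, codelet):
        if codelet.deps == 0:
            self.ready.append(codelet)

    def run(self):
        # Fire codelets as their dependence counts reach zero.
        while self.ready:
            c = self.ready.popleft()
            c.action()
            self.trace.append(c.name)
            for s in c.successors:
                s.deps -= 1
                if s.deps == 0:
                    self.ready.append(s)

rt = Runtime()
a = Codelet("load", lambda: None)
b = Codelet("compute", lambda: None, deps=1)
c = Codelet("store", lambda: None, deps=1)
rt.add_edge(a, b)
rt.add_edge(b, c)
for cd in (a, b, c):
    rt.spawn(cd)
rt.run()
```

In this sketch the compiler's role would be to carve a program into such codelets and emit the dependence edges; the runtime is then free to schedule any ready codelet, which is what enables work stealing and locality-aware placement.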
&lt;br /&gt;
This work will be done in the context of DOE co-design applications. We will use kernels of these applications as well as other benchmarks and synthetic kernels in the course of our research. The needs of the co-design applications will provide valuable feedback to the research process.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* &#039;&#039;&#039;[http://www.etinternational.com/ ET International (ETI):]&#039;&#039;&#039; Execution Model, Runtime Systems, Parallel Intermediate Language, Resilience&lt;br /&gt;
* &#039;&#039;&#039;[https://www.reservoir.com/ Reservoir Labs:]&#039;&#039;&#039; Programming Models, Loop Optimizations&lt;br /&gt;
* &#039;&#039;&#039;[http://cs.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC):]&#039;&#039;&#039; High level data structures and algorithms for parallelism and locality&lt;br /&gt;
* &#039;&#039;&#039;[http://www.pnnl.gov/ Pacific Northwest National Laboratory (PNNL):]&#039;&#039;&#039; Co-design and NWChem kernels for evaluation, energy efficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Objectives ==&lt;br /&gt;
&#039;&#039;&#039;Scalability:&#039;&#039;&#039; Expose, express, and exploit O(10^10) concurrency&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Locality:&#039;&#039;&#039; Locality aware data types, algorithms, and optimizations&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Programmability:&#039;&#039;&#039; Easy expression of asynchrony, concurrency, locality&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Portability:&#039;&#039;&#039; Stack portability across heterogeneous architectures&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Efficiency:&#039;&#039;&#039; Maximize static and dynamic energy savings while managing the tradeoff between energy efficiency, resilience, and performance&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Resilience:&#039;&#039;&#039; Gradual degradation in the face of many faults&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interoperability:&#039;&#039;&#039; Leverage legacy code through a gradual transformation towards exascale performance&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Applications:&#039;&#039;&#039; Support NWChem&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Roadmap ==&lt;br /&gt;
[[File:DynAX-Roadmap.png|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Impact ==&lt;br /&gt;
&#039;&#039;&#039;Scalability:&#039;&#039;&#039; &lt;br /&gt;
* Expose, express and exploit new forms of parallelism&lt;br /&gt;
* Provide mechanisms for scheduling tasks across the system as if it were one system rather than many disparate pieces&lt;br /&gt;
* Symmetric access semantics across heterogeneous devices&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Locality:&#039;&#039;&#039; &lt;br /&gt;
* Provide mechanisms to express locality as a first-class citizen&lt;br /&gt;
* Expose the memory hierarchies to the compiler (and programmer)&lt;br /&gt;
* Provide data types and memory models so that the programmer can view the system as one system instead of many disparate memories&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Programmability:&#039;&#039;&#039; &lt;br /&gt;
* Create easier ways of expressing asynchrony thereby enabling programmers to write more scalable programs&lt;br /&gt;
* R-Stream will automatically extract parallelism and locality from common idioms&lt;br /&gt;
* Provide data types and algorithms that provide high-level representations of arrays mapped to the memory and algorithm hierarchy for automatic parallelization and data placement&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Portability:&#039;&#039;&#039; &lt;br /&gt;
* Demonstrate a software stack that is portable to multiple architectures provided a C compiler&lt;br /&gt;
* Support a platform abstraction layer in SWARM, which will allow it to operate on multiple heterogeneous architectures &lt;br /&gt;
* Work with Xpress on the XPI interface to show application portability between runtime systems&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Energy Efficiency:&#039;&#039;&#039; &lt;br /&gt;
* Collocate execution and data&lt;br /&gt;
* Dynamically load balance execution based on resource availability&lt;br /&gt;
* Dynamically scale resources based on load&lt;br /&gt;
* Provide new programming constructs (Rescinded Primitive Data Types) that allow compressed data formats at higher memory levels to minimize data transfer costs&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Resilience:&#039;&#039;&#039; &lt;br /&gt;
* Integrate containment domains and their extensions into the SWARM runtime system and SCALE compiler&lt;br /&gt;
* Allow graceful degradation in the face of exascale-level faults and a framework for software validation of soft faults&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interoperability:&#039;&#039;&#039; &lt;br /&gt;
* Work with Xpress on XPI interoperability with legacy codes such that all X-Stack runtime systems and all X-Stack applications can benefit from Evolutionary/Revolutionary runtime system interoperability&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Applications:&#039;&#039;&#039; &lt;br /&gt;
* Provide NWChem kernels and expertise to all X-Stack projects &lt;br /&gt;
* Use Co-Design and NWChem applications to evaluate the Brandywine Team Software Stack&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Software Stack ==&lt;br /&gt;
&lt;br /&gt;
[[File:Xstack-software-stack.png|X-Stack Software Stack|500px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The X-Stack software stack consists of high level data objects and algorithms (HTA: Hierarchical Tiled Arrays), R-Stream loop optimizing compiler, SCALE parallel language compiler, and SWARM distributed heterogeneous runtime system. The project will extend the existing software tools to improve on parallelism, locality, programmability, portability, energy efficiency, resilience, and interoperability (see left). In addition, it will add new infrastructure for energy efficiency (Rescinded Primitive Data Types) and resilience (Containment Domains).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SWARM (SWift Adaptive Runtime Machine)&#039;&#039;&#039;&lt;br /&gt;
[[File:SWARM-trace-comparison.png|right|350px]]&lt;br /&gt;
* Codelets&lt;br /&gt;
** Basic unit of parallelism&lt;br /&gt;
** Nonblocking tasks&lt;br /&gt;
** Scheduled upon satisfaction of precedent constraints&lt;br /&gt;
* Hierarchical Locale Tree: spatial position, data locality&lt;br /&gt;
* Lightweight Synchronization&lt;br /&gt;
* Asynchronous Split-phase Transactions: latency hiding&lt;br /&gt;
* Message Driven Computation&lt;br /&gt;
* Control-flow and Dataflow Futures&lt;br /&gt;
* Error Handling&lt;br /&gt;
* Active Global Address Space (planned)&lt;br /&gt;
* Fault tolerance (planned)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;R-Stream&#039;&#039;&#039;&lt;br /&gt;
[[File:R-StreamMF.jpg|right|350px]] &lt;br /&gt;
* Current capabilities:&lt;br /&gt;
** Automatic parallelization and mapping&lt;br /&gt;
** Heterogeneous, hierarchical targets&lt;br /&gt;
** Automatic DMA/comm. generation/optimization&lt;br /&gt;
** Auto-tuning tile sizes, mapping strategies, etc.&lt;br /&gt;
** Scheduling with parallelism/locality layout tradeoffs&lt;br /&gt;
** Corrective array expansion&lt;br /&gt;
* Planned capabilities:&lt;br /&gt;
** Extend explicit data placement&lt;br /&gt;
** Generation of parallel codelet codes from serial codes&lt;br /&gt;
** Generation of SCALE IR and tuning hints on scheduling and data placement&lt;br /&gt;
** Automatic mapping of irregular mesh codes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:HTA.png|right|350px]]&lt;br /&gt;
&#039;&#039;&#039;Hierarchical Tiled Arrays&#039;&#039;&#039;&lt;br /&gt;
* HTAs are recursive data structures&lt;br /&gt;
** Tree structured representation of memory&lt;br /&gt;
* Includes a library of operations to enable the programming of codelets in the familiar notation of C/C++&lt;br /&gt;
** Represent parallelism using operations on arrays and sets&lt;br /&gt;
** Represent parallelism using parallel constructs such as parallel loops&lt;br /&gt;
* Compiler optimizations on sequences of HTA operations will be evaluated&lt;br /&gt;
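The recursive, tree-structured nature of HTAs can be illustrated with a toy example. This is only a sketch of the concept, not the HTA library's actual interface: leaves hold flat tiles, interior nodes group tiles, and a hypothetical `hmap` applies an operation tile by tile, mirroring the idea of expressing parallelism as operations on arrays.

```python
# Toy hierarchical tiled array: a tree whose leaves are flat tiles.
# hmap applies an elementwise operation to every tile; in a real HTA
# runtime each tile's work could run as an independent parallel task.
# (Illustrative sketch; not the actual HTA library API.)
def hmap(op, hta):
    if isinstance(hta, dict):                 # interior node: recurse into children
        return {k: hmap(op, v) for k, v in hta.items()}
    return [op(x) for x in hta]               # leaf tile: apply elementwise

# A 2-level HTA: two top-level tiles, each split into two leaf tiles.
A = {"t0": {"t00": [1, 2], "t01": [3, 4]},
     "t1": {"t10": [5, 6], "t11": [7, 8]}}
B = hmap(lambda x: 2 * x, A)
```

The tiling tree doubles as a locality map: the top-level split might correspond to nodes and the leaf tiles to cores, so data placement falls out of the data structure itself.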
&lt;br /&gt;
&lt;br /&gt;
[[File:RPDTA.png|right|350px]]&lt;br /&gt;
&#039;&#039;&#039;Rescinded Primitive Data Type Access&#039;&#039;&#039;&lt;br /&gt;
* Redundancy removal to improve performance/energy &lt;br /&gt;
** Communication&lt;br /&gt;
** Storage&lt;br /&gt;
* Redundancy addition to improve fault tolerance &lt;br /&gt;
** High Level fault tolerant error correction codes and their distributed placement&lt;br /&gt;
* Placeholder representation for aggregated data elements &lt;br /&gt;
** Memory allocation/deallocation/copying&lt;br /&gt;
** Memory consistency models &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NWChem&#039;&#039;&#039;&lt;br /&gt;
* DOE’s premier computational chemistry software&lt;br /&gt;
* One-of-a-kind solution scalable with respect to scientific challenge and compute platforms&lt;br /&gt;
* From molecules and nanoparticles to solid state and biomolecular systems&lt;br /&gt;
* Open-source has greatly expanded user and developer base (ECL 2.0)&lt;br /&gt;
* Worldwide distribution (70% is academia)&lt;br /&gt;
* Ab initio molecular dynamics runs at petascale&lt;br /&gt;
* Scalability to 100,000 processors demonstrated&lt;br /&gt;
* Smart data distribution and communication algorithms enable hybrid-DFT to scale to large numbers of processors&lt;br /&gt;
&lt;br /&gt;
== Deliverables ==&lt;br /&gt;
&#039;&#039;&#039;Q1 (12/1/2012)&#039;&#039;&#039;&lt;br /&gt;
* [[media:Q1.pdf|Q1 Report]]&lt;br /&gt;
* [[media:Q1-cholesky.pptx|Presentation on Cholesky Decomposition work]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q2 (3/1/2013)&#039;&#039;&#039;&lt;br /&gt;
* [[media:BrandywineXStackReportQ2.pdf|Q2 Report]]&lt;br /&gt;
* [[media:pil_api_v0.3.pdf|PIL Design v0.3]]&lt;br /&gt;
* [[media:SCF_text_document.docx|NWChem SCF module description]]&lt;br /&gt;
* [[media:Scf.tar.gz|NWChem SCF code download]]&lt;br /&gt;
&#039;&#039;&#039;March 2013 PI Meeting&#039;&#039;&#039;: [[media:DynAX_March_2013_PI_meeting.pptx|PI Meeting presentation]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;EXaCT All-hands meeting&#039;&#039;&#039;: [[media:DynAX_EXaCT_meeting.pptx|DynAX Presentation]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q3 (6/1/2013)&#039;&#039;&#039;&lt;br /&gt;
* [[media:BrandywineXStackReportQ3.pdf|Q3 Report]]&lt;br /&gt;
* [[media:Scf2.tar.gz|NWChem SCF code download version 2]]&lt;br /&gt;
* [[media:Pil_api_v0.4.pdf|PIL Design v0.4]]&lt;br /&gt;
* [[media:TCG.tar.gz|Tensor Contraction Engine (OpenMP, CUDA, C, Fortran)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Year 1 report&#039;&#039;&#039;: [[media:BrandywineXStackReportY1.pdf|Y1 Report]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q4 (9/1/2013)&#039;&#039;&#039;&lt;br /&gt;
* [[media:BrandywineXStackReportQ4.pdf|Q4 Report]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q5 (12/1/2013)&#039;&#039;&#039;&lt;br /&gt;
* [[media:BrandywineXStackReportQ5.pdf|Q5 Report]]&lt;br /&gt;
* [[media:Tce.tar.gz|TCE C Generator code]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q6 (3/1/2014)&#039;&#039;&#039;&lt;br /&gt;
* [[media:BrandywineXStackReportQ6.pdf|Q6 Report]]&lt;br /&gt;
* [[media:TCE slides.pdf|TCE Presentation]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q7 (6/1/2014)&#039;&#039;&#039;&lt;br /&gt;
* [[media:DynAXXStackQ7Report.pdf|Q7 Report]]&lt;br /&gt;
* [[media:Dynax_Deep_Dive_Presentation_-_HTA.pdf|HTA Deep Dive presentation]]&lt;br /&gt;
* [[media:Tce-20140605.tar.gz|TCE for OCR and SWARM (function-level parallelism)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Year 2 report&#039;&#039;&#039;: [[media:DynAXX-StackYear2report.pdf|Y2 Report]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q8 (9/1/2014)&#039;&#039;&#039;&lt;br /&gt;
* [[media:DynAXXStackQ8Report.pdf|Q8 Report]]&lt;br /&gt;
* [[media:Dfm2014-cyang49.pdf|Publication - &amp;quot;Hierarchically Tiled Array as a High-Level Abstraction for Codelets&amp;quot; (DFM2014)]]&lt;br /&gt;
* [[media:DennisGaoVMOS.v2.pdf|Publication - &amp;quot;On the Feasibility of a Codelet Based Multi-core Operating System&amp;quot; (DFM2014)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q9 (12/1/2014)&#039;&#039;&#039;&lt;br /&gt;
* [[media:DynAXXStackQ9Report.pdf|Q9 Report]]&lt;br /&gt;
* [[media:Lcpc_2014_shrestha.pdf|Publication - &amp;quot;Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading&amp;quot; (LCPC2014)]]&lt;br /&gt;
* [[media:Tce-20141110.tar.gz|TCE for OCR (block-level parallelism)]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q10 (3/1/2015)&#039;&#039;&#039;&lt;br /&gt;
* [[media:DynaxXStackQ10Report.pdf|Q10 Report]]&lt;br /&gt;
* [[media:Containment_domains_slides.pdf|Containment Domains presentation]]&lt;br /&gt;
* [[media:Cgo_slides_xstack_v7.pdf|Locality Aware Concurrent Start for Stencil Applications]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q11 (6/1/2015)&#039;&#039;&#039;&lt;br /&gt;
* [[media:DynAX-XStackQ11Report.pdf|Q11 Report]]&lt;br /&gt;
* [[media:ETITechnicalReport01.pdf|ETI Technical Report 01 : Landing Containment Domains on SWARM:  Toward a Robust Resiliency Solution on A Dynamic Adaptive Runtime Machine]]&lt;br /&gt;
* [[media:ETITechnicalReport02.pdf|ETI Technical Report 02 : Legacy MPI Codes and its interoperability with fine grain task-parallel runtime systems for Exascale]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Q12 &#039;&#039;&#039;&lt;br /&gt;
* [[media:DynAX-XStackQ12Report.pdf|Q12 Report]]&lt;br /&gt;
* [[media:GL_DataRes.pdf| PNNL DeepDive : Group Locality: Gregarious Data Restructuring]]&lt;br /&gt;
* [[media:Brandywine_Presentation_8-12-2015_v2.pdf| ETI DeepDive : Containment Domains In SWARM]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Year 3 report&#039;&#039;&#039;: [[media:DynAXYear3.pdf|Y3 Report]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NCE &#039;&#039;&#039;&lt;br /&gt;
* [[media:ETI NCE.pdf|ETI NCE Report]]&lt;br /&gt;
* [[media:PNNL NCE.pdf|PNNL NCE Report]]&lt;br /&gt;
* [[media:Reservoir NCE.pdf|Reservoir NCE Report]]&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:Guang_gao.jpeg&amp;diff=5393</id>
		<title>File:Guang gao.jpeg</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:Guang_gao.jpeg&amp;diff=5393"/>
		<updated>2023-07-10T04:48:01Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Guang_Gao&amp;diff=5392</id>
		<title>Guang Gao</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Guang_Gao&amp;diff=5392"/>
		<updated>2023-07-10T04:47:31Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Person |portrait=guang_gao.jpeg |firstname=Guang |middlename=R. |lastname=Gao |company=University of Delaware |position=Endowed Distinguished Professor of Electrical &amp;amp; Compu...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=guang_gao.jpeg |firstname=Guang |middlename=R. |lastname=Gao |company=University of Delaware |position=Endowed Distinguished Professor of Electrical &amp;amp; Computer Engineering |location=Newark DE |country=United States |sector= |linkedin=https://www.linkedin.com/in/guang-r-gao-27824b5/ }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=CORVETTE&amp;diff=5391</id>
		<title>CORVETTE</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=CORVETTE&amp;diff=5391"/>
		<updated>2023-07-10T04:46:33Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = CORVETTE&lt;br /&gt;
| image = [[File:CORVETTE-Logos.png|300px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.berkeley.edu/ UC Berkeley], [http://www.lbl.gov/ LBNL]&lt;br /&gt;
| pi = [[Koushik Sen]]&lt;br /&gt;
| co-pi = James W. Demmel (UC Berkeley), Costin Iancu (LBNL)&lt;br /&gt;
| website = [http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/corvette/ CORVETTE]&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Program &#039;&#039;Cor&#039;&#039;rectness, &#039;&#039;Ve&#039;&#039;rification, and &#039;&#039;T&#039;&#039;es&#039;&#039;t&#039;&#039;ing for &#039;&#039;E&#039;&#039;xascale&#039;&#039;&#039; or &#039;&#039;&#039;CORVETTE&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
&lt;br /&gt;
=== Researchers ===&lt;br /&gt;
* [http://srl.cs.berkeley.edu/~ksen/doku.php/ Koushik Sen]&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~demmel/ James Demmel]&lt;br /&gt;
* [http://www.crd.lbl.gov/about/staff/cds/ftg/costin-iancu/ Costin Iancu]&lt;br /&gt;
&lt;br /&gt;
=== Postdoctoral Researchers ===&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~hdnguyen/ Hong Diep Nguyen]&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~xuehaiq/ Xuehai Qian]&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~rubio/ Cindy Rubio-Gonz&amp;amp;aacute;lez]&lt;br /&gt;
&lt;br /&gt;
=== Graduate Student Researcher ===&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~nacuong/ Cuong Nguyen]&lt;br /&gt;
&lt;br /&gt;
== Motivation ==&lt;br /&gt;
*High performance scientific computing&lt;br /&gt;
** Exascale: O(10&amp;lt;sup&amp;gt;6&amp;lt;/sup&amp;gt;) nodes, O(10&amp;lt;sup&amp;gt;3&amp;lt;/sup&amp;gt;) cores per node&lt;br /&gt;
** Requires asynchrony and “relaxed” memory consistency&lt;br /&gt;
** Shared memory with dynamic task parallelism&lt;br /&gt;
** Languages allow remote memory modification&lt;br /&gt;
&lt;br /&gt;
* Correctness challenges&lt;br /&gt;
** Non-determinism causes hard-to-diagnose correctness and performance bugs&lt;br /&gt;
*** Data races, atomicity violations, deadlocks …&lt;br /&gt;
** Bugs in DSLs&lt;br /&gt;
** Scientific applications use floating-point arithmetic: non-determinism leads to non-reproducible results&lt;br /&gt;
** Numerical exceptions can cause rare but critical bugs that are hard for non-experts to detect and fix&lt;br /&gt;
&lt;br /&gt;
== Goals ==&lt;br /&gt;
Develop correctness tools for different programming models: PGAS, MPI, dynamic parallelism&lt;br /&gt;
&lt;br /&gt;
I. Testing and Verification&lt;br /&gt;
* Identify sources of non-determinism in executions&lt;br /&gt;
* Data races, atomicity violations, non‐reproducible floating point results&lt;br /&gt;
* Explore state-of-the-art techniques that use dynamic analysis&lt;br /&gt;
* Develop precise and scalable tools: &amp;lt; 2X overhead&lt;br /&gt;
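The non-reproducibility problem named above comes from the non-associativity of floating-point addition: a non-deterministic reduction order changes the result. The sketch below shows the effect and a Kahan-Neumaier compensated summation that recovers the correct value regardless of order. This illustrates the general idea only; it is not the ReproBLAS algorithm.

```python
# Floating-point addition is not associative, so a non-deterministic
# reduction order yields non-reproducible results. Compensated summation
# tracks the lost low-order bits. (Sketch of the idea; not ReproBLAS.)
import math

vals = [1.0, 1e16, -1e16]

naive_a = sum(vals)                  # ((1.0 + 1e16) + -1e16): the 1.0 is lost
naive_b = sum([1e16, -1e16, 1.0])    # ((1e16 + -1e16) + 1.0): the 1.0 survives

def neumaier_sum(xs):
    """Kahan-Neumaier compensated summation: c accumulates rounding error."""
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x         # low-order bits of x were lost in t
        else:
            c += (x - t) + s         # low-order bits of s were lost in t
        s = t
    return s + c
```

Here `naive_a` and `naive_b` differ even though they sum the same values, while `neumaier_sum` (like Python's `math.fsum`) returns the correctly rounded 1.0 for either ordering.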
&lt;br /&gt;
II. Debugging&lt;br /&gt;
* Use minimal amount of concurrency to reproduce bug&lt;br /&gt;
* Support two-level debugging of high-level abstractions&lt;br /&gt;
* Detect causes of floating-point anomalies and determine the minimum precision needed to fix them&lt;br /&gt;
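The "minimum precision" goal above can be sketched as a search: try a lower precision for intermediate values and accept it only if the final result stays within an error threshold. The example below simulates single precision by rounding through an IEEE binary32 value; it illustrates the tuning idea behind tools like Precimonious, not their actual implementation.

```python
# Precision-tuning sketch: would single precision suffice for this kernel?
# Single precision is simulated by round-tripping through a 32-bit float.
# (Illustrative only; not the Precimonious tool itself.)
import struct

def to_single(x):
    """Round a Python double to the nearest IEEE binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def dot(xs, ys, lower_precision=False):
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
        if lower_precision:
            acc = to_single(acc)     # accumulate in simulated float32
    return acc

xs = [1e8, 1.0, -1e8]
ys = [1.0, 1.0, 1.0]
exact = 1.0                           # 1e8 + 1 - 1e8

ok_double = abs(dot(xs, ys) - exact) <= 1e-6
ok_single = abs(dot(xs, ys, lower_precision=True) - exact) <= 1e-6
# A tuner would keep double precision here: in float32 the +1.0 term
# is absorbed into 1e8 (ulp of 1e8 in binary32 is 8) and the result is 0.0.
```

A real tuner searches over many variables at once, lowering the precision of each candidate set and re-running the accuracy check, which is why an automated assistant is valuable.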
&lt;br /&gt;
==Impact Summary ==&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/4/4c/Corvette-highlight_summary.pdf Corvette Impact Summary]&lt;br /&gt;
&lt;br /&gt;
== List of Products ==&lt;br /&gt;
=== Publications ===&lt;br /&gt;
&lt;br /&gt;
* [http://crd.lbl.gov/assets/pubs_presos/main2.pdf &#039;&#039;Floating Point Precision Tuning Using Blame Analysis&#039;&#039;] Cindy Rubio-Gonz&amp;amp;aacute;lez, Cuong Nguyen, James Demmel, William Kahan, Koushik Sen, Wim Lavrijsen, Costin Iancu, LBNL TR, April 17, 2015.&lt;br /&gt;
* [http://crd.lbl.gov/assets/pubs_presos/main3.pdf &#039;&#039;OPR: Partial Deterministic Record and Replay for One-Sided Communication&#039;&#039;] Xuehai Qian, Koushik Sen, Paul Hargrove, Costin Iancu, LBNL TR, April 17, 2015.&lt;br /&gt;
* [http://crd.lbl.gov/assets/pubs_presos/nwbar.pdf &#039;&#039;Barrier Elision for Production Parallel Programs&#039;&#039;] Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John Mellor Crummey, Costin Iancu, PPOPP 2015, February 5, 2015.&lt;br /&gt;
* &#039;&#039;Parallel Reproducible Summation&#039;&#039;, James Demmel, Hong-Diep Nguyen, In IEEE Transactions on Computers, Special Section on Computer Arithmetic 2014, April 30, 2014&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~rubio/includes/sc13.pdf &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision.&#039;&#039;] C. Rubio-Gonz&amp;amp;aacute;lez, C. Nguyen, H.D. Nguyen, J. Demmel, W. Kahan, K. Sen, D.H. Bailey, C. Iancu, and D. Hough. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC&#039;13), Denver, CO, November 2013.&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~hdnguyen/public/papers/ARITH21_Fast_Sum.pdf &#039;&#039;Fast Reproducible Floating-Point Summation&#039;&#039;.] J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH&#039;13), Austin, TX, April 2013.&lt;br /&gt;
* [http://www.eecs.berkeley.edu/~hdnguyen/public/papers/ARITH21_ExaScale.pdf &#039;&#039;Numerical Accuracy and Reproducibility at ExaScale&#039;&#039;.] J. Demmel and H.D. Nguyen. In Proceedings of the 21st IEEE Symposium on Computer Arithmetic (ARITH&#039;13), Austin, TX, April 2013.&lt;br /&gt;
* [http://srl.cs.berkeley.edu/~ksen/papers/thrille-exp.pdf &#039;&#039;Scaling Data Race Detection for Partitioned Global Address Space Programs&#039;&#039;.] C. Park, K. Sen, C. Iancu. In Proceedings of the International Conference on Supercomputing (ICS&#039;13), Eugene, OR, June 2013.&lt;br /&gt;
&lt;br /&gt;
=== Technical Presentations ===&lt;br /&gt;
&lt;br /&gt;
* 06/2014. &#039;&#039;Testing, Debugging, and Precision-tuning of Large-scale Parallel and Floating-point Programs&#039;&#039;. Presenter: Koushik Sen. Invited R. Narasimhan Lecture Award, TIFR, India.&lt;br /&gt;
* 05/2014. &#039;&#039;Precision Tuning of Floating-Point Programs&#039;&#039;. Presenters: Cindy Rubio-Gonz&amp;amp;aacute;lez and Cuong Nguyen. Berkeley Programming Systems Retreat, Santa Cruz, CA.&lt;br /&gt;
* 04/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. University of California, Irvine, Irvine, CA.&lt;br /&gt;
* 03/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. University of Texas at Dallas, Dallas, TX.&lt;br /&gt;
* 03/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. University of California, Davis, Davis, CA.&lt;br /&gt;
* 02/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. University at Buffalo, The State University of New York, Buffalo, NY.&lt;br /&gt;
* 02/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. College of William and Mary, Williamsburg, VA.&lt;br /&gt;
* 02/2014. &#039;&#039;Improving Software Reliability and Performance Using Program Analysis&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Invited talk at SRI International, Menlo Park, CA.&lt;br /&gt;
* 01/2014. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. UC Berkeley ASPIRE Winter Retreat, Tahoe, CA.&lt;br /&gt;
* 12/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Invited talk at Bay Area Scientific Computing Day (BASCD&#039;13), LBNL, Berkeley, CA.&lt;br /&gt;
* 11/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Supercomputing Conference (SC&#039;13), Denver, CO.&lt;br /&gt;
* 11/2013. &#039;&#039;ReproBLAS: Reproducible BLAS&#039;&#039;. Presenter: Hong Diep Nguyen. Supercomputing Conference (SC&#039;13), Denver, CO.&lt;br /&gt;
* 11/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Massachusetts Institute of Technology, Cambridge, MA.&lt;br /&gt;
* 11/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Rising Stars in EECS Workshop, Poster Session, MIT, Cambridge, MA.&lt;br /&gt;
* 08/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Invited talk at Oracle, Compiler Technical Talks Series, Santa Clara, CA.&lt;br /&gt;
* 06/2013. &#039;&#039;Scaling Data Race Detection for Partitioned Global Address Space Programs&#039;&#039;. Presenter: Costin Iancu. International Conference on Supercomputing (ICS&#039;13), Eugene, OR.&lt;br /&gt;
* 06/2013. &#039;&#039;Efficient Reproducible Floating-Point Reduction Operations on Large Scale Systems&#039;&#039;. Presenter: Hong Diep Nguyen. SIAM Annual Meeting 2013  (AN13), San Diego, CA.&lt;br /&gt;
* 06/2013. &#039;&#039;Precimonious: Tuning Assistant for Floating-Point Precision&#039;&#039;. Presenter: Cindy Rubio-Gonz&amp;amp;aacute;lez. Lawrence Berkeley National Laboratory DEGAS Retreat, Santa Cruz, CA.&lt;br /&gt;
* 04/2013. &#039;&#039;Fast Reproducible Floating-Point Summation&#039;&#039;. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.&lt;br /&gt;
* 04/2013. &#039;&#039;Numerical Reproducibility and Accuracy at ExaScale&#039;&#039;. Presenter: Hong Diep Nguyen. 21st IEEE Symposium on Computer Arithmetic (ARITH21), Austin, TX.&lt;br /&gt;
* 11/2012. &#039;&#039;Reproducible Floating Point Computation: Motivation, Algorithms, Diagnostics&#039;&#039;. Presenter: James Demmel. Birds of a Feather (BOF), Supercomputing Conference (SC&#039;12), Salt Lake City, UT.&lt;br /&gt;
&lt;br /&gt;
=== Software Releases ===&lt;br /&gt;
&lt;br /&gt;
* [http://upc.lbl.gov/thrille.html/ UPC-Thrille]&lt;br /&gt;
* [https://github.com/corvette-berkeley/precimonious/ Precimonious]&lt;br /&gt;
* [http://bebop.cs.berkeley.edu/reproblas/ ReproBLAS]&lt;br /&gt;
&lt;br /&gt;
== Testing and Verification Tools ==&lt;br /&gt;
=== Scalable Testing of Parallel Programs ===&lt;br /&gt;
* Concurrent Programming is hard &lt;br /&gt;
** Bugs happen non-deterministically &lt;br /&gt;
** Data races, deadlocks, atomicity violations, etc. &lt;br /&gt;
&lt;br /&gt;
* Goals: build a tool to test and debug concurrent and parallel programs &lt;br /&gt;
** Efficient: reduce overhead from 10x-100x to 2x &lt;br /&gt;
** Precise &lt;br /&gt;
** Reproducible &lt;br /&gt;
** Scalable &lt;br /&gt;
&lt;br /&gt;
* Active random testing&lt;br /&gt;
&lt;br /&gt;
=== Active Testing ===&lt;br /&gt;
* Phase 1: Static or dynamic analysis to find potential concurrency bug patterns, such as data races, deadlocks, atomicity violations &lt;br /&gt;
** Data races: Eraser or lockset based [PLDI’08] &lt;br /&gt;
** Atomicity violations: cycle in transactions and happens‐before relation [FSE’08] &lt;br /&gt;
** Deadlocks: cycle in resource acquisition graph [PLDI’09] &lt;br /&gt;
** Publicly available tool for Java/Pthreads/UPC [CAV’09] &lt;br /&gt;
** Memory model bugs: cycle in happens-before graph [ISSTA’11] &lt;br /&gt;
** For UPC programs running on thousands of cores [SC’11]&lt;br /&gt;
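The lockset part of Phase 1 can be sketched as follows (illustrative Python only; the trace format and names are invented, not the actual UPC-Thrille implementation):&lt;br /&gt;

```python
# Eraser-style lockset analysis: a variable's candidate lockset is the
# intersection of the locks held at every access; if it becomes empty,
# no single lock consistently protects the variable -> potential race.

def eraser(trace):
    """trace: list of (thread, var, locks_held) shared-memory accesses.
    Returns the variables whose candidate lockset becomes empty."""
    candidate = {}                      # var -> candidate lockset C(v)
    races = set()
    for thread, var, held in trace:
        held = frozenset(held)
        if var not in candidate:
            candidate[var] = held       # first access initializes C(v)
        else:
            candidate[var] &= held      # keep only locks held at every access
        if not candidate[var]:
            races.add(var)              # no common lock protects var
    return races

trace = [
    ("T1", "x", {"L1"}),
    ("T2", "x", {"L1"}),   # x is consistently guarded by lock L1
    ("T1", "y", {"L1"}),
    ("T2", "y", {"L2"}),   # y is guarded by different locks -> potential race
]
potential_races = eraser(trace)
```

Phase 2 would then actively schedule the conflicting accesses to &#039;&#039;y&#039;&#039; back-to-back to confirm whether the reported race is real.&lt;br /&gt;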
&lt;br /&gt;
* Phase 2: “Direct” testing (or model checking) based on the bug patterns obtained from phase 1 &lt;br /&gt;
** Confirm bugs&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Goals&#039;&#039;&#039;&lt;br /&gt;
* Goal 1. Nice to have a trace exhibiting the data race&lt;br /&gt;
* Goal 2. Nice to have a trace exhibiting the assertion failure&lt;br /&gt;
* Goal 3. Nice to have a trace with fewer threads&lt;br /&gt;
* Goal 4. Nice to have a trace with fewer context switches&lt;br /&gt;
&lt;br /&gt;
=== Challenges for Exascale ===&lt;br /&gt;
* Java and pthreads programs &lt;br /&gt;
** Synchronization with locks and condition variables &lt;br /&gt;
** Single node &lt;br /&gt;
* Exascale has different programming models &lt;br /&gt;
** Large scale&lt;br /&gt;
** Bulk communication &lt;br /&gt;
** Collective operations with data movement &lt;br /&gt;
** Memory consistency &lt;br /&gt;
** Distributed shared memory &lt;br /&gt;
* Cannot use centralized dynamic analyses &lt;br /&gt;
* Cannot instrument and track every statement&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Further Challenges&#039;&#039;&#039;&lt;br /&gt;
* Targeted a simple programming paradigm &lt;br /&gt;
** Threads and shared memory &lt;br /&gt;
* Similar techniques are available for MPI and CUDA &lt;br /&gt;
** ISP, DAMPI, MARMOT, Umpire, MessageChecker &lt;br /&gt;
** TASS uses symbolic execution &lt;br /&gt;
** PUG for CUDA &lt;br /&gt;
* Analyze programs that mix different paradigms &lt;br /&gt;
** OpenMP, MPI, Shared Distributed Memory &lt;br /&gt;
** Need to correlate non‐determinism across paradigms&lt;br /&gt;
&lt;br /&gt;
=== How Well Does it Scale? ===&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Franklin.png|right|150px]]&lt;br /&gt;
&lt;br /&gt;
* Maximum 8% slowdown at 8K cores &lt;br /&gt;
** Franklin Cray XT4 Supercomputer at NERSC &lt;br /&gt;
** Quad-core 2.3GHz CPU and 8GB RAM per node &lt;br /&gt;
** Portals interconnect&lt;br /&gt;
* Optimizations for scalability &lt;br /&gt;
** Efficient Data Structures &lt;br /&gt;
** Minimize Communication &lt;br /&gt;
** Sampling with Exponential Backoff&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Optimization.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Debugging Tools ==&lt;br /&gt;
&lt;br /&gt;
=== Debugging Project I ===&lt;br /&gt;
* Detect bug with fewer threads and fewer context switches&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Experience with C/PThreads&#039;&#039;&#039;: Over 90% of simplified traces were within 2 context switches of optimal&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Context-Switch-Results.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Small model hypothesis for parallel programs&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Most bugs can be found with few threads &lt;br /&gt;
** 2‐3 threads &lt;br /&gt;
** No need to run on thousands of nodes &lt;br /&gt;
* Most bugs can be found with fewer context switches [Musuvathi and Qadeer, PLDI 07] &lt;br /&gt;
** Helps in sequential debugging&lt;br /&gt;
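Trace simplification toward fewer threads and context switches can be sketched with a ddmin-style search in the spirit of Zeller&#039;s delta debugging (hypothetical Python; the schedule events and failure predicate are invented stand-ins for replaying a real thread schedule):&lt;br /&gt;

```python
# ddmin sketch: repeatedly drop chunks of the failing schedule and keep
# any shorter schedule on which the bug still reproduces.

def fails(schedule):
    """Toy bug: failure reproduces iff a T2 write is interleaved
    between a T1 read and a T1 write (subsequence check)."""
    it = iter(schedule)
    return all(ev in it for ev in ("T1:read", "T2:write", "T1:write"))

def ddmin(schedule, fails):
    """Shrink a failing schedule to a 1-minimal one that still fails."""
    n = 2
    while len(schedule) >= 2:
        chunk = max(1, len(schedule) // n)
        subsets = [schedule[i:i + chunk] for i in range(0, len(schedule), chunk)]
        reduced = False
        for i in range(len(subsets)):
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if fails(complement):              # bug persists without subset i
                schedule, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(schedule):
                break                          # every single removal was tested
            n = min(len(schedule), n * 2)      # retry at finer granularity
    return schedule

schedule = ["T0:init", "T1:read", "T3:log", "T2:write",
            "T0:tick", "T1:write", "T3:log"]
minimal = ddmin(schedule, fails)
```

The returned schedule is 1-minimal: removing any single remaining event makes the failure disappear, which is what makes the simplified trace useful for sequential-style debugging.&lt;br /&gt;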
&lt;br /&gt;
=== Debugging Project II ===&lt;br /&gt;
* Two‐level debugging of DSLs&lt;br /&gt;
* Correlate program state across program versions&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Debugging-DSL.png|600px]]&lt;br /&gt;
&lt;br /&gt;
=== Debugging Project III ===&lt;br /&gt;
* Find floating point anomalies&lt;br /&gt;
* Recommend safe reduction of precision&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Floating Point Debugging ==&lt;br /&gt;
&lt;br /&gt;
=== Why do we care? ===&lt;br /&gt;
* Usage of floating point programs has been growing rapidly &lt;br /&gt;
** HPC &lt;br /&gt;
** Cloud, games, graphics, finance, speech, signal processing &lt;br /&gt;
&lt;br /&gt;
* Most programmers are not experts in floating-point!&lt;br /&gt;
** Why not use the highest precision everywhere?&lt;br /&gt;
 &lt;br /&gt;
* High precision wastes &lt;br /&gt;
** Energy &lt;br /&gt;
** Time &lt;br /&gt;
** Storage&lt;br /&gt;
&lt;br /&gt;
=== Floating Point Debugging Problem 1: Reduce unnecessary precision ===&lt;br /&gt;
* Consider the problem of finding the arc length of the function&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;How can we find a minimal set of code fragments whose precision must be high?&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Debugging-Problem-1.png|600px]]&lt;br /&gt;
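The search for a minimal set of high-precision fragments can be sketched as a greedy Precimonious-style tuning loop (hypothetical Python; the kernel, variable names, and error threshold are invented for illustration, and float32 is simulated by round-tripping through `struct`):&lt;br /&gt;

```python
# Greedily demote variables to single precision; keep each demotion only
# if the program's result stays within an error threshold of the
# all-double reference.
import struct

def f32(x):
    """Round an IEEE double to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def program(low):
    """Toy kernel: sum of 1/i^2. Variables named in `low` are rounded
    to float32 after every assignment."""
    rnd = lambda name, v: f32(v) if name in low else v
    acc = 0.0
    for i in range(1, 100001):
        term = rnd('term', 1.0 / (i * i))
        acc = rnd('acc', acc + term)
    return acc

def lower_precision(variables, tol=1e-6):
    reference = program(set())          # all-double result as the oracle
    low = set()
    for v in variables:                 # greedy search, like delta debugging
        trial = low | {v}
        if abs(program(trial) - reference) < tol:
            low = trial                 # demotion is "safe", keep it
    return low

low = lower_precision(["term", "acc"])
# 'term' tolerates float32, but 'acc' silently drops the small tail terms
# once the running sum exceeds ~1.6, so it must stay in double precision.
```

This is the "reduce unnecessary precision" problem in miniature: only the accumulator genuinely needs high precision, and the tool discovers that by testing rather than by formal proof.&lt;br /&gt;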
&lt;br /&gt;
=== Floating Point Debugging Problem 2: Detect Inaccuracy and Anomaly ===&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Debugging-Problem-2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
=== What can we do? ===&lt;br /&gt;
* We can reduce precision “safely”&lt;br /&gt;
** reduce power, improve performance, get better answer&lt;br /&gt;
&lt;br /&gt;
* Automated testing and debugging techniques&lt;br /&gt;
** To recommend “precision reduction”&lt;br /&gt;
** Formal proof of “safety” can be replaced by Concolic testing&lt;br /&gt;
&lt;br /&gt;
* Approach: automate previously manual debugging&lt;br /&gt;
** Concolic testing&lt;br /&gt;
** Delta debugging [Zeller et al.]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Implementation ==&lt;br /&gt;
* Prototype implementation for C programs&lt;br /&gt;
** Uses CIL compiler framework&lt;br /&gt;
** http://perso.univ-perp.fr/guillaume.revy/index.php?page=debugging&lt;br /&gt;
&lt;br /&gt;
* Future plans&lt;br /&gt;
** Build on top of LLVM compiler framework&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Collaboration ==&lt;br /&gt;
* Dynamic analyses to find bugs ‐ dynamic parallelism, unstructured parallelism, shared memory &lt;br /&gt;
** DEGAS, XPRESS, Traleika Glacier &lt;br /&gt;
&lt;br /&gt;
* Floating point debugging &lt;br /&gt;
** Co‐design centers &lt;br /&gt;
&lt;br /&gt;
* 2‐level debugging &lt;br /&gt;
** D-TEC&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Summary ==&lt;br /&gt;
&lt;br /&gt;
[[File:CORVETTE-Summary.png|600px]]&lt;br /&gt;
&lt;br /&gt;
* Build testing tools &lt;br /&gt;
** Close to what programmers use &lt;br /&gt;
** Hide formal methods and program analysis under testing &lt;br /&gt;
&lt;br /&gt;
* If you are not obsessed with formal correctness &lt;br /&gt;
** Testing and debugging can help you solve these problems with high confidence&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Koushik_Sen&amp;diff=5390</id>
		<title>Koushik Sen</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Koushik_Sen&amp;diff=5390"/>
		<updated>2023-07-10T04:45:58Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Koushik Sen.jpeg&lt;br /&gt;
|firstname=Koushik&lt;br /&gt;
|lastname=Sen&lt;br /&gt;
|company=University of California, Berkeley&lt;br /&gt;
|position=Professor&lt;br /&gt;
|location=Berkeley CA&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=https://people.eecs.berkeley.edu/~ksen/?rnd=1688946287138&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=File:Koushik_Sen.jpeg&amp;diff=5389</id>
		<title>File:Koushik Sen.jpeg</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=File:Koushik_Sen.jpeg&amp;diff=5389"/>
		<updated>2023-07-10T04:45:49Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Koushik_Sen&amp;diff=5388</id>
		<title>Koushik Sen</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Koushik_Sen&amp;diff=5388"/>
		<updated>2023-07-10T04:45:29Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: Created page with &amp;quot;{{Person |portrait=Koushik Sen.jpeg |firstname=Koushik |middlename= |lastname=Sen |company=University of California, Berkeley |position=Professor |location=Berkeley CA |countr...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person |portrait=Koushik Sen.jpeg |firstname=Koushik |middlename= |lastname=Sen |company=University of California, Berkeley |position=Professor |location=Berkeley CA |country=United States |sector= |linkedin=https://people.eecs.berkeley.edu/~ksen/?rnd=1688946287138 }}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5387</id>
		<title>Andrew Chien</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5387"/>
		<updated>2023-07-10T04:44:06Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Andrew Chen.jpg&lt;br /&gt;
|firstname=Andrew&lt;br /&gt;
|lastname=Chien&lt;br /&gt;
|company=University of Chicago&lt;br /&gt;
|position=William Eckhardt Professor, Department of Computer Science&lt;br /&gt;
|location=Chicago IL&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=https://www.linkedin.com/in/andrew-a-chien-1b70795/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5386</id>
		<title>Andrew Chien</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5386"/>
		<updated>2023-07-10T04:42:50Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Andrew Chen.jpg&lt;br /&gt;
|firstname=Andrew&lt;br /&gt;
|lastname=Chien&lt;br /&gt;
|company=University of Chicago&lt;br /&gt;
|position=William Eckhardt Professor, Department of Computer Science&lt;br /&gt;
|location=Chicago IL&lt;br /&gt;
|country=United States&lt;br /&gt;
|linkedin=http://cs.uchicago.edu/people/aachien/&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5385</id>
		<title>Andrew Chien</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=Andrew_Chien&amp;diff=5385"/>
		<updated>2023-07-10T04:42:41Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Person&lt;br /&gt;
|portrait=Andrew Chen.jpg&lt;br /&gt;
|firstname=Andrew&lt;br /&gt;
|lastname=Chien&lt;br /&gt;
|company=University of Chicago&lt;br /&gt;
|position=William Eckhardt Professor, Department of Computer Science&lt;br /&gt;
|linkedin=http://cs.uchicago.edu/people/aachien/&lt;br /&gt;
|location=Chicago IL &lt;br /&gt;
|country=United States&lt;br /&gt;
}}&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
	<entry>
		<id>https://modelado.org/index.php?title=GVR&amp;diff=5384</id>
		<title>GVR</title>
		<link rel="alternate" type="text/html" href="https://modelado.org/index.php?title=GVR&amp;diff=5384"/>
		<updated>2023-07-10T04:39:41Z</updated>

		<summary type="html">&lt;p&gt;Pinfold: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox project&lt;br /&gt;
| title = GVR: Exploiting Global-view for Resilience&lt;br /&gt;
| image = [[File:GVR-Logos.png|400px]]&lt;br /&gt;
| imagecaption = &lt;br /&gt;
| team-members = [http://www.uchicago.edu/ U. of Chicago], [http://www.anl.gov/ ANL], [http://www.hpl.hp.com/ HP Labs]&lt;br /&gt;
| pi = [[Andrew Chien]]&lt;br /&gt;
| co-pi = [http://www.mcs.anl.gov/person/pavan-balaji/ Pavan Balaji (ANL)]&lt;br /&gt;
| website = http://gvr.cs.uchicago.edu/&lt;br /&gt;
| download = https://sites.google.com/site/uchicagolssg/lssg/research/gvr/downloads&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Exploiting Global View for Resilience&#039;&#039;&#039; or &#039;&#039;&#039;GVR&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Team Members ==&lt;br /&gt;
* [http://www.uchicago.edu/ University of Chicago]: [http://www.cs.uchicago.edu/people/aachien/ Andrew A. Chien] (PI), [http://www.cs.uchicago.edu/people/hfujita Hajime Fujita], [http://www.cs.uchicago.edu/people/zar1 Zachary Rubenstein], [http://www.cs.uchicago.edu/people/zimingzheng Ziming Zheng], [http://www.cs.uchicago.edu/people/dun Nan Dun], [http://www.cs.uchicago.edu/people/aimanf Aiman Fang], Yan Liu&lt;br /&gt;
* [http://www.anl.gov/ Argonne National Laboratory (ANL)]: [http://www.mcs.anl.gov/person/pavan-balaji/ Pavan Balaji] (co-PI), [http://www.mcs.anl.gov/person/pete-beckman Pete Beckman], [http://www.mcs.anl.gov/person/kamil-iskra/ Kamil Iskra], Wes Bland&lt;br /&gt;
* [http://www.hpl.hp.com/ HP Labs]: Robert Schreiber&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Application Partnerships&#039;&#039;&#039;&lt;br /&gt;
* Advanced Nuclear Reactor Simulation ([https://cesar.mcs.anl.gov/content/andrew-siegel Andrew Siegel], CESAR)&lt;br /&gt;
* Computational Chemistry ([https://www.alcf.anl.gov/staff-directory/jeff-hammond Jeff Hammond], ALCF)&lt;br /&gt;
* Rich Computational Frameworks (Trilinos, [http://www.sandia.gov/~maherou/ Mike Heroux], Sandia)&lt;br /&gt;
* Particle codes (ddcMD) (David Richards, Ignacio Laguna, LLNL)&lt;br /&gt;
* Adaptive Mesh Refinement (Chombo) (Brian van Straalen, Anshu Dubey, LBNL)&lt;br /&gt;
* Combustion (S3D) (Jackie Chen, Sandia)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Global View Resilience (GVR) is a new programming approach that exploits a global view data model (global naming of data, consistency, and distributed layout), adding reliability to globally visible distributed arrays. The globally-visible distributed array abstraction is &amp;quot;multi-version&amp;quot;, providing redundancy in time, and a convenient location for application annotations for reliability needs.  Because the distributed array abstraction is portable, GVR enables application programmers to manage reliability (and its overhead) in a flexible, portable fashion, tapping their deep scientific and application code insights.  Further, GVR will provide a flexible, efficient, cross-layer error management architecture called “open reliability” that allows applications to describe error detection (checking) and recovery routines and inject them into the GVR stack for efficient implementation. This architecture enables applications and systems to work in concert, exploiting semantics (algorithmic or even scientific domain) and key capabilities (e.g., fast error detection in hardware) to dramatically increase the range of errors that can be detected and corrected.&lt;br /&gt;
&lt;br /&gt;
== Resilience Challenges ==&lt;br /&gt;
&lt;br /&gt;
* Can we achieve a smooth transition to system resilience? (a la Flash memory, Internet)&lt;br /&gt;
* What’s an application to do?&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Resilience-Challenges.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Resilience Co-design ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Co‑design without co‑dependence&#039;&#039;&#039;&lt;br /&gt;
* Software: Information and Algorithms to enhance resilience (REQ: Portable, flexible)&lt;br /&gt;
* Runtime, OS, and Architecture Mechanisms to enhance resilience (REQ: leverage beyond HPC, cheap)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Resilience-Co-design.png|400px]]&lt;br /&gt;
&lt;br /&gt;
== Project Impact ==&lt;br /&gt;
&lt;br /&gt;
* [https://xstackwiki.modelado.org/images/4/4c/GVR_Highlights_Summary9-2016.pdf GVR Project Impact]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Challenges&#039;&#039;&#039;&lt;br /&gt;
* Enable an application to incorporate resilience incrementally, expressing resilience proportionally to the application need&lt;br /&gt;
* “Outside in”, as needed, incremental, ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== GVR Approach 1==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Approach-1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Application-System Partnership&lt;br /&gt;
** Expose and exploit algorithm and application domain knowledge&lt;br /&gt;
** Enable “End to end” resilience model&lt;br /&gt;
&lt;br /&gt;
* Foundation in Data-oriented resilience&lt;br /&gt;
** Internet services, map-reduce, internet, ...&lt;br /&gt;
** Achieve with high performance and massive parallelism...&lt;br /&gt;
** Global view data Foundation (PGAS..., GA, SWARM, ParalleX, CnC, ...)&lt;br /&gt;
&lt;br /&gt;
=== Data-oriented Resilience ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Data-Oriented.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Parallel applications and global-view data&lt;br /&gt;
* Natural parallel structure version-to-version&lt;br /&gt;
** Example: shock hydro simulation at t=10ms to 100ms&lt;br /&gt;
** Example: iterative solver at iteration 1 to 20&lt;br /&gt;
** Example: monte carlo at 10M to 20M points&lt;br /&gt;
&lt;br /&gt;
* Temporal redundancy enables rollback and resume&lt;br /&gt;
** User-controlled, convenient&lt;br /&gt;
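The version-to-version structure above can be sketched with a toy multi-version array (invented Python class for illustration; the real GVR library is a C API, see http://gvr.cs.uchicago.edu/):&lt;br /&gt;

```python
# Multi-version distributed-array idea in miniature: snapshots provide
# redundancy in time, and the application rolls back on detected errors.
import math

class VersionedArray:
    def __init__(self, size):
        self.current = [0.0] * size
        self.versions = []              # older snapshots: redundancy in time

    def put(self, index, value):
        self.current[index] = value

    def version_inc(self):
        """Snapshot the array; the old version becomes read-only history."""
        self.versions.append(list(self.current))

    def restore(self, version=-1):
        """Roll back to an earlier consistent version after an error."""
        self.current = list(self.versions[version])

arr = VersionedArray(4)
arr.put(0, 1.0)
arr.version_inc()                       # snapshot at an iteration boundary
arr.put(0, float('nan'))                # simulated silent data corruption
if math.isnan(arr.current[0]):          # application-level error check
    arr.restore()                       # recover from the last good version
```

Because versioning is user-controlled, the application decides where a consistent snapshot exists (e.g., an iteration boundary) and which errors its own checks can detect and repair.&lt;br /&gt;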
&lt;br /&gt;
=== Resilience Partnership ===&lt;br /&gt;
* Proportional Resilience&lt;br /&gt;
** Application specifies “Resilience priorities”&lt;br /&gt;
** Mapped into data-redundancy in space&lt;br /&gt;
** Mapped into redundancy in time (multi-version)&lt;br /&gt;
** Complements computation/task redundancy efforts&lt;br /&gt;
&lt;br /&gt;
* Deep error detection: invariants, assertions, checks ... and recovery&lt;br /&gt;
&lt;br /&gt;
* Applications add further checks based on algorithm and domain semantics&lt;br /&gt;
** Application add flexible, adaptive recovery mechanisms (and exploit multi-version)&lt;br /&gt;
&lt;br /&gt;
* “End-to-end” resilience&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== GVR Approach 2 ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Approach-2.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* x-layer approach for efficient execution (and better resilience)&lt;br /&gt;
** Spatial redundancy – coding at multiple levels, system level checking&lt;br /&gt;
** Temporal redundancy - Multi-version memory, integrated memory and NVRAM management&lt;br /&gt;
&lt;br /&gt;
* Push checks to most efficient level (find early, contain, reduce overhead)&lt;br /&gt;
* Recover based on semantics from any level (repair more, larger feasible computation, reduce overhead)&lt;br /&gt;
* Efficient implementation support in runtime, OS, architecture ... increase efficiency and containment&lt;br /&gt;
&lt;br /&gt;
=== Multi-version Memory ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Memory.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Common parallel paradigm, basis for programmer engagement&lt;br /&gt;
* Frames invariant checks, more complex checks based on high-level semantics&lt;br /&gt;
* Frames sophisticated recovery&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Research Challenges ==&lt;br /&gt;
* Understand application resilience needs and opportunities for &#039;&#039;proportional resilience&#039;&#039; and &#039;&#039;deep error detection&#039;&#039;/&#039;&#039;end-to-end resilience&#039;&#039;&lt;br /&gt;
* Explore multi-version memory as opportunity for framing richer resilience and parallelism&lt;br /&gt;
* Design API that embodies these ideas and &#039;&#039;gentle slope&#039;&#039; incremental application effort&lt;br /&gt;
* Create efficient x-layer implementations - many questions&lt;br /&gt;
* Explore architecture opportunities to increase resilience and reduce overhead&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Global‑view Data Program ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Program-1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== GVR Resilience Program ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Program-2.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Global View &amp;amp; Consistent Snapshots ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Snapshots.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* How to safely, efficiently identify consistent snapshots?&lt;br /&gt;
** Application control: Global Synch; Array-level synch; explicit snapshot&lt;br /&gt;
** Application flagged (optional)&lt;br /&gt;
** Implicit (runtime decides)&lt;br /&gt;
* Snapshots = natural points to express and implement assertions, checks, recovery&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Implementing Multi-version ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Implementing.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* How to implement multi-version efficiently?&lt;br /&gt;
** Time, Space, Label =&amp;gt; representation, protocol&lt;br /&gt;
* Which snapshots to take?&lt;br /&gt;
** Versions are logical, snapshots require resources&lt;br /&gt;
* Intelligent storage:&lt;br /&gt;
** Representation, compression, architecture support&lt;br /&gt;
** Older versions recede into storage [SILT]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Intelligent Memory and Storage ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Memory-Storage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* How to exploit intelligence at memory and storage? (at controller)&lt;br /&gt;
* Intelligent stacked DRAM and storage-class Memory [HMC,PIM]&lt;br /&gt;
* Fine-grained state tracking; compression, intelligent copying, etc.&lt;br /&gt;
* Efficient version capture; differenced checkpoints (Plank95, Svard11)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Opportunities ==&lt;br /&gt;
* Multi-version and increased concurrency&lt;br /&gt;
* Multi-version and debugging&lt;br /&gt;
* Architecture support and fine-grained synchronization, application checks, compressed memory, etc.&lt;br /&gt;
* ...more?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Expected Outcomes ==&lt;br /&gt;
* Use cases – Application skeleton design and classifications which form foundation of the design&lt;br /&gt;
* Design of GVR API for flexible resilience and multi-version global data&lt;br /&gt;
* Research prototype software developed as a library; target for programmers, compiler backends&lt;br /&gt;
* Experiments with mini-apps and application partners (w/ co-design postdocs)&lt;br /&gt;
* Assessment of architecture support opportunities and quantitative benefits&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== GVR X-Stack Synergies ==&lt;br /&gt;
&lt;br /&gt;
[[File:GVR-Synergies.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Direct Application Programming Interface&lt;br /&gt;
* Co-existence, even target with other Runtimes&lt;br /&gt;
* Rich Solver Library Building Block&lt;br /&gt;
* Programming System Target&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Research Products ==&lt;br /&gt;
&lt;br /&gt;
Full report = [[Media:gvr-research-products.pdf]]&lt;br /&gt;
&lt;br /&gt;
* Demonstrated easy application integration, &amp;lt;2% lines of code change in large (10K-100K line applications)&lt;br /&gt;
* Demonstrated controllable and low performance overhead (application scaling to 16,384 nodes and &amp;lt;2% overhead)&lt;br /&gt;
* Released on multiple platforms, including Cray (Edison, Cori), IBM BG/Q (Mira, JuQueen), Linux clusters&lt;br /&gt;
* Demonstrated flexible, portable application-semantics based forward-error correction in multiple applications (OpenMC, ddcMD, etc.)&lt;br /&gt;
* Software release available from http://gvr.cs.uchicago.edu/ and deployed at multiple supercomputing centers, including NERSC.&lt;br /&gt;
&lt;br /&gt;
== Publications ==&lt;br /&gt;
(see project web site for a full, up-to-date list)&lt;br /&gt;
&lt;br /&gt;
* Hajime Fujita, Kamil Iskra, Pavan Balaji, and Andrew A. Chien, &amp;quot;Versioning Architectures for Local and Global Memory&amp;quot;, in Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS), December 2015, Melbourne, Australia.&lt;br /&gt;
* Aiman Fang, Hajime Fujita and Andrew A. Chien, &amp;quot;Towards Understanding Post-Recovery Efficiency for Shrinking and Non-Shrinking Recovery&amp;quot;, in Proceedings of the 8th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, at Euro-Par 2015, Vienna, Austria, August 24, 2015&lt;br /&gt;
* Anshu Dubey, Hajime Fujita, Zachary Rubenstein, Brian Van Straalen and Andrew Chien. &amp;quot;A Case Study Of Application Structure Aware Resilience Through Differentiated State Saving And Recovery&amp;quot;, in Proceedings of the 8th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, at Euro-Par 2015, Vienna, Austria, August 24, 2015&lt;br /&gt;
* Hajime Fujita, Kamil Iskra, Pavan Balaji, and Andrew A. Chien, &amp;quot;Empirical Characterization of Versioning Architectures&amp;quot;, in Proceedings of IEEE Cluster, September 8-10, 2015, Chicago.&lt;br /&gt;
* A. Chien, P. Balaji, N. Dun, A. Fang, H. Fujita, K. Iskra, Z. Rubenstein, Z. Zheng, J. Hammond, I. Laguna, D. Richards, A. Dubey, B. van Straalen, M Hoemmen, M. Heroux, K. Teranishi, A. Siegel.  Exploring Versioning for Resilience in Scientific Applications: Global-view Resilience, submitted for publication, March 2015. (Best overall project summary)&lt;br /&gt;
* Aiman Fang and Andrew A. Chien, &amp;quot;How Much SSD Is Useful for Resilience in Supercomputers&amp;quot;, in ACM Symposium on Fault-tolerance at Extreme-Scale (FTXS) associated with HPDC 2015, Portland, Oregon, June 15, 2015 (Slides)&lt;br /&gt;
* Aiman Fang, &amp;quot;How Much SSD Is Useful for Resilience in Supercomputers&amp;quot;, Master&#039;s Thesis, Department of Computer Science, University of Chicago, April 2015.&lt;br /&gt;
* Nan Dun, Hajime Fujita, John R. Tramm, Andrew A. Chien, Andrew R. Siegel, Data Decomposition in Monte Carlo Neutron Transport Simulations using Global View Arrays, International Journal of High Performance Computing Applications, March 2015.&lt;br /&gt;
* A. Chien, P. Balaji, P. Beckman, N. Dun, A. Fang, H. Fujita, K. Iskra, Z. Rubenstein, Z. Zheng, R. Schreiber, J. Hammond, J. Dinan, A. Laguna, D. Richards, A. Dubey, B. van Straalen, M Hoemmen, M. Heroux, K. Teranishi, A. Siegel, and J. Tramm, &amp;quot;Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience&amp;quot;, in International Conference on Computational Science (ICCS 2015), Reykjavik, Iceland, June 2015.&lt;br /&gt;
* Hajime Fujita, Nan Dun, Zachary Rubenstein, and Andrew A. Chien.  Log-Structured Global Array for Efficient Multi-Version Snapshots, IEEE CCGrid 2015, May 2015.  Also UChicago CS Tech Report 2014-16, Nov 2014.&lt;br /&gt;
* Hajime Fujita, Nan Dun, Aiman Fang, Zachary A. Rubenstein, Ziming Zheng, Kamil Iskra, Jeff Hammond, Anshu Dubey, Pavan Balaji, Andrew A. Chien: Using Global View Resilience (GVR) to add Resilience to Exascale Applications, SC14, Nov 2014 (Best Poster Finalist!)&lt;br /&gt;
* The GVR Team, Global View Resilience (GVR) Documentation, Release 1.0, University of Chicago, Computer Science Technical Report 2014-10.&lt;br /&gt;
* Nan Dun, Hajime Fujita, John Tramm, Andrew A. Chien, and Andrew R. Siegel.  Data Decomposition in Monte Carlo Particle Transport Simulations using Global View Arrays, UChicago CS Tech Report 2014-09 May  2014.&lt;br /&gt;
* The GVR Team, How Applications Use GVR: Use Cases, University of Chicago, Computer Science Technical Report 2014-06.&lt;br /&gt;
* The GVR Team, Global View Resilience, API Documentation R0.8.1-rc0, University of Chicago, Computer Science Technical Report 2014-05.&lt;br /&gt;
* Aiman Fang and Andrew A. Chien, &amp;quot;Applying GVR to Molecular Dynamics: Enabling Resilience for Scientific Computations&amp;quot;, Tech Report, University of Chicago, Dept of Computer Science, CS-TR-2014-04, April 2014.&lt;br /&gt;
* Ziming Zheng, Andrew A. Chien, Keita Teranishi, &amp;quot;Fault Tolerance in an Inner-Outer Solver: a GVR-enabled Case Study&amp;quot;, in Proceedings of VECPAR 2014, July 2014, Eugene, Oregon.  Proceedings available from Springer-Verlag Lecture Notes in Computer Science.&lt;br /&gt;
* Z. Rubenstein, &amp;quot;Error Checking and Snapshot-based Recovery in Preconditioned Conjugate Gradient Solver&amp;quot;, Masters Thesis, University of Chicago, Department of Computer Science, March 2014.&lt;br /&gt;
* Z. Rubenstein, J. Dinan, H. Fujita, Z. Zheng, A. Chien, &amp;quot;Error Checking and Snapshot-Based Recovery in a Preconditioned Conjugate Gradient Solver&amp;quot;, University of Chicago, Department of Computer Science Technical Report 2013-11, December 2013&lt;br /&gt;
* Wesley Bland, Aurelien Bouteiller, Thomas Herault, Joshua Hursey, George Bosilca, and Jack J. Dongarra. An evaluation of User-Level Failure Mitigation support in MPI. Computing, 95(12):1171–1184, 2013.&lt;br /&gt;
* Ziming Zheng, Zachary Rubenstein, and Andrew A. Chien, GVR-Enabled Trilinos: An Outside-In Approach for Resilient Computing, in the SIAM Conference on Parallel Processing, February 2014, Portland Oregon.&lt;br /&gt;
* Ziming Zheng, Andrew A. Chien, Mark Hoemmen, Keita Teranishi, &amp;quot;Fault Tolerance in an Inner-Outer Solver: a GVR-enabled Case Study&amp;quot;, available as Technical Report from University of Chicago Department of Computer Science, CS-TR-2014-01, January 2014.&lt;br /&gt;
* Guoming Lu, Ziming Zheng, and Andrew A. Chien, When are Multiple Checkpoints Needed?, in 3rd Workshop for Fault-tolerance at Extreme Scale (FTXS), at IEEE Conference on High Performance Distributed Computing, June 2013, New York, New York.&lt;br /&gt;
* Hajime Fujita, Robert Schreiber, Andrew A. Chien, It&#039;s Time for New Programming Models for Unreliable Hardware, to appear in  ASPLOS 2013 Provocative Ideas session, March 18, 2013.&lt;br /&gt;
* Sean Hogan, Jeff Hammond, and Andrew A. Chien, An Evaluation of Difference and Threshold Techniques for Efficient Checkpointing, 2nd Workshop on Fault-Tolerance at Extreme Scale FTXS 2012 at DSN 2012, June 2012, Boston, Massachusetts.&lt;/div&gt;</summary>
		<author><name>Pinfold</name></author>
	</entry>
</feed>