Traleika Glacier: Difference between revisions

From Modelado Foundation

(59 intermediate revisions by 5 users not shown)
Line 4: Line 4:
| imagecaption =   
| imagecaption =   
| team-members = [http://www.intel.com/ Intel], [https://www.reservoir.com/ Reservoir Labs], [http://www.etinternational.com/ ETI], [http://www.udel.edu/ UDEL], [http://www.ucsd.edu/ UC San Diego], [http://www.rice.edu/ Rice U.], [http://cs.illinois.edu/ UIUC], [http://www.pnnl.gov/ PNNL]
| team-members = [http://www.intel.com/ Intel], [https://www.reservoir.com/ Reservoir Labs], [http://www.etinternational.com/ ETI], [http://www.udel.edu/ UDEL], [http://www.ucsd.edu/ UC San Diego], [http://www.rice.edu/ Rice U.], [http://cs.illinois.edu/ UIUC], [http://www.pnnl.gov/ PNNL]
| pi = Shekhar Borkar (Intel)
| pi = [[Shekhar Borkar]]
| co-pi = Wilf Pinfold (Intel), Richard Lethin (Reservoir Labs), TBD (ETI), Guang Gao (UDEL), Laura Carrington (UC San Diego), Vivek Sarkar (Rice U.), David Padua (UIUC), Josep Torrellas (UIUC), John Feo (PNNL)
| co-pi = [[Wilfred Pinfold]], Richard Lethin (Reservoir Labs), Laura Carrington (UC San Diego), Vivek Sarkar (Rice U.), David Padua (UIUC), Josep Torrellas (UIUC), Andres Marquez (PNNL)
| website = [https://www.xstackwiki.com/index.php/Traleika_Glacier https://www.xstackwiki.com/index.php/Traleika_Glacier]
| website = [https://xstack.exascale-tech.com/wiki/index.php/Main_Page https://xstack.exascale-tech.com/wiki/index.php/Main_Page]
}}
}}


== Team Members ==
== Team Members ==
* [http://www.intel.com/ Intel:] Shekhar Borkar (PI); Hardware guidance, HW/SW co-design, resiliency, technical management
* [http://www.intel.com/ Intel:] Shekhar Borkar (PI); Hardware guidance, HW/SW co-design, resiliency, technical management
* [https://www.reservoir.com/ Reservoir Labs:] Richard Lethin (PI); Programming system, R-Stream, tools, optimization
* [https://www.reservoir.com/ Reservoir Labs:] Richard Lethin (PI); Programming system, R-Stream, tools, optimization
<!---------------------------------------------------------------------------------------------------------------------------------------------------
* [http://www.etinternational.com/ ET International (ETI):] PI TBD ; Simulators, execution model and runtime support
* [http://www.etinternational.com/ ET International (ETI):] PI TBD ; Simulators, execution model and runtime support
* [http://www.udel.edu/ University of Delaware (UDEL):] Guang Gao (PI); Execution model research
* [http://www.udel.edu/ University of Delaware (UDEL):] Guang Gao (PI); Execution model research
----------------------------------------------------------------------------------------------------------------------------------------------------->
* [http://www.ucsd.edu/ University of California, San Diego (UC San Diego):] Laura Carrington (PI); Applications
* [http://www.ucsd.edu/ University of California, San Diego (UC San Diego):] Laura Carrington (PI); Applications
* [http://www.rice.edu/ Rice University:] Vivek Sarkar (PI); Programming system, runtime system
* [http://www.rice.edu/ Rice University:] Vivek Sarkar (PI); Programming system, runtime system
* [http://cs.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC):] David Padua, Josep Torrellas (PIs); Programming system, Hierarchical Tiles Arrays (HTA), architecture, system architecture evaluation
* [http://cs.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC):] David Padua, Josep Torrellas (PIs); Programming system, Hierarchical Tiles Arrays (HTA), architecture, system architecture evaluation
* [http://www.pnnl.gov/ Pacific Northwest National Laboratory (PNNL):] John Feo (PI); Kernels and proxy apps for evaluation
* [http://www.pnnl.gov/ Pacific Northwest National Laboratory (PNNL):] Andres Marquez (PI); Kernels and proxy apps for evaluation


== Project Impact ==
*[https://xstackwiki.modelado.org/images/2/21/Traleika_Glacier_Impacts.pdf Traleika Glacier Project Impact]


== Goals and Objectives ==
== Goals and Objectives ==
Line 30: Line 33:


The Traleika Glacier X-Stack team brings together strong technical expertise from across the exascale software stack. It combines applications of high interest to the DOE from five National Labs with software systems expertise from Reservoir Labs, ET International, the University of Illinois, University of California San Diego, University of Delaware, and Rice University, built on a foundation of platform excellence from Intel. This project builds collaboration between many of the partners, making this team uniquely capable of rapid progress. The research is expected not only to further the art in system software for high performance computing but also to provide invaluable feedback through the co-design loop for hardware design and application development. By breaking down research and development barriers between layers in the solution stack, this collaboration and the open tools it produces will spur innovation for the next generation of high performance computing systems.
The Traleika Glacier X-Stack team brings together strong technical expertise from across the exascale software stack. It combines applications of high interest to the DOE from five National Labs with software systems expertise from Reservoir Labs, ET International, the University of Illinois, University of California San Diego, University of Delaware, and Rice University, built on a foundation of platform excellence from Intel. This project builds collaboration between many of the partners, making this team uniquely capable of rapid progress. The research is expected not only to further the art in system software for high performance computing but also to provide invaluable feedback through the co-design loop for hardware design and application development. By breaking down research and development barriers between layers in the solution stack, this collaboration and the open tools it produces will spur innovation for the next generation of high performance computing systems.


'''Objectives:'''
'''Objectives:'''
Line 37: Line 39:
* '''Data locality:''' PGM system & system SW optimize to reduce data movement
* '''Data locality:''' PGM system & system SW optimize to reduce data movement
* '''Scalability:''' SW components scalable, portable to O(10<sup>9</sup>) (extreme parallelism)
* '''Scalability:''' SW components scalable, portable to O(10<sup>9</sup>) (extreme parallelism)
* '''Programmability:''' New (Codelet) & legacy (MPI), with gentle slope for productivity
* '''Programmability:''' New (Asynchronous) & legacy (MPI+OpenMP), with gentle slope for productivity
* '''Execution model:''' Objective function based, dynamic, global system optimization
* '''Execution model:''' Objective function based, dynamic, global system optimization
* '''Self-awareness:''' Dynamically respond to changing conditions and demands
* '''Self-awareness:''' Dynamically respond to changing conditions and demands
Line 43: Line 45:


== Status Reports ==
== Status Reports ==
* [[media:TG_X-Stack_Review_Top_2_20140401.pdf|''Traleika Glacier X-Stack Highlights'']], April 1, 2014
* [[media:TG_X-Stack_Review_Top_2_20140401.pdf| Traleika Glacier X-Stack Highlights]], April 1, 2014
* [[media:DE-SC0008717_TG_X-Stack_Status_Review_20140325.pdf|''Traleika Glacier X-Stack Status Review'']], March 25, 2014
* [[media:DE-SC0008717_TG_X-Stack_Status_Review_20140325.pdf| Traleika Glacier X-Stack Status Review]], March 25, 2014
* [[media:DE-SC0008717_TG_X-Stack_M5+6_Status_Report_20140312_Redacted.pdf|Traleika Glacier Year 2 Interim Status Report]], March 12, 2014
* [[media:DE-SC0008717_TG_X-Stack_M5+6_Status_Report_20140312_Redacted.pdf|Traleika Glacier Year 2 Interim Status Report]], March 12, 2014
* [[media:DE-SC0008717_TG_X-Stack_progress_report_20140523_Redacted.pdf|Traleika Glacier Year 2 Progress Report]], May 30, 2014
* [[media:DE-SC0008717_TG_X-Stack_M8_Status_Report_20140907.pdf|Traleika Glacier Milestone 8 Report]], September 1, 2014
* [[media:DE-SC0008717_TG_X-Stack_M9_Status_Report_20141208.pdf|Traleika Glacier Milestone 9 Report]], December 8, 2014
* [[media:DE-SC0008717_TG_X-Stack_M10_Status_Report.pdf|Traleika Glacier Milestone 10 Report]], March 2, 2015
* [[media:DE-SC0008717_TG_X-Stack_M11_Status_Report_20150602.pdf|Traleika Glacier Milestone 11 Report]], June 2, 2015


== Meetings and Presentations ==
=== Weekly Extreme Scale Deep-Dive: Schedule and Archive ===
* [https://eci.exascale-tech.com/wiki/index.php/Weekly_Technical_Review_Meeting Weekly Technical Review Meeting] (Tuesdays, 10-12 Pacific Time)
=== Co-Design Workshops (Newest to Oldest) ===
* [http://www.modelado.org/dynamic-runtime-community-project-review-fall-2016/ 7th Co-Design Project Review - September 27 - 29, 2016]
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_5_-_September_29,_2015_-_October_1,_2015 Application Workshop 5 - September 29, 2015 - October 1, 2015]
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_4_-_April_7-8,_2015 Application Workshop 4 - April 7-8, 2015]
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_3_-_September_30,_2014_-_October_2,_2014 Application Workshop 3 - September 30, 2014 - October 2, 2014]
* [https://eci.exascale-tech.com/wiki/index.php/Application_Workshop_2_-_January_21-23,_2014 Application Workshop 2 - January 21-23, 2014]
== Traleika Glacier products ==
=== Research Products ===
* [https://xstack.exascale-tech.com/wiki/index.php/Traleika_Glacier_Research_Products Research Products]
=== Software Releases ===
* [https://xstack.exascale-tech.com/wiki/index.php/Traleika_Glacier_Software_Releases Software Releases]
<!---commented out - replaced with links to xstack public site  ******************************************
== Publications ==
== Publications ==


=== Intel ===
=== Intel ===
* Romain Cledat, Sagnak Tasirlar (Rice University) and Rob Knauerhase (Intel), ''Programmer Obliviousness is Bliss: Ideas for Runtime-Managed Granularity''. To be published at HotPar ’13, June 24, 2013, San Jose, CA - https://www.usenix.org/conference/hotpar13  
* ''Programmer Obliviousness is Bliss: Ideas for Runtime-Managed Granularity'', Romain Cledat, Sagnak Tasirlar (Rice University) and Rob Knauerhase (Intel). To be published at HotPar ’13, June 24, 2013, San Jose, CA - https://www.usenix.org/conference/hotpar13  
* Shekhar Borkar, ''How to stop interconnects from hindering the future of computing!'', Optical interconnects Conference, May 2013
* ''How to stop interconnects from hindering the future of computing!'', Shekhar Borkar, Optical interconnects Conference, May 2013
* Shekhar Borkar, ''Exascale Computing—a fact or a fiction?'', IPDPS, May 2013
* ''Exascale Computing—a fact or a fiction?'', Shekhar Borkar, IPDPS, May 2013
* ''Functional Simulator for Exascale System Research'', Romain Cledat (Intel), Josh Fryman (Intel), Ivan Ganev (Intel), Sam Kaplan (ETI), Rishi Khan (ETI), Asit Mishra (Intel), Bala Seshasayee (Intel), Ganesh Venkatesh (Intel), Dave Dunning (Intel), Shekhar Borkar (Intel), Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18th-19th, 2013, University of Washington, Seattle, WA - http://hpc.pnl.gov/modsim/2013/
* ''Functional Simulator for Exascale System Research'', Romain Cledat (Intel), Josh Fryman (Intel), Ivan Ganev (Intel), Sam Kaplan (ETI), Rishi Khan (ETI), Asit Mishra (Intel), Bala Seshasayee (Intel), Ganesh Venkatesh (Intel), Dave Dunning (Intel), Shekhar Borkar (Intel), Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18th-19th, 2013, University of Washington, Seattle, WA - http://hpc.pnl.gov/modsim/2013/


=== University of Delaware ===
=== Reservoir Labs ===
* ''An Implementation of the Codelet Model'', Joshua Suetterlein, Stephane Zuckerman, and Guang R. Gao, to be published in the proceedings of the 19th International European Conference on Parallel and Distributed Computing (EuroPar  2013), August 26-30,  Aachen, Germany.
* ''A Tale of Three Runtimes'', Nicolas Vasilache, Muthu Baskaran, Tom Henretty, Benoit Meister, M. Harper Langston, and Richard Lethin, submitted 5-Sep-14 in [http://arxiv.org/abs/1409.1914 arXiv.org]
* ''Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models'', Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao, to be published in Proceedings of 2013 Workshop on Multithreaded Architectures and Applications (MTAAP 2013). 27th IEEE International Parallel & Distributed Processing Symposium, May 24, Boston, MA, USA.
* [[media:Toward_a_Self-Aware_System_for_Exascale_Architectures_20130614.pdf|''Toward a Self-Aware System for Exascale Architectures'']], Aaron Myles Landwehr, Stephane Zuckerman, Guang R. Gao, CAPSL Technical Memo 123, June 2013.
* ''T2: ASAFESSS: A Scheduler-Driven Adaptive Framework for Extreme Scale Software Stacks'', St. John, T. et al, 4th International Workshop on Adaptive Self-tuning Computing Systems 2014, Vienna Austria. (Best paper award).
* [http://www.capsl.udel.edu/pub/doc/papers/LCPC2013.pdf ''Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture''], Elkin Garcia, Jaime Arteaga, Robert Pavel, and Guang R. Gao, in Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Santa Clara, CA, September 25-27, 2013.


=== Rice University ===
=== Rice University ===
Line 72: Line 92:
* ''CDSC-GL: A CnC-inspired Graph Language'', Zoran Budimlic (Rice), Jason Cong (UCLA), Zhou Li (UCLA), Louis-Noel Pouchet (UCLA), Vivek Sarkar (Rice), Alina Sbirlea (Rice), Mo Xu (UCLA), Pen Zhang (UCLA). Abstract to appear in CnC'13 workshop, September 2013.
* ''CDSC-GL: A CnC-inspired Graph Language'', Zoran Budimlic (Rice), Jason Cong (UCLA), Zhou Li (UCLA), Louis-Noel Pouchet (UCLA), Vivek Sarkar (Rice), Alina Sbirlea (Rice), Mo Xu (UCLA), Pen Zhang (UCLA). Abstract to appear in CnC'13 workshop, September 2013.
* ''Bounded memory scheduling of dynamic task graphs'', Dragos Sbirlea, Zoran Budimlić, Vivek Sarkar, submitted to IPDPS 2014.
* ''Bounded memory scheduling of dynamic task graphs'', Dragos Sbirlea, Zoran Budimlić, Vivek Sarkar, submitted to IPDPS 2014.
* ''Isolation for Nested Task Parallelism'', Jisheng Zhao, Roberto Lublinerman, Zoran Budimlic, Swarat Chaudhuri, Vivek Sarkar, The 29th International Conference on the Object-Oriented Programming, System, Languages and Application (OOPSLA), October 2013.
* ''Bounded memory scheduling of dynamic task graphs'', Dragos Sbirlea, Zoran Budimlic, Vivek Sarkar, to appear in The 23rd International Conference on Parallel Architectures and Compilation Techniques (PACT 2014).
* ''Expressing DOACROSS Loop Dependencies in OpenMP'', Jun Shirako, Priya Unnikrishnan, Sanjay Chatterjee, Kelvin Li, Vivek Sarkar, 9th International Workshop on OpenMP (IWOMP), September 2013.
* ''The Flexible Preconditions Model for Macro-Dataflow Execution'', Dragoș Sbîrlea, Alina Sbîrlea, Kyle B. Wheeler, Vivek Sarkar, The 3rd Data-Flow Execution Models for Extreme Scale Computing (DFM), September 2013.


=== Pacific Northwest National Lab ===
=== Pacific Northwest National Lab ===
* ''ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution'', Marquez, A. et al., submitted to the 23rd International ACM Symposium on High Performance Parallel and Distributed Computing 2014, Vancouver, Canada.
* ''ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution'', Marquez, A. et al., submitted to the 23rd International ACM Symposium on High Performance Parallel and Distributed Computing 2014, Vancouver, Canada.
=== University of Delaware ===
* ''An Implementation of the Codelet Model'', Joshua Suetterlein, Stephane Zuckerman, and Guang R. Gao, to be published in the proceedings of the 19th International European Conference on Parallel and Distributed Computing (EuroPar  2013), August 26-30,  Aachen, Germany.
* ''Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models'', Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao, to be published in Proceedings of 2013 Workshop on Multithreaded Architectures and Applications (MTAAP 2013). 27th IEEE International Parallel & Distributed Processing Symposium, May 24, Boston, MA, USA.
* [[media:Toward_a_Self-Aware_System_for_Exascale_Architectures_20130614.pdf|''Toward a Self-Aware System for Exascale Architectures'']][http://www.capsl.udel.edu/pub/doc/papers/rome13.pdf], Aaron Myles Landwehr, Stephane Zuckerman, Guang R. Gao, CAPSL Technical Memo 123, June 2013.
* [http://www.capsl.udel.edu/pub/doc/papers/LCPC2013.pdf ''Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture''], Elkin Garcia, Jaime Arteaga, Robert Pavel, and Guang R. Gao, in Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Santa Clara, CA, September 25-27, 2013.
* ''Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading'', Jaime Arteaga, Stephane Zuckerman, Elkin Garcia, and Guang R. Gao, in Proceedings of Workshop on Multi-Threaded Architectures and Applications (MTAAP 2014), May 2014, Accepted.
* ''Runtime Systems for Extreme Scale Platforms'', Sanjay Chatterjee, PhD Thesis, December 2013.
=== Joint Publications ===
* ''Compiler Support for Software Cache Coherence'', Sanket Tavarageri, Wooil Kim, Josep Torrellas, and P Sadayappan Pacific Northwest National Labs (John Feo, Andres Marquez), submitted for publication.
* ''A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations'', Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston, and Guang Gao, in Proceedings of the 2013 IEEE International Conference on High Performance Computing (HiPC 2013), Hyderabad, India, December 18 - 21, 2013.
* ''ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks'', Tom St. John, Benoit Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, and Xiaoming Li, in Proceedings of the 4th International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT'14); 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC'14), Vienna, Austria. January 20-22, 2014. Best Paper Award.
==Presentations and Other Collateral==
* [[media:Ocr-bof-slides_20121114.pdf|''Birds-of-a-Feather'']] session at SuperComputing12, November 14, 2012. See the OCR homepage at https://01.org/projects/open-community-runtime.
* [[media:TG_Overview_Carrington.pdf|''Traleika Glacier X-Stack Overview'']], presented by Laura Carrington (UCSD) at the Fourth ExaCT All Hands Meeting, Sandia National Laboratories, May 14, 2013
* [[media:OCR-SC13-BOF.pdf|''The Open Community Runtime Framework for Exascale Systems'']], Birds of a Feather Session, SC13, Denver, November 19, 2013, Vivek Sarkar (Rice), Rob Knauerhase (Intel), Rich Lethin (Reservoir Labs)
* [[media:OnePage_LULESH_to_CnC.pdf|''Experience developing CnC versions of DOE Applications'']] - Ellen Porter (PNNL), Kath Knobe (Intel), John Feo (PNNL) - 4/15/14
* [[media:2014-05-23-XS-PI-meeting-v4.pdf|''May 2014 PI Meeting'']]
* [[media:OCR-SC14-BOF.pdf|''Open Community Runtime (OCR) Framework for Extreme Scale Systems'']], Birds of a Feather Session, SC14, New Orleans, November 20, 2014, Vivek Sarkar (Rice), Barbara Chapman (U. Houston), William Gropp (U. Illinois)


=== CnC’13 workshop September, 2013 ===
=== CnC’13 workshop September, 2013 ===
Line 88: Line 134:


Note: Asterisked (*) presentations are supportive of the Traleika Glacier X-Stack strategic aims and objectives but not directly under the statement of work.  
Note: Asterisked (*) presentations are supportive of the Traleika Glacier X-Stack strategic aims and objectives but not directly under the statement of work.  
=== Joint Publications ===
* ''Compiler Support for Software Cache Coherence'', Sanket Tavarageri, Wooil Kim, Josep Torrellas, and P Sadayappan Pacific Northwest National Labs (John Feo, Andres Marquez), submitted for publication.
* ''ASAFESSS: A Scheduler-Driven Adaptive Framework for Extreme Scale Software Stacks'', St. John, T. et al, 4th International Workshop on Adaptive Self-tuning Computing Systems 2014, Vienna Austria. (Best paper award).
* ''A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations'', Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston, and Guang Gao, in Proceedings of the 2013 IEEE International Conference on High Performance Computing (HiPC 2013), Hyderabad, India, December 18 - 21, 2013.
* ''ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks'', Tom St. John, Benoit Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, and Xiaoming Li, in Proceedings of the 4th International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT'14); 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC'14), Vienna, Austria. January 20-22, 2014. Best Paper Award.
==Presentations and Other Collateral==
* [[media:Ocr-bof-slides_20121114.pdf|Birds-of-a-Feather]] session at SuperComputing12, November 14, 2012. See the OCR homepage at https://01.org/projects/open-community-runtime.
* [[media:TG_Overview_Carrington.pdf|Traleika Glacier X-Stack Overview]], presented by Laura Carrington (UCSD) at the Fourth ExaCT All Hands Meeting, Sandia National Laboratories, May 14, 2013
* [http://www.cs.rice.edu/~vs3/PDF/OCR-SC13-BOF.pdf ''The Open Community Runtime Framework for Exascale Systems''], Birds of a Feather Session, SC13, Denver, November 19, 2013, Vivek Sarkar (Rice), Rob Knauerhase (Intel), Rich Lethin (Reservoir Labs)
* [[media:OnePage_LULESH_to_CnC.pdf|''Experience developing CnC versions of DOE Applications'']] - Ellen Porter (PNNL), Kath Knobe (Intel), John Feo (PNNL) - 4/15/14


=== About OCR - Open Community Runtime ===
=== About OCR - Open Community Runtime ===
Line 106: Line 139:
* [[OCR Module Policy Domain]] - snapshot as of May 1, 2014. For most recent version visit https://xstack.exascale-tech.com/wiki/index.php/OCR_Module_Policy_Domain
* [[OCR Module Policy Domain]] - snapshot as of May 1, 2014. For most recent version visit https://xstack.exascale-tech.com/wiki/index.php/OCR_Module_Policy_Domain


--------------------------->
== Scope of the Project ==
== Scope of the Project ==
[[File:TG-Scope.png|600px]]
[[File:TG-Scope.png|600px]]

Latest revision as of 04:52, July 10, 2023

Traleika Glacier X-Stack
Traleikaglacier.jpg
Team Members Intel, Reservoir Labs, ETI, UDEL, UC San Diego, Rice U., UIUC, PNNL
PI Shekhar Borkar
Co-PIs Wilfred Pinfold, Richard Lethin (Reservoir Labs), Laura Carrington (UC San Diego), Vivek Sarkar (Rice U.), David Padua (UIUC), Josep Torrellas (UIUC), Andres Marquez (PNNL)
Website https://xstack.exascale-tech.com/wiki/index.php/Main_Page

Team Members

Project Impact

Goals and Objectives

Goal: The Traleika Glacier X-Stack program will develop X-Stack software components in close collaboration with application specialists at the DOE co-design centers and with the best available knowledge of the Exascale systems we anticipate will be available in 2018/2020.

Description: Intel has built a straw-man hardware platform that embodies potential technology solutions to well understood challenges. This straw-man is implemented in the form of a simulator that will be used as a tool to test software components under investigation by Traleika team members. Co-design will be achieved by developing representative application components that stress software components and platform technologies, and then using these stress tests to refine platform and software elements iteratively toward an optimum solution. All software and simulator components will be developed as open source, facilitating open cross-team collaboration. The interface between the software components and the simulator will be built to facilitate back-end replacement with current production architectures (MIC and Xeon), providing a broadly available software development vehicle and facilitating the integration of new tools and compilers conceived and developed under this proposal with existing environments like MPI, OpenMP, and OpenCL.

The Traleika Glacier X-Stack team brings together strong technical expertise from across the exascale software stack. It combines applications of high interest to the DOE from five National Labs with software systems expertise from Reservoir Labs, ET International, the University of Illinois, University of California San Diego, University of Delaware, and Rice University, built on a foundation of platform excellence from Intel. This project builds collaboration between many of the partners, making this team uniquely capable of rapid progress. The research is expected not only to further the art in system software for high performance computing but also to provide invaluable feedback through the co-design loop for hardware design and application development. By breaking down research and development barriers between layers in the solution stack, this collaboration and the open tools it produces will spur innovation for the next generation of high performance computing systems.

Objectives:

  • Energy efficiency: SW components interoperate, harmonize, exploit HW features, and optimize the system for energy efficiency
  • Data locality: PGM system & system SW optimize to reduce data movement
  • Scalability: SW components scalable, portable to O(10^9) (extreme parallelism)
  • Programmability: New (Asynchronous) & legacy (MPI+OpenMP), with gentle slope for productivity
  • Execution model: Objective function based, dynamic, global system optimization
  • Self-awareness: Dynamically respond to changing conditions and demands
  • Resiliency: Asymptotically provide reliability of N-modular redundancy using HW/SW co-design; HW detection, SW correction

Status Reports

Meetings and Presentations

Weekly Extreme Scale Deep-Dive: Schedule and Archive

Co-Design Workshops (Newest to Oldest)

Traleika Glacier products

Research Products

Software Releases

Scope of the Project

TG-Scope.png


Roadmap

TG-Roadmap.png


Architecture

Straw-man System Architecture and Evaluation

TG-Strawman-System.png


Data-locality and BW Tapering, Why So Important?

TG-Data-Locality.png


Programming and Execution Models

TG-Programming-Model.png

Programming model

  • Separation of concerns: Domain specification & HW mapping
  • Express data locality with hierarchical tiling
  • Global, shared, non-coherent address space
  • Optimization and auto generation of codelets (HW specific)
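
As a rough illustration of expressing data locality with hierarchical tiling, the sketch below splits a 2-D index space into outer tiles and then into inner tiles. It is not the Hierarchical Tiled Arrays (HTA) API; the helper names `tile_ranges` and `hierarchical_tiles` and the tile sizes are assumptions chosen for the example.

```python
from typing import Iterator, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, stop)


def tile_ranges(start: int, stop: int, size: int) -> List[Range]:
    """Split [start, stop) into consecutive chunks of at most `size` elements."""
    return [(lo, min(lo + size, stop)) for lo in range(start, stop, size)]


def hierarchical_tiles(
    n_rows: int, n_cols: int, outer: int, inner: int
) -> Iterator[Tuple[Tuple[Range, Range], Tuple[Range, Range]]]:
    """Yield (outer_tile, inner_tile) pairs for a two-level tiling.

    Each tile is a (row_range, col_range) pair. Inner tiles are always
    enumerated within their enclosing outer tile, so work on neighbouring
    inner tiles touches data from the same outer tile.
    """
    for r_out in tile_ranges(0, n_rows, outer):
        for c_out in tile_ranges(0, n_cols, outer):
            for r_in in tile_ranges(r_out[0], r_out[1], inner):
                for c_in in tile_ranges(c_out[0], c_out[1], inner):
                    yield (r_out, c_out), (r_in, c_in)


# Hypothetical sizes: an 8x8 array, 4x4 outer tiles, 2x2 inner tiles.
tiles = list(hierarchical_tiles(8, 8, outer=4, inner=2))
```

In a real system the outer level would typically map to a coarse hardware unit and the inner level to a finer one; that mapping is left out here because the page does not specify it.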

Execution model

  • Dataflow inspired, tiny codelets (self contained)
  • Dynamic, event-driven scheduling, non-blocking
  • Dynamic decision to move computation to data
  • Observation based adaptation (self-awareness)
  • Implemented in the runtime environment
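
The dataflow-inspired, event-driven style described above can be sketched in a few lines. This is a single-threaded toy, not OCR or any of the project's runtimes: the `Codelet` class, the `run` function, and the example codelets are all hypothetical. A codelet becomes ready only when every codelet it depends on has finished, and it never blocks once started.

```python
from collections import deque
from typing import Callable, Iterable, List


class Codelet:
    """A small, self-contained unit of work with named dependencies."""

    def __init__(self, name: str, action: Callable[[], None],
                 deps: Iterable[str] = ()):
        self.name = name
        self.action = action
        self.deps = list(deps)
        self.pending = len(self.deps)  # unsatisfied dependencies
        self.successors: List["Codelet"] = []


def run(codelets: List[Codelet]) -> List[str]:
    """Run codelets in dependency order; return the execution order."""
    by_name = {c.name: c for c in codelets}
    for c in codelets:
        c.pending = len(c.deps)
        c.successors = []
    for c in codelets:
        for dep in c.deps:
            by_name[dep].successors.append(c)

    ready = deque(c for c in codelets if c.pending == 0)
    order: List[str] = []
    while ready:
        c = ready.popleft()
        c.action()  # runs to completion without blocking
        order.append(c.name)
        # Completion is the event that may enable successors.
        for s in c.successors:
            s.pending -= 1
            if s.pending == 0:
                ready.append(s)
    return order


results = {}
codelets = [
    Codelet("load_x", lambda: results.update(x=2)),
    Codelet("load_y", lambda: results.update(y=3)),
    Codelet("add", lambda: results.update(s=results["x"] + results["y"]),
            deps=["load_x", "load_y"]),
    Codelet("scale", lambda: results.update(out=10 * results["s"]),
            deps=["add"]),
]
order = run(codelets)
```

A production runtime would replace the single ready queue with per-worker queues and add the observation-based adaptation listed above; the dependency-counting core stays the same.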

Separation of concerns

  • User application, control, and resource management


Programming System Components

TG-System-Components.png

Runtime

  • Different runtimes target different aspects
    • IRR: targeted for Intel Straw-man architecture
    • SWARM: runtime for a wide range of parallel machines
    • DAR3TS: explore codelet PXM using portable C++
    • Habanero-C: interfaces IRR, tie-in to CnC
  • All explore related aspects of the codelet Program Exec Model (PXM)
  • Goal: Converge towards Open Community Runtime (OCR)
    • Enabling technology development for codelet execution
    • Model systems, foster novel runtime systems research
  • Greater visibility through SW stack -> efficient computing
    • Break OS/Runtime information firewall


Some Promising Results:

TG-Runtime-Results.png

Runtime Research Agenda

  • Locality aware scheduling—heuristics for locality/E-efficiency
    • Extensions to standard Habanero-C runtime
  • Adaptive boosting and idling of hardware
    • Avoid energy expensive unsuccessful steals that perform no work
    • Turbo mode for a core executing serial code
    • Fine grain resource (including energy) management
  • Dynamic data-block movement
    • Co-locate codelets and data
    • Move codelets to data
  • Introspection and dynamic optimization
    • Performance counters, sensors provide real time information
    • Optimization of the system for user defined objective
    • (Go beyond energy proportional computing)
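
One simple heuristic for "move codelets to data" is to place a codelet on the worker that already holds the largest share of its input data blocks. The sketch below shows that idea only; it is not taken from Habanero-C or OCR, and the function names, block names, and sizes are hypothetical.

```python
from typing import Dict, List


def pick_worker(inputs: List[str], block_home: Dict[str, int],
                block_size: Dict[str, int], n_workers: int) -> int:
    """Return the worker holding the most input bytes (lowest id wins ties)."""
    local_bytes = [0] * n_workers
    for block in inputs:
        local_bytes[block_home[block]] += block_size[block]
    return max(range(n_workers), key=lambda w: local_bytes[w])


def bytes_moved(inputs: List[str], block_home: Dict[str, int],
                block_size: Dict[str, int], worker: int) -> int:
    """Bytes that must be fetched if the codelet runs on `worker`."""
    return sum(block_size[b] for b in inputs if block_home[b] != worker)


# Hypothetical placement: block A lives on worker 0, B and C on worker 1.
block_home = {"A": 0, "B": 1, "C": 1}
block_size = {"A": 4096, "B": 1024, "C": 2048}
inputs = ["A", "B", "C"]

best = pick_worker(inputs, block_home, block_size, n_workers=3)
cost_best = bytes_moved(inputs, block_home, block_size, best)
cost_idle = bytes_moved(inputs, block_home, block_size, 2)
```

Here the codelet is placed on worker 0 and only blocks B and C are transferred, whereas running it on worker 2 would move all three blocks. A fuller objective function could weigh this data-movement cost against load balance or energy, as the agenda above suggests.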


Simulators and Tools

TG-Simulators-Tools.png


Simulators—what to expect and not

  • Evaluation of architecture features for PGM and EXE models
  • Relative comparison of performance, energy
  • Data movement patterns to memory and interconnect
  • Relative evaluation of resource management techniques

TG-Simulator-Expect-Not.png


Results Using Simulators

TG-Simulator-Results.png


Applications and HW-SW Codesign

TG-App-HW-Co-design.png


X-Stack Components

TG-XStack-Components.png


Metrics

TG-Metrics.png