Revision as of 04:00, October 28, 2013

Traleika Glacier
Traleikaglacier.jpg
Team Members: Intel, Reservoir Labs, ETI, UDEL, UC San Diego, Rice U., UIUC, PNNL
PI: Shekhar Borkar (Intel)
Co-PIs: Wilf Pinfold (Intel), Richard Lethin (Reservoir Labs), Rishi Khan (ETI), Guang Gao (UDEL), Laura Carrington (UC San Diego), Vivek Sarkar (Rice U.), David Padua (UIUC), Josep Torrellas (UIUC), John Feo (PNNL)
Website: https://www.xstackwiki.com/index.php/Traleika_Glacier


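Team Members

  • University of Illinois at Urbana-Champaign (UIUC): David Padua, Josep Torrellas (PIs); programming system, Hierarchically Tiled Arrays (HTA), architecture, system architecture evaluation - http://cs.illinois.edu/
  • Pacific Northwest National Laboratory (PNNL): John Feo (PI); kernels and proxy apps for evaluation - http://www.pnnl.gov/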


Goals and Objectives

Goal: The Traleika Glacier X-Stack program will develop X-Stack software components in close collaboration with application specialists at the DOE co-design centers and with the best available knowledge of the Exascale systems we anticipate will be available in 2018/2020.

Description: Intel has built a straw-man hardware platform that embodies potential technology solutions to well-understood challenges. This straw-man is implemented as a simulator that Traleika team members will use to test the software components under investigation. Co-design will be achieved by developing representative application components that stress software components and platform technologies, and then using these stress tests to iteratively refine platform and software elements toward an optimum solution. All software and simulator components will be developed as open source, facilitating open cross-team collaboration. The interface between the software components and the simulator will be built so that the back end can be replaced with current production architectures (MIC and Xeon), providing a broadly available software development vehicle and facilitating the integration of new tools and compilers conceived and developed under this proposal with existing environments such as MPI, OpenMP, and OpenCL.

The Traleika Glacier X-Stack team brings together strong technical expertise from across the exascale software stack. It combines applications of high interest to the DOE from five National Labs with software systems expertise from Reservoir Labs, ET International, the University of Illinois, University of California San Diego, University of Delaware, and Rice University, on a foundation of platform excellence from Intel. The project builds collaboration among many of the partners, making this team uniquely capable of rapid progress. The research is expected not only to further the art in system software for high performance computing but also to provide invaluable feedback through the co-design loop for hardware design and application development. By breaking down research and development barriers between layers in the solution stack, this collaboration and the open tools it produces will spur innovation for the next generation of high performance computing systems.


Objectives:

  • Energy efficiency: SW components interoperate, harmonize, exploit HW features, and optimize the system for energy efficiency
  • Data locality: PGM system & system SW optimize to reduce data movement
  • Scalability: SW components scalable, portable to O(10^9) (extreme parallelism)
  • Programmability: New (Codelet) & legacy (MPI), with gentle slope for productivity
  • Execution model: Objective function based, dynamic, global system optimization
  • Self-awareness: Dynamically respond to changing conditions and demands
  • Resiliency: Asymptotically provide reliability of N-modular redundancy using HW/SW co-design; HW detection, SW correction


Publications

Intel

  • Romain Cledat, Sagnak Tasirlar (Rice University) and Rob Knauerhase (Intel), Programmer Obliviousness is Bliss: Ideas for Runtime-Managed Granularity. To be published at HotPar ’13, June 24, 2013, San Jose, CA - https://www.usenix.org/conference/hotpar13
  • Shekhar Borkar, How to stop interconnects from hindering the future of computing!, Optical interconnects Conference, May 2013
  • Shekhar Borkar, Exascale Computing—a fact or a fiction?, IPDPS, May 2013

University of Delaware

  • Joshua Suetterlein, Stephane Zuckerman, and Guang R. Gao, An Implementation of the Codelet Model. To be published in the proceedings of the 19th International European Conference on Parallel and Distributed Computing (EuroPar 2013), August 26-30, Aachen, Germany.
  • Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao. Towards Memory-Load Balanced Fast Fourier Transformations in Fine-Grain Execution Models. To be published in Proceedings of 2013 Workshop on Multithreaded Architectures and Applications (MTAAP 2013). 27th IEEE International Parallel & Distributed Processing Symposium, May 24, Boston, MA, USA.
  • Aaron Myles Landwehr, Stephane Zuckerman, Guang R. Gao. Toward a Self-Aware System for Exascale Architectures. CAPSL Technical Memo 123, June 2013.

Rice University

  • Integrating Asynchronous Task Parallelism with MPI. Sanjay Chatterjee, Sağnak Taşırlar, Zoran Budimlić, Vincent Cavé, Milind Chabbi, Max Grossman, Yonghong Yan and Vivek Sarkar. 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), May 2013, Boston, MA.
  • Compiler Optimization of an Application-specific Runtime, Kathleen Knobe (Intel) and Zoran Budimlić (Rice), CPC 2013: 17th Workshop on Compilers for Parallel Computing, July 3-5, 2013, Lyon, France. (to appear).
  • Compiler Optimization of an Application-specific Runtime. Kathleen Knobe (Intel) and Zoran Budimlic (Rice). Abstract to appear in CnC'13 workshop, September 2013.
  • The CnC tuning capability, Sanjay Chatterjee (Rice), Zoran Budimlic (Rice), Vivek Sarkar (Rice), Kathleen Knobe (Intel). Abstract to appear in CnC'13 workshop, September 2013.
  • Automatic Selection of Distribution Functions for Distributed CnC, Kamal Sharma (Rice), Kathleen Knobe (Intel), Frank Schlimbach (Intel), Vivek Sarkar (Rice). Abstract to appear in CnC'13 workshop, September 2013.
  • CnC on Open Community Runtime, Alina Sbirlea (Rice) and Zoran Budimlic (Rice). Abstract to appear in CnC'13 workshop, September 2013.
  • Bounded Memory Scheduling of CnC Programs, Dragos Sbirlea (Rice), Zoran Budimlic (Rice) and Vivek Sarkar (Rice). Abstract to appear in CnC'13 workshop, September 2013.
  • CDSC-GL: A CnC-inspired Graph Language, Zoran Budimlic (Rice), Jason Cong (UCLA), Zhou Li (UCLA), Louis-Noel Pouchet (UCLA), Vivek Sarkar (Rice), Alina Sbirlea (Rice), Mo Xu (UCLA), Pen Zhang (UCLA). Abstract to appear in CnC'13 workshop, September 2013.


Presentations

Intel

University of California San Diego

Traleika Glacier X-Stack Overview, presented by Laura Carrington (UCSD) at the Fourth ExaCT All Hands Meeting, Sandia National Laboratories, May 14, 2013


Scope of the Project

TG-Scope.png


Roadmap

TG-Roadmap.png


Architecture

Straw-man System Architecture and Evaluation

TG-Strawman-System.png


Data-locality and BW Tapering, Why So Important?

TG-Data-Locality.png


Programming and Execution Models

TG-Programming-Model.png

Programming model

  • Separation of concerns: Domain specification & HW mapping
  • Express data locality with hierarchical tiling
  • Global, shared, non-coherent address space
  • Optimization and auto generation of codelets (HW specific)
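
As an illustration of expressing data locality with hierarchical tiling, the following minimal Python sketch splits an index range into outer tiles and then into inner tiles. It is not the project's HTA implementation; the function names (`tiles`, `hierarchical_tiles`, `tiled_sum`) and the idea that the outer level maps to a group of cores and the inner level to one core's local memory are assumptions made for this example.

```python
from typing import Iterator, Sequence, Tuple

Range = Tuple[int, int]  # half-open index range [start, stop)


def tiles(start: int, stop: int, size: int) -> Iterator[Range]:
    """Split [start, stop) into consecutive tiles of at most `size` elements."""
    for lo in range(start, stop, size):
        yield lo, min(lo + size, stop)


def hierarchical_tiles(n: int, outer: int, inner: int) -> Iterator[Tuple[Range, Range]]:
    """Yield (outer_tile, inner_tile) pairs that together cover indices 0..n-1.

    Hypothetical mapping: an outer tile could be assigned to a block of
    cores, an inner tile to a single core's local memory.
    """
    for o_lo, o_hi in tiles(0, n, outer):
        for i_lo, i_hi in tiles(o_lo, o_hi, inner):
            yield (o_lo, o_hi), (i_lo, i_hi)


def tiled_sum(data: Sequence[int], outer: int, inner: int) -> int:
    """Sum `data` one inner tile at a time, so each step touches a small range."""
    total = 0
    for _, (lo, hi) in hierarchical_tiles(len(data), outer, inner):
        total += sum(data[lo:hi])
    return total
```

The domain computation (`tiled_sum`) never mentions hardware; only the tile sizes would change when the mapping changes, which is one way to read the separation between domain specification and HW mapping.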

Execution model

  • Dataflow-inspired, tiny codelets (self-contained)
  • Dynamic, event-driven scheduling, non-blocking
  • Dynamic decision to move computation to data
  • Observation-based adaptation (self-awareness)
  • Implemented in the runtime environment
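
The dataflow-inspired, event-driven execution of codelets described above can be sketched in a few lines of Python. This is only a single-threaded toy, not the API of any of the project's runtimes; the `Codelet` class, its `pending` counter, and the `run` function are hypothetical names introduced for the example.

```python
from collections import deque
from typing import Callable, List


class Codelet:
    """A small, self-contained unit of work that fires once all inputs arrive."""

    def __init__(self, name: str, fn: Callable[[], None], deps: int = 0):
        self.name = name
        self.fn = fn
        self.pending = deps                  # unsatisfied dependencies
        self.successors: List["Codelet"] = []


def run(codelets: List[Codelet]) -> List[str]:
    """Event-driven loop: run ready codelets and signal their successors."""
    ready = deque(c for c in codelets if c.pending == 0)
    order = []
    while ready:
        c = ready.popleft()
        c.fn()                               # runs to completion, never waits
        order.append(c.name)
        for s in c.successors:               # completion event satisfies one input
            s.pending -= 1
            if s.pending == 0:
                ready.append(s)
    return order


# Usage: codelet "c" fires only after both "a" and "b" have completed.
results = {}
a = Codelet("a", lambda: results.update(a=2))
b = Codelet("b", lambda: results.update(b=3))
c = Codelet("c", lambda: results.update(c=results["a"] * results["b"]), deps=2)
a.successors.append(c)
b.successors.append(c)
order = run([a, b, c])
```

A codelet is scheduled only when its dependency count reaches zero, so no codelet ever blocks waiting for data; a real runtime would run the ready queue across many cores.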

Separation of concerns

  • User application, control, and resource management


Programming System Components

TG-System-Components.png

Runtime

  • Different runtimes target different aspects
    • IRR: targeted for Intel Straw-man architecture
    • SWARM: runtime for a wide range of parallel machines
    • DAR3TS: explore codelet PXM using portable C++
    • Habanero-C: interfaces IRR, tie-in to CnC
  • All explore related aspects of the codelet program execution model (PXM)
  • Goal: Converge towards Open Community Runtime (OCR)
    • Enabling technology development for codelet execution
    • Model systems, foster novel runtime systems research
  • Greater visibility through SW stack -> efficient computing
    • Break OS/Runtime information firewall


Some Promising Results:

TG-Runtime-Results.png

Runtime Research Agenda

  • Locality aware scheduling—heuristics for locality/E-efficiency
    • Extensions to standard Habanero-C runtime
  • Adaptive boosting and idling of hardware
    • Avoid energy-expensive unsuccessful steals that perform no work
    • Turbo mode for a core executing serial code
    • Fine grain resource (including energy) management
  • Dynamic data-block movement
    • Co-locate codelets and data
    • Move codelets to data
  • Introspection and dynamic optimization
    • Performance counters, sensors provide real time information
    • Optimization of the system for user defined objective
    • (Go beyond energy proportional computing)
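
One possible heuristic for "move codelets to data" is to run a codelet on the worker that already holds most of its input bytes. The Python sketch below illustrates that idea only; it is not taken from Habanero-C or any other runtime named here, and `pick_worker`, `bytes_moved`, `block_home`, and `block_bytes` are hypothetical names.

```python
from typing import Dict, List


def pick_worker(inputs: List[str],
                block_home: Dict[str, int],
                block_bytes: Dict[str, int],
                num_workers: int) -> int:
    """Return the worker that already holds the most input bytes.

    Running the codelet there moves the fewest bytes. Ties go to the
    lowest worker id.
    """
    local = [0] * num_workers
    for blk in inputs:
        local[block_home[blk]] += block_bytes[blk]
    return max(range(num_workers), key=lambda w: (local[w], -w))


def bytes_moved(worker: int,
                inputs: List[str],
                block_home: Dict[str, int],
                block_bytes: Dict[str, int]) -> int:
    """Bytes that must be fetched remotely if the codelet runs on `worker`."""
    return sum(block_bytes[b] for b in inputs if block_home[b] != worker)


# Usage: data block "x" lives on worker 0, blocks "y" and "z" on worker 1.
home = {"x": 0, "y": 1, "z": 1}
size = {"x": 100, "y": 60, "z": 70}
chosen = pick_worker(["x", "y", "z"], home, size, num_workers=2)
```

A scheduler with a user-defined objective could weigh this byte count against other observations, such as queue length or energy cost, instead of using it alone.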


Simulators and Tools

TG-Simulators-Tools.png


Simulators—what to expect and not

  • Evaluation of architecture features for PGM and EXE models
  • Relative comparison of performance, energy
  • Data movement patterns to memory and interconnect
  • Relative evaluation of resource management techniques

TG-Simulator-Expect-Not.png


Results Using Simulators

TG-Simulator-Results.png


Applications and HW-SW Codesign

TG-App-HW-Co-design.png


X-Stack Components

TG-XStack-Components.png


Metrics

TG-Metrics.png