March 20 2013 Technology Marketplace
From Modelado Foundation
== Runtime Systems ==
=== Open Community Runtime: Zoran Budimlic ===
=== Intra-node MPI: Andrew Friedley ===
=== GASNet: Paul Hargrove ===
=== HPX: Hartmut Kaiser ===
=== SWARM: Rishi Khan ===
=== TASCEL: Sriram Krishnamoorthy ===
=== X10 Runtime: Olivier Tardieu ===
== Compiler ==
=== LLNL (Greg Bronevetsky) ===
We are developing Fuse, a symbolic dataflow analysis and abstract interpretation framework for the ROSE compiler. Fuse formalizes a common interface that transparently encodes the results of most analyses, enabling separately implemented analyses to leverage each other's results without accessing each other's internal abstractions. This makes it possible to seamlessly combine generic and domain-specific analyses when analyzing a single application.
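To give a flavor of the composition idea, the snippet below is a minimal, hypothetical sketch (it is not the actual Fuse API; the names AbstractValue, Analysis, and Composer are invented for illustration): analyses answer queries through a shared abstraction, and a composer forwards queries along a chain so each analysis can build on earlier answers.
<syntaxhighlight lang="cpp">
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a composable-analysis interface (not the real Fuse API).
// Each analysis answers queries about abstract program state through a common
// abstraction, so clients never depend on another analysis' internal lattices.
struct AbstractValue {
    virtual ~AbstractValue() = default;
    virtual bool mayEqual(const AbstractValue& other) const = 0;  // conservative query
};

struct Analysis {
    virtual ~Analysis() = default;
    // Abstract value of expression `expr` at program point `point`, or null if unknown.
    virtual std::unique_ptr<AbstractValue> valueAt(const std::string& point,
                                                   const std::string& expr) = 0;
};

// The composer forwards a query along the chain; later analyses can refine the
// answers of earlier ones without ever seeing their internal representations.
class Composer : public Analysis {
    std::vector<std::unique_ptr<Analysis>> chain_;
public:
    void add(std::unique_ptr<Analysis> a) { chain_.push_back(std::move(a)); }
    std::unique_ptr<AbstractValue> valueAt(const std::string& point,
                                           const std::string& expr) override {
        std::unique_ptr<AbstractValue> answer;
        for (auto& a : chain_)
            if (auto v = a->valueAt(point, expr))
                answer = std::move(v);  // keep the latest (presumably most refined) answer
        return answer;
    }
};
</syntaxhighlight>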
=== LLNL (Chunhua "Leo" Liao) ===
We will demonstrate how to leverage ROSE's source-to-source translation API to create a prototype translator implementing the OpenMP Accelerator Model (OpenMP extensions supporting GPUs and similar accelerators). The translator takes OpenMP programs with additional accelerator directives as input and generates CUDA code.
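As a sketch of the kind of input such a translator might accept (directive spellings follow the OpenMP accelerator proposal; the prototype's exact syntax may differ), a loop offloaded to an accelerator could look like this:
<syntaxhighlight lang="cpp">
#include <vector>

// Illustrative input only: a loop annotated with proposed accelerator directives.
// A source-to-source translator would outline the loop body into a CUDA kernel
// and replace the region with device-memory management and a kernel launch.
void saxpy(float a, std::vector<float>& x, const std::vector<float>& y) {
    const int n = static_cast<int>(x.size());
    float* xp = x.data();
    const float* yp = y.data();
    #pragma omp target map(tofrom: xp[0:n]) map(to: yp[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        xp[i] = a * xp[i] + yp[i];
}
</syntaxhighlight>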
=== OSU (Saday Sadayappan) ===
The demo will present an automated compilation system for optimized multi-target code generation from an embedded DSL for stencil computations. The system uses precise data dependence analysis to perform transformations that generate high-performance C/OpenMP and CUDA code. PolyOpt, the polyhedral compilation engine for ROSE, will also be demonstrated, including its capability to optimize affine C and Fortran programs for multi-core targets.
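For reference, the kind of kernel such a stencil DSL targets is a nearest-neighbor update; a hand-written C++/OpenMP version of a 1D 3-point Jacobi sweep (illustrative only, not DSL output) looks like this:
<syntaxhighlight lang="cpp">
#include <utility>
#include <vector>

// Hand-written 1D 3-point Jacobi stencil with OpenMP, shown for comparison with
// the optimized code a stencil DSL and PolyOpt would generate automatically.
void jacobi1d(const std::vector<double>& in, std::vector<double>& out, int steps) {
    const int n = static_cast<int>(in.size());
    std::vector<double> a = in, b = in;
    for (int t = 0; t < steps; ++t) {
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)
            b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
        std::swap(a, b);  // double-buffer between time steps
    }
    out = a;
}
</syntaxhighlight>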
=== Reservoir (Benoit Meister) ===
Reservoir will demonstrate the current R-Stream prototype's capability to map loop codes to the Open Community Runtime (OCR). OCR is a standard API for event-driven tasks developed in the Traleika Glacier project. OCR supports asynchronous point-to-point synchronization, and its reference implementation relies on a locality-aware work-stealing scheduler.
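As a rough illustration of the event-driven-task style (a generic sketch, not the OCR API): a task records how many events it still depends on and becomes runnable only when the last of them is satisfied.
<syntaxhighlight lang="cpp">
#include <functional>
#include <vector>

// Generic sketch of event-driven tasks (illustrative; not the OCR API).
struct Task;

struct Event {
    bool satisfied = false;
    std::vector<Task*> waiters;
};

struct Task {
    std::function<void()> body;
    int pending = 0;  // number of unsatisfied dependences
};

std::vector<Task*> readyQueue;  // a real runtime would use a work-stealing scheduler

void dependsOn(Task* t, Event* e) {
    if (e->satisfied) return;     // already-satisfied events add no dependence
    e->waiters.push_back(t);
    ++t->pending;
}

void satisfy(Event* e) {
    e->satisfied = true;
    for (Task* t : e->waiters)
        if (--t->pending == 0)
            readyQueue.push_back(t);  // last dependence met: task becomes runnable
}
</syntaxhighlight>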
=== Rice (John Mellor-Crummey) ===
Something on CAF2 (better description coming soon).
=== MIT (Armando Solar-Lezama) ===
This demo will show how sketch-based synthesis technology can facilitate writing efficient code in a clean and portable manner. I will use a stencil as an example to demonstrate how programmers can explore different implementation ideas without worrying about low-level details.
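Conceptually, a sketch is a partial program containing holes that a synthesizer fills so the result behaves like a reference implementation. The C++ snippet below only illustrates the idea (the actual demo uses the Sketch language, where holes are written as ??):
<syntaxhighlight lang="cpp">
// Conceptual illustration only; the real demo is written in the Sketch language.
// A reference implementation specifies the intended behavior...
void stencil_ref(const double* in, double* out, int n) {
    for (int i = 1; i < n - 1; ++i)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];
}

// ...and an optimized "sketch" explores a different implementation (here, a
// 2x-unrolled loop) while leaving the coefficients as holes; the synthesizer
// finds values (shown already filled in) that make it equivalent to the reference.
constexpr double H0 = 0.25, H1 = 0.5, H2 = 0.25;  // values a synthesizer would discover
void stencil_sketch(const double* in, double* out, int n) {
    int i = 1;
    for (; i + 1 < n - 1; i += 2) {
        out[i]     = H0 * in[i - 1] + H1 * in[i]     + H2 * in[i + 1];
        out[i + 1] = H0 * in[i]     + H1 * in[i + 1] + H2 * in[i + 2];
    }
    for (; i < n - 1; ++i)  // remainder iteration
        out[i] = H0 * in[i - 1] + H1 * in[i] + H2 * in[i + 1];
}
</syntaxhighlight>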
== Languages and DSLs ==
=== Building Solver-Aided DSLs with Rosette (Emina Torlak, UC Berkeley) ===
We will present Rosette, a new framework for rapid design and prototyping of solver-aided DSLs. It accepts as input an interpreter (a virtual machine) that defines the programming model of a DSL and produces a translator from programs to constraints, facilitating constraint-based synthesis, verification, debugging, and execution. We will demo shallow and deep embeddings of sample DSLs in Rosette.
=== Andrew Lumsdaine, Indiana University ===
As part of its efforts supported by the X-Stack program, the XPRESS team is developing the eXascale ParalleX Intermediate form (XPI). Initially, XPI will provide a low-level API to HPX functionality that can be directly called as a library or used as a compiler target. However, XPI is not intrinsically tied to HPX -- many of the abstract concepts captured and exposed by XPI are common across the various approaches to exascale being explored under the auspices of the X-Stack program. An initial draft of the XPI specification was recently released. Representatives of the XPRESS team will be available at the technology marketplace to discuss XPI in depth.
=== Armando Solar-Lezama, MIT ===
Armando will give a demo of the Sketch system for program synthesis using sketching.
=== Michael Wilde, Argonne National Lab ===
Swift is a parallel scripting language for composing scientific applications that run on a large variety of parallel machines.
== Resilience ==
=== Data-oriented, User-controlled Resilience with Global View Resilience (Hajime Fujita) ===
We will demonstrate a new programming model and tool for resilience called Global View Resilience (GVR). We will show how GVR can be applied to existing applications, and demonstrate in our prototype how those applications benefit from resilient arrays and an error-handling framework.
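A minimal sketch of the usage pattern (the names below are hypothetical, not the actual GVR API): the application keeps its key data in a versioned array, records a version at convenient points, and rolls back to the last good version when an error is detected.
<syntaxhighlight lang="cpp">
#include <vector>

// Hypothetical sketch of data-oriented resilience via a versioned array
// (illustrative names; this is not the GVR API).
struct VersionedArray {
    std::vector<double> data;
    std::vector<std::vector<double>> versions;
    void snapshot() { versions.push_back(data); }                       // record a version
    void restore()  { if (!versions.empty()) data = versions.back(); }  // roll back to last good version
};

bool errorDetected() { return false; }                                       // stub: real detection is app/system specific
void computeStep(std::vector<double>& a) { for (double& x : a) x *= 0.99; }  // stub application step

void run(VersionedArray& state, int steps) {
    for (int t = 0; t < steps; ++t) {
        state.snapshot();          // version the array before the step
        computeStep(state.data);
        if (errorDetected())       // error-handling hook: roll back and continue
            state.restore();
    }
}
</syntaxhighlight>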
=== Auto-tuning Resilience using Containment Domains (Erez Student) ===
Systems are becoming increasingly hierarchical and complex. At the same time, the inherent relative reliability of components and computation is potentially decreasing. This combination raises serious concerns about the ability to carry out efficient and correct computations at exascale. We demonstrate how our initial research on the containment domains model enables us to reason about and auto-tune resilience, overcoming efficiency limits while maintaining reliability. We use an early version of the containment domains abstractions to concisely express some key resilience concerns throughout the system to enable new levels of flexibility in synthesizing resilience schemes and evaluating their impact.
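The underlying pattern is roughly preserve / compute / detect / recover, applied hierarchically; the snippet below is a simplified illustration with invented names, not the project's actual containment-domains API.
<syntaxhighlight lang="cpp">
#include <functional>
#include <stdexcept>

// Simplified illustration of the containment-domain pattern (hypothetical API):
// each domain preserves the state it needs, runs its body, checks for errors,
// and re-executes locally on failure instead of triggering a global restart.
struct ContainmentDomain {
    std::function<void()> preserve;  // save the inputs needed for local re-execution
    std::function<void()> body;      // the computation (may open nested child domains)
    std::function<bool()> detect;    // domain-specific error detection
    std::function<void()> restore;   // restore the preserved inputs

    void execute(int maxRetries = 3) {
        preserve();
        for (int attempt = 0; attempt <= maxRetries; ++attempt) {
            body();
            if (!detect()) return;   // success: the error (if any) was contained here
            restore();               // failure: roll back and retry within this domain
        }
        // Unrecoverable locally: escalate to the enclosing (parent) domain.
        throw std::runtime_error("error escalated to parent containment domain");
    }
};
</syntaxhighlight>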
=== Resilience with Berkeley Lab's Checkpoint/Restart (Paul Roman) ===
In this session we will demonstrate Berkeley Lab Checkpoint/Restart (BLCR) and discuss its role as an enabling technology for resilience. We will also describe our strategy for building resilient PGAS applications with Containment Domains.
=== Scalable Checkpoint Restart (Adam Moody) ===
Multi-level storage hierarchies can be used to significantly reduce the overhead of traditional checkpoint-restart systems, but they require sophisticated management. We will demonstrate the SCR (Scalable Checkpoint/Restart) system.
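A minimal sketch of the usual SCR checkpoint loop in an MPI code, assuming the classic SCR C API (SCR_Init, SCR_Need_checkpoint, SCR_Start_checkpoint, SCR_Route_file, SCR_Complete_checkpoint, SCR_Finalize); see the SCR documentation for exact semantics.
<syntaxhighlight lang="cpp">
#include <cstdio>
#include <mpi.h>
#include "scr.h"

// Sketch of a typical SCR checkpoint loop: SCR decides when to checkpoint,
// routes the file to fast node-local storage, and manages multi-level copies.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    SCR_Init();

    for (int step = 0; step < 100; ++step) {
        // ... application work for this time step ...

        int need = 0;
        SCR_Need_checkpoint(&need);            // let SCR choose the checkpoint frequency
        if (need) {
            SCR_Start_checkpoint();
            char path[SCR_MAX_FILENAME];
            SCR_Route_file("ckpt.dat", path);  // SCR returns the actual (cached) location
            FILE* f = std::fopen(path, "w");
            int valid = 0;
            if (f) { std::fprintf(f, "step %d\n", step); std::fclose(f); valid = 1; }
            SCR_Complete_checkpoint(valid);    // report whether this rank's write succeeded
        }
    }

    SCR_Finalize();
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>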
== Auto-tuning and Learning ==
=== CHiLL (Mary Hall) ===
CHiLL is a compiler framework for composing high-level loop transformations, used here as a basis for auto-tuning (fuller description coming soon).
=== OpenTuner (Jason Ansel, Una-May O'Reilly, Saman Amarasinghe) ===
OpenTuner is a new framework for building domain-specific multi-objective program autotuners. It is not an autotuner by itself, but rather a toolbox for rapidly constructing custom autotuners that fit a specific problem. OpenTuner supports fully customizable configuration representations, allowing complex structures such as decision trees or data layouts to be represented; an extensible technique representation that admits domain-specific techniques, such as existing heuristics or hand-coded solutions; and an easy-to-use interface for communicating with the tuned program. With OpenTuner, we believe a domain expert can build an effective autotuner that uses appropriate machine learning techniques with minimal effort.
== Simulation ==
=== Intel - FSim (Romain Cledat) ===
FSim is a functional simulator for the novel architecture being investigated by the Traleika Glacier X-Stack team. The architecture is based on a sea of small, efficient cores managed by larger ‘control’ cores, explicit memory hierarchies, a global address space, and a lack of caches. In this demonstration, we will present the simulation infrastructure, the programming model, and future directions of research.
=== LLNL (Greg Bronevetsky) ===
We present an approach to predicting application performance on future hardware platforms that operates at full native hardware speed. Our methodology is based on the insight that future systems will be similar to today’s systems but will provide fewer resources for each application thread. This means that today’s systems can simulate future ones if the amount of resources available to each thread is reduced. Our approach reduces the available resources of real hardware by using active interference workloads to selectively reduce the cache capacity and bandwidth, as well as the network bandwidth and latency, available to applications. This enables us to simulate future hardware at full speed. Our demo will show the measurements available from our simulations and the validation techniques we use to ensure that our interference workloads consume a precise amount of each system resource.
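As a rough illustration of the interference idea (a sketch, not the project's actual tooling), a co-running thread can take away a chosen amount of cache from the application under test by continuously streaming over a buffer of that size:
<syntaxhighlight lang="cpp">
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of a cache-interference workload (illustrative only): a background
// thread repeatedly walks a buffer sized to the amount of last-level cache we
// want to "take away" from the application under test.
void cacheInterference(std::size_t bytesToOccupy, std::atomic<bool>& stop) {
    std::vector<char> buf(bytesToOccupy, 1);
    volatile char sink = 0;
    while (!stop.load(std::memory_order_relaxed)) {
        for (std::size_t i = 0; i < buf.size(); i += 64)  // stride of a typical cache line
            sink += buf[i];                               // touch each line, keeping it resident
    }
    (void)sink;
}

int main() {
    std::atomic<bool> stop{false};
    std::thread interferer(cacheInterference, 8u << 20, std::ref(stop));  // occupy ~8 MB of cache
    // ... run and measure the application of interest here ...
    stop = true;
    interferer.join();
}
</syntaxhighlight>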
=== UIUC (Chunhua "Leo" Liao) ===
The demonstration will discuss power management API functions developed at UIUC. These functions control the power state of hardware functional units. Experienced developers can manually insert these functions into their code to reduce power consumption while maintaining performance. Source code annotations in the form of pragmas are also designed to guide compilers to automatically insert the functions at the right places when certain resources are not being used. We will use a simple example to show how compilers and simulators can work together to leverage the API functions.
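A purely hypothetical example of the annotation style described above (the pragma spelling and function names are invented for illustration and are not the actual UIUC interface): the programmer marks a region where a functional unit is idle, and the compiler lowers the annotation to explicit power-state calls.
<syntaxhighlight lang="cpp">
// Hypothetical illustration: pragma spelling and function names are invented to
// show the intended pattern; they are not the actual UIUC power-management API.
void powerUnitSetState(const char* /*unit*/, int /*state*/) { /* stub: would call the power API */ }

void integerPhase(long* data, int n) {
    // Annotation telling the compiler the FPU is unused in this region:
    // #pragma power_state(fpu, low)
    powerUnitSetState("fpu", 1);      // what the compiler would insert: FPU to low-power state
    for (int i = 0; i < n; ++i)
        data[i] = data[i] * 3 + 1;    // integer-only work
    powerUnitSetState("fpu", 0);      // restore full power before FP code runs again
}
</syntaxhighlight>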