Compilers
From Modelado Foundation
Sonia requested that Dan Quinlan initiate this page. For comments, please contact Dan Quinlan. This page is still in development.
QUESTIONS | XPRESS | TG X-Stack | DEGAS | D-TEC | DynAX | X-TUNE | GVR | CORVETTE | SLEEC | PIPER |
---|---|---|---|---|---|---|---|---|---|---|
PI | Ron Brightwell | Shekhar Borkar | Katherine Yelick | Daniel Quinlan | Guang Gao | Mary Hall | Andrew Chien | Koushik Sen | Milind Kulkarni | Martin Schulz |
Describe how you expect to require compiler support within your X-Stack project. | Currently, the HPX/XPI approach is based on libraries developed with standard C or C++ compilers, several of which are available. A future project, PXC, will develop an advanced compiler capability that fully represents the ParalleX execution model, but this is outside the scope of the X-Stack program. | (TG) | (DEGAS) | Compiler support plays an essential role in the D-TEC project. We use compilers to accept programs written in DSLs and to analyze, transform, and optimize them. A backend compiler is also needed to generate the final executable for a target platform. To support flexible definition of DSLs, a compiler infrastructure is needed that facilitates adding extensions to base languages or defining entirely new languages. A DSL may be transformed from a high-level form to a low-level form; the compiler should maintain the mapping between these levels so that high-level semantics can be related to low-level performance metrics. | Compilers have two purposes in the DynAX project, both aimed at increasing productivity and programmability. (1) The first is to automatically parallelize domain-specific applications from a sequential specification. R-Stream fills this role: it takes sequential C and produces parallel, scalable SWARM code. (2) The second is to expose high-level parallel programming abstractions. This goal is addressed at two levels: the HTA compiler, which relies on an explicitly parallel intermediate representation (PIL), generates SCALE code; the SCALE compiler, in turn, offers object-oriented programming and simplifies programming for SWARM. | We are developing source-to-source compiler transformations in our project. ROSE provides the abstract syntax tree for our compiler and modeling software (CHiLL and PBound, respectively). We rely on native backend compilers to perform architecture-specific optimizations and generate SIMD code (SSE, AVX, etc.). Our transformations generate code that we expect to be easily vectorized. | (GVR) | (CORVETTE) | (SLEEC) | PIPER does not have or need its own compiler, but we need access to compiler-generated information that captures the high-level semantics of the implemented language (essentially a form of DWARF for DSLs). Additionally, advanced instrumentation or instrumentation points/markers could be useful. Information provided by different DSLs should be interoperable, ideally standardized, and compatible with the debug information produced by existing compilers. |
Program analysis can be challenging and can require specialized expertise. What requirements do you have for program analysis, and what level of expertise do you expect to require? This problem could be posed in terms of what APIs for program analysis results you expect. | The XPRESS project is exploring the strengths and opportunities enabled by runtime control for dynamic adaptive resource management and task scheduling. Through this investigation, some compile-time information will be identified as important to convey to the compiler. Some compiler dataflow analysis of fine-grained parallelism is expected to be required, but this is standard practice today and does not require new capabilities. | (TG) | (DEGAS) | We would like to have access to a range of baseline compiler analyses through simple API functions. Typical examples are control flow analysis, data flow analysis, and dependence analysis. In addition, we need extensible versions of these baseline analyses so they can be applied to new DSLs. We also need domain-specific analyses that can take advantage of domain knowledge in each DSL. Ideally, program analysis support should be able to leverage user input through annotations or semantics-specification files. | The PIL and SCALE compilers accept parallel languages as input (PIL supports Hierarchically Tiled Arrays for data-parallel programs, and SCALE accepts structured, object-oriented codelet programs). The R-Stream compiler accepts sequential C loops written according to a set of rules (a "style") applied by the programmer. The rules, which ensure that enough static information is exposed to the compiler, are defined in the R-Stream user guide. Some pragmas are defined that allow the user to provide additional hints to the compiler. R-Stream relies on extensions of the polyhedral model to represent, analyze, and transform programs. | At present, we have implemented the program analyses we need for our optimizations. | (GVR) | (CORVETTE) | N/A | PIPER mainly focuses on runtime performance analysis. This could benefit greatly from access to static information (loop structures, static call trees, data structures, ...), which compilers should provide in a standardized manner (through APIs or debug information encoded into binaries). |
What types of hardware do you expect to address/target with optimizations, and at what level of granularity of the program (e.g., coarse-grain over multiple functions, or fine-grain within statements)? | The ParalleX-based approach exposes very coarse-grain, medium-grain (e.g., threads), and fine-grain dataflow (e.g., instruction-level) parallelism in support of heterogeneous functional units and memory hierarchy structures. It is also intended to inform future hardware design toward more efficient mechanisms for system-wide communication, global addressing, and dynamic execution. | (TG) | (DEGAS) | We expect future extreme-scale computers to consist of heterogeneous nodes connected by a network. An example heterogeneous node is a multicore shared-memory machine with an NVIDIA GPU accelerator that has a separate memory space. We want to exploit the coarse-grain parallelism of a program first, then incrementally take advantage of finer-grain parallelism and map each level to the appropriate hardware features. | We currently target clusters of x86 multicore nodes in our experiments, but we are considering other targets as well, such as Intel's strawman architecture. So far we have worked at function-level granularity, but optimizing across multiple functions is possible to some degree. | We currently focus on two classes of processor: (1) cache-coherent multi- and manycore CPU architectures, including the Xeon Phi (MIC) architecture; and (2) NVIDIA GPU architectures, which are coherent only within a thread block. The optimizations we perform include fusion across operators, wavefront parallelism, introducing ghost zones, and fine-grain rewriting of computations to improve SIMD and instruction-level parallelism. Fusion across operators could potentially be applied across functions. | (GVR) | (CORVETTE) | N/A | All of the above |
What general-purpose languages do you expect to use and/or extend to support your research work? | XPRESS contends that, other than for supporting legacy codes, there is no correct language for the future of exascale, in spite of the ardent claims of supporters of Fortran, C++, OpenMP, MPI, Chapel, X10, and a long list of others. While it is true that the ultimate language is LISP, only a few truly enlightened individuals are qualified to recognize this. | (TG) | (DEGAS) | We are interested in a wide range of generic transformations, such as loop transformations, instrumentation, GPU code generation, and data structure transformation. We also want a compiler infrastructure that provides easy-to-use code transformation APIs so we can add customized, domain-specific transformations. | Both R-Stream and SWARM use C as their input language. | We currently support C, C++, and Fortran because these are supported by the ROSE frontend. | (GVR) | (CORVETTE) | N/A | PIPER components will be written in C/C++ plus scripting languages (mostly Python). All components/tools will be applicable to a wide range of source languages or even binaries. The exact list of supported languages is TBD and will depend on demand and on progress of the overall exascale software stack. |
Do you expect to use, require, or develop an Embedded DSL (defined by compiler support that leverages the semantics of abstractions defined entirely within a general-purpose base language) or an Extended DSL (defined by compiler support that leverages the semantics of abstractions defined by new syntax)? | DSLs hold the promise of relieving the burden of coding for key applications or functionality classes, and XPRESS expects to support them. However, for such DSLs to exhibit exascale capability, their back-ends will have to target the HPX runtime system, either directly or through interfaces such as XPI or PXC. | (TG) | (DEGAS) | We want to support all flavours of DSLs. | The R-Stream approach to domain specificity is to define a programming style and enable domain-specific annotations. The advantage of this approach is that the user still programs in C and does not need to learn a new syntax. We have developed SCALE as a general-purpose programming language within the domain of HPC; we do not consider any current SCALE feature to be specific to a narrower domain than that. | We are developing domain-specific optimizations for geometric multigrid and stencil computations. These optimizations could be incorporated into a DSL for such applications, but we are applying them directly to C and Fortran code. We are also developing a tool for expressing and optimizing tensor computations that could be considered a DSL. | (GVR) | (CORVETTE) | N/A | Possibly as an interface to query and analyze performance data, but this is unclear as of now. |
What generic and customized code transformations do you require to support your project? | (XPRESS) | (TG) | (DEGAS) | We are interested in a wide range of generic transformations, such as loop transformations, instrumentation, GPU code generation, and data structure transformation. We also want a compiler infrastructure that provides easy-to-use code transformation APIs so we can add customized, domain-specific transformations. | R-Stream supports a wide range of loop and data layout transformations, some of which are specific to stencil operations. These transformations mainly create data locality and parallelism at various levels of the target architecture. | We are developing the transformations ourselves. | (GVR) | (CORVETTE) | N/A | Instrumentation |
Which level of intermediate representation do you prefer to work with: source level, normalized middle level, or low level (close to binary code)? | Currently, the selected intermediate representation is the source-level XPI interface, although lower-level HPX library calls through C or C++ are also enabled. | (TG) | (DEGAS) | Within D-TEC, we want to support all levels of intermediate representation to fully support the analysis, optimization, and code generation of DSLs. | We do not have a preference, but in our experience, users want to program at the highest possible level while still being able to access low-level code and understand the parallelization process. Our source-to-source tools enable this. | As we are applying source-to-source transformations, we prefer an intermediate representation close to the source level. | (GVR) | (CORVETTE) | N/A | Mostly does not apply to PIPER, but some elements of autotuning could use code transformation, e.g., to create multiple variants of the same code. |
For which parallel programming models (MPI, OpenMP, UPC, etc.) do you want better compiler support? | XPRESS supports MPI and OpenMP. | (TG) | (DEGAS) | We are interested in MPI and OpenMP. Ideally, the compiler should be aware of MPI function calls and provide an OpenMP implementation. | None, although I'm not sure I fully understand the question (as of 5/5/14). | N/A | (GVR) | (CORVETTE) | N/A | N/A |
On what OS configurations and hardware platforms do you want to run the compiler? | Initially, it is anticipated that a cross-compiler running on a standard Linux platform will be used to target the HPX environment. | (TG) | (DEGAS) | Linux is our main OS focus. Target platforms may include Intel/AMD x86 multicore machines with NVIDIA GPUs. | PIL, R-Stream, SCALE, and SWARM support Linux (a complete list of tested distributions is available). Successful uses of R-Stream on Mac OS (Darwin) have also been reported, although it is not officially supported. R-Stream can also cross-compile to any platform, provided the native low-level compiler supports cross-compilation. For SCALE and SWARM, cross-compilation is done for Xeon Phi. | We currently support Linux-based systems, as well as NVIDIA GPUs. | (GVR) | (CORVETTE) | N/A | N/A |
How do you expect compilers to incorporate domain-specific information: through DSLs, separate semantics-specification files, or other methods? | (XPRESS) | (TG) | (DEGAS) | We want to support DSLs, annotations, and separate semantics-specification files. | As explained above, in R-Stream a domain-specific formulation of the program is obtained by combining style rules and annotations (#pragma). Semantics and syntax remain as well defined as those of the underlying C language. PIL is a framework that facilitates any-to-any compilation of parallel languages; it carries no domain-specific information but works for any language. | (X-TUNE) | (GVR) | (CORVETTE) | N/A | N/A |
How do you expect compilers to interact with your libraries or runtime systems, if at all? | (XPRESS) | (TG) | (DEGAS) | The compiler will generate tasks (kernels) to be executed at runtime. It will also transform and partition data in programs so that a NUMA-aware runtime library can be used. | R-Stream generates code that includes calls to the target machine's runtimes/libraries for parallelization and locality optimization. R-Stream also supports library calls within the input code. | Our compiler generates calls to runtime libraries to create threads. Currently, we support OpenMP and CUDA. | (GVR) | (CORVETTE) | N/A | Yes, by providing additional debug information and code-to-abstraction mappings. |
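
Several answers in the table describe stencil-oriented source-to-source transformations. These include fusion across operators and ghost zones (X-TUNE) and stencil-specific loop transformations (R-Stream). The C sketch below illustrates one of them: fusing a stencil operator with a pointwise operator on arrays padded with one ghost cell on each side. It is a minimal, hypothetical example. The function names (`fill_ghosts`, `residual_unfused`, `residual_fused`, `fusion_preserves_result`), the 3-point averaging stencil, and the residual operator are assumptions chosen for illustration. It is not code from X-TUNE, CHiLL, ROSE, or R-Stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not X-TUNE, CHiLL, or R-Stream code).
 * Arrays hold n interior points at indices 1..n plus one ghost cell on
 * each side (index 0 and n + 1), so the 3-point stencil below needs no
 * boundary special cases. */

/* Fill the ghost cells with fixed (Dirichlet) boundary values. */
void fill_ghosts(double *u, size_t n, double left, double right) {
    assert(u != NULL);
    u[0] = left;
    u[n + 1] = right;
}

/* Unfused: operator 1 (3-point average) writes a temporary array s;
 * operator 2 (pointwise residual r = f - s) makes a second pass. */
void residual_unfused(const double *u, const double *f,
                      double *s, double *r, size_t n) {
    for (size_t i = 1; i <= n; ++i)
        s[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
    for (size_t i = 1; i <= n; ++i)
        r[i] = f[i] - s[i];
}

/* Fused: both operators in a single loop, so the temporary array and the
 * second pass over memory disappear. This is legal because iteration i of
 * operator 2 reads only s[i], which iteration i of operator 1 produces. */
void residual_fused(const double *u, const double *f, double *r, size_t n) {
    for (size_t i = 1; i <= n; ++i)
        r[i] = f[i] - (u[i - 1] + u[i] + u[i + 1]) / 3.0;
}

/* Returns 1 if the fused and unfused versions agree on a small grid. */
int fusion_preserves_result(void) {
    enum { N = 8 };
    double u[N + 2], f[N + 2], s[N + 2], r1[N + 2], r2[N + 2];
    for (int i = 1; i <= N; ++i) {
        u[i] = (double)(i * i);
        f[i] = (double)i;
    }
    fill_ghosts(u, N, 0.0, 0.0);
    residual_unfused(u, f, s, r1, N);
    residual_fused(u, f, r2, N);
    for (int i = 1; i <= N; ++i) {
        double d = r1[i] - r2[i];
        if (d < 0.0)
            d = -d;
        if (d > 1e-12)
            return 0;
    }
    return 1;
}
```

Calling `fusion_preserves_result()` checks that the fused loop reproduces the unfused result. Fusing two stencils (for example, two successive smoothing sweeps) is harder. In that case iteration i of the second operator reads `s[i - 1]` and `s[i + 1]`, so a transformation tool must recompute neighboring values, widen the ghost zone, or shift the second loop by one iteration before fusing.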