Compilers

From Modelado Foundation

Revision as of 20:01, May 7, 2014 by imported>Rbbrigh

Sonia requested that Dan Quinlan initiate this page. For comments, please contact Dan Quinlan. This page is still in development.

QUESTIONS, with responses listed per project in this order (PI in parentheses): XPRESS (Ron Brightwell), TG X-Stack (Shekhar Borkar), DEGAS (Katherine Yelick), D-TEC (Daniel Quinlan), DynAX (Guang Gao), X-TUNE (Mary Hall), GVR (Andrew Chien), CORVETTE (Koushik Sen), SLEEC (Milind Kulkarni), PIPER (Martin Schulz)
Describe how you expect to require compiler support within your X-Stack project.

Currently, the HPX/XPI approach is based on libraries developed with C or C++ compilers, several of which are currently available. A future project, PXC, will develop an advanced compiler capability to fully represent the ParalleX execution model, but this is outside the scope of the X-Stack program. (TG) (DEGAS) (D-TEC)

Compilers have two purposes in the DynAX project, which are both about increasing productivity and programmability.

1- The first role is to automatically parallelize domain-specific applications from a sequential specification. This is what R-Stream does, as it takes sequential C and produces parallel, scalable SWARM code.

2- The second role is to expose high-level parallel programming abstractions. This goal is addressed at two levels: The HTA compiler, which relies on an explicitly parallel intermediate representation (PIL), generates SCALE code. The SCALE compiler, in turn, offers object-oriented programming and simplifies programming to SWARM.

We are developing source-to-source compiler transformations in our project. ROSE provides the abstract syntax tree for our compiler and modeling software (CHiLL and PBound, respectively). We rely on native backend compilers to perform architecture-specific optimizations and generate SIMD code (SSE, AVX, etc.). Our transformations generate code that we anticipate will be easily vectorized.
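As an illustration of what "easily vectorized" output can look like, here is a minimal sketch in C. It is not actual CHiLL output; the function names `axpy` and `axpy_demo` are hypothetical. It only shows the loop shape that native backend compilers typically vectorize well: unit-stride accesses, no dependence between iterations, and `restrict`-qualified pointers.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i]
 * restrict promises the compiler that x and y do not overlap, the accesses
 * are unit-stride, and no iteration depends on another, so a native backend
 * compiler can map the loop to SIMD instructions (SSE, AVX, ...). */
void axpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
    }
}

/* Hypothetical driver: x[i] = i, y[i] = 1, alpha = 2, so y[i] becomes 2i + 1. */
double axpy_demo(void)
{
    enum { LEN = 256 };
    float x[LEN], y[LEN];
    double sum = 0.0;
    for (size_t i = 0; i < LEN; i++) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }
    axpy(LEN, 2.0f, x, y);
    assert(y[0] == 1.0f);
    for (size_t i = 0; i < LEN; i++) sum += y[i];
    return sum; /* sum of the first 256 odd numbers */
}
```

Whether a given compiler actually vectorizes the loop depends on its flags and target; compiler vectorization reports are one way to check.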

(GVR) (CORVETTE) (SLEEC)

PIPER does not have/need its own compiler, but we need access to compiler-generated information that captures high-level semantics of the language implemented (basically a form of DWARF for DSLs). Additionally, advanced instrumentation or instrumentation points/markers could be useful. Information provided by different DSLs should be interoperable, ideally standardized, and compatible with the debug information provided by existing compilers.
Program analysis can be both challenging and require specialized expertise. What requirements do you have for program analysis, and what level of expertise do you expect to require? This problem could be posed in terms of what APIs for program analysis results you expect.

The XPRESS project is exploring the strengths and opportunities enabled through runtime control for dynamic adaptive resource management and task scheduling. Through this investigation, some compile-time information will be exposed as important to convey to the compiler. It is expected that some compiler dataflow analysis of fine-grained parallelism will be required, but this is typical now and so does not require new capabilities. (TG) (DEGAS) (D-TEC)

The PIL and SCALE compilers support parallel languages as their input (PIL supports Hierarchically Tiled Arrays for data-parallel programs, and SCALE accepts structured, object-oriented codelet programs).

The R-Stream compiler supports sequential C loops to which the programmer applies a set of writing rules (a "style"). The rules, which entail exposing enough static information to the compiler, are defined in the R-Stream user guide. Some pragmas are defined that allow the user to provide additional hints to the compiler. R-Stream relies on extensions of the polyhedral model to represent, analyze and transform programs.
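The actual R-Stream style rules and pragmas are defined in its user guide and are not reproduced here. As a rough sketch of the kind of static information the polyhedral model generally relies on, the following loop nest has loop bounds and array subscripts that are affine functions of the loop indices and compile-time constants. The names `lower_tri_matvec` and `lower_tri_matvec_demo` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N 64 /* compile-time constant, so every loop bound is statically known */

/* y = L * x, where only the lower triangle of a (j <= i) is read. */
void lower_tri_matvec(double a[N][N], const double x[N], double y[N])
{
    for (size_t i = 0; i < N; i++) {
        y[i] = 0.0;
        /* The inner bound is an affine function of the outer index i. */
        for (size_t j = 0; j <= i; j++) {
            y[i] += a[i][j] * x[j]; /* subscripts are affine in i and j */
        }
    }
}

/* Hypothetical driver: all-ones inputs give y[i] = i + 1; returns the sum of y. */
double lower_tri_matvec_demo(void)
{
    static double a[N][N];
    double x[N], y[N], sum = 0.0;
    for (size_t i = 0; i < N; i++) {
        x[i] = 1.0;
        for (size_t j = 0; j < N; j++) a[i][j] = 1.0;
    }
    lower_tri_matvec(a, x, y);
    assert(y[0] == 1.0);
    for (size_t i = 0; i < N; i++) sum += y[i];
    return sum;
}
```

Code with data-dependent loop bounds or indirect subscripts such as `a[idx[i]]` falls outside this basic form, which is one reason a compiler of this kind asks the programmer to follow a style.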

At present, we have implemented the program analysis we need for our optimizations.

(GVR) (CORVETTE) SLEEC: N/A

PIPER mainly focuses on runtime performance analysis. This could benefit greatly from access to static information (loop structures, static call trees, data structures, ...), which should be provided by compilers in a standardized manner (through APIs or debug information encoded into binaries).
What types of hardware do you expect to address/target within optimizations and at what level of granularity of the program (e.g. coarse-grain, over multiple functions, or fine-grain within statements)?

The ParalleX-based approach exposes very coarse-grain, medium-grain (e.g., threads), and fine-grain dataflow (e.g., instruction-level) parallelism in support of heterogeneous functional unit and memory hierarchy hardware structures. It is also intended to inform future hardware design of more efficient mechanisms in support of system-wide operation for communication, global addresses, and dynamic execution. (TG) (DEGAS) (D-TEC)

We currently target clusters of x86 multicore nodes for our experiments, but we are considering other targets as well, such as Intel's straw man architecture. So far we have worked at function-level granularity, but optimizing across multiple functions is possible to some degree.

We are currently focused on two classes of processor: (1) cache coherent multi- and manycore CPU architectures including the Xeon Phi / MIC architecture; and, (2) NVIDIA GPU architectures which are coherent only within a thread block. The types of optimizations we perform include fusion across operators, wavefront parallelism, introducing ghost zones, and fine-grain rewriting of computations to improve SIMD and instruction-level parallelism. The fusion across operators could potentially be applied across functions.
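Fusion across operators can be sketched with a small hand-written example. This is not output from the project's tools, and the function names are hypothetical. A 3-point stencil operator followed by a pointwise operator is first written as two sweeps with a temporary array, then as a single fused sweep that keeps the intermediate value in a scalar.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: operator 1 (3-point sum) writes a temporary array,
 * operator 2 (subtract b) reads it back in a second sweep. */
void apply_unfused(size_t n, const double *a, const double *b,
                   double *tmp, double *out)
{
    for (size_t i = 1; i + 1 < n; i++)
        tmp[i] = a[i - 1] + a[i] + a[i + 1];
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = tmp[i] - b[i];
}

/* Fused: one sweep over the interior; the temporary lives in a scalar. */
void apply_fused(size_t n, const double *a, const double *b, double *out)
{
    for (size_t i = 1; i + 1 < n; i++) {
        double t = a[i - 1] + a[i] + a[i + 1];
        out[i] = t - b[i];
    }
}

/* Hypothetical driver: returns 1 if both versions agree on every interior point. */
int fusion_demo(void)
{
    enum { LEN = 128 };
    double a[LEN], b[LEN], tmp[LEN], out1[LEN], out2[LEN];
    for (size_t i = 0; i < LEN; i++) {
        a[i] = (double)i;
        b[i] = 1.0;
    }
    apply_unfused(LEN, a, b, tmp, out1);
    apply_fused(LEN, a, b, out2);
    for (size_t i = 1; i + 1 < LEN; i++) {
        assert(out1[i] == 3.0 * (double)i - 1.0); /* (i-1) + i + (i+1) - 1 */
        if (out1[i] != out2[i]) return 0;
    }
    return 1;
}
```

The fused version avoids one full pass over memory and the temporary array. Fusing two stencil operators, rather than a stencil and a pointwise operator, generally needs extra care at the boundaries, which is where techniques such as ghost zones come in.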

(GVR) (CORVETTE) SLEEC: N/A

PIPER: All of the above.
What general purpose languages do you expect to use and/or extend to support your research work?

XPRESS contends that other than for purposes of support of legacy codes, there is no correct language for the future of exascale in spite of the ardent claims of many of the supporters for Fortran, C++, OpenMP, MPI, Chapel, X10, and a long list of others. While it is true that the ultimate language is LISP, only a few truly enlightened individuals are qualified to recognize this. (TG) (DEGAS) (D-TEC)

Both R-Stream and SWARM are based on C as their input language.

We currently support C, C++ and Fortran because these are supported by the ROSE frontend.

(GVR) (CORVETTE) SLEEC: N/A

PIPER components will be written in C/C++ plus scripting languages (mostly Python). All components/tools will be applicable to a wide range of source languages or even binaries. The exact list of supported languages is to be determined and will depend on demand and progress on the overall exascale software stack.
Do you expect to use, require, or develop an Embedded DSL (defined by compiler support that would leverage semantics of abstractions defined completely within a general purpose base language) or an Extended DSL (defined by compiler support that would leverage semantics of abstractions defined by new syntax)?

DSLs hold the promise of relieving the burden of coding for key applications or functionality classes, and XPRESS expects to support these. However, for such DSLs to exhibit exascale capability, their back ends will have to target the HPX runtime system, either directly or through offered interfaces such as XPI or PXC. (TG) (DEGAS) (D-TEC)

The R-Stream approach to domain-specificity is to define a programming style and enable domain-specific annotations. The advantage of this approach is that the user still programs in C, and doesn't need to learn a new syntax.

We have developed SCALE as a general-purpose programming language within the domain of HPC. We have not conceived of any of the current SCALE features as being specific to a narrower domain than that.

We are developing domain-specific optimizations for geometric multigrid and stencil computations. These optimizations could be incorporated into a DSL for such applications, but we are applying the optimizations directly to C and Fortran code. We are also developing a tool for expressing and optimizing tensor computations that could be considered a DSL.

(GVR) (CORVETTE) SLEEC: N/A

PIPER: Possibly as an interface to query and analyze performance data, but this is unclear as of now.
What generic and customized code transformations do you require to support your project?

(XPRESS) (TG) (DEGAS) (D-TEC)

R-Stream supports a wide range of loop and data layout transformations, some of which are specific to stencil operations. These transformations mainly create data locality and parallelism at various levels of the target architecture.

We are developing the transformations ourselves.

(GVR) (CORVETTE) SLEEC: N/A

PIPER: Instrumentation
Which level of Intermediate Representation do you prefer to work with: source level, normalized middle level, or low level (close to binary code)?

Currently, the selected intermediate representation is a source code XPI interface, although lower-level HPX library calls through C or C++ are also enabled. (TG) (DEGAS) (D-TEC)

We do not have a preference, but in our experience, users want to program at the highest possible level, while still being able to access low-level code and understand the parallelization process. Our source-to-source tools enable this.

As we are applying source-to-source transformations, we prefer an intermediate representation that is close to the source level.

(GVR) (CORVETTE) SLEEC: N/A

PIPER: Mostly does not apply, but some elements of autotuning could use code transformation, e.g., to create multiple variants of the same code.
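The idea of multiple variants of the same code can be sketched as follows. This is not PIPER code; all names are hypothetical. Two functionally equivalent versions of a reduction are registered in a table, and a simple timing loop picks one. A real autotuner would generate the variants through code transformation and use a more careful measurement strategy than `clock()`.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef long (*sum_fn)(const int *v, size_t n);

/* Variant 1: straightforward loop. */
static long sum_simple(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Variant 2: unrolled by four with independent accumulators. */
static long sum_unrolled4(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < n; i++) s0 += v[i]; /* remainder iterations */
    return s0 + s1 + s2 + s3;
}

static const sum_fn variants[] = { sum_simple, sum_unrolled4 };
#define NUM_VARIANTS (sizeof variants / sizeof variants[0])

/* Time each variant over reps runs and return the index of the fastest. */
size_t pick_fastest(const int *v, size_t n, int reps)
{
    size_t best = 0;
    clock_t best_time = 0;
    for (size_t k = 0; k < NUM_VARIANTS; k++) {
        volatile long sink = 0; /* keeps the calls from being optimized away */
        clock_t start = clock();
        for (int r = 0; r < reps; r++) sink += variants[k](v, n);
        clock_t elapsed = clock() - start;
        if (k == 0 || elapsed < best_time) {
            best = k;
            best_time = elapsed;
        }
    }
    return best;
}

/* Hypothetical driver: returns 1 if every variant computes 0 + 1 + ... + 1002. */
int variants_demo(void)
{
    enum { LEN = 1003 };
    int v[LEN];
    for (int i = 0; i < LEN; i++) v[i] = i;
    for (size_t k = 0; k < NUM_VARIANTS; k++) {
        if (variants[k](v, LEN) != 502503L) return 0;
    }
    size_t best = pick_fastest(v, LEN, 100);
    assert(best < NUM_VARIANTS);
    return 1;
}
```

Which variant wins depends on the compiler, flags, and machine, so the sketch checks only that the variants agree, not which one is selected.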
Which parallel programming models (MPI, OpenMP, UPC, etc.) do you want to have better compiler support?

XPRESS supports MPI and OpenMP. (TG) (DEGAS) (D-TEC)

None, although I'm not sure I fully understand the question (as of 5/5/14).

N/A

(GVR) (CORVETTE) SLEEC: N/A; PIPER: N/A
On what OS configuration and hardware platforms do you want to run the compiler?

Initially, it is anticipated that a cross-compiler will be used, running on a standard Linux platform and targeting the HPX environment. (TG) (DEGAS) (D-TEC)

PIL, R-Stream, SCALE and SWARM support Linux (complete list of tested distributions available).

Successful Mac OS uses of R-Stream were also reported (using Darwin), although it is not officially supported. R-Stream can also cross-compile to any platform as long as the native low-level compiler supports this feature.

For SCALE and SWARM, cross-compilation is done for Xeon Phi.

We are currently supporting Linux-based systems, and also NVIDIA GPUs.

(GVR) (CORVETTE) SLEEC: N/A; PIPER: N/A
How do you expect compilers to incorporate domain-specific information, through DSL, separated semantics-specification files, or other methods?

(XPRESS) (TG) (DEGAS) (D-TEC)

As explained above, in R-Stream, a domain-specific formulation of the program is performed by combining style rules and annotations (#pragma). Semantics and syntax remain as well-defined as the underlying C language.

PIL is a framework to facilitate any-to-any compilation of parallel languages. It carries no domain-specific information, but it will work for any language.


(X-TUNE) (GVR) (CORVETTE) SLEEC: N/A; PIPER: N/A
How do you expect compilers to interact with your libraries or runtime systems, if any?

(XPRESS) (TG) (DEGAS) (D-TEC)

R-Stream generates code that includes calls to the target machine's runtimes/libraries for the purpose of parallelization and locality optimization. R-Stream also supports library calls within the input code.

Our compiler generates calls to run-time libraries to create threads. Currently, we support OpenMP and CUDA.
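For the OpenMP case, the kind of generated code involved can be sketched with a hand-written loop. This is not output from the project's compiler, and the names `dot` and `dot_demo` are hypothetical. When OpenMP is enabled (for example with `-fopenmp` in GCC or Clang), the native compiler lowers the pragma into calls to the OpenMP runtime library, which creates and manages the threads.

```c
#include <assert.h>

/* Dot product with an OpenMP work-sharing loop. The reduction clause gives
 * each thread a private partial sum and combines them at the end. If OpenMP
 * is not enabled, the pragma is ignored and the loop runs sequentially. */
double dot(const double *x, const double *y, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Hypothetical driver: 1000 products of 1.0 * 2.0, which sum exactly in any order. */
double dot_demo(void)
{
    enum { LEN = 1000 };
    double x[LEN], y[LEN];
    for (long i = 0; i < LEN; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }
    double result = dot(x, y, LEN);
    assert(result > 0.0);
    return result;
}
```

With general floating-point data, the parallel reduction may combine partial sums in a different order than the sequential loop, so results can differ in the last bits.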

(GVR) (CORVETTE) SLEEC: N/A

PIPER: Yes, by providing additional debug and code-to-abstraction mapping information.