DSL's

From Modelado Foundation

Revision as of 22:31, April 29, 2014

Sonia requested that Saman Amarasinghe and Dan Quinlan initiate this page. For comments, please contact them. This page is still in development.

{|
! X-Stack Project
! Name of the DSL
! URL
! Target domain
! Miniapps supported
! Front-end technology used
! Internal representation used
! Key optimizations performed
! Code generation technology used
! Processors / computing models targeted
! Current status
! Summary of the best results
|- style="vertical-align:top;"
| D-TEC
| Halide
| http://halide-lang.org
| Image processing algorithms
| Cloverleaf, miniGMG, boxlib
| Uses C++
| Custom IR
| Stencil optimizations (fusion, blocking, parallelization, vectorization). Schedules can produce all levels of locality, parallelism, and redundant computation. OpenTuner is used for automatic schedule generation.
| LLVM
| x86 multicores, ARM, and GPUs
| Working system. Used by Google and Adobe.
| Local Laplacian filter: a top Adobe engineer took 3 months and 1,500 lines of code to get a 10x speedup over the original. The Halide version took 1 day and 60 lines of code and is 20x faster. In addition, 90x faster GPU code was produced the same day (Adobe did not even attempt GPUs). Also, all pictures taken by Google Glass are processed using a Halide pipeline.
|- style="vertical-align:top;"
| D-TEC
| Shared Memory DSL
| http://rosecompiler.org
| MPI HPC applications on many-core nodes
| Internal LLNL app
| Uses C (possibly C++ and Fortran in the future)
| ROSE IR
| Shared-memory optimization for MPI processes on many-core architectures: permits sharing large data structures between processes to reduce memory requirements per core.
| ROSE + any vendor compiler
| Many-core architectures with local shared memory
| Implementation released (4/28/2014)
| Being evaluated for use
|- style="vertical-align:top;"
| D-TEC
| Heterogeneous OpenMP
| http://rosecompiler.org/
| HPC applications running on NVIDIA GPUs
| boxlib, internal kernels
| Uses C and C++
| ROSE IR (AST)
| Loop collapse to expose more parallelism, hardware-aware thread/block configuration, data reuse to reduce data transfer, round-robin loop scheduling to reduce memory footprint
| ROSE source-to-source + NVIDIA CUDA compiler
| NVIDIA GPUs
| Implementation released with ROSE (4/29/2014)
| Matches or outperforms comparable compilers targeting GPUs.
|- style="vertical-align:top;"
| ''DSL 4''
|- style="vertical-align:top;"
| ''DSL 5''
|- style="vertical-align:top;"
| ''DSL 6''
|- style="vertical-align:top;"
| ''DSL 7''
|- style="vertical-align:top;"
| ''DSL 8''
|}