DSL's: Difference between revisions

Revision as of 02:38, May 7, 2014

Sonia requested that Saman Amarasinghe and Dan Quinlan initiate this page. For comments, please contact them. This page is still in development.

X-Stack Project	Name of the DSL	URL	Target domain	Miniapps supported	Front-end technology used	Internal representation used	Key Optimizations performed	Code generation technology used	Processors computing models targeted	Current status	Summary of the best results	Interface for perf.&dbg. tools
D-TEC	Halide	http://halide-lang.org	Image processing algorithms	Cloverleaf, miniGMG, boxlib	Uses C++	Custom IR	Stencil optimizations (fusion, blocking, parallelization, vectorization). Schedules can produce all levels of locality, parallelism, and redundant computation. OpenTuner for automatic schedule generation.	LLVM	x86 multicores, ARM, and GPUs	Working system. Used by Google and Adobe.	Local Laplacian filter: a top Adobe engineer took 3 months and 1,500 lines of code to get 10x over the original. With Halide: 1 day, 60 lines, 20x faster. In addition, 90x faster GPU code was produced the same day (Adobe did not even try GPUs). Also, all pictures taken by Google Glass are processed using a Halide pipeline.
D-TEC	Shared Memory DSL	http://rosecompiler.org	MPI HPC applications on many-core nodes	Internal LLNL App	Uses C (maybe C++ and Fortran in future)	ROSE IR	Shared memory optimization for MPI processes on many-core architectures permits sharing large data structures between processes to reduce memory requirements per core.	ROSE + any vendor compiler	Many-core architectures with local shared memory	Implementation released (4/28/2014)	Being evaluated for use
D-TEC	Heterogeneous OpenMP	http://rosecompiler.org/	HPC applications running on NVIDIA GPUs	boxlib, internal kernels	Uses C and C++	ROSE IR (AST)	Loop collapse to expose more parallelism, hardware-aware thread/block configuration, data reuse to reduce data transfer, round-robin loop scheduling to reduce memory footprint	ROSE source-to-source + NVIDIA CUDA compiler	NVIDIA GPUs	Implementation released with ROSE (4/29/2014)	Matches or outperforms comparable compilers targeting GPUs.
D-TEC	NUMA DSL	http://rosecompiler.org	HPC applications on NUMA-capable many-core CPUs	Internal LLNL App	Uses C++	ROSE IR	NUMA-aware data distribution to enhance data locality and avoid long memory latency. Multiple halo-exchange schemes for stencil codes on structured grids.	ROSE + libnuma support	Many-core architectures with a NUMA hierarchy	Implementation in progress.	1.7x performance improvement compared to OpenMP implementation for 2D 2nd-order stencil computation.
D-TEC	OpenACC	https://github.com/tristanvdb/OpenACC-to-OpenCL-Compiler	Accelerated computing	Not yet.	C (possibly C++ and Fortran). Pragma parser for ROSE.	ROSE IR	Uses tiling to map parallel loops to OpenCL	ROSE (with OpenCL kernel generation backend), OpenCL C Compiler (LLVM)	Any accelerator with OpenCL support (CPUs, GPUs, Xeon Phi, ...)	- Basic kernel generation - Directives parsing - Runtime tested on NVIDIA GPUs, Intel CPUs, and Intel Xeon Phi	Reaches ~50 Gflops on Tesla M2070 on matrix multiply. (M2070: ~1 Tflops peak, ~200 to ~400 Gflops effective on linear algebra; all floating point).
DSL 6
DSL 7
DSL 8
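
The Heterogeneous OpenMP row lists loop collapse as a way to expose more parallelism. As a rough illustration only (this is not ROSE output or the project's implementation; the function names `nested_indices` and `collapse_indices` are hypothetical), the idea can be sketched as flattening a two-level loop nest into a single iteration space whose index is mapped back to the original pair:

```python
def nested_indices(rows, cols):
    """Reference loop nest: visits (i, j) in row-major order."""
    visited = []
    for i in range(rows):
        for j in range(cols):
            visited.append((i, j))
    return visited


def collapse_indices(rows, cols):
    """Collapsed form: one loop of rows * cols iterations.

    Each flat index k is mapped back to (i, j). A single loop with
    rows * cols iterations offers more independent iterations to
    distribute across threads than the outer loop alone.
    """
    visited = []
    for k in range(rows * cols):
        i, j = divmod(k, cols)  # i = k // cols, j = k % cols
        visited.append((i, j))
    return visited


if __name__ == "__main__":
    # Both forms visit the same index pairs in the same order.
    print(collapse_indices(2, 3) == nested_indices(2, 3))
```

The sketch only shows the index mapping; how iterations are then assigned to GPU threads and blocks is a separate decision that the table attributes to the hardware-aware thread/block configuration.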

@@ Line 41: / Line 41: @@
 |Implementation released (4/28/2014)
 |Being evaluated for use
+|
 |- style="vertical-align:top;"
 |''D-TEC
@@ Line 54: / Line 55: @@
 | Implementation released with ROSE (4/29/2014)
 | Matches or outperforms comparable compilers targeting GPUs.
+|
 |- style="vertical-align:top;"
 |'' D-TEC
@@ Line 68: / Line 70: @@
 | 1.7x performance improvement compared to OpenMP implementation for 2D 2nd order stencil computation.
 |- style="vertical-align:top;"
+|
 |''D-TEC
 |OpenACC
@@ Line 80: / Line 83: @@
 | - Basic kernel generation - Directives parsing - Runtime tested on NVIDIA GPUs, Intel CPUs, and Intel Xeon Phi
 | Reaches ~50 Gflops on Tesla M2070 on matrix multiply. (M2070: ~1 Tflops peak, ~200 to ~400 Gflops effective on linear algebra; all floating point).
+|
 |- style="vertical-align:top;"
 |''DSL 6
+|
 |
 |
@@ Line 95: / Line 100: @@
 |- style="vertical-align:top;"
 |''DSL 7
+|
 |
 |
@@ Line 108: / Line 114: @@
 |- style="vertical-align:top;"
 |''DSL 8
+|
 |
 |

From Modelado Foundation