Scientific Libraries

Sonia requested that Milind Kulkarni initiate this page. For comments, please contact Milind.

Projects and principal investigators: XPRESS (Ron Brightwell), TG X-Stack (Shekhar Borkar), DEGAS (Katherine Yelick), D-TEC (Daniel Quinlan), DynAX (Guang Gao), X-TUNE (Mary Hall), GVR (Andrew Chien), CORVETTE (Koushik Sen), SLEEC (Milind Kulkarni), PIPER (Martin Schulz). Responses to each question are listed by project below.
Question 1: Describe how you expect to target (optimize/analyze) applications written using existing computational libraries.

XPRESS: Libraries written in C with MPI will run on XPRESS systems using the UH libraries combined with the ParalleX XPI/HPX interoperability interfaces. It is expected that future or important libraries will be developed employing new execution methods and interfaces.

TG: The OCR scheduler will optimize execution of code generated by R-Stream.

D-TEC: Where appropriate, library abstractions will be provided with compiler support (typically for finer-granularity abstractions at the expression or statement level). Source-to-source transformations will rewrite the code to leverage the abstraction semantics, with program analysis used to identify the restricted contexts that support generation of the most efficient code. Fundamentally, libraries cannot see how their abstractions are used within an application, whereas the compiler can do so readily and can use that information to generate tailored code.

DynAX: We are focusing on ways to identify scalable and resilient data access and movement patterns and express them efficiently in task-based runtimes. For computational libraries that do not already provide such semantics, alternative means must be found; for instance, a LAPACK SVD call can be replaced with a distributed, more scalable equivalent (see the sketch following this question's responses).

SLEEC: Support optimization efforts with tools that can capture some of the internal semantics of a given library (e.g., levels of a multigrid V-cycle, or patches in an AMR library).

DEGAS, X-TUNE, GVR, CORVETTE, PIPER: no response provided.
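A minimal sketch of the kind of call the DynAX answer refers to, using the standard LAPACKE C interface; the matrix and parameters are invented for illustration and this is not code from any of the projects. A tool or runtime following that approach might swap this single-node LAPACKE_dgesvd call for a distributed equivalent such as ScaLAPACK's pdgesvd once the problem outgrows one node.

    // Serial SVD of a 2x3 matrix via LAPACKE; illustrative only.
    #include <cstdio>
    #include <lapacke.h>

    int main() {
        double a[6] = { 1, 2, 3, 4, 5, 6 };   // 2x3 matrix, row-major
        double s[2], u[4], vt[9], superb[1];  // min(m,n) = 2 singular values
        lapack_int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A',
                                         2, 3, a, 3, s, u, 2, vt, 3, superb);
        if (info != 0) {
            std::fprintf(stderr, "SVD failed: %d\n", (int)info);
            return 1;
        }
        std::printf("singular values: %g %g\n", s[0], s[1]);
        return 0;
    }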
Question 2: Many computational libraries (e.g., Kokkos in Trilinos) provide support for managing data distribution and communication. Describe how your project targets applications that use such libraries. (A brief Kokkos sketch follows the responses to this question.)

XPRESS: This issue is unresolved.

TG: The OCR tuning-hints framework can be used for user-directed management of data and communication.

D-TEC: We expect to leverage existing libraries and runtime systems (most commonly implemented as libraries) as needed. The X10 runtime system will be used, for example, to abstract communication between distributed-memory processors. Other communication libraries (e.g., MPI) are being used both to simplify the generation of code by the compiler and to leverage specific semantics that can, with program analysis, be used to rewrite application code to make it more efficient and/or exploit specific exascale hardware features.

DynAX: Such libraries often have their own system/runtime requirements. If those requirements line up with the requirements of the application, no further adaptation is necessary. Otherwise, such a library could be used through some form of adaptation layer, or the algorithm could simply be ported to run directly on the necessary software stack. This demonstrates a need for interoperability, which we feel is an area that needs to be explored further.

SLEEC: N/A

PIPER: PIPER will provide stack-wide instrumentation to facilitate optimization; access to internal information known only to the library should be exported to tools through appropriate APIs (preferably through similar and interoperable APIs).

DEGAS, X-TUNE, GVR, CORVETTE: no response provided.
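For readers unfamiliar with the kind of library the question names, the following is a generic Kokkos sketch (ordinary Kokkos usage, not code from any of the projects): the View owns its allocation, and its layout and memory space are decided by the library rather than by the application, which is precisely the data-management behavior the projects above have to interoperate with.

    #include <Kokkos_Core.hpp>
    #include <cstdio>

    int main(int argc, char* argv[]) {
        Kokkos::initialize(argc, argv);
        {
            // Allocation, layout, and memory space are managed by Kokkos.
            Kokkos::View<double*> x("x", 1000);
            Kokkos::parallel_for("fill", 1000, KOKKOS_LAMBDA(const int i) {
                x(i) = 2.0 * i;
            });
            double sum = 0.0;
            Kokkos::parallel_reduce("sum", 1000, KOKKOS_LAMBDA(const int i, double& acc) {
                acc += x(i);
            }, sum);
            std::printf("sum = %g\n", sum);
        }
        Kokkos::finalize();
        return 0;
    }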
Question 3: If your project aims to develop new programming models, describe any plans to integrate existing computational libraries into the model, or how you will transition applications written using such libraries to your model.

XPRESS: Low-level, system-oriented libraries such as STDIO will be employed by the LXK and HPX systems, among others. No scientific libraries per se will be built into the systems as intrinsics below the compiler level. Over time, many libraries will be ported to the ParalleX model for dramatic improvements in efficiency and scalability.

TG: R-Stream compiler.

D-TEC: Our research addresses how to build compiler support for programming models rather than focusing on a specific DSL or programming model. However, specific work relative to MPI leverages MPI semantics to rewrite application code (via compiler source-to-source transformations) to better overlap communication and computation (see the sketch following this question's responses). This is one of many building blocks from which to construct DSLs that would implement numerous programming models. Other work on leveraging semantics in existing HPC code targets rewriting the code for both single and multiple GPUs per node; this work leverages several OpenMP runtime libraries.

DynAX: The HTA programming model may result in a new generation of computational libraries.

SLEEC: N/A

PIPER: N/A

DEGAS, X-TUNE, GVR, CORVETTE: no response provided.
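A minimal sketch of the overlap pattern the D-TEC answer describes, assuming plain MPI; the ring exchange and buffer sizes are invented for illustration, and this is not D-TEC output. The rewrite from blocking sends and receives to this form is legal only when analysis shows the local work does not touch the message buffers, which is the kind of restricted context a semantics-aware transformation has to establish.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        std::vector<double> halo_out(1024, rank), halo_in(1024, 0.0);
        std::vector<double> interior(1 << 20, 1.0);
        int right = (rank + 1) % size, left = (rank + size - 1) % size;

        // Post the exchange first (instead of blocking MPI_Send/MPI_Recv) ...
        MPI_Request reqs[2];
        MPI_Irecv(halo_in.data(), 1024, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(halo_out.data(), 1024, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        // ... overlap it with work that does not depend on the halo ...
        for (double& v : interior) v *= 0.5;

        // ... and wait only where the halo data is actually needed.
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }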
Question 4: What sorts of properties (semantics of computation, information about data usage, etc.) would you find useful to your project if captured by computational libraries?

XPRESS: Libraries crafted in a form that eliminates global barriers, works on globally addressed objects, and exploits message-driven computation would greatly facilitate the porting of conventional rigid models to future dynamic, adaptive, and scalable models such as the ParalleX-based methods.

TG: Affinities, priorities, accuracy expectations, and critical/non-critical tasks and data.

D-TEC: Libraries should present simple user-level APIs with clear semantics. A relatively coarse granularity of semantics is required to keep library use from contributing to abstraction penalties. Appropriate properties for libraries include data handling specific to adaptive mesh refinement (AMR), data management associated with many-core optimizations, etc. Actual use of, or fine-grained access to, data abstractions via libraries can be a problem for general-purpose compilers to optimize.

DynAX: Wide availability and compatibility with multiple runtimes would help reduce effort. The ability to tune performance not only for a single library call but across the application as a whole would be beneficial. Algorithms vary widely in their data access patterns, which means that, for a particular algorithm, some data distributions are much more suitable than others. An application developer may have full control of the data's distribution before the application calls the computational library, but has no idea what data access pattern the library uses internally; performance is therefore lost by rearranging data unnecessarily. Some feedback from the library would help prevent that kind of performance loss (see the sketch following these responses).

SLEEC: N/A

PIPER: Attribution information about internal data structures (e.g., data distributions, patch information for AMR), as well as phase/time-slice information.

DEGAS, X-TUNE, GVR, CORVETTE: no response provided.
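A purely hypothetical sketch of the kind of library feedback the DynAX answer asks for; the names (Layout, PreferredDistribution, solver_preferred_distribution) are invented for illustration and do not correspond to any existing library API. The point is only that a cheap query exported next to each solver entry point would let the caller skip redistribution when the layouts already match.

    #include <cstdio>
    #include <cstddef>

    // Invented types: what a library might report about its internal access pattern.
    enum Layout { ROW_MAJOR, COL_MAJOR, BLOCK_CYCLIC_2D };
    struct PreferredDistribution { Layout layout; std::size_t block_rows, block_cols; };

    // Stub standing in for the library's side of the query.
    static PreferredDistribution solver_preferred_distribution(std::size_t, std::size_t) {
        return { BLOCK_CYCLIC_2D, 64, 64 };
    }

    int main() {
        PreferredDistribution want = solver_preferred_distribution(10000, 10000);
        Layout have = ROW_MAJOR;   // how the application currently holds the matrix
        if (have != want.layout) {
            // Only pay the redistribution cost when the layouts actually differ.
            std::printf("redistribute to %zux%zu block-cyclic before calling the solver\n",
                        want.block_rows, want.block_cols);
        }
        return 0;
    }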