Scientific Libraries

From Modelado Foundation

Sonia requested that Milind Kulkarni initiate this page. For comments, please contact Milind.

XPRESS Ron Brightwell
TG Shekhar Borkar
DEGAS Katherine Yelick
D-TEC Daniel Quinlan
DynAX Guang Gao
X-TUNE Mary Hall
GVR Andrew Chien
CORVETTE Koushik Sen
SLEEC Milind Kulkarni
PIPER Martin Schulz


Describe how you expect to target (optimize/analyze) applications written using existing computational libraries.
XPRESS Libraries written in C with MPI will run on XPRESS systems using UH libraries combined with the ParalleX XPI/HPX interoperability interfaces. It is expected that future libraries, as well as important existing ones, will be developed using the new execution methods/interfaces.
TG The OCR scheduler will optimize the execution of code generated by R-Stream.
D-TEC Where appropriate, library abstractions will be provided with compiler support (typically for finer-grained abstractions at the expression or statement level). Source-to-source transformations will rewrite the code to leverage abstraction semantics, using program analysis to identify the restricted contexts in which the most efficient code can be generated. Fundamentally, libraries cannot see how their abstractions are used within an application, whereas the compiler can readily do so and use that information to generate tailored code.
DynAX We are focusing on ways to identify scalable and resilient data access and movement patterns, and express them efficiently in task-based runtimes. For computational libraries which do not already provide such semantics, alternative means must be found. (For instance, a LAPACK SVD call can be replaced with a distributed, more scalable equivalent.)
X-TUNE Work on autotuning to select among code variants could be applied to libraries that provide multiple implementations of the same computation. The key idea is to build a model for variant selection based on features of input data, and use this to make run-time selection decisions.
PIPER Support optimization efforts with tools that can capture some of the internal semantics of a given library (e.g., levels of a multigrid V-cycle or patches in an AMR library).
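X-TUNE's answer above describes building a model that selects among code variants from features of the input data. The sketch below illustrates that idea under stated assumptions: the two variants (an insertion sort and Python's built-in `sorted`), the two features (input length and the fraction of adjacent pairs already in order), and the 1-nearest-neighbour model are all hypothetical stand-ins, not X-TUNE's actual variants or model.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, float]


def insertion_sort(xs: Sequence[float]) -> List[float]:
    """Variant A: quadratic in general, but fast on short or nearly sorted input."""
    a = list(xs)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


def builtin_sort(xs: Sequence[float]) -> List[float]:
    """Variant B: Python's built-in sort."""
    return sorted(xs)


# Both variants compute the same result; only their performance differs.
VARIANTS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "insertion": insertion_sort,
    "builtin": builtin_sort,
}


def features(xs: Sequence[float]) -> Features:
    """Hypothetical input features: length and fraction of adjacent pairs in order."""
    n = len(xs)
    if n < 2:
        return (float(n), 1.0)
    in_order = sum(1 for a, b in zip(xs, xs[1:]) if a <= b)
    return (float(n), in_order / (n - 1))


def fastest_variant(xs: Sequence[float], repeats: int = 3) -> str:
    """Offline step: time every variant on one training input, return the winner."""
    best_name, best_time = "", math.inf
    for name, fn in VARIANTS.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(xs)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


class VariantSelector:
    """1-nearest-neighbour model from input features to the best variant."""

    def __init__(self) -> None:
        self.samples: List[Tuple[Features, str]] = []

    def add(self, feats: Features, variant: str) -> None:
        self.samples.append((feats, variant))

    def train(self, inputs: List[Sequence[float]]) -> None:
        for xs in inputs:
            self.add(features(xs), fastest_variant(xs))

    def select(self, xs: Sequence[float]) -> str:
        """Run-time step: pick the variant of the closest training input."""
        f = features(xs)

        def dist(g: Features) -> float:
            # Log-scale the length so it is comparable to the [0, 1] sortedness.
            return (math.log1p(f[0]) - math.log1p(g[0])) ** 2 + (f[1] - g[1]) ** 2

        _, name = min(self.samples, key=lambda s: dist(s[0]))
        return name

    def run(self, xs: Sequence[float]) -> List[float]:
        return VARIANTS[self.select(xs)](xs)
```

In use, `train` would be called offline on representative inputs and `run` at run time; whichever variant is selected, `run([3, 1, 2])` returns `[1, 2, 3]`. In CPython the built-in sort will usually win the timing, which is fine here: the point is only that selection is driven by a model over input features.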
Many computational libraries (e.g., Kokkos in Trilinos) provide support for managing data distribution and communication. Describe how your project targets applications that use such libraries.
XPRESS This issue is unresolved.
TG The OCR tuning hints framework can be used for user directed management of data and communication.
D-TEC We expect to leverage existing libraries and runtime systems (most commonly implemented as libraries) as needed. The X10 runtime system will be used, for example, to abstract communication between distributed-memory processors. Other communication libraries (e.g., MPI) are being used both to simplify the generation of code by the compiler and to leverage specific semantics that can, with program analysis, be used to rewrite application code to make it more efficient and/or leverage specific Exascale hardware features.
DynAX Such libraries often have their own system/runtime requirements. If those requirements line up with the requirements of the application, no further adaptation is necessary. Otherwise, such a library could possibly be used through some form of adaptation layer, or the algorithm could simply be ported to run directly on the required software stack. This demonstrates a need for interoperability, an area that we feel needs further exploration.
X-TUNE There is an opportunity to apply autotuning to such decisions.
PIPER PIPER will provide stack-wide instrumentation to facilitate optimization; internal information known only to the library should be exported to tools through appropriate APIs (preferably similar and interoperable ones).
If your project aims to develop new programming models, describe any plans to integrate existing computational libraries into the model, or how you will transition applications written using such libraries to your model.
XPRESS Low-level system-oriented libraries such as STDIO will be employed by the LXK and HPX systems, among others. No scientific libraries per se will be built into the systems as intrinsics below the compiler level. Over time, many libraries will be ported to the ParalleX model for dramatic improvements in efficiency and scalability.
TG R-Stream compiler
D-TEC Our research focuses more on how to build compiler support for programming models than on any specific DSL or programming model. However, specific work relative to MPI leverages MPI semantics to rewrite application code (via compiler source-to-source transformations) to better overlap communication and computation. This is one of many building blocks from which to construct DSLs that would implement numerous programming models. Other work on leveraging semantics in existing HPC code targets rewriting the code for both single and multiple GPUs per node; this work leverages several OpenMP runtime libraries.
DynAX The HTA programming model may result in a new generation of computational libraries.
What sorts of properties (semantics of computation, information about data usage, etc.) would you find useful to your project if captured by computational libraries?
XPRESS Libraries crafted in a form that eliminated global barriers, worked on globally addressed objects, and exploited message-driven computation would greatly facilitate the porting of conventional rigid models to future dynamic, adaptive, and scalable models such as the ParalleX-based methods.
TG Affinities, priorities, accuracy expectations, critical/non-critical tasks and data.
D-TEC Libraries should present simple user-level APIs with clear semantics. A relatively coarse granularity of semantics is required to keep library use from contributing to abstraction penalties. Appropriate properties for libraries include data handling specific to Adaptive Mesh Refinement, data management associated with many-core optimizations, etc. Fine-grained use of, or access to, data abstractions via libraries can be a problem for general-purpose compilers to optimize.
DynAX Wide availability / compatibility with multiple runtimes would help reduce effort. The ability to tune performance not only for a single library call, but across the application as a whole, would be beneficial.

Algorithms vary widely in their data access patterns, so for a particular algorithm some data distributions are much more suitable than others. An application developer may have full control over the data's distribution before the application calls a computational library, but has no idea what data access pattern the library uses internally; as a result, performance is lost by rearranging data unnecessarily. Some feedback from the library would help prevent that kind of performance loss.

X-TUNE Affinity information would be helpful.
PIPER Attribution information about internal data structures (e.g., data distributions, patch information for AMR) as well as phase/time slice information.
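DynAX's answer above suggests that a library give the application feedback about which data distribution suits its internal access pattern, so that data is not rearranged needlessly. The sketch below illustrates that idea under loud assumptions: the library (`RowSumLibrary`), its `preferred_distribution` query, and the `row-block`/`col-block` labels are hypothetical and do not come from any existing library, and distributed memory is simulated with a Python list of per-process blocks.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]
Blocks = List[Matrix]  # one local block per (simulated) process


@dataclass(frozen=True)
class Distribution:
    layout: str  # hypothetical labels: "row-block" or "col-block"
    nprocs: int


def distribute(m: Matrix, dist: Distribution) -> Blocks:
    """Split m into dist.nprocs contiguous blocks of rows or of columns."""
    n = len(m) if dist.layout == "row-block" else len(m[0])
    bounds = [n * p // dist.nprocs for p in range(dist.nprocs + 1)]
    if dist.layout == "row-block":
        return [m[lo:hi] for lo, hi in zip(bounds, bounds[1:])]
    return [[row[lo:hi] for row in m] for lo, hi in zip(bounds, bounds[1:])]


def gather(blocks: Blocks, dist: Distribution) -> Matrix:
    """Reassemble the full matrix (stands in for the communication step)."""
    if dist.layout == "row-block":
        return [row for block in blocks for row in block]
    nrows = len(blocks[0])
    return [[v for block in blocks for v in block[i]] for i in range(nrows)]


class RowSumLibrary:
    """Hypothetical library whose kernel reads whole rows."""

    def preferred_distribution(self, nprocs: int) -> Distribution:
        # The feedback channel: the library tells the caller what it wants.
        return Distribution("row-block", nprocs)

    def row_sums(self, blocks: Blocks, dist: Distribution) -> List[float]:
        if dist != self.preferred_distribution(dist.nprocs):
            raise ValueError("data is not in the library's preferred distribution")
        # Each process sums only the rows it owns; no data leaves its block.
        return [sum(row) for block in blocks for row in block]


def call_with_feedback(lib: RowSumLibrary, blocks: Blocks,
                       current: Distribution) -> Tuple[List[float], bool]:
    """Rearrange the data only if the library asks for a different distribution."""
    wanted = lib.preferred_distribution(current.nprocs)
    moved = current != wanted
    if moved:
        blocks = distribute(gather(blocks, current), wanted)
    return lib.row_sums(blocks, wanted), moved
```

A caller that already holds row blocks triggers no redistribution (`moved` is `False`), while one holding column blocks pays for a single redistribution before the call; without the `preferred_distribution` query, the caller would have to guess.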