HiHAT SW Stack: Difference between revisions

Revision as of 20:19, February 3, 2017

This page is dedicated to describing possible components of the HiHAT SW Stack.

First, let's outline the general approach of HiHAT:

Create low-level abstractions that expose the goodness of HW platforms
- A bare-bones, minimal "common layer" that is as thin, light and low-overhead as it can possibly be, e.g. it makes almost no decisions and does almost no look-ups. This layer may not be very human usable.
- A richer, more usable "user layer" that may have more overheads, that is layered on top of the common layer
Push functionality that is common across HW platforms above the common and user layers. Focus on offering building blocks, services and transforms that would be used by various runtimes that are built on top of them, rather than on trying to get everyone to unify on some single such runtime, which most agree is futile.
The primary focus for this effort is on supporting tasking runtimes, but the platform-retargetable layers may be relevant to runtimes that have nothing to do with tasking. We are encouraged to tag contributions in this effort as to whether they are applicable beyond just tasking.
The scope of the HiHAT effort spans on-node and cross-node management. We are encouraged to tag issues as pertaining to one of {on-node only, cross-node only, or both} where appropriate.
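
To make the split between the two layers concrete, here is a minimal sketch. It assumes a toy in-memory copy operation; the names `Device`, `DEVICES`, `common_copy` and `user_copy` are hypothetical and not part of any agreed HiHAT interface. The common-layer call takes fully resolved arguments and decides nothing, while the user-layer call does the look-ups and chunking on the caller's behalf.

```python
from dataclasses import dataclass


# Hypothetical names throughout; nothing here is an agreed HiHAT API.
@dataclass(frozen=True)
class Device:
    name: str
    max_chunk: int  # largest transfer (in bytes) issued in one common-layer call


def common_copy(dst: bytearray, dst_off: int, src: bytes, src_off: int, nbytes: int) -> None:
    """Common layer: the caller resolves every argument.

    No defaults, no look-ups, no policy decisions, no validation.
    """
    dst[dst_off:dst_off + nbytes] = src[src_off:src_off + nbytes]


# Table consulted only by the user layer.
DEVICES = {
    "cpu0": Device("cpu0", max_chunk=4),
    "gpu0": Device("gpu0", max_chunk=8),
}


def user_copy(dst: bytearray, src: bytes, device: str = "cpu0") -> int:
    """User layer: looks up the device, validates sizes and chunks the transfer.

    Returns the number of common-layer calls that were issued.
    """
    dev = DEVICES[device]  # look-up happens here, not in the common layer
    if len(src) > len(dst):
        raise ValueError("destination is smaller than source")
    calls = 0
    for off in range(0, len(src), dev.max_chunk):
        nbytes = min(dev.max_chunk, len(src) - off)
        common_copy(dst, off, src, off, nbytes)
        calls += 1
    return calls
```

The extra overhead of the user layer (a dictionary look-up, a size check, a loop) is exactly what the common layer leaves out, so a runtime that already knows its chunk sizes can call `common_copy` directly.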

Next, let's consider what functionality may be platform specific, and so would want to be abstracted by the user and common layers, and how that would be nested. This functionality can be grouped under the heading of actions, of which there are four kinds:

Compute: map work to underlying computing resources.
- Ex: OpenMP, TBB, QThreads, Argobots
Data movement: move data from a source to a sink, where these may be memory of different kinds, different layers, different NUMA domains, or different address domains.
1. Local
  - Direct memory writes and memcpy-like APIs
  - DMAs
2. Remote
  - High-level interfaces like MPI, UPC, *SHMEM
  - Mid- and lower-level interfaces like UCX, libfabrics
Data management: operations on memory include allocate, free, pin, materialize, and annotate with various metadata properties
- May include allocation from one or more pools, e.g. different pools for different memory kinds
- May include use of standard libraries like libnuma, or proprietary drivers
Synchronization: provide completion handles on actions, as well as new actions that combine other completion conditions and/or induce other kinds of dependences
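
The four kinds of actions and their completion handles can be sketched as follows. This is an illustration under stated assumptions, not a proposed interface: it uses Python's standard `concurrent.futures` as a stand-in for a real backend, and `ActionKind`, `Action`, `submit` and `when_all` are hypothetical names. Each action returns a handle, an action may depend on the handles of earlier actions, and a synchronization action combines several completion conditions into one.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    COMPUTE = "compute"
    DATA_MOVEMENT = "data_movement"
    DATA_MANAGEMENT = "data_management"
    SYNCHRONIZATION = "synchronization"


@dataclass
class Action:
    kind: ActionKind
    handle: Future  # completion handle for this action


def submit(pool, kind, fn, *args, deps=()):
    """Launch fn(*args) once every action in deps has completed.

    Assumes actions are submitted in dependence order, so a waiting
    action never blocks on one that is queued behind it.
    """
    def run():
        for dep in deps:
            dep.handle.result()  # wait for, and surface errors from, each dependence
        return fn(*args)
    return Action(kind, pool.submit(run))


def when_all(pool, actions):
    """A synchronization action that completes when all given actions have."""
    return submit(pool, ActionKind.SYNCHRONIZATION, lambda: None, deps=actions)


def copy_into(dst, src):
    dst[:] = src


def demo():
    src = [1, 2, 3, 4]
    dst = [0] * 4
    with ThreadPoolExecutor(max_workers=2) as pool:
        move = submit(pool, ActionKind.DATA_MOVEMENT, copy_into, dst, src)
        total = submit(pool, ActionKind.COMPUTE, sum, dst, deps=[move])
        done = when_all(pool, [move, total])
        done.handle.result()  # block on the combined completion condition
        return total.handle.result()
```

Here the compute action only starts after the data movement it depends on has finished, and `when_all` is itself just another action whose handle can be waited on or used as a dependence.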

We'll need to agree on a layering architecture, and on interfaces for each of these.
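
As one possible starting point for such an interface discussion, here is a sketch of the data management actions listed above, with a separate pool per memory kind. Everything in it is an assumption made for illustration: the class names `Buffer`, `MemoryPool` and `DataManager`, the memory kinds `"ddr"` and `"hbm"`, and the choice to model pinning as a flag. A real implementation would call into libnuma or a vendor driver where this sketch only does bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    kind: str                 # memory kind the buffer was allocated from
    data: bytearray
    pinned: bool = False
    metadata: dict = field(default_factory=dict)


class MemoryPool:
    """Fixed-capacity pool for a single memory kind (bookkeeping only)."""

    def __init__(self, kind: str, capacity: int):
        self.kind = kind
        self.capacity = capacity
        self.in_use = 0

    def allocate(self, nbytes: int) -> Buffer:
        if self.in_use + nbytes > self.capacity:
            raise MemoryError(f"pool '{self.kind}' exhausted")
        self.in_use += nbytes
        return Buffer(self.kind, bytearray(nbytes))

    def free(self, buf: Buffer) -> None:
        if buf.kind != self.kind:
            raise ValueError("buffer does not belong to this pool")
        self.in_use -= len(buf.data)


class DataManager:
    """Routes allocate/free/pin/annotate requests to the pool for each kind."""

    def __init__(self, pools):
        self.pools = {pool.kind: pool for pool in pools}

    def allocate(self, kind: str, nbytes: int, **metadata) -> Buffer:
        buf = self.pools[kind].allocate(nbytes)
        buf.metadata.update(metadata)  # annotate with caller-supplied properties
        return buf

    def free(self, buf: Buffer) -> None:
        self.pools[buf.kind].free(buf)

    def pin(self, buf: Buffer) -> None:
        # Records intent only; a real backend would lock the pages here.
        buf.pinned = True
```

A caller might create `DataManager([MemoryPool("ddr", 64), MemoryPool("hbm", 16)])` and request `allocate("hbm", 16, numa_node=1)`; the `numa_node` annotation is an arbitrary example of metadata, not a defined property.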

There may be additional platform-specific functionalities of interest:

- Fast queues (Carter Edwards)

There may also be related components that are not part of this hierarchy, e.g. because one or more components in this stack interface with them:

- Resource management (Stephen Olivier)

We will also want to begin fleshing out what each target would need beneath the thin common layer, and what the key aspects and components of the glue code between that common layer and the target would be.
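
One way to picture the glue code is as a small per-target object that implements the handful of entry points the common layer forwards to. The sketch below assumes three such entry points (`alloc`, `copy`, `invoke`) purely for illustration; the names `Target`, `HostTarget` and the `common_*` functions are hypothetical, and the real set of entry points is exactly what remains to be agreed.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical glue interface that each HW target would implement."""

    @abstractmethod
    def alloc(self, nbytes: int) -> bytearray: ...

    @abstractmethod
    def copy(self, dst: bytearray, src: bytes, nbytes: int) -> None: ...

    @abstractmethod
    def invoke(self, fn, *args): ...


class HostTarget(Target):
    """Glue for a plain host CPU, standing in for a real platform backend."""

    def alloc(self, nbytes: int) -> bytearray:
        return bytearray(nbytes)

    def copy(self, dst: bytearray, src: bytes, nbytes: int) -> None:
        if nbytes > len(dst) or nbytes > len(src):
            raise ValueError("copy exceeds buffer bounds")
        dst[:nbytes] = src[:nbytes]

    def invoke(self, fn, *args):
        return fn(*args)


# Common layer: the caller passes the target object directly, so these
# functions forward the call without any look-up or decision of their own.
def common_alloc(target: Target, nbytes: int) -> bytearray:
    return target.alloc(nbytes)


def common_copy(target: Target, dst: bytearray, src: bytes, nbytes: int) -> None:
    target.copy(dst, src, nbytes)


def common_invoke(target: Target, fn, *args):
    return target.invoke(fn, *args)
```

Supporting a new platform would then mean writing one more `Target` subclass, while everything above the common layer stays unchanged.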

The diagram below suggests one possible arrangement.

 [[File:HiHAT_Diagram.jpg|center|600px|caption="Possible HiHAT architecture"]]

From Modelado Foundation
