X-ARCC
| X-ARCC: Exascale Adaptive Resource-Centric Computing with Tessellation | |
|---|---|
| Team Members | UC Berkeley, LBNL |
| PI | Steven Hofmeyr (LBNL) |
| Co-PIs | John Kubiatowicz (UCB) |
| Website | http://tessellation.cs.berkeley.edu |
| Download | |
Overview
We are exploring new approaches to Operating System (OS) design for exascale using Adaptive Resource-Centric Computing (ARCC). ARCC is based on the dynamic, adaptive allocation of resources to applications, combined with Quality-of-Service (QoS) enforcement to prevent interference between components. We have embodied ARCC in Tessellation, an OS designed for multicore nodes. In this project, our goal is to explore the potential for ARCC to address issues in exascale systems. This will require extending Tessellation with new features for multiple nodes, such as multi-node cell synchronization, distributed resource accounting, and topology-aware resource control. Rather than emphasizing component development for an exascale OS, we are focusing our efforts on high-risk, high-reward topics related to novel OS mechanisms and designs.
There are several aspects we are exploring in the context of a multi-node Tessellation:
- What OS support is needed for new global address space programming models and task-based parallel programming models? To explore this, we are porting UPC and Habanero to run on multi-node Tessellation, using GASNet as the underlying communication layer. Our test cases for these runtimes on Tessellation are a subset of the co-design proxy apps, which we use as representatives of potential exascale applications.
- How should the OS support advanced memory management, including mechanisms for user-level paging, locality-aware memory allocation and multicell shared memory?
- How do we extend hierarchical adaptive resource allocation and control across multiple nodes, including heterogeneous nodes such as the Intel MIC?
- How should the OS manage the trade-off between power and performance optimizations? Will the Tessellation approach of treating both power and other resources (cores, memory) as first-class citizens within the adaptive loop be adequate? (A minimal sketch of such an adaptive loop follows this list.)
- What OS abstractions are needed for the durable QoS-guaranteed storage that is essential to resilience?
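To make the adaptive-loop question above concrete, the following is a minimal sketch in C of how a node-level allocator might treat CPU cores and a power budget as first-class resources adjusted against per-cell QoS targets. The structure, field names, and the simple proportional policy are illustrative assumptions for this page, not Tessellation's actual interface or algorithm.

```c
/* Illustrative sketch only: an adaptive resource-allocation loop that treats
 * CPU cores and a node power budget as first-class resources, nudging each
 * cell's allocation toward its QoS target. Not Tessellation's real code. */
#include <stdio.h>

#define NUM_CELLS 3

struct cell_alloc {
    const char *name;
    int cores;            /* cores currently assigned to the cell       */
    double power_watts;   /* share of the node power budget             */
    double qos_target;    /* desired progress rate (app-defined metric) */
    double qos_measured;  /* progress reported by the cell's runtime    */
};

/* One adaptation step: grant cores to cells missing their QoS target and
 * reclaim cores from cells exceeding it, then split the power budget in
 * proportion to each cell's core count. */
static void adapt(struct cell_alloc cells[], int n,
                  int total_cores, double power_budget)
{
    int used = 0;
    for (int i = 0; i < n; i++)
        used += cells[i].cores;

    for (int i = 0; i < n; i++) {
        double err = cells[i].qos_target - cells[i].qos_measured;
        if (err > 0.05 && used < total_cores) {
            cells[i].cores++;            /* missing its target: grant a core  */
            used++;
        } else if (err < -0.05 && cells[i].cores > 1) {
            cells[i].cores--;            /* exceeding its target: reclaim one */
            used--;
        }
    }

    for (int i = 0; i < n; i++)
        cells[i].power_watts = power_budget * cells[i].cores / (double)used;
}

int main(void) {
    struct cell_alloc cells[NUM_CELLS] = {
        { "app",     8, 0.0, 1.0, 0.80 },
        { "network", 2, 0.0, 1.0, 1.10 },
        { "storage", 2, 0.0, 1.0, 0.95 },
    };
    adapt(cells, NUM_CELLS, 16, 120.0);
    for (int i = 0; i < NUM_CELLS; i++)
        printf("%-8s cores=%d power=%.1fW\n",
               cells[i].name, cells[i].cores, cells[i].power_watts);
    return 0;
}
```

In a real system this decision would be driven by measured progress and power telemetry on each adaptation interval; the toy policy here only shows how cores and power can be adjusted together inside one loop.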
Tessellation
The Tessellation kernel is a lightweight, hypervisor-like layer that provides support for ARCC. It implements cells, along with interfaces for user-level scheduling, resource adaptation, and cell composition. Since the software in cells runs entirely in user space, the kernel can enforce allocations of resources such as CPU cores and memory pages without specialized virtualization hardware; enforcing allocations of other resources, such as processor cache slices and memory bandwidth, requires additional hardware support (e.g. [akesson07, lee08memqos, sanchez11]). Tessellation is written from scratch, and the prototype runs on both Intel x86 and RAMP architectures.
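As a rough illustration of the kernel-enforced allocations described above, the sketch below shows what a cell descriptor and a core/page grant call might look like. All names and fields here are assumptions for illustration, not Tessellation's real kernel API.

```c
/* Illustrative sketch only: a minimal cell descriptor and an enforcement
 * call of the kind the kernel layer might expose. Names and fields are
 * assumptions, not Tessellation's actual interface. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t core_mask_t;              /* bitmap of hardware threads */

struct cell {
    int         id;
    core_mask_t cores;                     /* CPU cores granted to the cell */
    long        mem_pages;                 /* physical pages granted        */
    bool        time_multiplexed;          /* cores shared with other cells */
};

/* Because cell software runs entirely in user space, the kernel can enforce
 * core and page grants directly; cache-slice and memory-bandwidth limits
 * would need extra hardware support and are not modeled here. */
static void cell_grant(struct cell *c, core_mask_t cores, long pages) {
    c->cores = cores;
    c->mem_pages = pages;
}

int main(void) {
    struct cell app = { .id = 1, .time_multiplexed = false };
    cell_grant(&app, 0x00FFULL, 1L << 18); /* 8 cores, 1 GiB of 4 KiB pages */
    printf("cell %d: cores=0x%llx pages=%ld\n",
           app.id, (unsigned long long)app.cores, app.mem_pages);
    return 0;
}
```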
The Cell Model
Cells provide the basic unit of computation and protection in Tessellation. Cells are performance-isolated resource containers that export their resources to user level. The software running within each cell has full user-level control of the resources assigned to the cell, free from interference from other cells. Application programmers can customize a cell's runtime for their application domain with, for instance, a particular CPU core-scheduling algorithm or a novel page-replacement policy.
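The following sketch illustrates the kind of user-level customization described above: a cell runtime supplying its own core-scheduling policy through a hook structure. The hook interface and names are hypothetical, used only to show the idea of a pluggable policy living entirely in user space.

```c
/* A minimal sketch, assuming a hypothetical runtime hook interface, of how a
 * cell's user-level runtime could plug in its own core-scheduling policy.
 * The struct and function names are illustrative, not Tessellation's API. */
#include <stdio.h>

struct task { int id; int priority; };

/* A cell runtime supplies its own policy for picking the next task to run
 * on a core it owns; the kernel is not involved in this decision. */
struct cell_sched_ops {
    struct task *(*pick_next)(struct task *tasks, int n);
};

/* Example policy: simple highest-priority-first selection. */
static struct task *pick_highest_priority(struct task *tasks, int n) {
    struct task *best = &tasks[0];
    for (int i = 1; i < n; i++)
        if (tasks[i].priority > best->priority)
            best = &tasks[i];
    return best;
}

int main(void) {
    struct task tasks[] = { {1, 3}, {2, 7}, {3, 5} };
    struct cell_sched_ops ops = { .pick_next = pick_highest_priority };
    struct task *next = ops.pick_next(tasks, 3);
    printf("run task %d next\n", next->id);
    return 0;
}
```

A page-replacement policy could be plugged in the same way, through an analogous hook that the runtime, not the kernel, invokes.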
The performance isolation of cells is achieved through *space-time partitioning* [rushby99, tess09, lei03], a multiplexing technique that divides the hardware into a set of simultaneously resident spatial partitions. Cells can either have temporally dedicated access to their resources or be time-multiplexed with other cells, and, depending on the spatial partitioning, both time-multiplexed and non-multiplexed cells may be active simultaneously.
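A minimal sketch of space-time partitioning follows, under the assumption that a spatial partition can be represented as a core mask plus the list of cells that share it over time. The data layout is illustrative only, not the kernel's actual representation.

```c
/* Illustrative sketch of space-time partitioning: the node's hardware
 * threads are split into spatial partitions, and each partition is either
 * dedicated to one cell or time-multiplexed among several. */
#include <stdio.h>
#include <stdint.h>

struct partition {
    uint64_t core_mask;       /* hardware threads in this spatial partition */
    int      cell_ids[4];     /* cells sharing the partition over time      */
    int      ncells;          /* 1 => temporally dedicated                  */
};

int main(void) {
    struct partition parts[] = {
        { 0x000F, { 10 },     1 },   /* dedicated partition: cell 10 only   */
        { 0x00F0, { 20, 21 }, 2 },   /* time-multiplexed: cells 20 and 21   */
    };
    /* Both partitions are simultaneously resident: cell 10 always runs on
     * cores 0-3, while cells 20 and 21 alternate on cores 4-7. */
    for (int p = 0; p < 2; p++)
        for (int slot = 0; slot < 2; slot++)
            printf("slot %d, cores 0x%04llx -> cell %d\n", slot,
                   (unsigned long long)parts[p].core_mask,
                   parts[p].cell_ids[slot % parts[p].ncells]);
    return 0;
}
```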
Time multiplexing is implemented using *gang scheduling* [gangsched1982, gangschedpatent], to ensure that cells provide their hosted applications an environment similar to a dedicated machine. The kernel implements gang scheduling in a decentralized manner through a set of kernel multiplexer threads (*muxers*), one per hardware thread in the system. The muxers all implement the same scheduling algorithm and rely on a high-precision global time base to activate a cell on multiple hardware threads simultaneously, with minimal skew. In the common case the muxers do not need to communicate, since each replicates the scheduling decisions of all other relevant muxers; however, the muxers do communicate via IPI multicast in certain cases, for example when cells are created or terminated or when resource allocations change.
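The decentralized gang-scheduling idea can be illustrated with the sketch below: each muxer evaluates the same deterministic function of the shared global time base, so all muxers pick the same cell for a slot without communicating. The slot length and the round-robin policy are assumptions for illustration only.

```c
/* Sketch of decentralized gang scheduling: every muxer (one per hardware
 * thread) computes the same deterministic schedule from a shared global
 * time base, so a cell is gang-activated without inter-muxer messages. */
#include <stdio.h>
#include <stdint.h>

#define SLOT_US   1000          /* assumed scheduling-slot length (us) */
#define NUM_CELLS 3

/* Deterministic schedule: identical on every muxer given the same time. */
static int cell_for_slot(uint64_t global_time_us) {
    return (int)((global_time_us / SLOT_US) % NUM_CELLS);
}

int main(void) {
    /* Two muxers reading the same global time base reach the same decision,
     * so the chosen cell is activated on both hardware threads with only
     * clock skew between them. */
    uint64_t now_us = 123456789;
    printf("muxer on hw thread 0 activates cell %d\n", cell_for_slot(now_us));
    printf("muxer on hw thread 1 activates cell %d\n", cell_for_slot(now_us));
    return 0;
}
```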
Applications in Tessellation that span multiple cells communicate via efficient, secure *channels*, which provide fast, user-level asynchronous message passing between cells. Applications use channels to access standard OS services (e.g. network and file services) hosted in other cells. New *composite* services are constructed from OS services by wrapping a cell around existing resources and exporting a service interface. Tessellation can support QoS in this *service-oriented architecture* [soa2007] because the stable environment of a cell is easily combined with a custom user-level scheduler to provide QoS-aware access to the resulting service. With QoS-aware access providing reproducible service times, applications in cells gain better performance predictability for autotuning, without sacrificing flexibility in job placement for optimized system usage.
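To illustrate the channel abstraction, the sketch below models a user-level asynchronous channel as a single-producer, single-consumer ring buffer that two cells could map into shared memory. The layout and function names are assumptions for illustration; Tessellation's actual channel implementation is not shown here.

```c
/* Illustrative sketch of a user-level asynchronous channel between two
 * cells, modeled as a single-producer/single-consumer ring over memory the
 * cells would share. Not Tessellation's real channel code. */
#include <stdio.h>
#include <string.h>
#include <stdatomic.h>

#define RING_SLOTS 8
#define MSG_BYTES  64

struct channel {
    _Atomic unsigned head;               /* advanced by the receiving cell */
    _Atomic unsigned tail;               /* advanced by the sending cell   */
    char slots[RING_SLOTS][MSG_BYTES];
};

/* Asynchronous send: returns 0 without blocking, -1 if the ring is full. */
static int chan_send(struct channel *ch, const char *msg) {
    unsigned tail = atomic_load(&ch->tail);
    if (tail - atomic_load(&ch->head) == RING_SLOTS)
        return -1;
    strncpy(ch->slots[tail % RING_SLOTS], msg, MSG_BYTES - 1);
    atomic_store(&ch->tail, tail + 1);   /* publish the message */
    return 0;
}

static int chan_recv(struct channel *ch, char *out) {
    unsigned head = atomic_load(&ch->head);
    if (head == atomic_load(&ch->tail))
        return -1;                       /* nothing pending */
    memcpy(out, ch->slots[head % RING_SLOTS], MSG_BYTES);
    atomic_store(&ch->head, head + 1);
    return 0;
}

int main(void) {
    static struct channel ch;            /* stands in for shared memory */
    char buf[MSG_BYTES];
    chan_send(&ch, "open /data/input");  /* e.g. a request to a file-service cell */
    if (chan_recv(&ch, buf) == 0)
        printf("file-service cell received: %s\n", buf);
    return 0;
}
```

A composite service would sit behind the same kind of interface: a cell wraps existing resources, exports a service endpoint over channels, and its custom user-level scheduler can then order incoming requests to meet QoS targets.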