April 2016 OCR Demos

Videos

This page contains videos of OCR feature demonstrations, along with the steps to follow to reproduce each demo. You can try hands-on versions of these demos by downloading the software as outlined here.

Demo videos:

OCR Portability

OCR has been developed for a variety of platforms. In this demo, we see the OCR version of the Smith-Waterman sequence alignment algorithm run on three different platforms without any changes to the source code. For the sake of simplicity and speed, a small problem size is used. To try it on x86, first navigate to the appropriate directory:

cd apps/apps/examples/smithwaterman/ocr

Next, set the WORKLOAD_ARGS environment variable appropriately:

export WORKLOAD_ARGS="10 10 ${PWD}/../../../apps/smithwaterman/datasets/string1-medium.txt ${PWD}/../../../apps/smithwaterman/datasets/string2-medium.txt ${PWD}/../../../apps/smithwaterman/datasets/score-medium.txt"

Now run the program:

OCR_TYPE=x86 make run

Note the output from this run and compare it against the output of the next run, which uses the distributed version. For that run, make sure that mpiicc is in your PATH environment variable.

OCR_TYPE=x86-mpi make run

Finally, run it on TG. For this to work, ensure that the tg git repository has been cloned and built alongside the apps and ocr directories.

OCR_TYPE=tg make run
cat ./install/tg/console.out

Verify that the output from the TG run is the same as that of the x86 and x86-mpi runs.
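
For reference, the three runs can be scripted end to end. The following is a minimal sketch, assuming the directory and WORKLOAD_ARGS setup above and that all three targets have already been built:

# Minimal sketch: run Smith-Waterman on all three targets and compare results.
# Assumes the cd and WORKLOAD_ARGS export shown above.
OCR_TYPE=x86     make run > out.x86
OCR_TYPE=x86-mpi make run > out.x86-mpi
OCR_TYPE=tg      make run
cp ./install/tg/console.out out.tg
# The alignment results should match across targets (log prefixes may differ).
diff out.x86 out.x86-mpi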

Visualization tools

Video of demo

In this demo, we see the usage of three visualization tools for OCR, as well as a specialized tracing framework. In general, the visualizations are produced by logging application output and then running a post-processing script over that log. For the sake of simplicity, this demo uses a small Fibonacci application. To try these tools:

Timeline & Flowgraph

Start by opening common.mk, the file containing all user-configurable settings.

 vi ${OCR_ROOT}/build/common.mk

Locate the following CFLAGS and enable them:

 CFLAGS += -DOCR_ENABLE_VISUALIZER -DOCR_ENABLE_EDT_NAMING
 CFLAGS += -DOCR_DEBUG_LVL=DEBUG_LVL_INFO 

Navigate to the application directory.

 cd ${APPS_ROOT}/examples/fib

Set the following environment variable.

 export ENABLE_VISUALIZER=yes

Run the application and redirect stdout to a file.

 make -f Makefile.x86 run > out

Navigate to the OCR visualizer scripts directory.

 cd ${OCR_ROOT}/scripts/Visualizer

Here you will find the Timeline and Flowgraph directories, each containing its respective post-processing script. Navigate to each directory in turn and run its script, passing it the application log file.

 python <timeline,flowgraph>.py ${APPS_ROOT}/examples/fib/out

Each script produces an HTML file containing its visualization. Copy the files to a host machine and open them in a web browser.
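
For reference, the whole flow condenses to a few commands. This is a sketch: the script names timeline.py and flowgraph.py are assumed from the notation above, and the CFLAGS above must already be enabled.

 # Sketch: generate both visualizations for fib (assumes the CFLAGS above are set).
 export ENABLE_VISUALIZER=yes
 cd ${APPS_ROOT}/examples/fib
 make -f Makefile.x86 run > out
 cd ${OCR_ROOT}/scripts/Visualizer/Timeline
 python timeline.py ${APPS_ROOT}/examples/fib/out    # emits the timeline HTML
 cd ${OCR_ROOT}/scripts/Visualizer/Flowgraph
 python flowgraph.py ${APPS_ROOT}/examples/fib/out   # emits the flowgraph HTML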

More details can be found here.

Tracing and Network Heatmap

Start by opening common.mk, the file containing all user-configurable settings.

 vi ${OCR_ROOT}/build/common.mk

Locate the following CFLAGS and enable them:

 CFLAGS += -DOCR_TRACE_BINARY

Navigate to the application directory.

 cd ${APPS_ROOT}/examples/fib

The network heatmap is designed strictly for distributed OCR, so we run the application targeting x86-mpi.

Optionally, set a node file and the desired number of nodes:

 export NODE_FILE=<path_to_node_file>
 export OCR_NUM_NODES=<optional_integer_value>
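
For example, with illustrative values (both variables are optional):

 # Illustrative values only: a hostfile listing the machines, and 4 nodes.
 export NODE_FILE=${HOME}/hosts.txt
 export OCR_NUM_NODES=4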

To invoke the tracing framework and execute the app, run the following:

 CONFIG_FLAGS=--sysworker make -f Makefile.x86-mpi run 

Trace information will be written to a binary file named trace.bin in <current_app_directory>/install/x86-mpi/.

Navigate to the trace-decoding script directory.

 cd ${OCR_ROOT}/scripts/TraceUtils

Build the executable.

 make

Run the trace decoder, passing it the trace binary, and redirect the output to a file.

 ./traceDecoder ${APPS_ROOT}/examples/fib/install/x86-mpi/trace.bin > msgs

The file "msgs" will contain decoded human-readable tracing information, and will be used as input to the network heatmap visualization.

To run the network heatmap, you will find Linux and Windows executables under ${OCR_ROOT}/scripts/networkHeatmap/executables. Copy the appropriate executable, along with the decoded trace file, to a host machine. Run the executable and follow the prompts.
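
End to end, the tracing flow looks like the following sketch (it assumes -DOCR_TRACE_BINARY is already enabled in common.mk as described above):

 # Sketch: trace a distributed fib run and decode the resulting binary.
 cd ${APPS_ROOT}/examples/fib
 CONFIG_FLAGS=--sysworker make -f Makefile.x86-mpi run
 cd ${OCR_ROOT}/scripts/TraceUtils
 make                                  # builds the traceDecoder executable
 ./traceDecoder ${APPS_ROOT}/examples/fib/install/x86-mpi/trace.bin > msgs
 # "msgs" is the input to the network heatmap executable.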

More details can be found here.

HPCG application

HPCG is a benchmark code from Jack Dongarra and Mike Heroux that implements the preconditioned conjugate gradient method on a matrix arising from a 27-point stencil on a large 3D grid. The preconditioner is four levels of multigrid with symmetric Gauss-Seidel smoothing before and after each coarser grid. The algorithm has been written in native OCR, using an OCR reduction library for the required inner products. The code was run on Edison at NERSC with weak scaling, on powers-of-two node counts from 1 to 1024; each node had 8 workers and a 64x64x64 block of the grid. The video shows a set of foils explaining the algorithm in more detail, launches the 11 batch jobs, examines some of the resulting output files, and finally shows a summary of the results to date. Scaling has improved over time, but there is still more work to do.
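As a rough illustration of the weak-scaling setup (not the actual scripts used in the video), the 11 jobs could be submitted with a loop like the one below, assuming a SLURM scheduler and a hypothetical job script hpcg.sbatch:

 # Hypothetical sketch: submit weak-scaling jobs on 1 to 1024 nodes (11 jobs).
 # Each node runs 8 OCR workers on a 64x64x64 block of the grid.
 for nodes in 1 2 4 8 16 32 64 128 256 512 1024; do
     sbatch -N ${nodes} hpcg.sbatch    # hpcg.sbatch is a placeholder job script
 done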

Legacy MPI application support with MPI-Lite

The demonstration shows MPI-Lite running unmodified MPI code on top of the Open Community Runtime (OCR). The candidate application is the ExMatEx CoMD proxy application, which uses MPI to distribute work across cluster nodes.

  • We first show the application being compiled and run through a standard MPI tool-chain on NERSC's Edison cluster.
  • We then show the application being compiled through the MPI-Lite tool-chain. The following steps are necessary to map MPI onto the OCR task-based execution model.
    • A ROSE-based source-to-source code transformation is invoked on the CoMD source code. The tool rewrites MPI API calls into the corresponding MPI-Lite calls, and extracts all global variables, rewriting them into an OCR datablock.
    • The generated code can be compiled with standard tools and linked with the MPI-Lite and OCR libraries. We then show the code running on Edison through OCR.
  • Single-node and distributed scaling results are presented, highlighting inefficiencies in the distributed scenario.
  • A scaling bottleneck is then investigated:
    • Because the distributed implementation of OCR can use MPI as a communication layer (i.e., it performs sends and receives much as one would with regular sockets), we used standard profiling tools such as the Intel Trace Analyzer and Collector (ITAC) to diagnose the issue.
    • The analysis highlighted a performance bottleneck in the MPI-Lite implementation, as well as unfairness in the way communications are handled. These issues are currently being addressed.
  • The video concludes with a summary of the aforementioned topics.

Resiliency

Resiliency in OCR is currently implemented under a few simplifying assumptions, which will be relaxed in future work:

  • The fault is detected and reported immediately to the runtime (currently this is simulated in software).
  • Only data faults are considered (core or node failures are for a future implementation).
  • The OCR program is written in single-assignment form. OCR code generated by high-level languages, such as CnC, already follows this constraint.
  • The mean time between failures is greater than the time taken for recovery.
  • The current implementation works in a single node configuration.
  • Only faults occurring in heaps used by the application are targeted by this solution. Stacks and runtime memory are exempt from recovery.

The current implementation consists of the following steps:

  • All the OCR API calls made by a task are deferred, to be executed upon the successful completion of the task.
  • At this stage, each datablock written to by the task is also backed up.
  • When a fault is detected, the runtime enters recovery mode and pauses all currently running tasks.
  • Then, all the affected tasks -- i.e., those operating on any data from a fault-affected datablock -- are restarted, this time using the contents from the last backup.
  • Once the restarted tasks complete execution, the runtime exits recovery mode and normal execution continues.

For the purposes of the demo, we inject faults using software simulation. We assume that data faults would typically be detected during a failing load or store, with the faulting address known to the runtime; this allows the runtime to map out the affected datablock and tasks. Future work in this area involves relaxing the single-assignment assumption to accommodate a wider variety of programs, and handling core and node failures in a distributed runtime.

Contacts

For any support, please contact one of: