Performance Tools: Difference between revisions

From Modelado Foundation

imported>Rbbrigh
imported>Jsstone1
Line 12: Line 12:
! style="width: 200;" | SLEEC
! style="width: 200;" | SLEEC
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What abstractions does your runtime stack use for parallelism?  
|'''What abstractions does your runtime stack use for parallelism?'''
|
|
|
|EDTs and data blocks.
|
|
|
|
Line 23: Line 23:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What performance information would application developers need to know to tune codes that use your X-Stack project's software?
|'''What performance information would application developers need to know to tune codes that use your X-Stack project's software?'''
|''The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.''
|The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.
|
|Because the goal is separation of concerns, the application developer does not need to know any platform-specific information. Developers do need to provide hints that describe the software, using the appropriate runtime APIs; the runtime uses these hints to guide resource management.
|
|
|
|
Line 34: Line 34:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What would a systems software developer need to know to tune the performance of your software stack?
|'''What would a systems software developer need to know to tune the performance of your software stack?'''
|''The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.''
|The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.
|
|The runtime exposes its resource management modules (introspection, allocator, scheduler) through well-defined internal interfaces, so the systems software developer can replace or tweak these modules to target the underlying platform.
|
|
|
|
Line 45: Line 45:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What information should a performance tool gather from each level in your software stack?  
|'''What information should a performance tool gather from each level in your software stack?'''
|''Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.''
|Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.
|
|At the application layer: application profiling. At the runtime layer: resource management decisions and runtime overheads. At the simulation layer: detailed resource usage, including monitoring exposed by hardware.
|
|
|
|
Line 56: Line 56:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What performance information can/does each level of your software stack maintain for inspection by performance tools?  
|'''What performance information can/does each level of your software stack maintain for inspection by performance tools?'''
|''Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.''
|Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.
|
|Please see above.
|
|
|
|
Line 67: Line 67:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What information would your software stack need to maintain in order to measure per-thread or per-task performance? Can this information be accessed safely from a signal handler?  Could a performance tool register its own tasks to monitor the performance of the runtime?
|'''What information would your software stack need to maintain in order to measure per-thread or per-task performance? Can this information be accessed safely from a signal handler?  Could a performance tool register its own tasks to monitor the performance of the runtime?'''
|
|
|
|Currently, the runtime maintains this information at varying degrees of granularity, depending on the developer's choice (from instruction and byte counts all the way to task statistics), and the information is available for offline analysis. Future work will allow a portion of this analysis to be performed online, so that custom performance tool tasks can be accommodated.
|
|
|
|
Line 78: Line 78:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What types of performance problems do you want tools to measure and diagnose? CPU resource consumption? CPU utilization? Network bandwidth? Network latency? Contention for shared resources? Waste? Inefficiency? Insufficient parallelism? Load-imbalance? Task dependences? Idleness? Data movement costs? Power or energy consumption? Failures and failure handling costs? The overhead of resilience mechanisms? I/O bandwidth consumed? I/O latency?
|'''What types of performance problems do you want tools to measure and diagnose? CPU resource consumption? CPU utilization? Network bandwidth? Network latency? Contention for shared resources? Waste? Inefficiency? Insufficient parallelism? Load-imbalance? Task dependences? Idleness? Data movement costs? Power or energy consumption? Failures and failure handling costs? The overhead of resilience mechanisms? I/O bandwidth consumed? I/O latency?'''
|''All of the above and more.''
|All of the above and more.
|
|All of the above, with the exception of I/O. In addition: runtime overheads at module-level granularity, memory use at different levels of the hierarchy, temperature and reaction time, and DVFS and its effects.
|
|
|
|
Line 89: Line 89:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What kinds of performance problems do you foresee analyzing using post-mortem analysis?
|'''What kinds of performance problems do you foresee analyzing using post-mortem analysis?'''
|''Post-mortem information would be useful for analyzing non-causal behavioral data that cannot be predicted prior to execution. The analysis must also differentiate information that is entirely data dependent, and therefore likely to change, from information that is an intrinsic property of the program. A determination of the critical-path and side-path tasks, combined with the energy and time consumption of each task, would be very useful.''
|Post-mortem information would be useful for analyzing non-causal behavioral data that cannot be predicted prior to execution. The analysis must also differentiate information that is entirely data dependent, and therefore likely to change, from information that is an intrinsic property of the program. A determination of the critical-path and side-path tasks, combined with the energy and time consumption of each task, would be very useful.
|
|The primary problems diagnosed this way will be resource management decisions and whether hints supplied by the program or compiler are correctly internalized in decision making. Runtime overheads will also be tracked closely.
|
|
|
|
Line 100: Line 100:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What kinds of performance problems do you foresee analyzing using runtime analysis? What interfaces will be needed to gather the necessary information?  
|'''What kinds of performance problems do you foresee analyzing using runtime analysis? What interfaces will be needed to gather the necessary information?'''
|''The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.''
|The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.
|
|DVFS decisions made by the runtime, and their impacts, will be analyzed using runtime analysis.
|
|
|
|
Line 111: Line 111:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What control interfaces will be necessary to enable runtime adaptation based on runtime performance measurements?
|'''What control interfaces will be necessary to enable runtime adaptation based on runtime performance measurements?'''
|''The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.''
|The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.
|
|This is ongoing work, with an emphasis on scalability (since the metrics will produce huge volumes of data that will be hard to manage). Currently, statistical properties of the metrics are being considered as proxies for various underlying causes. The interfaces are predominantly those exposed by hardware to the runtime (via counters), plus minimal interfaces provided by the runtime to the resource management modules.
|
|
|
|
Line 122: Line 122:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| There is a gap between the application-level and implementation-level views of programming languages and DSLs. What information should your software layers (compiler and runtime system) provide to attribute implementation-level performance measurement data to an application-level view?
|'''There is a gap between the application-level and implementation-level views of programming languages and DSLs. What information should your software layers (compiler and runtime system) provide to attribute implementation-level performance measurement data to an application-level view?'''
|''For purposes of performance portability, the principal information required at the programmer level is parallelism and some relative locality information. Some higher-level idiomatic patterns of control and access may also be useful, but these have yet to be determined.''
|For purposes of performance portability, the principal information required at the programmer level is parallelism and some relative locality information. Some higher-level idiomatic patterns of control and access may also be useful, but these have yet to be determined.
|
|The runtime provides implementation-level performance information at the runtime API level. Source transformations from the high-level application to the runtime API would also need to provide mechanisms to reverse-map the runtime-provided, implementation-level information back to the high-level application. Currently, implementation-level details can still be mapped back to the application design with a basic level of familiarity with the transformation tools.
|
|
|
|
Line 133: Line 133:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| What kind of visualization and presentation support do you want from performance tools?  Do you envision any IDE integration for performance tools?
|'''What kind of visualization and presentation support do you want from performance tools?  Do you envision any IDE integration for performance tools?'''
|''Visualization of resource usage and pending (bottlenecked) work will help reveal intrinsic code parallelism and precedence constraints.''
|Visualization of resource usage and pending (bottlenecked) work will help reveal intrinsic code parallelism and precedence constraints.
|
|Some high-level transformation tools already have a graphical representation of the program abstractions. We also have a graphical representation of data movement and energy consumption at the simulator level. These will be enhanced to accommodate the other performance metrics currently being tracked. IDE integration has not been a focus so far, but will be considered once the toolchain matures.
|
|
|
|
Line 144: Line 144:
|
|
|- style="vertical-align:top;"
|- style="vertical-align:top;"
| List the performance challenges that you think next generation programming languages and models will face.
|'''List the performance challenges that you think next generation programming languages and models will face.'''
|''Overhead and its impact on granularity, diversity of forms and scales of parallelism, parallelism discovery from metadata, energy suppression.''
|Overhead and its impact on granularity, diversity of forms and scales of parallelism, parallelism discovery from metadata, energy suppression.
|
|Based on our choice of the EDT (Event Driven Tasks) model, a primary challenge will be to ensure that resource management overheads do not nullify the gains from the extra parallelism the model enables. We plan to address this by settling on the right task length and data block sizes, so that overheads are kept low and the right balance between parallelism and management overhead is struck.
|
|
|
|

Revision as of 04:04, May 12, 2014

{|
! QUESTIONS
! style="width: 200;" | XPRESS
! style="width: 200;" | TG X-Stack
! style="width: 200;" | DEGAS
! style="width: 200;" | D-TEC
! style="width: 200;" | DynAX
! style="width: 200;" | X-TUNE
! style="width: 200;" | GVR
! style="width: 200;" | CORVETTE
! style="width: 200;" | SLEEC
|- style="vertical-align:top;"
|'''What abstractions does your runtime stack use for parallelism?'''
|
|EDTs and data blocks.
| || || || || || ||
|- style="vertical-align:top;"
|'''What performance information would application developers need to know to tune codes that use your X-Stack project's software?'''
|The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.
|Because the goal is separation of concerns, the application developer does not need to know any platform-specific information. Developers do need to provide hints that describe the software, using the appropriate runtime APIs; the runtime uses these hints to guide resource management.
| || || || || || ||
|- style="vertical-align:top;"
|'''What would a systems software developer need to know to tune the performance of your software stack?'''
|The critical questions concern task granularity, given the overhead costs of managing threads, and the relative locality of execution and data objects, although both can be addressed in part by compiler and runtime functions.
|The runtime exposes its resource management modules (introspection, allocator, scheduler) through well-defined internal interfaces, so the systems software developer can replace or tweak these modules to target the underlying platform.
| || || || || || ||
|- style="vertical-align:top;"
|'''What information should a performance tool gather from each level in your software stack?'''
|Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.
|At the application layer: application profiling. At the runtime layer: resource management decisions and runtime overheads. At the simulation layer: detailed resource usage, including monitoring exposed by hardware.
| || || || || || ||
|- style="vertical-align:top;"
|'''What performance information can/does each level of your software stack maintain for inspection by performance tools?'''
|Performance information is gathered by the APEX runtime introspection data gathering and analysis tool and the RCR low-level system operation data gathering tool. With HPX runtime system policies this information is used to dynamically and adaptively guide resource allocation and task scheduling.
|Please see above.
| || || || || || ||
|- style="vertical-align:top;"
|'''What information would your software stack need to maintain in order to measure per-thread or per-task performance? Can this information be accessed safely from a signal handler? Could a performance tool register its own tasks to monitor the performance of the runtime?'''
|
|Currently, the runtime maintains this information at varying degrees of granularity, depending on the developer's choice (from instruction and byte counts all the way to task statistics), and the information is available for offline analysis. Future work will allow a portion of this analysis to be performed online, so that custom performance tool tasks can be accommodated.
| || || || || || ||
|- style="vertical-align:top;"
|'''What types of performance problems do you want tools to measure and diagnose? CPU resource consumption? CPU utilization? Network bandwidth? Network latency? Contention for shared resources? Waste? Inefficiency? Insufficient parallelism? Load-imbalance? Task dependences? Idleness? Data movement costs? Power or energy consumption? Failures and failure handling costs? The overhead of resilience mechanisms? I/O bandwidth consumed? I/O latency?'''
|All of the above and more.
|All of the above, with the exception of I/O. In addition: runtime overheads at module-level granularity, memory use at different levels of the hierarchy, temperature and reaction time, and DVFS and its effects.
| || || || || || ||
|- style="vertical-align:top;"
|'''What kinds of performance problems do you foresee analyzing using post-mortem analysis?'''
|Post-mortem information would be useful for analyzing non-causal behavioral data that cannot be predicted prior to execution. The analysis must also differentiate information that is entirely data dependent, and therefore likely to change, from information that is an intrinsic property of the program. A determination of the critical-path and side-path tasks, combined with the energy and time consumption of each task, would be very useful.
|The primary problems diagnosed this way will be resource management decisions and whether hints supplied by the program or compiler are correctly internalized in decision making. Runtime overheads will also be tracked closely.
| || || || || || ||
|- style="vertical-align:top;"
|'''What kinds of performance problems do you foresee analyzing using runtime analysis? What interfaces will be needed to gather the necessary information?'''
|The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.
|DVFS decisions made by the runtime, and their impacts, will be analyzed using runtime analysis.
| || || || || || ||
|- style="vertical-align:top;"
|'''What control interfaces will be necessary to enable runtime adaptation based on runtime performance measurements?'''
|The challenge is to prioritize critical and sub-critical tasks for execution, filling in with side-path threads as resources become available. Governing parallelism through throttling is important to avoid jamming the system, so usage monitoring is crucial. The XPRESS APEX runtime subsystem performs these and other services, with additional support from the RCR RIOS subsystem.
|This is ongoing work, with an emphasis on scalability (since the metrics will produce huge volumes of data that will be hard to manage). Currently, statistical properties of the metrics are being considered as proxies for various underlying causes. The interfaces are predominantly those exposed by hardware to the runtime (via counters), plus minimal interfaces provided by the runtime to the resource management modules.
| || || || || || ||
|- style="vertical-align:top;"
|'''There is a gap between the application-level and implementation-level views of programming languages and DSLs. What information should your software layers (compiler and runtime system) provide to attribute implementation-level performance measurement data to an application-level view?'''
|For purposes of performance portability, the principal information required at the programmer level is parallelism and some relative locality information. Some higher-level idiomatic patterns of control and access may also be useful, but these have yet to be determined.
|The runtime provides implementation-level performance information at the runtime API level. Source transformations from the high-level application to the runtime API would also need to provide mechanisms to reverse-map the runtime-provided, implementation-level information back to the high-level application. Currently, implementation-level details can still be mapped back to the application design with a basic level of familiarity with the transformation tools.
| || || || || || ||
|- style="vertical-align:top;"
|'''What kind of visualization and presentation support do you want from performance tools? Do you envision any IDE integration for performance tools?'''
|Visualization of resource usage and pending (bottlenecked) work will help reveal intrinsic code parallelism and precedence constraints.
|Some high-level transformation tools already have a graphical representation of the program abstractions. We also have a graphical representation of data movement and energy consumption at the simulator level. These will be enhanced to accommodate the other performance metrics currently being tracked. IDE integration has not been a focus so far, but will be considered once the toolchain matures.
| || || || || || ||
|- style="vertical-align:top;"
|'''List the performance challenges that you think next generation programming languages and models will face.'''
|Overhead and its impact on granularity, diversity of forms and scales of parallelism, parallelism discovery from metadata, energy suppression.
|Based on our choice of the EDT (Event Driven Tasks) model, a primary challenge will be to ensure that resource management overheads do not nullify the gains from the extra parallelism the model enables. We plan to address this by settling on the right task length and data block sizes, so that overheads are kept low and the right balance between parallelism and management overhead is struck.
| || || || || || ||
|}
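
The trade-off between task granularity and management overhead raised in the last row can be illustrated with a toy cost model. This is only a sketch under stated assumptions: every task pays the same fixed overhead, tasks are independent, and workers execute them in synchronized waves. The function names and parameters (`estimated_runtime`, `efficiency`, `best_grain`) are hypothetical and are not part of any X-Stack runtime API.

```python
import math


def estimated_runtime(total_work, grain, overhead, workers):
    """Toy makespan: work is cut into tasks of `grain` units, each task
    pays a fixed `overhead`, and tasks run in waves across `workers`."""
    tasks = math.ceil(total_work / grain)
    waves = math.ceil(tasks / workers)
    return waves * (grain + overhead)


def efficiency(total_work, grain, overhead, workers):
    """Ratio of the ideal runtime (no overhead, perfect balance) to the
    modeled runtime; 1.0 would be perfect."""
    ideal = total_work / workers
    return ideal / estimated_runtime(total_work, grain, overhead, workers)


def best_grain(total_work, overhead, workers, candidates):
    """Pick the candidate grain size with the smallest modeled runtime."""
    return min(
        candidates,
        key=lambda g: estimated_runtime(total_work, g, overhead, workers),
    )


if __name__ == "__main__":
    # Assumed example numbers: 1e6 work units, 50 units of overhead per
    # task, 16 workers.
    for grain in (10, 100, 1000, 62500, 1_000_000):
        eff = efficiency(1_000_000, grain, 50, 16)
        print(f"grain={grain:>9}  efficiency={eff:.3f}")
```

In this model very fine tasks are dominated by overhead and very coarse tasks leave workers idle. Because the model ignores load imbalance and task dependences, it favors coarser grains than a real workload would.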

Note, PIPER is not listed as a column above, since it is intended as a recipient of this information.
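
The TG X-Stack answer on control interfaces mentions using statistical properties of metrics as proxies because raw metric volumes are hard to manage. One common way to keep such statistics in constant memory is a streaming update (Welford's algorithm). The sketch below is an illustration of that general technique, not the method used by any of the projects above; the class name `RunningStats` and the sample values are hypothetical.

```python
import math


class RunningStats:
    """Constant-memory summary of a metric stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, sample):
        """Fold one sample into the summary without storing it."""
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (sample - self.mean)
        self.minimum = min(self.minimum, sample)
        self.maximum = max(self.maximum, sample)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two samples are seen."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)


def summarize(samples):
    """Convenience helper: build a RunningStats from an iterable."""
    stats = RunningStats()
    for sample in samples:
        stats.update(sample)
    return stats


if __name__ == "__main__":
    # Hypothetical per-task durations, in arbitrary units.
    stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])
    print(stats.count, stats.mean, stats.variance, stats.minimum, stats.maximum)
```

A tool could keep one such summary per task type or per hardware counter and export only the summaries, rather than every sample.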