HHAT Usage Meeting Minutes: Difference between revisions
From Modelado Foundation
imported>Cnewburn mNo edit summary |
imported>Wpinfold No edit summary |
||
Line 1: | Line 1: | ||
This wiki page is used to keep minutes from phone and face to face meetings on the topic of usage models, user stories, and applications for heterogeneous hierarchical asynchronous tasking. Most recent meetings are listed on top. | This wiki page is used to keep minutes from phone and face to face meetings on the topic of usage models, user stories, and applications for heterogeneous hierarchical asynchronous tasking. Most recent meetings are listed on top. | ||
= OCR Review, Feb. 21, 2017 = | |||
* Wilf: Presentation material out on the wiki: OCR usage models is the one for today | |||
* Bala - OCR (Open Community Runtime), presents overview of OCR | |||
* Wilf: How do you decide on granularity of the task breakdown for AutoOCR? Is there some sort of input file? | |||
* Bala: Granularity is entirely the choice of the developer. AutoOCR is pretty straightforward - use a keyword to indicate that a task should be an EDT and annotate data blocks. Compiler will follow that and decorate with OCR API. It makes no decisions regarding granularity for itself. Compiler path is implemented in LLVM which looks at the keywords and generates OCR code. | |||
* Wilf: With MPI-Lite can you get some resiliency that you can't get from MPI? | |||
* Bala: That's interesting; we've not tried it. Resiliency & MPI-Lite have each been tried in isolation but not together. | |||
* Stephen Jones: How do people usually port to OCR? | |||
* Bala: People usually try to see if their MPI code can adapt to OCR. Will sacrifice performance while they see if they can implement in OCR. Some constructs like MPI_Wait are not aligned with OCR (which assumes an EDT can run to completion). Once people have adapted to OCR then there's no more reason to run MPI at all - they'll then restructure their program to reduce bottlenecks once they have a much better view of the dataflow graph. | |||
* CJ: What about continuation-style semantics. | |||
* Bala: A constant back-and-forth: should we stick to the "pure" model of no waits or stalls once a task has started? This would mean we need to split the task around a stall, but would also make data management complex between tasks. Some have looked at continuation semantics as a way to wait & context-switch within a task: moves the complexity into the runtime, which has to implement the continuation. Not many people have been trying this yet. | |||
* CJ: That's what Argobots & Qthreads are going after. HiHAT is looking to layer these on top of it to manage such continuations. | |||
* Bala presents on app requirements support | |||
* Wilf: What's performance looking like right now for e.g. MPI-Lite? How heavy is the task-based overhead at this time? | |||
* Bala: For MPI-Lite we've not put any effort into performance, because it's not trying to compete with MPI. OCR uses MPI for communication in this mode.Numbers look promising. At 16k cores OCR does not appear to perform any worse than MPI. | |||
* Wilf: How does resiliency play into this, if you've got 16k cores for example? | |||
* Bala: Not tried it at that scale yet. It will obviously slow things down. Has been tried out in isolation but not mixed together with performance yet. | |||
* Wilf: What about load-balancing? Was that 16k run fairly regular? | |||
* Bala: Again, have not yet tried this out in an application. In isolation, have used it at 64-node scale. | |||
** Have tried it out with Mini-AMR and seen some good results but still wrestling with heuristics that are needed. More heuristic intelligence does not seem to provide a lot of benefit because of the overhead of coming up with intelligent heuristics. | |||
* Stephen Olivier: Do you have any full-sized apps you have results for? | |||
* Michael Wong: Do you have a regular OCR call? | |||
* MW: Have you looked at any bottlenecks inside OCR? | |||
* Bala: One of the things we're already aware of is the GUID implementation. Making it globally unique can be expensive and in practice you don't always need it to be truly global around the cluster: you only need uniqueness spatially or temporally. Suggests two types of GUID: truly global, and then more local UID. | |||
** Can also probably shave off some overhead in event management (Legion has managed this, for example). You can often re-use events without the overhead of creation/destruction. | |||
* Wilf: Here's where we are with the meetings | |||
** We've been using EventBrite for registration but it's getting a bit awkward. Trying to move over to MailChimp. We've got about 69 in the group (30 on the call today). | |||
** Everyone will receive an email in the next week for registration. Use that to register, not EventBrite, in future please. | |||
** Wiki will be kept with link to database of MailChimp info | |||
* CJ: Some higher comments & contexts | |||
** Upcoming talks will look at the apps/algos which will be layered on top of HiHAT. | |||
** Lots of good work in progress - appreciate people contributing and sharing | |||
* Michael Wong: One thing he's looking at is developing heterogeneous C++. If the group is interested he can send out some information about that. Also going to be running a workshop on this. | |||
* CJ: Want to look at these things and decide "would these be called BY HiHAT, or built on top of HiHAT?" | |||
* MW: Do have models which can build on top of HiHAT. Can have discussion at a later meeting. | |||
= Community meeting, Jan 17, 2017 = | = Community meeting, Jan 17, 2017 = |
Revision as of 08:00, February 22, 2017
This wiki page is used to keep minutes from phone and face to face meetings on the topic of usage models, user stories, and applications for heterogeneous hierarchical asynchronous tasking. Most recent meetings are listed on top.
OCR Review, Feb. 21, 2017
- Wilf: Presentation material out on the wiki: OCR usage models is the one for today
- Bala - OCR (Open Community Runtime), presents overview of OCR
- Wilf: How do you decide on granularity of the task breakdown for AutoOCR? Is there some sort of input file?
- Bala: Granularity is entirely the choice of the developer. AutoOCR is pretty straightforward - use a keyword to indicate that a task should be an EDT and annotate data blocks. Compiler will follow that and decorate with OCR API. It makes no decisions regarding granularity for itself. Compiler path is implemented in LLVM which looks at the keywords and generates OCR code.
- Wilf: With MPI-Lite can you get some resiliency that you can't get from MPI?
- Bala: That's interesting; we've not tried it. Resiliency & MPI-Lite have each been tried in isolation but not together.
- Stephen Jones: How do people usually port to OCR?
- Bala: People usually try to see if their MPI code can adapt to OCR. Will sacrifice performance while they see if they can implement in OCR. Some constructs like MPI_Wait are not aligned with OCR (which assumes an EDT can run to completion). Once people have adapted to OCR then there's no more reason to run MPI at all - they'll then restructure their program to reduce bottlenecks once they have a much better view of the dataflow graph.
- CJ: What about continuation-style semantics.
- Bala: A constant back-and-forth: should we stick to the "pure" model of no waits or stalls once a task has started? This would mean we need to split the task around a stall, but would also make data management complex between tasks. Some have looked at continuation semantics as a way to wait & context-switch within a task: moves the complexity into the runtime, which has to implement the continuation. Not many people have been trying this yet.
- CJ: That's what Argobots & Qthreads are going after. HiHAT is looking to layer these on top of it to manage such continuations.
- Bala presents on app requirements support
- Wilf: What's performance looking like right now for e.g. MPI-Lite? How heavy is the task-based overhead at this time?
- Bala: For MPI-Lite we've not put any effort into performance, because it's not trying to compete with MPI. OCR uses MPI for communication in this mode.Numbers look promising. At 16k cores OCR does not appear to perform any worse than MPI.
- Wilf: How does resiliency play into this, if you've got 16k cores for example?
- Bala: Not tried it at that scale yet. It will obviously slow things down. Has been tried out in isolation but not mixed together with performance yet.
- Wilf: What about load-balancing? Was that 16k run fairly regular?
- Bala: Again, have not yet tried this out in an application. In isolation, have used it at 64-node scale.
- Have tried it out with Mini-AMR and seen some good results but still wrestling with heuristics that are needed. More heuristic intelligence does not seem to provide a lot of benefit because of the overhead of coming up with intelligent heuristics.
- Stephen Olivier: Do you have any full-sized apps you have results for?
- Michael Wong: Do you have a regular OCR call?
- MW: Have you looked at any bottlenecks inside OCR?
- Bala: One of the things we're already aware of is the GUID implementation. Making it globally unique can be expensive and in practice you don't always need it to be truly global around the cluster: you only need uniqueness spatially or temporally. Suggests two types of GUID: truly global, and then more local UID.
- Can also probably shave off some overhead in event management (Legion has managed this, for example). You can often re-use events without the overhead of creation/destruction.
- Wilf: Here's where we are with the meetings
- We've been using EventBrite for registration but it's getting a bit awkward. Trying to move over to MailChimp. We've got about 69 in the group (30 on the call today).
- Everyone will receive an email in the next week for registration. Use that to register, not EventBrite, in future please.
- Wiki will be kept with link to database of MailChimp info
- CJ: Some higher comments & contexts
- Upcoming talks will look at the apps/algos which will be layered on top of HiHAT.
- Lots of good work in progress - appreciate people contributing and sharing
- Michael Wong: One thing he's looking at is developing heterogeneous C++. If the group is interested he can send out some information about that. Also going to be running a workshop on this.
- CJ: Want to look at these things and decide "would these be called BY HiHAT, or built on top of HiHAT?"
- MW: Do have models which can build on top of HiHAT. Can have discussion at a later meeting.
Community meeting, Jan 17, 2017
Agenda
- Welcome: Wilf Pinfold
- Overview, purpose
- Solicit apps that need hierarchical tasking
- Solicit usage models
- Fully dynamic to semi-static - Pall
- Solicit user stories (requirements)
- Map tasks to multiple GPUs - Dmitry
- Granularity - Pall
- Finite memory - Carter; see "Sandia" on Applications page
- Distributed data structures in finite memory - Toby
- For latency sensitivity apps, anything overheads need to be offset by significant gains - Pall
- Hierarchical topology - Toby
- Building libs for finite physical memory; libs cooperating with caller, e.g. via callbacks - John Stone
- Aggregated task groups, recursive task model that enables decomposition - Dmitry, Ashwin Aji
- Data affinity-driven binding and scheduling and data decomposition - Pall
- Move work to data vs. other way around - John
- PGAS support, data affinity and decomposition - Toby
- Housekeeping - Wilf
Participants included: Wilfred Pinfold - creator, John Stone, umit@gatech.edu, Wael Elwasif, xg@purdue.edu Xinchen Guo, belak1@llnl.gov, Ruymán Reyes, pa13269@bristol.ac.uk Patrick Atkinson, Max Grossman, gordon@codeplay.com, bala.seshasayee@intel.com, mbianco@cscs.ch, ashwin.aji@amd.com, khalbiniak@icis.pcz.pl - Kamil Halbiniak, roman@icis.pcz.pl - Roman Wyrzykowski, fabien.delalondre@epfl.ch, richards12@llnl.gov, pszilard@kth.se - Pall, Michael Wong, Shekhar Borkar, David Bernholdt, rabuch2@illinois.edu, bill@feiereisen.net, cnewburn@nvidia.com, Piotr Luszczek, liakhdi@ornl.gov, Muthu Baskaran, jesmin.jahan.tithi@intel.com, slolivi@sandia.gov, hcedwar@sandia.gov - Carter, fuchst@nm.ifi.lmu.de - Toby, rbbrigh@sandia.gov - Ron
Signed up, but seemed not to make it: timothy.g.mattson@intel.com, schulzm@llnl.gov, oscar@ornl.gov[conflict], mbauer@nvidia.com, romain.e.cledat@intel.com, aiken@cs.stanford.edu, mfarooqi14@ku.edu.tr, lopezmg@ornl.gov, Benoit Meister, vgrover@nvidia.com, kelly.a.livingston@intel.com, alexandr.nigay@inf.ethz.ch, matthieu.schaller@durham.ac.uk, manjugv@ornl.gov, esaule@uncc.edu, schandra@udel.edu, cychan@lbl.gov, gshipman@lanl.gov, mgarland@nvidia.com, vsarkar@me.com, Didem Unat, maria.garzaran@intel.com, john.feo@pnnl.gov, mike.chu@amd.com, timothee.ewart@epfl.ch, jim@ks.uiuc.edu, n-maruyama@acm.org, pcicotti@sdsc.edu, kk13@rice.edu, srajama@sandia.gov
Kickoff, Dec. 20, 2016
Agenda
- Welcome: Wilf Pinfold
- Overview, purpose
- Approach
- Wiki explanation
- Next steps
- Feedback, expression of interest
Participants (33) included
BillF, CarterE, DavidR, Erik, JimP, KamilH & RomanW, PatrickA, PietroC, SenT, ShekharB, CJ, WilfP, StephenJ, XinchenG, TimM, RomainC, OscarH, AlexandrN, VinodG, KathK, Ashwin Aji, JoshF, GalenS, ManjuG, PallS, MariaG, ... See calendar entry, if you signed up
Discussion
- Glossary suggested by Tim, try not to invent new definitions
- Report suggested by Oscar - summary of usage cases could be useful for DoE
- How do we keep from getting fragmented? (Tim) Try to bringing community together by focusing on common requirements (Wilf)
- Start with usage models, requirements, provisioning constraints, rather than comparing and contrasting specific implementations
- We have data and experience to share
- Looking to have a phone meeting 3rd Tue each month at 9am PST; some here had standing conflicts; Wilf to try a Doodle poll
- Time scale, involvement, outputs?
- Are we sold on async tasking? Driven more by efficiency on HW? (Shekhar) Yes (Oscar) Who needs it for what? We need compelling examples of where mainline DoE apps need it. (Dave Richards) Clever use of MPI goes a long way (Tim)
- MPI: resilience not well addressed (Wilf) Comparison with MPI is inappropriate, tasking can be done on top of MPI, e.g. two-hot, accelerated MD. It's about the benefit of a computational model, which helps some and not others. (Galen) Tim agrees that MPI is low-level runtime.
- Interesting to identify a set of apps that embody tasking, and understand why they chose that model (Galen) Sounds like a potential value proposition (Shekhar).
- Characteristics: granularity of tasks - the finer the granularity the less portable the solution, explicit vs. implicit control (DaveR) If task relationships can be described, it can become more portable (Stephen) How will decomposition happen - expert, compiler, runtime? (DaveR)
- How do we make this applicable to large, portable code bases, enabling productivity? Where does the tasking model emerge? (DaveR)
- What does it mean to have an async environment, what are the critical features? (Josh)
- The way to resolving differences at various levels may lie in hierarchy (Kath) Strongly agree with hierarchy (Tim)
- Strongly agree with a bottom up approach, with a hierarchical perspective (Tim)