
Meeting 2024


Open MPI Developer's 2024 Meeting

Meeting logistics:

Remote attendance information

The meeting rooms are integrated with MS Teams; there will be a separate link for each day for remote participants to join. This is a link to a non-public repo with the connection info (posting links publicly just invites spam; sorry, folks).

If you do not have access to the non-public repo, please email Jeff Squyres.

Attendance

Please put your name down here if you plan to attend.

  1. Edgar Gabriel (AMD)
  2. Howard Pritchard (LANL)
  3. Thomas Naughton (ORNL)
  4. George Bosilca (NVIDIA)
  5. Joseph Schuchart (UTK)
  6. Kawthar Shafie Khorassani (AMD)
  7. Manu Shantharam (AMD)
  8. Luke Robison (AWS)
  9. Jun Tang (AWS)
  10. Wenduo Wang (AWS)
  11. Tommy Janjusic (NVIDIA)

Agenda items

The meeting is tentatively scheduled to start around 1pm on April 24 and is expected to finish around lunchtime on April 26.

Please add agenda items that need to be discussed here.

  • Support for MPI 4.0 compliance (https://github.com/open-mpi/ompi/projects/2)

  • Support for MPI 4.1 compliance (https://github.com/open-mpi/ompi/projects/4)

    • Memory kind info objects

    Edgar presents slides summarizing this MPI 4.1 feature; the slides also list the work items. Discussion of mpi_assert_memory_alloc_kinds: how would we actually use this within Open MPI? Some work is required in PRRTE. Discussed lazy initialization of CUDA, etc. We may not be able to do much optimization based on memory kinds anyway, outside of the pointer check. Some discussion of the complications of restrictors and how they may make it harder to use this kind info internally in Open MPI. Also, Open MPI can be configured with support for multiple device types: do we need to support different device types concurrently? The accelerator framework is currently not set up for this; it allows only one component to be active. Discussed multiple devices of a single type: right now the cuda and rocm components do not use the APIs that take device IDs, but they could, at least for CUDA. These may be items for a 6.0 release.
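
    A minimal sketch of how an application might pass this assertion on a communicator, assuming the MPI 4.1 mpi_assert_memory_alloc_kinds info key; the kind strings in the value ("system", "rocm:device") are illustrative examples, not a statement of what Open MPI will accept:

    ```c
    /* Minimal sketch: an application asserting the memory allocation kinds
     * it will use with a communicator, via the MPI 4.1 assertion info key.
     * The kind strings below are illustrative only. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Info info;
        MPI_Info_create(&info);
        /* Promise that buffers passed on this communicator come only from
         * these allocation kinds. */
        MPI_Info_set(info, "mpi_assert_memory_alloc_kinds", "system,rocm:device");

        MPI_Comm comm;
        MPI_Comm_dup_with_info(MPI_COMM_WORLD, info, &comm);
        MPI_Info_free(&info);

        /* ... communication on comm; the library could, e.g., skip the
         * pointer check or pick protocols based on the asserted kinds ... */

        MPI_Comm_free(&comm);
        MPI_Finalize();
        return 0;
    }
    ```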

  • Support for MPI 4.2(?) ABI (https://github.com/mpi-forum/mpi-issues/issues/751)

  • Collective Operations

    • xhc/shared memory collectives
    • GPU collectives
    • Collective configuration file
    • Memory allocation caching
  • Accelerator support

    • shared memory plans for 5.1 and beyond
    • one-sided operations

    IPC support in accelerators for 5.1. On main, no components outside of the accelerator framework make CUDA calls. We do need IPC support in the accelerator/cuda component.

    GMAC parameter support? PMIx may have something similar. The idea would be to change the priorities of accelerator-related components without having to set multiple MCA parameters.

    Joseph is working on PR 12356 (https://github.com/open-mpi/ompi/pull/12356) and PR 12318 (https://github.com/open-mpi/ompi/pull/12318), both related to accelerator support.
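
    As background, a rough sketch of what device-memory IPC looks like at the CUDA runtime level between two ranks on the same node, using the standard cudaIpc* calls; how (or whether) the accelerator/cuda component would wrap these is not specified here, and the buffer size is illustrative:

    ```c
    /* Rough sketch of CUDA IPC between two on-node ranks.  Error checking
     * omitted for brevity; this only illustrates the mechanism. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    void ipc_sketch(void *my_devbuf, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        cudaIpcMemHandle_t handle;
        if (rank == 0) {
            /* Exporter: create an IPC handle for its device buffer and ship
             * the opaque handle to the peer (here via MPI itself). */
            cudaIpcGetMemHandle(&handle, my_devbuf);
            MPI_Send(&handle, (int) sizeof(handle), MPI_BYTE, 1, 0, comm);
        } else if (rank == 1) {
            /* Importer: open the handle to get a device pointer mapping
             * rank 0's buffer, copy from it, then close the mapping. */
            MPI_Recv(&handle, (int) sizeof(handle), MPI_BYTE, 0, 0, comm,
                     MPI_STATUS_IGNORE);
            void *peer_ptr = NULL;
            cudaIpcOpenMemHandle(&peer_ptr, handle,
                                 cudaIpcMemLazyEnablePeerAccess);
            cudaMemcpy(my_devbuf, peer_ptr, 1 << 20 /* illustrative size */,
                       cudaMemcpyDeviceToDevice);
            cudaIpcCloseMemHandle(peer_ptr);
        }
    }
    ```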

  • PRRTE future topic

  • Review previously-created wiki pages for 5.1.x and 6.0.x in the context of planning for Open MPI vNEXT

    • These were made a long time ago; it would probably be good to re-evaluate, see which items are realistic, which will actually happen, etc. Timing / version numbers may change / consolidate, too, if we re-integrate PRRTE for v6.0.x (e.g., is doing a v5.1.x worth it at all?).
    • Proposed v5.1.x feature list
    • Proposed v6.0.x feature list
  • What to do about SLURM?

  • For OFI group

    • Adopt libfabric 2.0 API?
    • Adopt dma-buf API
    • mtl/ofi vs. btl/ofi performance differences
  • Misc

    • MPI_Info_set handling https://github.com/open-mpi/ompi/pull/11823
    • What is the bar for merging something into main? Just a successful CI pass? What if there are complaints from the rest of the community? What if the solution is known to be partial and incomplete?
    • Should we enable better downstream build pipeline security for those downloading from open-mpi.org?
      • For v5.0.x, we have md5, sha1, and sha256 checksums in the HTML on the download page.
      • Should we have these values in (more easily) machine-readable formats somewhere?
      • Should we be cryptographically signing releases somehow? (Tarballs do not support embedded signatures.)
      • What do others do (e.g., GNU projects)?
  • Action items

    • Joseph will ping the Score-P folks about interest in MPI_T events.