Releases: aws/aws-ofi-nccl

AWS OFI NCCL v1.16.0

27 Jun 23:26
v1.16.0

v1.16.0 (2025-06)

The 1.16.0 release series supports NCCL v2.27.5-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater, and it is currently tested with versions up to Libfabric 2.1.0amzn3.

Bug Fixes and Improvements:

  • On AWS platforms, the plugin may set the following environment variables: NCCL_BUFFSIZE, NCCL_P2P_NET_CHUNKSIZE, NCCL_NVLSTREE_MAX_CHUNKSIZE, NCCL_NVLS_CHUNKSIZE, and NCCL_NET_FORCE_FLUSH
  • Fix bug that prevented communicators from aborting gracefully, as part of supporting NCCL fault tolerance features
  • On AWS platforms, enable collective algorithm tuner by default
  • Improve P6-B200 tuner configuration for better performance with 4 to 32 MiB messages across node counts and with large-message AllReduce on 8 nodes
  • Added libnccl-tuner-ofi.so symlink for easier configuration with NCCL_TUNER_PLUGIN=ofi

Checksum (sha512) for the release tarball aws-ofi-nccl-1.16.0.tar.gz:

079635016d1e12407e072f7c0023d45074a9bb60ceeee23b4e82b3be8b2dbf7944eb57a03e33e57242da06386027a4fb3eb64f7c5d85f4a84215072d8b23a8fc  aws-ofi-nccl-1.16.0.tar.gz
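The published checksum can be compared against a downloaded tarball before building. The following is a minimal sketch, assuming the tarball was saved in the current directory under the file name shown above; the helper names `sha512_of` and `matches_published_checksum` are illustrative and not part of the project.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_hex):
    """Compare a file's SHA-512 digest with a published hex string."""
    return sha512_of(path) == published_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the tarball was saved.
    tarball = Path("aws-ofi-nccl-1.16.0.tar.gz")
    published = (
        "079635016d1e12407e072f7c0023d45074a9bb60ceeee23b4e82b3be8b2dbf79"
        "44eb57a03e33e57242da06386027a4fb3eb64f7c5d85f4a84215072d8b23a8fc"
    )
    if tarball.exists():
        print("checksum ok" if matches_published_checksum(tarball, published)
              else "checksum MISMATCH")
    else:
        print(f"{tarball} not found")
```

The same helper works for the other sha512 checksums on this page by substituting the file name and the published value.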

AWS OFI NCCL v1.15.0

04 Jun 04:18
v1.15.0

v1.15.0 (2025-06)

The 1.15.x release series supports NCCL 2.26.6-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater, and it is currently tested with versions up to Libfabric 2.1.0amzn3.

Bug Fixes and Improvements:

  • Build system and platform support
    • Added AWS P6-B200 platform support
    • Changed the default plugin library name to libnccl-net-ofi.so. By default, a symlink from libnccl-net-ofi.so to libnccl-net.so is created to maintain backward compatibility. This allows users to set NCCL_NET_PLUGIN=ofi to force NCCL to use the OFI plugin for communication. Passing --disable-nccl-net-symlink to configure skips the symlink, allowing multiple plugins to be installed in the same container.
  • Tuning and performance improvements
    • Added tuner support on P6-B200 for AllReduce, AllGather, and ReduceScatter regions for 0x0 and 0x7 bitmask
    • Updated default latency for P5en and P6-B200 platforms based on empirical results and analysis
  • Update to use NCCL v10 API with trafficClass parameter support for future traffic prioritization
  • Migrated plugin code base from C to C++
  • Added support for jobs where the number of NICs per GPU is different across systems. See the OFI_NCCL_FORCE_NUM_RAILS runtime environment variable documentation for more information.
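As an illustration of the NCCL_NET_PLUGIN=ofi setting mentioned above, a launcher script might prepare the environment for a child process as follows. This is only a sketch: the helper name `ofi_plugin_env` and the job binary in the usage comment are hypothetical, and it assumes the plugin library is already on the loader path of the job.

```python
import os


def ofi_plugin_env(base_env=None):
    """Return a copy of `base_env` (default: os.environ) with NCCL_NET_PLUGIN=ofi."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_NET_PLUGIN"] = "ofi"  # value taken from the release notes
    return env


# Hypothetical usage with a placeholder job binary:
#   import subprocess
#   subprocess.run(["./my_nccl_job"], env=ofi_plugin_env(), check=True)
```

Building a copy rather than mutating `os.environ` keeps the setting scoped to the launched job.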

OFI NCCL plugin runtime environment variable changes:

Deprecated environment variables

  • OFI_NCCL_RDMA_MIN_POSTED_BOUNCE_BUFFERS
  • OFI_NCCL_RDMA_MAX_POSTED_BOUNCE_BUFFERS

New environment variables

  • OFI_NCCL_SCHED_MAX_SMALL_RR_SIZE
  • OFI_NCCL_RDMA_MIN_POSTED_EAGER_BUFFERS
  • OFI_NCCL_RDMA_MAX_POSTED_EAGER_BUFFERS
  • OFI_NCCL_RDMA_MIN_POSTED_CONTROL_BUFFERS
  • OFI_NCCL_RDMA_MAX_POSTED_CONTROL_BUFFERS
  • OFI_NCCL_CQ_SIZE

Updated environment variables defaults

  • OFI_NCCL_RR_CTRL_MSG: default changed from 0 to 1
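When upgrading, it can help to check whether a job environment still sets the deprecated variables or depends on the changed default. The sketch below only inspects the environment; the function names are illustrative, and the release notes do not state how the plugin reacts when a deprecated variable is set.

```python
import os

# Variable names copied from the lists above.
DEPRECATED_IN_1_15 = (
    "OFI_NCCL_RDMA_MIN_POSTED_BOUNCE_BUFFERS",
    "OFI_NCCL_RDMA_MAX_POSTED_BOUNCE_BUFFERS",
)


def find_deprecated(env=None):
    """Return the deprecated variable names that are set in `env`."""
    env = os.environ if env is None else env
    return [name for name in DEPRECATED_IN_1_15 if name in env]


def uses_default_rr_ctrl_msg(env=None):
    """True when OFI_NCCL_RR_CTRL_MSG is unset, so the new default (1) applies."""
    env = os.environ if env is None else env
    return "OFI_NCCL_RR_CTRL_MSG" not in env


if __name__ == "__main__":
    for name in find_deprecated():
        print(f"warning: {name} is deprecated as of v1.15.0")
    if uses_default_rr_ctrl_msg():
        print("note: OFI_NCCL_RR_CTRL_MSG is unset; its default changed from 0 to 1")
```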

Checksum (sha512) for the release tarball aws-ofi-nccl-1.15.0.tar.gz:

9d529512927d3b2d1387f942283846889d0679dfd21b427f72e90d89d43bceb301e9f839a0290df3accb1ca9929818e811b94517241722becf6878d6d8646242  aws-ofi-nccl-1.15.0.tar.gz

AWS OFI NCCL v1.14.2

26 Apr 05:30
v1.14.2

v1.14.2 (2025-04)

This is a general release that is broadly applicable and is designed to be used with any network that can satisfy the network capabilities the plugin requires, as expressed through the Libfabric API's provider discovery mechanism. We are expanding our test coverage to continue making general releases going forward. If you would like to facilitate this effort to get coverage for networks you intend to use the plugin with, please reach out to us.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater, and it is currently tested with versions up to Libfabric 2.1.0.

The 1.14.x release series supports NCCL 2.26.2-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

Improvements:

  • Enable DMA-BUF by default, but blocklist DMA-BUF on EFA versions 1-3 due to a known issue on those platforms.

Checksum (sha512) for the release tarball aws-ofi-nccl-1.14.2.tar.gz:

68488362185222818070456e141a51aa7e4afafdbd403018bed618063969b63c62c194eeac58f23bad96e484f48ed76c5c4c8a845d9129dfbfffc649ea919521  aws-ofi-nccl-1.14.2.tar.gz

AWS OFI NCCL v1.14.1

08 Apr 04:04
v1.14.1

v1.14.1 (2025-04)

This is a general release that is broadly applicable and is designed to be used with any network that can satisfy the network capabilities the plugin requires, as expressed through the Libfabric API's provider discovery mechanism. We are expanding our test coverage to continue making general releases going forward. If you would like to facilitate this effort to get coverage for networks you intend to use the plugin with, please reach out to us.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater, and it is currently tested with versions up to Libfabric 2.1.0.

The 1.14.x release series supports NCCL 2.26.2-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

Bug Fixes and Improvements:

  • Fixed an issue in the sendrecv protocol that would result in a leaking MR keys warning with some providers.

These changes improve compatibility with libfabric 2.0 and enhance the overall reliability of the plugin, particularly in scenarios involving memory registration and connection establishment.

Checksum (sha512) for the release tarball aws-ofi-nccl-1.14.1.tar.gz:

188c84750cce0121f6abd090c9d7bc419dab095a2224292fc8c79d4653cf72955a30777211318f8cfaff87d689a6ac1f6daddb7144db986611b8ddb1f6602ca5

AWS OFI NCCL v1.14.0

14 Mar 22:39
v1.14.0

v1.14.0 (2025-03)

Releases v1.7.0-aws through v1.13.2-aws were intended only for use on AWS P* instances. With this release, we are resuming general releases that are broadly applicable and are designed to be used with any network that can satisfy the network capabilities the plugin requires, as expressed through the Libfabric API's provider discovery mechanism.

We are expanding our test coverage to continue making general releases going forward. If you would like to facilitate this effort to get coverage for networks you intend to use the plugin with, please reach out to us.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater. AWS customers are generally advised to track the latest available EFA Installer for performance improvements and bug fixes.

The 1.14.x release series supports NCCL 2.26.2-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

Bug Fixes and Improvements:

  • Transport Enhancements:

    • Added memory descriptor handling for control messages to properly support FI_MR_LOCAL.
    • RDMA Transport Enhancements: Added NCCL receive request early completion support when provider data progress model is FI_PROGRESS_AUTO
  • Tuning Improvements:

    • Modified tuner behavior to default to NCCL internal tuner on two-node configurations.
      This change addresses outlier performance issues in two-node scenarios.

These changes improve compatibility with libfabric 2.0 and enhance the overall reliability of the plugin, particularly in scenarios involving memory registration and connection establishment.

Checksum (sha512) for the release tarball:

d0943ecea58d4335e59f007275789eee2da9acad639d3a46f676d71525e6161a65c875602ccbeef7ade54339c2388137cf0e767402c5bfc8eb77637651b05c46

AWS OFI NCCL v1.13.2

11 Dec 17:58
v1.13.2-aws

v1.13.2-aws (2024-12-06)

This release is intended only for use on AWS P* instances. A general release that supports other libfabric networks may be made in the near future.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater. AWS customers are generally advised to track the latest available EFA Installer for performance improvements and bug fixes.

The 1.13.x release series supports NCCL 2.23.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

Bug Fixes:

  • Tuner Improvements:
    • Fixed algorithm selection for larger ranks and message sizes.
    • Re-calibrated the tuner for AllGather and ReduceScatter regions for 0x7 bitmask on P5en, optimizing performance for larger messages.
    • Added tuner support for AllGather and ReduceScatter regions for 0x0 bitmask on P5en.
  • Resolved a performance issue by preventing the eager protocol when RDMA writes are in flight, improving small AllReduce collective performance.

Note: dmabuf support is now turned off by default. Users can enable it explicitly using OFI_NCCL_DISABLE_DMABUF=0 if needed.

Checksum (sha512) for the release tarball:

4c0ac3144f178062fda9e86b50bb1784822e8fdbdffadf41cdbb30839456c4e912254ff12a5b0a8c63abbe910597fd14211a42572a451d10e01932100013971e  aws-ofi-nccl-1.13.2-aws.tar.gz

AWS OFI NCCL v1.13.1

26 Nov 23:10
v1.13.1-aws

(2024-11-26)

This release is intended only for use on AWS P* instances. A general release that supports other libfabric networks may be made in the near future.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater. AWS customers are generally advised to track the latest available EFA Installer for performance improvements and bug fixes.

The 1.13.x release series supports NCCL 2.23.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

Supported Distributions

  • Amazon Linux 2
  • Amazon Linux 2023
  • Ubuntu 20.04 LTS, 22.04 LTS.

For releases before v1.6.0, we generally created releases from two separate
branches, an AWS-specific branch and a general release branch. With v1.6.0, we
have unified the code into a single branch, and made the AWS-specific parts a
compile-time option. When a feature (or entire release) only supports one of
the two variants, we note that in the release notes.

What's Changed

This release contains no functional changes compared to v1.13.0-aws. This release merely updates the version set in AC_INIT to include the -aws suffix to match the tag name and ensure generated artifacts are named correctly.

Checksum (sha512) for the release tarball:

b71afd2e7776b77392c91abb818fa011e415f31fa9061556cd725d7a52eb4101b45a10fe91284ec7cff06a9653456e95ae70a472affb32f68e01b1ce5e49ff83  aws-ofi-nccl-1.13.1-aws.tar.gz

v1.13.0-aws

19 Nov 05:37
cf7606e

(2024-11-18)

This release is intended only for use on AWS P* instances. A general release that supports other libfabric networks may be made in the near future.

With this release, building with platform-aws requires Libfabric v1.22.0amzn4.0 or greater. AWS customers are generally advised to track the latest available EFA Installer for performance improvements and bug fixes.

The 1.13.x release series supports NCCL 2.23.4-1 while maintaining backward compatibility with older NCCL versions (NCCL v2.17.1 and later).

New features:

  • AWS P5en platform support was added.

  • Support was added for the NCCL v3 tuner API. The tuner now supports multiple
    platforms and multiple collectives.

  • Scheduling improvements were made to the plugin RDMA protocol. In multirail
    configurations, this is expected to balance traffic more optimally.

  • dmabuf memory registration support was added. Users facing problems with
    dmabuf may disable dmabuf with OFI_NCCL_DISABLE_DMABUF=1.

Breaking changes:

  • As mentioned above, building with support for platform-aws now requires
    libfabric version 1.22.0amzn4.0 or greater.

  • Under CUDA, the plugin now statically links the CUDA runtime by default.
    Packagers preferring to dynamically link CUDA may pass
    --enable-cudart-dynamic at configure time to disable this.

Supported Distributions

  • Amazon Linux 2
  • Amazon Linux 2023
  • Ubuntu 20.04 LTS, 22.04 LTS.

For releases before v1.6.0, we generally created releases from two separate
branches, an AWS-specific branch and a general release branch. With v1.6.0, we
have unified the code into a single branch, and made the AWS-specific parts a
compile-time option. When a feature (or entire release) only supports one of
the two variants, we note that in the release notes.

What's Changed

  • ci: build oldest working EFA installer and latest by @aws-nslick in #522
  • api: fail when using connect/accept_v4 with RDMA protocol by @rauteric in #529
  • rdma: write topo file only for multi-rail platforms by @rauteric in #532
  • dist: set TAR_OPTIONS to remove ownership info by @rauteric in #523
  • Revert ".ci/aws: Add trainium tests to CI" by @a-szegel in #535
  • nvidia: Change default network name to "Libfabric" by @bwbarrett in #530
  • tuner: support tuner v3 API by @AmedeoSapio in #524
  • init: Avoid hang by forcing SENDRECV in case of neuron v4 API usage by @maxtmann in #537
  • Fix naming of array in nccl_net_ofi_plugin_init by @ryanhankins in #539
  • Revert "param: increase CQ read count to 16 for performance" by @maxtmann in #538
  • .ci/aws: Add g4dn testing to PR CI by @a-szegel in #527
  • .ci/aws: Make failures happen in correct stage by @a-szegel in #528
  • platform: Set RDMA protocol as default for trn1/trn1n platforms by @maxtmann in #540
  • Expose each libfabric NIC as one NIC device to the user in case of non-NVIDIA platforms by @maxtmann in #544
  • ci: cache efa installer by @aws-nslick in #545
  • ci: fix efa installer caching by @aws-nslick in #546
  • fix(rdma): endpont_per_comm: NULL ptr bug by @rauteric in #551
  • tuner: Enable tuner init msg on INFO logs by @arunkarthik-akkart in #549
  • .ci/aws: Decrease NCCL_TEST iterations to 5 by @a-szegel in #550
  • fix(tree): use correct __cplusplus guards by @aws-nslick in #554
  • Separate endpoint for control messages by @rajachan in #543
  • fix(tree): add spaces around PRIu64 by @aws-nslick in #555
  • feat(tree): add static_assert shim macro by @aws-nslick in #556
  • fix(aws): align declaration and init order by @aws-nslick in #557
  • fix(rdma): fi_{send,write}data: do arithmetic on uintptr by @aws-nslick in #558
  • fix(tuner): don't choose NVLSTree if nRanks==nNodes by @AmedeoSapio in #583
  • rdma: Eliminate unnecessary ctrl message waits in eager protocol by @rauteric in #553
  • fix(tracing): use header-only nvtx3 by @aws-nslick in #590
  • chore(.github/workflows): constrain push triggers to known branches by @aws-nslick in #582
  • feat(build): better --enable-debug defaults by @aws-nslick in #596
  • fix(freelist): use uintptr_t for pointer arithmetic by @aws-nslick in #560
  • Fix: access domain from ep during mr on device by @maxtmann in #602
  • Feature/v6 rma ops by @maxtmann in #541
  • platform: trn1 default protocol send receive by @hunnorth in #603
  • fix(tree): import libfabric's container_of macro by @aws-nslick in #605
  • fix(valgrind): fix autotools mistake by @aws-nslick in #607
  • feat(ci/github): use docker instead of codebuild by @aws-nslick in #608
  • CI updates by @rajachan in #612
  • util: Use FI_ENOPROTOOPT to check for a provider's support for option by @rajachan in #613
  • Fix log format string behavior by @bwbarrett in #615
  • Improve protocol selection logic by @bwbarrett in #610
  • .ci/aws: Unpin al2 p3dn ami by @a-szegel in #552
  • .ci/aws: re-Add trainium tests to CI by @a-szegel in #619
  • fix(m4): set redzone size to 0 by @rauteric in #616
  • Fully destroy endpoints when refcount is 0 by @bwbarrett in #617
  • feat: add DMA-BUF support by @aws-nslick in #618
  • Improve end of process cleanup and reporting by @bwbarrett in #620
  • fix(rdma): stop setting FI_ORDER_NONE by @aws-nslick in #621
  • fix(tree): use empty brace initializers for zero-initialization by @aws-nslick in #594
  • fix(build): ensure -pthread is passed by @aws-nslick in #623
  • fix(build): add missing AC_PROG_RANLIB by @aws-nslick in #622
  • fix(ci): prefer ecr to dockerhub by @aws-nslick in #628
  • feat(build): disable semantic interposition by @aws-nslick in #624
  • fix(init): fix sendrecv fallback logic by @aws-nslick in #629
  • fix: rdma: inverted print statement by @aws-nslick in #630
  • rdma: Use get_device_from_ep() accessor by @bwbarrett in #626
  • Combined -Wextra -Werror Commits by @aws-nslick in #627
  • Add platform data settings for TRN2N by @maxtmann in #638
  • tuner: add regions for AllGather/ReduceScatter in the one rank per node case by @AmedeoSapio in #641
  • fix(rdma): send periodic control messages to sync sender/receiver by @rauteric in #640
  • feat(build): add -fanalyzer when --enable-werror by @aws-nslick in #632
  • Add Multiplexed-round-robin scheduler by @arunkarthik-akkart in #604
  • fix : Fix flexible array member allocation by @arunkarthik-akkart in #649
  • Revert "neuron: Disable rdma eager messages by default" by @maxtmann in #650
  • .ci/aws: All CI use ami with EFA Installer by @a-szegel in #648
  • separate out 3rd-party headers by @aws-nslick in #634
  • Add a proper endpoint interface by @bwbarrett in #654
  • feat(ci): add workflow_dispatch to distcheck by @aws-nslick in #658
  • Fix use of uninitialized lock by @bwbarrett in #659
  • aws: Skip the WRITE_IN_ORDER_ALIGNED_128_BYTES check for P5en by @rajachan in #625
  • rdma: remove "request completed with error" message by @rauteric in #660
  • rdma: do local RDMA read on all NIC rails for flush() by @taeilum00 in https://...

AWS OFI NCCL v1.12.1

25 Oct 05:10
2301579

All users of v1.12.0-aws are strongly recommended to take this fix when using EFA Installer >= 1.35.0.

Bug fixes:

  • The platform-aws vf sorting code produces significant performance regressions or
    crashes when used atop the latest EFA driver releases. This sorting code has been
    reverted, which mitigates the problem. (adb47dc)

The plugin has been tested with the following libfabric providers using tests bundled in the source code and the nccl-tests suite:

  • efa

Digests:

3722e0790b98e65d04f143fe8484fd0a05dcec6419eeac1cdcad5e49f6c7cf8e  aws-ofi-nccl-1.12.1-aws.tar.gz

AWS OFI NCCL v1.11.1

25 Oct 05:10
2db8375

All users of v1.11.0-aws are strongly recommended to take this fix when using EFA Installer >= 1.35.0.

Bug fixes:

  • The platform-aws vf sorting code produces significant performance regressions or
    crashes when used atop the latest EFA driver releases. This sorting code has been
    reverted, which mitigates the problem. (84b7cfa)

The plugin has been tested with the following libfabric providers using tests bundled in the source code and the nccl-tests suite:

  • efa

Digests:

6d95eff619208e30d11044068c3781c1c079b180a683d422ce9f6a96ebeadb80  aws-ofi-nccl-1.11.1-aws.tar.gz