-
Notifications
You must be signed in to change notification settings - Fork 68
Refactor endpoint types to use C++ inheritance #851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor endpoint types to use C++ inheritance #851
Conversation
b152394
to
f47456c
Compare
bbd2e8e
to
f47456c
Compare
I made a mistake by undesirably sync-ing my feature branch with master on Github, which then pushed to the PR. I've force pushed the same version of the PR that had been reviewed, and I will push again once I've actually updated the PR to address the received feedback. |
bot:aws:retest |
2 similar comments
bot:aws:retest |
bot:aws:retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bunch of whitespace inconsistencies and things like that which need cleaning up, but generally this seems to be the right next step.
f47456c
to
2140b2e
Compare
The latest force-push to the PR both addresses feedback from @bwbarrett, and refactors all free functions associated with the SENDRECV and RDMA transport endpoint types to be member functions. I also updated the RDMA endpoint's control and data rail arrays and deque from being member pointers to being member instances (if this change isn't relevant enough I can punt it to a later PR). Some call-outs for the draft PR in its current state:
|
2140b2e
to
b11b94c
Compare
b11b94c
to
3a70bbb
Compare
cb10c71
to
ffcffd3
Compare
Previous revision testing failed with:
Should be fixed in latest force push, along with some comment fixes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it through first commit; waypoint comments posted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got through the rest of the PR. Added a few more minor comments.
ffcffd3
to
f18a1e8
Compare
Replaces the C-style inheritance of having a base endpoint instance as the first member of the transport endpoint types with the C++ approach of making nccl_net_ofi_ep_t a parent class with virtual functions that the transport endpoint types override. This also refactors the allocation and free functions for the base and transport endpoint types to be constructors and destructors. Signed-off-by: Aviv Benchorin <[email protected]>
f18a1e8
to
b9e362a
Compare
Refactors all stand-alone functions that should belong to the RDMA endpoint type to become member functions, including functions associated with RDMA endpoint rails. Helper functions for the RDMA endpoint constructor and destructor were kept as private functions. Initializing and cleaning up the RDMA endpoint rails has been kept as a part of the endpoint constructor and destructor rather than moved to a separate constructor and destructor for the endpoint rail. This commit does not refactor the `rx_buff_req_alloc` function pointer or free functions associated with it, this refactor should wait until a later PR that more broadly refactors the RDMA request types. Signed-off-by: Aviv Benchorin <[email protected]>
Refactors all stand-alone functions that should belong to the SENDRECV transport endpoint type to become member functions. Signed-off-by: Aviv Benchorin <[email protected]>
Previously, the RDMA transport endpoint had endpoint rail pointers to an array of control rails and an array of data rails, which required manually allocating and freeing the memory associated with the arrays. By replacing the endpoint rail pointers with std::vector instances, the RDMA endpoint contructor and destructor can automatically manage the rails' memory. Signed-off-by: Aviv Benchorin <[email protected]>
Previously, the RDMA transport endpoint had a std::deque pointer which required manually allocating and freeing the memory associated with the deque. By replacing the std::deque pointer with an std::deque instance, the RDMA endpoint constructor and destructor can automatically manage the deque's memory. Signed-off-by: Aviv Benchorin <[email protected]>
b9e362a
to
88219d0
Compare
After comparing my changes against the C++ code already added to the codebase, I see I've been using the |
Ha, I actually mildly preferred the |
I don't think this is a place where we should be hyper pedantic. In something like the environ code, where we're talking about 4-5 variables, I think explicitly using |
I remember more than 20 years ago, "m_" prefix was widely used for class member variables, is it now less preferred? |
Since I have approvals, I will resolve comments and merge this PR. I can address readability/formatting improvement in a fast-follow PR. |
Draft PR for a first iteration of using proper C++ inheritance in the OFI NCCL plugin.
Replaces the C-style inheritance of having a base endpoint instance as the first member of the transport endpoint types with the C++ approach of making nccl_net_ofi_ep_t a parent class with virtual functions that the transport endpoint types override. This also refactors the allocation and free functions for the base and transport endpoint types to be constructors and destructors.
Some changes this draft PR does not yet include:
nccl_net_ofi_rdma_ep_t
's constructor calls helper functionsinit_rail_ofi_resources
andinit_rx_buffers
, which are currently free functions innccl_ofi_rdma.cpp
and should be private member functions ofnccl_net_ofi_rdma_ep_t
nccl_net_ofi_ep_rail_t
is currently initialized and torn-down using free functionsep_rail_init
andep_rail_release
, which could be turned into the rail's constructor and destructor respectivelyBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.