[phase closure] sum closure memory opt [ci skip] #338

Merged
merged 8 commits into develop from sb/sum-closure-memory-opt
Jun 3, 2021

Conversation

basaks
Contributor

@basaks basaks commented Jun 1, 2021

memory optimisations during sum closure

@basaks basaks requested review from mcgarth and richardt94 June 1, 2021 04:06
log.debug(f"shape of ifgs_breach_count is {ifgs_breach_count.shape}")
log.debug(f"dtype of ifgs_breach_count is {ifgs_breach_count.dtype}")

mpiops.comm.Reduce(ifgs_breach_count, ifgs_breach_count, op=MPI.SUM, root=0)
Contributor

I pushed a commit which changes this to the capital-R Reduce. This fixes a further issue: lower-case reduce was creating a 13 GB pickle object on each process to communicate its part of ifgs_breach_count_arr back to the root node. Using Reduce is safe because, while mpiops.array_split may send a different number of loops to each process, the accumulated breach-count arrays are all numpy arrays of the same size (height of ifg, width of ifg, # of ifgs), so we can use preallocated buffers for our reduction operation (and the sum over axis 0 is the default behaviour of MPI.SUM, so we don't need to mess with a custom sum0_op). Still a couple of things to change before we can merge this:

  1. I'm not sure if it's safe to use the same buffer for sending and receiving in the reduction - if not, we'll need to create an empty array of zeros to receive into.
  2. This commit will break the non-MPI option, because I had to import mpi4py directly in sum_closure.py to get MPI.SUM. This should be easy to fix by re-exporting MPI.SUM from mpiops.
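A minimal runnable sketch of the reduction semantics discussed above (array names follow the PR; the rank count and array shape here are made-up stand-ins, and the per-rank loop simulates what MPI performs across processes so it runs without an MPI launcher). The comments also show the standard mpi4py answer to point 1: the MPI standard forbids aliasing the send and receive buffers, and MPI.IN_PLACE is the sanctioned way for the root to reduce into its own buffer.

```python
import numpy as np

# Stand-in for the per-rank breach-count buffers: every rank allocates an
# array of the same shape and dtype, which is what makes a buffer-based
# Reduce possible (no pickling of a multi-GB object, unlike lowercase
# reduce). Shapes and rank count below are hypothetical.
n_ranks = 4
shape = (5, 5, 3)  # (height of ifg, width of ifg, # of ifgs)
rng = np.random.default_rng(0)
rank_counts = [rng.integers(0, 2, size=shape).astype(np.uint16)
               for _ in range(n_ranks)]

# MPI.SUM over same-shaped arrays is an elementwise sum across ranks;
# simulate it here with a plain accumulation loop.
ifgs_breach_count = np.zeros(shape, dtype=np.uint16)
for buf in rank_counts:
    ifgs_breach_count += buf

assert np.array_equal(ifgs_breach_count,
                      np.sum(rank_counts, axis=0, dtype=np.uint16))

# With mpi4py, the send/receive aliasing concern is resolved by MPI.IN_PLACE:
#
#   if mpiops.comm.rank == 0:
#       mpiops.comm.Reduce(MPI.IN_PLACE, ifgs_breach_count,
#                          op=MPI.SUM, root=0)
#   else:
#       mpiops.comm.Reduce(ifgs_breach_count, None, op=MPI.SUM, root=0)
```

Non-root ranks pass None as the receive buffer since only the root receives the reduced result.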

Contributor Author


Good pick-up on the Reduce use here, Richard. Thank you.

@basaks basaks force-pushed the sb/sum-closure-memory-opt branch from 5e974c1 to 2f287a2 Compare June 2, 2021 23:47
@basaks basaks force-pushed the sb/sum-closure-memory-opt branch from 2f287a2 to 7418e9a Compare June 3, 2021 00:41
@richardt94
Contributor

Thanks Sudipta for your work debugging this PR and fixing up the buffer-based Reduce method. As far as I can tell it is now fully functional, so I'm happy to merge - might be worth getting @mcgarth's opinion first.

@basaks basaks changed the title Sb/sum closure memory opt [phase closure] sum closure memory opt [ci skip] Jun 3, 2021
@mcgarth mcgarth merged commit 1c5df3e into develop Jun 3, 2021
@mcgarth mcgarth deleted the sb/sum-closure-memory-opt branch June 3, 2021 05:21