Skip to content

NetCDFWriter scalar indexing error on GPU #4397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
xiaozhour opened this issue Apr 14, 2025 · 14 comments · Fixed by #4441
Closed

NetCDFWriter scalar indexing error on GPU #4397

xiaozhour opened this issue Apr 14, 2025 · 14 comments · Fixed by #4441

Comments

@xiaozhour
Copy link
Collaborator

xiaozhour commented Apr 14, 2025

Hi all,

I got a weird error on a simple output line with NetCDFWriter on GPU. Switching to JLD2Writer removes the error. Any insights would be appreciated!

u, v, w = model.velocities
b = model.tracers.b
outputs = (; u, v, w, b)

simulation.output_writers[:fields] = NetCDFWriter(model, outputs;
                                                   filename = "test_drag.nc",
                                                   schedule = TimeInterval(2minutes),
                                                   overwrite_existing = true)

The error massage is the following:
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations do not execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use allowscalar or @allowscalar
to enable scalar iteration globally or for the operations in question.
Stacktrace:

@glwagner
Copy link
Member

Do you mind pasting your whole script if possible? It will help to be able to reproduce the error!

@xiaozhour
Copy link
Collaborator Author

Thanks Greg! The code is attached below. It likely doesn't give any meaningful results - this is just a demonstration to show how the setup works.

using Oceananigans
using Oceananigans.Units
using NCDatasets
using Random
using Printf



# grid configuration
Lx = 600meters
Ly = 150meters
Lz = 50meters
Nx = 64
Ny = 16
Nz = 64


# Creates a grid with near-constant spacing `refinement * Lz / Nz`
# near the bottom:
refinement = 1.8 # controls spacing near surface (higher means finer spaced)
stretching = 10  # controls rate of stretching at bottom

# "Warped" height coordinate
h(k) = (Nz + 1 - k) / Nz

# Linear near-surface generator
ζ(k) = 1 + (h(k) - 1) / refinement

# Bottom-intensified stretching function
Σ(k) = (1 - exp(-stretching * h(k))) / (1 - exp(-stretching))

# Generating function
z_faces(k) = - Lz * (ζ(k) * Σ(k) - 1)

grid = RectilinearGrid(GPU(); size = (Nx, Ny, Nz),
                          x = (0, Lx),
                          y = (0, Ly),
                          z = z_faces)

const N2 = 9e-5
const M2 = 4.24e-7
B∞(x, y, z, t) =  -M2 * x



# surface buoyancy bc
const bᵗ = 4.24e-8 # surfacre cooling
# bottom buoyancy bc
const bᵇ = 0; # zero buoyancy flux 

zero_background_diffusive_flux = GradientBoundaryCondition(bᵇ)
b_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(bᵗ),
                                bottom = zero_background_diffusive_flux)



# velocity boundary condition free-slip at the surface and no-slip at the bottom
u_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(nothing),
                               bottom = ValueBoundaryCondition(0.0))
v_bcs = FieldBoundaryConditions(top = FluxBoundaryCondition(nothing),
                               bottom = ValueBoundaryCondition(0.0))



# set up the model 
model = NonhydrostaticModel(; grid, 
                            buoyancy=BuoyancyTracer(),
                            advection = UpwindBiased(order=5),
                            tracers = :b,
                            coriolis = FPlane(f=1e-4),
                            closure = SmagorinskyLilly(),
                            boundary_conditions = (u=u_bcs, v=v_bcs, b=b_bcs),
                            background_fields = (b=B∞,))



noise(x, y, z) = 1e-4 * randn()                           
bᵢ(x, y, z) = N2 * z + noise(x, y, z)
vᵢ(x, y, z) = 0.3 + M2/1e-4 * z # surface velocity is set to be 0.3 m/s and decaying accoridng to thermal wind

set!(model, v=vᵢ, b=bᵢ)

simulation = Simulation(model, Δt=5.0, stop_time=40minutes)

# time step
wizard = TimeStepWizard(cfl=1.0, max_change=1.1, max_Δt=1minute)
simulation.callbacks[:wizard] = Callback(wizard, IterationInterval(10))

# Print a progress message
progress_message(sim) = @printf("Iteration: %04d, time: %s, Δt: %s, max(|w|) = %.1e ms⁻¹, wall time: %s\n",
                                iteration(sim), prettytime(sim), prettytime(sim.Δt),
                                maximum(abs, sim.model.velocities.w), prettytime(sim.run_wall_time))

add_callback!(simulation, progress_message, IterationInterval(20))



# output
u, v, w = model.velocities
b = model.tracers.b


outputs = (; u, v, w, b)



simulation.output_writers[:fields] = NetCDFWriter(model, outputs;
                                                    filename = "bottom_drag.nc",
                                                    schedule = TimeInterval(2minutes),
                                                    overwrite_existing = true)


                                                 


run!(simulation)

@glwagner
Copy link
Member

@xiaozhour do you still get this error if you don't have the background field?

@xiaozhour
Copy link
Collaborator Author

Yep @glwagner

@glwagner
Copy link
Member

@tomchor @ali-ramadhan

@glwagner glwagner changed the title Output error on GPU NetCDFWriter scalar indexing error on GPU Apr 22, 2025
@glwagner
Copy link
Member

Probably same problem over here: #4433

@xiaozhour it might help if you confirm the Oceanangians version and julia version you're using.

@xiaozhour
Copy link
Collaborator Author

Thanks, @glwagner. Oceananigans is v0.96.3 and Julia is 1.10.9.

@tomchor
Copy link
Collaborator

tomchor commented Apr 22, 2025

Not sure it'd make a difference, but can you try running the latest Oceananigans version?

@xiaozhour
Copy link
Collaborator Author

@tomchor Updating to v0.96.21 didn't help

@rafacmartins
Copy link

Hi all,
Just to highlight this here — I was using the latest version of Oceananigans (v0.96.21) and was getting the same error. Then I downgraded to a previous version v0.92.1 — kind of randomly, and it’s working just fine now.
Note that the function name reverted to NetCDFOutputWriter instead of the current NetCDFWriter.

@glwagner
Copy link
Member

It's likely the change from 0.95 to 0.96 that broke this case. If you can confirm it might help. I am somewhat puzzled --- isn't NetCDFWriter tested on GPU?

@tomchor
Copy link
Collaborator

tomchor commented Apr 23, 2025

It's likely the change from 0.95 to 0.96 that broke this case. If you can confirm it might help. I am somewhat puzzled --- isn't NetCDFWriter tested on GPU?

It is:

for arch in archs
@testset "NetCDF output writer [$(typeof(arch))]" begin
@info " Testing NetCDF output writer [$(typeof(arch))]..."
test_datetime_netcdf_output(arch)
test_timedate_netcdf_output(arch)
test_netcdf_grid_metrics_rectilinear(arch, Float64)
test_netcdf_grid_metrics_rectilinear(arch, Float32)
test_netcdf_grid_metrics_latlon(arch, Float64)
test_netcdf_grid_metrics_latlon(arch, Float32)
test_netcdf_rectilinear_grid_fitted_bottom(arch)
test_netcdf_latlon_grid_fitted_bottom(arch)
test_netcdf_rectilinear_flat_xy(arch)
test_netcdf_rectilinear_flat_xz(arch, immersed=false)
test_netcdf_rectilinear_flat_xz(arch, immersed=true)
test_netcdf_rectilinear_flat_yz(arch, immersed=false)
test_netcdf_rectilinear_flat_yz(arch, immersed=true)
test_netcdf_rectilinear_column(arch)
test_thermal_bubble_netcdf_output(arch, Float64)
test_thermal_bubble_netcdf_output(arch, Float32)
test_thermal_bubble_netcdf_output_with_halos(arch, Float64)
test_thermal_bubble_netcdf_output_with_halos(arch, Float32)
test_netcdf_size_file_splitting(arch)
test_netcdf_time_file_splitting(arch)
test_netcdf_function_output(arch)
test_netcdf_output_alignment(arch)
test_netcdf_spatial_average(arch)
test_netcdf_time_averaging(arch)
test_netcdf_output_just_particles(arch)
test_netcdf_output_particles_and_fields(arch)
test_netcdf_vertically_stretched_grid_output(arch)
test_netcdf_overriding_attributes(arch)
test_netcdf_free_surface_only_output(arch)
test_netcdf_free_surface_mixed_output(arch)
test_netcdf_buoyancy_force(arch)
end
end

@tomchor
Copy link
Collaborator

tomchor commented Apr 23, 2025

It has to do with the stretched dimension. Probably when writing the grid metrics somewhere. Here's a MWE for the record:

using Oceananigans
using NCDatasets
L = 1
N = 4
grid = RectilinearGrid(GPU(); size = (N, N, N),
                       x = (0, L),
                       y = (0, L),
                       z = 1:N+1)
model = NonhydrostaticModel(; grid)

simulation = Simulation(model, Δt=1, stop_time=1)
NetCDFWriter(model, model.velocities;
             filename = "mwe",
             schedule = TimeInterval(1),          
             overwrite_existing = true)           

@tomchor
Copy link
Collaborator

tomchor commented Apr 23, 2025

@xiaozhour as a side note, whenever you encounter an issue like this, it's best to keep removing things from your code until you reach a minimal working example (MWE, or more correctly called MRE). If you do that, generally one of two things happen. You either:

  1. End up with a very small piece of code that you can post, which will make it easier for us to help you, or
  2. you end up finding the culprit of the error in the process, in which case you can just fix your script or, if it's a bug, you can immediately contribute with a solution :)

I think for me I end up finding out I need to fix my own code more often than not 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants