Add the ability to disable specific (or all) CDI hooks when generating a CDI specification #1077

Merged
merged 2 commits into from
Jun 3, 2025

Conversation

ArangoGutierrez
Collaborator

@ArangoGutierrez ArangoGutierrez commented May 12, 2025

This change adds the ability to disable specific (or all) CDI hooks when generating a CDI specification. This is supported in the nvidia-ctk cdi generate command through a --disable-hook command line argument (that can be repeated):

nvidia-ctk cdi generate \
    --disable-hook=enable-cuda-compat \
    --disable-hook=update-ldcache

Specifying:

nvidia-ctk cdi generate \
    --disable-hooks=all

(or --disable-hook=all) will generate a CDI specification with NO NVIDIA-specific CDI hooks.

Note that when using such a spec, the resultant container may not function in the same way as a regular container (for example, because the ldcache is not updated).

When using the nvcdi API, the functional option WithDisabledHook("enable-cuda-compat") or WithDisabledHook("all") can be used. (Constants were added for supported hooks).

Fixes: #1074

@ArangoGutierrez ArangoGutierrez requested review from elezar and Copilot May 12, 2025 16:53
@ArangoGutierrez ArangoGutierrez self-assigned this May 12, 2025
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new CLI flag (--skip-hooks) to allow users to opt out of specific hooks when generating the CDI specification.

  • Updated tests to include a skip-hook option.
  • Modified the command initialization and spec generation logic to support disabling hooks based on the provided flag.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cmd/nvidia-ctk/cdi/generate/generate_test.go Added a test case to validate the skip-hook functionality and updated expected spec/options accordingly.
cmd/nvidia-ctk/cdi/generate/generate.go Introduced a new CLI flag and added logic to disable hooks based on user input.
Comments suppressed due to low confidence (1)

cmd/nvidia-ctk/cdi/generate/generate_test.go:40

  • [nitpick] The variable name 'skipdHook' may be a typo and could be renamed to 'skipHook' for consistency and clarity.
skipdHook := cli.NewStringSlice("enable-cuda-compat")

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for a new flag (--skip-hooks) to the nvidia-ctk cdi generate command so that users can opt out of specific hooks when creating the CDI spec.

  • In generate.go, a new skipHook option and corresponding CLI flag are introduced and processed when creating CDI library options.
  • In generate_test.go, a new test case verifies the behavior when a non-default hook is skipped.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cmd/nvidia-ctk/cdi/generate/generate_test.go Added test case for --skip-hooks flag and its effect.
cmd/nvidia-ctk/cdi/generate/generate.go Introduced skipHook flag and integrated it into initialization options.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new flag --skip-hooks to the cdi generate command to allow users to disable specific hooks while generating the CDI specification.

  • Introduces a new CLI flag in generate.go to capture the hooks to skip
  • Updates the options struct and test cases in generate_test.go to include and validate the new skipHook functionality

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cmd/nvidia-ctk/cdi/generate/generate_test.go Adds a test case for skipHook using a new CLI string slice
cmd/nvidia-ctk/cdi/generate/generate.go Registers the new --skip-hook flag and passes the provided hooks to disable options during CDI spec generation
Comments suppressed due to low confidence (1)

cmd/nvidia-ctk/cdi/generate/generate_test.go:40

  • [nitpick] Consider adding a test case to verify that when a hook name is provided via skipHook, the generated CDI specification omits or disables that hook, ensuring the new flag's functionality is fully verified.
skipHook := cli.NewStringSlice("enable-cuda-compat")

@@ -176,6 +177,12 @@ func (m command) build() *cli.Command {
Usage: "Specify a pattern the CSV mount specifications.",
Destination: &opts.csv.ignorePatterns,
},
&cli.StringSliceFlag{
Name: "skip-hook",
Member

What about disable-hook (we could even add an alias as disable-hooks) instead? Thinking ahead, it may be useful to allow users to do something like:

nvidia-ctk cdi generate \
    --disable-hook=update-ldcache \
    --disable-hook=enable-cuda-compat \
    --mode=nvml

Even further forward, more conservative users may want to consider:

nvidia-ctk cdi generate --disable-hooks=all

although that is out of scope of this PR.

Member

@klueska @jgehrcke what are your thoughts on the UX?

@ArangoGutierrez ArangoGutierrez changed the title Add --skip-hooks flag to cdi generate command Add --disable-hook flag to cdi generate command May 14, 2025
@ArangoGutierrez ArangoGutierrez requested review from elezar and Copilot May 14, 2025 10:09
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for a new flag that allows users to disable specific hooks during the CDI spec generation. Key changes include:

  • Adding conditional logic to check for supported hooks in the NVML driver library.
  • Introducing the new CLI flag (and corresponding option) to allow disabling hooks.
  • Updating tests to reflect the new --disable-hook(s) functionality.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
pkg/nvcdi/driver-nvml.go Wraps hook-based discoverer creation in conditionals based on hook support.
cmd/nvidia-ctk/cdi/generate/generate_test.go Adds test cases and sample disable-hook values in accordance with the new functionality.
cmd/nvidia-ctk/cdi/generate/generate.go Introduces a new flag and integrates hook disabling into the initialization options.
Comments suppressed due to low confidence (1)

cmd/nvidia-ctk/cdi/generate/generate.go:181

  • [nitpick] Consider renaming the flag from "disable-hook" to "disable-hooks" to match the PR title and description for consistency.
Name:        "disable-hook",

@ArangoGutierrez ArangoGutierrez force-pushed the 1074 branch 2 times, most recently from 64375df to b32d8a9 Compare May 14, 2025 10:37
pkg/nvcdi/api.go Outdated
)

// NewHookName takes a string and returns a []HookName, empty if the HookName
Member

This is not the right place to implement this. We have a pkg/nvcdi/hooks.go file where this can go instead.

Also, in terms of the implementation, does adding special handling to the disabledHooks type for the all hook not make this cleaner?

Collaborator Author

Moved

Comment on lines 288 to 293
if len(opts.disableHooks.Value()) > 0 {
	for _, hook := range opts.disableHooks.Value() {
		for _, hookName := range nvcdi.NewHookName(hook) {
			initOpts = append(initOpts, nvcdi.WithDisabledHook(hookName))
		}
	}
}
Member

@elezar elezar May 15, 2025

The len check is not required. One can always iterate over an empty slice.

Suggested change
-if len(opts.disableHooks.Value()) > 0 {
-	for _, hook := range opts.disableHooks.Value() {
-		for _, hookName := range nvcdi.NewHookName(hook) {
-			initOpts = append(initOpts, nvcdi.WithDisabledHook(hookName))
-		}
-	}
-}
+for _, hook := range opts.disableHooks.Value() {
+	initOpts = append(initOpts, nvcdi.WithDisabledHook(hook))
+}

(note that I have updated WithDisabledHook to:

// WithDisabledHook allows specific hooks to be disabled.
// This option can be specified multiple times for each hook.
func WithDisabledHook[T string | HookName](hook T) Option {
	return func(o *nvcdilib) {
		if o.disabledHooks == nil {
			o.disabledHooks = make(map[HookName]bool)
		}
		o.disabledHooks[HookName(hook)] = true
	}
}

to allow it to accept both string and HookName arguments.)
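
As a rough illustration of the two points in this comment (ranging over an empty slice is a no-op, and a generic option can take either a string or a HookName), here is a minimal, self-contained sketch. The `lib` type, the `optionsFor` and `apply` helpers, and the `enableCudaCompat` constant name are assumptions made for this example and are not taken from the nvcdi package.

```go
package main

import "fmt"

// HookName identifies a CDI hook by name.
type HookName string

// enableCudaCompat is an illustrative constant; the real constant names may differ.
const enableCudaCompat HookName = "enable-cuda-compat"

// lib is a hypothetical stand-in for the nvcdilib type shown above.
type lib struct {
	disabledHooks map[HookName]bool
}

// Option configures a lib.
type Option func(*lib)

// WithDisabledHook accepts either a plain string or a HookName.
// It can be specified multiple times, once per hook.
func WithDisabledHook[T string | HookName](hook T) Option {
	return func(o *lib) {
		if o.disabledHooks == nil {
			o.disabledHooks = make(map[HookName]bool)
		}
		o.disabledHooks[HookName(hook)] = true
	}
}

// optionsFor (hypothetical helper) converts flag values into options.
// No length check is needed: ranging over a nil or empty slice runs
// the loop body zero times.
func optionsFor(values []string) []Option {
	var opts []Option
	for _, hook := range values {
		opts = append(opts, WithDisabledHook(hook))
	}
	return opts
}

// apply (hypothetical helper) builds a lib from the given options.
func apply(opts ...Option) *lib {
	l := &lib{}
	for _, opt := range opts {
		opt(l)
	}
	return l
}

func main() {
	opts := optionsFor([]string{"update-ldcache"})            // T inferred as string
	opts = append(opts, WithDisabledHook(enableCudaCompat)) // T inferred as HookName
	l := apply(opts...)
	fmt.Println(len(l.disabledHooks), len(optionsFor(nil)))
}
```

The union constraint lets CLI code pass raw flag strings while API users pass typed constants, without a conversion at either call site.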

Collaborator Author

Suggestion taken

@@ -57,6 +57,7 @@ type options struct {

configSearchPaths cli.StringSlice
librarySearchPaths cli.StringSlice
disableHooks cli.StringSlice
Member

Suggested change
-disableHooks cli.StringSlice
+disabledHooks cli.StringSlice

Collaborator Author

Done

l.nvidiaCDIHookPath,
)
discoverers = append(discoverers, driverDotSoSymlinksDiscoverer)
if l.HookIsSupported(HookCreateSymlinks) {
Member

For this hook specifically (and the update-ldcache hook too), there are many places in the code where we are injecting this hook. As such this is not going to be sufficient to prevent their injection. This was why I suggested that handling these hooks is out of scope for this PR.

}

-func NewHookCreator(nvidiaCDIHookPath string) HookCreator {
+func NewHookCreator(nvidiaCDIHookPath string, disabledHooks DisabledHooks) HookCreator {
Member

@elezar elezar May 23, 2025

What about using:

Suggested change
-func NewHookCreator(nvidiaCDIHookPath string, disabledHooks DisabledHooks) HookCreator {
+func NewHookCreator[T string|HookName](nvidiaCDIHookPath string, disabledHooks ...T) HookCreator {

The fact that we're using a map should be internal to this package and not exposed through the interface.
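
To make the suggestion concrete, the following is a minimal sketch of a variadic generic constructor that keeps the map as an internal detail. The `hookCreator` struct and the lower-case `newHookCreator` name are assumptions for illustration only; they are not the code that was merged (the thread below settles on functional options instead).

```go
package main

import "fmt"

// HookName identifies a CDI hook by name.
type HookName string

// hookCreator is a hypothetical type; the map is not visible to callers.
type hookCreator struct {
	path     string
	disabled map[HookName]bool
}

// newHookCreator accepts the disabled hooks as variadic arguments and
// builds the lookup map internally.
func newHookCreator[T string | HookName](path string, disabledHooks ...T) *hookCreator {
	disabled := make(map[HookName]bool, len(disabledHooks))
	for _, h := range disabledHooks {
		disabled[HookName(h)] = true
	}
	return &hookCreator{path: path, disabled: disabled}
}

func main() {
	c := newHookCreator("/usr/bin/nvidia-cdi-hook", "update-ldcache", "enable-cuda-compat")
	fmt.Println(len(c.disabled))

	// With no variadic arguments T cannot be inferred, so it must be given explicitly.
	none := newHookCreator[HookName]("/usr/bin/nvidia-cdi-hook")
	fmt.Println(len(none.disabled))
}
```

One trade-off of this signature is visible in the second call: when no hooks are disabled, the caller has to spell out the type argument, which an options-based constructor avoids.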

Collaborator Author

I used the options route instead; let me know what you think.

Member

Looks good. I have added a commit on top that also sets the hook path as an option for consistency.

Comment on lines 52 to 58
func (d DisabledHooks) Set(value HookName) {
	if value == "all" {
		for _, hook := range AllHooks {
			d[hook] = true
		}
		return
	}

	d[value] = true
}
Member

Instead of this, we can add an isDisabled(HookName) function (something like we had before in IsHookSupported that has been removed). We can implement it as follows:

// isDisabled checks whether the hook with the specified name is disabled.
// Hooks must be explicitly disabled, meaning that if no hooks are disabled,
// all hooks are supported.
func (d disabledHooks) isDisabled(h HookName) bool {
	if d["all"] {
		return true
	}
	return d[h]
} 
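
To see how this lookup behaves, here is a small runnable sketch built around the same method. It relies on the fact that reading from a nil map in Go returns the zero value, so an unset `disabledHooks` means nothing is disabled. The hook names used in `main` are taken from this thread; everything else is illustrative.

```go
package main

import "fmt"

// HookName identifies a CDI hook by name.
type HookName string

// disabledHooks records which hooks were explicitly disabled.
type disabledHooks map[HookName]bool

// isDisabled reports whether h is disabled, either directly or through
// the special "all" entry.
func (d disabledHooks) isDisabled(h HookName) bool {
	if d["all"] {
		return true
	}
	return d[h]
}

func main() {
	var none disabledHooks // nil map: lookups return false
	all := disabledHooks{"all": true}
	some := disabledHooks{"update-ldcache": true}

	fmt.Println(none.isDisabled("update-ldcache"))
	fmt.Println(all.isDisabled("create-symlinks"))
	fmt.Println(some.isDisabled("update-ldcache"), some.isDisabled("enable-cuda-compat"))
}
```

Compared with expanding "all" into every known hook at `Set` time, this keeps the stored state equal to what the user asked for and still covers hooks added later.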

Collaborator Author

Func removed; it was no longer needed after moving to an options implementation.

@ArangoGutierrez ArangoGutierrez requested a review from elezar May 23, 2025 09:08
@ArangoGutierrez ArangoGutierrez force-pushed the 1074 branch 2 times, most recently from 9aa88ea to f72732e Compare May 23, 2025 09:12
@@ -270,7 +270,7 @@ func (m command) generateSpec(opts *options) (spec.Interface, error) {
deviceNamers = append(deviceNamers, deviceNamer)
}

-initOpts := []nvcdi.Option{
+cdiOptions := []nvcdi.Option{
Member

nit: initOpts was too generic.

@@ -127,7 +124,7 @@ containerEdits:
vendor: "example.com",
class: "device",
driverRoot: driverRoot,
-disabledHooks: *disableHook1,
+disabledHooks: valueOf(cli.NewStringSlice("enable-cuda-compat")),
Member

The use of arbitrary names for the variables made the intent of the tests more difficult to understand. A valueOf helper allows us to return the value of a string slice and list the disabled hooks in the tests.
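
The helper itself is not shown in this thread. One plausible shape, assuming it simply dereferences the pointer returned by the slice constructor, is sketched below. The `stringSlice` type and `newStringSlice` function are hypothetical stand-ins for `cli.StringSlice` and `cli.NewStringSlice` so that the example has no external dependencies.

```go
package main

import "fmt"

// stringSlice is a hypothetical stand-in for cli.StringSlice.
type stringSlice struct {
	values []string
}

// newStringSlice mirrors a constructor that returns a pointer.
func newStringSlice(values ...string) *stringSlice {
	return &stringSlice{values: values}
}

// valueOf returns the value a pointer refers to, so that a constructor
// returning *T can be used inline where a T is expected.
// This is an assumed implementation, not necessarily the one in the PR.
func valueOf[T any](p *T) T {
	return *p
}

// options mimics a struct that stores the slice by value.
type options struct {
	disabledHooks stringSlice
}

func main() {
	// The disabled hooks are listed right where the expectation is declared,
	// instead of behind a separately named variable.
	opts := options{disabledHooks: valueOf(newStringSlice("enable-cuda-compat"))}
	fmt.Println(opts.disabledHooks.values)
}
```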

// A HookName represents a supported CDI hook.
type HookName string

const (
Member

I made the following changes:

  • moved the constants earlier in the file as this is common practice in golang
  • renamed the types *Hook instead of Hook*. This allows the code at the callsite to read more naturally. It also seems more idiomatic (https://google.github.io/styleguide/go/decisions.html#constant-names).
  • added an AllHooks constant.
  • added a deprecated ChmodHook constant.

// cache inside the directory path to be mounted into a container.
HookUpdateLDCache = HookName("update-ldcache")
)
type cdiHookCreator struct {
Member

I made the type private and renamed it for clarity.

type HookName string

// disabledHooks allows individual hooks to be disabled.
type disabledHooks map[HookName]bool
Member

The type was not used since we were not attaching methods to it.

HookEnableCudaCompat,
HookCreateSymlinks,
HookUpdateLDCache,
fixedArgs []string
Member

Instead of checking the path on every Create invocation, we check it once at construction.

}

type Option func(*CDIHook)
// An allDisabledHookCreator is a HookCreator that does not create any hooks.
Member

Instead of checking disabledHooks on every call to Create, we instantiate a type that creates no hooks. (This is the null object pattern).
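
A minimal sketch of the null object pattern described here, under simplified assumptions: the `Hook` struct, the `Create` signature, and the `newHookCreator` constructor are reduced stand-ins written for this example and do not reproduce the merged code.

```go
package main

import "fmt"

// HookName identifies a CDI hook by name.
type HookName string

// Hook is a simplified stand-in for a CDI hook entry.
type Hook struct {
	Path string
	Args []string
}

// HookCreator creates hooks by name; a nil result means "no hook".
type HookCreator interface {
	Create(name HookName, args ...string) *Hook
}

// allDisabledHookCreator is the null object: it never creates a hook.
type allDisabledHookCreator struct{}

func (allDisabledHookCreator) Create(HookName, ...string) *Hook { return nil }

// cdiHookCreator creates hooks unless they are individually disabled.
type cdiHookCreator struct {
	path     string
	disabled map[HookName]bool
}

func (c cdiHookCreator) Create(name HookName, args ...string) *Hook {
	if c.disabled[name] {
		return nil
	}
	return &Hook{
		Path: c.path,
		Args: append([]string{"nvidia-cdi-hook", string(name)}, args...),
	}
}

// newHookCreator decides once, at construction, which implementation to use.
func newHookCreator(path string, disabled ...HookName) HookCreator {
	m := make(map[HookName]bool, len(disabled))
	for _, h := range disabled {
		if h == "all" {
			return allDisabledHookCreator{}
		}
		m[h] = true
	}
	return cdiHookCreator{path: path, disabled: m}
}

func main() {
	none := newHookCreator("/usr/bin/nvidia-cdi-hook", "all")
	fmt.Println(none.Create("update-ldcache") == nil)

	some := newHookCreator("/usr/bin/nvidia-cdi-hook", "enable-cuda-compat")
	fmt.Println(some.Create("update-ldcache", "--folder", "/usr/lib64").Args)
}
```

Callers only ever see the `HookCreator` interface, so the "all hooks disabled" case needs no branch in the `Create` path.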

}

// isDisabled checks if the specified hook name is disabled.
func (c cdiHookCreator) isDisabled(name HookName, args ...string) bool {
Member

Moved the isDisabled check to a method and made this dependent on the args as well. This allows for the create-symlinks and chmod hooks to be handled more cleanly.
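
The body of this method is not shown in the thread. The sketch below illustrates one way an args-dependent check could work, under the assumption that a create-symlinks or chmod hook with no arguments has nothing to do and can therefore be treated as disabled. That rule, and the struct layout, are guesses made for illustration.

```go
package main

import "fmt"

// HookName identifies a CDI hook by name.
type HookName string

// cdiHookCreator is reduced to the one field this sketch needs.
type cdiHookCreator struct {
	disabledHooks map[HookName]bool
}

// isDisabled checks if the specified hook name is disabled.
// Assumption: hooks that only act on their arguments (create-symlinks, chmod)
// are treated as disabled when called without arguments.
func (c cdiHookCreator) isDisabled(name HookName, args ...string) bool {
	if c.disabledHooks[name] {
		return true
	}
	switch name {
	case "create-symlinks", "chmod":
		return len(args) == 0
	}
	return false
}

func main() {
	c := cdiHookCreator{disabledHooks: map[HookName]bool{"enable-cuda-compat": true}}
	fmt.Println(c.isDisabled("enable-cuda-compat"))
	fmt.Println(c.isDisabled("create-symlinks"))
	fmt.Println(c.isDisabled("create-symlinks", "libcuda.so.1::/usr/lib64/libcuda.so"))
	fmt.Println(c.isDisabled("update-ldcache"))
}
```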

return append(c.fixedArgs, string(name))
}

func (c cdiHookCreator) transformArgs(name HookName, args ...string) []string {
Member

Added a function to handle the per-hook transforms.

@@ -113,7 +113,7 @@ func TestWithWithDriverDotSoSymlinks(t *testing.T) {
expectedHooks: []Hook{
{
Lifecycle: "createContainer",
-Path: "/path/to/nvidia-cdi-hook",
+Path: "/usr/bin/nvidia-cdi-hook",
Member

Using a specific path here wasn't adding value. Changed the tests to check the default.

@elezar elezar force-pushed the 1074 branch 3 times, most recently from c6428c1 to caf855b Compare June 3, 2025 04:43
@elezar elezar changed the title Add --disable-hook flag to cdi generate command Add the ability to disable specific (or all) CDI hooks Jun 3, 2025
@elezar elezar changed the title Add the ability to disable specific (or all) CDI hooks Add the ability to disable specific (or all) CDI hooks when generating a CDI specification Jun 3, 2025
This change adds the ability to disable specific (or all) CDI hooks to
both the nvidia-ctk cdi generate command and the nvcdi API.

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar merged commit d59fd3d into NVIDIA:main Jun 3, 2025
13 checks passed
@ArangoGutierrez ArangoGutierrez added this to the v1.18.0 milestone Jun 3, 2025
Successfully merging this pull request may close these issues.

Add a flag to nvidia-ctk cdi generate to allow users to specificy which Hooks to be skipped
3 participants