Skip to content

Run preflights in manager UI / API #2234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 42 commits into from
Jun 6, 2025
Merged

Conversation

sgalsaleh
Copy link
Member

@sgalsaleh sgalsaleh commented Jun 3, 2025

What this PR does / why we need it:

Added ability to run preflight checks in the manager UI/API, including status monitoring, error handling, and rerun capabilities.

Which issue(s) this PR fixes:

SC-124343
SC-124239
SC-124379
SC-124345

Does this PR require a test?

Yes

Does this PR require a release note?

NONE

Does this PR require documentation?

NONE

@sgalsaleh sgalsaleh force-pushed the run-preflights-in-manager-ui branch from ad9ed44 to 3825c4e Compare June 3, 2025 18:55
Copy link

github-actions bot commented Jun 4, 2025

This PR has been released (on staging) and is available for download with a embedded-cluster-smoke-test-staging-app license ID.

Online Installer:

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci/appver-dev-ff2d783" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Airgap Installer (may take a few minutes before the airgap bundle is built):

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci-airgap/appver-dev-ff2d783?airgap=true" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Happy debugging!

miaawong added 4 commits June 4, 2025 15:19
handle preflights error
add rerun
add start install btn
@sgalsaleh sgalsaleh marked this pull request as ready for review June 5, 2025 00:12
@sgalsaleh sgalsaleh requested review from emosbaugh and JGAntunes June 5, 2025 00:12
@sgalsaleh sgalsaleh changed the title Run preflights in manager UI Run preflights in manager UI / API Jun 5, 2025
Copy link
Member

@JGAntunes JGAntunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't finished reviewing but submitting what I have right now.

Comment on lines +166 to +174
installController, err := install.NewInstallController(
install.WithRuntimeConfig(api.rc),
install.WithLogger(api.logger),
install.WithHostUtils(api.hostUtils),
install.WithMetricsReporter(api.metricsReporter),
install.WithReleaseData(api.releaseData),
install.WithLicenseFile(api.licenseFile),
install.WithAirgapBundle(api.airgapBundle),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should likely review some of these struct properties to understand if they need to be pointers to structs and we're making that choice deliberately or not. Right now I'm worried that we receive pointers to structs that we then pass to the controllers (and maybe these can even end up being used by things such as managers). If some end up unintentionally mutating state we can be in trouble trying to debug a particularly airy bug.

a.logError(r, err, "failed to set installation config")
a.jsonError(w, r, err)
return
}

a.getInstall(w, r)
a.getInstallInstallationStatus(w, r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be returning just the InstallationStatus? Maybe it's ok for now but I'm wondering if it would be worthwile sending the config too. Maybe it's the same object entirely so no point in that I guess? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would the client need the config if it just sent it to configure the installation? i would assume it should get the status as a confirmation that it's "running" now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up mutating the config in some way and we want to let the client know (e.g. this doesn't really apply here because the client probably doesn't need to know about it but imagine something like pod-cidr and service-cidr which is calculated by the API). But we can definitely cross that bridge if we ever get there.

Comment on lines +17 to +23
func (m *installationManager) isRunning() (bool, error) {
status, err := m.GetStatus()
if err != nil {
return false, err
}
return status.State == types.StateRunning, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will be a problem right now (and maybe it will never be?), but I wonder if we'll ever need to make this an atomic operation at the store level to protect us from potential problems in the future? Something like m.installationStore.IsStatus(types.StateRunning)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i might agree that it should ideally be atomic, but probably not at the store level. IMO, the store shouldn't have such logic and should simply be an interface for CRUD operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked it over and we both agreed it makes sense to do this at the manager level 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.logError(r, err, "failed to set installation config")
a.jsonError(w, r, err)
return
}

a.getInstall(w, r)
a.getInstallInstallationStatus(w, r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up mutating the config in some way and we want to let the client know (e.g. this doesn't really apply here because the client probably doesn't need to know about it but imagine something like pod-cidr and service-cidr which is calculated by the API). But we can definitely cross that bridge if we ever get there.

Comment on lines +16 to +18
if preflightStatus.State != types.StateFailed && preflightStatus.State != types.StateSucceeded {
return fmt.Errorf("host preflight checks did not complete")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think we should have something within the Status type, something like isFinal that we can call and that would return a bool telling us if the given state is final or not. It's more scoped and we can keep it all under one place.

Comment on lines +20 to +33
if preflightStatus.State == types.StateFailed && c.metricsReporter != nil {
preflightOutput, err := c.GetHostPreflightOutput(ctx)
if err != nil {
return fmt.Errorf("get install host preflight output: %w", err)
}
if preflightOutput != nil {
c.metricsReporter.ReportPreflightsFailed(ctx, preflightOutput)
}
}

// TODO: implement node setup

return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked it over with @sgalsaleh we should gate this at the API level, requiring some property to be sent over to the API.

Comment on lines +45 to +48
// update the runtime config
c.rc.SetDataDir(config.DataDirectory)
c.rc.SetLocalArtifactMirrorPort(config.LocalArtifactMirrorPort)
c.rc.SetAdminConsolePort(config.AdminConsolePort)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked it over with @sgalsaleh and we agreed that ideally we would only need the runtime config towards the end of the installation flow. We even debated about the installation config being the runtime config essentially. However a good milestone to have would be to aim to entirely work with the installation config initially and towards the end of the flow map it out to a runtime config (this would already be helpful).

For now though:

but maybe we add a UpdateFromInstallationConfig method to the runtimeconfig interface

This would already be super helpful since it would mean we have a single place we can track back in our code where we're mutating the runtime config.

}

func (c *InstallController) GetStatus(ctx context.Context) (*types.Status, error) {
return c.install.Status, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked it over and maybe something to look for in the future would be to have generic stores (disk and memory) that we can pass over to each of the individual stores (likely using generics).

Comment on lines +17 to +23
func (m *installationManager) isRunning() (bool, error) {
status, err := m.GetStatus()
if err != nil {
return false, err
}
return status.State == types.StateRunning, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked it over and we both agreed it makes sense to do this at the manager level 👍

return fmt.Errorf("set running status: %w", err)
}

go m.configureForInstall(context.Background(), config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for your reasoning for background here so that it is not mistakenly changed in the future?

Comment on lines +77 to +82
func (s *MemoryStore) IsRunning() bool {
s.mu.RLock()
defer s.mu.RUnlock()

return s.hostPreflight.Status.State == types.StateRunning
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should follow a similar pattern to what we described here and have this be a mutexed method at the manager level - #2234 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +3 to +7
// HostPreflight represents the host preflight checks state
type HostPreflight struct {
Titles []string `json:"titles"`
Output *HostPreflightOutput `json:"output"`
Status *Status `json:"status"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be HostPreflights instead? E.g. the response is InstallHostPreflightsStatusResponse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if hostname != "" {
return fmt.Sprintf("https://%s:%v", hostname, port)
}
ipaddr := runtimeconfig.TryDiscoverPublicIP()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a "now problem" but I was looking through this method the other day, is there any reason why this lives in the runtimeconfig package currently? Seems totally unrelated unless I'm missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason, was thinking the same. we should move it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@JGAntunes JGAntunes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm. There's some feedback that remains to be addressed but given this PR is blocking a lot of work it's something that we can create follow up PRs/tickets to do.

@sgalsaleh
Copy link
Member Author

Merging. Will address feedback in follow up PRs. Also, the failing test is known to be flaky.

@sgalsaleh sgalsaleh merged commit 9bf21ed into main Jun 6, 2025
245 of 252 checks passed
@sgalsaleh sgalsaleh deleted the run-preflights-in-manager-ui branch June 6, 2025 14:04
@sgalsaleh sgalsaleh restored the run-preflights-in-manager-ui branch June 6, 2025 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants