-
Notifications
You must be signed in to change notification settings - Fork 5
Run preflights in manager UI / API #2234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ers' into run-preflights-in-manager-ui
ad9ed44
to
3825c4e
Compare
This PR has been released (on staging) and is available for download with a embedded-cluster-smoke-test-staging-app license ID. Online Installer:
Airgap Installer (may take a few minutes before the airgap bundle is built):
Happy debugging! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't finished reviewing but submitting what I have right now.
installController, err := install.NewInstallController( | ||
install.WithRuntimeConfig(api.rc), | ||
install.WithLogger(api.logger), | ||
install.WithHostUtils(api.hostUtils), | ||
install.WithMetricsReporter(api.metricsReporter), | ||
install.WithReleaseData(api.releaseData), | ||
install.WithLicenseFile(api.licenseFile), | ||
install.WithAirgapBundle(api.airgapBundle), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely review some of these struct properties to understand if they need to be pointers to structs and we're making that choice deliberately or not. Right now I'm worried that we receive pointers to structs that we then pass to the controllers (and maybe these can even end up being used by things such as managers). If some end up unintentionally mutating state we can be in trouble trying to debug a particularly airy bug.
a.logError(r, err, "failed to set installation config") | ||
a.jsonError(w, r, err) | ||
return | ||
} | ||
|
||
a.getInstall(w, r) | ||
a.getInstallInstallationStatus(w, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be returning just the InstallationStatus
? Maybe it's ok for now but I'm wondering if it would be worthwile sending the config too. Maybe it's the same object entirely so no point in that I guess? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would the client need the config if it just sent it to configure the installation? i would assume it should get the status as a confirmation that it's "running" now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up mutating the config in some way and we want to let the client know (e.g. this doesn't really apply here because the client probably doesn't need to know about it but imagine something like pod-cidr
and service-cidr
which is calculated by the API). But we can definitely cross that bridge if we ever get there.
func (m *installationManager) isRunning() (bool, error) { | ||
status, err := m.GetStatus() | ||
if err != nil { | ||
return false, err | ||
} | ||
return status.State == types.StateRunning, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will be a problem right now (and maybe it will never be?), but I wonder if we'll ever need to make this an atomic operation at the store level to protect us from potential problems in the future? Something like m.installationStore.IsStatus(types.StateRunning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i might agree that it should ideally be atomic, but probably not at the store level. IMO, the store shouldn't have such logic and should simply be an interface for CRUD operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked it over and we both agreed it makes sense to do this at the manager level 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up ticket - https://app.shortcut.com/replicated/story/124680
a.logError(r, err, "failed to set installation config") | ||
a.jsonError(w, r, err) | ||
return | ||
} | ||
|
||
a.getInstall(w, r) | ||
a.getInstallInstallationStatus(w, r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up mutating the config in some way and we want to let the client know (e.g. this doesn't really apply here because the client probably doesn't need to know about it but imagine something like pod-cidr
and service-cidr
which is calculated by the API). But we can definitely cross that bridge if we ever get there.
if preflightStatus.State != types.StateFailed && preflightStatus.State != types.StateSucceeded { | ||
return fmt.Errorf("host preflight checks did not complete") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think we should have something within the Status
type, something like isFinal
that we can call and that would return a bool telling us if the given state is final or not. It's more scoped and we can keep it all under one place.
if preflightStatus.State == types.StateFailed && c.metricsReporter != nil { | ||
preflightOutput, err := c.GetHostPreflightOutput(ctx) | ||
if err != nil { | ||
return fmt.Errorf("get install host preflight output: %w", err) | ||
} | ||
if preflightOutput != nil { | ||
c.metricsReporter.ReportPreflightsFailed(ctx, preflightOutput) | ||
} | ||
} | ||
|
||
// TODO: implement node setup | ||
|
||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked it over with @sgalsaleh we should gate this at the API level, requiring some property to be sent over to the API.
// update the runtime config | ||
c.rc.SetDataDir(config.DataDirectory) | ||
c.rc.SetLocalArtifactMirrorPort(config.LocalArtifactMirrorPort) | ||
c.rc.SetAdminConsolePort(config.AdminConsolePort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked it over with @sgalsaleh and we agreed that ideally we would only need the runtime config towards the end of the installation flow. We even debated about the installation config being the runtime config essentially. However a good milestone to have would be to aim to entirely work with the installation config initially and towards the end of the flow map it out to a runtime config (this would already be helpful).
For now though:
but maybe we add a UpdateFromInstallationConfig method to the runtimeconfig interface
This would already be super helpful since it would mean we have a single place we can track back in our code where we're mutating the runtime config.
} | ||
|
||
func (c *InstallController) GetStatus(ctx context.Context) (*types.Status, error) { | ||
return c.install.Status, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked it over and maybe something to look for in the future would be to have generic stores (disk and memory) that we can pass over to each of the individual stores (likely using generics).
func (m *installationManager) isRunning() (bool, error) { | ||
status, err := m.GetStatus() | ||
if err != nil { | ||
return false, err | ||
} | ||
return status.State == types.StateRunning, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked it over and we both agreed it makes sense to do this at the manager level 👍
return fmt.Errorf("set running status: %w", err) | ||
} | ||
|
||
go m.configureForInstall(context.Background(), config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment for your reasoning for background here so that it is not mistakenly changed in the future?
func (s *MemoryStore) IsRunning() bool { | ||
s.mu.RLock() | ||
defer s.mu.RUnlock() | ||
|
||
return s.hostPreflight.Status.State == types.StateRunning | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should follow a similar pattern to what we described here and have this be a mutexed method at the manager level - #2234 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up ticket - https://app.shortcut.com/replicated/story/124680
// HostPreflight represents the host preflight checks state | ||
type HostPreflight struct { | ||
Titles []string `json:"titles"` | ||
Output *HostPreflightOutput `json:"output"` | ||
Status *Status `json:"status"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be HostPreflights
instead? E.g. the response is InstallHostPreflightsStatusResponse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up ticket created - https://app.shortcut.com/replicated/story/124682/rename-hostpreflight-to-hostpreflights
if hostname != "" { | ||
return fmt.Sprintf("https://%s:%v", hostname, port) | ||
} | ||
ipaddr := runtimeconfig.TryDiscoverPublicIP() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a "now problem" but I was looking through this method the other day, is there any reason why this lives in the runtimeconfig
package currently? Seems totally unrelated unless I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no reason, was thinking the same. we should move it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created follow up ticket -https://app.shortcut.com/replicated/story/124684
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm. There's some feedback that remains to be addressed but given this PR is blocking a lot of work it's something that we can create follow up PRs/tickets to do.
Merging. Will address feedback in follow up PRs. Also, the failing test is known to be flaky. |
What this PR does / why we need it:
Added ability to run preflight checks in the manager UI/API, including status monitoring, error handling, and rerun capabilities.
Which issue(s) this PR fixes:
SC-124343
SC-124239
SC-124379
SC-124345
Does this PR require a test?
Yes
Does this PR require a release note?
Does this PR require documentation?
NONE