Add Improved Error Recovery Options To Watcher Webhook Actions #128498

@lukewhiting

Description

Currently, Watcher fails fast if there is any problem reaching an upstream webhook, which ends processing of the entire watch.

We should introduce a new flag in the webhook action options to enable a fixed number of retries, with exponential or random backoff, that allows the webhook to be re-sent if the upstream error is recoverable (HTTP 500s, TCP RSTs, etc.).
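To make the proposal concrete, here is a rough, language-neutral sketch in Python of a bounded retry loop with exponential backoff and optional random jitter. It is not Watcher code: `send_with_retries`, `backoff_delay`, `RETRYABLE_STATUS`, and the default values are all hypothetical, and the choice of which status codes count as recoverable is an assumption made only for illustration.

```python
import random
import time
from typing import Callable

# Assumption for illustration: treat these 5xx responses as recoverable.
RETRYABLE_STATUS = frozenset({500, 502, 503, 504})


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = True,
                  rng: Callable[[], float] = random.random) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    Exponential growth (base * 2**attempt), capped at `cap`. With jitter
    enabled, a random fraction of that delay is used instead.
    """
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay = rng() * delay
    return delay


def send_with_retries(send: Callable[[], int], max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep,
                      **backoff) -> int:
    """Call `send` until it succeeds or `max_retries` retries are used up.

    `send` is a hypothetical callable that returns an HTTP status code or
    raises ConnectionError (which covers connection resets in Python).
    """
    attempt = 0
    while True:
        try:
            status = send()
            if status not in RETRYABLE_STATUS:
                return status          # success or a non-recoverable error
            failure = status
        except ConnectionError as exc:
            failure = exc              # network-level fault, worth retrying

        if attempt >= max_retries:
            # Retries exhausted: surface the last failure to the caller.
            if isinstance(failure, Exception):
                raise failure
            return failure

        sleep(backoff_delay(attempt, **backoff))
        attempt += 1
```

With `max_retries=3` the webhook is attempted at most four times in total. Passing `sleep` and `rng` as parameters is only there to keep the sketch testable without real waiting.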

We would need to decide which errors to handle, and whether to use a blacklist approach (explicitly defining which errors not to retry) or a whitelist of errors that are allowed to retry.
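The difference between the two approaches can be shown with a small sketch. This is an illustration only, not a proposed Watcher API: `is_retryable` and its `allow` / `deny` parameters are hypothetical names, and treating any status of 400 or above as a failure is an assumption.

```python
from typing import FrozenSet, Optional


def is_retryable(status: int,
                 allow: Optional[FrozenSet[int]] = None,
                 deny: Optional[FrozenSet[int]] = None) -> bool:
    """Decide whether an HTTP status should trigger a retry.

    Whitelist mode (`allow`): retry only the listed statuses.
    Blacklist mode (`deny`): retry any failure (>= 400) except those listed.
    With neither set, nothing is retried, matching today's fail-fast behaviour.
    """
    if allow is not None and deny is not None:
        raise ValueError("configure either allow or deny, not both")
    if allow is not None:
        return status in allow
    if deny is not None:
        return status >= 400 and status not in deny
    return False


# Example lists, chosen only for illustration.
ALLOW = frozenset({500, 502, 503, 504})
DENY = frozenset({400, 401, 403, 404})

print(is_retryable(429, allow=ALLOW))  # whitelist: 429 is not listed
print(is_retryable(429, deny=DENY))    # blacklist: 429 is not excluded
```

The two modes differ mainly in how they treat statuses nobody thought about in advance: a whitelist skips them, while a blacklist retries them.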

This addresses an issue some users have seen, where intermittent networking faults prevent Watcher alerts from being sent to cloud services such as Slack and Teams.
