Open
Description
Currently, Watcher fails fast if there is any problem reaching an upstream webhook, which ends processing of the entire watch.
We should introduce a new flag in the webhook action options to enable a fixed number of retries, with exponential or random backoff, so that the webhook is re-sent if the upstream error is recoverable (HTTP 500s, TCP RSTs, etc.).
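To make the proposal concrete, here is a rough Python sketch of a bounded retry loop with exponential backoff and optional random jitter. It is not Watcher code: `RecoverableError`, `backoff_delay`, `send_with_retries` and all parameter names and defaults are hypothetical, chosen only to illustrate the behaviour being requested.

```python
import random
import time


class RecoverableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 500, TCP RST)."""


def backoff_delay(attempt, base=0.5, cap=30.0, jitter=True, rng=random.random):
    """Seconds to wait before retry number `attempt` (1-based).

    The delay doubles on each retry and is capped at `cap`. With jitter
    enabled, a random fraction of that delay is used instead.
    """
    delay = min(cap, base * (2 ** (attempt - 1)))
    return delay * rng() if jitter else delay


def send_with_retries(send, max_retries=3, sleep=time.sleep, **backoff):
    """Call send(); on RecoverableError, retry up to `max_retries` more times.

    Any other exception propagates immediately, so non-recoverable
    failures still fail fast as they do today.
    """
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RecoverableError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(attempt + 1, **backoff))
```

Setting `max_retries=0` would keep the current fail-fast behaviour, which suggests the new option could default to off without changing existing watches.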
We would need to decide which errors to handle, and whether to use a blacklist approach (explicitly defining which errors must not be retried) or a whitelist approach (explicitly defining which errors may be retried).
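The difference between the two approaches can be sketched as follows, again in Python and purely for illustration. The status code sets and the `is_retryable` helper are assumptions, not an agreed list; the point is how each mode treats a code that appears in neither set.

```python
# Illustrative sets only; the actual lists are what this issue needs to decide.
WHITELIST_RETRY = frozenset({500, 502, 503, 504})
BLACKLIST_NO_RETRY = frozenset({400, 401, 403, 404})


def is_retryable(status, mode="whitelist"):
    """Decide whether an HTTP error status should trigger a retry.

    whitelist: retry only statuses explicitly listed as retryable.
    blacklist: retry any error status (>= 400) not explicitly excluded.
    """
    if mode == "whitelist":
        return status in WHITELIST_RETRY
    if mode == "blacklist":
        return status >= 400 and status not in BLACKLIST_NO_RETRY
    raise ValueError(f"unknown mode: {mode!r}")
```

With these example sets, a 429 response is not retried in whitelist mode but is retried in blacklist mode. The whitelist is the more conservative choice, while the blacklist automatically covers error codes nobody thought to list.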
This addresses an issue some users have seen, where intermittent networking faults prevent Watcher alerts from being sent to cloud services such as Slack and Teams.