Datadog Monitor for SLO fails to update when SLO parameters change #3022

Open
eimjustas opened this issue May 22, 2025 · 0 comments
Datadog Terraform Provider Version

3.40.0

Terraform Version

1.5.7

What resources or data sources are affected?

datadog_service_level_objective
datadog_monitor (specifically SLO alert type)

Terraform Configuration Files

# SLO Resource
resource "datadog_service_level_objective" "metric_based_slo" {
  name        = "Example SLO"
  type        = "metric"
  
  query {
    numerator   = "count:trace.http.request{status:200}.as_count()"
    denominator = "count:trace.http.request{}.as_count()"
  }

  thresholds {
    timeframe = var.timeframe  # Changed from "7d" to "30d"
    target    = var.target     # Changed from 99.9 to 99.95
  }

  tags = ["service:example"]
}

# Monitor Resource
module "datadog_slo_alert" {
  depends_on = [datadog_service_level_objective.metric_based_slo]
  source     = "../datadog_slo_alert"
  slo_id     = datadog_service_level_objective.metric_based_slo.id
  slo_name   = datadog_service_level_objective.metric_based_slo.name # used in the monitor name below
  timeframe  = var.timeframe
  slo_target = var.target
}

# In the datadog_slo_alert module:
resource "datadog_monitor" "slo_alert" {
  name    = "SLO Alert - ${var.slo_name}"
  type    = "slo alert"
  
  query = format(
    "burn_rate(\"%s\").over(\"%s\").long_window(\"1h\").short_window(\"5m\") > %s",
    var.slo_id,
    var.timeframe,
    local.critical_threshold
  )
}

Relevant debug or panic output

Error: error validating monitor from /api/v1/monitor/22717999/validate: 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid: Slo '8f1fae59f10c51a28fd338fe08d1bec8' does not have timeframe '30d'."]}

with module.custom_slos.module.fee_service_member_facing_api_slo.module.datadog_slo_alert.datadog_monitor.slo_alert,
on modules/datadog_slo_alert/main.tf line 14, in resource "datadog_monitor" "slo_alert":
14: resource "datadog_monitor" "slo_alert"

Expected Behavior

When an SLO's parameters (timeframe or target) are changed, any monitors that reference this SLO should be automatically recreated during the Terraform apply phase.

The provider should handle the dependency correctly by:

  1. Recognizing that the SLO parameters have changed
  2. Automatically recreating the monitor that references the SLO
  3. Ensuring the SLO is updated before attempting to update/recreate the monitor

Actual Behavior

When an SLO's timeframe or target is changed, both terraform plan and terraform apply fail with a validation error. The Datadog API rejects the monitor's new query with 400 Bad Request because the SLO does not yet have the new timeframe or target.

This creates a chicken-and-egg problem:

  1. The SLO needs to be updated first with the new parameters
  2. But the monitor that references the SLO needs to be recreated to use the new parameters
  3. Terraform plans both changes together, so the monitor's new query is validated against the SLO before the SLO has been updated, and the plan fails
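
One possible way around the ordering problem is to stage the change: add the new timeframe to the SLO alongside the existing one, apply, and only switch the monitor's timeframe in a later apply. The sketch below shows the first stage only. It has not been verified against the provider, and `var.slo_thresholds` is a hypothetical variable that does not appear in the original configuration.

```hcl
# Hypothetical variable: every timeframe/target pair the SLO should carry.
variable "slo_thresholds" {
  type = list(object({
    timeframe = string
    target    = number
  }))
  default = [
    { timeframe = "7d", target = 99.9 },   # existing threshold, still used by the monitor
    { timeframe = "30d", target = 99.95 }, # new threshold, added before the monitor switches
  ]
}

resource "datadog_service_level_objective" "metric_based_slo" {
  name = "Example SLO"
  type = "metric"

  query {
    numerator   = "count:trace.http.request{status:200}.as_count()"
    denominator = "count:trace.http.request{}.as_count()"
  }

  # One thresholds block per entry, so both timeframes exist on the SLO
  # while the monitor still references the old one.
  dynamic "thresholds" {
    for_each = var.slo_thresholds
    content {
      timeframe = thresholds.value.timeframe
      target    = thresholds.value.target
    }
  }

  tags = ["service:example"]
}
```

Once the monitor has moved to the new timeframe, the old entry could be dropped from the list in a further apply. This trades a single apply for two or three, so it is a workaround and not the behavior requested in this issue.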

Steps to Reproduce

  1. Create an SLO with specific parameters (e.g., timeframe="7d", target=99.9)
  2. Create a monitor that references this SLO using burn_rate() in the query
  3. Change the SLO parameters (e.g., timeframe="30d" or target=99.95)
  4. Run terraform plan or terraform apply
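
For reference, the change in step 3 can be expressed as variable defaults. `var.timeframe` and `var.target` are the names used in the configuration above; the declarations themselves are assumed, since the report does not show them.

```hcl
# Assumed declarations for the variables referenced in the configuration.
variable "timeframe" {
  type    = string
  default = "7d" # step 3: change to "30d"
}

variable "target" {
  type    = number
  default = 99.9 # step 3: change to 99.95
}
```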

Important Factoids

We've already tried using depends_on to ensure the SLO is created before the monitor, but this doesn't help with updates.
We've attempted several workarounds:

  1. Using create_before_destroy - Terraform still tries to validate the monitor update before deciding to recreate it.
  2. Using replace_triggered_by - Failed with "Only resources, count.index, and each.key may be used in replace_triggered_by."
  3. Current workaround (hacky) - We're embedding a hash of the SLO parameters in the monitor name and query to force recreation, but this pollutes the UI with implementation details.
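
The `replace_triggered_by` error in item 2 is what Terraform reports when the expression refers to something other than a managed resource, such as an input variable. A minimal sketch of a form Terraform should accept, assuming the module inputs shown earlier, routes the variables through a `terraform_data` resource (built into Terraform 1.4 and later). The resource name `slo_params` and the `message` value are placeholders. Whether this also avoids the plan-time validation error has not been confirmed here, since the provider may still validate the new query before the replacement is planned.

```hcl
# Hypothetical additions inside the datadog_slo_alert module.

# Replaced whenever either SLO input changes.
resource "terraform_data" "slo_params" {
  triggers_replace = [var.timeframe, var.slo_target]
}

resource "datadog_monitor" "slo_alert" {
  name    = "SLO Alert - ${var.slo_name}"
  type    = "slo alert"
  message = "SLO burn rate alert" # placeholder; the real message is not shown in the report

  query = format(
    "burn_rate(\"%s\").over(\"%s\").long_window(\"1h\").short_window(\"5m\") > %s",
    var.slo_id,
    var.timeframe,
    local.critical_threshold
  )

  lifecycle {
    # A managed resource in the same module is a valid reference here;
    # input variables such as var.timeframe are not.
    replace_triggered_by = [terraform_data.slo_params]
  }
}
```

Unlike the hash-in-name workaround, this keeps the monitor name and query free of implementation details.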

References

No response

@eimjustas eimjustas added the bug label May 22, 2025