Datadog Monitor for SLO fails to update when SLO parameters change #3022

Open
@eimjustas

Description

Datadog Terraform Provider Version

3.40.0

Terraform Version

1.5.7

What resources or data sources are affected?

datadog_service_level_objective
datadog_monitor (specifically SLO alert type)

Terraform Configuration Files

# SLO Resource
resource "datadog_service_level_objective" "metric_based_slo" {
  name        = "Example SLO"
  type        = "metric"
  
  query {
    numerator   = "count:trace.http.request{status:200}.as_count()"
    denominator = "count:trace.http.request{}.as_count()"
  }

  thresholds {
    timeframe = var.timeframe  # Changed from "7d" to "30d"
    target    = var.target     # Changed from 99.9 to 99.95
  }

  tags = ["service:example"]
}

# Monitor Resource
module "datadog_slo_alert" {
  depends_on = [datadog_service_level_objective.metric_based_slo]
  source     = "../datadog_slo_alert"
  slo_id     = datadog_service_level_objective.metric_based_slo.id
  timeframe  = var.timeframe
  slo_target = var.target
}

# In the datadog_slo_alert module:
resource "datadog_monitor" "slo_alert" {
  name    = "SLO Alert - ${var.slo_name}"
  type    = "slo alert"
  
  query = format(
    "burn_rate(\"%s\").over(\"%s\").long_window(\"1h\").short_window(\"5m\") > %s",
    var.slo_id,
    var.timeframe,
    local.critical_threshold
  )
}

Relevant debug or panic output

Error: error validating monitor from /api/v1/monitor/22717999/validate: 400 Bad Request: {"errors":["The value provided for parameter 'query' is invalid: Slo '8f1fae59f10c51a28fd338fe08d1bec8' does not have timeframe '30d'."]}

with module.custom_slos.module.fee_service_member_facing_api_slo.module.datadog_slo_alert.datadog_monitor.slo_alert,
on modules/datadog_slo_alert/main.tf line 14, in resource "datadog_monitor" "slo_alert":
14: resource "datadog_monitor" "slo_alert"

Expected Behavior

When an SLO's parameters (timeframe or target) are changed, any monitors that reference this SLO should be automatically recreated during the Terraform apply phase.

The provider should handle the dependency correctly by:

  1. Recognizing that the SLO parameters have changed
  2. Automatically recreating the monitor that references the SLO
  3. Ensuring the SLO is updated before attempting to update/recreate the monitor

Actual Behavior

When an SLO's parameters (timeframe or target) are changed, terraform plan/apply fails with a validation error: the Datadog API rejects the monitor update with a 400 Bad Request, stating that the SLO does not have the new timeframe or target.

This creates a chicken-and-egg problem:

  1. The SLO needs to be updated first with the new parameters
  2. But the monitor that references the SLO needs to be recreated to use the new parameters
  3. Terraform tries to update both resources in a single plan, which fails
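
One possible way to sidestep the ordering problem above, sketched under assumptions: the `datadog_monitor` resource documents a `validate` argument that, when set to `false`, skips the validation call made during plan. Whether this avoids the failure on provider 3.40.0 has not been verified in this report, and the `message` value below is a placeholder that does not come from the original module.

```hcl
# Sketch only: not verified against the failure described in this issue.
resource "datadog_monitor" "slo_alert" {
  name    = "SLO Alert - ${var.slo_name}"
  type    = "slo alert"
  message = "SLO burn rate alert" # placeholder text, not from the original module

  query = format(
    "burn_rate(\"%s\").over(\"%s\").long_window(\"1h\").short_window(\"5m\") > %s",
    var.slo_id,
    var.timeframe,
    local.critical_threshold
  )

  # Assumption: skipping plan-time validation lets the SLO change be applied
  # before the Datadog API ever checks the monitor query.
  validate = false
}
```

The trade-off is that an invalid query would then only surface at apply time rather than during plan.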

Steps to Reproduce

  1. Create an SLO with specific parameters (e.g., timeframe="7d", target=99.9)
  2. Create a monitor that references this SLO using burn_rate() in the query
  3. Change the SLO parameters (e.g., timeframe="30d" or target=99.95)
  4. Run terraform plan or terraform apply
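
For steps 1 and 3, the SLO parameters can be driven from input variables. The report only shows `var.timeframe` and `var.target` being referenced, so the declarations and defaults below are an assumed minimal setup matching the values in the steps, not the reporter's actual variables file.

```hcl
# Assumed variable declarations; defaults match step 1 of the reproduction.
variable "timeframe" {
  type    = string
  default = "7d"
}

variable "target" {
  type    = number
  default = 99.9
}

# Step 3 can then be reproduced by overriding the defaults, for example:
#   terraform plan -var 'timeframe=30d' -var 'target=99.95'
```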

Important Factoids

We've already tried using depends_on to ensure the SLO is created before the monitor, but this doesn't help with updates.
We've attempted several workarounds:

  1. Using create_before_destroy - Terraform still tries to validate the monitor update before deciding to recreate it.
  2. Using replace_triggered_by - Failed with "Only resources, count.index, and each.key may be used in replace_triggered_by."
  3. Current workaround (hacky) - We're embedding a hash of the SLO parameters in the monitor name and query to force recreation, but this pollutes the UI with implementation details.
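
As a possible alternative to embedding a hash in the monitor name, `replace_triggered_by` can reference a resource even though it cannot reference a variable directly. The sketch below assumes Terraform 1.4 or later (for the built-in `terraform_data` resource) and wraps the module's `var.timeframe` and `var.slo_target` inputs in one. The resource name `slo_params` and the `message` text are hypothetical. It is not confirmed here that this gets past the plan-time validation error on its own.

```hcl
# Hypothetical helper resource: it is replaced whenever either SLO parameter changes.
resource "terraform_data" "slo_params" {
  triggers_replace = [var.timeframe, var.slo_target]
}

resource "datadog_monitor" "slo_alert" {
  name    = "SLO Alert - ${var.slo_name}"
  type    = "slo alert"
  message = "SLO burn rate alert" # placeholder text, not from the original module

  query = format(
    "burn_rate(\"%s\").over(\"%s\").long_window(\"1h\").short_window(\"5m\") > %s",
    var.slo_id,
    var.timeframe,
    local.critical_threshold
  )

  lifecycle {
    # A resource reference is allowed here, unlike var.* values.
    replace_triggered_by = [terraform_data.slo_params]
  }
}
```

This keeps the implementation detail out of the monitor name and query, at the cost of one extra resource in state per monitor.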

References

No response
