Description
Agent Environment
Agent 7.50.3 on AWS ECS Fargate, using the latest container image.
Stack trace
```
panic: runtime error: index out of range [0] with length 0
goroutine 378 [running]:
github.com/DataDog/datadog-agent/pkg/process/util.(*ChunkAllocator[...]).Accept(0x5dcf9a0, {0xc001bb0080?, 0xc, 0x10}, 0x681)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/util/chunking.go:96 +0x2f9
github.com/DataDog/datadog-agent/pkg/process/util.ChunkPayloadsBySizeAndWeight[...](0xc00118b720, 0xc001bd25f0, 0x48, 0xf4240?)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/util/chunking.go:166 +0x2c5
github.com/DataDog/datadog-agent/pkg/process/checks.chunkProcessesBySizeAndWeight({0xc001bb0080?, 0xc, 0x10}, 0xc001a8b680, 0x4044b33333333333?, 0x0?, 0xc001bd25f0)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/chunking.go:42 +0x326
github.com/DataDog/datadog-agent/pkg/process/checks.chunkProcessesAndContainers(0x97d03d8?, {0xc000152720, 0x3, 0xc001c0d860?}, 0xc001bad560?, 0xc001afcdb0?)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/process.go:414 +0x118
github.com/DataDog/datadog-agent/pkg/process/checks.createProcCtrMessages(0xc000074360, 0x40401c28f5c28f5c?, {0xc000152720?, 0x0?, 0x403a8a3d70a3d70a?}, 0x0?, 0x3ff9eb851eb851ec?, 0x57eac212, {0x0, 0x0}, ...)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/process.go:371 +0x5d
github.com/DataDog/datadog-agent/pkg/process/checks.(*ProcessCheck).run(0xc000b5e480, 0x57eac212, 0x1)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/process.go:277 +0x6fe
github.com/DataDog/datadog-agent/pkg/process/checks.(*ProcessCheck).Run(0x3?, 0xc00144ce00, 0xc001ac53b0)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/process.go:351 +0xe5
github.com/DataDog/datadog-agent/pkg/process/runner.(*CheckRunner).runCheckWithRealTime(0xc0002392c0, {0x6ed99b0, 0xc000b5e480}, 0xc001ac53b0)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/runner/runner.go:182 +0xc6
github.com/DataDog/datadog-agent/pkg/process/runner.(*CheckRunner).runnerForCheck.func2({0x1, 0x1})
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/runner/runner.go:350 +0x65
github.com/DataDog/datadog-agent/pkg/process/checks.(*runnerWithRealTime).run(0xc000b4df40)
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/checks/runner.go:73 +0x35a
github.com/DataDog/datadog-agent/pkg/process/runner.(*CheckRunner).Run.func1()
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/runner/runner.go:287 +0x5c
created by github.com/DataDog/datadog-agent/pkg/process/runner.(*CheckRunner).Run
	/omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/process/runner/runner.go:285 +0x3c9
process-agent exited with code 2, signal 0, restarting in 2 seconds
```
Describe what happened:
Got this crash while setting up DataDog process monitoring on a Fargate task. I eventually realized I'd made a typo and set `pidMode = task` on one of the container definitions instead of on the root task configuration; fixing that resolved the immediate issue.
Describe what you expected:
Any diagnostic message that would help me discover the configuration issue.
Steps to reproduce the issue:
Set up an ECS task with multiple containers and don't set `pidMode`; sort of, anyway. There are more conditions, but I'm not sure exactly what they are yet (in local testing I discovered it's apparently sensitive to whether the `datadog-agent` container's name sorts alphabetically before or after the other containers in the task; more on that below).
Additional environment details (Operating System, Cloud provider, etc):
Here's how far I've gotten in debugging the crash:
datadog-agent/pkg/process/util/chunking.go, lines 85 to 98 in abce0cb
The specific line is `c.props[c.idx].size += len(ps)`. We know `c.idx` is zero from the panic message, but we also know that `c.idx >= len(c.chunks)` is not true, because otherwise the `if` branch would have been taken and `c.props` would have a 0th element. Substituting, `!(c.idx >= len(c.chunks))` => `c.idx < len(c.chunks)` => `0 < len(c.chunks)`; that is, `c.chunks` does have (at least) a 0th element.
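To make that deduction concrete, here is a minimal sketch of the bookkeeping as I understand it; the type and field names are mine, not the real `ChunkAllocator`. Once `chunks` is longer than `props`, the guard is skipped and indexing `props[0]` panics with the same message as the trace above.

```go
package main

// Minimal sketch of the allocator's bookkeeping; the type and field names are
// hypothetical, not the real ChunkAllocator. chunks and props are meant to
// grow in lockstep, and Accept only grows them when starting a new chunk.
type chunkProps struct {
	size   int
	weight int
}

type allocator struct {
	chunks [][]string   // payload chunks
	props  []chunkProps // per-chunk metadata; Accept assumes len(props) >= len(chunks)
	idx    int          // index of the chunk currently being filled
}

func (c *allocator) Accept(ps []string, weight int) {
	if c.idx >= len(c.chunks) {
		// The only place the slices are extended, and they are extended together.
		c.chunks = append(c.chunks, nil)
		c.props = append(c.props, chunkProps{})
	}
	c.chunks[c.idx] = append(c.chunks[c.idx], ps...)
	c.props[c.idx].size += len(ps) // panics if chunks has outgrown props
	c.props[c.idx].weight += weight
}

func main() {
	c := &allocator{}
	// Put the allocator into the state deduced above: chunks has a 0th
	// element, props does not, so the guard in Accept is skipped.
	c.chunks = append(c.chunks, []string{"placeholder"})
	c.Accept([]string{"proc-1"}, 1) // index out of range [0] with length 0
}
```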
`pkg/process/util/chunking.go` itself correctly maintains the relationship between `c.chunks` and `c.props`, but `c.chunks` can also escape the package by reference via `GetChunks`:
datadog-agent/pkg/process/util/chunking.go, lines 100 to 102 in abce0cb
If another caller were to acquire a reference to `c.chunks` via `GetChunks` and then `append` to it, that would violate `Accept`'s assumption that `len(c.props) >= len(c.chunks)`, as sketched below.
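A tiny demonstration of that escape, with hypothetical names; I'm not asserting the real `GetChunks` signature, only that the chunks slice ends up exposed by reference:

```go
package main

import "fmt"

// Hypothetical stand-in for the allocator; the names are mine. The point is
// only that handing out a reference to the chunks slice lets callers grow it
// without the allocator extending props to match.
type allocator struct {
	chunks [][]string
	props  []int // one entry per chunk, normally maintained alongside chunks
}

// getChunks mimics a GetChunks-style accessor that exposes the slice by
// reference (I'm not asserting the real signature, only the by-reference
// behaviour the calling code relies on).
func (c *allocator) getChunks() *[][]string {
	return &c.chunks
}

func main() {
	c := &allocator{}
	chunks := c.getChunks()
	// An external append, e.g. for a container with no mappable processes.
	*chunks = append(*chunks, []string{"container-only chunk"})

	// The allocator never saw this append, so props was not extended and
	// Accept's assumption that len(props) >= len(chunks) no longer holds.
	fmt.Printf("len(chunks)=%d, len(props)=%d\n", len(c.chunks), len(c.props))
}
```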
And that is exactly what happens here:
datadog-agent/pkg/process/checks/chunking.go, lines 15 to 22 in abce0cb
datadog-agent/pkg/process/checks/chunking.go, lines 45 to 51 in abce0cb
If either of the "two scenarios" referenced in `chunkProcessesBySizeAndWeight`'s comment occurs, and the container with unmappable processes is the first one inspected (hence why order matters!), then `appendContainerWithoutProcesses` sees an empty `collectorProcs`, and `c.chunks` is extended while `c.props` is not. When `chunkProcessesBySizeAndWeight` later calls `util.ChunkPayloadsBySizeAndWeight`, which calls `Accept`, this crash occurs.
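Here is a self-contained sketch of that sequence under my assumptions (all names are hypothetical; `chunkAll` stands in for the chunking flow, and the unmappable container is recorded by appending to `chunks` directly, the way `appendContainerWithoutProcesses` does through `GetChunks`): when the process-less container is visited first, the external append happens before `Accept` has ever created `props[0]`, so the next `Accept` call panics; visit it last and the same append goes unnoticed.

```go
package main

import "fmt"

// Hypothetical reconstruction of the crash path; none of these names are the
// agent's. Containers whose processes cannot be mapped to them are recorded by
// appending to chunks directly (standing in for what
// appendContainerWithoutProcesses does through GetChunks), bypassing Accept.
type container struct {
	name  string
	procs []string // empty: process <=> container mapping could not be established
}

type allocator struct {
	chunks [][]string
	props  []int // per-chunk size; Accept assumes len(props) >= len(chunks)
	idx    int
}

func (c *allocator) Accept(ps []string) {
	if c.idx >= len(c.chunks) {
		c.chunks = append(c.chunks, nil)
		c.props = append(c.props, 0)
	}
	c.chunks[c.idx] = append(c.chunks[c.idx], ps...)
	c.props[c.idx] += len(ps) // panics when chunks was grown externally first
}

func chunkAll(containers []container) {
	c := &allocator{}
	for _, ctr := range containers {
		if len(ctr.procs) == 0 {
			// Direct append, standing in for the append made through the
			// escaped GetChunks reference: chunks grows, props does not.
			c.chunks = append(c.chunks, []string{"container-only: " + ctr.name})
			continue
		}
		c.Accept(ctr.procs)
	}
	fmt.Printf("ok: %d chunks, %d props\n", len(c.chunks), len(c.props))
}

func main() {
	agentFirst := []container{
		{name: "datadog-agent", procs: nil},     // unmappable, visited first
		{name: "web", procs: []string{"nginx"}}, // Accept then panics: props is still empty
	}
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("agent-first ordering panics:", r)
			// Reverse the ordering: Accept runs first, so props[0] exists and
			// the later external append goes unnoticed.
			chunkAll([]container{agentFirst[1], agentFirst[0]})
		}
	}()
	chunkAll(agentFirst)
}
```

In the reversed ordering the two slices still end up out of sync; it just isn't caught by an out-of-range index, which would explain why the crash seems to depend on where the `datadog-agent` container sorts.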
My first instinct would be to say that `GetChunks` shouldn't exist (pass the `chunker` down to `appendContainerWithoutProcesses` and have it call `Accept` with an empty process list, perhaps? I don't know), but I'm seeing this code for the first time, so take that with a grain of salt.
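For what it's worth, here is a rough sketch of what I mean, under the assumption that accepting an empty payload is (or could be made) a valid way to start a container-only chunk. The names are hypothetical and this is not a proposed patch; it only shows that the invariant holds when both slices are grown in a single place.

```go
package main

import "fmt"

// Sketch of the alternative, assuming that accepting an empty payload is (or
// could be made) a valid way to start a container-only chunk. The names are
// hypothetical; the point is only that the invariant len(props) >= len(chunks)
// holds when both slices are grown in a single place.
type allocator struct {
	chunks [][]string
	props  []int
	idx    int
}

func (c *allocator) Accept(ps []string) {
	if c.idx >= len(c.chunks) {
		c.chunks = append(c.chunks, nil)
		c.props = append(c.props, 0)
	}
	c.chunks[c.idx] = append(c.chunks[c.idx], ps...)
	c.props[c.idx] += len(ps)
}

// appendContainerOnly stands in for appendContainerWithoutProcesses, but takes
// the allocator itself rather than a reference to its chunks slice.
func appendContainerOnly(c *allocator) {
	// Record the container by accepting an empty process list: chunks and
	// props still grow together inside Accept. (The real change would also
	// need to attach the container to the new chunk; that part is omitted.)
	c.Accept(nil)
}

func main() {
	c := &allocator{}
	appendContainerOnly(c)      // process-less container handled first
	c.Accept([]string{"nginx"}) // no panic: props[0] exists
	fmt.Printf("len(chunks)=%d, len(props)=%d\n", len(c.chunks), len(c.props))
}
```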
Presumably in this case the process <=> container mapping cannot be established because of the `pidMode` configuration issue, but I'm not certain. I don't know whether there's a good way to detect that setting from within the container, but this crash definitely shouldn't be happening.