[BUG] Can't start new thread #13136

Open
@phillebaba

Agent Environment

2022-08-15 13:18:16 UTC | CORE | INFO | (pkg/util/log/log.go:571 in func1) | runtime: final GOMAXPROCS value is: 4
2022-08-15 13:18:16 UTC | CORE | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec
2022-08-15 13:18:16 UTC | CORE | INFO | (pkg/util/log/log.go:571 in func1) | Features detected from environment: kubernetes,cri,containerd
2022-08-15 13:18:16 UTC | CORE | INFO | (cmd/agent/app/run.go:253 in StartAgent) | Starting Datadog Agent v7.38.2

Describe what happened:

After running for a while, the Datadog Agent starts logging warnings that it "can't start new thread". My understanding is that this comes from some collector written in Python, but I have not looked further than that. It emits a large number of these warnings, eventually crashes and restarts, and then the cycle repeats. This does not happen in all clusters, only in some, and in some of those it occurs more often than in others. I have done a lot of debugging in the container trying to figure out where this comes from and which process limit the agent is hitting, but I can't find the source.

2022-08-17 07:51:45 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:135) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/035e59e774b42a19a1e21a19bc751c24d1abcfe5fd51d66361ea10808a7bfae6/rootfs: can't start new thread. You can exclude this mountpoint in the settings if it is invalid.
2022-08-17 07:51:45 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:135) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.grpc.v1.cri/sandboxes/fef06dbdc5249250035ffc399b4d3ade24f29d9f25ea26a03aedc2928896fd85/shm: can't start new thread. You can exclude this mountpoint in the settings if it is invalid.
2022-08-17 07:51:45 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:135) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/fef06dbdc5249250035ffc399b4d3ade24f29d9f25ea26a03aedc2928896fd85/rootfs: can't start new thread. You can exclude this mountpoint in the settings if it is invalid.
2022-08-17 07:51:45 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:124 in LogMessage) | disk:e5dffb8bef24336f | (disk.py:135) | Unable to get disk metrics for /host/var/run/containerd/io.containerd.runtime.v2.task/k8s.io/41b68a2dafbcae96c6deb2a6124178781c609fd65098b61215bde2bb8bbc0ac2/rootfs: can't start new thread. You can exclude this mountpoint in the settings if it is invalid.
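
For anyone trying to narrow down which limit is involved: Python raises `RuntimeError: can't start new thread` when the OS refuses to create a thread, and on Linux threads count against both the per-user process limit and the cgroup pids controller. The sketch below is only a rough diagnostic, not something the agent ships. It compares a process's thread count with those two limits. The function names are made up for illustration, and the cgroup paths assume the usual `/sys/fs/cgroup` mount, which may differ on a given node. Memory pressure can also cause the same error, and this script does not check for it.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def _parse_limit(value: str) -> Optional[int]:
    """Convert a kernel limit string to an int; None means no limit."""
    return None if value in ("unlimited", "max") else int(value)


def thread_count(pid: int, proc_root: str = "/proc") -> int:
    """Number of threads the kernel reports for `pid` (the Threads: field)."""
    for line in Path(proc_root, str(pid), "status").read_text().splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError(f"no Threads field for pid {pid}")


def max_processes(
    pid: int, proc_root: str = "/proc"
) -> Tuple[Optional[int], Optional[int]]:
    """(soft, hard) 'Max processes' limit as listed in /proc/<pid>/limits."""
    for line in Path(proc_root, str(pid), "limits").read_text().splitlines():
        if line.startswith("Max processes"):
            soft, hard = line.split()[2:4]
            return _parse_limit(soft), _parse_limit(hard)
    raise ValueError(f"no 'Max processes' row for pid {pid}")


def cgroup_pids(
    cgroup_root: str = "/sys/fs/cgroup",
) -> Optional[Tuple[int, Optional[int]]]:
    """(current, max) from the pids controller, or None if it is not visible.

    Tries the cgroup v2 layout first, then the v1 'pids' hierarchy.
    """
    for base in (Path(cgroup_root), Path(cgroup_root, "pids")):
        current, maximum = base / "pids.current", base / "pids.max"
        if current.is_file() and maximum.is_file():
            return (
                int(current.read_text().strip()),
                _parse_limit(maximum.read_text().strip()),
            )
    return None


if __name__ == "__main__":
    # Inspects this script's own process; substitute the agent's PID
    # when running inside the agent container.
    pid = os.getpid()
    if Path("/proc", str(pid)).is_dir():
        print("threads:", thread_count(pid))
        print("max processes (soft, hard):", max_processes(pid))
        print("cgroup pids (current, max):", cgroup_pids())
```

If the thread count or the cgroup `pids.current` value sits close to its maximum when the warnings start, that limit is a likely candidate. If both are far below their limits, the cause is probably elsewhere.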

Describe what you expected:

I expect this error not to occur, and metrics and logs to keep working.

Steps to reproduce the issue:

There is no known way to reproduce it; it just starts occurring, seemingly at random.

Additional environment details (Operating System, Cloud provider, etc):

Cloud Provider: Azure
Kubernetes: AKS 1.22
