
🐛 Bug Report: Memory Leak When opentelemetry-instrumentation-langchain Is Enabled #2790

Open

Workom2016 opened this issue Mar 18, 2025 · 3 comments

Which component is this bug for?

Traceloop SDK

📜 Description

I'm seeing a memory leak in a service that uses traceloop-sdk and opentelemetry-instrumentation-langchain and makes an HTTP call inside one of its LangGraph nodes.

The leak eventually causes the service to restart with an OOM exception.

👟 Reproduction steps

Relevant requirements.txt

fastapi[standard]==0.115.6
asgi-correlation-id==4.3.1
uvicorn==0.29.0
aiohttp==3.11.11
langchain==0.2.16
langchain-openai==0.1.25
langgraph==0.2.6
langchain-community==0.2.16
pydantic-settings==2.7.1
opentelemetry-api==1.29.0
opentelemetry-sdk==1.29.0
openinference-semantic-conventions==0.1.12
opentelemetry-exporter-otlp-proto-http==1.29.0
opentelemetry-instrumentation-fastapi==0.50b0
opentelemetry-instrumentation-aiohttp-client==0.50b0
traceloop-sdk==0.35.0
pydantic==2.10.4
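
A simplified sketch of the setup (names, URL, and state shape are illustrative placeholders, not the real service code):

from typing import TypedDict

import aiohttp
from langgraph.graph import StateGraph, END
from traceloop.sdk import Traceloop

Traceloop.init(app_name="leak-repro")  # auto-instruments LangChain when opentelemetry-instrumentation-langchain is installed


class State(TypedDict):
    answer: str


async def call_service(state: State) -> State:
    # HTTP call to another service inside a LangGraph node; the response is
    # parsed as JSON, matching the aiohttp/json frames in the traceback below
    async with aiohttp.ClientSession() as session:
        async with session.get("https://example.com/api") as resp:  # placeholder URL
            data = await resp.json()
    return {"answer": str(data)}


graph = StateGraph(State)
graph.add_node("call_service", call_service)
graph.set_entry_point("call_service")
graph.add_edge("call_service", END)
compiled = graph.compile()

# invoked once per incoming FastAPI request in the real service:
# await compiled.ainvoke({"answer": ""})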

I used tracemalloc to find what causes the memory leak, and it appears to be JSONDecoder allocations that keep growing.
After 40 calls to the API endpoint that triggers LangGraph, the JSONDecoder allocations add up to ~14.8 MiB; please see below:

{
"stat": ".../plugins/python-ce/helpers/pydev/pydevd.py:2270: size=14.8 MiB (+14.8 MiB), count=115519 (+115519), average=135 B",
"frames": [
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 2270)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 2252)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 1563)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 1570)\n",
"$('.../plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py', 18)\n",
"$('.../app/main.py', 69)\n",
"$('.../venv/lib/python3.11/site-packages/uvicorn/main.py', 575)\n",
"$('.../venv/lib/python3.11/site-packages/uvicorn/server.py', 65)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py', 190)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py', 118)\n",
"$('.../app/tools/base/tool.py', 144)\n",
"$('.../venv/lib/python3.11/site-packages/traceloop/sdk/decorators/base.py', 193)\n",
"$('.../app/tools/banking_question/banking_question_tool.py', 108)\n",
"$('.../venv/lib/python3.11/site-packages/oz_logger/decorators.py', 24)\n",
"$('.../app/client/intent_to_answer_client.py', 51)\n",
"$('.../venv/lib/python3.11/site-packages/tenacity/asyncio/init.py', 189)\n",
"$('.../venv/lib/python3.11/site-packages/tenacity/asyncio/init.py', 114)\n",
"$('.../app/client/intent_to_answer_client.py', 74)\n",
"$('.../venv/lib/python3.11/site-packages/aiohttp/client_reqrep.py', 1298)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/init.py', 346)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py', 337)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py', 353)\n"
]
}
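
For reference, the stat above came from comparing tracemalloc snapshots, roughly like this (a sketch, not the exact code; the frame depth and loop are illustrative):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation
baseline = tracemalloc.take_snapshot()

# ... call the API endpoint that triggers LangGraph ~40 times ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, "traceback")[:5]:
    print(stat)  # e.g. "size=14.8 MiB (+14.8 MiB), count=115519 (+115519)"
    for line in stat.traceback.format():
        print(line)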

👍 Expected behavior

No memory leak

👎 Actual Behavior with Screenshots

There is a memory leak, and pods restart due to OOM exceptions

🤖 Python Version

3.11.11

📃 Provide any additional context for the Bug.

After disabling either opentelemetry-instrumentation-langchain or the aiohttp call to the other service, there is no memory leak

For now, I have disabled opentelemetry-instrumentation-langchain and added custom spans using the @workflow/@task decorators around the nodes and tools
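
Roughly like this (a simplified sketch; the tool and workflow names are placeholders):

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow

Traceloop.init(app_name="my-service")  # with the LangChain instrumentation left disabled

@task(name="banking_question_tool")  # placeholder tool name
def answer_banking_question(question: str) -> str:
    # tool body / downstream call goes here
    return "answer"

@workflow(name="intent_graph")  # placeholder workflow name
def run_graph(question: str) -> str:
    return answer_banking_question(question)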

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit PR?

None

@doronkopit5
Contributor

Hi @Workom2016, thanks for opening the issue; we'll take a look and prioritize resolving it.
Could you attach some reproduction code showing how you use LangGraph, so it's easy for us to reproduce and resolve the issue?

myniva commented Apr 24, 2025

We're experiencing this same memory leak issue in our application as well.

After investigating the code, I believe I've identified the root cause. In the LangChain instrumentation callback handler, when a span is created, a SpanHolder is added to the self.spans dictionary:

self.spans[run_id] = SpanHolder(
    span, token, None, [], workflow_name, entity_name, entity_path
)

See: https://github.com/traceloop/openllmetry/blob/main/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py#L367-L369

However, when the span ends in the _end_span method, while the spans themselves are ended, the SpanHolder objects are never removed from the self.spans dictionary:

def _end_span(self, span: Span, run_id: UUID) -> None:
    for child_id in self.spans[run_id].children:
        child_span = self.spans[child_id].span
        if child_span.end_time is None:  # avoid warning on ended spans
            child_span.end()
    span.end()

See: https://github.com/traceloop/openllmetry/blob/main/packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py#L316-L321

This appears to be causing the memory leak, as these objects accumulate over time, eventually leading to the OOM exception.
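
A possible direction for a fix (sketch only, not a tested patch) would be to drop the finished entries from self.spans when the span ends, for example:

def _end_span(self, span: Span, run_id: UUID) -> None:
    for child_id in self.spans[run_id].children:
        child_span = self.spans[child_id].span
        if child_span.end_time is None:  # avoid warning on ended spans
            child_span.end()
        self.spans.pop(child_id, None)  # drop the child's holder
    span.end()
    self.spans.pop(run_id, None)  # drop the parent's holder as well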

marioweid commented May 9, 2025

Any updates on this?
We can't enable LangChain tracing in our prod environment at the moment, which is why we investigated the issue a bit further.
However, even if you del self.spans[child_id] in _end_span, memory still seems to increase over time.
I'm not that familiar with the codebase, but do we know which exact part is causing the memory leak? Is it self.spans?
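
One rough way to check whether SpanHolder objects are what's still accumulating (a sketch; adjust the import to wherever SpanHolder is defined in the installed version):

import gc

# the exact module may differ between versions of the instrumentation
from opentelemetry.instrumentation.langchain.callback_handler import SpanHolder

holders = [obj for obj in gc.get_objects() if isinstance(obj, SpanHolder)]
print(f"live SpanHolder objects: {len(holders)}")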
