🐛 Bug Report: Memory Leak When opentelemetry-instrumentation-langchain Is Enabled #2790
Comments
Hi @Workom2016, thanks for opening the issue - we'll take a look and prioritize resolving it.
We're experiencing this same memory leak issue in our application as well. After investigating the code, I believe I've identified the root cause. In the LangChain instrumentation callback handler, when a span is created, a SpanHolder is added to the self.spans dictionary:

self.spans[run_id] = SpanHolder(
    span, token, None, [], workflow_name, entity_name, entity_path
)

However, when the span ends in the _end_span method, the spans themselves are ended, but the SpanHolder objects are never removed from the self.spans dictionary:

def _end_span(self, span: Span, run_id: UUID) -> None:
    for child_id in self.spans[run_id].children:
        child_span = self.spans[child_id].span
        if child_span.end_time is None:  # avoid warning on ended spans
            child_span.end()
    span.end()

This appears to be causing the memory leak, as these objects accumulate over time, eventually leading to the OOM exception.
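One possible direction for a fix, sketched only (not an actual patch from the maintainers): pop the SpanHolder entries out of self.spans once their spans have ended, so the dictionary cannot grow without bound. The child-entry cleanup below is hypothetical and would need to match how the instrumentation actually tracks nested runs.

def _end_span(self, span: Span, run_id: UUID) -> None:
    for child_id in self.spans[run_id].children:
        child_span = self.spans[child_id].span
        if child_span.end_time is None:  # avoid warning on ended spans
            child_span.end()
        # hypothetical cleanup: drop the child's holder so it can be garbage-collected
        self.spans.pop(child_id, None)
    span.end()
    # hypothetical cleanup: drop the parent's own holder once its span has ended
    self.spans.pop(run_id, None)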
Any updates on that?
Which component is this bug for?
Traceloop SDK
📜 Description
I'm seeing a memory leak in a service that uses the traceloop-sdk and opentelemetry-instrumentation-langchain and makes an HTTP call inside one of its langgraph nodes.
This causes my service to restart due to an OOM exception.
👟 Reproduction steps
Relevant requirements.txt
fastapi[standard]==0.115.6
asgi-correlation-id==4.3.1
uvicorn==0.29.0
aiohttp==3.11.11
langchain==0.2.16
langchain-openai==0.1.25
langgraph==0.2.6
langchain-community==0.2.16
pydantic-settings==2.7.1
opentelemetry-api==1.29.0
opentelemetry-sdk==1.29.0
openinference-semantic-conventions==0.1.12
opentelemetry-exporter-otlp-proto-http==1.29.0
opentelemetry-instrumentation-fastapi==0.50b0
opentelemetry-instrumentation-aiohttp-client==0.50b0
traceloop-sdk==0.35.0
pydantic==2.10.4
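The setup is roughly this shape (a sketch only, not my actual service; the state type, node name, and URL are placeholders): an API endpoint runs a langgraph graph whose node awaits an aiohttp request, with the Traceloop SDK (and therefore its LangChain instrumentation) initialized at startup.

import aiohttp
from typing import TypedDict

from langgraph.graph import END, StateGraph
from traceloop.sdk import Traceloop

# LangChain instrumentation is enabled by default; exporter/API-key config omitted
Traceloop.init(app_name="repro-service")

class State(TypedDict):
    answer: str

async def call_downstream(state: State) -> State:
    # the HTTP call inside a langgraph node that triggers the growth
    async with aiohttp.ClientSession() as session:
        async with session.get("http://localhost:8080/health") as resp:
            return {"answer": await resp.text()}

builder = StateGraph(State)
builder.add_node("call_downstream", call_downstream)
builder.set_entry_point("call_downstream")
builder.add_edge("call_downstream", END)
graph = builder.compile()

# each incoming API request ends up doing: await graph.ainvoke({"answer": ""})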
I used tracemalloc to find what causes the memory leak, and it seems to be caused by JsonDecoder, whose allocations keep growing.
After 40 calls to my API that triggered langgraph, the memory attributed to JsonDecoder is ~14 MB; please see the snapshot below:
{
"stat": ".../plugins/python-ce/helpers/pydev/pydevd.py:2270: size=14.8 MiB (+14.8 MiB), count=115519 (+115519), average=135 B",
"frames": [
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 2270)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 2252)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 1563)\n",
"$('.../plugins/python-ce/helpers/pydev/pydevd.py', 1570)\n",
"$('.../plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py', 18)\n",
"$('.../app/main.py', 69)\n",
"$('.../venv/lib/python3.11/site-packages/uvicorn/main.py', 575)\n",
"$('.../venv/lib/python3.11/site-packages/uvicorn/server.py', 65)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py', 190)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py', 118)\n",
"$('.../app/tools/base/tool.py', 144)\n",
"$('.../venv/lib/python3.11/site-packages/traceloop/sdk/decorators/base.py', 193)\n",
"$('.../app/tools/banking_question/banking_question_tool.py', 108)\n",
"$('.../venv/lib/python3.11/site-packages/oz_logger/decorators.py', 24)\n",
"$('.../app/client/intent_to_answer_client.py', 51)\n",
"$('.../venv/lib/python3.11/site-packages/tenacity/asyncio/init.py', 189)\n",
"$('.../venv/lib/python3.11/site-packages/tenacity/asyncio/init.py', 114)\n",
"$('.../app/client/intent_to_answer_client.py', 74)\n",
"$('.../venv/lib/python3.11/site-packages/aiohttp/client_reqrep.py', 1298)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/init.py', 346)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py', 337)\n",
"$('.../[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py', 353)\n"
]
}
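For reference, the comparison above was produced roughly like this (a generic tracemalloc sketch, not the exact code from my service):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation traceback

baseline = tracemalloc.take_snapshot()

# ... call the API endpoint that triggers the langgraph flow ~40 times ...

current = tracemalloc.take_snapshot()
for stat in current.compare_to(baseline, "traceback")[:10]:
    print(stat)  # size/count delta for the allocation site
    for line in stat.traceback.format():
        print(line)  # the frames shown in the report above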
👍 Expected behavior
No memory leak
👎 Actual Behavior with Screenshots
Memory keeps growing and pods restart due to an OOM exception
🤖 Python Version
3.11.11
📃 Provide any additional context for the Bug.
After disabling either opentelemetry-instrumentation-langchain or the aiohttp call to the other service, there is no memory leak.
For now, I have disabled opentelemetry-instrumentation-langchain and added custom spans using the @workflow/@task decorators around the nodes and tools, roughly as sketched below.
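The workaround looks roughly like this (a sketch with illustrative names; exporter/API-key configuration is omitted, and async node functions may need the SDK's async decorator variants):

from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow

# hypothetical app name; the LangChain instrumentation itself stays disabled
Traceloop.init(app_name="intent-service")

@task(name="intent_to_answer")  # hypothetical task around the downstream HTTP call
def call_intent_service(question: str) -> str:
    # the real code performs the request via aiohttp here
    return "answer"

@workflow(name="banking_question")  # hypothetical workflow around the graph node
def handle_question(question: str) -> str:
    return call_intent_service(question)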
👀 Have you spent some time to check if this bug has been raised before?
Are you willing to submit PR?
None