Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Here is my agent graph code:
from dotenv import load_dotenv
from langchain_ollama import ChatOllama
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver
from src.agents import tools
import os

load_dotenv(".env")
ollama_url = os.getenv("OLLAMA_BASE_URL")
ollama_model = os.getenv("OLLAMA_MODEL")

llm = ChatOllama(model=ollama_model, base_url=ollama_url)
llm_with_tools = llm.bind_tools(tools.tools_list)
memory = MemorySaver()  # checkpoint every node state

# Node
def llm_call(state: MessagesState):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

# Build graph
builder = StateGraph(MessagesState)
builder.add_node("llm_call", llm_call)
builder.add_node("tools", ToolNode(tools.tools_list))
builder.add_edge(START, "llm_call")
builder.add_conditional_edges("llm_call", tools_condition)
builder.add_edge("tools", "llm_call")
graph = builder.compile(checkpointer=memory)
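For reference, this is roughly how the graph is invoked. This is only a sketch; the thread_id value and the question text are placeholders, and a thread_id is required because the graph is compiled with a checkpointer:

# Sketch: invoking the compiled graph with a thread id for the checkpointer.
# The thread_id value and the question are placeholders.
config = {"configurable": {"thread_id": "1"}}
result = graph.invoke(
    {"messages": [("user", "What is in the news for the last 24h?")]},
    config,
)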
Here is my tools code:
from src.database.db import Database
from src.parsers.habrnews import Habr
from langchain_core.tools import tool

@tool
def habr():
    """Return the latest IT news article from Habr."""
    return Habr.getNews()

@tool
def news_database():
    """Return news data for a query from the vector database."""
    db = Database()
    return db.search("News for the last 24h")

# list of tools
tools_list = [news_database]
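The failure below does not need the full agent loop; invoking the tool directly should hit the same path (a minimal sketch, assuming the same environment and database contents):

# Sketch: calling the tool directly, outside the graph.
from src.agents.tools import news_database

print(news_database.invoke({}))  # should fail with the same Document validation error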
Here is my database code:
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_postgres import PGVector
from langchain_core.documents import Document
from dotenv import load_dotenv
import os
import json

class Database:
    def __init__(self):
        load_dotenv(".env")
        name = os.getenv("POSTGRES_NAME")
        pwd = os.getenv("POSTGRES_PASSWORD")
        ollama_url = os.getenv("OLLAMA_BASE_URL")
        ollama_model = os.getenv("OLLAMA_MODEL")
        self.ollama = ChatOllama(model=ollama_model, base_url=ollama_url)
        connection = f"postgresql+psycopg://{name}:{pwd}@postgresql-pgvector:5432/feedconveyor"
        embeddings = OllamaEmbeddings(model=ollama_model, base_url=ollama_url)
        self.vector_database = PGVector(
            collection_name="store",
            connection=connection,
            embeddings=embeddings,
        )
        self.vector_database.create_vector_extension()
        self.vector_database.create_tables_if_not_exists()

    async def store_data(self, data):
        self.vector_database.add_documents(data, ids=[doc.metadata["id"] for doc in data])

    def search(self, search):
        docs: list[Document] = self.vector_database.similarity_search(search)
        return docs
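For completeness, documents are stored with an integer id in their metadata, along these lines (a sketch; the content and the id are illustrative):

# Illustrative sketch of how documents end up in the store; the integer "id"
# in metadata matches the Fragment(buf=b'{"id": 1}') seen in the error below.
from langchain_core.documents import Document

db = Database()
doc = Document(page_content="Some Habr article text", metadata={"id": 1})
await db.store_data([doc])  # called from async code; uses metadata["id"] as the row id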
Here is my Docker Compose file (LangGraph Studio and the main app are on the same network):
services:
  bot:
    build:
      context: .
    container_name: telegram-bot
    develop:
      watch:
        - action: sync
          path: ./src
          target: /src
          ignore:
            - node_modules/
        - action: rebuild
          path: requirements.txt
    depends_on:
      - postgresql-pgvector
    networks:
      - local-network
  postgresql-pgvector:
    image: ankane/pgvector
    container_name: postgresql-pgvector
    env_file:
      - ".env"
    restart: always
    environment:
      - POSTGRES_DB=feedconveyor
      - POSTGRES_USER=${POSTGRES_NAME}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - ./db:/var/lib/postgresql/data
    networks:
      - local-network
    ports:
      - 5432:5432
  pgadmin:
    image: dpage/pgadmin4:latest
    container_name: pgadmin
    restart: always
    env_file:
      - ".env"
    environment:
      - PGADMIN_DEFAULT_EMAIL=${PGADMIN_EMAIL}
      - PGADMIN_DEFAULT_PASSWORD=${PGADMIN_PASSWORD}
    ports:
      - 8080:80
    networks:
      - local-network
networks:
  local-network:
    driver: bridge

Error Message and Stack Trace (if applicable)
When I use llama3.1:8b via Ollama to answer a question that makes the agent call the RAG tool to search the data, I get:
1 validation error for Document
metadata
Input should be a valid dictionary [type=dict_type, input_value=Fragment(buf=b'{"id": 1}'), input_type=Fragment]
For further information visit https://errors.pydantic.dev/2.9/v/dict_type
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/langchain_core/tools/base.py", line 657, in run
response = context.run(self._run, *tool_args, **tool_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_core/tools/structured.py", line 80, in _run
return self.func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/deps/__outer_Feed-Conveyor/src/src/agents/tools.py", line 18, in news_database
return db.search("News for the last 24h")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/deps/__outer_Feed-Conveyor/src/src/database/db.py", line 37, in search
docs: dict[Document] = self.vector_database.similarity_search(search)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 943, in similarity_search
return self.similarity_search_by_vector(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 1498, in similarity_search_by_vector
docs_and_scores = self.similarity_search_with_score_by_vector(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 1043, in similarity_search_with_score_by_vector
return self._results_to_docs_and_scores(results)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_postgres/vectorstores.py", line 1063, in _results_to_docs_and_scores
Document(
File "/usr/local/lib/python3.12/site-packages/langchain_core/documents/base.py", line 285, in __init__
super().__init__(page_content=page_content, **kwargs) # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_core/load/serializable.py", line 125, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 212, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Document
metadata
Input should be a valid dictionary [type=dict_type, input_value=Fragment(buf=b'{"id": 1}'), input_type=Fragment]
For further information visit https://errors.pydantic.dev/2.9/v/dict_type
Description
In #28027 (reply in thread) I discussed this issue with the AI assistant:
The _results_to_docs_and_scores method is where database results are converted into Document objects, so the metadata arriving as a Fragment object instead of a dictionary most likely originates there: the metadata is assigned directly from result.EmbeddingStore.cmetadata.
To resolve the issue, result.EmbeddingStore.cmetadata must be properly deserialized into a dictionary. If cmetadata is stored as a JSON field in the database, the ORM (SQLAlchemy) should deserialize it into a dictionary automatically; if it comes back as a Fragment instead, it needs to be converted explicitly. Here is a potential fix:
def _results_to_docs_and_scores(self, results: Any) -> List[Tuple[Document, float]]:
    """Return docs and scores from results."""
    docs = [
        (
            Document(
                id=str(result.EmbeddingStore.id),
                page_content=result.EmbeddingStore.document,
                metadata=dict(result.EmbeddingStore.cmetadata),  # ensure it's a dictionary
            ),
            result.distance if self.embeddings is not None else None,
        )
        for result in results
    ]
    return docs
Wrapping result.EmbeddingStore.cmetadata in dict() ensures the metadata is explicitly converted to a dictionary, which should resolve the validation error.
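Until this is fixed upstream, a user-side workaround is to patch _results_to_docs_and_scores so the metadata is deserialized before the Document is built. This is only a sketch; it assumes the Fragment exposes its raw JSON bytes via a buf attribute, as the error message suggests:

import json
from langchain_postgres.vectorstores import PGVector

_original = PGVector._results_to_docs_and_scores

def _patched(self, results):
    # Coerce non-dict metadata (e.g. a Fragment wrapping b'{"id": 1}') into
    # a plain dict before Documents are built. Assumes the Fragment's raw
    # JSON bytes live in its .buf attribute.
    for result in results:
        meta = result.EmbeddingStore.cmetadata
        if not isinstance(meta, dict):
            result.EmbeddingStore.cmetadata = json.loads(meta.buf)
    return _original(self, results)

PGVector._results_to_docs_and_scores = _patched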
System Info
Apple MacBook Pro 14" (M1 Pro, 16 GB RAM), macOS 15.1 (24B83)