Array inclusion filter operation

I want to be able to filter vector search results by records that include a specific item in an array type field. The current filtering operators do not support doing the element-wise inclusion test I need. The supported [operators ](https://python.langchain.com/docs/integrations/vectorstores/pgvector/)are listed here:

| Operator | Meaning/Category               |
|----------|--------------------------------|
| $eq      | Equality (==)                  |
| $ne      | Inequality (!=)                |
| $lt      | Less than (<)                  |
| $lte     | Less than or equal (<=)        |
| $gt      | Greater than (>)               |
| $gte     | Greater than or equal (>=)     |
| $in      | Special Cased (in)             |
| $nin     | Special Cased (not in)         |
| $between | Special Cased (between)        |
| $like    | Text (like)                    |
| $ilike   | Text (case-insensitive like)   |
| $and     | Logical (and)                  |
| $or      | Logical (or)                   |

None of the operators from that list does exactly the inclusion test for a list of text elements. The closest is the `$like` operator which only works if the field is text. It doesn't work for array type because it tries to do an exact match for the entire array rather than looking into the elements inside.   

To reproduce this issue you can use this sample code. 

First let's define the schema and push some sample data in:

```
import os
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_postgres.vectorstores import PGVector
from datetime import datetime, timezone

# Create a sample text file for testing
sample_text = """
This is a sample document to test our Postgres setup.
We'll use this to verify our vector search functionality with filtering.
The text can contain any information you want to search through later. 
"""

with open("sample_test.txt", "w") as f:
    f.write(sample_text)

# Load and process the document
loader = TextLoader("sample_test.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Initialize Azure OpenAI embeddings
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="myAzureDeployment",  # Replace with your deployment name
    model="text-embedding-ada-002",  # This is typically the model name
    azure_endpoint="https://yourEndpoint.openai.azure.com",
    api_key="yourApiKey",
    chunk_size=1,
)

connection_string = (
    "postgresql+psycopg://postgres:postgres@your_url:5432/test"
)
collection_name = "test_filtering"

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection_string,
    use_jsonb=True,
)

def prepare_metadata(doc):

    return {
        "source": doc.metadata["source"],
        "category": ["A"],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    }

content_list = []
metadata_list = []

for doc in docs:
    content_list.append(doc.page_content)
    metadata_list.append(prepare_metadata(doc))

ids_added = vectorstore.add_texts(content_list, metadata_list)
print(f"Added {len(ids_added)} documents to the vector store.")
```

Now I use the following code to retrive data from the collection we just ingested:

```
filter = {"category": {"$like": ["A"]}}  # Exact match of the array instead of looking to the elements inside
retriever = vectorstore.as_retriever(search_kwargs={"filter": filter})
retrieved_docs = retriever .invoke("Opensearch", k=5)

print(retrieved_docs) # No results are returned
```

I have implemented another operator for array element-wise inclusion test which will fix this issue. But, I want to report this issue first for your review before creating a PR.

Thanks for reading.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Array inclusion filter operation #143

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special Cased (in)
$nin	Special Cased (not in)
$between	Special Cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)

Array inclusion filter operation #143

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions