.Net: [MEVD] Allow a raw embedding property to reference a source data property to get generated from #11736

Open
roji opened this issue Apr 25, 2025 · 0 comments
Labels
msft.ext.vectordata Related to Microsoft.Extensions.VectorData .NET Issue or Pull requests regarding .NET code

roji commented Apr 25, 2025

In #10492, we're adding the ability to map arbitrary properties to a data store vector property, via an IEmbeddingGenerator:

[VectorStoreRecordVector(Dimensions: 3)]
public string Description { get; set; }

This hides the raw embedding, which was a primary goal of the design (the user shouldn't need to deal with or be aware of ReadOnlyMemory<float>). However, it notably means that users cannot fetch the embedding back from the database. We generally agree that this is a niche scenario (it's quite rare for the embedding to actually be useful - possibly for further custom filtering in .NET), and vector databases indeed don't return vectors by default when searching, but we still want those niche scenarios to be supported.

Today, users can do this by handling embedding generation themselves:

[VectorStoreRecordVector(Dimensions: 3)]
public ReadOnlyMemory<float> DescriptionEmbedding { get; set; }

This means that they need to use IEmbeddingGenerator themselves outside of MEVD, and call SearchEmbeddingAsync instead of SearchAsync to pass in the raw embedding.

We could make this better by allowing a raw embedding property to reference a source data property:

[VectorStoreRecordData]
public string Description { get; set; }

[VectorStoreRecordVector(Dimensions: 3, SourceProperty: nameof(Description))]
public ReadOnlyMemory<float> DescriptionEmbedding { get; set; }

When the proposed SourceProperty parameter is set, DescriptionEmbedding is treated like an embedding-generated property, just like today: any default IEmbeddingGenerator is picked up and used when upserting new records. When returning records, both properties are populated from the database as usual, and the user can access the embeddings.

Note that SourceProperty refers to a .NET property, and not to a database property. This means that it can refer to a non-persisted property that e.g. concatenates multiple other .NET properties.
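
To make the idea concrete, here is a rough, self-contained sketch of a record whose embedding is sourced from a non-persisted, computed .NET property. Everything in it is an assumption for illustration: SourceProperty is only proposed in this issue and does not exist in MEVD, the two attributes are minimal local stand-ins (not the real Microsoft.Extensions.VectorData types), and ToyEmbed and EmbeddingFiller are hypothetical helpers that merely imitate what a connector might do at upsert time with a real IEmbeddingGenerator.

```csharp
#nullable enable
using System;
using System.Reflection;

// Hypothetical toy "generator": a real setup would use an IEmbeddingGenerator instead.
static ReadOnlyMemory<float> ToyEmbed(string text) =>
    new float[] { text.Length, text.Split(' ').Length, text.Contains(':') ? 1f : 0f };

var hotel = new Hotel { Name = "Seaside Inn", Description = "Quiet rooms near the beach" };
EmbeddingFiller.Fill(hotel, ToyEmbed);
Console.WriteLine(string.Join(", ", hotel.DescriptionEmbedding.ToArray()));

// Local stand-in for the data attribute, so the sketch compiles without any package.
[AttributeUsage(AttributeTargets.Property)]
sealed class VectorStoreRecordDataAttribute : Attribute { }

// Local stand-in for the vector attribute, extended with the PROPOSED SourceProperty.
[AttributeUsage(AttributeTargets.Property)]
sealed class VectorStoreRecordVectorAttribute : Attribute
{
    public VectorStoreRecordVectorAttribute(int Dimensions, string? SourceProperty = null)
    {
        this.Dimensions = Dimensions;
        this.SourceProperty = SourceProperty;
    }

    public int Dimensions { get; }
    public string? SourceProperty { get; }
}

sealed class Hotel
{
    [VectorStoreRecordData]
    public string Name { get; set; } = "";

    [VectorStoreRecordData]
    public string Description { get; set; } = "";

    // No storage attribute: this exists only in .NET and concatenates other properties.
    public string EmbeddingText => $"{Name}: {Description}";

    [VectorStoreRecordVector(Dimensions: 3, SourceProperty: nameof(EmbeddingText))]
    public ReadOnlyMemory<float> DescriptionEmbedding { get; set; }
}

// Hypothetical helper imitating the upsert-time step: for every vector property that
// names a source property, embed the source text and store the result on the record.
static class EmbeddingFiller
{
    public static void Fill(object record, Func<string, ReadOnlyMemory<float>> embed)
    {
        Type type = record.GetType();
        foreach (PropertyInfo vectorProperty in type.GetProperties())
        {
            var attribute = vectorProperty.GetCustomAttribute<VectorStoreRecordVectorAttribute>();
            if (attribute?.SourceProperty is null)
            {
                continue; // Not a vector property, or a plain raw-embedding property.
            }

            PropertyInfo source = type.GetProperty(attribute.SourceProperty)
                ?? throw new InvalidOperationException(
                    $"Source property '{attribute.SourceProperty}' was not found.");

            string text = source.GetValue(record) as string ?? "";
            ReadOnlyMemory<float> vector = embed(text);
            if (vector.Length != attribute.Dimensions)
            {
                throw new InvalidOperationException("Embedding size does not match Dimensions.");
            }

            vectorProperty.SetValue(record, vector);
        }
    }
}
```

Because the lookup goes through the .NET property name rather than a storage name, the source can be any readable string property on the record type, persisted or not.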

AFAICT there's nothing blocking us from adding this in the future (no breaking change).

@roji roji added .NET Issue or Pull requests regarding .NET code msft.ext.vectordata Related to Microsoft.Extensions.VectorData labels Apr 25, 2025
@roji roji moved this to Backlog in Semantic Kernel Apr 25, 2025
@github-actions github-actions bot changed the title [MEVD] Allow a raw embedding property to reference a source data property to get generated from .Net: [MEVD] Allow a raw embedding property to reference a source data property to get generated from Apr 25, 2025