.Net: New Feature: MongoDB VectorStore - Allow for filters on nested sub-documents #10152

Open
zarusz opened this issue Jan 10, 2025 · 6 comments
Labels: msft.ext.vectordata (Related to Microsoft.Extensions.VectorData), .NET (Issue or Pull requests regarding .NET code)

zarusz commented Jan 10, 2025


name: MongoDB VectorStore - Support Filters on Nested Sub-Documents
about: Currently, MongoDBVectorStoreRecordCollection does not support filtering on nested sub-documents. I am requesting the ability to apply filters on nested fields within MongoDB documents during vector searches.


Feature Request

When working with MongoDBVectorStoreRecordCollection, it’s not possible to apply filters on nested sub-documents. This limitation prevents filtering on fields within embedded objects.

Example Scenario:

Consider the following MongoDB document:

{
  "_id": { "$oid": "673871520bb02bf2a7bb8b4e" },
  "chunkNumber": 1147,
  "url": "https://mywebsite.com",
  "text": "some text",
  "textEmbedding": [],
  "metadata": {
    "source": "gravity9",
    "content_type": "text/css",
    "title": "",
    "targetId": "00000000-0000-0000-0000-000000000000"
  }
}

And this corresponding MongoDB index:

{
  "fields": [
    { "type": "vector", "numDimensions": 1536, "path": "textEmbedding", "similarity": "cosine" },
    { "type": "filter", "path": "metadata.source" },
    { "type": "filter", "path": "metadata.targetId" }
  ]
}

The model in C# is defined as:

public class KnowledgeChunk
{
    [BsonId(IdGenerator = typeof(StringObjectIdGenerator))]
    [BsonRepresentation(BsonType.ObjectId)]
    [VectorStoreRecordKey]
    public required string Id { get; set; }

    [BsonElement("chunkNumber")]
    [VectorStoreRecordData(IsFilterable = true)]
    public int ChunkNumber { get; set; }

    [BsonElement("text")]
    [VectorStoreRecordData]
    public required string Text { get; set; }

    [BsonElement("textEmbedding")]
    [VectorStoreRecordVector(1536, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float>? TextEmbedding { get; set; }

    [BsonElement("metadata")]
    // [VectorStoreRecordData] // Uncommenting this throws an error
    public required KnowledgeChunkMetadata Metadata { get; set; }
}

[BsonIgnoreExtraElements]
public class KnowledgeChunkMetadata
{
    [BsonElement("source")]
    [VectorStoreRecordData(IsFilterable = true)]
    public required string Source { get; set; }

    [BsonElement("targetId")]
    [VectorStoreRecordData(IsFilterable = true)]
    public required string TargetId { get; set; }
}

Problem

  1. Error When Marking Nested Properties as Filterable:
    Uncommenting [VectorStoreRecordData] on the Metadata property throws an exception due to unsupported property types:

    System.ArgumentException: Data properties must be one of the supported types...
    Type of the property 'Metadata' is Gravity9.Service.Agent.Application.Plugins.KnowledgeDb.KnowledgeChunkMetadata.
    
  2. No Way to Filter Nested Fields:
    There’s no clear method to filter on nested fields like Metadata.Source using the VectorSearchFilter API.

    Example attempt (fails):

    var propertyName = nameof(KnowledgeChunk.Metadata); // Tried "Metadata.Source" too
    VectorSearchFilter? filter = new VectorSearchFilter().EqualTo(propertyName, "gravity9");
    
    var options = new VectorSearchOptions
    {
        IncludeVectors = false,
        IncludeTotalCount = false,
        Top = request.Top,
        Filter = filter,
    };
    
    var items = await collection.VectorizedSearchAsync(contentVector, options: options, cancellationToken: cancellationToken);

  3. Limitation in Supported Data Types:
    Reviewing MongoDBConstants.SupportedDataTypes suggests only primitive data types are filterable, blocking nested object filtering.


Proposed Solution

  • Enhance Filter Support:
    Enable filtering on nested sub-document fields (e.g., metadata.source, metadata.targetId).
  • Flatten or Path-Based Filtering:
    Implement path-based filtering similar to MongoDB’s dot notation (metadata.source), or automatically flatten nested objects for filtering.

Questions

  1. Are there any plans to support filtering on nested sub-documents in MongoDBVectorStoreRecordCollection?
  2. If not currently planned, would the team be open to accepting a community contribution to add this feature?

Thank you for considering this request!


Environment:

  • Library: SemanticKernel
  • Storage: MongoDB VectorStore
  • Language: C# (.NET 8.0)

Impact:
This enhancement would significantly improve filtering flexibility for complex document structures and unlock more advanced search capabilities.

markwallace-microsoft added the .NET (Issue or Pull requests regarding .NET code), triage, and msft.ext.vectordata (Related to Microsoft.Extensions.VectorData) labels and removed the triage label Jan 10, 2025
markwallace-microsoft moved this to Backlog: Planned in Semantic Kernel Jan 13, 2025
markwallace-microsoft self-assigned this Jan 13, 2025
markwallace-microsoft moved this from Backlog: Planned to Sprint: Planned in Semantic Kernel Jan 13, 2025
@markwallace-microsoft
Member

@roji could you consider this in the filtering work you have started?

Member

roji commented Jan 14, 2025

Will do, thanks @markwallace-microsoft.

Note to self: consider exactly what this means in terms of modeling and ORM mapping - I'm not sure if and how we support hierarchical models in the current abstraction (/cc @westey-m). Query-wise, if we do LINQ, then users can express accesses to JSON subdocuments via C# - this is nice and easy. But the provider must then be directly aware of the user's POCO, in order to correctly interpret the incoming expression tree and translate it to the Mongo querying language.
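
To illustrate the translation step described here, the following is a minimal sketch of how a provider could turn a member-access lambda such as `c => c.Metadata.Source` into a dotted path. It is not how Semantic Kernel or the MongoDB driver does it: `PathTranslator` and `ToDottedPath` are hypothetical names, the POCOs are trimmed-down stand-ins for the ones in this issue, and camel-casing the property name is an assumption standing in for reading the `[BsonElement]` names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Trimmed-down stand-ins for the POCOs in this issue (illustrative only).
public class KnowledgeChunkMetadata
{
    public string Source { get; set; } = "";
    public string TargetId { get; set; } = "";
}

public class KnowledgeChunk
{
    public int ChunkNumber { get; set; }
    public KnowledgeChunkMetadata Metadata { get; set; } = new();
}

// Hypothetical helper, not part of Semantic Kernel or the MongoDB driver.
public static class PathTranslator
{
    public static string ToDottedPath<TRecord, TValue>(
        Expression<Func<TRecord, TValue>> selector)
    {
        var segments = new Stack<string>();
        Expression? current = selector.Body;

        // Walk from the innermost member (Source) outwards (Metadata)
        // until the lambda parameter is reached.
        while (current is MemberExpression member)
        {
            segments.Push(CamelCase(member.Member.Name));
            current = member.Expression;
        }

        // Anything other than a plain chain of member accesses is rejected.
        if (current is not ParameterExpression || segments.Count == 0)
        {
            throw new NotSupportedException($"Unsupported selector: {selector}");
        }

        // A Stack enumerates last-in first-out, so the outermost member comes first.
        return string.Join(".", segments);
    }

    // Assumption: stored field names are the camel-cased property names.
    private static string CamelCase(string name) =>
        char.ToLowerInvariant(name[0]) + name.Substring(1);
}
```

With these definitions, `PathTranslator.ToDottedPath((KnowledgeChunk c) => c.Metadata.Source)` yields `metadata.source`, which matches the filter path declared in the index above. The sketch only works because the helper can see the POCO's shape through the expression tree, which is the provider awareness this comment refers to.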

Author

zarusz commented Jan 14, 2025

Thank you for the update! It looks like there's ongoing work related to filtering functionality in general.

Is there any timeline or estimate for when the next pre-release might include support for filtering based on subdocument values?

In the meantime, I might need to explore some workarounds:

  • Flattening my MongoDB model (if data migration is feasible)
  • Bypassing the VectorStore abstractions and working directly with the MongoDB driver

Appreciate any insights you can share!

Member

roji commented Jan 15, 2025

@zarusz this will definitely take a while, as a lot of design changes are required to make everything work here (I don't think we have a clear architecture in mind for this yet). I'd advise bypassing the abstraction and working directly with the driver for now.
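
For readers taking this route, here is a rough sketch of what working directly with the MongoDB C# driver could look like, using an Atlas `$vectorSearch` stage that pre-filters on the nested `metadata.source` field. Several things are assumptions and not taken from this thread: the `NestedFilterSearch` class and its method names are hypothetical, the index name must be supplied by the caller, and `numCandidates = top * 20` is only a tunable starting point. The filter works only if `metadata.source` is declared as a `filter` field in the vector index, as in the index definition above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical helper; requires the MongoDB.Driver NuGet package and an Atlas cluster.
public static class NestedFilterSearch
{
    // Builds a $vectorSearch stage that pre-filters on the nested field metadata.source.
    public static BsonDocument BuildVectorSearchStage(
        float[] queryVector, string source, int top, string indexName)
    {
        return new BsonDocument("$vectorSearch", new BsonDocument
        {
            { "index", indexName },
            { "path", "textEmbedding" },
            { "queryVector", new BsonArray(queryVector.Select(v => (double)v)) },
            { "numCandidates", top * 20 }, // assumption: tune for your data
            { "limit", top },
            // Dot notation reaches into the embedded "metadata" document.
            { "filter", new BsonDocument("metadata.source", new BsonDocument("$eq", source)) },
        });
    }

    // Runs the search and returns the raw documents without their embeddings.
    public static async Task<List<BsonDocument>> SearchAsync(
        IMongoCollection<BsonDocument> collection,
        float[] queryVector,
        string source,
        int top,
        string indexName,
        CancellationToken cancellationToken = default)
    {
        PipelineDefinition<BsonDocument, BsonDocument> pipeline = new[]
        {
            BuildVectorSearchStage(queryVector, source, top, indexName),
            // Drop the embedding from the results to keep payloads small.
            new BsonDocument("$project", new BsonDocument("textEmbedding", 0)),
        };

        var cursor = await collection.AggregateAsync(pipeline, cancellationToken: cancellationToken);
        return await cursor.ToListAsync(cancellationToken);
    }
}
```

The results come back as raw `BsonDocument` values, so mapping them onto `KnowledgeChunk` is left to the caller, for example by using a typed `IMongoCollection<KnowledgeChunk>` output instead.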

@westey-m
Contributor

We currently do not support hierarchical models for any parts of the vector store stack, i.e. reading records, writing records, or creating collections. It is something we have wanted to address, but it is a significant piece of work, and we haven't had the resources yet. The work would include updating RecordDefinitions and how we derive them from the data models, plus updating validation and mappers for each connector to support this.

Author

zarusz commented Jan 15, 2025

Okay, that all sounds reasonable. Thanks for the fast response.

Keep up the good work folks, this project is what the .NET community needs!
