
[Bug] DeleteDocumentHandler does not delete all records from AI Search #933

Open
gabrielgirbacea opened this issue Dec 9, 2024 · 6 comments
Labels: bug, waiting for author

Comments

@gabrielgirbacea

Context / Scenario

Steps to reproduce:

  1. Set up Kernel Memory locally with Swagger
  2. Configure it to use AI Search
  3. Import a larger text file (>200 KB), enough to generate more than 50 partitions in AI Search, e.g. lotr.txt
  4. Call the "delete document" endpoint (a code sketch of this flow follows below)

Expected result: all records/partitions in AI Search for the document id are deleted.
Actual result: some records/partitions remain undeleted.
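
For reference, the same flow expressed as a minimal serverless sketch (the Azure configuration objects are placeholders, not shown here; our actual setup runs the full service with Swagger):

using Microsoft.KernelMemory;

// Minimal sketch of the repro flow; azureOpenAITextConfig,
// azureOpenAIEmbeddingConfig and azureAISearchConfig are placeholder
// configuration objects, defined elsewhere.
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(azureOpenAITextConfig)
    .WithAzureOpenAITextEmbeddingGeneration(azureOpenAIEmbeddingConfig)
    .WithAzureAISearchMemoryDb(azureAISearchConfig)
    .Build<MemoryServerless>();

await memory.ImportDocumentAsync("lotr.txt", documentId: "lotr");
await memory.DeleteDocumentAsync("lotr");
// Expected: zero records for "lotr" remain in the AI Search index.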

A few examples of how many records remained undeleted for different file sizes:

  • for 83 partitions, 33 remain undeleted. After the second delete, 0.
  • for 63 partitions, 13 remain undeleted. After the second delete, 0.
  • for 332 partitions, 143 remain undeleted. After the second delete is triggered, 33 remain undeleted. After the third delete, 0.

Please let me know if I can provide more details.
Thank you.

What happened?

Calling the "delete document" endpoint should delete all records from AI Search for the provided document id, regardless of how many records are stored.

Note: when importing a file for an existing document id, SaveRecordsHandler deletes all existing records, as expected, before adding the new ones.

Importance

edge case

Platform, Language, Versions

C# - .NET 8.0
Windows 11
Visual Studio Professional 2022
Kernel Memory (latest - 0.94.241201.1)

Relevant log output

@gabrielgirbacea added the bug and triage labels on Dec 9, 2024
@dluc
Collaborator

dluc commented Dec 18, 2024

@gabrielgirbacea which document store are you using? What you observe could be the result of using a volatile document store, which is the default one.

@gabrielgirbacea
Author

@dluc I am using Azure Blob Storage and, for queues, Azure Queues.

@dluc
Collaborator

dluc commented Dec 18, 2024

Do you see any errors in the logs?

Some scenarios I would check:

  • one of the services is throttling and the number of retries times out
  • the queue service is losing queues or messages, or timing out messages too soon
  • there are other apps using the same services (queues, blobs, search), writing and/or locking records
  • does the same error occur using a different vector store, e.g. Qdrant? (you can test locally running Qdrant with Docker, see under the /tools folder; a sketch follows below)
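
For the Qdrant test, something along these lines should work (a sketch, not a full setup; it assumes the default Docker port and OpenAI for embeddings):

using Microsoft.KernelMemory;

// Sketch: point Kernel Memory at a local Qdrant instance.
// Start Qdrant first, e.g.: docker run -p 6333:6333 qdrant/qdrant
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .WithQdrantMemoryDb("http://localhost:6333")
    .Build<MemoryServerless>();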

@dluc added the waiting for author label and removed the triage label on Dec 18, 2024
@gabrielgirbacea
Author

gabrielgirbacea commented Dec 23, 2024

Hi @dluc,

Following up on this issue:

  • We reviewed the logs and found no error entries. However, we were able to consistently reproduce the problem by debugging the Kernel Memory code locally while connected to a new instance of Azure AI Search. It's worth noting that no other services were using these resources during our testing.
  • You mentioned that the queue service may lose messages, but for the delete operation only a single message should be queued (in the private-delete-document queue). Can you confirm this?
  • While I haven't tested with Qdrant, I did test with SimpleVectorDb and did not encounter the deletion problem there.

The issue appears to originate within the DeleteDocumentHandler class, specifically the InvokeAsync method.
To gain further insight, I added some logging to the InvokeAsync method while debugging Kernel Memory:

IAsyncEnumerable<MemoryRecord> records = db.GetListAsync(
    index: pipeline.Index,
    limit: -1,
    filters: [MemoryFilters.ByDocument(pipeline.DocumentId)],
    cancellationToken: cancellationToken);

// Added (note: CountAsync comes from System.Linq.Async and enumerates
// the query once; the foreach below re-executes the query against the db)
var recordsCount = await records.CountAsync(cancellationToken).ConfigureAwait(false);
// Added
_log.LogInformation("Found {RecordsCount} records to delete for '{Index}/{DocumentId}'", recordsCount, pipeline.Index, pipeline.DocumentId);

// Added
int deletedRecordsCount = 0;

await foreach (var record in records.WithCancellation(cancellationToken).ConfigureAwait(false))
{
    await db.DeleteAsync(index: pipeline.Index, record, cancellationToken: cancellationToken).ConfigureAwait(false);

    // Added
    deletedRecordsCount++;
}

// Added
_log.LogInformation("Deleted {DeletedRecordsCount} records for '{Index}/{DocumentId}'", deletedRecordsCount, pipeline.Index, pipeline.DocumentId);

These logs confirm the discrepancy in record counts initially reported. Although GetListAsync seems to retrieve all expected entities, some are not deleted during the process.
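
As a temporary workaround on our side, repeating the list-and-delete pass until GetListAsync returns nothing should converge, given that a second or third delete eventually removed everything. A hypothetical sketch using the same db calls as above (the pass limit is an arbitrary guard, not a Kernel Memory setting):

// Hypothetical workaround: repeat the delete pass until no records are
// returned, with an arbitrary cap to avoid looping forever.
const int maxPasses = 5; // arbitrary guard
for (int pass = 1; pass <= maxPasses; pass++)
{
    var deletedInPass = 0;

    await foreach (var record in db.GetListAsync(
                       index: pipeline.Index,
                       limit: -1,
                       filters: [MemoryFilters.ByDocument(pipeline.DocumentId)],
                       cancellationToken: cancellationToken).ConfigureAwait(false))
    {
        await db.DeleteAsync(index: pipeline.Index, record, cancellationToken: cancellationToken).ConfigureAwait(false);
        deletedInPass++;
    }

    if (deletedInPass == 0) { break; } // nothing left to delete
}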

@gabrielgirbacea
Author

Hi @dluc,

Do you have any updates regarding my previous message?

@alexmg
Contributor

alexmg commented Jan 15, 2025

Hi @gabrielgirbacea.

I made a contribution (#877) that allows a Session ID to be used when performing requests to Azure AI Search. This is surfaced as the UseStickySessions property on the AzureAISearchConfig and will be used within the GetListAsync method if set to true.

With this setting enabled the AISearchOptions.SessionId is assigned a unique value for the duration of the GetListAsync invocation. That property will suggest to Azure AI Search that it should hit the same replica to get more consistent results.

From the SessionId documentation: "A value to be used to create a sticky session, which can help getting more consistent results. As long as the same sessionId is used, a best-effort attempt will be made to target the same replica set."

I used this in some custom code to migrate records from one Azure AI Search instance to another, and the final record counts matched, unlike prior attempts without the SessionId being set. I assume this setting would also be beneficial while retrieving records to delete, as it decreases the chances of a different replica being hit. Unfortunately, Azure AI Search uses an eventual consistency model, so data being durable does not imply that it is immediately searchable, especially with multiple replicas.

I would be interested to know if this setting helps with the above problem. I'm wondering if the delay introduced by the call to db.DeleteAsync during async enumeration is resulting in different replicas being hit and inconsistent results being returned. It's a simple bool configuration property, so it's worth a try. 😄
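
Enabling it is a one-liner on the config (a sketch; the endpoint and key values are placeholders):

using Microsoft.KernelMemory;

// Sketch: enable sticky sessions so GetListAsync suggests the same
// replica to Azure AI Search for the duration of each list operation.
var azureAISearchConfig = new AzureAISearchConfig
{
    Endpoint = "https://<your-search>.search.windows.net", // placeholder
    APIKey = "<api-key>",                                  // placeholder
    Auth = AzureAISearchConfig.AuthTypes.APIKey,
    UseStickySessions = true
};

var memory = new KernelMemoryBuilder()
    .WithAzureAISearchMemoryDb(azureAISearchConfig)
    .Build<MemoryServerless>();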
