Stop processing search requests when _msearch is canceled #17005

Open: msfroh wants to merge 3 commits into base: main

msfroh
Collaborator

@msfroh msfroh commented Jan 11, 2025

Description

Prior to this fix, the _msearch API would keep running search requests even after being canceled. With this change, we explicitly check if the task has been canceled before kicking off subsequent requests.
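
To make the idea concrete, here is a minimal sketch of checking a cancellation flag before starting each queued request. It is not the code in this PR: `CancellableBatch`, `runNext`, and the string "queries" are hypothetical stand-ins, and an `AtomicBoolean` plays the role of the parent task's cancelled state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a multi-search executor; not an OpenSearch class.
final class CancellableBatch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> executed = new ArrayList<>();
    // Assumption: this flag models "the parent task has been cancelled".
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    CancellableBatch(List<String> queries) {
        pending.addAll(queries);
    }

    void cancel() {
        cancelled.set(true);
    }

    // Starts the next queued request only if the batch has not been cancelled.
    // Returns true if a request was started, false otherwise.
    boolean runNext() {
        if (cancelled.get()) {
            return false; // stop kicking off further requests
        }
        String query = pending.poll();
        if (query == null) {
            return false; // nothing left to run
        }
        executed.add(query); // placeholder for actually executing the search
        return true;
    }

    List<String> executed() {
        return executed;
    }

    int remaining() {
        return pending.size();
    }
}
```

In this sketch, calling `cancel()` after the first `runNext()` leaves the other queued entries untouched. What the caller should receive for those skipped entries is a separate question that the sketch does not address.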

Related Issues

Resolves #17004

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for b88b6d5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Prior to this fix, the _msearch API would keep running search requests
even after being canceled. With this change, we explicitly check if
the task has been canceled before kicking off subsequent requests.

Signed-off-by: Michael Froh <[email protected]>
Contributor

❕ Gradle check result for 30d127a: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 72.27%. Comparing base (8191de8) to head (ff03507).
Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...arch/action/search/TransportMultiSearchAction.java | 70.00% | 1 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17005      +/-   ##
============================================
+ Coverage     72.21%   72.27%   +0.05%     
- Complexity    65289    65296       +7     
============================================
  Files          5301     5301              
  Lines        303725   303766      +41     
  Branches      44008    44018      +10     
============================================
+ Hits         219340   219545     +205     
+ Misses        66394    66256     -138     
+ Partials      17991    17965      -26     

@@ -193,7 +195,7 @@ private void handleResponse(final int responseSlot, final MultiSearchResponse.It
         if (responseCounter.decrementAndGet() == 0) {
             assert requests.isEmpty();
             finish();
-        } else {
+        } else if (isCancelled(request.request.getParentTask()) == false) {
Collaborator

@reta reta Jan 11, 2025

I think this makes sense, but we need to populate the response slot with an error (task was cancelled, or similar). What do you think?

Collaborator Author

@msfroh msfroh Jan 11, 2025

I could be mistaken, but I think if the parent task gets canceled, then an error response is already sent back to the client.

The customer ticket that prompted this issue involved the client closing the connection (due to timeout). In that case, there is nobody left to receive the response.

I'll try hacking something together with explicit task cancelation to see what the client receives.

Collaborator

I am not very familiar with this API, but intuitively, we may have a case where some requests complete before the cancellation and the rest are canceled after it. So we won't be returning partial results, right? (Just a cancellation for the whole request.)

Collaborator Author

You were correct!

I debugged it locally (using a breakpoint in the handleResponse method to pause processing between requests). When I canceled the request, no response was returned and the msearching client just waited.

I've updated it to drain the rest of the queue and output a TaskCancelledException as the response for each remaining request.
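
As a rough illustration of the "drain the queue" approach described above, the sketch below fills every still-queued response slot with a failure so that no slot is left empty. It is not the PR's code: `DrainOnCancel`, `Slot`, and the `Object[]` response array are hypothetical, and `java.util.concurrent.CancellationException` stands in for `TaskCancelledException`.

```java
import java.util.Queue;
import java.util.concurrent.CancellationException;

// Hypothetical helper; not an OpenSearch class.
final class DrainOnCancel {

    // Hypothetical pairing of a queued query with the index of its response slot.
    static final class Slot {
        final int index;
        final String query;

        Slot(int index, String query) {
            this.index = index;
            this.query = query;
        }
    }

    private DrainOnCancel() {}

    // Empties the queue and records a cancellation failure in each remaining slot.
    // Slots that were already filled by completed requests are left untouched.
    // Returns the number of requests that were drained.
    static int drain(Queue<Slot> pending, Object[] responses) {
        int drained = 0;
        Slot slot;
        while ((slot = pending.poll()) != null) {
            // Assumption: CancellationException stands in for TaskCancelledException.
            responses[slot.index] =
                new CancellationException("cancelled before running [" + slot.query + "]");
            drained++;
        }
        return drained;
    }
}
```

With every slot populated, the caller can assemble a complete response in which finished requests keep their results and the remaining ones report the cancellation.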

@reta reta added the backport 2.x, v3.0.0, and v2.19.0 labels Jan 11, 2025

I tried cancelling a request and found that the client would hang.
This change reports an exception for each remaining request instead.

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Michael Froh <[email protected]>
Contributor

✅ Gradle check result for ff03507: SUCCESS

Labels
backport 2.x, bug, Search:Resiliency, v2.19.0, v3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] _msearch API doesn't properly handle task cancellation
2 participants