[Feature Request] Optimize indexing performance in replica shard #16949

Open
kkewwei opened this issue Jan 5, 2025 · 4 comments
Labels
enhancement (Enhancement or improvement to existing feature or request), Indexing:Performance

Comments

@kkewwei
Contributor

kkewwei commented Jan 5, 2025

Is your feature request related to a problem? Please describe

It's known that indexingStrategyForOperation is invoked on both the primary and the replica to determine the indexing strategy, and this call is very expensive. Given that the data on the primary and the replica is the same, the indexing strategy must be the same on both. Therefore, rather than having the replica compute the strategy independently, the primary can pass its indexing strategy directly to the replica. This avoids the expensive computation on the replica and improves its indexing performance.
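To make the proposal concrete, here is a minimal sketch of the idea. It is a toy model, not OpenSearch code: `Plan`, `ReplicatedOp`, `Replica`, the `primaryPlan` hint and the `trustHint` flag are all hypothetical names made up for illustration, and the real strategy object carries more state than a single enum value. The `lookups` counter stands in for the expensive per-document lookup that the replica would skip when it trusts the primary's plan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model. None of these types are OpenSearch classes.
final class ReplicaStrategySketch {

    /** Illustrative outcomes of planning one index operation. */
    enum Plan { APPEND, UPDATE, SKIP_STALE }

    /** A replicated operation; primaryPlan is the assumed hint (null when absent). */
    static final class ReplicatedOp {
        final String id;
        final long seqNo;
        final Plan primaryPlan;

        ReplicatedOp(String id, long seqNo, Plan primaryPlan) {
            this.id = id;
            this.seqNo = seqNo;
            this.primaryPlan = primaryPlan;
        }
    }

    /** Toy replica that tracks the latest sequence number seen per document id. */
    static final class Replica {
        private final Map<String, Long> latestSeqNoById = new HashMap<>();
        int lookups = 0; // counts the simulated expensive lookups

        /** What the replica does today in this model: look the document up, then decide. */
        Plan planIndependently(ReplicatedOp op) {
            lookups++;
            Long existing = latestSeqNoById.get(op.id);
            if (existing == null) {
                return Plan.APPEND;
            }
            return op.seqNo > existing ? Plan.UPDATE : Plan.SKIP_STALE;
        }

        /** Proposed path: reuse the primary's plan when present and trusted, else fall back. */
        Plan apply(ReplicatedOp op, boolean trustHint) {
            Plan plan = (trustHint && op.primaryPlan != null)
                ? op.primaryPlan
                : planIndependently(op);
            if (plan != Plan.SKIP_STALE) {
                latestSeqNoById.put(op.id, op.seqNo);
            }
            return plan;
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        System.out.println(replica.apply(new ReplicatedOp("doc-1", 0, Plan.APPEND), true));
        System.out.println("lookups = " + replica.lookups);
    }
}
```

The fallback to `planIndependently` is a design choice in this sketch: it keeps the existing behavior available for any operation that arrives without a hint or in a situation where the hint should not be trusted.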

Related component

Indexing:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkewwei kkewwei added the enhancement (Enhancement or improvement to existing feature or request) and untriaged labels Jan 5, 2025
@soosinha
Member

soosinha commented Jan 6, 2025

[Triage - attendees 1 2 3 4]

We discussed in the triage meeting that during failover and internal retry scenarios, it might be useful for the replica to compute the indexing strategy independently. @kkewwei Have you thought about all the cases?

@soosinha soosinha removed the untriaged label Jan 6, 2025
@kkewwei
Contributor Author

kkewwei commented Jan 7, 2025

We discussed in the triage meeting that during failover and internal retry scenarios, it might be useful for the replica to compute the indexing strategy independently. @kkewwei Have you thought about all the cases?

@soosinha, I haven't thought about all of them in depth yet. From my side, the change has the greatest effect on replica indexing, including internal retries. It seems worthwhile to implement, and I would like to try it on replica indexing first.

@navneet1v
Contributor

@kkewwei just out of curiosity, how much latency improvement do you expect here if replicas are not creating their own index strategy?

@kkewwei
Contributor Author

kkewwei commented Jan 8, 2025

@kkewwei just out of curiosity, how much latency improvement do you expect here if replicas are not creating their own index strategy?

@navneet1v, I am not sure either. In frequent-update scenarios, I see that the write threads spend a lot of their time in this code path. When indexing, the engine needs to query all segments of the shard to determine whether the document exists and, if so, its version. I will try to use opensearch-benchmark to draw some preliminary conclusions.
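For readers unfamiliar with why this lookup is costly, here is a rough sketch of the per-segment search described above. It is a simplified model and not the actual lookup code: each segment is assumed to be a plain map from document id to version, `SegmentLookupSketch`, `Result` and `findLatest` are hypothetical names, and any caching the real engine may apply is ignored. The point is only that the number of segments visited grows when the document lives in an old segment or does not exist at all.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a per-segment id lookup; not OpenSearch code.
final class SegmentLookupSketch {

    /** Result of one lookup, including how many segments had to be checked. */
    static final class Result {
        final boolean found;
        final long version;       // -1 when the document was not found
        final int segmentsVisited;

        Result(boolean found, long version, int segmentsVisited) {
            this.found = found;
            this.version = version;
            this.segmentsVisited = segmentsVisited;
        }
    }

    /**
     * Scans segments from newest (last) to oldest (first) and stops at the
     * first one that contains the id. A miss visits every segment.
     */
    static Result findLatest(List<Map<String, Long>> segments, String id) {
        int visited = 0;
        for (int i = segments.size() - 1; i >= 0; i--) {
            visited++;
            Long version = segments.get(i).get(id);
            if (version != null) {
                return new Result(true, version, visited);
            }
        }
        return new Result(false, -1L, visited);
    }

    public static void main(String[] args) {
        List<Map<String, Long>> segments = List.of(
            Map.of("a", 1L),   // oldest segment
            Map.of("b", 1L),
            Map.of("a", 2L));  // newest segment holds the updated "a"
        Result miss = findLatest(segments, "c");
        System.out.println("found=" + miss.found + " visited=" + miss.segmentsVisited);
    }
}
```

In this model a brand-new document is the worst case, since every segment must be checked before the lookup can report that the id does not exist.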


3 participants