[Feature Request] Paginate snapshot indices status fetching #16985

bugmakerrrrrr · 2025-01-09T13:53:10Z

Is your feature request related to a problem? Please describe

Our customers depend on the snapshot status API to access information about snapshot indices, such as store size and number of docs. If the specified snapshot(s) are not currently running, TransportSnapshotStatusAction uses a single thread from the generic thread pool to retrieve the repository data, snapshot information, snapshot index metadata, and shard snapshot status. When the specified snapshot contains a large number of indices, this action takes a very long time to execute.

For one snapshot with 15,000+ shards, fetching the snapshot status took 8 minutes.

Describe the solution you'd like

Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices) to paginate snapshot index status, like we did in #14258. The new API would work only for the indices belonging to a specific snapshot. Since the order of indices in SnapshotInfo is fixed, we can simply use from + size to paginate. If the specified snapshot is running, the pagination parameters will have no effect.
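The from + size scheme described above can be sketched in Java roughly as follows. This is a minimal illustration, not OpenSearch code: the class SnapshotIndexPager, its page method, and the index names are hypothetical. It assumes the index list is already in the fixed order taken from SnapshotInfo, and it returns an empty page rather than failing when from points past the end.

```java
import java.util.List;

public final class SnapshotIndexPager {

    /**
     * Returns the [from, from + size) window of an index list whose order is fixed.
     * A window that starts past the end yields an empty list instead of an exception.
     * (Hypothetical helper for illustration only.)
     */
    static List<String> page(List<String> indicesInSnapshotOrder, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        int total = indicesInSnapshotOrder.size();
        int start = Math.min(from, total);
        // Widen to long so that from + size cannot overflow an int.
        int end = (int) Math.min((long) from + size, total);
        return indicesInSnapshotOrder.subList(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical index names, in the order a SnapshotInfo would list them.
        List<String> indices = List.of("logs-1", "logs-2", "logs-3", "logs-4", "logs-5");
        System.out.println(page(indices, 0, 2));  // first page: logs-1, logs-2
        System.out.println(page(indices, 4, 2));  // last, partial page: logs-5
        System.out.println(page(indices, 10, 2)); // past the end: empty
    }
}
```

Because the order does not change between requests for a completed snapshot, successive calls with from = 0, size, 2 × size, and so on visit every index exactly once, and each request only needs to load metadata and shard status for the indices in its window.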

Related component

Storage:Snapshots

Describe alternatives you've considered

Using the snapshot thread pool to parallelize fetching the status of snapshot indices. However, the snapshot thread pool might be blocked by long-running tasks. Moreover, the snapshot thread pool has at most 5 threads, so the speedup may be limited.
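As a rough, idealized bound on that speedup: even assuming perfect scaling across all 5 snapshot threads and no competition from other snapshot tasks, the 8-minute run reported for the 15,000+ shard snapshot could only shrink to about a minute and a half, and it would still be returned as one large response.

```latex
T_{\text{parallel}} \;\ge\; \frac{T_{\text{serial}}}{p}
  = \frac{8\ \text{min}}{5}
  = 1.6\ \text{min} \approx 96\ \text{s}
```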

Additional context

No response


ashking94 · 2025-01-09T15:11:13Z

Attendees - 1 2 3 4

Thanks for filing this issue, please feel free to submit a pull request.

andrross · 2025-01-11T00:34:48Z

Provide a new API (_snapshot/{repository}/{snapshot}/_list/indices)

I believe #14258 introduced a new top-level _list API concept, like _list/indices and _list/shards/{index}. We'd probably want to follow the same pattern here with something like _list/snapshots/{repository}/{snapshot}/.

bugmakerrrrrr · 2025-01-13T08:43:34Z

@andrross I was thinking of paging the indices section returned by the status API, and still returning the response in JSON format. The list API needs to return the response in CAT or JSON format, which doesn't seem like a good fit. I'm wondering if it's possible to have two APIs for paging, one for snapshot indices and one for snapshot shards.

The API for paging indices is _list/snapshot/{repository}/{snapshot}/indices. Its response includes shard stats and snapshot file stats, and its default fields are as follows:

index: index name
shards.total: total number of shards included in the snapshot.
shards.done: number of shards that initialized, started, and finalized successfully
shards.failed: number of shards that failed to be included in the snapshot
file_count: total number of files that are referenced by the snapshot
size_in_bytes: total size of files that are referenced by the snapshot
start_time_in_millis: time (in milliseconds) when snapshot creation began
time_in_millis: total time (in milliseconds) that the snapshot took to complete
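To make the proposed indices row concrete, here is a hedged Java sketch of how one row could be derived from per-shard statuses. Everything in it is hypothetical: the Stage enum only resembles shard snapshot stage names, the records are illustrative, and it assumes index-level file_count and size_in_bytes are plain sums over the index's shards. The two timing fields are left out because the proposal does not say how they would be aggregated.

```java
import java.util.List;

public final class SnapshotIndexRows {

    // Hypothetical local enum; names chosen to resemble shard snapshot stages.
    enum Stage { INIT, STARTED, FINALIZE, DONE, FAILURE }

    // One shard's snapshot status (illustrative subset of fields).
    record ShardStatus(String index, int shard, Stage stage, long fileCount, long sizeInBytes) {}

    // One row of the proposed indices listing (subset of the default fields).
    record IndexRow(String index, int shardsTotal, int shardsDone, int shardsFailed,
                    long fileCount, long sizeInBytes) {}

    /** Aggregates the shards of a single index into one index-level row. */
    static IndexRow summarize(String index, List<ShardStatus> shards) {
        int done = 0;
        int failed = 0;
        long files = 0;
        long bytes = 0;
        for (ShardStatus s : shards) {
            if (s.stage() == Stage.DONE) {
                done++;
            } else if (s.stage() == Stage.FAILURE) {
                failed++;
            }
            // Assumption: index totals are plain sums over its shards.
            files += s.fileCount();
            bytes += s.sizeInBytes();
        }
        return new IndexRow(index, shards.size(), done, failed, files, bytes);
    }

    public static void main(String[] args) {
        // Hypothetical index with two finished shards and one failed shard.
        List<ShardStatus> shards = List.of(
            new ShardStatus("logs-1", 0, Stage.DONE, 12, 4_096),
            new ShardStatus("logs-1", 1, Stage.DONE, 10, 2_048),
            new ShardStatus("logs-1", 2, Stage.FAILURE, 0, 0));
        System.out.println(summarize("logs-1", shards));
    }
}
```

Shards that are still in INIT, STARTED, or FINALIZE count toward shards.total but toward neither shards.done nor shards.failed in this sketch.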

The API for paging shards is _list/snapshot/{repository}/{snapshot}/shards. Its response includes the shards section of each index object, and its default fields are as follows:

index: index name
shard: the shard number
stage: the current stage of the shard in the snapshot
file_count: total number of files that are referenced by the snapshot
size_in_bytes: total size of files that are referenced by the snapshot
start_time_in_millis: time (in milliseconds) when snapshot creation began
time_in_millis: total time (in milliseconds) that the snapshot took to complete
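For the shards API, from + size needs a stable shard order just as the indices API does. One possible approach, sketched below under stated assumptions, is to walk indices in their fixed snapshot order and shards in ascending shard number, treating the result as one flattened sequence. The class SnapshotShardPager, the ShardRef record, and the per-index shard-count map are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class SnapshotShardPager {

    // Hypothetical (index, shard) pair identifying one row of the shards listing.
    record ShardRef(String index, int shard) {}

    /**
     * Pages over (index, shard) pairs flattened into one stable sequence:
     * indices in the given order, shards in ascending shard number.
     */
    static List<ShardRef> page(List<String> indexOrder, Map<String, Integer> shardCounts,
                               int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        List<ShardRef> out = new ArrayList<>();
        long end = (long) from + size; // exclusive end of the window
        long position = 0;             // position of the next shard in the flattened sequence
        for (String index : indexOrder) {
            int shards = shardCounts.getOrDefault(index, 0);
            // Skip whole indices that end before the requested window starts.
            if (position + shards <= from) {
                position += shards;
                continue;
            }
            for (int shard = 0; shard < shards && position < end; shard++, position++) {
                if (position >= from) {
                    out.add(new ShardRef(index, shard));
                }
            }
            if (position >= end) {
                break; // window is full; later indices are never touched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot: two indices with 3 and 2 shards respectively.
        List<String> order = List.of("logs-1", "logs-2");
        Map<String, Integer> counts = Map.of("logs-1", 3, "logs-2", 2);
        // Expected: logs-1 shard 2, then logs-2 shard 0.
        System.out.println(page(order, counts, 2, 2));
    }
}
```

Skipping whole indices before the window means a request only has to touch the indices that overlap the requested page, which is the point of paginating in the first place.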

What do you think?

andrross · 2025-01-13T17:56:21Z

@bugmakerrrrrr Adding two new _list APIs makes sense to me. It seems better to create new APIs designed for pagination than to try to shim pagination into existing APIs. I think this was the basic reason the top-level _list construct was created.

bugmakerrrrrr added the enhancement and untriaged labels Jan 9, 2025

github-actions bot added the Storage:Snapshots label Jan 9, 2025

github-project-automation bot added this to Storage Project Board Jan 9, 2025

github-project-automation bot moved this to 🆕 New in Storage Project Board Jan 9, 2025

ashking94 added the good first issue label and removed the untriaged label Jan 9, 2025

ashking94 moved this from 🆕 New to Ready To Be Picked in Storage Project Board Jan 9, 2025
