-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor FilePrefetchBuffer code #12097
Conversation
41f41b1
to
f73c6de
Compare
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ae06594
to
dc62c1a
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
dc62c1a
to
cee3f5c
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
cee3f5c
to
5040552
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
5040552
to
8cef5f0
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
8cef5f0
to
6d786ac
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
6d786ac
to
873507a
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
873507a
to
07b9367
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
d76e301
to
a44a8c4
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
a44a8c4
to
29666fe
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
79174c1
to
e8c6b2e
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
e8c6b2e
to
3332598
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please feel free to leave [existing issue] comments for another time; I just recorded them here since reviewing this PR made them visible.
NOTE : Some of the stress tests are failing for async_io after refactoring. Once that's fixed (most likely tomorrow), I'll address the comments and update the diff on Wednesday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished a complete pass.
// Check if the first buffer has the required offset and the async read is | ||
// still in progress. This should only happen if a prefetch was initiated | ||
// by Seek, but the next access is at another offset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is poll supposed to happen if the access is at the same offset? It seems to happen here whenever there's two buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. So the data is overlapping between 2 buffers. Offset lies in first buffer so first buffer has the data but async is in progress for that buffer. So we call Poll here to get the data first and then copy the data to overlap_buf_.
We assumption was Poll is supposed to happen in PrefetchInternal
.
But for seek operation, we call PrefetchAsync
to submit async requests for all L0 files and L1-Ln levels. And then it calls again (second call) FilePrefetchBuffer to actually poll and read the data. It can be possible that second call wasn't made as data block was found in cache. So the buffer is never polled.
This case handle that condition.
In short - If buffer is needed but async is in progress, poll the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be possible that second call wasn't made as data block was found in cache.
The second call when there's a cache miss is like this, right? SeekSecondPass()
-> ... -> TryReadFromCache()
-> ... -> PrefetchInternal()
I am seeing the code here poll even in that case, which is fine but seems to contradict the "next access is at another offset" part of the comment
3332598
to
8eee7ec
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I don't have any concerns but hope we will continue simplifying it
file/file_prefetch_buffer.cc
Outdated
// In case buffers do not align, reset next buffer if requested data needs | ||
// to be read in that buffer. | ||
if (NumBuffersAllocated() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic feels more like it belongs to HandleOverlappingData()
. For the caller that does not also call HandleOverlappingData()
(PrefetchAsync()
), it looks like this might not be needed?
file/file_prefetch_buffer.cc
Outdated
// NOTE - After this poll request, first buffer might be empty because of | ||
// IOError in callback while reading or it may contains the required data. | ||
PollIfNeeded(offset, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above comment says "Call Poll only if data is needed for the second buffer". But here we call it the first buffer.
I sort of suspect the original comment referred to the "second buffer" as the second buffer from which bytes in [offset, offset + length)
are read. At least according to my understanding, at this point, the front buffer would only need to be polled if HandleOverlappingData()
had already freed the buffer it polled (the "first buffer" in my understanding).
file/file_prefetch_buffer.cc
Outdated
// NOTE - After this poll request, first buffer might be empty because of | ||
// IOError in callback while reading or it may contains the required data. | ||
PollIfNeeded(offset, length); | ||
} | ||
|
||
if (copy_to_third_buffer) { | ||
if (copy_to_overlap_buffer) { | ||
offset = tmp_offset; | ||
length = tmp_length; | ||
} | ||
|
||
// 4. After polling and swapping buffers, if all the requested bytes are in | ||
// curr_, it will only go for async prefetching. | ||
// copy_to_third_buffer is a special case so it will be handled separately. | ||
if (!copy_to_third_buffer && DoesBufferContainData(curr_) && | ||
IsDataBlockInBuffer(offset, length, curr_)) { | ||
offset += length; | ||
length = 0; | ||
|
||
// Since async request was submitted directly by calling PrefetchAsync in | ||
// last call, we don't need to prefetch further as this call is to poll | ||
// the data submitted in previous call. | ||
if (explicit_prefetch_submitted_) { | ||
return s; | ||
} | ||
if (!IsSecondBuffEligibleForPrefetching()) { | ||
UpdateStats(/*found_in_buffer=*/true, original_length); | ||
return s; | ||
// After polling, if all the requested bytes are in first buffer, it will only | ||
// go for async prefetching. | ||
if (buf->DoesBufferContainData()) { | ||
if (copy_to_overlap_buffer) { | ||
// Data is overlapping i.e. some of the data has been copied to overlap | ||
// buffer and remaining will be updated below. | ||
size_t initial_buf_size = overlap_buf_->CurrentSize(); | ||
CopyDataToBuffer(buf, offset, length); | ||
UpdateStats( | ||
/*found_in_buffer=*/false, | ||
overlap_buf_->CurrentSize() - initial_buf_size); | ||
|
||
// Length == 0: All the requested data has been copied to overlap buffer | ||
// and it has already gone for async prefetching. It can return without | ||
// doing anything further. Length > 0: More data needs to be consumed so | ||
// it will continue async and sync prefetching and copy the remaining data | ||
// to overlap buffer in the end. | ||
if (length == 0) { | ||
UpdateStats(/*found_in_buffer=*/true, length); | ||
return s; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this belongs in HandleOverlappingData()
. It's hard to say where that function should end and this one picks up. If we say logic specific to overlapping buffers belongs there, then we should probably move this.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
8eee7ec
to
419fbd5
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
419fbd5
to
01aa328
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
01aa328
to
c4891d8
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@akankshamahajan15 merged this pull request in 5cb2d09. |
Summary - Refactor FilePrefetchBuffer code
FilePrefetchBuffer maintains a deque of free buffers (free_bufs_) of size num_buffers_ and buffers (bufs_) which contains the prefetched data. Whenever a buffer is consumed or is outdated (w.r.t. to requested offset), that buffer is cleared and returned to free_bufs_.
If a buffer is available in free_bufs_, it's moved to bufs_ and is sent for prefetching. num_buffers_ defines how many buffers are maintained that contains prefetched data.
If num_buffers_ == 1, it's a sequential read flow. Read API will be called on that one buffer whenever the data is requested and is not in the buffer.
If num_buffers_ > 1, then the data is prefetched asynchronosuly in the buffers whenever the data is consumed from the buffers and that buffer is freed.
If num_buffers > 1, then requested data can be overlapping between 2 buffers. To return the continuous buffer overlap_bufs_ is used. The requested data is copied from 2 buffers to the overlap_bufs_ and overlap_bufs_ is returned to
the caller.
Test Plan -