-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce NewMultiCfIterator API #12153
Conversation
c21d080
to
0401111
Compare
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
} | ||
} | ||
|
||
TEST_F(MultiCfIteratorTest, DISABLED_IterateAttributeGroups) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will re-enable it when attribute_groups()
are implemented. Just to illustrate what they would return when implemented.
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
cd296f8
to
cfc9225
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
cfc9225
to
34929f9
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@@ -238,9 +238,17 @@ class AttributeGroup { | |||
WideColumns columns_; | |||
}; | |||
|
|||
inline bool operator==(const AttributeGroup& lhs, const AttributeGroup& rhs) { | |||
return lhs.column_family() == rhs.column_family() && | |||
lhs.columns() == rhs.columns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ==
on WideColumns make sense? There is no documented canonical ordering, so the same set of column->value pairs could compare as !=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Columns are always sorted (by name) before added. Not sure if I understand your concern here. Could you provide a quick example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data structure is fully defined in wide_columns.h. It is therefore reusable. This (reusable / general) comparison operator assumes an invariant on the data that the class itself does not describe. That mismatch (or omission) is asking for mismatch in expectations (potential bug).
Should this invariant always be true of WideColumns? I don't know, but that should be clear to users and developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an invariant on the data that the class itself does not describe
column names and values are always compared bytewise and it's commented in the class, but I think we can do better documentation.
.. that should be clear to users and developers.
Are you worried that it is not clear enough for users and developers to know that WideColumns are always sorted by column names (by bytewise comparison) and both name and value of all columns should match for two WideColumns to be considered equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an invariant on the data that the class itself does not describe
column names and values are always compared bytewise and it's commented in the class, but I think we can do better documentation.
Which class? The only reference I see to something like "compared bytewise" is on operator==
for WideColumn, which is outside the class. There is no reference to sorting or ordering.
Are you worried that it is not clear enough for users and developers to know that WideColumns are always sorted by column names
I don't see any documentation (APIs in db.h, wide_columns.h or wiki) indicating that sortedness is expected or provided, or whether perhaps the user ordering provided in PutEntity is preserved. I don't even see clear documentation of whether PutEntity completely overwrites a previous PutEntity on the same key or only on a column-by-column basis.
Nor exactly what Delete() does after PutEntity, just "works the same way as for regular key-values." So does it delete only the default column (since you can't specify a column)?
CC @ltamasi
and both name and value of all columns should match for two WideColumns to be considered equal?
The presumption for operator==
in C++ is structural equality. The question is what relationship that has to semantic equality, which can be different in various contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any documentation (APIs in db.h, wide_columns.h or wiki) indicating that sortedness is expected or provided, or whether perhaps the user ordering provided in PutEntity is preserved. I don't even see clear documentation of whether PutEntity completely overwrites a previous PutEntity on the same key or only on a column-by-column basis.
Nor exactly what Delete() does after PutEntity, just "works the same way as for regular key-values." So does it delete only the default column (since you can't specify a column)?
CC @ltamasi
These points are mostly covered by the API comments at https://github.com/facebook/rocksdb/blob/main/include/rocksdb/db.h#L430-L459 , which, similarly to those of Put
, use the concept of database entries associated with keys. After a PutEntity
, the key is associated with that entity, regardless of any previous value (whether it's a plain KV, an existing entity, or nothing at all); after a Delete
, the key is not associated with any database entry, again regardless of any previous value. There is no expectation that the input WideColumns
structure is sorted, although it does get sorted by RocksDB, and thus anything read back from RocksDB will be sorted. We can extend the comments to clarify this point.
@jaykorean Sorry about the delay! Will review the PR shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltamasi If I was expecting PutEntity to do partial update, I might assume it's sloppy language, overly copying from Put(), even if it's technically correct and consistent. "will be overwritten" doesn't always mean entirely overwritten.
I think part of the confusion is singular "entry" and "entity" when dealing with plural things. Even if the description of Delete() is technically correct, it's not clear that it can deal with plurals like wide columns, especially since there's no DeleteEntity (like PutEntity / GetEntity). A reasonable user would doubt whether it handles new functionality even if its un-updated description technically implies it does.
34929f9
to
a703206
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
a703206
to
1139dab
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
…nified experience
1d33aa5
to
d221bf3
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
461589c
to
8a22393
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the PR title and description (and re-import or copy to phabricator), but otherwise LGTM. (Assumes for now we are moving toward the unified iterator experience.) Thanks!
@jaykorean merged this pull request in 3412195. |
Summary: This PR continues #12153 by implementing the missing `Iterator` APIs - `Seek()`, `SeekForPrev()`, `SeekToLast()`, and `Prev`. A MaxHeap Implementation has been added to handle the reverse direction. The current implementation does not include upper/lower bounds yet. These will be added in subsequent PRs. The API is still marked as under construction and will be lifted after being added to the stress test. Please note that changing the iterator direction in the middle of iteration is expensive, as it requires seeking the element in each iterator again in the opposite direction and rebuilding the heap along the way. The first `Next()` after `SeekForPrev()` requires changing the direction under the current implementation. We may optimize this in later PRs. Pull Request resolved: #12422 Test Plan: The `multi_cf_iterator_test` has been extended to cover the API implementations. Reviewed By: pdillinger Differential Revision: D54820754 Pulled By: jaykorean fbshipit-source-id: 9eb741508df0f7bad598fb8e6bd5cdffc39e81d1
NOTE: The behavior described below has changed in #12480 . Please refer to the PR for the expected behavior of the iterator.
This PR introduces a new public API,
NewMultiCfIterator()
. The new API takes a vector of column family handles to build a cross-column-family iterator, which internally maintains multipleDBIter
s as child iterators from a consistent database state.When a key exists in multiple column families, the iterator selects the value (and wide columns) from the first column family containing the key, following the order provided in theSimilar to the merging iterator, a min heap is used to iterate across the child iterators. Backward iteration and direction change functionalities will be implemented in future PRs.column_families
parameter.The comparator used to compare keys across different column families will be derived from the iterator of the first column family specified in
column_families
. This comparator will be checked against the comparators from all other column families that the iterator will traverse. If there's a mismatch with any of the comparators, the initialization of the iterator will fail.Please note that this PR is not enough for users to start using the new API. The new API and
MultiCfIterator
implementations are still marked as "DO NOT USE - UNDER CONSTRUCTION". This PR is just the first of many PRs that will follow soon.This PR includes the following:
MultiCfIterator
, which implements the genericIterator
interface. The implementation includes the construction of the iterator,SeekToFirst()
,Next()
,Valid()
,key()
,value()
, andcolumns()
.