Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tightening_ratio support #26

Merged
merged 1 commit into from
Dec 6, 2024

Conversation

nnmehta
Copy link
Contributor

@nnmehta nnmehta commented Nov 28, 2024

Add tightening_ratio support for tuning of the TIGHENTING_RATIO to allow scalable bloom objects to have a higher capacity at cost of higher false positive rate.

We allow the TIGHTENING_RATIO argument only if the command is replicated.

@KarthikSubbarao
Copy link
Member

Can this PR be rebased? The digest change is merged in and looks like this PR contains commits from digest (which should be removed after rebasing)

Comment on lines 233 to 242
let tightening_ratio = match input_args[curr_cmd_idx].to_string_lossy().parse::<f64>() {
Ok(num) if num > TIGHTENING_RATIO_MIN && num < TIGHTENING_RATIO_MAX => num,
Ok(num) if !(num > TIGHTENING_RATIO_MIN && num < TIGHTENING_RATIO_MAX) => {
return Err(ValkeyError::Str(utils::ERROR_RATIO_RANGE));
}
_ => {
return Err(ValkeyError::Str(utils::BAD_ERROR_RATIO));
}
};
curr_cmd_idx += 1;
Copy link
Member

@KarthikSubbarao KarthikSubbarao Dec 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be provided as a command argument for the BF.RESERVE command. We can remove this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will remove this

@@ -327,6 +341,18 @@ pub fn bloom_filter_insert(ctx: &Context, input_args: &[ValkeyString]) -> Valkey
}
};
}
"RATIO" => {
Copy link
Member

@KarthikSubbarao KarthikSubbarao Dec 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Renaming to TIGHTENING makes this clearer
  2. Let's add a check to only allow this argument if the command is replicated. If it is not replicated, we should not accept it as command args. This means, if the command is on a primary, it will use the constant from the configs.rs file

@@ -228,6 +230,16 @@ pub fn bloom_filter_reserve(ctx: &Context, input_args: &[ValkeyString]) -> Valke
}
Copy link
Member

@KarthikSubbarao KarthikSubbarao Dec 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to update the arity check at the beginning of the fn to handle the extra TIGHTENING argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update this

src/configs.rs Outdated
@@ -32,6 +32,8 @@ lazy_static! {
// Tightening ratio used during scale out for the calculation of fp_rate of every new filter within a bloom object to
// maintain the bloom object's overall fp_rate to the configured value.
pub const TIGHTENING_RATIO: f64 = 0.5;
pub const TIGHTENING_RATIO_MIN: f64 = 0.001;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 0, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

@@ -2,6 +2,7 @@ use crate::bloom;
use crate::bloom::data_type::ValkeyDataType;
use crate::bloom::utils::BloomFilterType;
use crate::configs;
use crate::wrapper::digest::Digest;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are updating RDB Load, we need to update RDB Save to add the tightening ratio at the right place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update here in rdb_save

fn debug_digest(&self, mut dig: Digest) {
dig.add_long_long(self.expansion.into());
dig.add_string_buffer(&self.fp_rate.to_le_bytes());
// dig.add_string_buffer(&self.tightening_ratio.to_le_bytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add the tightening ratio to the digest. Why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing out something else and commented this briefly, will uncomment in future revision

@nnmehta nnmehta closed this Dec 3, 2024
@nnmehta nnmehta reopened this Dec 3, 2024
@nnmehta nnmehta force-pushed the tightening_ratio branch 2 times, most recently from 99acf6f to 078534f Compare December 3, 2024 21:20
@@ -299,14 +298,15 @@ pub fn bloom_filter_reserve(ctx: &Context, input_args: &[ValkeyString]) -> Valke
pub fn bloom_filter_insert(ctx: &Context, input_args: &[ValkeyString]) -> ValkeyResult {
let argc = input_args.len();
// At the very least, we need: BF.INSERT <key> ITEMS <item>
if argc < 4 {
if argc < 5 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on ctx.get_flags().contains(ContextFlags::REPLICATED), right?

If replica, < 5. If primary, < 4. Can we update it?

@@ -94,9 +98,11 @@ impl BloomFilterType {
// Create the bloom filter and add to the main BloomFilter object.
let bloom = BloomFilter::new(fp_rate, capacity);
let filters = vec![bloom];
let tightening_ratio = 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be passed in as arguments from new_reserved. This should not be hard coded

Copy link
Contributor Author

@nnmehta nnmehta Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added it in this change but since you mentioned above:

 not be provided as a command argument for the BF.RESERVE command

I had to remove it wherever we use new_reserved

@@ -227,7 +227,6 @@ pub fn bloom_filter_reserve(ctx: &Context, input_args: &[ValkeyString]) -> Valke
return Err(ValkeyError::Str(utils::BAD_ERROR_RATE));
}
};
curr_cmd_idx += 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this back. It might have been accidentally removed in your PR

@@ -779,6 +794,7 @@ mod tests {

// verify new_bf and bf
assert_eq!(bf.fp_rate, new_bf.fp_rate);
assert_eq!(bf.tightening_ratio, new_bf.tightening_ratio);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the verify_restored_items function, can we assert on this?
You can also do this assert for the false positive rate

        assert_eq!(
            restored_bloom_filter_type.tightenting_ratio,
            original_bloom_filter_type.tightenting_ratio
        );

@@ -298,15 +302,17 @@ pub fn bloom_filter_reserve(ctx: &Context, input_args: &[ValkeyString]) -> Valke

pub fn bloom_filter_insert(ctx: &Context, input_args: &[ValkeyString]) -> ValkeyResult {
let argc = input_args.len();
let validate_size_limit = ctx.get_flags().contains(ContextFlags::REPLICATED);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let validate_size_limit = ctx.get_flags().contains(ContextFlags::REPLICATED);
let replicated_cmd = ctx.get_flags().contains(ContextFlags::REPLICATED);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

Comment on lines 336 to 343
"TIGHTENING" => {
if !validate_size_limit {
return Err(ValkeyError::Str(utils::ERROR));
}
Copy link
Member

@KarthikSubbarao KarthikSubbarao Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"TIGHTENING" => {
if !validate_size_limit {
return Err(ValkeyError::Str(utils::ERROR));
}
"TIGHTENING" if replicated_cmd => {
if idx >= (argc - 1) {
return Err(ValkeyError::WrongArity);
}

// At the very least, we need: BF.INSERT <key> ITEMS <item>
if argc < 4 {
if !validate_size_limit && argc < 4 || validate_size_limit && argc < 5 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !validate_size_limit && argc < 4 || validate_size_limit && argc < 5 {
if argc < 4 {

This should handle it already. It is not required to contain tightening ratio on the replicated command - it is optional and we should handle cases where it provided and not provided.

The check above is sufficient

@@ -95,9 +100,11 @@ impl BloomFilterType {
// Create the bloom filter and add to the main BloomFilter object.
let bloom = BloomFilter::new(fp_rate, capacity);
let filters = vec![bloom];
let tightening_ratio = 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Removing it

Suggested change
let tightening_ratio = 0.5;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed to remove this, it is not required

@@ -34,7 +34,7 @@ def test_replication_behavior(self):
('BF.ADD', 'BF.ADD key item', 'BF.ADD key item1', 2),
('BF.MADD', 'BF.MADD key item', 'BF.MADD key item1', 2),
('BF.RESERVE', 'BF.RESERVE key 0.001 100000', 'BF.ADD key item1', 1),
('BF.INSERT', 'BF.INSERT key items item', 'BF.INSERT key items item1', 2),
('BF.INSERT', 'BF.INSERT key items item tightening 0.5', 'BF.INSERT key items item1 tightening 0.5', 2),
Copy link
Member

@KarthikSubbarao KarthikSubbarao Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('BF.INSERT', 'BF.INSERT key items item tightening 0.5', 'BF.INSERT key items item1 tightening 0.5', 2),
('BF.INSERT', 'BF.INSERT key items item', 'BF.INSERT key items item1', 2),

This is an invalid test. The tightening ratio is coming after ITEM so it is treated as an item. It is not treated as a property. Also, we cannot use this command on the primary, since we only allow this arg during replication :D

Let's just use a normal BF.INSERT command.

Once we support the tightening ratio config, we can then (1) update the config on the primary node, (2) Use BF.INSERT (3) Validate Digest matches on the replica even though the tightening ratio config is different

For now, let us leave it as it is because it is essentially untested code (unless we use a monitor on replica during bloom creation on the primary node)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, will revert this change accordingly

@@ -642,12 +672,18 @@ mod tests {
let rand_prefix = random_prefix(7);
// 1 in every 1000 operations is expected to be a false positive.
let expected_fp_rate: f64 = 0.001;
let expected_tightening_ratio: f64 = 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let expected_tightening_ratio: f64 = 0.5;
let tightening_ratio: f64 = 0.5;

@@ -584,13 +608,19 @@ mod tests {
let rand_prefix = random_prefix(7);
// 1 in every 1000 operations is expected to be a false positive.
let expected_fp_rate: f64 = 0.001;
let expected_tightening_ratio: f64 = 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let expected_tightening_ratio: f64 = 0.5;
let tightening_ratio: f64 = 0.5;

We are not asserting against this. so there is no need to call it "expected"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change accordingly

@@ -22,6 +25,8 @@ pub const BAD_EXPANSION: &str = "ERR bad expansion";
pub const BAD_CAPACITY: &str = "ERR bad capacity";
pub const BAD_ERROR_RATE: &str = "ERR bad error rate";
pub const ERROR_RATE_RANGE: &str = "ERR (0 < error rate range < 1)";
pub const BAD_ERROR_RATIO: &str = "ERR bad error ratio";
pub const ERROR_RATIO_RANGE: &str = "ERR (0 < error ratio range < 1)";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would having a better error message i.e. something like ratio must be in the range (0, 1) or is following convention for error rate range better?

@KarthikSubbarao KarthikSubbarao merged commit f10a4e5 into valkey-io:unstable Dec 6, 2024
6 checks passed
@nnmehta nnmehta deleted the tightening_ratio branch December 6, 2024 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants