Staking Elections: Consider removing validators with no points #5674

Open
kianenigma opened this issue Sep 11, 2024 · 8 comments · May be fixed by #7128
Labels
C1-mentor: A task where a mentor is available. Please indicate in the issue who the mentor could be.
C2-good-first-issue: A task for a first time contributor to become familiar with the Polkadot-SDK.

Comments

Contributor

kianenigma commented Sep 11, 2024

As reported recently by @eskimor, both Polkadot and Kusama have some bad validators that produce no blocks. This could be because of slow hardware.

In principle, these validators are sub-optimal to nominate, because they produce no staking rewards. So we hope nominators will filter them out. A mechanism like polkadot-fellows/RFCs#104 could help further.

Nonetheless, pallet-staking can become slightly more proactive, and itself chill validators who were elected for a number of eras, but didn't gain any points.

This helps Polkadot be sure that the average block time does not go above 6s.

On bad validators: you've got 14 400 blocks a day, around 48 blocks per validator, so with 1 bad validator you get 48 missed blocks. The spikes seem to show 3 bad validators, and the baseline normally seems to be around 48 missed blocks.
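The arithmetic behind those numbers can be sketched as follows. This is only an illustration: the 6s block time comes from the issue text, while the active set size of 300 is an assumption inferred from 14 400 / 48 and is not stated anywhere in the thread. All function names are hypothetical.

```rust
/// Seconds in a day.
const SECONDS_PER_DAY: u32 = 24 * 60 * 60;

/// Blocks produced per day at a given target block time.
fn blocks_per_day(block_time_secs: u32) -> u32 {
    SECONDS_PER_DAY / block_time_secs
}

/// Average number of blocks each validator authors per day,
/// assuming slots are spread evenly across the active set.
fn blocks_per_validator_per_day(block_time_secs: u32, active_validators: u32) -> u32 {
    blocks_per_day(block_time_secs) / active_validators
}

/// Blocks missed per day if `bad_validators` never author a block.
fn missed_blocks_per_day(
    block_time_secs: u32,
    active_validators: u32,
    bad_validators: u32,
) -> u32 {
    blocks_per_validator_per_day(block_time_secs, active_validators) * bad_validators
}
```

With a 6s block time and an assumed 300 validators, `missed_blocks_per_day(6, 300, 1)` gives the 48-block baseline, and three bad validators give 144.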

This shows me there is one bad validator on Polkadot right now that is not getting any points, so it is not producing any blocks: https://apps.turboflakes.io/?chain=polkadot#/insights

This validator has been getting 0 points for the past 32 eras (https://apps.turboflakes.io/?chain=polkadot#/validator/5DoG4qkLsAQBj69i2Uo2k2LBfRrWp7BhqjJuGcPQrSr6yE6P?mode=history). It is most likely unmaintained rather than malicious, so kicking it out would probably help here.
Interestingly, this validator is nominated by a single account with 3M DOT (https://polkadot.subscan.io/validator/12jZDB1QiwffAdADz7r2tBALX3rAWQjqvE3PRuNmQXsd9pnw). That account nominates around 16 validators (https://polkadot.subscan.io/nominator/14Ns6kKbCoka3MS4Hn6b7oRw9fFejG8RH5rq5j63cWUfpPDJ?tab=vote), so it won't really notice its rewards dropping a bit, because the rest of its nominations still earn it rewards. That is an argument for having some automated logic.

Potential solution: see #5674 (comment)

@sandreim
Contributor

@kianenigma Is this being worked on, or is work planned to start soon? I think it should be prioritised given how often this bites us.


burdges commented Sep 21, 2024

Yes, this sounds useful. Can they unchill themselves easily? Or do we do it when they change their session keys?

@kianenigma
Contributor Author

I don't see the bandwidth for it in the Runtime function atm, but I can offer two options:

  1. @maciejhirsz this is closely related to work you plan to do around https://github.com/paritytech-secops/srlabs_findings/issues/417. It is in the same code path, and tackling this issue can be a great warm-up task. WDYT?
  2. I can assign a new joiner to work on this, if permitted. This is less reliable.

@kianenigma
Contributor Author

In my original issue, I said "chill validators that produce no staking reward".

This is the harsher, radical approach. It is easier weight-wise to implement: we can have a #[pallet::task] or on_idle that regularly looks at validators who were active (a key in Exposures storage), but have no reward points in the past eras, and force chill them.
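A minimal sketch of that scan, written as plain Rust rather than pallet code. `EraRecord`, `validators_to_chill` and the `window` parameter are hypothetical stand-ins and do not mirror pallet-staking's actual storage layout. A real implementation would run inside a `#[pallet::task]` or `on_idle` and would need to bound its weight.

```rust
use std::collections::{BTreeMap, BTreeSet};

type EraIndex = u32;
type AccountId = u64;

/// Simplified stand-in for per-era on-chain state (hypothetical layout).
struct EraRecord {
    /// Validators that were in the active set (exposed) in this era.
    exposed: BTreeSet<AccountId>,
    /// Reward points per validator in this era; a missing entry means zero.
    points: BTreeMap<AccountId, u32>,
}

/// Returns validators that were exposed in every one of the `window` eras
/// before `current_era` and earned zero points in all of them.
fn validators_to_chill(
    history: &BTreeMap<EraIndex, EraRecord>,
    current_era: EraIndex,
    window: u32,
) -> BTreeSet<AccountId> {
    let start = current_era.saturating_sub(window);
    let mut candidates: Option<BTreeSet<AccountId>> = None;
    for era in start..current_era {
        // Validators that were active in this era but earned nothing.
        let idle: BTreeSet<AccountId> = match history.get(&era) {
            Some(rec) => rec
                .exposed
                .iter()
                .copied()
                .filter(|v| rec.points.get(v).copied().unwrap_or(0) == 0)
                .collect(),
            // Missing history: be conservative and flag nobody.
            None => BTreeSet::new(),
        };
        // Keep only validators that were idle in every era seen so far.
        candidates = Some(match candidates {
            None => idle,
            Some(prev) => prev.intersection(&idle).copied().collect(),
        });
    }
    candidates.unwrap_or_default()
}
```

Requiring zero points in every era of the window, rather than in any single era, is a design choice in this sketch to avoid chilling a validator after one bad era.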

A milder one would be to let them remain validators, so that people can still nominate them and such, but remove them in the validator snapshot (a.k.a. the pre-sorting step). This is more difficult to implement weight-wise.
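The milder variant could look roughly like the following. This is a sketch under assumptions: `build_validator_snapshot`, the `inactive` set and `max_len` are hypothetical names, and how the inactive set would be maintained cheaply on-chain is exactly the weight question raised above.

```rust
use std::collections::BTreeSet;

type AccountId = u64;

/// Builds the validator snapshot for the next election, skipping validators
/// flagged as inactive. Skipped validators stay registered and can still
/// be nominated; they are only left out of this snapshot.
fn build_validator_snapshot(
    registered: &[AccountId],
    inactive: &BTreeSet<AccountId>,
    max_len: usize,
) -> Vec<AccountId> {
    registered
        .iter()
        .copied()
        .filter(|v| !inactive.contains(v))
        .take(max_len)
        .collect()
}
```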

@gpestana can advise you if the weight will be an issue.


burdges commented Oct 25, 2024

I do like the idea intuitively, but..

I'd suggest "too few points from X, Y, and Z" instead of "no points" per se, although disputes slashing could be replaced by negative points, which maybe confuses this. Anyways..

As a rule, relay chain block production matters somewhat, but you can miss your few slots easily. I'd ignore backing rewards because approvals matter far more and backing must never be more beneficial than approvals. See polkadot-fellows/RFCs#119

Approvals require the median computation given in polkadot-fellows/RFCs#119, but then removing would become a simple majority vote. yikes! Instead, we could remove when the same computation finds too poor an approval score at the 2/3rd percentile. I'll caution that this computation runs about one full session later, so removal could only happen after one full session.
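To illustrate the difference between the median and the 2/3rd percentile, here is a rough sketch. It is not the computation from RFC 119; it only assumes that each reporting validator submits one approval score for the target validator, and it uses a nearest-rank percentile. `percentile`, `should_remove` and `min_score` are hypothetical names.

```rust
/// Nearest-rank percentile of `scores` at the fraction `num / den`.
/// Returns `None` for empty input or an invalid fraction.
fn percentile(scores: &[u32], num: usize, den: usize) -> Option<u32> {
    if scores.is_empty() || den == 0 || num > den {
        return None;
    }
    let mut sorted = scores.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(n * num / den), at least 1.
    let rank = ((sorted.len() * num + den - 1) / den).max(1);
    Some(sorted[rank - 1])
}

/// `reports` holds one approval score per reporting validator for a single
/// target validator. The target is removed only if the score at the 2/3rd
/// percentile is below `min_score`, i.e. at least two thirds of the
/// reporters see a poor score.
fn should_remove(reports: &[u32], min_score: u32) -> bool {
    percentile(reports, 2, 3).map_or(false, |score| score < min_score)
}
```

With seven reports of which four are zero, the median is zero, so a median rule would remove the validator on a simple majority. The 2/3rd percentile is still the higher score, so `should_remove` returns false until at least five of the seven reports are poor.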

We've no rewards for grandpa or beefy, but maybe in the future, and they must work like approvals via median computations.

As for RFCs, I'd suggest merging this into polkadot-fellows/RFCs#119 or making a follow-up RFC, because this should likely use the approval rewards, at least initially.

Contributor

Ank4n commented Nov 4, 2024

> In my original issue, I said "chill validators that produce no staking reward".
>
> This is the harsher, radical approach. It is easier weight-wise to implement: we can have a #[pallet::task] or on_idle that regularly looks at validators who were active (a key in Exposures storage), but have no reward points in the past eras, and force chill them.
>
> A milder one would be to let them remain validators, so that people can still nominate them and such, but remove them in the validator snapshot (a.k.a. the pre-sorting step). This is more difficult to implement weight-wise.
>
> @gpestana can advise you if the weight will be an issue.

Another possible alternative:

A new extrinsic, chill_inactive, that accepts a validator and a proof of zero era points. The proof can simply be a vec of x eras within the last 84 eras where they have zero points, where x is the threshold to chill an inactive validator.

This would be straightforward to implement, and since such cases should be rare, there’s no need for on-chain logic to actively detect them. A side effect of this approach is that if we set, say, x = 2, any validator with 0 points for 2 eras can be chilled by anyone until those 0-point eras are cleared from the state. This could serve as a nice punishment; however, if this is a problem, it should be trivial to store the era in which a validator set their validation intention and only check for zero-point eras occurring after that.
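The check such an extrinsic would perform can be sketched in plain Rust as follows. Everything here is hypothetical: `EraState`, `ChillError` and `check_chill_inactive` are illustration names, not pallet-staking APIs, and the sketch assumes that only finished eras inside the 84-era window count. The optional `intention_era` models the mitigation of ignoring eras before the validator (re)declared its intention.

```rust
use std::collections::{BTreeMap, BTreeSet};

type EraIndex = u32;
type AccountId = u64;

/// Number of past eras kept in state, per the "last 84 eras" in the proposal.
const HISTORY_DEPTH: u32 = 84;

#[derive(Debug, PartialEq)]
enum ChillError {
    /// Fewer distinct eras supplied than the threshold `x`.
    NotEnoughEras,
    /// The era is not finished, too old, or predates the validation intention.
    EraOutOfRange(EraIndex),
    /// The validator was not in the active set in this era.
    NotActive(EraIndex),
    /// The validator earned points in this era.
    HasPoints(EraIndex),
}

/// Hypothetical, simplified view of era state.
struct EraState {
    /// (era, validator) pairs for validators in the active set.
    active: BTreeSet<(EraIndex, AccountId)>,
    /// Reward points; a missing entry means zero.
    points: BTreeMap<(EraIndex, AccountId), u32>,
}

/// Verifies a "proof" that `validator` had zero points in at least
/// `threshold` distinct eras in which it was active.
fn check_chill_inactive(
    state: &EraState,
    validator: AccountId,
    claimed_eras: &[EraIndex],
    current_era: EraIndex,
    threshold: usize,
    intention_era: Option<EraIndex>,
) -> Result<(), ChillError> {
    // Deduplicate so the same era cannot be counted twice.
    let distinct: BTreeSet<EraIndex> = claimed_eras.iter().copied().collect();
    if distinct.len() < threshold {
        return Err(ChillError::NotEnoughEras);
    }
    let oldest = current_era.saturating_sub(HISTORY_DEPTH);
    for era in distinct {
        let outside_window = era < oldest || era >= current_era;
        let before_intention = intention_era.map_or(false, |start| era < start);
        if outside_window || before_intention {
            return Err(ChillError::EraOutOfRange(era));
        }
        if !state.active.contains(&(era, validator)) {
            return Err(ChillError::NotActive(era));
        }
        if state.points.get(&(era, validator)).copied().unwrap_or(0) > 0 {
            return Err(ChillError::HasPoints(era));
        }
    }
    Ok(())
}
```

Because the caller supplies the eras, the on-chain work is limited to a handful of lookups per claimed era, which is why no active detection logic is needed.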

Contributor

Ank4n commented Nov 4, 2024

@michalisFr reported another validator that might have gone permanently offline but never removed their intention to validate.

This validator issued their validation intention 4 years ago and since then there's been no activity in the account. It doesn't seem they participated much in the active set after that, until era 1608 last week, when they generated 0 era points.

@kianenigma kianenigma added C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. C2-good-first-issue A task for a first time contributor to become familiar with the Polkadot-SDK. labels Nov 5, 2024
Contributor

aurexav commented Jan 10, 2025

> A new extrinsic, chill_inactive, that accepts a validator and a proof of zero era points. The proof can simply be a vec of x eras within the last 84 eras where they have zero points, where x is the threshold to chill an inactive validator.

I would like to pick this issue up and take this approach.

@aurexav aurexav linked a pull request Jan 13, 2025 that will close this issue
Projects
Status: 📕 Backlog
5 participants