Selection pressure for peers with long chainsync timeout. #4244

Open
karknu opened this issue Dec 28, 2022 · 2 comments · May be fixed by #4980
Labels: chain-sync, high-priority

Comments

karknu (Contributor) commented Dec 28, 2022

When a connection is promoted to hot, a timeout (in seconds) is picked at random from the array [90, 135, 180, 224, 269] for use by the chainsync protocol. When there is a gap in block production longer than that timeout, the timeout triggers and the peer is demoted to cold. The idea is that during a gap in block production only a subset of peers is replaced. This scheme works fine with static peers when running in non-p2p mode.

In the p2p case, the set of hot peers tends to fill up with peers that have a long chainsync timeout.
Example:
A node starts with 20 hot peers with the following timeouts: [4 x 90, 4 x 135, 4 x 180, 4 x 224, 4 x 269].
There is a 91s gap in block production. The four peers with a 90s timeout are replaced with four new peers with random timeouts. This happens on all p2p nodes: timed-out peers are replaced with peers that have new random timeouts.

This means that peers with long timeouts accumulate in the set of hot peers on all nodes. When a 224s gap finally happens, it is no longer 20% of all peers that are replaced; it could be 30% or 40%.
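The drift in the example can be checked with a little arithmetic. The following is only a sketch: the function name `expected_counts_after_gap` is made up for illustration, and it assumes that every timed-out peer is replaced immediately by a peer whose timeout is drawn uniformly from the same array.

```python
from fractions import Fraction

TIMEOUTS = [90, 135, 180, 224, 269]  # seconds, the array from the issue


def expected_counts_after_gap(counts, gap_seconds):
    """Expected number of hot peers per timeout after one block-production gap.

    Assumption: every peer whose timeout is shorter than the gap is dropped
    and replaced by a peer with a timeout drawn uniformly from TIMEOUTS.
    """
    # Peers whose timeout expired during the gap.
    replaced = sum(c for t, c in zip(TIMEOUTS, counts) if t < gap_seconds)
    # Each replacement lands on any timeout with probability 1/5.
    share = Fraction(replaced) / len(TIMEOUTS)
    return [
        (Fraction(0) if t < gap_seconds else Fraction(c)) + share
        for t, c in zip(TIMEOUTS, counts)
    ]


start = [4, 4, 4, 4, 4]                                # 20 hot peers
after_91 = expected_counts_after_gap(start, 91)        # the 91s gap above
after_181 = expected_counts_after_gap(after_91, 181)   # a later, longer gap

share_224 = after_181[TIMEOUTS.index(224)] / sum(after_181)
print(float(share_224))
```

Under these assumptions, the expected share of peers holding the 224s timeout rises from 20% to 24% after the 91s gap and to 34.4% after a subsequent 181s gap, which is in the 30% to 40% range mentioned above.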

Instead of using a constant timeout for the lifetime of the connection, it would be better if the chainsync protocol picked a random timeout each time it prepares to wait for the peer to present it with a new tip.
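The difference between the two schemes can be sketched as follows. This is an illustration only, not the ouroboros-network implementation; the class names `FixedTimeoutPeer` and `PerWaitTimeoutPeer` and the method `timeout_for_next_wait` are hypothetical.

```python
import random

TIMEOUTS = [90, 135, 180, 224, 269]  # seconds, the array from the issue


class FixedTimeoutPeer:
    """Current behaviour: one timeout drawn at promotion to hot, then reused."""

    def __init__(self, rng):
        self._timeout = rng.choice(TIMEOUTS)

    def timeout_for_next_wait(self):
        return self._timeout


class PerWaitTimeoutPeer:
    """Proposed behaviour: a fresh timeout drawn before every wait for a new tip."""

    def __init__(self, rng):
        self._rng = rng

    def timeout_for_next_wait(self):
        return self._rng.choice(TIMEOUTS)


rng = random.Random(0)
fixed = FixedTimeoutPeer(rng)
per_wait = PerWaitTimeoutPeer(rng)

fixed_draws = {fixed.timeout_for_next_wait() for _ in range(200)}
per_wait_draws = {per_wait.timeout_for_next_wait() for _ in range(200)}
```

With a per-wait draw, surviving an earlier gap says nothing about the timeout a peer will use for the next one, so long-lived hot peers should no longer be biased towards long timeouts.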

karknu changed the title Selection pressure for peers with long chainsync timeout. (p2p) Selection pressure for peers with long chainsync timeout. Dec 28, 2022

karknu (Contributor, Author) commented Dec 28, 2022

In the non-p2p case, a connection whose short timeout has expired is replaced with a new connection to the same peer with a new random timeout. This means that the drift towards connections with long chainsync timeouts is present in the non-p2p case too.

coot (Contributor) commented Dec 29, 2022

Nice discovery! Yes, we should indeed draw the timeout in the StMustReply state of the chain-sync mini-protocol.

karknu mentioned this issue Mar 28, 2023 (11 tasks)
coot moved this to In Progress in Ouroboros Network Oct 3, 2024
coot self-assigned this Oct 3, 2024
coot linked a pull request Oct 3, 2024 that will close this issue (9 tasks)
coot added the high-priority label Dec 23, 2024