Is your feature request related to a problem? Please describe.
I operate multiple Base and Optimism archive nodes, and I am encountering persistent issues where block synchronization slows down significantly.
Describe the solution you'd like
I would like to improve the synchronization speed of both the Base Mainnet and Optimism Mainnet archive nodes.
Describe alternatives you've considered
While observing the logs of the op-node during periods of slow synchronization, I noticed the following warning message appearing multiple times:
lvl=warn msg="Engine temporary error" err="temporarily cannot insert new safe block: failed to create new block via forkchoice: context deadline exceeded"
This error occurs when the engine_forkchoiceUpdatedV3 RPC call exceeds the configured timeout (5 seconds). This behavior is intentional and expected, but when the request times out, op-node retries the same RPC request, which introduces additional delay.
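To illustrate the failure mode, here is a minimal, self-contained Go sketch of a fixed deadline around an Engine API call with retry on timeout. The function name and durations are illustrative only, not op-node's actual code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// callForkchoiceUpdated stands in for the real engine_forkchoiceUpdatedV3
// client call; here it just simulates a block import that takes 10 seconds.
func callForkchoiceUpdated(ctx context.Context) error {
	select {
	case <-time.After(10 * time.Second):
		return nil
	case <-ctx.Done():
		return ctx.Err() // yields "context deadline exceeded"
	}
}

func main() {
	// With a hard-coded 5s deadline, every attempt on a heavy block times out,
	// and each retry repeats the expensive work on the execution client side.
	for attempt := 1; attempt <= 3; attempt++ {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		err := callForkchoiceUpdated(ctx)
		cancel()
		if err == nil {
			fmt.Println("safe block inserted")
			return
		}
		if errors.Is(err, context.DeadlineExceeded) {
			fmt.Printf("attempt %d: temporary engine error: %v; retrying\n", attempt, err)
		}
	}
	fmt.Println("still behind: sync progress drops to the retry cadence")
}
```

If a single call needs 10 seconds, a 5-second deadline guarantees it never completes, no matter how many retries follow.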
To address this, I experimented by increasing the timeout from 5 seconds to 30 seconds while synchronizing the Base Mainnet archive node. This change proved effective. The experiment involved three Base Mainnet archive nodes:
Blue Node: A node that was already syncing well, used as a reference for the latest block number (no other special settings were applied to it).
Yellow Node: The node with synchronization issues (timeout increased from 5 seconds to 30 seconds).
Green Node: The node using the official image without any modifications.
After one week of monitoring synchronization, as shown in the attached graph:
The Yellow Node completed synchronization up to the latest block.
The Green Node still had not caught up, because its synchronization speed only matched the block production rate, so the gap to the chain head never closed.
Further investigation through trace logs revealed that for blocks containing transactions with hundreds of logs, the engine_forkchoiceUpdatedV3 RPC call could take more than 10 seconds. (DEBUG[01-19|05:13:53.027] Served engine_forkchoiceUpdatedV3 reqid=4941 duration=20.337986076s)
Based on these findings, I suggest the following improvements:
Add a flag to make the timeout adjustable, allowing users to set it according to their needs (see the sketch after this list).
For full node operators, this change may not have a significant impact since they are already synchronizing well.
However, for archive node operators experiencing slow synchronization, this flexible timeout could be beneficial.
Importantly, this change should not negatively impact any operators.
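One possible shape for such a flag, purely as a sketch: op-node defines its CLI flags with urfave/cli, but the flag name, env var, default, and wiring below are my illustration of the idea, not the actual contents of the PR:

```go
package flags

import (
	"time"

	"github.com/urfave/cli/v2"
)

// EngineRPCTimeout is a hypothetical flag that would let operators raise the
// Engine API request timeout above the hard-coded 5s, e.g. for archive nodes
// that hit "context deadline exceeded" on blocks with heavy state updates.
var EngineRPCTimeout = &cli.DurationFlag{
	Name:    "l2.engine-rpc-timeout", // illustrative name only
	Usage:   "Timeout for Engine API requests (e.g. engine_forkchoiceUpdatedV3) sent to the execution client",
	Value:   5 * time.Second, // keep today's behavior as the default
	EnvVars: []string{"OP_NODE_L2_ENGINE_RPC_TIMEOUT"},
}
```

The configured duration would then replace the fixed constant wherever op-node builds the context for its engine client calls (context.WithTimeout), so operators who don't set the flag would see no change in behavior.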
Additional context
The versions used for testing were op-node v1.10.2 and op-geth v1.101411.4, with all other specs and flags remaining the same except for the timeout. The experiment was conducted on an in-house server.
I am running nodes both on my own server and in an AWS EC2 environment, and both are experiencing similar synchronization slowdowns. One unusual observation: once a node catches up to the latest block, the degradation rarely recurs, but after a restart the node starts to fall behind again. The proposed change was helpful in this scenario.
I also attempted to run archive nodes with reth, but because it syncs multiple blocks at once, the gap would grow by about 10 blocks and then close again, repeating this cycle. As a result, I'm currently using op-geth.
I've created PR #13853 implementing the proposed changes. It adds a configurable timeout flag for the L2 engine to improve synchronization flexibility. Please have a look and let me know your thoughts!