Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.' #655

Closed
alex-drocks opened this issue Jul 22, 2024 · 11 comments · Fixed by #714
Closed
Labels
bug Something isn't working

Comments

@alex-drocks
Copy link

alex-drocks commented Jul 22, 2024

Describe the bug

Local subtensor node found out of sync this morning, with an error log like this:

2024-07-22 05:01:38 Evicting failed runtime instance error=Runtime panicked: Storage root must match that calculated.
2024-07-22 05:01:38 Block prepare storage changes error: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 05:01:41 Report 12D3KooWCqAEBKQ5w9B7KRifD5MooAkDKrVoMTePy5vAD2GPWKLD: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.
2024-07-22 05:01:41 :broken_heart: Error importing block 0x10efd9db2309de9477a7574bc5a85676104d58196177440972a1893f70a83cb0: block has an unknown parent

Then restarted it and it was throwing tons of errors like this after the restart:

2024-07-22 14:39:52 StakeRemoved( hotkey:a02cc4cd145f455f02c88fdc7c5f94275bdddbe5472f933caa3f7f4ef1ba6e53 (5FgipPNQ...), stake_to_be_removed:9839315 )
2024-07-22 14:39:52 do_remove_stake( origin:1c7d4b5248f2ae72cb0c5c6c03f274d12ec7efbb1ee34628b0736ad8fd79e26b (5Ci4RCiG...) hotkey:e8498f8ba1ea025cfba7bee38a803c6e272aa02eb2f19de12e7aa26fa381d90c (5HKGpRj4...), stake_to_be_removed:0 )
Hash: given=2ad52ae1bec16b59f330ae22c3a0e74306e3a9829bc92b8a9d4447f6785f3994, expected=aae074ac2f1d2a2e978c18a374a260653093271c438f2173fa9eb0ae90c64d93

====================

Version: 4.0.0-dev-unknown

   0: sp_panic_handler::set::{{closure}}
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic_handler::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: rust_begin_unwind
   5: core::panicking::panic_fmt
   6: <node_subtensor_runtime::Runtime as sp_api::runtime_decl_for_core::CoreV5<sp_runtime::generic::block::Block<sp_runtime::generic::header::Header<u32,sp_runtime::traits::BlakeTwo256>,sp_runtime::generic::unchecked_extrinsic::UncheckedExtrinsic<sp_runtime::multiaddress::MultiAddress<<<sp_runtime::MultiSignature as sp_runtime::traits::Verify>::Signer as sp_runtime::traits::IdentifyAccount>::AccountId,()>,node_subtensor_runtime::RuntimeCall,sp_runtime::MultiSignature,(frame_system::extensions::check_non_zero_sender::CheckNonZeroSender<node_subtensor_runtime::Runtime>,frame_system::extensions::check_spec_version::CheckSpecVersion<node_subtensor_runtime::Runtime>,frame_system::extensions::check_tx_version::CheckTxVersion<node_subtensor_runtime::Runtime>,frame_system::extensions::check_genesis::CheckGenesis<node_subtensor_runtime::Runtime>,frame_system::extensions::check_mortality::CheckMortality<node_subtensor_runtime::Runtime>,node_subtensor_runtime::check_nonce::CheckNonce<node_subtensor_runtime::Runtime>,frame_system::extensions::check_weight::CheckWeight<node_subtensor_runtime::Runtime>,pallet_transaction_payment::ChargeTransactionPayment<node_subtensor_runtime::Runtime>,pallet_subtensor::SubtensorSignedExtension<node_subtensor_runtime::Runtime>,pallet_commitments::CommitmentsSignedExtension<node_subtensor_runtime::Runtime>)>>>>::execute_block
   7: node_subtensor_runtime::api::dispatch
   8: sc_executor::executor::WasmExecutor<H>::with_instance::{{closure}}
   9: <sc_executor::executor::NativeElseWasmExecutor<D> as sp_core::traits::CodeExecutor>::call
  10: sp_state_machine::execution::StateMachine<B,H,Exec>::execute
  11: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  12: <node_subtensor_runtime::RuntimeApiImpl<__SrApiBlock__,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at::{{closure}}
  13: <node_subtensor_runtime::RuntimeApiImpl<__SrApiBlock__,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at
  14: sp_api::Core::execute_block
  15: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  16: <sc_consensus_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  17: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  18: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  19: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  20: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  21: tokio::runtime::task::raw::poll
  22: std::sys_common::backtrace::__rust_begin_short_backtrace
  23: core::ops::function::FnOnce::call_once{{vtable.shim}}
  24: std::sys::pal::unix::thread::Thread::new::thread_start
  25: start_thread
  26: clone


Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.', /root/.cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/7049c3c/substrate/frame/executive/src/lib.rs:834

This is a bug. Please report it at:

        support.anonymous.an

2024-07-22 14:39:52 do_remove_stake( origin:b6f85d613d4c2b961e171644c4ee2de0998c309539a86b0bcfd883fd05074b5e (5GCcMN81...) hotkey:00420dfed7ec724228e8551d64827ffbb1f00ad39f14c64b36ef1a26c23ad87a (5C53UmH5...), stake_to_be_removed:53921232 )
2024-07-22 14:39:52 StakeRemoved( hotkey:00420dfed7ec724228e8551d64827ffbb1f00ad39f14c64b36ef1a26c23ad87a (5C53UmH5...), stake_to_be_removed:53921232 )
2024-07-22 14:39:52 Evicting failed runtime instance error=Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 Block prepare storage changes error: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 💔 Error importing block 0x16774dea580287b84a2216cdaf0ff767bee8718a8e0c4b8c5e62788b23137794: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 do_set_weights( origin:522f432f51287df57711a3a00f6c7a334b0c813ff178c22649f01566ef4a5d48 (5DvTpini...) netuid:7

To Reproduce

Running local subtensor node on ryzen 7950x Debian with this:

cd ~/subtensor/
git fetch --all
git reset --hard origin/main
git checkout main
git pull
docker compose down
docker system prune --all
docker volume rm $(docker volume ls -q --filter name=sub*)
sudo ./scripts/run/subtensor.sh -e docker --network mainnet --node-type lite
# logs
docker logs -f --tail 10 subtensor-mainnet-lite

Expected behavior

No panick error logs

Screenshots

image

Environment

Linux Debian 12

Additional context

Node was up and running for a few days.

This morning at 5am EST, I tried btcli w balance command and saw outdated data so I inspected the subtensor logs.

I had this error in the logs:

2024-07-22 05:01:38 Evicting failed runtime instance error=Runtime panicked: Storage root must match that calculated.
2024-07-22 05:01:38 Block prepare storage changes error: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 05:01:41 Report 12D3KooWCqAEBKQ5w9B7KRifD5MooAkDKrVoMTePy5vAD2GPWKLD: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.
2024-07-22 05:01:41 :broken_heart: Error importing block 0x10efd9db2309de9477a7574bc5a85676104d58196177440972a1893f70a83cb0: block has an unknown parent

After seeing that it was broken (didn't sync to the chain properly afterwards) I just restarted the node and then it had thousands of lines per second containing the error:

2024-07-22 14:39:52 StakeRemoved( hotkey:a02cc4cd145f455f02c88fdc7c5f94275bdddbe5472f933caa3f7f4ef1ba6e53 (5FgipPNQ...), stake_to_be_removed:9839315 )
2024-07-22 14:39:52 do_remove_stake( origin:1c7d4b5248f2ae72cb0c5c6c03f274d12ec7efbb1ee34628b0736ad8fd79e26b (5Ci4RCiG...) hotkey:e8498f8ba1ea025cfba7bee38a803c6e272aa02eb2f19de12e7aa26fa381d90c (5HKGpRj4...), stake_to_be_removed:0 )
Hash: given=2ad52ae1bec16b59f330ae22c3a0e74306e3a9829bc92b8a9d4447f6785f3994, expected=aae074ac2f1d2a2e978c18a374a260653093271c438f2173fa9eb0ae90c64d93

====================

Version: 4.0.0-dev-unknown

   0: sp_panic_handler::set::{{closure}}
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic_handler::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: rust_begin_unwind
   5: core::panicking::panic_fmt
   6: <node_subtensor_runtime::Runtime as sp_api::runtime_decl_for_core::CoreV5<sp_runtime::generic::block::Block<sp_runtime::generic::header::Header<u32,sp_runtime::traits::BlakeTwo256>,sp_runtime::generic::unchecked_extrinsic::UncheckedExtrinsic<sp_runtime::multiaddress::MultiAddress<<<sp_runtime::MultiSignature as sp_runtime::traits::Verify>::Signer as sp_runtime::traits::IdentifyAccount>::AccountId,()>,node_subtensor_runtime::RuntimeCall,sp_runtime::MultiSignature,(frame_system::extensions::check_non_zero_sender::CheckNonZeroSender<node_subtensor_runtime::Runtime>,frame_system::extensions::check_spec_version::CheckSpecVersion<node_subtensor_runtime::Runtime>,frame_system::extensions::check_tx_version::CheckTxVersion<node_subtensor_runtime::Runtime>,frame_system::extensions::check_genesis::CheckGenesis<node_subtensor_runtime::Runtime>,frame_system::extensions::check_mortality::CheckMortality<node_subtensor_runtime::Runtime>,node_subtensor_runtime::check_nonce::CheckNonce<node_subtensor_runtime::Runtime>,frame_system::extensions::check_weight::CheckWeight<node_subtensor_runtime::Runtime>,pallet_transaction_payment::ChargeTransactionPayment<node_subtensor_runtime::Runtime>,pallet_subtensor::SubtensorSignedExtension<node_subtensor_runtime::Runtime>,pallet_commitments::CommitmentsSignedExtension<node_subtensor_runtime::Runtime>)>>>>::execute_block
   7: node_subtensor_runtime::api::dispatch
   8: sc_executor::executor::WasmExecutor<H>::with_instance::{{closure}}
   9: <sc_executor::executor::NativeElseWasmExecutor<D> as sp_core::traits::CodeExecutor>::call
  10: sp_state_machine::execution::StateMachine<B,H,Exec>::execute
  11: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  12: <node_subtensor_runtime::RuntimeApiImpl<__SrApiBlock__,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at::{{closure}}
  13: <node_subtensor_runtime::RuntimeApiImpl<__SrApiBlock__,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at
  14: sp_api::Core::execute_block
  15: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  16: <sc_consensus_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  17: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  18: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  19: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  20: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  21: tokio::runtime::task::raw::poll
  22: std::sys_common::backtrace::__rust_begin_short_backtrace
  23: core::ops::function::FnOnce::call_once{{vtable.shim}}
  24: std::sys::pal::unix::thread::Thread::new::thread_start
  25: start_thread
  26: clone


Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.', /root/.cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/7049c3c/substrate/frame/executive/src/lib.rs:834

This is a bug. Please report it at:

        support.anonymous.an

2024-07-22 14:39:52 do_remove_stake( origin:b6f85d613d4c2b961e171644c4ee2de0998c309539a86b0bcfd883fd05074b5e (5GCcMN81...) hotkey:00420dfed7ec724228e8551d64827ffbb1f00ad39f14c64b36ef1a26c23ad87a (5C53UmH5...), stake_to_be_removed:53921232 )
2024-07-22 14:39:52 StakeRemoved( hotkey:00420dfed7ec724228e8551d64827ffbb1f00ad39f14c64b36ef1a26c23ad87a (5C53UmH5...), stake_to_be_removed:53921232 )
2024-07-22 14:39:52 Evicting failed runtime instance error=Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 Block prepare storage changes error: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 💔 Error importing block 0x16774dea580287b84a2216cdaf0ff767bee8718a8e0c4b8c5e62788b23137794: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Runtime panicked: Storage root must match that calculated.
2024-07-22 14:39:52 do_set_weights( origin:522f432f51287df57711a3a00f6c7a334b0c813ff178c22649f01566ef4a5d48 (5DvTpini...) netuid:7
@alex-drocks alex-drocks added the bug Something isn't working label Jul 22, 2024
@alex-drocks
Copy link
Author

Similar to #635

@liho00
Copy link

liho00 commented Jul 22, 2024

same issue here +1

@alex-drocks
Copy link
Author

alex-drocks commented Jul 25, 2024

This issue came back again after wiping my folders and using the rust based subtensor instead of docker 😦
Node breaks after a certain point 😦

I got the problem with a ryzen 7950x Debian using Docker subtensor (original issue) AND then wiped out disk to reinstall with Rust based subtensor also.

Just also found two other servers on different providers local subtensor out of sync.

All same cause

@Thanos30
Copy link

Facing the same problem

@tavurth
Copy link

tavurth commented Jul 30, 2024

This is a huge problem for us. We've been seeing this on our own subtensor node, and across many validators on the subnet, i.e. this is a bittensor-breaking issue which should be addressed ASAP to maintain the integrity of the entire chain.

So with that said, let's try to be constructive here:

  1. Is there a known workaround?
  2. Is there a known working commit?

Currently, latest main is not runnable on our machines, that's using docker or directly compiling from main...

@distributedstatemachine any ideas?

@Thanos30
Copy link

Thanos30 commented Jul 30, 2024

For anyone still experiencing this issue, I pulled Release v1.1.1, cleaned blockchain data, rebuilt cargo and started the subtensor again and it works fine.

I am using the subtensor only to fetch data from the chain and for no other reason. If you are using the subtensor for any other reason, I can't guarantee this release is gonna work smoothly for you.

Workaround until this issue is fixed.

@alex-drocks
Copy link
Author

For anyone still experiencing this issue, I pulled Release v1.1.1, cleaned blockchain data, rebuilt cargo and started the subtensor again and it works fine.

I am using the subtensor only to fetch data from the chain and for no other reason. If you are using the subtensor for any other reason, I can't guarantee this release is gonna work smoothly for you.

Workaround until this issue is fixed.

My node will run for about 24h normally before it encounters the error again when starting from scratch with a clean install.
Have yours been working for longer than this?

@Thanos30
Copy link

Thanos30 commented Jul 30, 2024

The past week that's what was happening for me as well, but yesterday it wouldn't even work for a while, it immediately started spamming the same errors.

With the old Release, no issues for 24 hours so far, and the node is syncing fine.

@tavurth
Copy link

tavurth commented Jul 31, 2024

With the old Release, no issues for 24 hours so far, and the node is syncing fine.

Yep can also confirm 1.1.1 is working for me:

cd subtensor
git fetch --all
git reset --hard 703bdbd178c5d6e64edd4ffcf69aa49d63349290
cargo build --release --features pow-faucet

... Restart

703bdbd

@orriin
Copy link
Contributor

orriin commented Jul 31, 2024

Confirming a workaround is to use an old node version, it is fully compatible with the latest runtime. We're using a625834e on a test machine and it syncs fine.

We're continuing to investigate this, thanks for bearing with us.

@orriin
Copy link
Contributor

orriin commented Aug 5, 2024

This is fixed in #714

A node will attempt to use its native runtime in substitute for the on-chain Wasm runtime if all of spec_name, spec_version, and authoring_version are the same between the Wasm and native binaries. This was causing nodes compiled from main to use a runtime slightly different compared to what is on-chain, and write invalid stuff to storage.

In the future, CI should prevent us from ever pushing to main a spec version that matches what is on-chain (cc @sam0x17)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants