Domain won't start - Failed to process consensus block=Unknown Block

After renaming the paritydb folder to back it up (renamed because of the OOM killer), the node synced okay, but now that the operator has re-registered, the operator fails to start and crashes with:

Dec 05 17:23:00 mmvt1 subspace-node[2191]: 2023-12-05T22:23:00.402805Z [Consensus] ⚙️  Syncing  0.0 bps, target=#525569 (40 peers), best: #409761 (0xf8c1…d11f), finalized #376724 (0x17d4…cfe4), ⬇ 106.7kiB/s ⬆ 13.1kiB/s
Dec 05 17:23:02 mmvt1 subspace-node[2191]: 2023-12-05T22:23:02.144594Z [Domain] Failed to process consensus block error=UnknownBlock("Header was not found in the database: 0x1e27b40e3142420329ab58b9e42fcf484f3ede9ee13ce4a5b7f853154f21a78a")
Dec 05 17:23:02 mmvt1 subspace-node[2191]: 2023-12-05T22:23:02.144731Z [Domain] Essential task `domain-operator-worker` failed. Shutting down service.
Dec 05 17:23:02 mmvt1 subspace-node[2191]: 2023-12-05T22:23:02.144873Z [Domain] Domain starter exited with an error Other("Essential task failed.")
Dec 05 17:23:02 mmvt1 subspace-node[2191]: 2023-12-05T22:23:02.144881Z [Domain] Essential task `domain` failed. Shutting down service.
Dec 05 17:23:02 mmvt1 subspace-node[2191]: Error: SubstrateService(Other("Essential task failed."))
Dec 05 17:23:02 mmvt1 systemd[815]: subspace-node.service: Main process exited, code=exited, status=1/FAILURE
Dec 05 17:23:02 mmvt1 systemd[815]: subspace-node.service: Failed with result 'exit-code'.

@ning I think this is because node database wasn’t renamed or something, right?

I’m not sure, but I have tested locally that if I rename the whole --base-path folder the operator can restart successfully.

A few questions/things I need @jrwashburn to help with to locate the problem:

  • Have you renamed the whole --base-path or just chains/subspace_gemini_3g/paritydb?
  • Did the operator fail immediately after starting the node, or after the node had been syncing for some time?
  • Please run this command and let me know the result: subspace-node check-block 0x1e27b40e3142420329ab58b9e42fcf484f3ede9ee13ce4a5b7f853154f21a78a --chain gemini-3g --base-path <PATH>
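
If copying the hash by hand is error-prone, it can be pulled straight out of the failing log line. The following is only a minimal sketch: the helper names extract_block_hash and check_block_command are made up for illustration, and the arguments simply mirror the check-block command quoted above, with <PATH> left for you to fill in.

```python
import re

# A block hash in these logs is "0x" followed by 64 hex characters.
HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")


def extract_block_hash(log_line):
    """Return the first full block hash found in a log line, or None."""
    match = HASH_RE.search(log_line)
    return match.group(0) if match else None


def check_block_command(block_hash, base_path, chain="gemini-3g"):
    """Build the argument list for the check-block command quoted in this thread."""
    return ["subspace-node", "check-block", block_hash,
            "--chain", chain, "--base-path", base_path]


line = ('2023-12-05T22:23:02.144594Z [Domain] Failed to process consensus block '
        'error=UnknownBlock("Header was not found in the database: '
        '0x1e27b40e3142420329ab58b9e42fcf484f3ede9ee13ce4a5b7f853154f21a78a")')

block_hash = extract_block_hash(line)
# Print the command instead of running it, so <PATH> can be replaced first.
print(" ".join(check_block_command(block_hash, "<PATH>")))
```

The script only prints the command; run it yourself once the base path is filled in.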

I renamed subspace_gemini_3g/paritydb and subspace_gemini_3g_evm_domain/paritydb.

It fails within a few seconds; logs: failed-consensus-block.log - Google Drive

I renamed them both again and re-synced overnight, and the node is running okay this time. Would I need to take down the node, restore the old paritydb folders and then run the check-block? And if I do that, will I be able to just rename back to the good paritydb folders and not have to sync all over again?

Would I need to take down the node

No need to, as your node is running fine this time, but please do check whether your domain node's best block matches the one on the RPC endpoint node:

  • Get the best block from the log (i.e. #160648 (0x3928…a566) in the following log):
[Domain] 💤 Idle (0 peers), best: #160648 (0x3928…a566), finalized #0 (0xf886…aeb8)
  • Check that the same block number (i.e. #160648) has the same hash (i.e. 0x3928…a566) on the RPC endpoint node
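
The two steps above can be sketched as a small script. This is an illustration under assumptions, not an official tool: it assumes the RPC endpoint node accepts the standard Substrate JSON-RPC method chain_getBlockHash over HTTP, and the function names and the rpc_url parameter are made up for the example. The log shows an abbreviated hash, so the comparison checks only the visible prefix and suffix.

```python
import json
import re
import urllib.request

# Matches e.g. "best: #160648 (0x3928…a566)" and captures number, prefix, suffix.
BEST_RE = re.compile(r"best: #(\d+) \((0x[0-9a-f]+)…([0-9a-f]+)\)")


def parse_best_block(log_line):
    """Return (block_number, hash_prefix, hash_suffix) from a status line, or None."""
    match = BEST_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


def hash_matches(full_hash, prefix, suffix):
    """Check a full hash against the abbreviated form printed in the log."""
    full = full_hash.lower()
    return full.startswith(prefix) and full.endswith(suffix)


def fetch_block_hash(rpc_url, number):
    """Ask the RPC node for the hash of block `number` (assumes chain_getBlockHash over HTTP)."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "chain_getBlockHash",
                          "params": [number]}).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


line = "[Domain] 💤 Idle (0 peers), best: #160648 (0x3928…a566), finalized #0 (0xf886…aeb8)"
number, prefix, suffix = parse_best_block(line)
print(number, prefix, suffix)
```

With a reachable endpoint, the check would be hash_matches(fetch_block_hash(rpc_url, number), prefix, suffix), where rpc_url is whatever domain RPC endpoint you trust. A mismatch means your domain node is on a different block at that height.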

restore the old paritydb folders and then run the check-block?

If your old paritydb folders still exist (i.e. have the exact same data as when the node first shut down due to OOM), you can run the command directly against the old folder.

So the only option is to re-sync from scratch?

r9@r9:~$ /home/r9/subspace/target/production/subspace-node check-block --chain gemini-3g --base-path /media/nvme1/subspace-node/chains/subspace_gemini_3g/paritydb 0xad31eb63f0b0ccfd5dfd85c440bb62a40a9abc380bf9d16db9d9a34e7e46dcd0
2024-01-31 13:54:10+03:00 🔨 Initializing Genesis block/state (state: 0x09b5…b0b4, header-hash: 0x4180…180b)
Error: SubstrateCli(Service(Other("Unknown block")))

My error is

Jan 31 14:10:49 r9 subspace-node[58599]: 2024-01-31T11:10:49.685629Z [Domain] Failed to process consensus block error=UnknownBlock("Header was not found in the database: 0xad31eb63f0b0ccfd5dfd85c440bb62a40a9abc380bf9d16db9d9a34e7e46dcd0")
Jan 31 14:10:49 r9 subspace-node[58599]: 2024-01-31T11:10:49.685664Z [Domain] Essential task `domain-operator-worker` failed. Shutting down service.
Jan 31 14:10:49 r9 subspace-node[58599]: 2024-01-31T11:10:49.685747Z [Domain] Domain starter exited with an error Other("Essential task failed.")
Jan 31 14:10:49 r9 subspace-node[58599]: 2024-01-31T11:10:49.685760Z [Domain] Essential task `domain` failed. Shutting down service.
Jan 31 14:10:49 r9 subspace-node[58599]: Error: SubstrateService(Other("Essential task failed."))
Jan 31 14:10:49 r9 systemd[1]: subspace.service: Main process exited, code=exited, status=1/FAILURE
Jan 31 14:10:49 r9 systemd[1]: subspace.service: Failed with result 'exit-code'.
Jan 31 14:10:49 r9 systemd[1]: subspace.service: Consumed 5min 12.622s CPU time.

Yes, the domain must always start from consensus genesis, or else you'll run into issues. Support for starting at any other point is not implemented yet.

OK, then how do I re-sync the domain?

You wipe all node data (consensus and domain) and start from scratch so both sync together.
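
If you would rather keep the old data around than delete it, one option is to rename the whole --base-path aside and start from an empty folder, which was reported earlier in this thread to let the operator restart. A minimal sketch, assuming the node is stopped first and there is enough disk space for a second copy of the chain data; set_aside_base_path is a made-up helper, and the demo runs on a throwaway temporary directory rather than a real node folder.

```python
import tempfile
import time
from pathlib import Path


def set_aside_base_path(base_path, suffix=None):
    """Rename base_path to '<name>.bak-<suffix>' and recreate it empty.

    Stop the node before calling this. Everything under the base path is
    moved, so copy back anything you still need from the backup.
    """
    base = Path(base_path)
    if not base.is_dir():
        raise FileNotFoundError(f"{base} is not a directory")
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = base.with_name(f"{base.name}.bak-{suffix}")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")
    base.rename(backup)   # keep the old consensus and domain data as a backup
    base.mkdir()          # fresh, empty base path so both sync from scratch
    return backup


# Demo on a temporary directory, not on a real node folder.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "subspace-node"
    (base / "chains").mkdir(parents=True)
    backup = set_aside_base_path(base, suffix="demo")
    demo_result = (backup.name, (backup / "chains").is_dir(), list(base.iterdir()))

print(demo_result)
```

Once the new sync is healthy, the backup folder can be deleted to reclaim the space.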
