Too many shards are missing

Issue Report

The node has a problem syncing.

Environment

  • Operating System: Win 10
  • CPU Architecture: Intel(R) Core™ i7-10700 CPU @ 2.90GHz
  • RAM: 32GB
  • Storage: 320GB SSD
  • Plot Size: 100GB
  • Subspace Deployment Method: Advanced CLI

Problem

Node gives the following error:

2023-08-29 11:42:44 [Consensus] ❌ Error while dialing /dns/telemetry.subspace.network/tcp/443/x-parity-wss/%2Fsubmit%2F: Custom { kind: Other, error: Timeout }
2023-08-29 11:43:03 [Consensus] ⚙️  Syncing  0.0 bps, target=#187450 (40 peers), best: #176801 (0x0cb3…3211), finalized #88734 (0xb4fa…0f2a), ⬇ 64.8kiB/s ⬆ 2.1kiB/s
2023-08-29 11:43:07 [Consensus] Error when syncing blocks from DSN error=Other: Error during data shards reconstruction: Impossible to recover, too many shards are missing

Hi! This message is harmless. It sounds much worse than it is.

As long as your best is going up, you are syncing fine.

Thank you for your answer. But my node is still at finalized block 88734, and it has not moved for a few hours now:

2023-08-29 13:02:04 [Consensus] 💤 Idle (40 peers), best: #188308 (0xae74…4a91), finalized #88734 (0xb4fa…0f2a), ⬇ 38.6kiB/s ⬆ 40.0kiB/s

Is that normal?

Yep, Gemini 3f does not use finalized in the same way as other Substrate projects.
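For anyone who wants to check this on their own logs, here is a minimal sketch that compares the best and finalized heights across status lines. The regular expression and the helper names (`parse_status`, `best_is_advancing`) are assumptions made for illustration and are not part of the node software; the sample lines are the ones quoted in this thread, with the emoji and bandwidth fields trimmed.

```python
import re

# Assumed pattern for the "best" and "finalized" block numbers in a status line.
STATUS_RE = re.compile(r"best: #(\d+).*?finalized #(\d+)")


def parse_status(line):
    """Return (best, finalized) as ints, or None if the line is not a status line."""
    m = STATUS_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def best_is_advancing(lines):
    """True if the best block number rose between the first and last status line."""
    heights = [s for s in map(parse_status, lines) if s is not None]
    return len(heights) >= 2 and heights[-1][0] > heights[0][0]


sample = [
    "2023-08-29 11:43:03 [Consensus] Syncing  0.0 bps, target=#187450 (40 peers), "
    "best: #176801 (0x0cb3…3211), finalized #88734 (0xb4fa…0f2a)",
    "2023-08-29 11:43:07 [Consensus] Error when syncing blocks from DSN error=Other: "
    "Error during data shards reconstruction: Impossible to recover, too many shards are missing",
    "2023-08-29 13:02:04 [Consensus] Idle (40 peers), "
    "best: #188308 (0xae74…4a91), finalized #88734 (0xb4fa…0f2a)",
]

# best went from 176801 to 188308 while finalized stayed at 88734
print(best_is_advancing(sample))  # True
```

In this sample the finalized number never changes, yet the check passes because only the best height is compared.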

Reconstruction errors are generally not the end of the world.

They are not great and shouldn’t really happen often on a healthy network, but the node and farmer will retry until they succeed in retrieving what they need anyway. We are still tuning retries and similar settings to improve the success rate, and a big chunk of the network still runs outdated, buggy software, but we’re getting there.

@nazar-pc can you explain what happens here? Why does the node not sync any blocks until it reaches this error (Error when syncing blocks from DSN error=Other: Error during data shards reconstruction: Impossible to recover, too many shards are missing)? As soon as it reaches this error, it starts syncing. I see this across many nodes (sep-11). What’s the logic behind it?

2023-09-22 09:23:33 [Consensus] ⚙️ Syncing 0.0 bps, target=#541619 (208 peers), best: #541499 (0x5ad8…c864), finalized #464770 (0x8d26…fec7), ⬇ 206.6kiB/s ⬆ 99.6kiB/s
2023-09-22 09:23:38 [Consensus] ⚙️ Syncing 0.0 bps, target=#541620 (185 peers), best: #541499 (0x5ad8…c864), finalized #464770 (0x8d26…fec7), ⬇ 135.1kiB/s ⬆ 3.1kiB/s
2023-09-22 09:23:43 [Consensus] ⚙️ Syncing 0.0 bps, target=#541620 (177 peers), best: #541499 (0x5ad8…c864), finalized #464770 (0x8d26…fec7), ⬇ 152.8kiB/s ⬆ 7.2kiB/s
2023-09-22 09:23:43 [Consensus] Error when syncing blocks from DSN error=Other: Error during data shards reconstruction: Impossible to recover, too many shards are missing
2023-09-22 09:23:48 [Consensus] ⚙️ Syncing 0.0 bps, target=#541622 (191 peers), best: #541499 (0x5ad8…c864), finalized #464770 (0x8d26…fec7), ⬇ 323.2kiB/s ⬆ 7.5kiB/s
2023-09-22 09:23:53 [Consensus] ⚙️ Syncing 0.0 bps, target=#541623 (198 peers), best: #541499 (0x5ad8…c864), finalized #464770 (0x8d26…fec7), ⬇ 259.2kiB/s ⬆ 4.8kiB/s
2023-09-22 09:23:58 [Consensus] ⚙️ Preparing 2.6 bps, target=#541623 (208 peers), best: #541512 (0x2a58…8219), finalized #464770 (0x8d26…fec7), ⬇ 231.9kiB/s ⬆ 7.1kiB/s
2023-09-22 09:24:03 [Consensus] ⚙️ Preparing 4.8 bps, target=#541625 (208 peers), best: #541536 (0x378f…cbe0), finalized #464770 (0x8d26…fec7), ⬇ 250.0kiB/s ⬆ 5.0kiB/s
2023-09-22 09:24:08 [Consensus] ⚙️ Preparing 3.8 bps, target=#541626 (209 peers), best: #541555 (0x3a51…ca31), finalized #464770 (0x8d26…fec7), ⬇ 160.3kiB/s ⬆ 5.0kiB/s
2023-09-22 09:24:13 [Consensus] ⚙️ Preparing 4.0 bps, target=#541627 (199 peers), best: #541575 (0xa72f…b261), finalized #464770 (0x8d26…fec7), ⬇ 128.1kiB/s ⬆ 8.6kiB/s
2023-09-22 09:24:18 [Consensus] ⚙️ Preparing 3.8 bps, target=#541628 (186 peers), best: #541594 (0x1baa…b7ea), finalized #464770 (0x8d26…fec7), ⬇ 182.8kiB/s ⬆ 4.3kiB/s
2023-09-22 09:24:23 [Consensus] ⚙️ Preparing 5.0 bps, target=#541629 (184 peers), best: #541619 (0x1e28…dded), finalized #464770 (0x8d26…fec7), ⬇ 221.3kiB/s ⬆ 4.2kiB/s
2023-09-22 09:24:23 [Consensus] 💔 Error importing block 0xb0dc73c41216478904a983f99dfd94a8002373b539d65ee25b35b2b3205ccc7a: block has an unknown parent

There are two sync mechanisms in Subspace: sync from DSN, which covers deep history, and regular Substrate sync, which covers recent history. When that error appears, DSN sync ends because there is nothing left for it to do, and regular Substrate sync kicks in. Depending on the height of the local chain, however, regular sync may not be able to sync at all, and that will result in the node being banned at the network level; see "Node not able to sync for days" (#10 by nazar-pc).
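The hand-off described above can be pictured with a toy model. This is only a sketch of the control flow as explained in this reply, not the node's actual implementation: every name here (`ReconstructionError`, `dsn_sync`, `substrate_sync`, `sync`) is hypothetical, blocks are plain integers, and a segment of `None` stands in for a piece of history that cannot be reconstructed.

```python
class ReconstructionError(Exception):
    """Hypothetical stand-in for the 'too many shards are missing' error."""


def dsn_sync(segments, chain):
    """Stage 1 (toy): import deep history segment by segment."""
    for blocks in segments:
        if blocks is None:  # this segment could not be reconstructed
            raise ReconstructionError("too many shards are missing")
        chain.extend(blocks)


def substrate_sync(peer_blocks, chain):
    """Stage 2 (toy): import recent blocks offered by peers.

    A block is imported only if its parent is the current local tip;
    blocks with an unknown parent are skipped.
    """
    for block in peer_blocks:
        if block == chain[-1] + 1:
            chain.append(block)


def sync(segments, peer_blocks):
    chain = [0]  # genesis
    try:
        dsn_sync(segments, chain)
    except ReconstructionError as err:
        # In this model the error just marks the end of the DSN stage.
        print(f"Error when syncing blocks from DSN: {err}")
    substrate_sync(peer_blocks, chain)
    return chain


# DSN sync covers blocks 1..3, then regular sync continues from the tip.
print(sync([[1, 2, 3], None], [4, 5, 6]))  # [0, 1, 2, 3, 4, 5, 6]

# If the local chain is too far behind what peers offer, nothing is imported.
print(sync([[1, 2]], [5, 6]))  # [0, 1, 2]
```

The second call mirrors the case where the height of the local chain leaves regular sync with nothing it can import.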