Error - Recovering missing piece

Issue Report

Environment

  • Ubuntu 22.04.3 LTS
  • Advanced CLI

I’ve been experiencing errors for the last two days that I haven’t been able to figure out. It happens on the nov-09 release and on nov-07 as well, and on multiple machines (same OS, different drives) when starting a new plot. It does not happen on any existing plots. I’ve let it run its course for over eight hours with no change.

2023-11-12T10:03:58.115495Z  INFO subspace_farmer::commands::farm: Connecting to node RPC url=ws://127.0.0.1:9944
2023-11-12T10:03:58.153304Z  INFO subspace_networking::constructor: DSN instance configured. allow_non_global_addresses_in_dht=false peer_id=12D3KooWHcdzbrF3KHjCGqPVXMD22GmmwX9crnYFiu5s48d38pHz protocol_version=/subspace/418040fc282f5e5ddd432c46d05297636f6f75ce68d66499ff4cbda69ccd180b
2023-11-12T10:03:58.755966Z  INFO subspace_farmer::commands::farm: Collecting already plotted pieces (this will take some time)...
2023-11-12T10:03:58.756048Z  INFO subspace_farmer::commands::farm: Finished collecting already plotted pieces successfully
2023-11-12T10:03:58.756009Z  INFO subspace_farmer::piece_cache: Initializing piece cache
2023-11-12T10:03:58.756858Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::farming: Subscribing to slot info notifications
2023-11-12T10:03:58.756972Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::reward_signing: Subscribing to reward signing notifications
2023-11-12T10:03:58.762843Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Subscribing to archived segments
2023-11-12T10:03:58.770300Z  INFO subspace_farmer::commands::farm::dsn: DSN listening on /ip4/127.0.0.1/udp/10525/quic-v1/p2p/12D3KooWHcdzbrF3KHjCGqPVXMD22GmmwX9crnYFiu5s48d38pHz
2023-11-12T10:03:58.771314Z  INFO subspace_farmer::commands::farm::dsn: DSN listening on /ip4/172.20.105.1/udp/10525/quic-v1/p2p/12D3KooWHcdzbrF3KHjCGqPVXMD22GmmwX9crnYFiu5s48d38pHz
2023-11-12T10:03:58.771417Z  INFO subspace_farmer::commands::farm::dsn: DSN listening on /ip4/127.0.0.1/tcp/10525/p2p/12D3KooWHcdzbrF3KHjCGqPVXMD22GmmwX9crnYFiu5s48d38pHz
2023-11-12T10:03:58.771431Z  INFO subspace_farmer::commands::farm::dsn: DSN listening on /ip4/172.20.105.1/tcp/10525/p2p/12D3KooWHcdzbrF3KHjCGqPVXMD22GmmwX9crnYFiu5s48d38pHz
2023-11-12T10:04:10.069880Z  INFO subspace_farmer::piece_cache: Synchronizing piece cache
2023-11-12T10:07:15.145695Z  INFO subspace_farmer::piece_cache: Finished piece cache synchronization
2023-11-12T10:07:15.168525Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::single_disk_farm::plotting: Plotting sector (0.00% complete) sector_index=0
2023-11-12T10:07:47.314618Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=1118
2023-11-12T10:07:47.343065Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=1118 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:48.680232Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=2317
2023-11-12T10:07:48.705395Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=2317 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:48.902612Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=1573
2023-11-12T10:07:48.928965Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=1573 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:49.684918Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=1080
2023-11-12T10:07:49.700145Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=1080 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:49.841554Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=660
2023-11-12T10:07:49.856494Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=660 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:49.967360Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=1092
2023-11-12T10:07:49.979533Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=1092 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:50.067209Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=2489
2023-11-12T10:07:50.080777Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=2489 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:50.482195Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=438
2023-11-12T10:07:50.502775Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=438 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:50.645755Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=1390
2023-11-12T10:07:50.661758Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=1390 received_pieces=0 required_pieces_number=128
2023-11-12T10:07:50.661865Z  INFO single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece... missing_piece_index=641
2023-11-12T10:07:50.672987Z ERROR single_disk_farm{disk_farm_index=0}: subspace_farmer_components::segment_reconstruction: Recovering missing piece failed. missing_piece_index=641 received_pieces=0 required_pieces_number=128
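
For context, if I’m reading these logs right, a missing piece is rebuilt via erasure coding from other pieces in its segment, and each attempt needs at least 128 source pieces (hence required_pieces_number=128). In my case received_pieces is always 0, so nothing is being fetched at all. A rough sketch of the check that appears to be failing (my own illustration with hypothetical names, not the actual subspace_farmer_components code):

// Illustrative only: a simplified version of the recovery precondition
// implied by the log output above; the names here are hypothetical.
struct RecoveryAttempt {
    missing_piece_index: u64,
    received_pieces: usize,
    required_pieces_number: usize,
}

fn try_recover(attempt: &RecoveryAttempt) -> Result<(), String> {
    // Erasure-coded reconstruction can only run once enough pieces from the
    // segment have been fetched over the DSN.
    if attempt.received_pieces < attempt.required_pieces_number {
        return Err(format!(
            "Recovering missing piece failed. missing_piece_index={} received_pieces={} required_pieces_number={}",
            attempt.missing_piece_index, attempt.received_pieces, attempt.required_pieces_number,
        ));
    }
    // ...reconstruct the missing piece from the received pieces...
    Ok(())
}

fn main() {
    // Mirrors the first failing log line: zero pieces fetched, 128 required.
    let attempt = RecoveryAttempt {
        missing_piece_index: 1118,
        received_pieces: 0,
        required_pieces_number: 128,
    };
    if let Err(msg) = try_recover(&attempt) {
        eprintln!("{msg}");
    }
}

With zero pieces received, reconstruction never even starts, which makes this look like a fetching problem rather than a disk or plot problem.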

Looks like your node is not able to retrieve any pieces at all, so there is some kind of networking issue happening.

cc @shamil

I found the cause.

To greatly speed up the initial sync when starting or restarting a node, I have been using the dsn_bootstrap_nodes option. This has worked great and allows existing plots to continue plotting (with an almost immediate connection to the network). However, it also stops new plots from starting, with the errors shown above. With the option removed, the initial connection and sync take considerably longer, but new plots will start.

Is this expected behavior?

It is expected to work with the default configuration; if you specify incorrect options, that can indeed cause issues like the ones you have seen.

You have likely specified it incorrectly, but previously there was no need to reach out to the DSN, so it appeared to work. Now that you do need to reach the DSN, you have noticed it didn’t actually work properly.

I don’t believe I used any incorrect options. According to the node help documentation, dsn_bootstrap_nodes is a valid option, and I used it like --dsn-bootstrap-nodes /ip4/<ipv4 address>/tcp/30433/p2p/<node id>.
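
For reference, the full invocation is along these lines (other node options elided; placeholders exactly as above, and the binary name is the one from the advanced CLI I’m running):

subspace-node \
  <usual node options> \
  --dsn-bootstrap-nodes /ip4/<ipv4 address>/tcp/30433/p2p/<node id>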

Existing plots that were created prior to using that option have no issue continuing to plot as expected, even when the node has been restarted. The issue is with new plots.

Ideally that option would not block the ability to create new plots. It is beneficial to quickly sync to the network when starting a new node or restarting an existing one to keep up with new releases.

I can’t imagine this is expected behavior. If my syntax is incorrect, I’d appreciate some pointers, as there is not much detail in the binary’s help output.

Thanks!

The IP might be unreachable even if specified correctly. There are also networking issues we have been working on lately, so the option might start working in a future release.

The IP addresses supplied in the config are reachable and accept connections (as can be seen from the near-immediate connection and sync with the network, and from the firewall logs).
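
For anyone who wants to double-check this on their own setup, a simple reachability test from the farmer host (assuming netcat is installed; address and port placeholders as above) is:

nc -vz <ipv4 address> 30433

A successful connection only proves the TCP port is open, though; it says nothing about whether the peer ID in the multiaddress matches.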

Hopefully this gets looked at and the option can start working without this unfortunate side effect soon.

I appreciate you looking into this!

Try today’s release; it should work significantly better than previous releases in the networking department.

Unfortunately this is not fixed in nov-13. Every plot that has already been created and has at least one sector seems to start just fine and continue plotting, but new plots will fail with the same errors.

I’m getting this error on two of my farms as well now. I shut those two farms down for some unrelated maintenance on the systems they are on. Those boxes went through multiple reboots over about three hours.

Once everything was done and I started the farms up again, both of them took a good 45 minutes or so to finish synchronizing the piece cache, and then the missing-piece errors started; some of the recoveries fail. Both farms printed the Replotting 0.0% lines but did not actually advance beyond that point while the missing-piece messages were appearing. I was signing hashes successfully the whole time (even while the piece cache was synchronizing, before the finished-sync message). I recently restarted both farms to see if that would help, but they are frozen again on Synchronizing piece cache. I assume that will finish in 45 minutes to an hour again, and the missing-piece recovery messages will start after that.

The two other boxes that I did not need to shut down (one of them hosts the node) continued to sync, farm, and replot without issue.

Node and farms are all on the Nov 14 build. Both boxes having the issue are Ubuntu 20.04. Neither system is overclocked, and neither has had corruption or memory issues in over two years, including while creating thousands of Chia plots.

I cannot provide logs now but will do so when I can.

As predicted, it once again took about 45 minutes to finish syncing the cache. Lots of successful hash signing during that time (I thought it wasn’t supposed to farm until that was done?). The first box has successfully recovered a couple of missing pieces, with no failures yet. I’m going to let it run overnight and see if it ever manages to recover.

EDIT: Never mind, I issued the scrub command incorrectly. Will report when it’s done.
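
For anyone else trying it: if I’m reading the farmer’s help output correctly, scrub takes the farm directory path directly (the path below is a placeholder for your own):

subspace-farmer scrub /path/to/farm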

I will leave this up though. Whenever I Ctrl-C these farms after this issue started, I get hundreds if not thousands of copies of this message, for different piece indexes:

2023-11-15T01:42:06.866569Z WARN subspace_networking::utils::piece_provider: get_providers returned an error piece_index=3599 key=Multihash { code: 11770128, size: 8, digest: [15, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] } err=SendCommand(SendError { kind: Disconnected })
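
For what it’s worth, SendError { kind: Disconnected } is the generic error an async command channel returns once its receiving side has shut down, so my guess (only a guess) is that these warnings are in-flight get_providers lookups failing after Ctrl-C tears down the networking task. A minimal sketch of the pattern, using std’s mpsc channel purely for illustration rather than whatever channel type subspace_networking actually uses:

// Illustrative only: once the receiver is dropped (the networking task has
// exited), every queued sender sees a "disconnected"-style error.
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel::<u64>();

    // Simulate the networking task exiting on Ctrl-C.
    drop(rx);

    // Every outstanding piece lookup now fails the same way, one warning
    // per piece index.
    for piece_index in [3599_u64, 3600, 3601] {
        if let Err(err) = tx.send(piece_index) {
            eprintln!("get_providers-style failure piece_index={piece_index} err={err}");
        }
    }
}

If that reading is right, the flood of warnings on shutdown is noisy but harmless; the real problem is whatever keeps the lookups from completing in the first place.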

Another thing I am noticing is that even when it claims to be replotting (and I did get a small advance in that regard before it just stopped), my CPU usage is very low, in the 3% range. I am used to much higher usage when plotting.