This is going to be a long post. A lot of information to share. First, let me give you my setup. I have 4 machines involved in subspace.
#1: 10900 CPU, Ubuntu 20.04, running node and 4TB farm (single drive). When working correctly, 5 minute sectors.
#2: 10850k CPU, Ubuntu 20.04, running 2x2TB farm (two 2TB drives) connected via RPC to the node on #1. When working correctly, 5 minute sectors.
#3: 10850k, effectively identical to #2 in every respect.
#4: 7950x, Windows 10, running 3TB farm (one 2TB plot and one 1TB plot on the same drive) connected via RPC to the node on #1. When working correctly, 3m20s sectors.
Okay. Here’s my full experience with subspace since Gemini 2a. A week before 3g, I wanted to get reacquainted with the process, so I practiced by setting up 3f. My experience was terrible. On all four farms, it would take me nearly an hour to go from “Synchronizing piece cache” to “Finished piece cache synchronization”. Plotting a sector started out taking over 3 hours and very slowly improved to maybe one every 30 minutes. CPU usage was always low, in the 3-5% range at most.
And any time I would control-c a farm, I’d get hundreds of error messages (“piece_provider : get providers returned an error piece” for hundreds of indexes).
After a week or so of this frustration, 3g launched, and it worked great! I got all the times described above as “working correctly”: 5 minute sector times at worst, and very consistent. Piece cache would synchronize almost instantly. I never got an error message when control-c’ing a farm; I’d always get the “SIGINT received” trap and a proper shutdown. It’s been working like a dream.
Until yesterday (November 14th).
Yesterday morning I updated to the November 14 builds as soon as the announcement came out. They worked great, and node syncing definitely improved, btw. No issues at all with the Nov 14 build on either node or farmers.
But then later in the day, for reasons totally unrelated to subspace (actually to chia), I had a need to physically switch 13 HDDs between the #2 and #3 10850k boxes. I also wanted to label the drives, so I did about a dozen reboots of each box unplugging an HDD one at a time to identify it so I could label them. That’s ALL I did (well, that and necessary /etc/fstab updates, obviously). Eventually all the HDDs were labeled and switched. And all was good in that respect. Everything went fine as far as my chia stuff was concerned.
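Since the drive swap involved /etc/fstab changes, one thing worth ruling out is a mount that now points at a different disk than intended. Here is a minimal sketch of such a check. It assumes the concern is entries that use kernel device names such as /dev/sdb1, which are not guaranteed to stay the same across reboots or after drives are moved, whereas UUID= and LABEL= entries follow the filesystem. The function name and the mount paths in the sample are hypothetical and not taken from my actual setup.

```python
import re

# Kernel-assigned device names (sdX, hdX, nvmeXnY, optionally with a
# partition suffix). These can change when drives are added, removed or moved.
UNSTABLE_DEVICE = re.compile(r"^/dev/(sd[a-z]+|hd[a-z]+|nvme\d+n\d+)(p?\d+)?$")


def unstable_fstab_entries(fstab_text):
    """Return (device, mount_point) pairs that use a kernel device name.

    Hypothetical helper: it only inspects the text of an fstab file and
    does not touch any mounts.
    """
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        if UNSTABLE_DEVICE.match(device):
            flagged.append((device, mount_point))
    return flagged


# Illustrative sample only; the paths and identifiers are made up.
SAMPLE_FSTAB = """\
# <device> <mount point> <type> <options> <dump> <pass>
UUID=abcd-1234 /mnt/farm1 ext4 defaults 0 2
/dev/sdb1 /mnt/farm2 ext4 defaults 0 2
/dev/nvme0n1p2 / ext4 defaults 0 1
LABEL=plots /mnt/plots ext4 defaults 0 2
"""

for device, mount_point in unstable_fstab_entries(SAMPLE_FSTAB):
    print(f"{device} -> {mount_point}: consider UUID= or LABEL= instead")
```

To run it against a real machine, pass the contents of /etc/fstab to `unstable_fstab_entries` instead of the sample text. A flagged entry is not proof of a problem, only a hint to confirm that the farm path still lives on the drive it is supposed to.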
But when I went to restart the farms (which I had shut down properly with control-c before my first reboot and didn’t attempt to restart until everything was done), they started behaving just like they did under 3f. The only difference is that I am now also seeing a few “missing piece” messages I don’t recall seeing in 3f. It’s possible I missed or forgot them previously, since I’m not seeing a lot of them even now, and not always.
I ran scrub on both drive farms on the #3 box (which does NOTHING else), and scrub reported both farms were fine, no errors.
After a few reboots and restarts of the #3 farm, the piece cache now synchronizes immediately. But I no longer sign hashes successfully. Ever. There are no error messages on node or farm; I just haven’t signed a hash in over 8 hours. Even the #2 box, which is taking forever to do anything (its last piece cache sync message, at 59.52%, was 11 hours ago), is still farming properly and signing hashes, but the #3 box will not.
I should note that my move of HDDs from #2 to #3 does mean that I am now farming nossd/chia plots across a samba network share, where previously the #2 box was farming those plots locally. So this could be causing a lot more network traffic between the two boxes. The problem with that theory is that the issues persist even when I completely shut nossd/chia farming down. It’s been shut off for 2 hours now and it has made no difference on either box.
As for the #2 box, I have not shut it down or rebooted it since the problem started. I’m afraid to at this point: while it can’t replot, it can at least win rewards. That is better than #3, which after several reboots and restarts now appears fine in all respects except for the new inability to actually win anything. (Previously #3 was winning rewards like #2; rebooting and restarting fixed its piece cache issues but stopped the wins.)
For the record, boxes #1 (where the node and a farm are) and #4 have worked fine throughout and can have their farms restarted with no issues. I have not rebooted them.
Please help re: boxes #2 and #3.