Error: Failed to submit solutions response: Request timeout

Hello, on Sep 11 the farmer process died after running into the error posted below. There were no other non-INFO entries in the log besides what’s posted below.

2023-09-17T01:47:55.776025Z INFO single_disk_farm{disk_farm_index=1}: subspace_farmer::reward_signing: Successfully signed reward hash 0x3e3492a5a636231ed2d476b9c7c611a6c18e94b3146716b98006eb6e40efe124
2023-09-17T01:48:32.382008Z INFO single_disk_farm{disk_farm_index=0}: subspace_farmer::reward_signing: Successfully signed reward hash 0xc95eb4344f471ac2870eb42dddf12a110d9a37c739d19420ad619d4aafca1426
2023-09-17T01:54:33.702866Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703378Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703387Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703395Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703404Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703410Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703417Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703424Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:54:33.703431Z WARN jsonrpsee_core::client::async_client::helpers: Subscription Str("GI5Aq6rHp2ZGtOMd") is not active
2023-09-17T01:57:35.222617Z INFO subspace_networking::node_runner: Added observed address as external: /ip4/192.168.3.1/tcp/30005
2023-09-17T01:57:48.812868Z INFO single_disk_farm{disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Sector replotted successfully sector_index=83
Error: Failed to submit solutions response: Request timeout
[srv_subspace@node-2 subspace]$

Looks like the node was so slow that it wasn’t able to process a request. In my experience this typically happens when the machine is severely overloaded, in particular on the disk I/O side.

I take your word for it, but as for disk I/O, I have yet to see heavy load. Maybe we should define what “overloaded” means in terms of disk I/O?

I am not sitting on top of each machine all the time, but the occasional look at disk stats never revealed anything concerning. Attached are a few minutes of disk I/O from the NVMe drives that are currently plotting/farming. Even if the ~18 MB/s were all random I/O (bad for HDDs), that would still be no concern at all for flash (SATA, SAS, or NVMe).
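For reference, the counters behind throughput figures like these also expose per-request latency, which is where a periodic stall would show up. Below is a minimal sketch, assuming Linux and the documented `/proc/diskstats` field layout. The function names and the device name `nvme0n1` are illustrative, not part of any Subspace tooling.

```python
def parse_diskstats(text):
    """Map device name -> (completed I/Os, milliseconds spent on them).

    Assumes the Linux /proc/diskstats layout: major, minor, name,
    reads completed, reads merged, sectors read, ms reading,
    writes completed, writes merged, sectors written, ms writing, ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def avg_latency_ms(before, after, device):
    """Average time per completed I/O between two snapshots, in ms."""
    ios = after[device][0] - before[device][0]
    ms = after[device][1] - before[device][1]
    return ms / ios if ios > 0 else 0.0
```

To use it, read `/proc/diskstats` twice a few seconds apart, pass both contents through `parse_diskstats`, and compare the results with `avg_latency_ms`. A value that occasionally spikes while throughput stays modest would hint at stalls that an MB/s graph does not show.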

I have not seen it for a long time, but consumer SSDs in particular, especially ones that are not high-end, can “stutter” or “lock up” periodically under heavy load while the controller decides what to do with all the data and requests. So generally it would be related to random I/O or general controller slowness. It is hard to provide more specifics; if you have monitoring, you might want to check the load average there.
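If there is no monitoring in place, the load average can also be sampled by hand. This is a minimal sketch, assuming Linux, where `/proc/loadavg` starts with the 1-, 5-, and 15-minute averages. The function names and the threshold of 1.0 per CPU are illustrative assumptions, not an official definition of "overloaded".

```python
def parse_loadavg(text):
    """Return the (1 min, 5 min, 15 min) load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def is_overloaded(load_1min, cpu_count, threshold=1.0):
    """True when the 1-minute load per CPU exceeds the chosen threshold.

    On Linux the load average also counts tasks blocked on disk I/O,
    so a high value with idle CPUs can point at storage stalls.
    """
    return load_1min / cpu_count > threshold


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```

For example, calling `is_overloaded(read_loadavg()[0], os.cpu_count())` (after `import os`) from a cron job or a loop around the time of the timeouts would show whether the machine was saturated when the request failed.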

Somewhat related to that: earlier this year I asked whether a benchmark CLI could be provided, something that would help assess and optimize the hardware configuration.