Piece getter general error

Issue Report

Environment

Ubuntu 22.04
Advanced CLI

Problem

thread 'plotting-1.0' panicked at /home/subspace/crates/subspace-farmer-components/src/plotting.rs:540:48:
Piece getter must returns valid pieces of history that contain proper scalar bytes; qed: "Invalid scalar"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-02-13T13:52:24.988357Z  WARN single_disk_farm{disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-2 panicked
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <core::pin::Pin<P> as core::future::future::Future>::poll
   4: subspace_farmer::single_disk_farm::plotting::plotting::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}
   5: tokio::runtime::context::runtime::enter_runtime
   6: tokio::runtime::scheduler::multi_thread::worker::block_in_place
   7: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
   8: rayon_core::registry::WorkerThread::wait_until_cold
   9: rayon_core::registry::ThreadBuilder::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-02-13T14:02:41.081200Z  WARN single_disk_farm{disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-2 panicked

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: std::sys_common::backtrace::__rust_begin_short_backtrace
   2: core::ops::function::FnOnce::call_once{{vtable.shim}}
   3: std::sys::pal::unix::thread::Thread::new::thread_start
   4: <unknown>
   5: <unknown>

Most likely the same as the earlier "Thread 'plotting-1.1' panicked" issue.

Can you tell me the specific reason why this problem occurs?
What mechanism produces this error?
Please help me solve this problem.

There is no problem for me to solve; this is a problem with your hardware. Please read the linked thread and the threads mentioned in there carefully.

This doesn't seem to be a hardware issue, I'm very sure of that.

As you wish, but the error you’re getting indicates in-memory data corruption. I checked the code path and so far I see no other explanation for this.
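For illustration of the failure mechanism, here is a minimal sketch (not the actual subspace-core-primitives code, which wraps its own Scalar type): a 32-byte record chunk is only a valid scalar if, read as a little-endian integer, it falls below the BLS12-381 scalar field modulus. A single bit flipped in memory can push a previously valid chunk out of range, which is exactly the "Invalid scalar" failure in the panic above:

```rust
/// Sketch only: the BLS12-381 scalar field modulus, little-endian bytes.
const MODULUS_LE: [u8; 32] = [
    0x01, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
    0xfe, 0x5b, 0xfe, 0xff, 0x02, 0xa4, 0xbd, 0x53,
    0x05, 0xd8, 0xa1, 0x09, 0x08, 0xd8, 0x39, 0x33,
    0x48, 0x7d, 0x9d, 0x29, 0x53, 0xa7, 0xed, 0x73,
];

/// A chunk is a valid scalar only if it is strictly below the modulus.
fn is_valid_scalar(chunk: &[u8; 32]) -> bool {
    // Compare little-endian byte arrays starting from the most significant byte.
    for i in (0..32).rev() {
        match chunk[i].cmp(&MODULUS_LE[i]) {
            std::cmp::Ordering::Less => return true,
            std::cmp::Ordering::Greater => return false,
            std::cmp::Ordering::Equal => continue,
        }
    }
    false // equal to the modulus is out of range too
}

fn main() {
    let valid = [0u8; 32]; // zero is a valid scalar
    let mut corrupted = valid;
    corrupted[31] = 0xff; // one damaged high byte pushes it over the modulus
    assert!(is_valid_scalar(&valid));
    assert!(!is_valid_scalar(&corrupted)); // -> "Invalid scalar"
}
```

Because every piece is verified upstream before plotting, the plotting code treats an invalid scalar as impossible (hence the qed in the panic message) and aborts instead of retrying.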

Are you running an overclocked or undervolted system? Some users on Discord with similar issues had success after adjusting these settings.

My servers are not using low-voltage memory modules or overclocking.
A similar situation occurs on three of my seven servers.
My monitoring process redirects all logs; when the problem occurs, it appears to happen right after a plot sector is completed.

This problem occurs only rarely.

thread 'plotting-1.0' panicked at /home/subspace/crates/subspace-farmer-components/src/plotting.rs:540:48:
Piece getter must returns valid pieces of history that contain proper scalar bytes; qed: "Invalid scalar"
stack backtrace:
   0:     0x555555d9f89f - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h854e3d9599d23b9b
   1:     0x555555846a20 - core::fmt::write::hdaa13832d911494b
   2:     0x555555d66e0e - std::io::Write::write_fmt::ha2d6d8f909a702b7
   3:     0x555555da192e - std::sys_common::backtrace::print::h78d1eab0d976c677
   4:     0x555555da0ac7 - std::panicking::default_hook::{{closure}}::h3f8628a95270c213
   5:     0x555555da21ab - std::panicking::rust_panic_with_hook::hd1b06f3095c8ec01
   6:     0x555555da1ca0 - std::panicking::begin_panic_handler::{{closure}}::hb82004c56d4db4fa
   7:     0x555555da1bf6 - std::sys_common::backtrace::__rust_end_short_backtrace::h17b40b71bb1ece3d
   8:     0x555555da1be3 - rust_begin_unwind
   9:     0x555555669384 - core::panicking::panic_fmt::h9bd50ad4fc2ca95e
  10:     0x555555669932 - core::result::unwrap_failed::h861383bd8d19e70e
  11:     0x555555e72b0e - <core::pin::Pin<P> as core::future::future::Future>::poll::he4816b870cadeba8
  12:     0x555555eda91c - subspace_farmer::single_disk_farm::plotting::plotting::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::h2e5f89001d0adedf
  13:     0x555555f79836 - tokio::runtime::context::runtime::enter_runtime::h2070983c4c1f0457
  14:     0x555556146b4f - tokio::runtime::scheduler::multi_thread::worker::block_in_place::hcdd8dd12015f21ef
  15:     0x555556065b9e - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb6804824d14ce268
  16:     0x5555556a386f - rayon_core::registry::WorkerThread::wait_until_cold::h774ae0930d9e0a00
  17:     0x555555bb0c22 - rayon_core::registry::ThreadBuilder::run::hbfca6208f57be30c
  18:     0x555555de3749 - std::sys_common::backtrace::__rust_begin_short_backtrace::h352810b4fc8f3ff6
  19:     0x555555de4783 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h31ae30d80733f54d
  20:     0x555555da3cd5 - std::sys::pal::unix::thread::Thread::new::thread_start::hd551cfa6ba15fff0
  21:     0x7ffff7d1bac3 - <unknown>
  22:     0x7ffff7dad850 - <unknown>
  23:                0x0 - <unknown>
2024-02-15T12:13:56.931045Z  WARN single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-1 panicked

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: std::sys_common::backtrace::__rust_begin_short_backtrace
   2: core::ops::function::FnOnce::call_once{{vtable.shim}}
   3: std::sys::pal::unix::thread::Thread::new::thread_start
   4: <unknown>
   5: <unknown>

A new error occurred, directly destroying the 1 TB plot drive.

2024-02-15T09:22:04.842107Z ERROR subspace_farmer::utils::farmer_piece_getter: Failed to retrieve first segment piece from node error=Parse error: invalid value: integer `1149`, expected u8 piece_index=47992
2024-02-15T09:22:35.105039Z ERROR subspace_farmer::utils::farmer_piece_getter: Failed to retrieve first segment piece from node error=Cannot convert piece. PieceIndex=47974 piece_index=47974

I do not believe anything is destroyed. You can run scrub on it to fix errors. However, everything indicates that you seem to have hardware issues. The fact that you don’t use low-power or overclocked RAM doesn’t guarantee you don’t have stability issues.
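For context, the "invalid value: integer `1149`, expected u8" message above is a serde-style deserialization error: a field that must fit into a single byte arrived as 1149, which can only happen if the encoded data was damaged before parsing. A minimal reproduction, assuming serde and serde_json purely for demonstration (the actual farmer-to-node codec may differ):

```rust
// Assumed dependencies for this sketch only:
// serde = { version = "1", features = ["derive"] }, serde_json = "1"
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Chunk {
    value: u8, // any integer outside 0..=255 fails to deserialize
}

fn main() {
    // 1149 cannot fit into a u8, mirroring the error in the logs above.
    let err = serde_json::from_str::<Chunk>(r#"{"value":1149}"#).unwrap_err();
    println!("{err}"); // invalid value: integer `1149`, expected u8 ...
}
```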

There is not much an application can do if bits change in memory unexpectedly. Since you are the only one so far with such errors (with thousands of users plotting petabytes of space), and I see no issues in the code, I strongly recommend thoroughly checking your hardware.

I need to check the hardware. The server uses registered ECC memory. Can ECC detect memory bit errors?

In general, yes, though even ECC memory can be faulty. To the best of my understanding so far, your farmer received something either from disk or from the network, checked it, and it was good, but by the time it got to the plotting process it turned out to be invalid. It is possible that the glitch happened during piece cache sync and the farmer wrote out and created a valid checksum for a corrupted piece, in which case you may hit that error from time to time. Removing the piece cache file and syncing the piece cache again might fix that (scrub doesn't check contents beyond a few checksums, for performance reasons).
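To illustrate why scrub cannot catch this case, here is a simplified sketch; the stand-in FNV-1a hash below is assumed for demonstration, not the cache's real checksum. If a piece is corrupted in memory before the cache writes it, the checksum is computed over the already-corrupted bytes, so every later verification passes:

```rust
// Stand-in checksum for this sketch (the real piece cache uses its own).
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

fn main() {
    let good_piece = vec![0x42u8; 64];
    let mut corrupted = good_piece.clone();
    corrupted[10] ^= 0x04; // a bit flips in RAM *before* the cache write

    // Cache write: checksum is computed over the already-corrupted bytes...
    let stored_checksum = fnv1a64(&corrupted);

    // ...so a later checksum verification (what scrub does) still passes,
    // even though the piece no longer matches the original.
    assert_eq!(fnv1a64(&corrupted), stored_checksum);
    assert_ne!(fnv1a64(&good_piece), stored_checksum);
}
```

This is why deleting the piece cache file and letting it re-sync from the network replaces corrupted pieces, while scrub can only confirm that what is on disk matches the checksum that was written alongside it.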

You are right. It is more likely that there was some signal interference during memory operation. Removing the piece cache file and re-syncing the piece cache should fix it.