Piece getter general error

Issue Report

Environment

Ubuntu 22.03
Advanced CLI

Problem

thread 'plotting-1.0' panicked at /home/subspace/crates/subspace-farmer-components/src/plotting.rs:540:48:
Piece getter must returns valid pieces of history that contain proper scalar bytes; qed: "Invalid scalar"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-02-13T13:52:24.988357Z  WARN single_disk_farm{disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-2 panicked
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <core::pin::Pin<P> as core::future::future::Future>::poll
   4: subspace_farmer::single_disk_farm::plotting::plotting::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}
   5: tokio::runtime::context::runtime::enter_runtime
   6: tokio::runtime::scheduler::multi_thread::worker::block_in_place
   7: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
   8: rayon_core::registry::WorkerThread::wait_until_cold
   9: rayon_core::registry::ThreadBuilder::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-02-13T14:02:41.081200Z  WARN single_disk_farm{disk_farm_index=2}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-2 panicked

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: std::sys_common::backtrace::__rust_begin_short_backtrace
   2: core::ops::function::FnOnce::call_once{{vtable.shim}}
   3: std::sys::pal::unix::thread::Thread::new::thread_start
   4: <unknown>
   5: <unknown>

Most likely the same issue as the linked "Thread 'plotting-1.1' panicked" thread.

Can you tell me the specific reason why this problem occurs?
What mechanism produces this error?
I would like to help solve this problem.

There is no problem for me to solve; this is a problem with your hardware. Please read the linked thread and the threads mentioned in it carefully.

This doesn't seem to be a hardware issue; I'm quite sure of that.

As you wish, but the error you’re getting indicates in-memory data corruption. I checked the code path and so far I see no other explanation for this.

Are you running on an overclocked or undervolted system? Some users on Discord with similar issues had success after adjusting those settings.

My server is not using low-voltage memory modules or overclocking.
A similar situation occurs on three of my seven servers.
My monitoring process redirects all logs; when the problem occurs, it appears to happen right after a plot sector is completed.

The probability of this problem occurring is low.

thread 'plotting-1.0' panicked at /home/subspace/crates/subspace-farmer-components/src/plotting.rs:540:48:
Piece getter must returns valid pieces of history that contain proper scalar bytes; qed: "Invalid scalar"
stack backtrace:
   0:     0x555555d9f89f - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h854e3d9599d23b9b
   1:     0x555555846a20 - core::fmt::write::hdaa13832d911494b
   2:     0x555555d66e0e - std::io::Write::write_fmt::ha2d6d8f909a702b7
   3:     0x555555da192e - std::sys_common::backtrace::print::h78d1eab0d976c677
   4:     0x555555da0ac7 - std::panicking::default_hook::{{closure}}::h3f8628a95270c213
   5:     0x555555da21ab - std::panicking::rust_panic_with_hook::hd1b06f3095c8ec01
   6:     0x555555da1ca0 - std::panicking::begin_panic_handler::{{closure}}::hb82004c56d4db4fa
   7:     0x555555da1bf6 - std::sys_common::backtrace::__rust_end_short_backtrace::h17b40b71bb1ece3d
   8:     0x555555da1be3 - rust_begin_unwind
   9:     0x555555669384 - core::panicking::panic_fmt::h9bd50ad4fc2ca95e
  10:     0x555555669932 - core::result::unwrap_failed::h861383bd8d19e70e
  11:     0x555555e72b0e - <core::pin::Pin<P> as core::future::future::Future>::poll::he4816b870cadeba8
  12:     0x555555eda91c - subspace_farmer::single_disk_farm::plotting::plotting::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::{{closure}}::h2e5f89001d0adedf
  13:     0x555555f79836 - tokio::runtime::context::runtime::enter_runtime::h2070983c4c1f0457
  14:     0x555556146b4f - tokio::runtime::scheduler::multi_thread::worker::block_in_place::hcdd8dd12015f21ef
  15:     0x555556065b9e - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb6804824d14ce268
  16:     0x5555556a386f - rayon_core::registry::WorkerThread::wait_until_cold::h774ae0930d9e0a00
  17:     0x555555bb0c22 - rayon_core::registry::ThreadBuilder::run::hbfca6208f57be30c
  18:     0x555555de3749 - std::sys_common::backtrace::__rust_begin_short_backtrace::h352810b4fc8f3ff6
  19:     0x555555de4783 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h31ae30d80733f54d
  20:     0x555555da3cd5 - std::sys::pal::unix::thread::Thread::new::thread_start::hd551cfa6ba15fff0
  21:     0x7ffff7d1bac3 - <unknown>
  22:     0x7ffff7dad850 - <unknown>
  23:                0x0 - <unknown>
2024-02-15T12:13:56.931045Z  WARN single_disk_farm{disk_farm_index=1}: subspace_farmer::single_disk_farm::plotting: Failed to send sector index for initial plotting error=send failed because receiver is gone
Error: Background task plotting-1 panicked

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: std::sys_common::backtrace::__rust_begin_short_backtrace
   2: core::ops::function::FnOnce::call_once{{vtable.shim}}
   3: std::sys::pal::unix::thread::Thread::new::thread_start
   4: <unknown>
   5: <unknown>

A new error occurred, which directly destroyed the 1 TB plot drive.

2024-02-15T09:22:04.842107Z ERROR subspace_farmer::utils::farmer_piece_getter: Failed to retrieve first segment piece from node error=Parse error: invalid value: integer `1149`, expected u8 piece_index=47992
2024-02-15T09:22:35.105039Z ERROR subspace_farmer::utils::farmer_piece_getter: Failed to retrieve first segment piece from node error=Cannot convert piece. PieceIndex=47974 piece_index=47974

I do not believe anything is destroyed. You can run scrub on it to fix errors. However, everything indicates that you have hardware issues. The fact that you don't use low-power or overclocked RAM doesn't guarantee you don't have stability issues.

There is not much an application can do if bits change in memory unexpectedly. Since you are the only one so far with such errors (with thousands of users plotting petabytes of space), and I don't see any issues in the code, I strongly recommend thoroughly checking your hardware.

I will check the hardware. The server uses registered ECC memory. Can ECC detect memory bit errors?

In general, yes; even ECC memory can be faulty. To the best of my understanding so far, your farmer received something either from disk or from the network, checked it, and it was good, but by the time it got to the plotting process it turned out to be invalid. It is possible that a glitch happened during piece cache sync and the farmer wrote and created a valid checksum for a corrupted piece, in which case you may hit that error occasionally. Removing the piece cache file and syncing the piece cache again might fix that (scrub doesn't check contents beyond a few checksums for performance reasons).

You are right. It is more likely that there was some signal interference during memory operations. Removing the piece cache file and syncing the piece cache again should fix it.

This problem becomes more frequent when I use Table::generate_parallel in multiple threads at the same time.
I've created a lazy table structure that generates 50 PosTables simultaneously in multiple threads. After generation completes, I move ownership of the tables into a HashMap, then look them up in parallel (keyed by a u16 s-bucket index) and call find_proof against the tables stored there. When I access tables generated this way, the probability of "Invalid scalar" happening is extremely high.

It happens about once every 6 hours.

use std::collections::HashMap;
use std::sync::Mutex;
use rayon::prelude::*;
use tracing::warn;
// PosTable, Table, PieceOffset, SectorId and FarmerProtocolInfo come from the subspace crates.

/// Tracks which window of PosTables has been generated so far.
#[derive(Clone)]
struct LazyTable {
    current_index: u16,
    max_index: u16,
}

impl LazyTable {
    fn new(max_index: u16) -> LazyTable {
        LazyTable {
            current_index: 0,
            max_index,
        }
    }

    fn get_current(&self) -> u16 {
        self.current_index
    }

    /// Generates the next window of tables (one per generator) in parallel and
    /// replaces the previous window in `tables_maps`.
    fn next<PosTable: Table>(
        &mut self,
        tables_maps: &mut Option<HashMap<u16, PosTable>>,
        generator_vec: &[Mutex<<PosTable as Table>::Generator>],
        sector_id: &SectorId,
        farmer_protocol_info: FarmerProtocolInfo,
    ) {
        // Drop the previous window before generating the next one.
        tables_maps.take();
        let mut end_index = self.current_index + generator_vec.len() as u16;
        if end_index > self.max_index {
            end_index = self.max_index;
        }
        let result: HashMap<u16, PosTable> = (self.current_index..end_index)
            .into_par_iter()
            .enumerate()
            .map(|(index, current)| {
                let piece = PieceOffset::from(current);
                if let Some(generator_lock) = generator_vec.get(index) {
                    let mut generator = generator_lock.lock().unwrap();
                    let pos_table = generator.generate_parallel(
                        &sector_id.derive_evaluation_seed(piece, farmer_protocol_info.history_size),
                    );
                    (current, pos_table)
                } else {
                    // Fallback: no pooled generator for this slot, create a fresh one.
                    let mut generator = PosTable::generator();
                    warn!("generator table from index: {} PosTable {}", index, current);
                    let pos_table = generator.generate_parallel(
                        &sector_id.derive_evaluation_seed(piece, farmer_protocol_info.history_size),
                    );
                    (current, pos_table)
                }
            })
            .collect();
        *tables_maps = Some(result);
        self.current_index = end_index;
    }
}
// Pre-generate the first window of tables before entering the main plotting loop.
let mut lazy_blocks = LazyTable::new(1000);
let mut _tables_maps: Option<HashMap<u16, PosTable>> = Some(HashMap::with_capacity(0));
lazy_blocks.next::<PosTable>(&mut _tables_maps, generator_vec, &sector_id, farmer_protocol_info);

for ((piece_offset, record), mut encoded_chunks_used) in (PieceOffset::ZERO..)
    .zip(raw_sector.records.iter_mut())
    .zip(sector_contents_map.iter_record_bitfields_mut())
{
    // Derive PoSpace table (use parallel mode because multiple tables concurrently will use
    // too much RAM)
    let index: u16 = piece_offset.get();
    if index >= lazy_blocks.get_current() {
        // Current window exhausted, generate the next one.
        lazy_blocks.next::<PosTable>(&mut _tables_maps, generator_vec, &sector_id, farmer_protocol_info);
    }
    let table: &HashMap<u16, PosTable> = _tables_maps.as_ref().unwrap();
    let pos_table_cache: PosTable;
    let pos_table: &PosTable;
    if let Some(pos) = table.get(&index) {
        pos_table = pos;
    } else {
        // Fallback: the table is missing from the pre-generated window, generate it inline.
        warn!("table_generator.generate_parallel index: {}", index);
        pos_table_cache = table_generator.generate_parallel(
            &sector_id.derive_evaluation_seed(piece_offset, farmer_protocol_info.history_size),
        );
        pos_table = &pos_table_cache;
    }
    // ... rest of the per-record encoding logic ...
}

Why does my method trigger this, compared to directly calling Table::generate_parallel from the async plotting code?

Piece getter must returns valid pieces of history that contain proper scalar bytes; qed: "Invalid scalar"

As I already mentioned before, this is a hardware issue; as far as I'm concerned there is no point in looking for software issues here. You're free to do so, but it is a waste of my time unless proven otherwise.