Index DB open error: Corruption: Corrupted Key: '<redacted>'


Error was faced by user @xorinox and reported in Discord.

Jul 17 06:47:39 node-6d subspace-farmer[68092]: 2022-07-17T10:47:39.703017Z  INFO subspace_farmer::farming: Subscribing to slot info notifications
Jul 17 06:47:39 node-6d subspace-farmer[68092]: 2022-07-17T10:47:39.735619Z  INFO subspace_farmer::farming: Farming stopped!
... (many lines truncated so you can see what comes next)
Jul 17 06:47:39 node-6d subspace-farmer[68092]: 2022-07-17T10:47:39.781752Z  INFO subspace_farmer::farming: Farming stopped!
Jul 17 06:47:41 node-6d subspace-farmer[68092]: Error: Index DB open error: Corruption: Corrupted Key: '<redacted>' seq:2989609, type:8
Jul 17 06:47:42 node-6d systemd[1]: subspace-farmer.service: Main process exited, code=exited, status=1/FAILURE
Jul 17 06:47:42 node-6d systemd[1]: subspace-farmer.service: Failed with result 'exit-code'.
Jul 17 06:47:42 node-6d systemd[1]: subspace-farmer.service: Consumed 34min 38.274s CPU time.

Have you guys seen this before? Since I have it configured to run as a daemon (systemd), it restarted immediately after the error and succeeded this time.

But now I see a lot of these errors:

096449303, type = 1  in /chia/scratch/disk01/subspace/plot3/plot-index-to-offset/000017.sst offset 41499645 size 4063
Jul 17 06:52:24 node-6d subspace-farmer[80757]: 2022-07-17T10:52:24.068887Z  INFO subspace_farmer::archiving: Plotted segment segment_index=15347
Jul 17 06:52:24 node-6d subspace-farmer[80757]: 2022-07-17T10:52:24.178509Z ERROR subspace_farmer::plotting: Failed to write encoded pieces error=Corruption: block checksum mismatch: stored = 216074205, computed = 1096449303, type = 1  in /chia/scratch/disk01/subspace/plot3/plot-index-to-offset/000017.sst offset 41499645 size 4063
Jul 17 06:52:24 node-6d subspace-farmer[80757]: 2022-07-17T10:52:24.963661Z  INFO subspace_farmer::archiving: Plotted segment segment_index=15348
Jul 17 06:52:25 node-6d subspace-farmer[80757]: 2022-07-17T10:52:25.123088Z ERROR subspace_farmer::plotting: Failed to write encoded pieces error=Corruption: block checksum mismatch: stored = 216074205, computed = 1096449303, type = 1  in /chia/scratch/disk01/subspace/plot3/plot-index-to-offset/000017.sst offset 41499645 size 4063

This happened after I stopped the farmer service. I think the code doesn’t “sync” to storage.

Please take a look, @ivan-subspace.

Looking at the file that went corrupt, it was not opened with O_SYNC.

lsof +fg /chia/scratch/disk01/subspace/plot3/plot-index-to-offset/000017.sst
COMMAND      PID         USER   FD   TYPE FILE-FLAG DEVICE SIZE/OFF        NODE NAME
subspace- 134267 srv_subspace  273r   REG     LG,CX  254,6 67377874 45101533594 /chia/scratch/disk01/subspace/plot3/plot-index-to-offset/000017.sst

Going to change subspace/crates/subspace-farmer/src/plot.rs to add O_SYNC and see if this makes any difference…

impl PieceOffsetToIndexDb {
    pub fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            // https://docs.rs/libc/0.2.126/libc/constant.O_SYNC.html
            // pub const O_SYNC: c_int = 1052672;
            .custom_flags(1052672)
            .open(path)
            .map(Self)
    }
    // ... (source lines 717-761 omitted) ...
impl PlotWorker<File> {
    fn from_base_directory(
        base_directory: impl AsRef<Path>,
        address: PublicKey,
        max_piece_count: u64,
    ) -> Result<Self, PlotError> {
        let plot = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            // https://docs.rs/libc/0.2.126/libc/constant.O_SYNC.html
            // pub const O_SYNC: c_int = 1052672;
            .custom_flags(1052672)
            .open(base_directory.as_ref().join("plot.bin"))
            .map_err(PlotError::PlotOpen)?;
        Self::with_plot_file(plot, base_directory, address, max_piece_count)
    }
}
lsof +fg /chia/scratch/disk01/subspace/plot3/plot.bin
COMMAND      PID         USER   FD   TYPE    FILE-FLAG DEVICE    SIZE/OFF        NODE NAME
subspace- 165784 srv_subspace   53u   REG RW,SYN,LG,CX  254,6 76229378048 42949773276 /chia/scratch/disk01/subspace/plot3/plot.bin

The SYN flag is now set, but I don’t think I have found all the places where a file is opened yet…

Maybe plot.bin should not be opened with O_SYNC, since there is an obvious performance hit. But all the other, smaller files appear to be metadata or index files?
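
For what it's worth, one alternative to putting O_SYNC on every file is to open the file normally and request a sync explicitly after a batch of writes. Here is a minimal sketch using only the Rust standard library. The function name `write_batch_durably` and its parameters are made up for illustration and are not taken from subspace-farmer:

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: appends every record in `records` to the file at
/// `path`, then asks the OS to flush the whole batch to storage once.
///
/// The file is opened WITHOUT O_SYNC, so each `write_all` only has to reach
/// the page cache. Returns the total number of bytes written.
pub fn write_batch_durably(path: &Path, records: &[&[u8]]) -> io::Result<u64> {
    let mut file: File = OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)?;

    let mut written = 0u64;
    for record in records {
        file.write_all(record)?;
        written += record.len() as u64;
    }

    // One durability barrier per batch instead of one per write.
    // `sync_data` flushes file contents; use `sync_all` to also flush
    // all file metadata.
    file.sync_data()?;
    Ok(written)
}
```

This trades one sync per write for one sync per batch, so anything written since the last successful `sync_data()` can still be lost on a crash or power failure.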

Thanks for reporting the issue.
I suspect the cause is that fsync wasn’t enabled for RocksDB. Here is the pull request, which you can follow for updates: