Farming / Plotting issue over network

Issue Report

Environment

  • Operating System: Ubuntu Server 22 on Proxmox 8.1
  • 10G network (NAT)
  • 8G network (WAN)
  • The NAS boxes run OpenMediaVault on Proxmox 8.1

  • Pulsar/Advanced CLI/Docker: CLI

Problem

Hello!

I’m starting to farm/plot over the network. I had two node + farmer setups on two different VMs: one with a Ryzen 5900, which plots like a hammer, and the other with an old E5-2680 v4, which was not good. So I swapped the disk from the slow plotter to the new one.

  • First issue: the plot is apparently not recognized.
  • Second issue: the farmer does not start; it does nothing and stays stuck at disk init.

This is my farmer service (/opt is a local drive, the /mnt paths are NFS only):

# /etc/systemd/system/subspace_farmer.service

[Unit]
Description="subspace farmer"
After=network-online.target

[Service]
User=chimera
Group=chimera
WorkingDirectory=/opt/chimera/farmer/
ExecStart=subspace_farmer farm \
    --node-rpc-url ws://127.0.0.1:9944 \
    path=/opt/chimera/farmer/ssd,size=800G \
    path=/opt/chimera/farmer/nvme,size=900G \
    path=/mnt/hydras_alpha_crucial_2TO_baie/alpha,size=900G \
    path=/mnt/hydras_alpha_fikwot4T_baie_top_left/alpha,size=900G \
    path=/mnt/subspace_001,size=3.5T \
    --reward-address st8VCipcz7xezUnM73T7szCiyNau51YYJAGovKGETUjuPy5oj \
    --listen-on /ip4/0.0.0.0/tcp/30533 \
    --prometheus-listen-on 0.0.0.0:9081

StandardOutput=append:/var/log/chimera/subspace_farmer/farmer.log
StandardError=append:/var/log/chimera/subspace_farmer/farmer.log
LimitNOFILE=infinity

Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
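
(For reference, after editing the unit I reload and follow it with the usual systemd commands; unit name as above:)

sudo systemctl daemon-reload
sudo systemctl restart subspace_farmer
journalctl -u subspace_farmer -f    # or tail -f /var/log/chimera/subspace_farmer/farmer.log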

This is my fstab (NFS part):

192.168.1.19:/export/hydras_alpha_crucial_2TO_baie /mnt/hydras_alpha_crucial_2TO_baie nfs rw,noatime,rsize=32768,wsize=32768,proto=tcp,nolock,intr,hard,timeo=600 0 0
192.168.1.19:/export/hydras_alpha_fikwot4T_baie_top_left /mnt/hydras_alpha_fikwot4T_baie_top_left nfs rw,noatime,rsize=32768,wsize=32768,proto=tcp,nolock,intr,hard,timeo=600 0 0

192.168.1.58:/export/subspace_001 /mnt/subspace_001 nfs rw,noatime,rsize=32768,wsize=32768,proto=tcp,nolock,intr,hard,timeo=600 0 0
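
(To double-check that these shares are really mounted, and with which effective options, the standard client tools can be used:)

findmnt -t nfs,nfs4    # list active NFS mounts and their mount points
nfsstat -m             # show the options actually negotiated with each server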


About the logs: well, it’s stuck here, nothing more.

2024-02-22T16:57:00.818657Z  INFO subspace_farmer::commands::farm: Connecting to node RPC url=ws://127.0.0.1:9944
2024-02-22T16:57:00.831604Z  INFO subspace_networking::constructor: DSN instance configured. allow_non_global_addresses_in_dht=false peer_id=12D3KooWCeNFYtX6py5gXdZzNt3Pg4qHsEJ1fuUBkpZT5zh9RwbE protocol_version=/subspace/2/0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
2024-02-22T16:57:00.837985Z  INFO libp2p_swarm: local_peer_id=12D3KooWCeNFYtX6py5gXdZzNt3Pg4qHsEJ1fuUBkpZT5zh9RwbE
2024-02-22T16:57:00.839683Z  INFO subspace_metrics: Metrics server started. endpoints=[0.0.0.0:9081]
2024-02-22T16:57:00.840675Z  INFO actix_server::builder: starting 2 workers
2024-02-22T16:57:00.841829Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
2024-02-22T16:57:01.250036Z  INFO subspace_farmer::commands::farm: Multiple L3 cache groups detected l3_cache_groups=2
Single disk farm 0:
  ID: 01HP7WQYY69BBGRVAXPK2T45VK
  Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
  Public key: 0x7a51291a03a6d13093e7654141c1bdf6e66e4860a4f9538532a50696d0647a36
  Allocated space: 745.1 GiB (800.0 GB)
  Directory: /opt/chimera/farmer/ssd
Single disk farm 1:
  ID: 01HP7WQYY8B1N2GCYT7Z1C8F6K
  Genesis hash: 0x0c121c75f4ef450f40619e1fca9d1e8e7fbabc42c895bc4790801e85d5a91c34
  Public key: 0x8eba47742a1c288093081b70fe1a95f7fa5a430632739d5ece236dc4f5e71e30
  Allocated space: 838.2 GiB (900.0 GB)
  Directory: /opt/chimera/farmer/nvme

These are my iperf stats:

root@chimera-zeta:/mnt/hydras_alpha_crucial_2TO_baie# iperf -c 192.168.1.19
------------------------------------------------------------
Client connecting to 192.168.1.19, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  1] local 192.168.1.40 port 41162 connected with 192.168.1.19 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0180 sec  11.5 GBytes  9.83 Gbits/sec
root@chimera-zeta:/mnt/hydras_alpha_crucial_2TO_baie# iperf -c 192.168.1.58
------------------------------------------------------------
Client connecting to 192.168.1.58, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.40 port 38026 connected with 192.168.1.58 port 5001
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0121 sec  11.4 GBytes  9.76 Gbits/sec

If you want a view of the node & the overall OS, here are links to my Grafana:

https://mythologic.fr/d/bdbz3s3a4ho1sb/subspace-overview?orgId=7
https://mythologic.fr/d/QX3P3t7iz/olympus-os-overview?orgId=7&var-Host=chimera-zeta.mythologic.fr

Local bench from the NFS server:

root@hydras-alpha:~# sh bench.sh
Bench /export/hydras_alpha_crucial_2TO_baie
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 8.11153 s, 129 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 11.4204 s, 91.8 MB/s
bench /export/hydras_alpha_fikwot4T_baie_top_left
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.6219 s, 400 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.89975 s, 362 MB/s

Bench from the Subspace server to the NAS over NFS:

root@chimera-zeta:~# sh bench.sh
Bench /mnt/hydras_alpha_crucial_2TO_baie
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.07517 s, 148 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 10.8822 s, 96.4 MB/s
bench /mnt/hydras_alpha_fikwot4T_baie_top_left
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.14284 s, 489 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 3.23389 s, 324 MB/s
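
bench.sh itself is not shown here; judging from the output (1000 records of 1 MiB each, written twice per target), it is presumably a simple dd write test along these lines (a rough sketch, not the exact script, which hardcodes its target directories):

#!/bin/sh
# Sequential 1 GiB write, run twice per target, to measure raw write throughput.
for dir in "$@"; do
    echo "Bench $dir"
    dd if=/dev/zero of="$dir/bench.tmp" bs=1M count=1000 conv=fdatasync
    dd if=/dev/zero of="$dir/bench.tmp" bs=1M count=1000 conv=fdatasync
    rm -f "$dir/bench.tmp"
done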

Sounds like it wasn’t mounted properly; I have no other plausible explanation for it.

I never had the patience to get farming to work (10G, 7TB plot file).

Plotting works without issue when you create a new plot file (and do not farm it).

nfsstat and nfsiostat can give some valuable info.
The I/O required for farming might take too long to execute, so you end up with a backlog.
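
For example, to watch the client side while the farmer is running (both tools are part of nfs-utils):

nfsstat -c                       # client-side NFS/RPC operation counters
nfsiostat 5 /mnt/subspace_001    # per-mount throughput and latency, refreshed every 5 seconds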

The easier fix might be to reconfigure your servers to a fitting layout.

Well, it seems fixed now…

The only issue I had was the plot that I swapped from farmer A to farmer B. @enzimes told me to try destroying it and restarting farming. Everything works properly now.
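
("Destroying" it here just meant stopping the farmer and deleting that farm's files so the directory gets re-created from scratch, roughly:)

sudo systemctl stop subspace_farmer
rm -rf /path/to/the/moved/farm/*    # hypothetical path: use the directory of the farm that was swapped between machines
sudo systemctl start subspace_farmer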

Well, going back into this topic.

I'm having a hard time with NFS, and I'm pretty much a noob with it, as said earlier. Do you know any nice tools to diagnose issues with an NFS mount point?

I used this config on the OMV side: 10GbE / NFS Tuning - NFS - openmediavault
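
On the farmer side, the matching tuning mostly lives in the mount options. A possible fstab line for the 10GbE link could look something like this (larger rsize/wsize and nconnect are the usual suggestions; intr is a no-op on modern kernels and nolock is dropped here; the exact values are untested assumptions on my part, not taken from the guide):

192.168.1.19:/export/hydras_alpha_crucial_2TO_baie /mnt/hydras_alpha_crucial_2TO_baie nfs rw,noatime,rsize=1048576,wsize=1048576,proto=tcp,hard,timeo=600,nconnect=4 0 0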