A few optimizations for nodes and farmers

From my own experience, I have identified a number of optimizations and tips that can help you. Most of what follows I have tested in practice. I’m open to additions, corrections, etc…

Note that this article is largely aimed at Linux users, since I am one myself. Keep that in mind while reading.

1. Using an external journal on an SSD for the file system on the HDD. In the real world (outside of benchmarks), I have only tested this on XFS, and it improves performance significantly. In simple benchmarks ext4 performs about 2 times better than XFS, but XFS usually outperforms ext4 under highly concurrent I/O. So it makes sense to do some real-world tests with both; it is up to you which one to use.

On an SSD, I would recommend using XFS.

Creating a file system with an external journal:

For both XFS and ext4, you need a separate partition on a fairly fast drive (a SATA SSD, for example).

Create XFS with external journal:

Once an XFS filesystem is created, it is currently not possible to change the log type (external or internal). Also, XFS has a maximum log size of 2038 MiB. Such a large log does not always make sense, but these days allocating 2 GB of SSD space should usually not be a problem.

To format the disk, run the command:

mkfs.xfs -l logdev=/dev/nvme0n1p1,size=2038M /dev/sda

Where /dev/nvme0n1p1 is the path to the partition you created for the external journal and /dev/sda is the path to the hard drive. This command will set the log size to the maximum of 2038 MiB.

Then add logdev=/dev/nvme0n1p1 to the mount flags. Here is an example:

mount -o logdev=/dev/nvme0n1p1 /dev/sda /mnt/subspace

To identify logdev, I recommend referring to the partition by its ID. It can be obtained with the command:

ls -l /dev/disk/by-id/

Don’t forget to update your /etc/fstab.
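
For reference, here is a sketch of a matching /etc/fstab entry; the by-id names are placeholders, substitute your own devices:

/dev/disk/by-id/ata-EXAMPLE_HDD /mnt/subspace xfs logdev=/dev/disk/by-id/nvme-EXAMPLE_SSD-part1 0 0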

Create ext4 with external journal:

First you need to format the partition for the external journal:

mke2fs -O journal_dev /dev/nvme0n1p1

I recommend sticking to a partition size of around 2 GB for highly loaded drives.

Ext4 gives the user more flexibility here, so an existing filesystem can be augmented with an external journal:

tune2fs -J device=/dev/nvme0n1p1 /dev/sda
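
Note that if the filesystem already has an internal journal, tune2fs will most likely refuse to attach an external one. As far as I remember, you first have to remove the internal journal on an unmounted filesystem, roughly like this:

tune2fs -O ^has_journal /dev/sda
tune2fs -J device=/dev/nvme0n1p1 /dev/sda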

In order to create a file system immediately with an external journal, you need to run the command:

mke2fs -J device=/dev/nvme0n1p1 /dev/sda

You don’t have to add any mount flags, as the file system will automatically find the external journal. However, I would still recommend using the following mount flags:

noatime,nombcache,commit=600,journal_async_commit,nodelalloc,data=journal

They can significantly increase file system performance. However, commit=600 comes with an increased risk of data loss. If you’re worried about this, you can change it to something more conservative, for example commit=100.

Don’t forget to update your /etc/fstab.
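
Again for reference, a sketch of an /etc/fstab line with these flags (the device name is an example):

/dev/disk/by-id/ata-EXAMPLE_HDD /mnt/subspace ext4 noatime,nombcache,commit=600,journal_async_commit,nodelalloc,data=journal 0 0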

2. When using a file system that actively uses copy-on-write (for example, btrfs), CoW must be disabled on the node and farmer data directories, since it causes significant performance regressions. It is also worth turning off on-the-fly defragmentation. Without CoW, most of the “new generation” functionality of btrfs is disabled, so I see no point in using it; it is better to prefer a file system that can keep its journal on a separate device, such as ext4 or XFS.
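
If you do stay on btrfs, CoW is usually disabled per directory via the chattr +C attribute; keep in mind it only affects files created after the flag is set (the paths below are just examples):

chattr +C /mnt/subspace/node
chattr +C /mnt/subspace/farmer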

3. Be sure to keep the farmer’s database and the node on an SSD. Without that, you are probably doomed to failure.

To put the farmer database on another drive, use the following:

subspace-farmer --farm=hdd=/mnt/subspace,ssd=/mnt/ssd,size=100G farm ...

4. When placing a large number of farmers on one HDD, it is worth setting the --disk-concurrency 1 parameter to reduce disk load.

For example:

subspace-farmer farm --disk-concurrency 1 ... 

When plotting on NVMe, it is recommended to set this parameter to 10 or more, since NVMe SSDs only benefit from parallelism. Otherwise, you may see high iowait.

5. This one is experimental, but I noticed that when doing the initial plotting on an SSD with a subsequent move to HDD, you should not reuse the old farmer database. When it was reused, the disk was heavily loaded and the farmer worked very slowly; a simple reboot could result in hours of waiting for syncing to complete. If the old database is not reused, the disk is almost idle, and 25 farmers running simultaneously can recover in a few minutes after half a day of downtime. Rewards start arriving somewhere within a day or two after the move. So it might be a good way to improve SSD plotting with a subsequent move to HDD.

6. Reducing the maximum number of peers for nodes. This can be achieved using the --out-peers and --in-peers options. The optimal values, in my opinion, are 10 and 20, respectively. For example:

subspace-node --out-peers 10 --in-peers 20 --name Kalliope ...

On Linux, it is also worth increasing the maximum number of simultaneous connections. To do this, you need to change the sysctl value of net.core.somaxconn depending on the expected number of peers.

To change the value until the next reboot, you can use the command:

sudo sysctl -w net.core.somaxconn=4096

To make the changes permanent you need to add the line net.core.somaxconn=4096 to the /etc/sysctl.conf file.

7. Using zram can help a lot when you are running out of RAM. I recommend using the lz4 algorithm, setting the zram size limit to 50% of total RAM or more, and setting vm.page-cluster = 0 via sysctl. Just keep in mind that vm.page-cluster = 0 can cause a noticeable performance degradation when a swap partition on an SSD or HDD is also in use. Perhaps you should give up on-disk swap partitions or try zswap.
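
For example, to set it until the next reboot (add the line vm.page-cluster = 0 to /etc/sysctl.conf to make it permanent):

sudo sysctl -w vm.page-cluster=0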

Zram setup:

I will show a method for setting up zram using the systemd-zram-generator package.

In order to install it on Ubuntu you need to run the command:

sudo apt install systemd-zram-generator

For Arch Linux:

sudo pacman -S zram-generator

Zram is configured through the /etc/systemd/zram-generator.conf file. For example, here are the settings that I find optimal:

[zram0]
zram-size = ram * 0.5
compression-algorithm = lz4

This configuration will limit the maximum size of the zram swap file to half of the total RAM and use the lz4 algorithm for compression.

After that, you just have to reboot; the changes will be applied automatically on boot.
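
If you don’t want to reboot, you can usually start the swap device right away, assuming the generator creates the standard unit name:

sudo systemctl daemon-reload
sudo systemctl start systemd-zram-setup@zram0.service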

Zswap setup:

Zswap is used when there is already a swap partition on the physical media. In general, zswap performs worse than zram, but if you don’t want to give up a swap partition, then it will be better than nothing.

To enable and configure zswap you need to add the following boot options:

zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold

This will enable zswap, set the pool limit to 20% of total RAM, use lz4 as the algorithm and z3fold as the allocator. Z3fold is an allocator for storing compressed memory that allows up to 3:1 compression. It may not be available by default on some distributions, which is not a big loss, because the lz4 algorithm does not provide a very high compression ratio anyway.

You are probably using Grub as your bootloader. To update the Linux boot options, you need to:

Open the /etc/default/grub file;

Then append the required boot parameters to the GRUB_CMDLINE_LINUX_DEFAULT=... line.

For example, if the line looks like this:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet splash"

Then after adding it should look like this:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet splash zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=20 zswap.zpool=z3fold"

Then you need to update the bootloader configuration.

On Ubuntu:

sudo update-grub

On Arch Linux and many other distributions:

sudo grub-mkconfig -o /boot/grub/grub.cfg
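
After rebooting, you can verify that zswap picked up the parameters (assuming the zswap module is available in your kernel):

grep -R . /sys/module/zswap/parameters/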

8. Compile from source with optimizations for your hardware. This can slightly increase the speed of the node and the farmer. To do this, run the following commands before compiling:

export RUSTFLAGS="-C target-cpu=native"
export CFLAGS="-march=native -O3"
export CXXFLAGS="$CFLAGS"
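
These variables only take effect for builds launched from the same shell session; after exporting them, build as usual, for example with a plain cargo release build (the exact command for your Subspace version may differ):

cargo build --release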

9. The default Linux kernel CFS task scheduler is tuned for interactive systems such as laptops and desktops. Fine-tuning its parameters can significantly increase performance under high load, especially with highly concurrent I/O.

I suggest changing the following settings:

sched_latency = 90000000
sched_min_granularity = 10000000
sched_wakeup_granularity = 15000000
sched_migration_cost = 50000000
kernel.sched_autogroup_enabled = 0

On older kernels, it is possible to configure all of these parameters via sysctl, but I will not describe that process here. I will describe the two main approaches that users of relatively new kernels (5.10-5.13 and newer) can use.

Nowadays, only the kernel.sched_autogroup_enabled = 0 parameter can still be changed via sysctl.
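
For example, to apply it until the next reboot (add kernel.sched_autogroup_enabled = 0 to /etc/sysctl.conf to make it permanent):

sudo sysctl -w kernel.sched_autogroup_enabled=0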

Via debugfs:

The simplest way to configure these parameters is through debugfs:

echo 90000000 | sudo tee /sys/kernel/debug/sched/latency_ns > /dev/null
echo 10000000 | sudo tee /sys/kernel/debug/sched/min_granularity_ns > /dev/null
echo 15000000 | sudo tee /sys/kernel/debug/sched/wakeup_granularity_ns > /dev/null
echo 50000000 | sudo tee /sys/kernel/debug/sched/migration_cost_ns > /dev/null

The changes only last until the next reboot, so you should make sure that these commands are run when the system boots.
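
One simple way to do that is a oneshot systemd unit, for example /etc/systemd/system/sched-tune.service. This is just a sketch, assuming the debugfs paths above exist on your kernel; the unit name is arbitrary:

[Unit]
Description=CFS scheduler tuning
After=sys-kernel-debug.mount

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo 90000000 > /sys/kernel/debug/sched/latency_ns; \
    echo 10000000 > /sys/kernel/debug/sched/min_granularity_ns; \
    echo 15000000 > /sys/kernel/debug/sched/wakeup_granularity_ns; \
    echo 50000000 > /sys/kernel/debug/sched/migration_cost_ns'

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable sched-tune.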

Note that debugfs may not be enabled by default. In order to enable it, you need to add the boot option debugfs=on.

If you don’t want to use debugfs, for example for security reasons, you can build the Linux kernel with a patch that makes these changes at the kernel code level. This method is described in the next section.

Via kernel patch (for advanced users):

I’ll try to post this patch later…

10. Configuring node pruning can significantly reduce the disk space the node occupies, and it does not affect sync speed or rewards in any way. To configure pruning, use the --state-pruning 1024 and --keep-blocks 1024 options. These values are, in general, optimal.

For example:

subspace-node --state-pruning 1024 --keep-blocks 1024 --name Kalliope ...

11. General kernel parameter tuning through sysctl:

Increase the maximum number of open file descriptors (a reasonable value is the number of node and farmer pairs multiplied by 20000, plus 20000):

fs.file-max = 2500000
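
For example, under this rule 124 node/farmer pairs give 124 × 20000 + 20000 = 2,500,000, which matches the value above.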

Increase the interval between cache flushes to the drive (XFS only). Please note that this increases the risk of data loss in the event of a sudden power outage:

fs.xfs.xfssyncd_centisecs = 10000

Received frames are stored in this queue after being taken from the ring buffer on the network card. Increasing this value for high-speed network cards may help prevent packet loss:

net.core.netdev_max_backlog = 16384

Enable and tune TCP window scaling. Note that these settings are designed for owners of a gigabit channel:

net.core.rmem_max = 1073725440
net.core.wmem_max = 1073725440
net.core.rmem_default = 262140
net.core.wmem_default = 262140
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 65535 262140 1073725440
net.ipv4.tcp_wmem = 65535 262140 1073725440
net.ipv4.tcp_window_scaling = 1

Enable TCP Fast Open, which can significantly improve network latency:

net.ipv4.tcp_fastopen = 3

Enable the BBR congestion control algorithm, which can help achieve higher bandwidth and lower latency for internet traffic:

net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = bbr

Allow TCP to reuse an existing connection in the TIME-WAIT state for a new outgoing connection. This helps avoid running out of available network sockets:

net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

Stop TCP from falling back to the default window size after a connection has been idle for too long. The default behaviour kills persistent single-connection performance, so it is worth turning off:

net.ipv4.tcp_slow_start_after_idle = 0

Make the system prioritize low latency over high throughput. It is worth using under heavy system load with weak or medium network consumption:

net.ipv4.tcp_low_latency = 1

Decrease the virtual file system (VFS) cache pressure. Since caching is good for performance, we want cached data to stay in memory longer:

vm.vfs_cache_pressure = 50

This option controls how aggressively the kernel swaps memory pages out. Since the cache will grow larger, we still want to reduce swapping so that it does not cause increased swap I/O. 10 is a good value when using zram or a swap partition on the HDD. If you have a swap partition on an SSD and want to limit its wear, set the value to 1:

vm.swappiness = 10 

This parameter determines how many kilobytes of RAM the system will try to keep free (not counting cache). It is recommended to set it to 1-3% of total RAM (1% of 8 GiB of RAM is about 83886 KiB):

vm.min_free_kbytes = 83886

Just add the lines with the desired values to the /etc/sysctl.conf file. To apply them, reboot the server (or run sudo sysctl -p).

12. Turning off CPU vulnerability mitigations is an effective but very dangerous way to significantly increase your sync speed. Under no circumstances should you do this if there are other nodes on the server, if the server has a graphical interface and/or is used as a desktop system, or if you store important or sensitive information on it. There is a risk of data being leaked or the machine being compromised, for example through a page in a web browser, so think twice! You can read more about it here.

To disable mitigations you need to add mitigations=off to your kernel boot options and then update the bootloader configuration.
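
Continuing the Grub example from the zswap section, the line would become something like:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet splash mitigations=off"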

After a reboot, mitigations should be disabled. You can check this with the command:

grep -H '' /sys/devices/system/cpu/vulnerabilities/*

Most of the entries should now read Vulnerable or Not affected.

13. As a way to easily manage nodes and farmers, I offer my service files for systemd-based distributions.

For nodes:

Just add the following to the /etc/systemd/system/subspace-node@.service file:

[Unit]
Description=Subspace Node %i
After=network.target

[Service]
Type=simple
User=subspace
Environment=NAME_PREFIX=biba
Environment=NODES_DIR=/home/subspace/nodes
Environment=NODE_BIN=/home/subspace/subspace-node
Environment=NODES_DIR_PREFIX=sn
ExecStart=/bin/bash -c 'exec ${NODE_BIN} \
    --chain gemini-2a \
    --execution native-else-wasm \
    --base-path ${NODES_DIR}/${NODES_DIR_PREFIX}-%i \
    --state-pruning 1024 \
    --keep-blocks 1024 \
    --port $$((30332 + %i)) \
    --ws-port $$((9965 + %i)) \
    --validator  \
    --out-peers 10 \
    --in-peers 20 \
    --name ${NAME_PREFIX}-%i'
KillSignal=SIGINT
LimitNOFILE=10000
Restart=on-failure
RestartSec=10
Nice=-5

[Install]
WantedBy=multi-user.target

You need to set the proper environment variables and change the username to yours. The service file will then set the ports, the node name and the path to the node database on its own.
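
If you would rather not edit the unit file itself, the environment can also be overridden with a systemd drop-in (sudo systemctl edit subspace-node@.service, or a specific instance); the values below are just examples:

[Service]
Environment=NAME_PREFIX=myname
Environment=NODES_DIR=/data/nodes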

To enable node 1, you need to run the command:

sudo systemctl enable --now subspace-node@1

To enable nodes 2-10, you need to run the command:

sudo systemctl enable --now subspace-node@{2..10}

To view the logs of the fifth node, you need to run the command:

sudo journalctl -f -o cat -u subspace-node@5

To disable all of them, you need to run the command:

sudo systemctl disable --now subspace-node@{1..10}

For farmers (HDD):

Just add the following to the /etc/systemd/system/subspace-farmer@.service file:

[Unit]
Description=Subspace Farmer %i
After=network.target

[Service]
Type=simple
User=subspace
Environment=PLOT_DIR=/mnt
Environment=PLOT_DIR_PREFIX=subspace
Environment=HDD_COUNT=1
Environment=DB_DIR=/mnt
Environment=DB_DIR_PREFIX=ssd
Environment=SSD_COUNT=1
Environment=PLOT_SIZE=100G
Environment=FARMER_BIN=/home/subspace/subspace-farmer
Environment=ADDRESSES_FILE=/home/subspace/addresses
ExecStartPre=/bin/bash -c "/bin/mkdir -p ${PLOT_DIR}/${PLOT_DIR_PREFIX}-$$(( (%i - 1) % ${HDD_COUNT} + 1))/farmer-%i"
ExecStartPre=/bin/bash -c "/bin/mkdir -p ${DB_DIR}/${DB_DIR_PREFIX}-$$(( (%i - 1) % ${SSD_COUNT} + 1))/mappings-hdd/farmer-%i"
ExecStart=/bin/bash -c 'exec ${FARMER_BIN} \
    --farm hdd=${PLOT_DIR}/${PLOT_DIR_PREFIX}-$$(( (%i - 1) % ${HDD_COUNT} + 1))/farmer-%i,ssd=${DB_DIR}/${DB_DIR_PREFIX}-$$(( (%i - 1) % ${SSD_COUNT} + 1))/mappings-hdd/farmer-%i,size=${PLOT_SIZE} \
    farm \
    --node-rpc-url ws://127.0.0.1:$$((9965 + %i)) \
    --listen-on /ip4/0.0.0.0/tcp/$$((40332 + %i)) \
    --reward-address $$(sed -n %i\p ${ADDRESSES_FILE}) \
    --disk-concurrency 1 \
    --plot-size ${PLOT_SIZE}'
KillSignal=SIGINT
LimitNOFILE=10000
Restart=always
RestartSec=10
Nice=-5

[Install]
WantedBy=multi-user.target

You need to set the proper environment variables and change the username to yours. The service file will then set the ports and the paths to the plot and the farmer database on its own. You must stick to a consistent directory naming scheme. The unit can distribute plots and databases between different disks by itself, which is preferable to using software RAID.

To enable farmer 1, you need to run the command:

sudo systemctl enable --now subspace-farmer@1

To enable farmers 2-10, you need to run the command:

sudo systemctl enable --now subspace-farmer@{2..10}

To view the logs of the fifth farmer, you need to run the command:

sudo journalctl -f -o cat -u subspace-farmer@5

To disable all of them, you need to run the command:

sudo systemctl disable --now subspace-farmer@{1..10}

For farmers (SSD):

Just add the following to the /etc/systemd/system/subspace-farmer-ssd@.service file:

[Unit]
Description=Subspace Farmer SSD %i
After=network.target
Wants=subspace-node@%i.service
After=subspace-node@%i.service
 
[Service]
Type=simple
User=subspace
Environment=DB_DIR=/mnt
Environment=DB_DIR_PREFIX=ssd
Environment=SSD_COUNT=1
Environment=PLOT_DIR=/mnt
Environment=PLOT_DIR_PREFIX=ssd
Environment=PLOT_SSD_COUNT=1
Environment=CONCURRENCY=5
Environment=PLOT_SIZE=100G
Environment=FARMER_BIN=/home/subspace/subspace-farmer
Environment=ADDRESSES_FILE=/home/subspace/addresses
ExecStartPre=/bin/bash -c '/bin/mkdir -p ${PLOT_DIR}/${PLOT_DIR_PREFIX}-$$(( (%i - 1) % ${PLOT_SSD_COUNT} + 1))/farmers/farmer-%i'
ExecStartPre=/bin/bash -c '/bin/mkdir -p ${DB_DIR}/${DB_DIR_PREFIX}-$$(( (%i - 1) % ${SSD_COUNT} + 1))/mappings/farmer-%i'
ExecStart=/bin/bash -c 'exec ${FARMER_BIN} \
    --farm hdd=${PLOT_DIR}/${PLOT_DIR_PREFIX}-$$(( (%i - 1) % ${PLOT_SSD_COUNT} + 1))/farmers/farmer-%i,ssd=${DB_DIR}/${DB_DIR_PREFIX}-$$(( (%i - 1) % ${SSD_COUNT} + 1))/mappings/farmer-%i,size=${PLOT_SIZE} \
    farm \
    --node-rpc-url ws://127.0.0.1:$$((9965 + %i)) \
    --listen-on /ip4/0.0.0.0/tcp/$$((40332 + %i)) \
    --reward-address $$(sed -n %i\p ${ADDRESSES_FILE}) \
    --plot-size ${PLOT_SIZE} \
    --disk-concurrency ${CONCURRENCY}'
KillSignal=SIGINT
LimitNOFILE=10000
Restart=always
RestartSec=10
Nice=-5

[Install]
WantedBy=multi-user.target

You need to set the proper environment variables and change the username to yours. The service file will then set the ports, the paths to the plot and the farmer database, and the disk concurrency on its own. Increase the CONCURRENCY value to 10 or more if using NVMe; with a SATA SSD you can leave it as is or lower it slightly. You must stick to a consistent directory naming scheme. The unit can distribute plots and databases between different disks by itself, which is preferable to using software RAID.

To enable farmer 1, you need to run the command:

sudo systemctl enable --now subspace-farmer-ssd@1

To enable farmers 2-10, you need to run the command:

sudo systemctl enable --now subspace-farmer-ssd@{2..10}

To view the logs of the fifth SSD farmer, you need to run the command:

sudo journalctl -f -o cat -u subspace-farmer-ssd@5

To disable all of them, you need to run the command:

sudo systemctl disable --now subspace-farmer-ssd@{1..10}

Very good alternate guide, thanks for your effort @SeryogaLeshii


I’ve played with the vm.page-cluster sysctl option and noticed that even when using zram and under heavy load, a value of 3 can sometimes slightly increase performance at the cost of latency. Try it yourself.

And more about mitigations. Do not disable them on VPS/VDS.

This is really amazing. I am bookmarking this for later - it is a legendary Subspace Forum post.


I will be very grateful if I am allowed to edit this article further. I noticed some typos.

This is all great information that adds real value for the community. Many thanks for your effort here. I’ve bookmarked as there’s a lot to digest and I want to come back to reference the optimisation methods when I apply them to my nodes.


I want to make it clear that the information here is noticeably out of date.

Hi Seryoga, how would you work around the known Parted glitch with the ‘Warning: The resulting partition is not properly aligned for best performance: 12635340s % 2048s != 0s’ error while aligning new small partitions manually? :face_with_raised_eyebrow:

For anyone intending to do this with Parted: don’t waste your time! Just use gdisk for a GPT partition table and enjoy proper partitioning software (for Linux).
