Network Latency & Timeouts after Starting Node & Farmer

Issue Report

Environment

  • Operating System: Ubuntu Server
  • Pulsar/Advanced CLI/Docker: Docker

Problem

I am encountering an issue where my network slows to a crawl whenever I start my node + farmer. Bandwidth appears to be fine: YouTube plays HD video without trouble once the page actually loads. The problem is that a webpage takes 10-15 seconds to load, and sometimes the request times out completely.

I have an ISP Router/Modem set to bridge mode → Unifi Dream Machine Pro → Unifi Switch → Host with Node/Farmer

I am running Pi-hole in Docker; however, this issue started before I set it up. I added it to see if it would help, but it does not seem to have.

Currently I’m in the Piece Cache syncing stage. I was running many nodes/farmers on 3g with no issues.

The slowdown happens locally as well, not just for queries going outside my network. Pinging other machines on my network will sometimes time out. I am happy to perform any tests or diagnostics to solve this issue. As it stands right now, I cannot participate in 3h.

192.168.2.103 is another host on my network. 172.25.0.2 is my pihole

dig 192.168.2.103
dev@alpha:~$ dig 192.168.2.103
;; communications error to 172.25.0.2#53: timed out
;; communications error to 172.25.0.2#53: timed out

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> 192.168.2.103
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 6462
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;192.168.2.103.                 IN      A

;; AUTHORITY SECTION:
.                       86399   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2024020300 1800 900 604800 86400

;; Query time: 15 msec
;; SERVER: 172.25.0.2#53(172.25.0.2) (UDP)
;; WHEN: Sat Feb 03 07:22:56 MST 2024
;; MSG SIZE  rcvd: 117

This is an attempt at hitting something external:

dig google.com
dev@alpha:~$ dig google.com
;; communications error to 172.25.0.2#53: timed out
;; communications error to 172.25.0.2#53: timed out

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39550
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             197     IN      A       192.178.49.14

;; Query time: 7 msec
;; SERVER: 172.25.0.2#53(172.25.0.2) (UDP)
;; WHEN: Sat Feb 03 07:10:20 MST 2024
;; MSG SIZE  rcvd: 55

If I change my DNS server to 1.1.1.1 or 8.8.8.8, the same thing happens: I still get timeouts when running dig.

Here is an example of just connecting to this forum - it took 23 seconds for this call to complete:

Some more browsing - huge delays in requests. Apologies if it’s hard to read, but it is 11785ms and 1

Browser screenshot

Screenshot 2024-02-03 081950

If you click on the resource that took longest, can you share the Network Timing tab for that resource?

Screenshot 2024-02-03 at 11.07.59 AM

Screenshot

Screenshot 2024-02-03 191531

Dig seems to be working much better. I have pfSense in resolver mode, and forwarding is not enabled. This has improved browsing speed, but the network is still very slow.

dig weather.com
hakedev@bravo:~$ dig weather.com

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> weather.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39321
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;weather.com.                   IN      A

;; ANSWER SECTION:
weather.com.            20      IN      A       104.103.188.38

;; Query time: 479 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Sat Feb 03 19:21:07 MST 2024
;; MSG SIZE  rcvd: 56

The second query’s time is 0 ms, which makes me think caching is working fine.

dig weather.com
hakedev@bravo:~$ dig weather.com

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> weather.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39177
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;weather.com.                   IN      A

;; ANSWER SECTION:
weather.com.            12      IN      A       104.103.188.38

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Sat Feb 03 19:21:15 MST 2024
;; MSG SIZE  rcvd: 56

Let’s take a step back and identify more precisely where the issue is.

Can you try to ping your router? If ping latency there is low but there are still networking issues, then the issue is further down the stack. Try to ping 8.8.8.8 next. The goal here is to avoid DNS and check whether the network in general works and where the issues happen.

If the router pings fine and 8.8.8.8 does not, then try mtr or a similar tool to see where the issue happens:

mtr 8.8.8.8

If you start seeing issues right at your ISP’s level, then the issue is probably in your router. Please attach the mtr results so we can see what the next debugging steps will be.
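
The checks suggested above could be scripted along these lines. This is only a sketch under assumptions: `ping_loss_pct` and `check_host` are names made up for illustration, the summary-line format is the one iputils `ping` prints on Linux, and the addresses passed in are whatever your router and an outside host happen to be.

```shell
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage taken from the summary line ("... 0% packet loss ...").
ping_loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: ping a host 10 times quietly and report the loss.
# Pinging by IP address keeps DNS out of the picture.
check_host() {
  loss=$(ping -c 10 -q "$1" 2>/dev/null | ping_loss_pct)
  if [ -n "$loss" ]; then
    echo "$1: ${loss}% loss"
  else
    echo "$1: no ping summary (host unreachable?)"
  fi
}
```

For example, run `check_host 192.168.10.1` and then `check_host 8.8.8.8`, once with the node and farmer stopped and once with them running, and compare the two sets of numbers.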

Router Ping:

PING 192.168.10.1 (192.168.10.1) 56(84) bytes of data.
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=0.100 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=0.185 ms
64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=0.205 ms
64 bytes from 192.168.10.1: icmp_seq=4 ttl=64 time=0.092 ms
64 bytes from 192.168.10.1: icmp_seq=5 ttl=64 time=0.135 ms
64 bytes from 192.168.10.1: icmp_seq=6 ttl=64 time=0.086 ms
64 bytes from 192.168.10.1: icmp_seq=7 ttl=64 time=0.085 ms

Pinging 8.8.8.8

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=5.65 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=4.43 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=118 time=4.51 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=118 time=4.45 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=118 time=4.45 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=118 time=4.83 ms

mtr 8.8.8.8

May or may not be helpful but just some other observations:
Pages like github take literally 4-5 minutes to load - sometimes failing entirely.
Watching YouTube works just fine, definitely not a bandwidth issue, more so latency.
Piece Cache has started syncing VERY slowly. Initially I could sync 10% of the piece cache in a few minutes, but now it takes 7 hours to get to 6%.

I just thought about maybe trying mtr for github - the results are MUCH different:

Just to clarify, you ran the above tests while you were having significant networking issues, right?

Yes correct - just added a new image.

Just posting this on a whim, but the same IP (4.68.38.185) shows up in some World of Warcraft forum threads, with WinMTR showing packet loss as well. I am on CenturyLink too:

So I see GitHub is perfectly reachable, meaning the Internet overall should work just fine as long as the IP address is known.

Can you try to resolve a DNS name with Google’s or Cloudflare’s server while the issue is happening?

dig @8.8.8.8 weather.com

I’m not sure exactly how you changed the DNS servers to Google’s before, or what layers sat between dig and the DNS server, but the command above sends the request straight to 8.8.8.8.

hakedev@alpha:~$ dig @8.8.8.8 weather.com
;; communications error to 8.8.8.8#53: timed out
;; communications error to 8.8.8.8#53: timed out

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> @8.8.8.8 weather.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11263
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;weather.com.                   IN      A

;; ANSWER SECTION:
weather.com.            20      IN      A       23.45.39.154

;; Query time: 15 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Feb 04 07:19:02 MST 2024
;; MSG SIZE  rcvd: 56

I see you keep getting these “communications error … timed out” messages in all tests. Did Google respond quickly, or was it actually super slow and timing out?

It did not respond quickly, but there was maybe only a 1 second delay before each timeout message showed up. However, with all nodes turned off, the response is instant and there are no timeout messages.
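
To put a number on that difference, something like the following could be run once with the nodes stopped and once with them running. It is a rough sketch, not a definitive test: `has_timeout` and `probe_dns` are hypothetical helper names, and it assumes dig reports a lost query with a message containing "timed out", as in the outputs above.

```shell
# Hypothetical helper: succeed if the text on stdin contains a dig
# timeout notice such as ";; communications error to ...: timed out".
has_timeout() {
  grep -q 'timed out'
}

# Hypothetical helper: send N single-attempt queries straight to one
# resolver and report how many of them timed out.
# +tries=1 disables dig's retries; +time=2 waits at most 2 s per query.
probe_dns() {
  server=$1
  name=$2
  n=${3:-20}
  i=0
  failed=0
  while [ "$i" -lt "$n" ]; do
    if dig +tries=1 +time=2 "@$server" "$name" 2>&1 | has_timeout; then
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "$failed of $n queries to $server timed out"
}
```

For example, `probe_dns 8.8.8.8 weather.com` sends 20 queries; a clearly higher count while the node and farmer are running would confirm that UDP queries are being dropped under load.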

Okay, so ICMP was quick, but UDP (DNS uses UDP) was not.

Let’s try DNS over HTTPS then (HTTP/2, which uses TCP in this case). The first request may itself need a DNS lookup, but it should be faster next time, assuming you have DNS caching:

time curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=weather.com&type=A'
hakedev@alpha:~$ time curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=weather.com&type=A'
{"Status":0,"TC":false,"RD":true,"RA":true,"AD":false,"CD":false,"Question":[{"name":"weather.com","type":1}],"Answer":[{"name":"weather.com","type":1,"TTL":14,"data":"184.85.65.207"}]}
real    0m0.149s
user    0m0.073s
sys     0m0.018s
hakedev@alpha:~$ time curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=weather.com&type=A'
{"Status":0,"TC":false,"RD":true,"RA":true,"AD":false,"CD":false,"Question":[{"name":"weather.com","type":1}],"Answer":[{"name":"weather.com","type":1,"TTL":9,"data":"184.85.65.207"}]}
real    0m2.525s
user    0m0.082s
sys     0m0.004s
hakedev@alpha:~$ time curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=weather.com&type=A'
{"Status":0,"TC":false,"RD":true,"RA":true,"AD":false,"CD":false,"Question":[{"name":"weather.com","type":1}],"Answer":[{"name":"weather.com","type":1,"TTL":6,"data":"184.85.65.207"}]}
real    0m1.231s
user    0m0.106s
sys     0m0.008s
hakedev@alpha:~$ 
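
Since the timeouts show up on UDP queries while ICMP looks fine, another way to compare transports might be to send the same query over UDP and over TCP with dig itself. A rough sketch, assuming a dig that supports `+tcp`/`+notcp`; `query_ms` and `time_query` are hypothetical helper names. Note that dig's "Query time" seems to cover only the attempt that got an answer (the earlier outputs report 15 msec even after two timeouts), so an empty result here means the single attempt got no answer.

```shell
# Hypothetical helper: extract N from dig's ";; Query time: N msec" line.
query_ms() {
  sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p'
}

# Hypothetical helper: run one query over "udp" or "tcp" and print the
# query time in milliseconds (prints nothing if no answer arrived).
time_query() {
  transport=$1
  server=$2
  name=$3
  if [ "$transport" = "tcp" ]; then
    flag="+tcp"     # force TCP
  else
    flag="+notcp"   # plain UDP, dig's default
  fi
  dig "$flag" +tries=1 +time=5 "@$server" "$name" 2>&1 | query_ms
}
```

Running `time_query udp 8.8.8.8 weather.com` and `time_query tcp 8.8.8.8 weather.com` a few times each while the issue is happening would show whether only UDP is affected.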

I see from the previous conversation that you’re using pfSense. Can I see its general metrics while this is happening? Is the pfSense instance sharing resources with something (for example, when running in a VM)?

Also, ping uses very few bytes by default; try a bigger size:

ping -s 1024 1.1.1.1

It is suspicious that DNS resolution is slow, but ping to Google is not.

My pfsense is a dedicated box. Specs:
Intel(R) Pentium(R) CPU J3710 @ 1.60GHz
Current: 1600 MHz, Max: 1601 MHz
4 CPUs: 1 package(s) x 4 core(s)

These are the metrics while network is slow:
image

hakedev@alpha:~$ ping -s 1024 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 1024(1052) bytes of data.
1032 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=10.9 ms
1032 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=10.9 ms
1032 bytes from 1.1.1.1: icmp_seq=3 ttl=58 time=11.2 ms
1032 bytes from 1.1.1.1: icmp_seq=4 ttl=58 time=10.9 ms
1032 bytes from 1.1.1.1: icmp_seq=5 ttl=58 time=10.9 ms
1032 bytes from 1.1.1.1: icmp_seq=6 ttl=58 time=10.9 ms
1032 bytes from 1.1.1.1: icmp_seq=7 ttl=58 time=11.5 ms
^C
--- 1.1.1.1 ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6009ms
rtt min/avg/max/mdev = 10.864/11.021/11.525/0.229 ms

How high is bandwidth usage and what NICs are you using there?

I see the state table size looks healthy, and RAM and CPU usage are both reasonable.

The pfSense box is a Protectli; here are the specs:

  • THE VAULT (FW4C): Secure your network with a compact, fanless & silent firewall. Comes with US-based Support & 30-day money back guarantee!
  • CPU: Intel J3710 Celeron Quad Core / 4 Thread at 1.6 GHz (Burst to 2.6 GHz), Intel AES-NI hardware support
  • PORTS: 4 Intel 2.5 Gigabit Ethernet NIC ports, 2x USB 3.0, 2x HDMI, 1x RJ45 COM Port
  • COMPONENTS: 8GB RAM, 120GB SSD

Keep in mind this happened before I installed pfsense. My network started out as:
ISP Router (bridge mode) => Unifi Dream Machine => Unifi Switch => Servers
Now it’s:
ISP Router (bridge mode) => pfsense => Unifi Dream Machine => Unifi Switch => Servers

image