Is --farming-thread-pool-size limited to at most 32 threads? For some large EPYC processors, isn't that too few?

No idea to be honest. The last block is on the canonical chain and I see no errors. The only explanation that comes to mind is very bad Internet connectivity that prevents blocks from being downloaded in time, so it tries to sync, but fails.

You can try to start the node with RUST_LOG=info,sync=trace to collect A LOT of details if this persists after a restart, but you probably wouldn't want to keep that set all the time.

Is there any problem with the timekeepers?

I suspect that the 0bps issue may still be due to a problem with the step where the farm remotely connects to the node.

I see no issues with the timekeeper in that log, but I also don't know what CLI options you used to start it.

2024-03-10T07:03:37.399414Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3197259 to_next_slot=3197260

On the node running the Timekeeper, I haven't seen this type of error, so it's clear that starting the Timekeeper can indeed solve this problem. However, other nodes in the same data center are still experiencing this error. If, as you say, the Timekeeper's proofs propagate through the network, then logically other nodes in the same data center should not have this error. I've tried using the --reserved-nodes and --dsn-reserved-peers parameters to connect them, but it still doesn't work.

Logical network topology is not the same as geographical topology.

You don't need to touch the DSN, and --reserved-nodes does work the way it is supposed to. Note that you need to specify it on both ends.

Following your instructions, if both parties set up connections, it seems this issue could be resolved. However, is there a way to enable one-sided connections? I want to assist more Chinese users by letting them manually connect to my node, but it’s not feasible for me to manually add parameters to my node each time.

No, in the one-sided case there are zero guarantees.

I understand and appreciate that, but the way I see it we need to understand why it is happening in the first place and address that instead.

Could you run without the timekeeper and set the environment variable RUST_LOG=info,quinn_udp=error,sc_proof_of_time=trace?
This will print a bunch of information about how gossip works and might help us understand what is going on.

Also having SSH access to machines (VM is fine) in China would be a big help. We have one in Alibaba Cloud, but I have heard it is not necessarily representative of what regular users would experience networking-wise.

My node runs Timekeeper, and I have a 14900K machine that operates perfectly without any issues. I am now considering whether my Timekeeper can help other users in mainland China because many people have reported this error with their nodes.

Yes, Alibaba Cloud's network is among the better ones in China. Most household internet providers in mainland China do not provide a public IP; users sit behind the provider's LAN instead. Moreover, typical residential internet in mainland China is around 1000M downlink and 20M uplink.

As for the reason for the problem, I think it’s likely due to network issues causing a high delay in receiving challenges?

That is not a solution though, and it leads to centralization. I mean it can help you and a few others, but I'd rather fix it for everyone without extra configuration. Collecting relevant logs helps with that.

No idea, I’d need access to the environment to do some debugging to figure that out.

I can provide an Ubuntu server for you, do you need it?

Or you can also tell me how to run tests to investigate the reason for the following warning:

2024-03-11T05:15:53.595539Z WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3278297 to_next_slot=3278298

A physical server or even a VM would be great (ideally with port forwarding).
You can find my SSH public key here: https://github.com/nazar-pc.keys

I don't know how to use these, so I will send you the SSH information directly via private message.

I have sent you a private message. You can test it as you like. This is a server with a public IP.

Is there a way to test and detect the latency of receiving challenges? I want to check how long it takes for me to receive these challenges.

No, but if you set RUST_LOG=info,sc_proof_of_time=trace you should see messages about new proof of time checkpoints almost exactly every second.
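
If you want to roughly quantify that, something like the sketch below (just my assumption of how one could post-process the log, not anything shipped with the node) reads the log from stdin and prints the gap between consecutive checkpoint gossip messages. It matches the "Superficial verification succeeded" trace lines quoted later in this thread; the exact wording may differ between versions, and duplicate messages from different senders (same slot= value) would need extra filtering.

```rust
use std::io::{self, BufRead};

// Parse "2024-03-12T18:21:22.580703Z ..." into seconds since midnight (UTC).
// Deliberately naive: ignores the date, so a gap across midnight will be wrong.
fn seconds_of_day(line: &str) -> Option<f64> {
    let time = line.split('T').nth(1)?; // "18:21:22.580703Z ..."
    let mut parts = time.splitn(3, ':');
    let hours: f64 = parts.next()?.parse().ok()?;
    let minutes: f64 = parts.next()?.parse().ok()?;
    let seconds: f64 = parts.next()?.split('Z').next()?.parse().ok()?;
    Some(hours * 3600.0 + minutes * 60.0 + seconds)
}

fn main() {
    let stdin = io::stdin();
    let mut previous: Option<f64> = None;

    for line in stdin.lock().lines().map_while(Result::ok) {
        // Checkpoints arriving over gossip show up as these trace messages;
        // adjust the substring if your node version logs something different.
        if !line.contains("Superficial verification succeeded") {
            continue;
        }
        if let Some(now) = seconds_of_day(&line) {
            if let Some(prev) = previous {
                // In steady state this should be very close to 1.000s.
                println!("{:.3}s since previous checkpoint", now - prev);
            }
            previous = Some(now);
        }
    }
}
```

Pipe the node output (with that RUST_LOG filter enabled) into it; gaps consistently well above one second would point at delayed checkpoint delivery.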

Here are warning logs from the machine you have shared access with:

Proof of time chain was extended from block import
2024-03-12T19:18:42.970333Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3417177 to_next_slot=3417178
2024-03-12T23:03:53.841201Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3430883 to_next_slot=3430884
2024-03-12T23:14:28.649307Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3431527 to_next_slot=3431528
2024-03-12T23:44:24.767989Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3433349 to_next_slot=3433350
2024-03-12T23:54:49.690373Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3433983 to_next_slot=3433984
2024-03-13T00:34:31.930028Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3436383 to_next_slot=3436384
2024-03-13T00:49:24.102653Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3437288 to_next_slot=3437289
2024-03-13T01:51:29.358894Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3441067 to_next_slot=3441068
2024-03-13T02:51:59.943461Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3444750 to_next_slot=3444751
2024-03-13T03:01:51.410814Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3445350 to_next_slot=3445351
2024-03-13T03:07:26.560577Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3445690 to_next_slot=3445691
2024-03-13T03:43:37.177761Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3447892 to_next_slot=3447893
2024-03-13T03:55:43.723378Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3448629 to_next_slot=3448630
2024-03-13T04:08:55.304595Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3449432 to_next_slot=3449433
2024-03-13T04:29:14.733894Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3450669 to_next_slot=3450670
2024-03-13T05:30:54.331070Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3454422 to_next_slot=3454423

As can be seen, it is always a 1-slot difference, and it doesn't happen all the time, just from time to time.

The way the protocol works is the following (a rough sketch in code follows this list):

  • a slot arrives
  • the farmer does the audit and, in case there is a solution candidate, tries to generate a proof
  • with the solution ready to go, the node waits for a future slot (+4 from the slot for which the solution was generated)
  • once the future proof of time arrives, the block is created, signed and sent to the network
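
To make the ordering concrete, here is a minimal sketch with heavily simplified placeholder types; Farm, Node, audit_and_prove, wait_for_proof_of_time and the rest are made-up stand-ins, not the actual Subspace APIs:

```rust
// Hypothetical, heavily simplified sketch of the slot flow described above.
// All types and functions are placeholders, not real Subspace code.

const BLOCK_AUTHORING_DELAY: u64 = 4; // block is authored against slot + 4

struct Solution;
struct ProofOfTime;

struct Farm;
impl Farm {
    // Audit plots against the slot challenge and, if there is a solution
    // candidate, generate the proof; `None` means nothing to propose this slot.
    fn audit_and_prove(&self, _slot: u64) -> Option<Solution> {
        None
    }
}

struct Node;
impl Node {
    // Block until proof of time for `slot` is available, ideally via gossip
    // checkpoints rather than only inside an imported block.
    fn wait_for_proof_of_time(&self, _slot: u64) -> ProofOfTime {
        ProofOfTime
    }

    // Create the block from the solution and the future proof of time,
    // sign it and send it to the network.
    fn build_sign_and_propagate(&self, _solution: Solution, _pot: ProofOfTime) {}
}

fn on_slot(slot: u64, farm: &Farm, node: &Node) {
    // 1. Slot arrives: the farmer audits and, if there is a candidate, proves.
    if let Some(solution) = farm.audit_and_prove(slot) {
        // 2. With the solution ready, wait for the *future* proof of time,
        //    +4 slots from the slot the solution was generated for.
        let pot = node.wait_for_proof_of_time(slot + BLOCK_AUTHORING_DELAY);
        // 3. Once it arrives, the block is created, signed and propagated.
        node.build_sign_and_propagate(solution, pot);
    }
}

fn main() {
    on_slot(3_413_683, &Farm, &Node);
}
```

The important part is step 2: when the checkpoints for that future slot arrive via gossip before they show up inside an imported block, everything is on time, which is presumably why the warning fires only when block import gets there first.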

Now let’s see the logs when things work fine:

Good
2024-03-12T18:21:22.580703Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413688
2024-03-12T18:21:22.724102Z TRACE Consensus: sc_proof_of_time::source: Block import didn't result in proof of time chain changes best_slot=3413687
2024-03-12T18:21:22.724134Z  INFO Consensus: substrate: ✨ Imported #602616 (0xa5c9…23e7)
2024-03-12T18:21:23.245389Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413688

We can see that the future proof of time checkpoints for slot 3413688 were received just before the block that contains the future proof of time for slot 3413687. This is how things are supposed to work.

Here is an example where this wasn’t the case:

Less good
2024-03-12T18:21:38.363184Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413704
2024-03-12T18:21:38.765137Z  INFO Consensus: substrate: 💤 Idle (40 peers), best: #602619 (0x1a10…f52b), finalized #522672 (0xc9e8…1f33), ⬇ 42.6kiB/s ⬆ 94.6kiB/s
2024-03-12T18:21:39.038338Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413704
2024-03-12T18:21:39.038604Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3413704 to_next_slot=3413705
2024-03-12T18:21:39.038860Z  INFO Consensus: substrate: ✨ Imported #602620 (0xef01…625c)
2024-03-12T18:21:39.336517Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413705
2024-03-12T18:21:39.997988Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413705

Here the future proof of time in the block was received slightly before the checkpoints from gossip.

To be clear, this is not great, but it is not a catastrophe either. Basically it means that when this happens and there is a fork, it is likely that the block you've produced will not win the fork. However, if there are no forks or you produce a vote, there should be no difference.

So strictly speaking you are at a bit of a disadvantage when this happens, and this likely happens to different participants from time to time (I saw it in my logs a few times as well) due to how time-sensitive this stuff is, but it is not the end of the world. Running a fast timekeeper helps, but it is not a strict requirement.

Have you found the reason why the rewards are low in mainland China?

Not really. If there are proving issues then they need to be addressed separately. If it is related to forking and losing the fork race then, as mentioned, a timekeeper will help with that.

It would be helpful to quantify “low” against “normal”.

But not everyone is in a position to run a timekeeper.