Is --farming-thread-pool-size limited to at most 32 threads? For some large EPYC processors, isn't that too few?

No idea to be honest. The last block is on the canonical chain and I see no errors. The only explanation that comes to mind is very bad Internet connectivity that prevents blocks from being downloaded in time, so it tries to sync, but fails.

You can try to start the node with RUST_LOG=info,sync=trace to collect A LOT of details if this persists after a restart, but you probably wouldn't want to keep that set all the time.

Is there any problem with the timekeepers?

I suspect that the 0bps issue may still be due to a problem with the step where the farm remotely connects to the node.

I see no issues with the timekeeper in that log, but I also don't know what CLI options you used to start it.

2024-03-10T07:03:37.399414Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3197259 to_next_slot=3197260

On the node running the Timekeeper, I haven't seen this type of error, so it's clear that starting the Timekeeper can indeed solve this problem. However, other nodes in the same data center are still experiencing this error. If, as you say, the Timekeeper's proofs propagate through the network, then logically other nodes in the same data center should not have this error. I've tried using the --reserved-nodes and --dsn-reserved-peers parameters to connect them, but it still doesn't work.

Logical network topology is not the same as geographical topology.

You don't need to touch the DSN, and --reserved-nodes does work the way it is supposed to. Note that you need to specify it on both ends.

Following your instructions, if both parties set up connections, it seems this issue could be resolved. However, is there a way to enable one-sided connections? I want to assist more Chinese users by letting them manually connect to my node, but it’s not feasible for me to manually add parameters to my node each time.

No, in the one-sided case there are zero guarantees.

I understand and appreciate that, but the way I see it we need to understand why it is happening in the first place and address that instead.

Could you run without the timekeeper and set the environment variable RUST_LOG=info,quinn_udp=error,sc_proof_of_time=trace?
This will print a bunch of information about how gossip works and might help us understand what is going on.

Also having SSH access to machines (VM is fine) in China would be a big help. We have one in Alibaba Cloud, but I have heard it is not necessarily representative of what regular users would experience networking-wise.

My node runs Timekeeper, and I have a 14900K machine that operates perfectly without any issues. I am now considering whether my Timekeeper can help other users in mainland China because many people have reported this error with their nodes.

Yes, Alibaba Cloud's network is among the better ones in China. Most household internet providers in mainland China do not provide a public IP; users sit behind the provider's LAN instead. Moreover, typical residential internet in mainland China is around 1000M downlink and 20M uplink.

As for the reason for the problem, I think it’s likely due to network issues causing a high delay in receiving challenges?

That is not a solution though, and it leads to centralization. I mean it can help you and a few others, but I'd rather fix it for everyone without extra configuration. Collecting relevant logs helps with that.

No idea, I’d need access to the environment to do some debugging to figure that out.

I can provide an Ubuntu server for you, do you need it?

Or you can also tell me how to run tests to investigate the reason for the following warning:

2024-03-11T05:15:53.595539Z WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3278297 to_next_slot=3278298

A physical server or even a VM would be great (ideally with port forwarding).
You can find my SSH public key here: https://github.com/nazar-pc.keys

I don't know how to use these, so I will send you the SSH information directly via private message.

I have sent you a private message. You can test it as you like. This is a server with a public IP.

Is there a way to test and detect the latency of receiving challenges? I want to check how long it takes for me to receive these challenges.

No, but if you set RUST_LOG=info,sc_proof_of_time=trace you should see messages about new proof of time checkpoints almost exactly every second.
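
If you want to roughly quantify that, something like the sketch below (just my assumption of how one could post-process the log, not anything shipped with the node) reads the log from stdin and prints the gap between consecutive checkpoint gossip messages. It matches the "Superficial verification succeeded" trace lines quoted later in this thread; the exact wording may differ between versions, and duplicate messages from different senders (same slot= value) would need extra filtering.

```rust
use std::io::{self, BufRead};

// Parse "2024-03-12T18:21:22.580703Z ..." into seconds since midnight (UTC).
// Deliberately naive: ignores the date, so a gap across midnight will be wrong.
fn seconds_of_day(line: &str) -> Option<f64> {
    let time = line.split('T').nth(1)?; // "18:21:22.580703Z ..."
    let mut parts = time.splitn(3, ':');
    let hours: f64 = parts.next()?.parse().ok()?;
    let minutes: f64 = parts.next()?.parse().ok()?;
    let seconds: f64 = parts.next()?.split('Z').next()?.parse().ok()?;
    Some(hours * 3600.0 + minutes * 60.0 + seconds)
}

fn main() {
    let stdin = io::stdin();
    let mut previous: Option<f64> = None;

    for line in stdin.lock().lines().map_while(Result::ok) {
        // Checkpoints arriving over gossip show up as these trace messages;
        // adjust the substring if your node version logs something different.
        if !line.contains("Superficial verification succeeded") {
            continue;
        }
        if let Some(now) = seconds_of_day(&line) {
            if let Some(prev) = previous {
                // In steady state this should be very close to 1.000s.
                println!("{:.3}s since previous checkpoint", now - prev);
            }
            previous = Some(now);
        }
    }
}
```

Pipe the node output (with that RUST_LOG filter enabled) into it; gaps consistently well above one second would point at delayed checkpoint delivery.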

Here are warning logs from the machine you have shared access with:

Proof of time chain was extended from block import
2024-03-12T19:18:42.970333Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3417177 to_next_slot=3417178
2024-03-12T23:03:53.841201Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3430883 to_next_slot=3430884
2024-03-12T23:14:28.649307Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3431527 to_next_slot=3431528
2024-03-12T23:44:24.767989Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3433349 to_next_slot=3433350
2024-03-12T23:54:49.690373Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3433983 to_next_slot=3433984
2024-03-13T00:34:31.930028Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3436383 to_next_slot=3436384
2024-03-13T00:49:24.102653Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3437288 to_next_slot=3437289
2024-03-13T01:51:29.358894Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3441067 to_next_slot=3441068
2024-03-13T02:51:59.943461Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3444750 to_next_slot=3444751
2024-03-13T03:01:51.410814Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3445350 to_next_slot=3445351
2024-03-13T03:07:26.560577Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3445690 to_next_slot=3445691
2024-03-13T03:43:37.177761Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3447892 to_next_slot=3447893
2024-03-13T03:55:43.723378Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3448629 to_next_slot=3448630
2024-03-13T04:08:55.304595Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3449432 to_next_slot=3449433
2024-03-13T04:29:14.733894Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3450669 to_next_slot=3450670
2024-03-13T05:30:54.331070Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3454422 to_next_slot=3454423

As can be seen, it is always a 1-slot difference, and it doesn't happen all the time, just from time to time.

The way the protocol works is the following (a rough sketch in code follows this list):

  • a slot arrives
  • the farmer does the audit and, in case there is a solution candidate, tries to generate a proof
  • with the solution ready to go, the node waits for a future slot (+4 from the slot for which the solution was generated)
  • once the future proof of time arrives, the block is created, signed and sent to the network
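
To make the ordering concrete, here is a minimal sketch with heavily simplified placeholder types; Farm, Node, audit_and_prove, wait_for_proof_of_time and the rest are made-up stand-ins, not the actual Subspace APIs:

```rust
// Hypothetical, heavily simplified sketch of the slot flow described above.
// All types and functions are placeholders, not real Subspace code.

const BLOCK_AUTHORING_DELAY: u64 = 4; // block is authored against slot + 4

struct Solution;
struct ProofOfTime;

struct Farm;
impl Farm {
    // Audit plots against the slot challenge and, if there is a solution
    // candidate, generate the proof; `None` means nothing to propose this slot.
    fn audit_and_prove(&self, _slot: u64) -> Option<Solution> {
        None
    }
}

struct Node;
impl Node {
    // Block until proof of time for `slot` is available, ideally via gossip
    // checkpoints rather than only inside an imported block.
    fn wait_for_proof_of_time(&self, _slot: u64) -> ProofOfTime {
        ProofOfTime
    }

    // Create the block from the solution and the future proof of time,
    // sign it and send it to the network.
    fn build_sign_and_propagate(&self, _solution: Solution, _pot: ProofOfTime) {}
}

fn on_slot(slot: u64, farm: &Farm, node: &Node) {
    // 1. Slot arrives: the farmer audits and, if there is a candidate, proves.
    if let Some(solution) = farm.audit_and_prove(slot) {
        // 2. With the solution ready, wait for the *future* proof of time,
        //    +4 slots from the slot the solution was generated for.
        let pot = node.wait_for_proof_of_time(slot + BLOCK_AUTHORING_DELAY);
        // 3. Once it arrives, the block is created, signed and propagated.
        node.build_sign_and_propagate(solution, pot);
    }
}

fn main() {
    on_slot(3_413_683, &Farm, &Node);
}
```

The important part is step 2: when the checkpoints for that future slot arrive via gossip before they show up inside an imported block, everything is on time, which is presumably why the warning fires only when block import gets there first.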

Now let’s see the logs when things work fine:

Good
2024-03-12T18:21:22.580703Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413688
2024-03-12T18:21:22.724102Z TRACE Consensus: sc_proof_of_time::source: Block import didn't result in proof of time chain changes best_slot=3413687
2024-03-12T18:21:22.724134Z  INFO Consensus: substrate: ✨ Imported #602616 (0xa5c9…23e7)
2024-03-12T18:21:23.245389Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413688

We can see that the future proof of time checkpoints for slot 3413688 were received just before the block that contains the future proof of time for slot 3413687. This is how things are supposed to work.

Here is an example where this wasn’t the case:

Less good
2024-03-12T18:21:38.363184Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413704
2024-03-12T18:21:38.765137Z  INFO Consensus: substrate: 💤 Idle (40 peers), best: #602619 (0x1a10…f52b), finalized #522672 (0xc9e8…1f33), ⬇ 42.6kiB/s ⬆ 94.6kiB/s
2024-03-12T18:21:39.038338Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413704
2024-03-12T18:21:39.038604Z  WARN Consensus: sc_proof_of_time::source: Proof of time chain was extended from block import from_next_slot=3413704 to_next_slot=3413705
2024-03-12T18:21:39.038860Z  INFO Consensus: substrate: ✨ Imported #602620 (0xef01…625c)
2024-03-12T18:21:39.336517Z TRACE Consensus: sc_proof_of_time::source::gossip: Superficial verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413705
2024-03-12T18:21:39.997988Z DEBUG Consensus: sc_proof_of_time::source::gossip: Full verification succeeded sender=12D3KooWMssTp9dMeTqm5TKE9Hjawv5kYoZ3u2YZjvuH5T9MbKZH slot=3413705

Here the future proof of time in the block was received slightly before the checkpoints from gossip.

To be clear, this is not great, but it is not a catastrophe either. Basically it means that when this happens and there is a fork, it is likely that the block you've produced will not win the fork. However, if there are no forks or you produce a vote, there should be no difference.

So strictly speaking you are at a bit of a disadvantage when this happens, and this likely happens to different participants from time to time (I saw it in my logs a few times as well) due to how time-sensitive this stuff is, but it is not the end of the world. Running a fast timekeeper helps, but it is not a strict requirement.

Have you found the reason why the rewards are low in mainland China?

Not really. If there are proving issues then they need to be addressed separately. If it is related to forking and losing the fork race then, as mentioned, a timekeeper will help with that.

It would be helpful to quantify “low” against “normal”.

But not everyone is in a position to run a timekeeper.