Faster way to sync non-archival nodes

As the blockchain grows, the sync process takes longer and longer, which is a big user experience problem for farmers.

Substrate has fast and warp sync, but they are not functional in Subspace for various reasons. Still, I think we can do better than we do now.

In Subspace we have sync from DSN, where we take advantage of the archival history collectively persisted by farmers, as well as regular Substrate sync for blocks that are not archived yet (it also sometimes acts as a fallback).

What we could do in DSN sync is download only the last few segments with blocks and, instead of importing them normally (expecting the parent block and state to already exist, as they would on an archival node), download the state of the first imported block from one of the nodes on the network and continue from there.

This way we skip both downloading and importing of the majority of the blockchain history to get farmers up to speed quickly and efficiently.
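
The flow described above can be sketched roughly as follows. This is only an illustration, not the actual Subspace code: `Block`, `plan_fast_sync` and `segments_to_keep` are hypothetical names, and it assumes for simplicity that every segment contains whole blocks (no block spanning a segment boundary).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    state_root: str  # placeholder for the state root stored in the header


def plan_fast_sync(segments, segments_to_keep=2):
    """Split archived history into skipped blocks and blocks to import.

    `segments` is a list of lists of Block, oldest segment first.
    Returns (skipped_block_count, anchor_block, blocks_to_import):
    the state of `anchor_block` is downloaded from a peer instead of being
    recomputed, and `blocks_to_import` are executed on top of that state.
    """
    if segments_to_keep < 1:
        raise ValueError("at least one segment must be kept")
    if not segments:
        raise ValueError("no archived segments available")

    kept = segments[-segments_to_keep:]
    skipped = sum(len(segment) for segment in segments[:-segments_to_keep])

    blocks = [block for segment in kept for block in segment]
    if not blocks:
        raise ValueError("kept segments contain no blocks")

    # The first block of the kept range is the anchor; everything after it
    # is imported normally once the anchor state is available.
    return skipped, blocks[0], blocks[1:]
```

With four segments of three blocks each and `segments_to_keep=2`, the first six blocks are never downloaded or imported, block 6 becomes the anchor whose state is fetched from the network, and blocks 7 to 11 are imported on top of it.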

Archival nodes will still need to go through the same process as they do now though. We could also extend it later with Substrate-like warp sync that will download and import older blocks in the background, but that will be a much lower priority relatively speaking.


The question here is about the security implications of such an implementation and whether it is acceptable. I think implementation-wise it is actually not that hard to do if this is considered to be secure enough.

Short answer: Yes, it works, but we should pay attention to several details.

Long answer: From the security perspective, we need to ensure the following:

  • If the archived history is unique, we are all good. Otherwise, a new node needs to download block headers to determine which archived history is compatible with the longest/heaviest chain.

  • After downloading “the state of the first imported block”, a new node should check the corresponding state root.
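
The second check can be illustrated with a small sketch. Note the assumption: Substrate computes the state root over a Merkle-Patricia trie, while `toy_state_root` below is a hypothetical stand-in that simply hashes the sorted key/value pairs. Only the shape of the check (recompute the root from the downloaded state, compare with the root in the header) is the point here.

```python
import hashlib


def toy_state_root(state):
    """Toy commitment to a key/value state (NOT the real trie root)."""
    hasher = hashlib.blake2b(digest_size=32)
    for key in sorted(state):
        value = state[key]
        # Length-prefix keys and values so that different splits of the
        # same bytes cannot produce the same digest.
        hasher.update(len(key).to_bytes(4, "little"))
        hasher.update(key)
        hasher.update(len(value).to_bytes(4, "little"))
        hasher.update(value)
    return hasher.digest()


def verify_downloaded_state(state, expected_root):
    """Accept the downloaded state only if it matches the header's root."""
    return toy_state_root(state) == expected_root
```

A node would take `expected_root` from the header of the first imported block and reject any peer whose state does not reproduce it, then retry with another peer.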

More discussion: Consider an ideal case where each full node maintains a chain of block headers, with each header containing a state root. Then we can define a secure node-sync problem. One solution is the following. First, a new node contacts several existing full nodes to download block headers. As long as one of them is honest, the new node can obtain the longest/heaviest chain of headers in the local view of that honest full node. Second, the new node downloads the state of a block buried deep enough (with respect to the longest/heaviest chain) and then checks the state against the state root. This ensures the integrity of the state. It is easy to see that this solution is as good as the standard solution where a new node downloads the entire chain of blocks. To sum up, we are all good as long as our new implementation “simulates” the above solution.
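
A minimal sketch of that idealized solution is below. Everything in it is hypothetical (`Header`, `weight`, `pick_heaviest`, `anchor_header`), and it only checks that headers link to each other. It does not verify consensus validity of each header, which a real node would have to do (or replace with some other check), since otherwise a peer could fabricate a heavier chain.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: bytes
    state_root: bytes
    weight: int  # hypothetical per-block weight used for fork choice

    def hash(self):
        data = (self.number.to_bytes(8, "little") + self.parent_hash
                + self.state_root + self.weight.to_bytes(8, "little"))
        return hashlib.blake2b(data, digest_size=32).digest()


def build_chain(weights):
    """Helper for the example: build a correctly linked chain of headers."""
    chain, parent = [], bytes(32)
    for number, weight in enumerate(weights):
        root = hashlib.blake2b(f"state-{number}".encode(), digest_size=32).digest()
        header = Header(number, parent, root, weight)
        chain.append(header)
        parent = header.hash()
    return chain


def is_linked(chain):
    """Check that every header points at the hash of its predecessor."""
    for prev, cur in zip(chain, chain[1:]):
        if cur.parent_hash != prev.hash() or cur.number != prev.number + 1:
            return False
    return True


def pick_heaviest(chains):
    """Among header chains offered by peers, keep linked ones, take the heaviest."""
    valid = [chain for chain in chains if chain and is_linked(chain)]
    if not valid:
        raise ValueError("no peer provided a valid header chain")
    return max(valid, key=lambda chain: sum(h.weight for h in chain))


def anchor_header(chain, depth):
    """Return the header buried `depth` blocks below the tip."""
    if depth < 0 or depth >= len(chain):
        raise ValueError("chain is too short for the requested depth")
    return chain[-1 - depth]
```

The state downloaded for the block returned by `anchor_header` would then be checked against its `state_root`, and sync would continue from there.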

@Chen_Feng_2023 is this something where MMR can come in handy as well?

Unique and longest are orthogonal properties from my point of view. Check the spec on how we do DSN sync, we don’t look at longest chain there actually.

Naturally, wouldn’t be done any other way.

As mentioned above, we are not downloading block headers in this case.

I don’t see how it does unless we verify all the block headers and current implementation is not able to do that without access to the runtime/state.

Well, that is not what I suggested though, what you’re suggesting doesn’t compress resources quite the way I suggested it.

Related, may be of interest: Mina protocol (https://www.kraken.com/learn/what-is-mina-protocol)

They are based on very different cryptographic primitives (zk-SNARK-based recursive proofs), but it is worth checking for the ideas themselves.

To @nazar-pc: We discussed this in detail during our R&D meeting. Please go ahead with the implementation, and we will then tell you what additional checks we need to do. (These additional checks can be made orthogonal to your implementation.)

Unique and longest are orthogonal properties from my point of view. Check the spec on how we do DSN sync, we don’t look at longest chain there actually.

Yes, they are orthogonal. We just check the uniqueness for our purpose and we don’t look at the longest chain.

I don’t see how it does unless we verify all the block headers and current implementation is not able to do that without access to the runtime/state.

Verifying block headers is not the only way. Another way is described by Dariia here.

I don’t have time to work on this right now; I just wanted to start the conversation and collect feedback. Hopefully this will be one of the nice upgrades to Gemini 3h.