Not requiring justifications in blocks can lead to trivial DoS attacks

Right now we use what Substrate calls “justifications” to store PoT checkpoints. Justifications are pieces of data stored alongside the block in the blockchain database and typically used for finality in PoStake blockchains.

In our implementation they are currently optional and designed to help with verification. Technically, there is nothing wrong with them not being included, we can still run proving locally and verify that proofs included in block header are correct.

There is a problem though that this proving in pessimistic case takes time and there is no way to know whether proofs are correct or not before we have proven the whole range from parent block’s future proof of time to current block’s future proof of time.

How to exploit this: simply produce a block that is otherwise valid, but is produced at (and includes invalid proof of) slot that is way into the future, let’s say 10_000 blocks into the future. Unless checkpoints are present, verifier would have to then run proving for 10_000 seconds+ just to find out that block is invalid.

What is worse, blocks are imported sequentially and even pre-import verification is not completely independent, meaning this will prevent the node from importing other, valid blocks, but this is an implementation detail of Substrate that can be, potentially, solved by improving Substrate itself.

With that in mind I see no other way than making justifications with checkpoints mandatory, then we can stop verification at the very first invalid checkpoint and previously verified and valid checkpoints will still be mostly helpful as we’ll likely need them for valid blocks anyway.

Additonally if we make them required for newly produced blocks and non-DSN Substrate sycn, we should probably include them in the archival history of the blockchain as well, since at block verification/import level there is no information where block comes from. While it is possible to create a piece of shared state between DSN sync and block verification/import it breaks too many abstractions IMHO. Also having justifications there will accelerate verification of archival history (even though we already do optimistic verification there) and will not occupy too much space comparing to the rest of the block data.

The number of checkpoints is capped, isn’t it? I think it was 16 or 32, so that blocks won’t get too big. In this case you would still have to run for 10_000/32 timeslots.

To me this is the solution. Since the previous block isn’t verified yet, if I get a new block at a future height I should try and verify it. This means that honest blocks will be verified quicker than these far-fetched blocks, and generally since we will have more (honest) blocks until 10_000 timeslots ahead, this block will not be in the heaviest chain.

Still, some related issues:

  1. Until we get an honest block we may spend resources on verifying this, and potentially more of the kind, blocks. Conceptually it is not different than other (non PoT) invalid blocks.
  2. When we’ll have (automatic) difficulty adjustments for proof of time, this kind of issue may arise naturally (obviously, with less than 10_000 slots), so we may actually need to verify a few blocks in parallel in order to determine the heaviest chain.

I think you’re referring to much older version of design. Neither it is capped right now nor would you be able to verify fewer time slots than there is between two specific blocks.

Previous block is verified, we’re building on top of the current tip, just for timeslot that is way in the future.

Except verification without PoT takes ~1s with fully loaded (computation-wise) block, while with PoT verification can take even years, depending on where in the future proposed time slot is.

I’m not sure how this is related to be honest.

Hi @Barak: I remember we discussed similar attacks before. Basically, a farmer mainly relies on PoT outputs from timekeepers to conduct verification. Also, a farmer has a rough sense of the “current” time slot. For example, we have a notion called “not too far in the future” in our spec.

So, in the pessimistic case, if a block “too far in the future” arrives, it will be cached and verified later. (This is similar to the case that in Bitcoin a future block arrives before its parent block.) This might be an alternative solution. Let me think more to see whether there are security concerns there.

Just to clarify that the block with proof of time too far in the future is built on top of the parent block.

It is true that nodes have a sense of the current time, but we cannot simply disregard the possibility of someone with a much faster proof-of-time algorithm. Even if we take this approach, we need to define what “later” means, and what happens if in the meantime a chain is formed on top of this block.

An attack of a similar flavour was described in Forks in time, the optimistic verification case. The difference is that there the alleged fork was heavier than the current chain (and there was no specific description of how far in the future the blocks’ PoT are). Here there is one block very far in the future, which means the the chain it may form should really be not heavy compared to the potential chain that could be built on the current tip with the “current” time slots.

We may be able to define some process that takes these notion of (potential) heaviness into account, so we don’t even start verifying the block(s) until “later”, but it requires some thought.

I do not think this is the current spec, we throw away and reduce reputation of peers that are sending stuff that are too far in the future on PoT side:

The assumption or coincidence in this case is that we should see a new block produced in not so distant future, which will make us caught up with PoT chain and newly received proofs of time will stop being “too far in the future”.

Overall I think you both have a bit outdated understanding of the spec, we had to tweak a few things with @dariolina as implementation progressed and we found more edge cases.

I can think of 3 ways to solve this issue, two are already presented above:

  1. Blocks must include checkpoints. We can relax this a bit, and require checkpoints every n timeslots, for some reasonable n. The main issue with this approach is that blocks may be very large if there are too many checkpoints. From above it sounds that we don’t have any cap on the number of checkpoints per block, so a farmer could include PoT’s for all the timeslots since the previous block for no good reason – so this issue is a bit more general.

  2. We enable verification of more than one block at a time, if necessary. In this case when we get an honest block it will be verified and the far dishonest block can be discarded. Here we can either start verifying the dishonest block immediately or wait for later, but “later” should be well defined, as well as the entire process around it. An issue with this approach is that we can’t really punish the dishonest farmer, as for this we will have to verify the block first.

  3. We introduce ‘empty’ blocks with proof of time. The idea is that for some n if there is no block after n timeslots from the previous block, the chain will have a checkpoint block that only serves for proof of time. Each farmer could build the block locally, given they have the n-th PoT. So basically we take approach 1 above but modify it a bit to smooth the verification process (a dishonest farmer may still reveal the entire chain – empty blocks + 1 PoS block – all together, but nodes don’t have to download the entire thing at once).
    This is just a suggestion and will require further study before being adopted. It is not immediate how we make sure we have consensus about this block or what kind of vulnerabilities it may have.

We can make logic more complicated, but I’m not sure it is worth it. In fact as mentioned blocks already include just current and future output, the rest is in justifications, which is not a part of the block, so I thought it is less of an issue.

We can still do it if really necessary, but considering a set of checkpoints for slot is 128 bytes and for typical 6 slots between blocks 768 bytes isn’t that bad. This is only really an issue when we have too many slots without any blocks in them. I do agree it is potentially an issue if the chain is running without any blocks for unbounded amount of time, but then the question is how likely that is and there is still a way to get it un-stuck by resetting time slots to the last block and producing a new block at a reasonable distance from it.

The way it is implemented they not only can, they have to if checkpoints are present at all. There is a check for contents and if it doesn’t match expectations it is considered invalid. And there is in fact a good reason: MUCH faster verification, ~7x faster than proving on a single CPU core.

What you describe is theoretically possible, but has a lot of edge-cases implementation-wise. We may get “disonest” block a few ms earlier and then we do have to verify it first because we know nothing else. Blocks are actually already verified in parallel, but there are different levels to it. Currently only blocks in the same batch are verified in parallel, all other blocks would have to wait for batch to be processed first.

I would definitely vote against this, there are way too many places that expect well formed block with correct contents and this will require everything to be updated to account for these “special” blocks.

We did have a notion of rejecting PoT that is “too far in the future”, but we never defined what is considered too far. Currently, such condition survived only in verifying PoT gossip: if received gossiped proof is more than 10 slots ahead of the latest, it is ignored (10 is a guess).
To prevent the issue Nazar raises, I think, we should have such a heuristic condition for newly produced blocks as well.
In an honest case, when there are many slots without a block, the real-time passed between two blocks should roughly be equal to an expected number of slots in between. It may happen and when such a block is later imported (e.g. during major sync) it will take a long time to verify and that is unavoidable.
In a DoS attempt, from the point of view of an honest node its local best block has timestamp t claimed on slot n. If it receives a block claiming slot n'=n+10000 it should expect the timestamp to be t'\approx t+10000. Wouldn’t Substrate already handle the case where the timestamp is too far in the future?
Even if we can’t count on timestamp, we can judge by the time t_{obs} the new block was observed. We can take the difference between t_{diff}=t_{obs} - t and n_{diff}=n'-n and see whether it is n_{diff}/t_{diff}\approx 1 and is it plausible that this many iterations were done in such a time frame n_{diff}/t_{diff}*slot\_iterations iterations per second if the number is off, an honest node can deem the new block invalid.
The margin by which this number can be off should be defined based on expected AES speedup and network delay.
This doesn’t prevent the issue completely, but bounds the verification cost.

I really dislike such heuristics and would very much prefer to not have it if possible.

Approximately, but the further we go the wider the range of things we accept should be and it still doesn’t address the fact that we will have to verify (potentially prove) 10000 slots before determining if block is valid at all. It is an edge-case, but we better handle it in some meaningful way.

Timestamp extrinsic in pallet-timestamp - yes, but slots are now completely decoupled from that.

Number could be off because someone deployed PoT ASIC onto the network, I don’t think we should rely on such things.

So far I think inclusion of PoT checkpoints, but with ability to skip some (but not more than some number that determines the cost of wasted compute) somewhat like Barak suggested is probably the way to go. I don’t like how many edge-cases we hit when designing something around this otherwise.

We did have Barak’s suggestion in design docs previously, however, we can make the number lower, i.e. include an output every 6-8 slots.
The tradeoff here is between keeping header PoT as small as possible vs keeping verification time smaller. If we consider keeping verification time smaller a bigger priority we could include every slot output. This way slots can be verified with instruction parallelism because you now have both seed and key for all slots.

We do not want to include any of this in block header, just in justifications that are not even part of the block as such. And ideally we’d include absolutely all checkpoints since that makes the verification most efficient.

The only challenge is when it gets too large, then we should include a subset of them, so we basically need to define a rule when and how many of them we are going to prune, kind of similar to how we do optimistic verification.

My bad, I thought the issue was that justifications can be omitted. But if you can make them mandatory that would be great.

Define too large? If a full set of checkpoints per slot is 128 bytes then we can fit 8192 of those in 1 MiB of justifications which is more than 2 hours. Feels like a nice to have optimization to prune for such rare cases.

Yes, we can make them mandatory if we want to.

We probably don’t want 1M overhead per block though. I was thinking to define a size limit and make checkpoints fit into it. If our limit is 1M - that works for me as well.

As discussed yesterday with @Barak, if checkpoints justifications are mandatory, but we impose a size limit on justifications and adhere to it by pruning checkpoints it doesn’t solve the initial issue Nazar raised. Because an attacker can always claim a slot that is much further away, and include fewer checkpoints and thus make verify more.
Is it really a concern if it gets too large? If we include all checkpoints we can fit 2.5 hrs with no blocks in 1MB, 5hrs in 2MB and so on. This is supposed to be extremely rare in honest cases and prevents your attack.

It will be 768 b in a normal 6s block, but unbounded if blocks get too rare.

1 Like

I’d say we don’t expect downtime that long and even if it happens, there is always a possibility to reset timekeeper to start with last block and produce somethign that isn’t too far in the future. So overall I’m fine with including all checkpoints and just assume that the size of the justifications is our effective limit for how far block can be in the future.

We have a consensus that justifications should include all checkpoints and be mandatory. On the question of potentially archiving justifications to speed-up deep sync, I recall they were not canonical, which was an issue for archiving?
Second, would archived checkpoints really help with deep sync if we verify blocks in parallel and not checkpoints?

1 Like
  1. They are generally not canonical, but we can filter-out and only archive checkpoints, which are in fact canonical
  2. It will help a bit since we’ll be able to verify corresponding proofs even faster than we would otherwise, though with probabilistic verification the impact will not be as dramatic

Certainly, archiving checkpoints would not hurt. It can allow us to make probabilistic verification more lightweight, where we verify a random checkpoint in a block rather than all slots (which was @Barak 's initial suggestion when we designed major sync).
Such verification may even be feasible for light clients.
And generally having a way to fully reconstruct PoT chain sounds potentially useful.

1 Like

Implementation: Mandatory PoT checkpoints in justifications by nazar-pc · Pull Request #2077 · subspace/subspace · GitHub

1 Like