Redefining s-buckets

This post continues the topic from a previous post titled Slight change to sector contents.
Since it may have security implications, I would rather have a separate discussion.

The main idea below is to redefine ‘s-buckets’ so that each s-bucket is a complete encoded record, rather than a collection of chunks at a specific index taken from all of the sector’s records.

The motivation to reconsider s-buckets comes from the current development efforts. As raised by the engineering team, the current proving time seems rather slow, partly because different s-buckets can have different sizes (see more in the previous post).

For readers interested in relevant background, see the Deep Dive part in the Dilithium: Subspace Consensus v2 blog post.

The current protocol
We recall how farmers plot:
Consider a sector as a table whose rows are encoded records. Each column i contains the i-th encoded chunk of each record, if such an encoding exists; that is, the table may have empty entries.
Then, we call the collection of columns “s-buckets”, each labeled by the corresponding index i.

When farming, an index j is chosen at random, and the farmer ranges over all encoded chunks in the j-th s-bucket, checking for a “winning ticket”.
Note that a single challenge gives roughly 819 (=1300*0.63) “lottery tickets” per sector, since each record contributes a chunk to a given s-bucket with probability about 1-1/e ~ 0.63.
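For intuition, here is a rough sketch of the current column-oriented layout and the audit flow. The names (SBucket, is_winning_ticket) and constants are illustrative assumptions, not the actual farmer implementation:

```rust
// Hypothetical sketch of the current, column-oriented s-buckets.
// s-bucket i holds the i-th encoded chunk of every record for which that
// chunk was successfully encoded (~63% of the 1300 records on average),
// so different s-buckets end up with different sizes.
struct SBucket {
    chunks: Vec<(u16, [u8; 32])>, // (record index within the sector, encoded chunk)
}

// Placeholder for the real solution-range check.
fn is_winning_ticket(_chunk: &[u8; 32], _challenge: &[u8; 32]) -> bool {
    false
}

// Auditing: one s-bucket index j is derived from the challenge, and the
// farmer scans every chunk in that bucket looking for a "winning ticket".
fn audit(s_buckets: &[SBucket], j: usize, challenge: &[u8; 32]) -> Vec<u16> {
    s_buckets[j]
        .chunks
        .iter()
        .filter(|(_, chunk)| is_winning_ticket(chunk, challenge))
        .map(|(record_index, _)| *record_index)
        .collect()
}

fn main() {
    // 2^16 empty buckets (the bucket count is an assumption), just to exercise the audit.
    let s_buckets: Vec<SBucket> = (0..(1usize << 16))
        .map(|_| SBucket { chunks: Vec::new() })
        .collect();
    let winners = audit(&s_buckets, 42, &[0u8; 32]);
    println!("winning records in this bucket: {:?}", winners);
}
```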

Security of the current protocol
When we analyze the security of the protocol against a farmer that tries to create plots on demand (and thereby amplify the amount of storage they appear to allocate to the network), we want to show that the cost of on-demand plotting – essentially the cost of creating PoS tables – is significantly higher than the cost of storing, thus making on-demand plotting undesirable.

Note that the cost of storing a sector of 1300 encoded records is 1300 times the cost of storing a single encoded record (ignoring global metadata per sector). Since a single (on-demand) PoS table creation results in a single encoded record, we may compare its cost to the cost of storing a single record. In other words, we can compare a whole on-demand sector (1300 PoS table invocations) to a whole stored sector (1300 stored records), each giving 819 tickets, or compare a single PoS table generation to a single stored record, each giving, on average, 0.63 of a ticket – per ticket it’s the same.

Therefore, in the current design, (honestly) storing 1 record gives 0.63 lottery ticket, while a single on-demand generation of a PoS table gives 0.63 lottery ticket. The proportion of lottery tickets in honestly storing a record vs. a PoS table generation is 1.
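Spelling the arithmetic out (nothing below is protocol code, just the numbers from above):

```rust
// Back-of-the-envelope check of the ticket-per-unit-of-work comparison above.
fn main() {
    let records_per_sector = 1300.0_f64;
    let encoding_success = 0.63; // ~ 1 - 1/e: fraction of records with a chunk in a given s-bucket

    // Honest farmer: stores 1300 records and gets ~819 tickets per challenge.
    let honest_tickets = records_per_sector * encoding_success; // ~819
    let honest_tickets_per_record_stored = honest_tickets / records_per_sector; // 0.63

    // On-demand attacker: one PoS table generation produces one encoded record,
    // i.e. ~0.63 of a ticket for the audited s-bucket.
    let on_demand_tickets_per_table = encoding_success; // 0.63

    println!("honest: {:.2} tickets per stored record", honest_tickets_per_record_stored);
    println!("on-demand: {:.2} tickets per PoS table generation", on_demand_tickets_per_table);
    println!("ratio: {:.2}", honest_tickets_per_record_stored / on_demand_tickets_per_table); // 1.00
}
```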

New proposal for s-buckets
Now let’s consider the case where an s-bucket is defined to contain a full encoded record. This proposal does not change the rule that one s-bucket is chosen at random per challenge; only now an s-bucket corresponds to a row of the table rather than a column. Hence, a single challenge gives roughly 2^15 lottery tickets (one per chunk of the chosen record).

First, it should resolve the issue above. When a proof is found, the farmer no longer needs to range over all the s-buckets to collect the chunks of the winning record. Moreover, they can “jump” directly to the s-bucket’s location, as all s-buckets are equally sized (some entries may need to contain raw chunks where no encoding exists).
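To illustrate the “direct jump”, here is a sketch of the proposed row-oriented layout. The sizes (2^15 chunks of 32 bytes per record) are assumptions used only for the example:

```rust
// Hypothetical sketch of the proposed, row-oriented s-buckets: each s-bucket
// is one full encoded record, so every bucket has the same size.
const CHUNKS_PER_RECORD: u64 = 1 << 15; // 2^15 chunks per record
const CHUNK_SIZE: u64 = 32;             // bytes per chunk (assumed)
const S_BUCKET_SIZE: u64 = CHUNKS_PER_RECORD * CHUNK_SIZE;

// Because every s-bucket has the same size, the byte offset of s-bucket `index`
// inside the sector is a trivial computation: no variable-size bookkeeping and
// no need to range over all buckets to collect the winning record's chunks
// (positions whose encoding failed would simply hold raw chunks).
fn s_bucket_offset(index: u64) -> u64 {
    index * S_BUCKET_SIZE
}

fn main() {
    // E.g. the s-bucket holding record 100 starts at a fixed, known offset.
    println!("s-bucket 100 starts at byte {}", s_bucket_offset(100));
}
```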
On the downside, auditing, i.e. checking for a winning ticket, is expected to take longer.

Secondly, on-demand PoS table generation could yield many more than 2^15 tickets per challenge, since the farmer can range over the entire table, producing more tickets. To mitigate that, we need to limit the “PoS index” to 2^15 / (1-1/e) (today this index is limited by NUM_S_BUCKETS). By doing so, the ratio of tickets relative to the honest farming protocol does not change; it is still 1.
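For concreteness, the suggested bound works out to roughly 51,800 (a quick check of the arithmetic, not a statement about the exact constant to adopt):

```rust
// The PoS index bound suggested above: 2^15 / (1 - 1/e).
fn main() {
    let chunks_per_record = (1u64 << 15) as f64; // 2^15 tickets per (new-style) s-bucket
    let encoding_success = 1.0 - (-1.0_f64).exp(); // 1 - 1/e ~ 0.632
    let max_pos_index = chunks_per_record / encoding_success;
    // Over that index range, the expected number of successful encodings is ~2^15 again.
    println!("PoS index limit ~= {:.0}", max_pos_index); // ~51,838
}
```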

That is, changing s-buckets to contain full records does not seem to be less secure against on-demand plotting than the current approach, as the probability of winning a block slot, per PoS table generation, does not change.


Thanks @Barak for the interesting proposal and economic analysis. This argument only applies to on-demand plotting under a linear cost model. If we consider some other malicious strategies, this new proposal might be less secure than the current approach. In other words, more cases need to be discussed. For example, the current approach seems to amplify memory IO better than the new proposal.

Security implications apart, this construction could save the roughly 200-300 ms currently required to read the chunks. However, the time to recover the record from encoded chunks and to compute the PoS and KZG proofs stays the same.
A flaw I see here

Therefore, in the current design, (honestly) storing 1 record gives 0.63 lottery ticket, while a single on-demand generation of a PoS table gives 0.63 lottery ticket. The proportion of lottery tickets in honestly storing a record vs. a PoS table generation is 1.

is that while the cost per ticket seems the same, this no longer holds when you look at buckets rather than tickets. In the current honest case, given a 1-second challenge slot and a determined s_bucket_audit_index, storing 1300 records honestly gives a farmer ~819 tickets instantly, whereas achieving the same number of tickets in a bucket would require a malicious farmer to perform up to 1300 PoS table generations, which takes significant computational resources to complete within a second.
As I understand your proposal, someone with a lot of computing power is actually incentivized to store a relatively small history once, generate a table for every challenge, and achieve the same chances as an honest farmer. Or did I misunderstand?

You understand correctly.

So far our approach to “security against on-demand plotting” has not been that it is infeasible, but that it is not a rational strategy: if a farmer is willing to pay the cost of plotting, they are better off spending it on HDDs and storing the data. For what it’s worth, this approach is taken by other projects as well.

Admittedly, the above is not so easy to argue about in practice. However, trying to “physically” limit plotting is also hard. First, the farmer can always “buy more computers” to parallelize the work. Secondly, we should also protect against long-range attacks where farmers create blocks not in real time. The latter is something we should protect against in general, but my point is that it is not easy to rely on physical limitations.

In the current design, on-demand plotting is not rational: it is not strictly infeasible, but it is quite difficult. You would need ~1000 cores to compute ~1000 tickets within the slot time, which isn’t cheap.
I do have a suggestion we can consider to make s-bucket sizes more uniform in the current design. Right now, we start iterating at 0 and go up until we find NUM_CHUNKS proofs or hit NUM_S_BUCKETS, whichever happens first, so statistically the tail buckets will be more sparse. Instead, we can have start_index uniformly distributed between 0 and NUM_S_BUCKETS, e.g. start_index = piece_offset * NUM_S_BUCKETS / pieces_in_sector mod NUM_S_BUCKETS or similar, so that

record 0 start_index = 0
record 1 start_index = 50
record 100 start_index = 50240

and for each record the iteration would wrap around, so that in the worst case it still gets to make NUM_S_BUCKETS challenges to the PoS table.
This should make the bucket sizes more uniform.
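Roughly, something like the following (illustrative only; NUM_S_BUCKETS = 2^16 and NUM_CHUNKS = 2^15 are assumed values, and the PoS lookup is a stub):

```rust
// Illustrative sketch of the suggested start_index scheme (not real farmer code).
const NUM_S_BUCKETS: u32 = 1 << 16;
const NUM_CHUNKS: u32 = 1 << 15;

// Spread the starting s-bucket index of each record uniformly over the table
// instead of always starting at 0, so the "tail" buckets are no sparser than
// the rest.
fn start_index(piece_offset: u32, pieces_in_sector: u32) -> u32 {
    (piece_offset as u64 * NUM_S_BUCKETS as u64 / pieces_in_sector as u64) as u32 % NUM_S_BUCKETS
}

// Stub standing in for the actual proof-of-space table lookup.
fn pos_proof_exists(_piece_offset: u32, _s_bucket: u32) -> bool {
    true
}

// For each record, wrap around the table starting at `start`, stopping once
// NUM_CHUNKS proofs are found or after NUM_S_BUCKETS attempts, whichever
// comes first.
fn assign_buckets(piece_offset: u32, pieces_in_sector: u32) -> Vec<u32> {
    let start = start_index(piece_offset, pieces_in_sector);
    let mut assigned = Vec::new();
    for i in 0..NUM_S_BUCKETS {
        if assigned.len() as u32 == NUM_CHUNKS {
            break;
        }
        let s_bucket = (start + i) % NUM_S_BUCKETS;
        if pos_proof_exists(piece_offset, s_bucket) {
            assigned.push(s_bucket);
        }
    }
    assigned
}

fn main() {
    let pieces_in_sector = 1300;
    for piece_offset in [0, 1] {
        println!(
            "record {piece_offset}: start_index = {}",
            start_index(piece_offset, pieces_in_sector)
        );
    }
    println!(
        "record 0 gets {} s-buckets",
        assign_buckets(0, pieces_in_sector).len()
    );
}
```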

Good idea! It makes the bucket size more uniform.

My understanding is that being “more uniform” is not the goal. We need deterministic s-bucket sizes, and ones that will make the fetching/reading from a given s-bucket highly efficient. We should advise the engineering team on this one.