S-bucket size differences

I have plotted the following distribution of current s-bucket sizes for a 1300-piece sector:

The median is around 750 chunks per bucket. I attribute the drop after s-bucket index ~50000 to the fact that by that point most of the records have most of their 2^{15} chunks encoded. In fact, 2^{15}/(1 - 1/e) \approx 51800 is the expected number of challenges needed to find 2^{15} proofs in a Chia PoS table.
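This shape can be reproduced with a toy Monte-Carlo sketch of a simplified model (my assumption, not the actual plotter code): each challenge index independently yields a proof with probability 1 - 1/e, and a record keeps only the chunks for its first 2^{15} proofs. The record and chunk counts below are scaled down for speed.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_RECORDS = 200          # scaled down from ~1000-1300 records per sector
CHUNKS_PER_RECORD = 2**10  # scaled down from 2^15 chunks per record
P_PROOF = 1 - 1 / np.e     # assumed per-challenge proof probability

# Cover the expected stopping point with some margin.
num_buckets = int(CHUNKS_PER_RECORD / P_PROOF * 1.5)

bucket_sizes = np.zeros(num_buckets, dtype=np.int64)
for _ in range(NUM_RECORDS):
    # Each challenge (s-bucket index) yields a proof independently.
    hits = rng.random(num_buckets) < P_PROOF
    # Index of the challenge where this record gets its last needed chunk.
    stop = np.searchsorted(np.cumsum(hits), CHUNKS_PER_RECORD)
    hits[stop + 1:] = False  # proofs past that point go unused
    bucket_sizes[hits] += 1

expected_drop = CHUNKS_PER_RECORD / P_PROOF  # = 2^10 / (1 - 1/e) here
print("expected drop index:", round(expected_drop))  # → 1620
print("median before drop:", np.median(bucket_sizes[: int(expected_drop * 0.8)]))
```

The bucket sizes sit near NUM_RECORDS × (1 - 1/e) until roughly CHUNKS_PER_RECORD / (1 - 1/e), then collapse as records finish, matching the drop in the plot.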
For 1000 records in a sector the picture is, as expected, similar, with a median of around 580:


In this case each bucket (before the drop) is ~19 KiB in size.
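As a back-of-the-envelope check, the ~19 KiB figure is roughly consistent with the 580-chunk median if each chunk is a 32-byte field element (the chunk size is my assumption here):

```python
CHUNK_BYTES = 32     # assumed size of one chunk (a 32-byte scalar)
median_chunks = 580  # median bucket size for 1000 records, from the plot
bucket_kib = median_chunks * CHUNK_BYTES / 1024
print(bucket_kib, "KiB")  # → 18.125 KiB, in the same ballpark as ~19 KiB
```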
Do we want to make buckets more uniform?
I do not think the “unlucky” tail buckets with significantly fewer tickets create an incentive to drop them, since the prover still needs all chunks to create a KZG witness, and the time to encode them is negligible. However, intuitively, it feels like more uniform bucket sizes would be better.


I agree with you that having a more uniform distribution is better. Plus, you already proposed a smart idea to achieve this.

We can do something as simple as taking the first encoded chunks for even piece offsets and the last encoded chunks for odd piece offsets. I am not sure how tricky it would be to implement or how it would perform, and the diagram will certainly get very messy, but it is a relatively minor change to the protocol.
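A toy simulation of this alternating idea, under the simplifying assumption that each challenge yields a proof with probability 1 - 1/e (sizes scaled down, and an index space twice the chunk count per record): odd piece offsets scan challenges from the end, so the sparse tail of one half of the records overlaps the dense head of the other.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_RECORDS = 200
CHUNKS_PER_RECORD = 2**10
P_PROOF = 1 - 1 / np.e
NUM_BUCKETS = 2**11  # bucket index space, 2x chunks (like 2^16 vs 2^15)

def bucket_fill(alternate: bool) -> np.ndarray:
    """Bucket sizes when all records take their first chunks (alternate=False)
    or when odd piece offsets take their last chunks instead (alternate=True)."""
    sizes = np.zeros(NUM_BUCKETS, dtype=np.int64)
    for offset in range(NUM_RECORDS):
        hits = rng.random(NUM_BUCKETS) < P_PROOF
        reverse = alternate and offset % 2 == 1
        if reverse:
            hits = hits[::-1]  # odd offsets scan challenges from the end
        stop = np.searchsorted(np.cumsum(hits), CHUNKS_PER_RECORD)
        hits[stop + 1:] = False  # keep only the first needed proofs in scan order
        if reverse:
            hits = hits[::-1]  # map kept chunks back to real bucket indices
        sizes[hits] += 1
    return sizes

forward = bucket_fill(alternate=False)
both = bucket_fill(alternate=True)
print("empty buckets, all-forward: ", int((forward == 0).sum()))
print("empty buckets, alternating:", int((both == 0).sum()))
print("spread (std), all-forward vs alternating:", forward.std(), both.std())
```

In this toy model the alternating scheme leaves no empty tail buckets and noticeably reduces the spread of bucket sizes, though the middle buckets (covered from both directions) stay fuller than the ends.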

Nazar’s solution is much simpler and should both eliminate the tail and level the buckets. I have opened an improvement issue on GitHub.