Proof of space performance improvements by sharing tables between pieces

I’ve been hacking intensively on Chia again over the last week and made some decent progress, but there is one thing that bothers me.

Since we need a table for each piece, on commodity hardware performance looks like “seconds per piece” rather than “pieces per second”, meaning users will be plotting under 1 MiB/s, which is way too slow.

Since k22 is MUCH bigger than k17, maybe we can reuse one table for multiple pieces somehow? It would compensate for the negative effect of the increased k, while leaving the algorithm memory bandwidth-bound.

I’d like to explore this option, but I’m sure there are security implications to consider.

Yes, for an attacker doing on-demand encoding, once the attacker produces a table, they can use it to print multiple lottery tickets. Previously, the attacker could only print one.

On the other hand, k22 is more secure than k17. This might mitigate the previous issue. In other words, there is an interesting tradeoff. Let me see whether we can use some simple math to capture this tradeoff.

IDK if it’s worth it. It would not give you a performance improvement in the time-sensitive part.
Using one table for multiple pieces reduces total plotting time. However, you would still need to generate the whole table for each piece during proving.
Intuitively, using a k22 table to encode two pieces would be equivalent to using two tables of k21, one for each piece.

I have tried generating k22 tables partially, but the number of matches and present proofs is reduced a lot if you don’t have the full table.

I know, but this is not the issue I’m raising here; we have already established that we can do the time-sensitive part within the time limit once we have a prediction window. The issue now is that we will have painful plotting times.

Imagine plotting a terabyte at 0.5 MiB/s on a typical quad-core machine. It’ll take you something like 25 days to complete plotting, and all this time the CPU will be at 100% usage. Hardly a great experience. Those with a few terabytes will wait months. And even with powerful consumer hardware it doesn’t get pleasant: on my desktop you’re still looking at a week of plotting while burning all 32 logical threads available.
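
Back-of-the-envelope arithmetic behind that estimate (assuming “a terabyte” means 1 TiB and a sustained 0.5 MiB/s):

$$
\frac{1\ \text{TiB}}{0.5\ \text{MiB/s}} = \frac{2^{40}\ \text{B}}{2^{19}\ \text{B/s}} = 2^{21}\ \text{s} \approx 2.1\times 10^{6}\ \text{s} \approx 24\ \text{days}
$$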

In fact, depending on how the tradeoffs play out, we could go to k23: if that allows us to reuse a table across more than double the number of pieces compared to k22, it’ll be a better experience for farmers.

From the security perspective, on-the-fly computing for one sector requires computation in the order of 2^k * (# of pieces in one sector) / # of pieces covered by one table. So, if we move from k = 17 to k = 22, we gain 2^5 from the term 2^k. As long as the term “# of pieces covered by one table” is smaller than 2^5, we are good for on-the-fly computing. Next, I will consider on-demand plotting in another reply.
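
To make the k17 → k22 comparison above explicit (writing s for the number of pieces in one sector and n for the number of pieces covered by one table, just as shorthand for the terms in the formula):

$$
\text{cost}(k, n) \sim \frac{2^{k}\cdot s}{n},
\qquad
\frac{\text{cost}(22, n)}{\text{cost}(17, 1)} = \frac{2^{22}}{2^{17}\cdot n} = \frac{2^{5}}{n} \ge 1
\iff n \le 2^{5} = 32.
$$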

I’m not 100% sure why anything below 2^5 is fine, since we didn’t consider k17 secure enough before. If it was, then what is wrong with reusing the table for more than 2^5 pieces? I feel like I’m missing something important here.

“we didn’t consider k17 as secure enough” is mostly due to on-demand plotting. That is why I separated it from on-the-fly computing, to avoid confusion. From the on-demand plotting perspective, k22 (or k23) is significantly more secure than k17 in terms of memory IO.

Yes, that is correct. Mathematically, let f(23) be the amount of memory IO for k23. If f(23) > 2 f(22), then we can “reuse a table across more than double the number of pieces compared to k22”. Any thoughts on f(23) vs. f(22)?

I think since every increase in k doubles the table size, other properties should double as well (most of them, anyway).

I was just thinking that if k23 is so large, maybe we can reuse it for 3x as many pieces compared to k22; then we’d be doing less work for the same number of pieces. Otherwise it is pointless from the user experience point of view and makes proofs bigger unnecessarily.

I find the suggestion interesting.

  1. Regarding plotting time, does a table with k take roughly half the time to generate compared to (k+1)?
  2. As @dariolina points out, it will make proving longer. How do we take this into account? Currently the prediction window is short, and keeping it short is good against (non-private-chain) on-demand plotting.
  3. At the end of the day, the argument against on-demand plotting (especially for such small tables) is that it is not economical (a rational strategy would be to use the resources spent on on-demand plotting for permanent plotting instead), so we need to also make sure that the cost of plotting with (k+1) is not less than twice the cost of plotting with k (assuming that with each increase we double the number of pieces per table).
  1. Yes, roughly that (it differs between single- and multi-threaded implementations, but it’s somewhere in that ballpark)
  2. See the latest numbers in Faster chiapos (part 3) by nazar-pc · Pull Request #1685 · subspace/subspace · GitHub. It’ll get even faster still, but probably not orders of magnitude faster at this point; if the window is 6 seconds, it should be sufficient for modern consumer hardware with some margin (but more testing on lower-power hardware is necessary)
  3. “so we need to also make sure that the cost of plotting with (k+1) is not less than twice the cost of plotting with k (assuming that with each increase we double the number of pieces per table)”

That was my hope with the k23 suggestion though: that we double the compute but, let’s say, triple the number of pieces that use the table.

Thanks, so for 1, what is the actual gain for plotting times?
Specifically in this example:

is it with k=17 or k=22?

We can go from (the current) k=17 to k=22 and have 2^5 times more pieces per table, but then it seems that we don’t gain anything in plotting time.
So if I understand correctly, the point is not to double the number of pieces per table with each increase of k by 1, but to fix some k that we feel comfortable enough with, and then try to squeeze a bit more out of that table than the 1-piece-per-table approach would?

For 3, double the compute but triple the pieces per table is beneficial to both the honest and on-demand plotters, as on-demand plotting for 3 lottery tickets would cost a third less for k=23 than for k=22 (if costs indeed increase proportionally).
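
Spelling out the per-ticket arithmetic under that assumption (writing C for the hypothetical cost of one k22 table covering n pieces, so a k23 table would cost 2C and cover 3n pieces):

$$
\text{cost per ticket}_{k=22} = \frac{C}{n},
\qquad
\text{cost per ticket}_{k=23} = \frac{2C}{3n} = \frac{2}{3}\cdot\frac{C}{n},
$$

i.e. roughly a third cheaper per lottery ticket at k=23, for honest and on-demand plotters alike.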

That’s my understanding of Nazar’s suggestion too.
The way I imagined it was that for the first piece we use challenge indices 0..2^{16}, for the second piece 2^{16}..2*2^{16}, and so on. IIRC the current implementation, that would not be a big code change.
Then there’s another side to it too: a k table contains 2^{k} unique challenge indices. For each piece we use 2^{16} of those, so we can fit a maximum of 64 pieces in a k22 table and still have a unique challenge index for each chunk.
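
A minimal sketch of how I read the index-partitioning idea (the constant and function names here are mine, not from the actual implementation; the 2^{16} challenge indices per piece and k = 22 come from the post above):

```rust
/// Table size parameter under discussion: a k22 table has 2^22 challenge indices.
const K: u32 = 22;
/// Challenge indices consumed per piece (2^16, as in the one-table-per-piece scheme).
const PIECE_CHALLENGES_LOG2: u32 = 16;
/// Maximum number of pieces that can share one table while keeping every
/// challenge index unique: 2^22 / 2^16 = 64.
const MAX_PIECES_PER_TABLE: u32 = 1 << (K - PIECE_CHALLENGES_LOG2);

/// Map (piece offset within the shared table, chunk index within the piece) to a
/// challenge index: piece 0 gets 0..2^16, piece 1 gets 2^16..2*2^16, and so on.
fn challenge_index(piece_offset: u32, chunk_index: u32) -> Option<u32> {
    if piece_offset >= MAX_PIECES_PER_TABLE || chunk_index >= (1 << PIECE_CHALLENGES_LOG2) {
        return None;
    }
    Some(piece_offset * (1 << PIECE_CHALLENGES_LOG2) + chunk_index)
}

fn main() {
    assert_eq!(MAX_PIECES_PER_TABLE, 64);
    // Piece 1's range starts exactly where piece 0's range ends.
    assert_eq!(challenge_index(1, 0), Some(1 << 16));
    // A 65th piece would have to reuse challenge indices, so it doesn't fit.
    assert_eq!(challenge_index(64, 0), None);
}
```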

Those numbers are for k22. The gain will depend on how many pieces we can reuse a table for. If we can reuse one table across 2 pieces, then we save ~50% of the time for Chia table generation, which is not the whole plotting time, but a major component of it. If we can reuse it across 2^5 = 32 pieces (the difference in table size between k17 and k22), then we can expect plotting to take days instead of weeks.
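
One way to make that scaling explicit (a rough model only: p is the fraction of total plotting time spent on Chia table generation today, n is the number of pieces sharing one table, and any per-piece overhead of sharing is ignored):

$$
T(n) \approx T(1)\left[(1 - p) + \frac{p}{n}\right],
$$

so n = 2 halves the table-generation component, and n = 32 makes it almost negligible, leaving the rest of plotting as the bottleneck.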

We want to move from k17 to k22 to improve security, not plotting time. As far as I can see it is not “can go”, it is “must go” (or else the protocol is not secure), hence I’m exploring ways to mitigate the performance consequences of this change.

Primarily yes. However, if going further than k22 allows us to improve the farming experience in addition to improving security, then it is something worth considering. But as you noticed, this only makes sense if we can share a table across more than 2x the pieces of k22, otherwise UX doesn’t improve at all.

If it is as simple as that, then it answers my question about upgrading k22 to k23. Also note that the latest AMD CPUs already have 1.1 GiB of L3 cache and our tables are MUCH smaller than that; the whole process uses ~700 MiB of memory. So if we wanted to overwhelm memory bandwidth we’d need to go to k24-k25 (I tried k25: it is slow and uses a lot of memory, not feasible for average consumer hardware to prove within the 6-second prediction window).

I posted the latest benchmark results in Define Secure Consensus Parameters · Issue #1502 · subspace/subspace · GitHub

I don’t know if this is true. Since honest farmers have to produce the table (fast enough) for winning plots, plotting is not designed such that it would be infeasible to create plots on the fly and use them with recent challenges. So now it’s just a matter of scale, and since this can be done in parallel, an attacker is not physically bounded by how many plots they can produce.
The remaining question is how much it would cost. Clearly bigger tables would cost more (though we need to understand the proportion in which the cost grows). The correct analysis is to fix costs and compare how many tickets one would get with the honest farming strategy versus the on-demand farming strategy, and ideally also a mixed strategy. This, as far as I know, has not been done. Chia gives an analysis for their table size here.
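
A purely schematic way to set up that fixed-cost comparison (the symbols here are mine, not from any existing analysis): with a compute budget B per challenge window, a per-table plotting cost f(k), n pieces (lottery tickets) per table, and S pieces already stored,

$$
\text{tickets}_{\text{on-demand}} \sim \frac{B}{f(k)}\cdot n,
\qquad
\text{tickets}_{\text{honest}} \sim S,
$$

and the analysis would compare these (plus a mixed strategy) at equal total cost, where the honest side’s cost is the one-off plotting plus the storage hidden in S.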

Well, then I need better questions and more specific criteria than those we have in Define Secure Consensus Parameters · Issue #1502 · subspace/subspace · GitHub following the offsite to help here.

Could you run the same perf breakdown for k20-k23 for comparison, so we can see the progression and what the actual “difficulty” increase is with each increase of k?

Do you mean perf as in Perf Wiki, or just performance in general?