Erasure recovery in G1

To erasure code N source chunks of the records (as scalars in Fr of the BLS12-381 curve) and source commitments (as G1 points on the same curve) to 2N, we use the following approach:

  1. Inverse FFT (in Fr or G1, respectively) to interpolate a degree N-1 polynomial
  2. FFT to evaluate it over domain of size 2N

Then to recover the source chunks from a random sample of N chunks we use a known approach described by Vitalik here. @Chen_Feng_2023 was kind to explain it on a simple example:
Consider a simple polynomial f(x) = x + 1 over Z_5. Note that 2 is a primitive 4th root of unity. We have the following evaluations:

f(2^0) = 2,
f(2^1) = 3,
f(2^2) = 0,
f(2^3) = 4

Say, we are missing two evaluations, f(2^1) = 3 and f(2^3) = 4. How can we recover f(x) using FFT-based algorithms?

Input:

f(2^0) = 2,
f(2^1) = null,
f(2^2) = 0,
f(2^3) = null

By definition, we have Z(x) = (x - 2^1)(x - 2^3), D(x) = f(x), and E(x) that evaluates to our input with erasures:

E(2^0) = 2
E(2^1) = 0
E(2^2) = 0
E(2^3) = 0

Clearly, we have D(x) Z(x) = E(x) Z(x). Also,

E(2^0)*Z(2^0) = 2 * 2 = 4
E(2^1)*Z(2^1) = 0 * 0 = 0
E(2^2)*Z(2^2) = 0 * 2 = 0
E(2^3)*Z(2^3) = 0 * 0 = 0

By interpolating the above four points, we have a unique polynomial
g(x) = (x - 2^1) (x - 2^2) (x - 2^3). If we can prove that D(x) Z(x) = g(x), then we can recover D(x), which is the source f(x).
Note that, by construction, g(x) is of degree at most 3, and Z(x) is of degree 2. Also, by definition, f(x) is of degree at most 1.

Proof by contradiction. Suppose that we have another polynomial f’(x) of degree at most 1 such that f’(x) Z(x) = f(x) Z(x). Then, we have f’(2^0) = f(2^0) and f’(2^2) = f(2^2). Thus, f’(x) - f(x) has at least two zeros. This a contradiction because f’(x) - f(x) is of degree at most 1.

How can we compute g(x)/Z(x)?

Method 1: Extended Euclidean algorithm.

Method 2: Introducing a random k as suggested by Vitalik. Note that k shouldn’t be in the set {2^0, 2^1, 2^2, 2^3 }. Vitalik hasn’t emphasized this.

Open questions
Can the recovery algorithm as above be adapted to work with G1 points so we can recover erased commitments? I feel like it could since we can do FFT in G1, but I’m hesitant about E(x)*Z(x) multiplication
Alternatively, is there another algorithm that works in G1?

3 Likes

I have noticed that the zero-polynomial Z(x) does not have to be in G1. Then we can compute E(x) Z(x)easily if only E(x) has coefficients in G1.

1 Like

I’m actually not sure what you mean by “FFT in G1”, but how do you run the original erasure coding on the commitments? The implementation seems to do (inverse) “FFT in G1” – my understanding is that it’s FFT in the base field F_q, though I couldn’t find the actual code – so you should be able to use the same approach.

The way I believe that it works is the following:
Assuming affine representation, you represent the commitments as (x,y) \in (F_q)^2. Given erasures, you can naively run 2 FFTs on the existing data, one for x and one for y, and restore the missing points. Since y could be represented by a single bit, there may be a more efficient approach using a single FFT.

Note that you can totally ignore the context in which the data set consists of points in some subgroup of an elliptic curve. You just work on F_q, and by construction all of the data set points are also in G1.

1 Like

There is FFT in G1 implemented in multiple libraries: the set of G1 points is represented as evaluations {w^0, P_0}....{w^N, P_N} of some polynomial with coefficients in G1 where w is a primitive Nth root of unity. I already have a prototype implementation of erasure coding this way here, but we haven’t added it to Subspace yet.

We would like to preserve the homomorphic properties and treat commitments here as points - this will allow use to show that commitments to erasure-coded data are correct - because they are themselves erasure coded from source commitments. We get this idea from Information Dispersal with Provable Retrievability for Rollups p.7 eq.10

1 Like

I have resolved the problem and implemented a prototype here. It’s based on rust-kzg and the main thing to look at is the recover_poly_from_samples_g1 function.
Given a set of length 2N of points in G1 (N source, N erasure-coded) where up to N are missing (None) I want to recover the missing points:

  1. I create Z(x): Fr->Fr, which evaluates to 0 on the indices of the missing indices.
  2. I compute g(x) = E * Z(x), g(x): Fr->G1 which evaluates to identity in G1 on the missing indices and to P_i*Z(i) on the present points. I do not explicitly compute E(x), but that’s an efficiency detail. Now as we have proven above, D(x) Z(x) = g(x) and D(x):Fr->G1 would evaluate to our original data.
  3. To extract D(x) = g(x)/Z(x) and avoid 0/0 division, I use the trick with “random” k chosen to be 5, which is practically guaranteed to not be in our original set of evaluation points - powers of the primitive root of unity. So I compute the two scaled polynomials Q1(x)=D*Z(k*x) and Q2(x)=Z(k*x)
  4. Then I compute the division Q3(x) = Q1(x)/Q2(x) = Q1(x) * Q2^(-1)(x) = D*Z(k*x) * (Z(k*x))^(-1) = D(k*x)
  5. I unscale D(k*x) to D(x)
  6. Lastly, because D(x) is in coefficient form with coefficients in G1, I perform an FFT in G1 and obtain the evaluations which equal our original set of points.

I have tested it by generating some N random G1 points, extending them via FFT to a domain twice as large, replacing some random with None and trying to recover with recover_poly_from_samples_g1. Seems to work and recover correctly.

1 Like

Integrated into Subspace reconstructor here gives ~10% speedup (in non-parallel case) over current implementation that recommits recovered source pieces. Since my implementation is not state-of-art, there are probably more significant gains.

1 Like