On bundle transaction range adjustment

TL;DR: domain bundle transaction range adjustment was broken. While trying to fix it, I found we may not need it at all.
Background
A bundle is a set of transactions in a domain.
Each time slot, operators of a domain hold a stake-weighted VRF election of a leader, who gets to select transactions from a predefined range and submit a bundle for inclusion in the consensus chain.
A transaction range is a range of sender IDs (out of all possible sender IDs) whose transactions this operator can include in their bundle. Example: if all possible IDs were letters A-Z, one operator’s range could be K-P. Limiting the tx pool available to each operator at each slot is a means of countering MEV and minimizing duplication of transactions across bundles. If there is no tx available in that range at the moment, the operator will not produce a bundle (with one exception, which is outside of the scope of this post).

Tx Range Adjustment
Dynamic transaction range adjustment is a mechanism that aims to stabilize latency under variations in the operator set. It measures the number of bundles produced in a time interval, compares it to the expected number, and adjusts the tx range accordingly. If too few bundles were produced, the range increases so that more transactions can be picked into each bundle; if too many, the range shrinks to avoid duplication.
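As a sketch, the adjustment loop looks roughly like this (hypothetical names and a simple proportional update, not the actual implementation):

```python
# Hypothetical sketch of the dynamic tx range adjustment described above.
# Names and the proportional update rule are illustrative assumptions.
U256_MAX = 2**256 - 1

def adjust_tx_range(tx_range: int, bundles_produced: int, expected_bundles: int) -> int:
    """Scale the range by the ratio of expected to actual bundles in the interval."""
    if bundles_produced == 0:
        return U256_MAX  # no bundles at all: open the range fully
    adjusted = tx_range * expected_bundles // bundles_produced
    return min(adjusted, U256_MAX)

# Too few bundles -> range grows; too many -> range shrinks.
r = U256_MAX // 3
assert adjust_tx_range(r, 300, 600) > r
assert adjust_tx_range(r, 1200, 600) < r
```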
Below I’m going to explain why it doesn’t work.

Case 1: enough tx demand

Assume that there are enough txs, so that every time an operator is elected, they produce a bundle with the txs in their range. Suppose we expect 600 slots in one interval before adjustment. By virtue of our VRF election, where each operator's chance is proportional to their share of total stake, we would get roughly 1 operator elected per slot, so ~600 bundles in 600 slots. We may happen to get fewer, so the range would increase, or more, so it would decrease. However, VRF elections are independent of one another, both within and across intervals, and the tx range has no effect on bundle production whatsoever, so the range would simply fluctuate at random.
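A quick simulation makes the fluctuation visible (a sketch under the assumptions above: i.i.d. per-slot elections and ample tx demand; the operator count and seed are arbitrary):

```python
# With ~1 expected leader per slot, bundle counts over a 600-slot interval
# fluctuate around 600 no matter what the tx range is, because the tx range
# never enters the election at all.
import random

random.seed(42)
SLOTS = 600
OPERATORS = 100  # each with 1% stake, so P(elected) = 1/OPERATORS per slot

def bundles_in_interval() -> int:
    count = 0
    for _ in range(SLOTS):
        # every operator runs an independent VRF lottery each slot
        count += sum(random.random() < 1 / OPERATORS for _ in range(OPERATORS))
    return count

samples = [bundles_in_interval() for _ in range(20)]
# counts scatter around 600; adjusting the range cannot change this
print(min(samples), max(samples))
```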

Case 2: low demand (few txs)

Now assume that transactions happen to be few, so quite often an operator can't produce a bundle, and naturally, bundles are fewer than expected. If this situation is prolonged, the tx range will converge to its maximum value and stay there. When tx demand picks up, we move to case 1 and fluctuate around that limit.
Note that the same happens if some operators are offline and don’t produce bundles when elected.

Case 3: high demand (too many txs)

In this case, when producing a bundle, an operator can't include all of the transactions available to them due to weight (gas) limits. From the point of view of bundle production, this is the same as case 1, but transactions will wait in the pool longer. So, in this case, tx range adjustment doesn't affect tx latency either.

Domain Config

Currently, the domain configuration file with which the domain is instantiated contains the bundle_slot_probability and target_bundles_per_block parameters, intended to control the latency of the domain. These two values are, in reality, derivable from one another. Moreover, they have no effect on the analysis above, which always ends up in one of the three cases. Smaller bundle_slot_probability and target_bundles_per_block artificially slow down the domain, while bigger values are capped by the latency of the consensus chain.

Conclusions

I conclude that tx range adjustment is unnecessary, as are the bundle_slot_probability and target_bundles_per_block domain config fields.
I do think we should have a limited tx range per operator, but we need to select a reasonable static value (more on this later).

I agree. If we focus on one domain, a static value may work. On the one hand, if this value is too large, then multiple leaders may produce almost identical bundles, leading to a waste of resources. On the other hand, if this value is too small, then a leader cannot pack enough transactions in a bundle.

What if we have multiple domains competing for resources? Do we need something like “rate control”? I am currently thinking along this direction.

Here are my thoughts on a static value: there are n = 2^256 - 1 = U256::MAX possible values for a tx sender. Let x be the range size chosen by each operator. The probability of a random tx being outside the range of an operator at any given slot is (n-x)/n. The probability of that happening for 10 consecutive slots is ((n-x)/n)^{10}. If we set x = n/3, the probability of a tx not being included within 10 slots is ((n - n/3)/n)^{10} = (2/3)^{10} \approx 0.017. Thus, setting the tx range to U256::MAX/3 gives a ~98.3% probability of a tx being included within 10 slots.

If we want it cleared within 6 slots (with a probability of 98.4%), we should set the range to U256::MAX/2.
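The arithmetic in both cases can be checked directly (same independence assumption as above, one leader per slot):

```python
# Verifying the inclusion probabilities above: a tx escapes one slot's leader
# with probability (n - x)/n = 1 - x/n, and over k independent slots that
# escape probability compounds as (1 - x/n)^k.
def miss_probability(range_fraction: float, slots: int) -> float:
    return (1 - range_fraction) ** slots

# range = n/3, 10 slots
p10 = miss_probability(1 / 3, 10)
# range = n/2, 6 slots
p6 = miss_probability(1 / 2, 6)

print(round(p10, 4), round(1 - p10, 3))  # -> 0.0173 0.983
print(round(p6, 6), round(1 - p6, 3))    # -> 0.015625 0.984
```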

To Chen: each domain has a separate pool of txs relevant to that domain, so operators of different domains do not compete on this front. They are competing for consensus chain blockspace, though. However, I think rate control on blockspace utilization by different domains is a separate discussion in the larger picture of scalability.

The calculation is based on the assumption of having exactly one leader at each time slot, right? This assumption is reasonably good for our purpose. We can relax this assumption with a more complicated analysis.

They are competing for consensus chain blockspace resources, though.

Yes, this is exactly the competition I am talking about :slight_smile:

Interesting discussion in this thread. I have a few points to add:

  • In the probability analysis, not sure if we should consider a random tx. We would like some level of fairness for all transactions, so we should ensure that “every” transaction gets included within a certain number of slots with reasonable probability. Then, the probability that a tx is not included in (say) 3 slots would be that it falls outside the range of first slot’s leader, then the second slot’s leader, and finally the third slot’s leader (who may not all be the same).
  • Taking the fairness argument further, wouldn’t it be important to rotate ranges between operators (no matter how big or small the ranges are)? Consider a simple case: We have two operators, one with a stake of 70% and the other with 30%. If a sender ID X falls outside the range of the first operator, then it could only be included in roughly 30% of the bundles, which seems unfair compared to a sender ID Y that falls within the first operator’s range (and thus has a much higher chance of being included). To bring balance into the question, one option could be to rotate the ranges amongst the operators every so often (based on slot number, etc.). So, in this case, when the first operator’s range is rotated to the second one’s, X would get the chance to be eligible for a higher number of slots just like Y did before the rotation.

Please let me know if I am missing something.

  • I feel that Dariia’s simple analysis applies to both a fixed tx and a random tx under the assumption that the “location” of any leader is uniform at random and independent of the locations of other leaders. To see this, suppose that the range is MAX/2. Then, for any fixed tx and any random location of a leader, the chance of “escaping” a bundle is 1/2. This applies to a random tx as well.

  • I feel that we already rotate ranges in some sense. Consider a concrete example where MAX = 16. That is, the possible values are from 0 to 15. Let us set range = 4. Say, operator 1 (with 70% stake) is selected as a leader with its random location 5. Then, operator 1’s effective range is {3, 4, 5, 6, 7}. In other words, its random location is the center.

Does it make sense? Happy to elaborate more if needed.

Thank you @shashank @Chen_Feng_2023!
Indeed, I think my formula applies to every transaction too, not just a single random one.
The ranges do move per operator. The range is fixed in size, but its center is different every slot, since it is based not directly on the operator ID but on the VRF output, which is different every slot. So in your example, the 70% stake operator will cover different parts of the pool each time they are elected.
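A minimal sketch of this, assuming the range is a fixed-size window centered on the VRF output and wrapping around the U256 sender-ID space (my reading of the discussion; function names are hypothetical and the node's actual derivation may differ):

```python
# Sketch: derive an operator's per-slot tx range from its VRF output.
# The window has fixed size but its center moves with the VRF output,
# so the same operator covers different sender IDs at different slots.
U256_MAX = 2**256 - 1
MODULUS = 2**256

def tx_range_for_slot(vrf_output: int, range_size: int) -> tuple[int, int]:
    """Return (start, end) of the range, wrapping modulo 2^256."""
    start = (vrf_output - range_size // 2) % MODULUS
    end = (vrf_output + range_size // 2) % MODULUS
    return start, end

def in_range(sender_id: int, start: int, end: int) -> bool:
    if start <= end:
        return start <= sender_id <= end
    return sender_id >= start or sender_id <= end  # wrapped around zero

# Chen's toy example scaled up: location 5, range 4 covers IDs 3..7.
s, e = tx_range_for_slot(vrf_output=5, range_size=4)
print(s, e)  # -> 3 7
```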

Thank you @Chen_Feng_2023 and @dariolina for clarifying. Turns out that I did miss something important: that the ranges are not fixed for operators, that they are chosen based on VRF outputs (as Dariia explained). If the ranges are picked at random then, yes, random vs fixed tx doesn’t make a difference. Also, the rotation suggestion is already taken care of.

Great conversations! I also had an offline discussion with Shashank. Let me summarize our findings below.

  1. A fixed range is good enough. In order to determine its value, we have the following considerations: 1) reducing the latency of including a transaction in a bundle; 2) reducing the overlap among different bundles in a time slot. There is a tradeoff between these two metrics.

  2. The VRF threshold can be adjusted dynamically to make sure on average we have d leaders per slot. Here, d is a system parameter. (Previously, we set d = 1.) We will give some analysis on how to choose d.

  3. Since our leader-election process is memoryless, we feel that a simple fee structure works well. For example, if a transaction appears in two bundles for the same time slot, the two bundles can split the transaction fee (or part of it) equally. If a transaction appears in two bundles for different time slots, only the earlier bundle gets the fee (or part of it).

  4. If we have too many domains, we may need some rate-control mechanism. I have some ideas. And Shashank proposed an interesting direction. They are not the current priority, and we will explore them later.
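The tradeoff in point 1 can be made concrete under a toy model (my own illustrative assumptions, not a protocol specification): d independent leaders per slot, each covering a fraction p = x/n of the sender-ID space.

```python
# Toy model of the latency/overlap tradeoff: a wider range (larger p) lowers
# the expected slots until some leader covers a given tx, but raises the
# chance that two or more leaders in the same slot duplicate it.
def expected_latency_slots(p: float, d: int) -> float:
    """Expected slots until at least one of d leaders covers the tx."""
    hit = 1 - (1 - p) ** d
    return 1 / hit

def duplication_probability(p: float, d: int) -> float:
    """P(a tx falls in the ranges of two or more leaders in one slot)."""
    return 1 - (1 - p) ** d - d * p * (1 - p) ** (d - 1)

for p in (1 / 6, 1 / 3, 1 / 2):
    print(p, expected_latency_slots(p, d=2), duplication_probability(p, d=2))
```

Sweeping p shows latency falling while duplication rises, which is the tradeoff the two metrics in point 1 describe.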


Perhaps I missed the discussion on this, but when there are too few transactions, if the tx range is not increased we run the risk of transactions being confirmed very slowly, unless we change a different parameter, like how many bundles are produced per slot.

Moreover, when there are many transactions, if the tx range is not decreased we can expect the inefficient case of frequently duplicated transactions (typically those with the highest fees) and more opportunity for MEV (which, I don’t think the tx range will solve anyway, as the extractor can usually grind on their tx to get a “good” ID).