Victor Farazdagi

Computer science and applied mathematics

01 Jul 2020

SoK: Attestation Aggregation Algorithms

If you want to suggest something, please feel free to contribute to the ethresear.ch topic.

1. Background

Let's start with some relevant prerequisite information:

  • The BLS signature scheme, on which the whole aggregation process is based: it is precisely because BLS signatures can be aggregated into a single compact signature that aggregation of non-overlapping attestations is possible.
  • There is an amazing note on Ethereum 2.0 attestation aggregation strategies, which has the best coverage of current efforts on the problem of attestation aggregation. The strategies listed there are not so much algorithms for merging/packing attestations as higher-level approaches (which include changing how the validators' overlay network is constructed and used).
  • There’s also a 1-pager design of Better Aggregation Inclusion by Preston and Terence.

2. Objective

In order to increase profitability, attestations must be aggregated so as to cover as many individual attesters as possible.

In terms of expected input/output here is what we have:

  • Within the scope of this doc, an attestation is a bitlist of length equal to the number of validators in a given committee, where each bit represents a single validator and one or more bits may be set per attestation.
  • An array of attestations is represented by an incidence matrix, where rows are attestation instances and each column represents a validator's inclusion status across all known attestations.
  • Given a list of attestations, we are expected to aggregate all the non-overlapping ones, ideally producing an aggregation which includes all the possible attesting participants.
```
// Sample attestations:
A0 = [
    [0 0 0 1 0], // A
    [0 0 1 0 0], // B
    [0 1 0 1 0]  // C
]

// List can be transformed into:
A1 = [
    [0 0 1 1 0], // A + B
    [0 1 0 1 0]  // C
]

// Or, even better (the largest aggregate now covers 3 validators instead of 2):
A2 = [
    [0 0 0 1 0], // A
    [0 1 1 1 0]  // B + C
]
```
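To make the representation concrete, here is a minimal Python sketch (illustrative only, not taken from any client; `overlaps` and `merge` are hypothetical helper names). A bitlist is modelled as a plain list of 0/1 values: two attestations can be aggregated only if no bit is set in both, and the aggregation itself is a bitwise OR.

```python
from typing import List

# Illustrative sketch: hypothetical helpers, bitlists as plain 0/1 lists.
Bitlist = List[int]  # one entry per committee member


def overlaps(a: Bitlist, b: Bitlist) -> bool:
    """True if some validator is set in both attestations (an O(m) check)."""
    return any(x and y for x, y in zip(a, b))


def merge(a: Bitlist, b: Bitlist) -> Bitlist:
    """Bitwise OR of two non-overlapping attestations."""
    if overlaps(a, b):
        raise ValueError("overlapping attestations cannot be aggregated")
    return [x | y for x, y in zip(a, b)]


A = [0, 0, 0, 1, 0]
B = [0, 0, 1, 0, 0]
C = [0, 1, 0, 1, 0]

print(merge(A, B))     # [0, 0, 1, 1, 0]
print(merge(B, C))     # [0, 1, 1, 1, 0]
print(overlaps(A, C))  # True: both have bit 4 set
```

A packed bitfield would perform the same checks with word-wise AND/OR instead of per-element loops.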

At the moment, Prysm uses $O(mn^2)$ iterations to search for possible aggregation pairs: $O(n^2)$ for pairing the $n$ attestations, and another $O(m)$ to check each pair for overlaps ($m$ is the number of validators in a given committee).

It should be noted that, given that we are dealing with an NP-hard problem here, this runtime is not bad at all!

As for the quality of such a traversal (i.e. how close to optimal the resulting aggregation is), it depends solely on the order in which attestations are presented: once a pair of non-overlapping attestations is found, they are merged immediately, without considering whether a more profitable aggregation exists.

Here is a somewhat contrived example of such a suboptimal case:

```
// Given the following attestations:
A0 = [
    [0 0 1 0 0],
    [0 0 0 1 0],
    [0 0 0 0 1],
    [1 1 0 0 1],
]

// Optimal solution (merge: rows 1, 2 and 4; discard: row 3):
A_OPT = [
    [1 1 1 1 1],
]

// Solution produced by the current solver:
A1 = [
    [0 0 1 1 1],
    [1 1 0 0 1],
]
```
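To see where the order dependence comes from, here is a simplified first-fit model of this behaviour in Python. It is a hypothetical sketch, not Prysm's actual code: attestations are processed in the given order and each one is OR-ed into the first aggregate it does not overlap, with no look-ahead. On the example above it reproduces the suboptimal `A1`.

```python
from typing import List

# Hypothetical first-fit model of order-dependent merging (not Prysm's code).
Bitlist = List[int]


def first_fit_aggregate(attestations: List[Bitlist]) -> List[Bitlist]:
    """Merge each attestation into the first non-overlapping aggregate,
    otherwise start a new aggregate. O(m * n^2) for n attestations of m bits."""
    aggregates: List[Bitlist] = []
    for att in attestations:
        for i, agg in enumerate(aggregates):
            if not any(x and y for x, y in zip(agg, att)):
                aggregates[i] = [x | y for x, y in zip(agg, att)]
                break
        else:  # no compatible aggregate found
            aggregates.append(list(att))
    return aggregates


A0 = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
]
print(first_fit_aggregate(A0))  # [[0, 0, 1, 1, 1], [1, 1, 0, 0, 1]]
```

Presenting the last row first makes the very same procedure produce `[1, 1, 1, 1, 1]` as its first aggregate, which is exactly the kind of input-order sensitivity described above.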

So, the main objective is to find an approximation algorithm that produces a solution closer to the optimal one.

Algorithms are analyzed using the $\alpha$-approximation method, where $0 \le \alpha \le 1$ is the approximation ratio: an algorithm with a given value of $\alpha$ produces a solution whose value is at least $\alpha$ times the optimal value.

In our problem, solutions are scored by their cardinality: the more participants we have within a single aggregated item the better, with the maximum possible size equal to all the validators in a given committee.
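As a worked instance of this scoring, write each row of the contrived example from the previous section as its set of 1-based column indexes and assume, for illustration, that a list of aggregates is scored by its largest aggregate. Since $\alpha$ has to hold for every input, this single instance already shows that the order-dependent merge cannot guarantee more than $\alpha = 0.6$:

```latex
\text{score}(A_1) = \max\big(|\{3,4,5\}|,\ |\{1,2,5\}|\big) = 3, \qquad
\text{score}(A_{OPT}) = |\{1,2,3,4,5\}| = 5, \qquad
\frac{\text{score}(A_1)}{\text{score}(A_{OPT})} = \frac{3}{5} = 0.6
```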

When it comes to run-time efficiency, the aim is to make sure that aggregation doesn't become a bottleneck in a node's processing (it should be able to process thousands of attestations in sub-second runs). This will be addressed by benchmarking alternative approaches.

3. Formal Problem Statement

3.1. Definition

Def (Attestation Aggregation): $AA(U, S, k) \to S'$

Let $U$ be a finite set of objects, where $|U| = n$. Furthermore, let $S = \{S_1, \dots, S_m \mid S_i \subseteq U\}$ be a collection of its subsets, where $\bigcup_{i=1}^m S_i = U$, i.e. every $u \in U$ is present in at least one element of $S$.

Then, Attestation Aggregation (AA) is the problem of finding $S' \subseteq S$ that covers at least $k \in [1..n]$ elements of $U$ and whose sets are pairwise disjoint: $|\bigcup_{T \in S'} T| \ge k$ and $S'_i \cap S'_j = \emptyset, \forall i, j \in [1..|S'|], i \ne j$.
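Both AA constraints can be checked mechanically for a candidate solution. The Python sketch below (the name `is_aggregation` is hypothetical) takes $S$ as a list of sets of 1-based column indexes and $S'$ as a list of 1-based indexes into $S$; it relies on the fact that if every newly added set is disjoint from the union of the previous ones, all selected sets are pairwise disjoint.

```python
from typing import List, Set


# Hypothetical helper: checks the AA constraints for S' = {S_i : i in I}.
def is_aggregation(S: List[Set[int]], I: List[int], k: int) -> bool:
    """True if the selected sets are pairwise disjoint and cover >= k elements."""
    covered: Set[int] = set()
    for i in I:
        subset = S[i - 1]  # I holds 1-based indexes
        if covered & subset:  # shares an element with an earlier member
            return False
        covered |= subset
    return len(covered) >= k


# Attestations A, B, C from section 2, as sets of 1-based columns.
S = [{4}, {3}, {2, 4}]
print(is_aggregation(S, [2, 3], k=3))  # True: B + C covers {2, 3, 4}
print(is_aggregation(S, [1, 3], k=2))  # False: A and C share validator 4
```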

Ideally, we want $\bigcup_{T \in S'} T$ to have maximum cardinality, i.e. $k = |U|$, so that every $u \in U$ is covered by $S'$: $|\bigcup_{T \in S'} T| = |U|$.

Since BLS doesn't allow merging overlapping signatures, there is the additional constraint that all elements of $S'$ must be pairwise disjoint.

To summarize: given a family of sets $S$, we need to find a subfamily $S'$ of pairwise disjoint sets whose union is the same as (or as close as possible to) the union of the original family.

The problem is NP-Complete and only allows for logarithmic-factor polynomial-time approximation.

3.2. Comparison to Known Problems

3.2.1. Attestation Aggregation (AA) vs Minimum Set Cover (MSC)

In MSC we have the very same input set system $(U, S)$, but our target $S'$ is a bit different: we want to find a full cover of $U$ with minimal $|S'|$.

With AA, partial (if still maximal) coverage is enough, there are no constraints on the cardinality of $S'$, and all elements of $S'$ must be pairwise disjoint.

3.2.2. Attestation Aggregation (AA) vs Exact Cover (EC)

Again, we start from the same set system $(U, S)$, and EC matches the ideal case of our problem, when an optimal solution (a full cover) exists within the given input $S$. So, if the input list of attestations (either as a whole or some subcollection of it) forms a partition of $U$, the resulting $S'$ for EC and AA coincide.

There is one important difference in AA: it allows for partial covers.

3.2.3. Attestation Aggregation (AA) vs Maximum Coverage (MC)

In the MC problem, we want to find up to $k$ subsets that cover $U$ maximally: $S' = \operatorname{argmax}_{X \subseteq S,\ |X| \le k} |\bigcup_{T \in X} T|$

An important thing to note is that, in its conventional form, MC doesn't require elements of $S'$ to be disjoint, which is a problem in our case, as overlapping attestations cannot be aggregated.

So, the important differences in AA are: no constraint on the cardinality of $S'$, and the requirement that elements of $S'$ be pairwise disjoint.

MC can still be utilized for our purposes: since there exists a greedy approximation algorithm with $\alpha = 1 - 1/e \approx 0.63$ (pretty impressive), we can rely on it to build a partial solution by gradually increasing $k$ (see the Possible Solutions section below).

3.3. Sample problem

Since $S' \subseteq S$, when implementing this relation in code, the solution will be represented as a list of indexes of the selected items of $S$ (to avoid unnecessary copying):

$I = \{ i \mid i \in [1..m] \land S_i \in S' \}$

To better illustrate the formal definition, here is a sample problem:

```
// We're given a list of attestations, in incidence matrix form.
// Those attestations are essentially instances of S_i:
S = [
    [0 0 1 0 1 0], // participating bits are at columns: 3, 5
    [0 1 0 0 0 0],
    [0 0 0 0 1 0],
    [1 0 0 0 0 0],
    [0 0 1 0 0 1],
]

// Union of all *column indexes* where the participation bit is set to 1
// for at least one attestation represents the universe set, U:
U = [1, 2, 3, 5, 6] // within the given list of attestations, no bit is set at column 4

// We want to come up with a subcollection of attestations
// which are disjoint and represent the maximum number of validators
// specified in the initial list of attestations, S':
I = [2, 3, 4, 5] // merge rows 2, 3, 4, 5
```
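As a sanity check of this sample, here is a Python sketch (the helper `to_sets` is hypothetical) that converts the incidence matrix into sets of 1-based column indexes, derives $U$, and confirms that rows 2, 3, 4, 5 are pairwise disjoint and cover all of $U$. Disjointness is tested by comparing the sum of the set sizes with the size of their union, which are equal exactly when no element is shared.

```python
from typing import List, Set

# Illustrative sketch; `to_sets` is a hypothetical helper.
S = [
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
]


def to_sets(matrix: List[List[int]]) -> List[Set[int]]:
    """Row i -> set of 1-based column indexes where the bit is set."""
    return [{j + 1 for j, bit in enumerate(row) if bit} for row in matrix]


sets = to_sets(S)
U = set().union(*sets)  # columns set in at least one row
I = [2, 3, 4, 5]
chosen = [sets[i - 1] for i in I]

disjoint = sum(len(s) for s in chosen) == len(set().union(*chosen))
print(sorted(U))                            # [1, 2, 3, 5, 6]
print(disjoint, set().union(*chosen) == U)  # True True
```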

4. Possible Solutions

So, our problem is closely related to set-cover-type problems, for which several approaches exist, none of which is guaranteed to find an optimal solution in polynomial time.

Several closely related NP/NP-hard problems (and their variants) have been considered:

The Exact Cover and Maximum Coverage problems seem to be the most relevant to what we have at hand, and with some adaptation can be utilized to solve AA.

4.1. Set Cover

The Set Cover problem is one of Karp’s 21 NP-Complete problems.

It seems natural to start with the base covering problem because it serves as a foundation for other problems, it has a greedy solver whose result is within a factor of about $\ln(n)$ of optimal, and with some effort that greedy solver can even be made to run in linear time!

Def (Minimum Set Cover): $MSC(U, S) \to S'$

Let $U$ be a finite set of objects, where $|U| = n$. Furthermore, let $S = \{S_1, \dots, S_m \mid S_i \subseteq U\}$ be a collection of its subsets, where $\bigcup_{i=1}^m S_i = U$.

Then, Minimum Set Cover (MSC) is the problem of covering $U$ with a subset $S' \subseteq S$ s.t. $|S'|$ is minimal.

Framed like this, the problem doesn't fully capture attestation aggregation. While MSC produces a cover of $U$, $S'$ may contain subsets with overlapping elements of $U$, and as such can't be used as input to the aggregation function. So, we need to add an extra constraint: all elements of $S'$ must be pairwise disjoint.
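A small Python sketch of the textbook greedy MSC heuristic makes the issue visible (a plain quadratic version, not the linear-time variant; names are hypothetical). With ties broken by index, it selects two sets that overlap, even though the disjoint pair {1, 2, 3} and {4} also covers $U$.

```python
from typing import List, Set


# Hypothetical sketch of greedy Minimum Set Cover (plain, not linear-time).
def greedy_set_cover(U: Set[int], S: List[Set[int]]) -> List[int]:
    """Repeatedly pick the set covering the most still-uncovered elements.
    Returns 1-based indexes into S; ties go to the lowest index."""
    uncovered = set(U)
    chosen: List[int] = []
    while uncovered:
        best = max(range(len(S)), key=lambda i: len(S[i] & uncovered))
        if not S[best] & uncovered:
            raise ValueError("S does not cover U")
        chosen.append(best + 1)
        uncovered -= S[best]
    return chosen


U = {1, 2, 3, 4}
S = [{1, 2, 3}, {3, 4}, {4}]
print(greedy_set_cover(U, S))  # [1, 2]: {1, 2, 3} and {3, 4} overlap on 3
```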

Relevant Works:

Def (Minimum Membership Set Cover): $MMSC(U, S, k) \to S'$

The same set system as in MSC, with an additional requirement on how many times each $u \in U$ can occur in elements of $S'$, i.e. $\max_{u \in U} |\{T \in S' \mid u \in T\}| \le k$, for a positive integer $k \in [1..m]$.

Relevant Works:

Applicability of MMSC:

  • The decision version of the problem (whether such an $S'$ exists) can be used to check for a cover.
  • When used as $MMSC(U, S, 1)$, i.e. limiting each $u \in U$ to a single occurrence, we effectively transform the problem into an Exact Cover variant (which matches our ideal case exactly).
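The membership bound is easy to evaluate for a candidate $S'$, and with $k = 1$ it is exactly the pairwise-disjointness requirement of AA. A minimal Python sketch (hypothetical helper names):

```python
from collections import Counter
from typing import List, Set


# Hypothetical helpers for the MMSC membership constraint.
def max_membership(chosen: List[Set[int]]) -> int:
    """max over u of |{T in S' : u in T}|; 0 for an empty subfamily."""
    counts = Counter(u for T in chosen for u in T)
    return max(counts.values(), default=0)


def satisfies_mmsc(chosen: List[Set[int]], k: int) -> bool:
    return max_membership(chosen) <= k


print(max_membership([{1, 2, 3}, {3, 4}]))    # 2: element 3 appears twice
print(satisfies_mmsc([{1, 2, 3}, {4}], k=1))  # True: pairwise disjoint
```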

Another variant worth mentioning is Partial Set Cover, where we again look for an $S'$ of minimal cardinality (just as in MSC) which covers at least $k$ elements of the universe $U$.

Def (Partial Set Cover): $PSC(U, S, k) \to S'$

Consider the same set system as in MSC, with an additional parameter $k \in [1..n]$. Then, Partial Set Cover (PSC) is the problem of finding $S' \subseteq S$ of minimal cardinality that covers at least $k$ elements of $U$.

Partial Set Cover (PSC) vs Maximum Coverage (MC)

PSC differs from the Maximum Coverage problem in a subtle way: in MC we bound the number of subsets, $|S'| \le k$, and maximize the number of covered elements of $U$; in PSC we set a lower bound on how many elements are covered, $|\bigcup_{T \in S'} T| \ge k$, and minimize the cardinality of $S'$.
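For intuition, PSC is commonly approached with the same greedy rule as MSC, simply stopping once $k$ elements are covered. The Python sketch below (hypothetical names) illustrates that; note that for $k = 6$ it again ends up with overlapping sets, which is why the disjointness constraint still has to be added for AA.

```python
from typing import List, Set


# Hypothetical sketch of a greedy heuristic for Partial Set Cover.
def greedy_partial_cover(S: List[Set[int]], k: int) -> List[int]:
    """Add the set with the most new elements until at least k elements are
    covered. Returns 1-based indexes into S; ties go to the lowest index."""
    covered: Set[int] = set()
    chosen: List[int] = []
    while len(covered) < k:
        best = max(range(len(S)), key=lambda i: len(S[i] - covered))
        if not S[best] - covered:
            raise ValueError("fewer than k elements can be covered")
        chosen.append(best + 1)
        covered |= S[best]
    return chosen


S = [{1, 2, 3}, {3, 4}, {4}, {5, 6}]
print(greedy_partial_cover(S, k=4))  # [1, 4]
print(greedy_partial_cover(S, k=6))  # [1, 4, 2]: sets 1 and 2 share element 3
```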

Relevant Works:

Applicability of PSC:

  • Again, the decision version can be useful to check the boundaries of $S'$ existence (by gradually increasing $k$). With $k = |U|$ we effectively have the MSC problem. For PSC to be really useful, we also need to constrain the number of occurrences of $u \in U$ within elements of $S'$, so that all subsets in $S'$ are pairwise disjoint.

4.2. Exact Cover

The Exact Cover problem is one of Karp’s 21 NP-Complete problems.

When an exact cover exists within a given set system, Exact Cover abstracts attestation aggregation perfectly. The problem is that perfectly non-overlapping partitions of $U$ do not occur naturally in our system (so making them happen could be one way of attacking the problem; this is an open question).

Def (Exact Cover): $EC(U, S) \to S'$

Let $U$ be a finite set of objects, where $|U| = n$. Furthermore, let $S = \{S_1, \dots, S_m \mid S_i \subseteq U\}$ be a collection of its subsets, where $\bigcup_{i=1}^m S_i = U$.

Then, Exact Cover (EC) is the problem of covering $U$ with a subset $S' \subseteq S$ s.t. $S'_i \cap S'_j = \emptyset, \forall i, j \in [1..|S'|], i \ne j$.

This NP-complete problem has a nondeterministic backtracking solver, Algorithm X by D. Knuth, which is capable of finding all solutions (i.e. all exact covers) of a given instance.
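For reference, here is a compact Python sketch of Algorithm X that replaces Knuth's dancing links with dictionaries of sets (a well-known simplification, not DLX itself; all names are hypothetical). Rows are attestations, columns are validators, and every $S_i$ is assumed to be a subset of $U$ as in the definition. It yields every exact cover.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical dict-of-sets variant of Algorithm X (not DLX itself).
Columns = Dict[int, Set[int]]  # element u -> ids of rows containing u
Rows = Dict[int, List[int]]    # row id -> elements it covers


def algorithm_x(X: Columns, Y: Rows, partial: List[int]) -> Iterator[List[int]]:
    """Yield every exact cover; X is mutated during search and restored."""
    if not X:  # every element covered exactly once
        yield list(partial)
        return
    c = min(X, key=lambda col: len(X[col]))  # most constrained element first
    for r in list(X[c]):  # snapshot: X[c] changes during recursion
        partial.append(r)
        removed = select(X, Y, r)
        yield from algorithm_x(X, Y, partial)
        deselect(X, Y, r, removed)
        partial.pop()


def select(X: Columns, Y: Rows, r: int) -> List[Set[int]]:
    """Take row r: drop its elements and every row clashing with it."""
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed


def deselect(X: Columns, Y: Rows, r: int, removed: List[Set[int]]) -> None:
    """Undo select() in reverse order."""
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)


def exact_covers(U: Set[int], S: List[Set[int]]) -> List[List[int]]:
    """All exact covers of U by members of S, as sorted 1-based index lists."""
    Y: Rows = {i + 1: sorted(s) for i, s in enumerate(S)}
    X: Columns = {u: set() for u in U}
    for i, elems in Y.items():
        for u in elems:
            X[u].add(i)
    return [sorted(sol) for sol in algorithm_x(X, Y, [])]


# Sample problem from section 3.3: the only exact cover is rows 2, 3, 4, 5.
print(exact_covers({1, 2, 3, 5, 6}, [{3, 5}, {2}, {5}, {1}, {3, 6}]))
```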

However, having an $S$ that contains a subcollection of pairwise disjoint subsets covering $U$ completely is a rare stroke of luck in our system (whether it is actually rare is an open question). More often than not, $S$ will not contain a solution to EC. In such cases, we still want a partial solution, even if only some of the attesters can be collected within a single aggregation.

So, by adding a constraint similar to MMSC (where we limited the number of times $u \in U$ can occur in $S'$), we need to transform the problem to accept another parameter $k \in [1..n]$, with the purpose of finding an $S'$ where $|\bigcup_{T \in S'} T| \ge k$, i.e. the union of the found subsets covers at least $k$ elements of $U$. Then, by gradually increasing $k$, we want to get as close to $|U|$ as possible (possibly a max $k$-cover variant; an open question).
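The "gradually increase $k$" idea can be sketched in Python as a backtracking decision procedure for "is there a pairwise disjoint subfamily covering at least $k$ elements?", wrapped in a loop over $k$. This is an exponential-time illustration under hypothetical names, not a proposed production algorithm.

```python
from typing import List, Optional, Set


# Hypothetical sketch: decision version + increasing-k loop (exponential).
def disjoint_cover_at_least(S: List[Set[int]], k: int) -> Optional[List[int]]:
    """Backtracking search for pairwise disjoint members of S whose union has
    at least k elements. Returns 1-based indexes, or None if none exist."""
    def search(start: int, covered: Set[int], chosen: List[int]) -> Optional[List[int]]:
        if len(covered) >= k:
            return list(chosen)
        for i in range(start, len(S)):
            if S[i] and not (S[i] & covered):
                chosen.append(i + 1)
                found = search(i + 1, covered | S[i], chosen)
                if found is not None:
                    return found
                chosen.pop()
        return None

    return search(0, set(), [])


def max_disjoint_cover(S: List[Set[int]]) -> List[int]:
    """Raise k from 1 until the decision version fails; keep the last success."""
    best: List[int] = []
    n = len(set().union(*S))
    for k in range(1, n + 1):
        found = disjoint_cover_at_least(S, k)
        if found is None:
            break
        best = found
    return best


# No exact cover exists here; the best partial cover reaches 4 of 5 elements.
print(max_disjoint_cover([{1, 2}, {2, 3}, {3, 4}, {4, 5}]))  # [1, 3]
```

Re-running the search for every $k$ is wasteful; a single search that tracks the best union found so far would give the same answer, but the loop mirrors the incremental formulation above.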

Relevant Works:

Applicability of EC:

  • If a solution exists, then Algorithm X (efficiently implemented using DLX, i.e. Dancing Links) can find it. If a full solution is impossible, we need to explore the possibility of finding a partial cover.

4.3. Maximum Coverage

Def (Maximum Coverage): $MC(U, S, k) \to S'$

Let $U$ be a finite set of objects, where $|U| = n$. Furthermore, let $S = \{S_1, \dots, S_m \mid S_i \subseteq U\}$ be a collection of its subsets, where $\bigcup_{i=1}^m S_i = U$.

Then, Maximum Coverage (MC) is the problem of finding $S' \subseteq S$, $|S'| \le k$, whose union covers as many elements of $U$ as possible, that is:

$S' = \operatorname{argmax}_{X \subseteq S,\ |X| \le k} |\bigcup_{T \in X} T|$
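The classical greedy algorithm for MC, which guarantees at least a $1 - 1/e \approx 0.63$ fraction of the optimal coverage, is short enough to sketch in Python (hypothetical names). Note that it happily picks overlapping sets:

```python
from typing import List, Set


# Hypothetical sketch of the standard greedy for Maximum Coverage.
def greedy_max_coverage(S: List[Set[int]], k: int) -> List[int]:
    """Up to k times, take the set adding the most uncovered elements.
    Returns 1-based indexes into S; ties go to the lowest index."""
    covered: Set[int] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(S))):
        best = max(range(len(S)), key=lambda i: len(S[i] - covered))
        if not S[best] - covered:
            break  # nothing left to gain
        chosen.append(best + 1)
        covered |= S[best]
    return chosen


print(greedy_max_coverage([{1, 2, 3}, {3, 4}, {4}], 2))  # [1, 2]: overlap on 3
```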

Relevant Works:

Applicability of MC:

  • With the additional requirement $S'_i \cap S'_j = \emptyset, \forall i, j \in [1..|S'|], i \ne j$ (pairwise disjoint sets in $S'$), we get a very useful mechanism for building approximate solutions using a greedy approach.
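A minimal Python sketch of that modification (hypothetical names): at each step only sets disjoint from everything chosen so far are eligible, so the largest eligible set adds the most new validators. This restricted greedy is a heuristic and has no $1 - 1/e$ guarantee in general, but on the contrived example from section 2 it recovers the full aggregate that the order-dependent merge missed.

```python
from typing import List, Set


# Hypothetical sketch: greedy MC restricted to pairwise disjoint picks.
def greedy_disjoint_coverage(S: List[Set[int]], k: int) -> List[int]:
    """Up to k times, take the largest set disjoint from everything covered.
    Returns 1-based indexes into S; ties go to the lowest index."""
    covered: Set[int] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(S))):
        candidates = [i for i in range(len(S)) if S[i] and not (S[i] & covered)]
        if not candidates:
            break
        best = max(candidates, key=lambda i: len(S[i]))
        chosen.append(best + 1)
        covered |= S[best]
    return chosen


# Contrived example from section 2, rows as sets of 1-based columns.
print(greedy_disjoint_coverage([{3}, {4}, {5}, {1, 2, 5}], 4))  # [4, 1, 2]
```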

Summary and Further Work

So, possible solutions can be enumerated as follows:

  • Exact Cover (EC)
    • Can be used to check for solutions, provided that cases where a perfect solution exists are not rare.
    • If combined with Partial Set Cover (PSC) for partial cover solutions, can match Attestation Aggregation perfectly.
  • Maximum Coverage (MC)
    • Greedy algorithm + additional constraint of disjoint sets in $S'$.
    • Gradually increasing $k$ ($1 \to |S|$) to obtain a maximal cover from as many of the available attestations as possible.

Some items to consider:

  • We need a section with formal runtime analysis, just to make sure the upper bounds of what can be done are known.
  • As with many NP-hard problems, one can get a polynomial-time algorithm if the solution size is fixed (parameterized complexity). Say, if we only look at solutions $S'$ of size 4, we are dealing with $O(|S|^4)$ candidates, even if we enumerate every possible 4-tuple. We might consider some optimization based on this.
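A Python sketch of this idea (hypothetical names): for a fixed bound $c$ on $|S'|$, enumerate all subfamilies with at most $c$ members and keep the best pairwise disjoint one. There are $O(|S|^c)$ candidates, so the search is polynomial for fixed $c$, with an $O(c \cdot m)$ check per candidate.

```python
from itertools import combinations
from typing import List, Set


# Hypothetical sketch of bounded-size (parameterized) enumeration.
def best_disjoint_of_size_at_most(S: List[Set[int]], c: int) -> List[int]:
    """Try every subfamily with at most c members and return the pairwise
    disjoint one with the largest union, as 1-based indexes into S."""
    best: List[int] = []
    best_size = 0
    for r in range(1, c + 1):
        for combo in combinations(range(len(S)), r):
            sets = [S[i] for i in combo]
            union = set().union(*sets)
            # Disjoint exactly when no element is counted twice.
            if sum(map(len, sets)) == len(union) and len(union) > best_size:
                best, best_size = [i + 1 for i in combo], len(union)
    return best


S = [{3}, {4}, {5}, {1, 2, 5}]  # contrived example from section 2
print(best_disjoint_of_size_at_most(S, 2))  # [1, 4]
print(best_disjoint_of_size_at_most(S, 3))  # [1, 2, 4]
```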
  • Since we're dealing with NP-hard problems, runtime complexity will almost certainly be a concern. Another possible avenue of attack is to rely on parallel computing.
  • Consider combining algorithms or running individual algorithms in rounds.
  • There are a number of highly effective aggregation heuristics that rely on how data is transmitted, i.e. instead of concentrating on covering arbitrary set systems, we try to come up with a heuristic that results in preferable attestations propagating through the network (see Heuristically Partitioned Attestation Aggregation for a very interesting approach).