I thought I'd chime in and point out some research results that might help.
might find interesting.
The problem being tackled here is very similar to "set reconciliation," where
peer A thinks that the set of transactions that should be in the block is S_A,
and peer B has actually included set S_B, and S_A and S_B are expected
to not differ much. Ideally, one would like the communication complexity
between A and B to be O(delta), not O(S_B) as it is right now. And ideally,
one would like B to send a single message to A, and for A to figure out the
difference between the two sets, without any lengthy back and forth
communication. In essence, I would like to give you some magical packet
that is pretty small and communicates just the delta between what you and
I know.
This paper from Cornell describes a scheme for achieving this:
Yaron Minsky, Ari Trachtenberg, Richard Zippel: Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory 49(9): 2213-2218 (2003)
Those of you looking for a TL;DR should read the intro and then skip to
page 8 for the example. The underlying trick is very cool, comes from the
peer-to-peer/gossip literature, and it is underused. It'd be really cool if it
could be applied to this problem to reduce the size of the packets.
This approach has three benefits over the Bloom filter approach (if I
understand the Bloom filter idea correctly):
(1) Bloom filters require packets that are still O(S_A),
(2) Bloom filters are probabilistic, so require extra complications
when there is a hash collision. In the worst case, A might get confused
about which transaction B actually included, which would lead to a
fork. (I am not sure if I followed the Bloom filter idea fully -- this may
not happen with the proposal, but it's a possibility with a naive Bloom
filter implementation)
(3) Bloom filters are interactive, so when A detects that B has included
some transactions that A does not know about, it has to send a message
to figure out what those transactions are.
Set reconciliation is O(delta), non-probabilistic, and non-interactive. The
naive version requires that one have some idea of the size of the delta,
but I think the paper has some discussion of how to handle the delta
estimate.
I have not gone through the full exercise of actually applying this trick to
the Bitcoin p2p protocol yet, but wanted to draw your attention to it.
If someone is interested in applying this stuff to Bitcoin, I'd be happy
to communicate further off list.
- egs