Recently proponents of transaction "filtering" have started sybil attacking
Libre Relay nodes by running nodes with their "garbageman" fork¹. This fork
falsely advertise the NODE_LIBRE_RELAY service bit, silently discards
transactions that would be relayed by real Libre Relay nodes, and does not
provide any. Additionally, they have made clear that they intend to ramp up
this sybil attack with the aim of preventing people people from getting
transactions that they disagree with mined:

	The costs will increase even more once Libre Relay’s DoS attacks on
	bitcoin are countered by enough defensive nodes.
	-Chris Guida https://delvingbitcoin.org/t/addressing-community-concerns-and-objections-regarding-my-recent-proposal-to-relax-bitcoin-cores-standardness-limits-on-op-return-outputs/1697/4

They have also put effort into making the attack more than a simple proof of
concept, e.g. by adding code that attempts to make it more difficult to detect
attacking nodes, by keeping track of transactions received from peers, and then
replying to inv messages with those transactions even when they were
discarded².

With this attack in mind, I thought this would be a good opportunity to review
the math on how effective this type of attack is, as well as some of the
mitigations that could be implement to defeat sybil attacks on transaction
relaying. In particular, I'll present a defense to sybil attacks that is
sufficiently powerful that it may even negate the need for preferential peering
techniques like the NODE_LIBRE_RELAY bit. 

Note that I don't deserve credit for any of these ideas. I'm just putting down
in writing some ideas from Gregory Maxwell and others.


# The Effectiveness of Sybil Attacks on Transaction Relaying 

Non-listening nodes make a certain number of outgoing, transaction relaying,
connections to listening nodes. In the case of Bitcoin Core, 8 outgoing
transaction relaying nodes; in the case of Libre Relay, an additional 4
outgoing connections to other Libre Relay nodes to relay transactions relevant
to them.

For a sybil attack to succeed against a non-listing node, every one of the N
outgoing connections must be either a sybil attacking node, or a listening node
that itself has been defeated by sybil attack. Additionally, Bitcoin Core makes
outgoing IPv4 and IPv6 connections to a diversity of address space, so the
sybil attacking nodes need to themselves be running on a diverse set of IP
addresses (this is not that difficult to achieve with VPS providers these
days). Thus if the sybil attacking nodes are a ratio of q to all nodes, the
probability of the attack succeeding is q^N.

Against Libre Relay, N=4, this means that the attacker needs to be running ~84%
of all NODE_LIBRE_RELAY advertising nodes to have an attack success probability
of ~50%. Based on information from my Bitcoin seed node, there appear to be
about 15 Libre Relay nodes, so for a 50% attack success probability the
attackers would need to run about 85 attack nodes. If N was increased to 8, the
attackers would need about 172 nodes to achieve the same success rate.

Against *listening* nodes a different type of attack is necessary. The reason
for this is that defenders can easily defeat sybil attacks against listening
nodes by simply connecting to ~all listening nodes at once to ensure that
transaction propagation succeeds. Of course, the attacker can in turn do things
like attempt to exhaust connection slots of Libre Relay nodes, or simply DoS
attack them with packet floods. But those are different types of attack than
the sybil attack we are discussing here.


# Prior Art: Defeating Block Propagation Sybil Attack

Bitcoin Core already includes a defense against sybil attack for block
propagation: the feeler node system. Basically, every ~2 minutes an outgoing
connection is made to a gossiped address to check if a connection can be made;
successful connections are recorded in a table of "tried" addresses. If no new
blocks have been received for 30 minutes, these tried addresses are then used
every 10 minutes to try to find a peer that does know about a new block. 

Since this process goes on indefinitely, so long as outgoing connections are
themselves not censored (e.g. by the ISP), the node should eventually find a
non-sybil attacking node and learn about the true most-work chain. Even in
normal operation periods of >30minutes between blocks are fairly common, so
this defense will (eventually) work even if a forked chain exists with some
hash power extending it.

This approach is relatively straightforward for block propagation, as there is
a clear metric: the most-work chain. Peers that aren't giving you the most-work
chain can be ignored, and new peers found.  Proof-of-work's inherently
self-validating property means that doing this is cheap and straight forward.


# Directionality

A subtlety to the information censorship sybil attack is there are actually two
different simultaneous attacks: the attack on preventing you from learning
about new information, and the attack on preventing you from distribute new
information to others.

With block propagation, most nodes most directly care about the first class of
attack: they want to learn about the most-work chain, and do not want that
information censored from them.

For miners, in addition to knowing what the most-work chain is, they
(typically³) have a strong incentive to get their new blocks to all nodes as
quickly as possible. Also, all nodes have at least some incentive to do this as
Bitcoin will not function properly if miners are getting censored.

These attacks are not the same! The most-work-chain metric is only directly
detecting and preventing the first class of attack. It only prevents the second
attack indirectly, by making it easier for honest nodes to learn about new
blocks and attempt to themselves propagate that information further.


# Most Fees Metric

For transaction relaying, the moral equivalent to the most-work chain metric
are metrics based on the amount of new transaction fees that peers are
advertising to you. Unfortunately this isn't as straightforward to implement as
the most-work chain metric for a few reasons:

1) Resolution: differences in chain work are very clear, with even a single
   additional block being a very significant difference. For transaction relaying,
   we'd like to be able to successfully relay transaction types that only add a
   small % to total fees.
2) Bandwidth: a chain of 80 byte headers is sufficient to prove most-work;
   transactions are much larger.
3) Double-spends: mempools are not a consensus. Your peers may have
   transactions that conflict with your transactions, yet in ways that don't
   constitute a worthwhile RBF replacement (e.g. two different transactions
   with the same fees and fee-rate).

For example, one straight-forward approach would be to simply keep track of a
decaying average of new fees/sec each peer had advertised to you prior to you
advertising the transaction to them. Periodically, you could drop the peer with
the lowest new fees/sec ranking, and then connect to a new peer.

However, it's not clear that this approach has sufficient resolution to
actually detect censorship of relatively uncommon transaction types.
Additionally, since transaction broadcasting is a one-shot event - we don't
have a mempool synchronization mechanism - this approach may not work well if
transaction demand is bursty.


# Most-Fees Next (Dobule) Block Mempool

With the upcoming cluster mempool functionality that is expected to be added to
Core in the near future, transactions will be stored in memory in clusters
ordered by fees: essentially the order in which optimal blocks would be
created. This will make it computationally cheap to determine what the optimal
next block (or blocks) will be by simply iterating through transactions in
order, and stopping when N weight worth of transactions have been found.

Thus nodes can cheaply compute the total fees in the top one or two blocks
worth of transactions they currently have in their mempool, and advertise this
fact to their peers. Finally, to prevent lying, we can add a mechanism for a
peer to get a copy of all these transactions to ensure that they're not missing
out on anything paying enough fees to get mined soon.

While beyond the scope of this summary, there are many set-reconciliation
techniques available to do this in a bandwidth efficient manner. Basically,
through the existing transaction relay mechanisms we can expect mempools to be
relatively consistent between nodes. Thus, to get all transactions that your
peer has for the next block or two that you do not, you just need to transfer
the deltas between their next-block(s) mempool and yours.

Concretely, suppose we do this with the next two blocks worth of transactions.
At worst, each node would need to periodically create a maximum 8MB serialized
"double-block", using up to 8MB of ram. Secondly, to apply this to all outgoing
connections, you'd need to periodically use a set-reconciliation protocol to
download the differences between each of your outgoing peers' double-blocks,
and attempt to add any newly discovered transactions to your mempool. At worst
for 8 peers this would be 64MB of useless data to download, assuming every
single transaction was a conflicting double-spend. Not great. But not that bad.

As with the average fees idea, periodically you would drop the peer advertising
the lowest double-block of fees, and then connect to a new peer to see if
they're better.

Now consider what happens if you are sybil attacked. Due to RBF, with
synchronous mempools across different nodes with the same standardness policies
will have very similar transaction sets; even without active synchronization
long-running mempools across different nodes are already very similar in terms
of total fees. Thus even a small difference in transaction relay policy will
show up as missing transactions. This difference will translate into the sybil
attacking node(s) getting dropped, and honest nodes with policy compatible with
yours eventually being found.


## Peers With More Liberal Relay Policy

If you apply set reconciliation to a peer with a *more* liberal relay policy
than you, they'll have transactions that you will not accept. For example,
imagine the case of a peer that now accepts a new version number.

One way to deal with this could be to just drop peers that give you
transactions that you consider non-standard. So long as reconciliation is only
applied to a subset of all transaction relaying peers, this is fine. Indeed,
even if this is applied to all transaction relaying peers, Bitcoin Core already
connects to additional peers in blocks-only mode. So you'll still get send and
receive blocks and maintain consensus.


## Privacy

Tracking what transactions are in mempools is a potential way for attackers to
trace transactions back to their origin. Provided that set-reconciliation is
only a secondary transaction relay mechanism, with sufficient time delays, this
should not impact privacy as under normal operation transactions will have
already propagated widely making the set reconciliation data non-sensitive.


# Manual Peering With Known-Honest Friendly Nodes

More of a social solution than a technical solution, we should encourage people
to manually peer with other nodes they have a personal relationship with.  This
is a powerful technique against sybil attacks for the simple reason that
person-to-person relationships can evaluate honesty in much more powerful ways
than any code could possibly do so.

At the moment, actually doing this is inconvenient. Ideally we would have a
mechanism where node operators could get a simple pubkey@address connection
string from their node to tell to their friends, and equally, import that same
connection string into their bitcoin.conf. This mechanism should use some kind
of node identity to defeat MITM attacks, and also ensure that connection limits
are bypassed for friendly nodes. The existing addnode mechanism doesn't quite
achieve this. Notably, without a node identity mechanism, there's no way for
someone with a static IP address to whitelist a friend's node with a non-static
IP address.


# Footnotes

1) Chris Guida's "garbageman" branch: https://github.com/chrisguida/bitcoin/tree/garbageman,
   first presented at the btc++ mempool edition (2025) hackathon
2) https://github.com/chrisguida/bitcoin/commit/e9a921c045d64828a5f0de58d8f2706848c48fd2?s=09
3) https://petertodd.org/2016/block-publication-incentives-for-miners

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/aDWfDI03I-Rakopb%40petertodd.org.