From: Eric Voskuil <eric@voskuil.org>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
Date: Sat, 20 Jul 2024 13:29:53 -0700 (PDT)
Message-ID: <926fdd12-4e50-433d-bd62-9cc41c7b22a0n@googlegroups.com>
In-Reply-To: <ac6cc3b8-43e5-4cd6-aabe-f5ffc4672812n@googlegroups.com>
Hi Antoine R,
>> While at some level the block message buffer would generally be
referenced by one or more C pointers, the difference between a valid
coinbase input (i.e. with a "null point") and any other input, is not
nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes
followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was
better (or easier) to serialize a first block tx (coinbase) with an input
containing an unusable script and pointing to an invalid [tx:index] tuple
(input point) as opposed to just not having any input. That invalid input
point is called a "null point", and of course cannot be pointed to by a
"null pointer". The coinbase must be identified by comparing those 36 bytes
to the well-known null point value (and if this does not match the Merkle
hash cannot have been type64 malleated).
> Good for the clarification here, I had in mind the core's `CheckBlock`
path where the first block transaction pointer is dereferenced to verify if
the transaction is a coinbase (i.e a "null point" where the prevout is
null). Zooming out and back to my remark, I think this is correct that
adding a new 64 byte size check on all block transactions to detect block
hash invalidity could be a low memory overhead (implementation dependent),
rather than making that 64 byte check alone on the coinbase transaction as
in my understanding you're proposing.
I'm not sure what you mean by stating that a new consensus rule, "could be
a low memory overhead". Checking all tx sizes is far more overhead than
validating the coinbase for a null point. As AntoineP agreed, it cannot be
done earlier, and I have shown that it is *significantly* more
computationally intensive. It makes the determination much more costly, and
in all other cases it adds a check that serves no purpose.
>>> The second one is the bip141 wtxid commitment in one of the coinbase
transaction `scriptpubkey` output, which is itself covered by a txid in the
merkle tree.
>> While symmetry seems to imply that the witness commitment would be
malleable, just as the txs commitment, this is not the case. If the tx
commitment is correct it is computationally infeasible for the witness
commitment to be malleated, as the witness commitment incorporates each
full tx (with witness, sentinel, and marker). As such the block identifier,
which relies only on the header and tx commitment, is a sufficient
identifier. Yet it remains necessary to validate the witness commitment to
ensure that the correct witness data has been provided in the block message.
>>
>> The second type of malleability, in addition to type64, is what we call
type32. This is the consequence of duplicated trailing sets of txs (and
therefore tx hashes) in a block message. This is applicable to some but not
all blocks, as a function of the number of txs contained.
> To be more precise about your statement describing the sources of
malleability: the witness stack can be malleated, altering the wtxid, and
still be valid. I think you can still have the case where you're fed a block
header with a merkle root commitment deserializing to a valid coinbase
transaction with an invalid witness commitment. This is the case of a "block
message with valid header but malleated committed valid tx data". Validation
of the witness commitment to ensure the correct witness data has been
provided in the block message is indeed necessary.
I think you misunderstood me. Of course the witness commitment must be
validated (as I said, "Yet it remains necessary to validate the witness
commitment..."), as otherwise the witnesses within a block can be anything
without affecting the block hash. And of course the witness commitment is
computed in the same manner as the tx commitment and is therefore subject
to the same malleations. However, because the coinbase tx is committed to
the block hash, there is no need to guard the witness commitment for
malleation. And to my knowledge nobody has proposed doing so.
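To illustrate the shared Merkle construction and the type32 malleation mentioned in the quoted text, here is a minimal Python sketch (not from libbitcoin). The leaf values are made-up stand-ins for tx hashes, and `merkle_root` is a hypothetical helper name; it only assumes the usual rule that an odd-length level duplicates its last hash.

```python
import hashlib

def sha256d(data):
    """Double SHA-256, as used for Bitcoin tx and Merkle node hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(hashes):
    """Merkle root where an odd-length level duplicates its last hash."""
    if not hashes:
        raise ValueError("at least one hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Three made-up leaf hashes standing in for txids.
leaves = [sha256d(bytes([i])) for i in range(3)]

# Type32 malleation: repeating the trailing hash leaves the root unchanged.
malleated = leaves + [leaves[-1]]
same_root = merkle_root(leaves) == merkle_root(malleated)
```

Here `same_root` is true even though the two lists differ in length, which is why the root alone cannot distinguish the two block messages.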
>>> I think I mostly agree with the identity issue as laid out so far,
there is one caveat to add if you're considering identity caching as the
problem solved. A validation node might have to consider differently block
messages processed if they connect on the longest most PoW valid chain for
which all blocks have been validated. Or alternatively if they have to be
added on a candidate longest most PoW valid chain.
>> Certainly an important consideration. We store both types. Once there is
a stronger candidate header chain we store the headers and proceed to
obtaining the blocks (if we don't already have them). The blocks are stored
in the same table; the confirmed vs. candidate indexes simply point to them
as applicable. It is feasible (and has happened twice) for two blocks to
share the very same coinbase tx, even with either/all bip30/34/90 active
(and setting aside future issues here for the sake of simplicity). This
remains only because two competing branches can have blocks at the same
height, and bip34 requires only height in the coinbase input script. This
therefore implies the same transaction but distinct blocks. It is however
infeasible for one block to exist in multiple distinct chains. In order for
this to happen two blocks at the same height must have the same coinbase
(ok), and also the same parent (ok). But this then means that they either
(1) have distinct identity due to another header property deviation, or (2)
are the same block with the same parent and are therefore in just one
chain. So I don't see an actual caveat. I'm not certain if this is the
ambiguity that you were referring to. If not please feel free to clarify.
> If you assume no network partition and the no blocks more than 2h in the
future consensus rule, I cannot see how one block with no header property
deviation can exist in multiple distinct chains.
It cannot, that was my point: "(1) have distinct identity due to another
header property deviation, or (2) are the same block..."
> The ambiguity I was referring was about a different angle, if the design
goal of introducing a 64 byte size check is to "it was about being able to
cache the hash of a (non-malleated) invalid block as permanently invalid to
avoid re-downloading and re-validating it", in my thinking we shall
consider the whole block headers caching strategy and be sure we don't get
situations where an attacker can attach a chain of low-pow block headers
with malleated committed valid tx data yielding a block invalidity at the
end, provoking as a side-effect a network-wide data download blowup. So I
think any implementation of the validation of a block validity, of which
identity is a sub-problem, should be strictly ordered by adequate
proof-of-work checks.
This was already the presumption.
>> We don't do this and I don't see how it would be relevant. If a peer
provides any invalid message or otherwise violates the protocol it is
simply dropped.
>>
>> The "problematic" that I'm referring to is the reliance on the block
hash as a message identifier, because it does not identify the message and
cannot be useful in an effectively unlimited number of zero-cost cases.
> Historically, it was to isolate transaction-relay from block-relay to
optimistically harden in face of network partition, as this is easy to
infer transaction-relay topology with a lot of heuristics.
I'm not seeing the connection here. Are you suggesting that tx and block
hashes may collide with each other? Or that a block message may be
confused with a transaction message?
> I think it is correct that the block hash cannot be relied on as a message
identifier, as it cannot be useful in an unlimited number of zero-cost
cases; I was pointing out that bitcoin core partially mitigates that by
discouraging connections to block-relay peers servicing block messages
(`MaybePunishNodeForBlocks`).
This does not mitigate the issue. It's essentially dead code. It's exactly
like saying, "there's an arbitrary number of holes in the bucket, but we
can plug a subset of those holes." Infinite minus any number is still
infinite.
> I believe somehow the bottleneck we're circling around is computationally
defining what are the "usable" identifiers for block messages. The most
straightforward answer to this question is the full block in one single
peer message, at least in my perspective.
I don't follow this statement. The term "usable" was specifically
addressing the proposal - that a header hash must uniquely identify a block
(a header and committed set of txs) as valid or otherwise. As I have
pointed out, this will still not be the case if 64 byte txs are
invalidated. It is also not the case that detection of type64 malleated
blocks can be made more performant if 64 byte txs are globally invalid. In
fact the opposite is true, it becomes more costly (and complex) and is
therefore just dead code.
> Reality since headers first synchronization (`getheaders`), block
validation has been dissociated in steps for performance reasons, among
others.
Headers first only defers malleation checks. The same checks are necessary
whether you perform blocks first or headers first sync (we support both
protocol levels). The only difference is that for headers first, a stored
header might later become invalidated. However, this is the case with and
without the possibility of malleation.
>> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a
tx pool, we just don't store them in RAM (memory).
>>
>> I don't follow this. An invalid 64 byte tx consensus rule would
definitely not make it harder to exploit block message invalidity. In fact
it would just slow down validation by adding a redundant rule. Furthermore,
as I have detailed in a previous message, caching invalidity does
absolutely nothing to increase protection. In fact it makes the situation
materially worse.
> Just to recall, in my understanding the proposal we're discussing is
about outlawing 64 bytes size transactions at the consensus-level to
minimize denial-of-service vectors during block validation. I think we're
talking past each other because the mempool already introduces a layer of
caching in bitcoin core, the results of which are re-used at block
validation, such as signature verification results. I'm not sure we can
fully waive performance considerations, though I agree implementation
architecture subsystems like the mempool should only be a sideline
consideration.
I have not suggested that anything is waived or ignored here. I'm stating
that there is no "mempool" performance benefit whatsoever to invalidating
64 byte txs. Mempool caching could only rely on tx identifiers, not block
identifiers. Tx identifiers are not at issue.
>> No, this is not the case. As I detailed in my previous message, there is
no possible scenario where invalidation caching does anything but make the
situation materially worse.
> I think it can be correct that invalidation caching makes the situation
materially worse, or is denial-of-service neutral, as I believe a full node
is only trading space for time resources in matters of block message
validation. I still believe such analysis, as detailed in your previous
message, would benefit from more detail.
I don't know how to add any more detail than I already have. There are
five relevant considerations:
(1) block hashes will not become unique identifiers for block messages.
(2) the earliest point at which type64 malleation can be detected will not
be reduced.
(3) the necessary cost of type64 malleation determination will not be
reduced.
(4) the additional consensus rule will increase validation cost and code
complexity.
(5) invalid blocks can still be produced at no cost that require full
double tx hashing/Merkle root computations.
Which of these statements are not evident at this point?
>> On the other hand, just dealing with parse failure on the spot by
introducing a leading pattern in the stream just inflates the size of p2p
messages, and the transaction-relay bandwidth cost.
>>
>> I think you misunderstood me. I am suggesting no change to
serialization. I can see how it might be unclear, but I said, "nothing
precludes incorporating a requirement for a necessary leading pattern in
the stream." I meant that the parser can simply incorporate the
*requirement* that the byte stream starts with a null input point. That
identifies the malleation or invalidity without a single hash operation and
while only reading a handful of bytes. No change to any messages.
> Indeed, this is clearer with the re-explanation above about what you
meant by the "null point".
Ok
> In my understanding, you're suggesting the following algorithm:
> - receive transaction p2p messages
> - deserialize transaction p2p messages
> - if the transaction is a coinbase candidate, verify null input point
> - if null input point pattern invalid, reject the transaction
No, no part of this thread has any bearing on p2p transaction messages -
nor are coinbase transactions relayed as transaction messages. You could
restate it as:
- receive block p2p messages
- if the first tx's first input does not have a null point, reject the block
> If I'm understanding correctly, the last rule has the effect of
constraining the transaction space that can be used to brute-force and mount
a Merkle root forgery with a 64-byte coinbase transaction.
>
> As described in the 3.1.1 of the paper:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
The above approach makes this malleation computationally infeasible.
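As a minimal Python sketch of why a 64 byte serialization is ambiguous in the first place (not from libbitcoin; the 64 bytes are arbitrary stand-in data, not a real transaction): hashing it as a leaf and hashing it as the concatenation of two 32 byte child hashes are the same operation. The offsets at the end assume the legacy (non-witness) coinbase layout of a 4 byte version followed by a 1 byte input count.

```python
import hashlib

def sha256d(data):
    """Double SHA-256, used for both txids and Merkle inner nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Arbitrary 64 bytes standing in for a 64 byte tx (not a real transaction).
tx64 = bytes(range(64))

# Read as a leaf: the txid of the 64 byte serialization.
as_leaf = sha256d(tx64)

# Read as an inner node: the parent of two 32 byte child hashes.
left, right = tx64[:32], tx64[32:]
as_inner_node = sha256d(left + right)

ambiguous = as_leaf == as_inner_node

# Assumed legacy coinbase layout: version (4), input count (1), null point (36).
null_point_start = 4 + 1
null_point_end = null_point_start + 36   # exclusive
fixed_in_left = 32 - null_point_start    # null point bytes in the first half
fixed_in_right = null_point_end - 32     # null point bytes in the second half
```

Under that assumed layout the null point spans both halves, so requiring it fixes 27 bytes of one would-be child hash and 9 bytes of the other.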
>> I'm referring to DoS mitigation (the only relevant security
consideration here). I'm pointing out that invalidity caching is pointless
in all cases, and in this case is the most pointless as type64 malleation
is the cheapest of all invalidity to detect. I would prefer that all bogus
blocks sent to my node are of this type. The worst types of invalidity
detection have no mitigation and from a security standpoint are
counterproductive to cache. I'm describing what overall is actually not a
tradeoff. It's all negative and no positive.
> I think we're both discussing the same issue about DoS mitigation for
sure. Again, I think that saying the "invalidity caching" is pointless in
all cases cannot be fully grounded as a statement without specifying (a)
what the internal cache(s) layout is of the full node processing block
messages and (b) the sha256 mining resources available during N difficulty
period and whether any miner engages in a selfish-mining-like strategy.
It has nothing to do with internal cache layout and nothing to do with
mining resources. Not having a cache is clearly more efficient than having
a cache that provides no advantage, regardless of how the cache is laid
out. There is no cost to forcing a node to perform far more block
validation computations than can be precluded by invalidity caching. The
caching simply increases the overall computational cost (as would another
redundant rule to try and make it more efficient). Discarding invalid
blocks after the minimal amount of work is the most efficient resolution.
What one does with the peer at that point is orthogonal (e.g. drop, ban).
> About (a), I'll maintain my point I think it's a classic time-space
trade-off to ponder in function of the internal cache layouts.
An attacker can throw a nearly infinite number of distinct invalid blocks
at your node (and all will connect to the chain and show proper PoW). As
such you will encounter zero cache hits and therefore nothing but overhead
from the cache. Please explain to me in detail how "cache layout" is going
to make any difference at all.
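A toy Python illustration of the zero-hit point, under the assumption stated above that every bogus block presents a distinct cache key. `InvalidityCache` is a hypothetical class for illustration only, and the hashes are made-up values rather than real block hashes.

```python
import hashlib

def sha256d(data):
    """Double SHA-256, used here only to produce distinct 32 byte keys."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

class InvalidityCache:
    """Hypothetical cache of identifiers already judged invalid."""

    def __init__(self):
        self.seen = set()
        self.hits = 0
        self.misses = 0

    def check_and_record(self, key):
        """Return True on a cache hit, otherwise record the key."""
        if key in self.seen:
            self.hits += 1
            return True
        self.misses += 1
        self.seen.add(key)
        return False

cache = InvalidityCache()
for counter in range(1000):
    # Each made-up invalid block is distinct, so its key is distinct too.
    cache.check_and_record(sha256d(counter.to_bytes(4, "little")))
```

After the loop `cache.hits` is 0 while the store holds 1000 entries: every lookup was a miss, and the only effect was growth of the store.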
> About (b) I think we'll be back to the headers synchronization strategy
as implemented by a full node to discuss whether there are exploitable
asymmetries for selfish-mining-like strategies.
I don't see this as a related/relevant topic. There are zero mining
resources required to overflow the invalidity cache. Just as Core recently
published regarding the overflowing of its "ban" store, resulting in process
termination, this then introduces another attack vector that must be
mitigated.
> If you can give a pseudo-code example of the "null point" validation
implementation in libbitcoin code (?) I think this can make the
conversation more concrete on the caching aspect.
pseudo-code, not from libbitcoin...
```
bool malleated64(block)
{
    segregated = ((block[80 + 4] == 0) and (block[80 + 4 + 1] == 1))
    return block[segregated ? 86 : 85] !=
        0xffffffff0000000000000000000000000000000000000000000000000000000000000000
}
```
Obviously there is no error handling (e.g. block too small, too many
inputs, etc.) but that is not relevant to the particular question. The
block.header is fixed size, always 80 bytes. The tx.version is also fixed,
always 4 bytes. A following 0 implies a segregated witness (otherwise it's
the input count), assuming there is a following 1. The first and only input
for the coinbase tx, which must be the first block tx, follows. If it does
not match
0xffffffff0000000000000000000000000000000000000000000000000000000000000000
then the block is invalid. If it does match, it is computationally
infeasible that the merkle root is type64 malleated. That's it, absolutely
trivial and with no prerequisites. The only thing that even makes it
interesting is the segwit bifurcation.
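For anyone who wants to run it, here is a hedged Python sketch of the same check (again, not libbitcoin code). Unlike the pseudo-code it also skips the block's variable-length transaction count and the coinbase input count, which sit between the fields described above in the wire serialization. The names `read_compact_size` and `first_input_is_null_point` are hypothetical, and error handling remains minimal.

```python
NULL_POINT = bytes(32) + b"\xff" * 4  # 32 0x00 bytes, then 4 0xff bytes
HEADER_SIZE = 80
VERSION_SIZE = 4

def read_compact_size(data, offset):
    """Return (value, next_offset) for a Bitcoin variable-length integer."""
    first = data[offset]  # raises IndexError if the block is too small
    if first < 0xfd:
        return first, offset + 1
    width = {0xfd: 2, 0xfe: 4, 0xff: 8}[first]
    end = offset + 1 + width
    if end > len(data):
        raise IndexError("truncated variable-length integer")
    return int.from_bytes(data[offset + 1:end], "little"), end

def first_input_is_null_point(block):
    """True if the first tx's first input point is the null point."""
    _, offset = read_compact_size(block, HEADER_SIZE)  # skip tx count
    offset += VERSION_SIZE                             # skip tx version
    if block[offset:offset + 2] == b"\x00\x01":        # segwit marker, flag
        offset += 2
    _, offset = read_compact_size(block, offset)       # skip input count
    return block[offset:offset + 36] == NULL_POINT

# Made-up block prefixes: zeroed header, one tx, version 1, one input.
prefix = bytes(HEADER_SIZE) + b"\x01" + (1).to_bytes(4, "little")
legacy_block = prefix + b"\x01" + NULL_POINT
segwit_block = prefix + b"\x00\x01" + b"\x01" + NULL_POINT
bogus_block = prefix + b"\x01" + b"\x11" * 32 + bytes(4)
```

A block for which `first_input_is_null_point` returns false can be discarded immediately, without computing a single hash.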
>> Rust has its own set of problems. No need to get into a language Jihad
here. My point was to clarify that the particular question was not about a
C (or C++) null pointer value, either on the surface or underneath an
abstraction.
> Thanks for the additional comments on libbitcoin usage of dependencies,
yes I don't think there is a need to get into a language jihad here. It's
just like all languages have their memory model (stack, dynamic alloc,
smart pointers, etc) and when you're talking about performance it's useful
to have them in mind, imho.
Sure, but no language difference that I'm aware of could have any bearing
on this particular question.
Best,
Eric