Date: Tue, 2 Jul 2024 18:13:20 -0700 (PDT)
From: Eric Voskuil
To: Bitcoin Development Mailing List
In-Reply-To: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival

Hi Antoine R,

>> Ok, thanks for clarifying. I'm still not making the connection to "checking a non-null [C] pointer" but that's prob on me.

> A C pointer, which is a language idiom assigning to a memory address A the value of memory address B, can be 0 (or NULL, a standard macro defined in stddef.h).
> Here is a snippet example of linked list code checking that the pointer (`*begin_list`) is non null before the comparison operation to find the target list element.
> ...
> While libbitcoin and bitcoin core are both written in C++, you still have underlying pointer dereferencing playing out to access the coinbase transaction, and all the underlying implications in terms of memory management.

I'm familiar with pointers ;).

While at some level the block message buffer would generally be referenced by one or more C pointers, the difference between a valid coinbase input (i.e. with a "null point") and any other input is not nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was better (or easier) to serialize a first block tx (coinbase) with an input containing an unusable script and pointing to an invalid [tx:index] tuple (input point), as opposed to just not having any input. That invalid input point is called a "null point", and of course cannot be pointed to by a "null pointer". The coinbase must be identified by comparing those 36 bytes to the well-known null point value (and if this does not match, the Merkle hash cannot have been type64 malleated).

> I think it's interesting to point out the two types of malleation that a bitcoin consensus validation logic should respect w.r.t. block validity checks. Like you said, the first one is on the merkle root committed in the header's `hashMerkleRoot`, due to the lack of domain separation between leaf and merkle tree nodes.

We call this type64 malleability (or malleation where it is not only possible but occurs).

> The second one is the bip141 wtxid commitment in one of the coinbase transaction's `scriptpubkey` outputs, which is itself covered by a txid in the merkle tree.

While symmetry seems to imply that the witness commitment would be malleable, just as the txs commitment is, this is not the case. If the tx commitment is correct it is computationally infeasible for the witness commitment to be malleated, as the witness commitment incorporates each full tx (with witness, sentinel, and marker). As such the block identifier, which relies only on the header and tx commitment, is a sufficient identifier. Yet it remains necessary to validate the witness commitment to ensure that the correct witness data has been provided in the block message.

The second type of malleability, in addition to type64, is what we call type32. This is the consequence of duplicated trailing sets of txs (and therefore tx hashes) in a block message. This is applicable to some but not all blocks, as a function of the number of txs contained.

>> Caching identity in the case of invalidity is a more interesting question than it might seem.
>> Background: A fully-validated block has established identity in its block hash. However an invalid block message may include the same block header, producing the same hash, but with any kind of nonsense following the header. The purpose of the transaction and witness commitments is of course to establish this identity, so these two checks are therefore necessary even under checkpoint/milestone. And then of course the two Merkle tree issues complicate the tx commitment (the integrity of the witness commitment is assured by that of the tx commitment).
>>
>> So what does it mean to speak of a block hash derived from:
>> (1) a block message with an unparseable header?
>> (2) a block message with parseable but invalid header?
>> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid uncommitted tx data?
>> (5) a block message with valid header but parseable invalid malleated committed tx data?
>> (6) a block message with valid header but parseable invalid unmalleated committed tx data?
>> (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed valid tx data?
>> (9) a block message with valid header but unmalleated committed valid tx data?
>>
>> Note that only the #9 p2p block message contains an actual Bitcoin block, the others are bogus messages. In all cases the message can be sha256 hashed to establish the identity of the *message*. And if one's objective is to reject repeating bogus messages, this might be a useful strategy. It's already part of the p2p protocol, is orders of magnitude cheaper to produce than a Merkle root, and has no identity issues.

> I think I mostly agree with the identity issue as laid out so far, there is one caveat to add if you're considering identity caching as the problem solved. A validation node might have to treat block messages differently depending on whether they connect to the longest most-PoW valid chain for which all blocks have been validated, or alternatively have to be added to a candidate longest most-PoW valid chain.

Certainly an important consideration. We store both types. Once there is a stronger candidate header chain we store the headers and proceed to obtaining the blocks (if we don't already have them). The blocks are stored in the same table; the confirmed vs. candidate indexes simply point to them as applicable. It is feasible (and has happened twice) for two blocks to share the very same coinbase tx, even with either/all bip30/34/90 active (and setting aside future issues here for the sake of simplicity). This remains possible only because two competing branches can have blocks at the same height, and bip34 requires only height in the coinbase input script. This therefore implies the same transaction but distinct blocks. It is however infeasible for one block to exist in multiple distinct chains. In order for this to happen two blocks at the same height must have the same coinbase (ok), and also the same parent (ok). But this then means that they either (1) have distinct identity due to another header property deviation, or (2) are the same block with the same parent and are therefore in just one chain. So I don't see an actual caveat. I'm not certain if this is the ambiguity that you were referring to. If not please feel free to clarify.

>> The concept of Bitcoin block hash as unique identifier for invalid p2p block messages is problematic. Apart from the malleation question, what is the Bitcoin block hash for a message with unparseable data (#1 and #3)? Such messages are trivial to produce and have no block hash.

> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in `src/node/connection_types.h`) where some preferential peering policy is applied in matters of block messages download.

We don't do this and I don't see how it would be relevant. If a peer provides any invalid message or otherwise violates the protocol it is simply dropped.

The "problematic" that I'm referring to is the reliance on the block hash as a message identifier, because it does not identify the message and cannot be useful in an effectively unlimited number of zero-cost cases.

>> What is the useful identifier for a block with malleated commitments (#5 and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?

> The block header, as it commits to the transaction identifier tree, can be useful as much for #4 and #5.

#4 and #5 refer to "uncommitted" and "malleated committed". It may not be clear, but "uncommitted" means that the tx commitment is not valid (Merkle root doesn't match the header's value) and "malleated committed" means that the (matching) commitment cannot be relied upon because the txs represent malleation, invalidating the identifier. So neither of these is a usable identifier.

> On the bitcoin core side, about #7 the uncommitted valid tx data can be already present in the validation cache from mempool acceptance. About #8, the malleated committed valid transactions shall be also committed in the merkle root in headers.

It seems you may be referring to "unconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx storage or identifiers. Neither #7 nor #8 is usable, for the same reasons.

>> This seems reasonable at first glance, but given the list of scenarios above, which does it apply to? Presumably the invalid header (#2) doesn't get this far because of headers-first.
>> That leaves just invalid blocks with useful block hash identifiers (#6). In all other cases the message is simply discarded. In this case the attempt is to move category #5 into category #6 by prohibiting 64 byte txs.

> Yes, it's moving from the category #5 to the category #6. Note, transaction malleability can be a distinct issue from lack of domain separation.

I'm making no reference to tx malleability. This concerns only Merkle tree (block hash) malleability, the two types described in detail in the paper I referenced earlier, here again:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

>> The requirement to "avoid re-downloading and re-validating it" is about performance, presumably minimizing initial block download/catch-up time. There is a computational cost to producing 64 byte malleations and none for any of the other bogus block message categories above, including the other form of malleation. Furthermore, 64 byte malleation has almost zero cost to preclude. No hashing and not even true header or tx parsing are required. Only a handful of bytes must be read from the raw message before it can be discarded presently.

>> That's actually far cheaper than any of the other scenarios that, again, have no cost to produce. The other type of malleation requires parsing all of the txs in the block and hashing and comparing some or all of them. In other words, if there is an attack scenario, that must be addressed before this can be meaningful. In fact all of the other bogus message scenarios (with tx data) will remain more expensive to discard than this one.
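For illustration only, here is a minimal Python sketch of the other form of malleation mentioned in the quoted text above (type32, duplicated trailing tx hashes). It is not taken from libbitcoin or bitcoin core: the names `dsha256` and `merkle_root_and_mutated` are hypothetical, and the adjacent-pair comparison is just one simple way to flag the duplication. It shows that a list of tx hashes and the same list with its trailing set repeated yield the same Merkle root, so telling them apart takes hashing and comparing.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin tx hashes and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_mutated(hashes: list[bytes]) -> tuple[bytes, bool]:
    """Return (root, mutated) for a list of 32-byte tx hashes.

    An odd-length level duplicates its last hash before pairing. `mutated`
    is set when two *supplied* siblings are identical at any level, which
    is how duplicated trailing sets show up in the tree.
    """
    if not hashes:
        raise ValueError("at least one hash is required")
    level = list(hashes)
    mutated = False
    while len(level) > 1:
        # Compare real sibling pairs before any padding is added.
        for i in range(0, len(level) - 1, 2):
            if level[i] == level[i + 1]:
                mutated = True
        if len(level) % 2:
            level.append(level[-1])  # Bitcoin's odd-level duplication
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0], mutated


# Three stand-in tx hashes (arbitrary data, not real transactions).
a, b, c = (dsha256(bytes([i])) for i in range(3))
root_3, mutated_3 = merkle_root_and_mutated([a, b, c])
root_4, mutated_4 = merkle_root_and_mutated([a, b, c, c])
print(root_3 == root_4, mutated_3, mutated_4)  # True False True
```

Both lists commit to the same root, so the header hash alone cannot distinguish them; only the walk over all supplied hashes does.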
> In practice on the bitcoin core side, the bogus block message categories from #4 to #6 are already mitigated by validation caching for transactions that have been received early. While libbitcoin has no mempool (at least in earlier versions), transaction buffering can be done by bip152's HeadersAndShortIds message.

Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx pool, we just don't store the txs in RAM (memory).

> About #7 and #8, introducing a domain separation where 64 byte transactions are rejected makes it harder to exploit the #7 and #8 categories of bogus block messages. It is correct that bitcoin core might accept valid transaction data before the merkle tree commitment has been verified.

I don't follow this. A consensus rule invalidating 64 byte txs would definitely not make it harder to exploit block message invalidity. In fact it would just slow down validation by adding a redundant rule. Furthermore, as I have detailed in a previous message, caching invalidity does absolutely nothing to increase protection. In fact it makes the situation materially worse.

>> The problem arises from trying to optimize dismissal by storing an identifier. Just *producing* the identifier is orders of magnitude more costly than simply dismissing this bogus message. I can't imagine why any implementation would want to compute and store and retrieve and recompute and compare hashes when the alternative is just dismissing the bogus messages with no hashing at all.

>> Bogus messages will arrive, they do not even have to be requested. The simplest are dealt with by parse failure. What defines a parse is entirely subjective. Generally it's "structural" but nothing precludes incorporating a requirement for a necessary leading pattern in the stream, sort of like how the witness pattern is identified. If we were going to prioritize early dismissal this is where we would put it.
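As a rough sketch of what such a leading-pattern requirement could look like, the Python below takes the pattern to be the coinbase null point described earlier in this message (32 0x00 bytes followed by 4 0xff bytes). It assumes the standard block message layout: 80 byte header, compact-size tx count, then the first tx as 4 byte version, optional witness marker/flag bytes (0x00 0x01), compact-size input count, and the 36 byte input point. The function names are hypothetical and this is not libbitcoin or bitcoin core code.

```python
NULL_POINT = bytes(32) + b"\xff" * 4  # 32 x 0x00 then 4 x 0xff
HEADER_SIZE = 80


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a Bitcoin compact-size integer; return (value, next position)."""
    first = buf[pos]  # IndexError if the buffer is too short
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    end = pos + 1 + size
    if end > len(buf):
        raise ValueError("truncated compact-size integer")
    return int.from_bytes(buf[pos + 1:end], "little"), end


def first_input_is_null_point(block: bytes) -> bool:
    """Check that the first tx starts like a coinbase, with no hashing.

    Only a handful of bytes past the header are read. Returning False
    means the message can be dropped without computing a Merkle root.
    """
    try:
        pos = HEADER_SIZE
        _tx_count, pos = read_varint(block, pos)
        pos += 4  # skip the tx version
        if block[pos:pos + 2] == b"\x00\x01":
            pos += 2  # witness marker and flag
        input_count, pos = read_varint(block, pos)
        if input_count != 1:
            return False  # a coinbase has exactly one input
        return block[pos:pos + 36] == NULL_POINT
    except (IndexError, ValueError):
        return False  # too short to parse: also dismissed


# Stand-in data: an all-zero header and the start of a coinbase tx.
header = bytes(HEADER_SIZE)
coinbase_start = (1).to_bytes(4, "little") + b"\x01" + NULL_POINT
print(first_input_is_null_point(header + b"\x01" + coinbase_start))  # True
```

The check changes nothing on the wire. It only moves a comparison that validation performs anyway to the front of the parse.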
> I don't think this is that simple - While producing an identifier comes with a computational cost (e.g. fixed 64-byte structured coinbase transaction), if the full node has a hierarchy of validation caches like bitcoin core has already, the cost of bogus block messages can be slashed down.

No, this is not the case. As I detailed in my previous message, there is no possible scenario where invalidation caching does anything but make the situation materially worse.

> On the other hand, just dealing with parse failure on the spot by introducing a leading pattern in the stream just inflates the size of p2p messages, and the transaction-relay bandwidth cost.

I think you misunderstood me. I am suggesting no change to serialization. I can see how it might be unclear, but I said, "nothing precludes incorporating a requirement for a necessary leading pattern in the stream." I meant that the parser can simply incorporate the *requirement* that the byte stream starts with a null input point. That identifies the malleation or invalidity without a single hash operation and while only reading a handful of bytes. No change to any messages.

>> However, there is a tradeoff in terms of early dismissal. Looking up invalid hashes is a costly tradeoff, which becomes multiplied by every block validated. For example, expending 1 millisecond in hash/lookup to save 1 second of validation time in the failure case seems like a reasonable tradeoff, until you multiply across the whole chain. 1 ms becomes 14 minutes across the chain, just to save a second for each malleated block encountered. That means you need to have encountered 840 such malleated blocks just to break even. Early dismissing the block for non-null coinbase point (without hashing anything) would be on the order of 1000x faster than that (breakeven at 1 encounter). So why the block hash cache requirement? It cannot be applied to many scenarios, and cannot be optimal in this one.
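The arithmetic in the quoted paragraph can be reproduced with a few lines of Python. The chain length of 840,000 blocks is an assumption inferred from "1 ms becomes 14 minutes" (roughly the chain height at the time of writing), and the 1 microsecond cost for the early check simply encodes "on the order of 1000x faster". The function names are hypothetical.

```python
BLOCKS_IN_CHAIN = 840_000      # assumption: inferred from 14 min / 1 ms
LOOKUP_COST_US = 1_000         # 1 ms hash/lookup per validated block
EARLY_CHECK_COST_US = 1        # assumption: ~1000x cheaper null point check
SAVED_PER_HIT_US = 1_000_000   # 1 s of validation saved per bogus block


def total_overhead_us(blocks: int, cost_per_block_us: int) -> int:
    """Cost paid on every block, summed over the whole chain."""
    return blocks * cost_per_block_us


def break_even_encounters(blocks: int, cost_per_block_us: int,
                          saved_per_hit_us: int) -> int:
    """Bogus blocks that must be encountered to recoup the overhead."""
    overhead = total_overhead_us(blocks, cost_per_block_us)
    return -(-overhead // saved_per_hit_us)  # ceiling division


minutes = total_overhead_us(BLOCKS_IN_CHAIN, LOOKUP_COST_US) / 60_000_000
print(minutes)  # 14.0
print(break_even_encounters(BLOCKS_IN_CHAIN, LOOKUP_COST_US,
                            SAVED_PER_HIT_US))  # 840
print(break_even_encounters(BLOCKS_IN_CHAIN, EARLY_CHECK_COST_US,
                            SAVED_PER_HIT_US))  # 1
```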
> I think what you're describing is more a classic time-space tradeoff, which is well-known in classic computer science literature. In my reasonable opinion, one should reason more about what security paradigm we wish for the bitcoin block-relay network and enduring decentralization, i.e. one where it's easy to verify block message proofs which could have been generated on specialized hardware with an asymmetric cost. Obviously having to encounter 840 such malleated blocks to break even doesn't make the math work for saving on hash lookup, unless you can reduce the attack scenario in terms of adversaries' capabilities.

I'm referring to DoS mitigation (the only relevant security consideration here). I'm pointing out that invalidity caching is pointless in all cases, and in this case is the most pointless, as type64 malleation is the cheapest of all invalidity to detect. I would prefer that all bogus blocks sent to my node are of this type. The worst types of invalidity detection have no mitigation and from a security standpoint are counterproductive to cache. I'm describing what overall is actually not a tradeoff. It's all negative and no positive.

Best,
Eric

>> Ok, thanks for clarifying. I'm still not = making the connection to "checking a non-null [C] pointer" but that's prob = on me.

> A C pointer, which is a language idiome assigning to= a memory address A the value o memory address B can be 0 (or NULL a standa= rd macro defined in stddef.h).
> Here a snippet example of linked l= ist code checking the pointer (`*begin_list`) is non null before the compar= ison operation to find the target element list.
> ...
> Whi= le both libbitcoin and bitcoin core are both written in c++, you still have= underlying pointer derefencing playing out to access the coinbase transact= ion, and all underlying implications in terms of memory management.
I'm familiar with pointers ;).

While at some level the block= message buffer would generally be referenced by one or more C pointers, th= e difference between a valid coinbase input (i.e. with a "null point") and = any other input, is not nullptr vs. !nullptr. A "null point" is a 36 byte v= alue, 32 0x00 byes followed by 4 0xff bytes. In his infinite wisdom Satoshi= decided it was better (or easier) to serialize a first block tx (coinbase)= with an input containing an unusable script and pointing to an invalid [tx= :index] tuple (input point) as opposed to just not having any input. That i= nvalid input point is called a "null point", and of course cannot be pointe= d to by a "null pointer". The coinbase must be identified by comparing thos= e 36 bytes to the well-known null point value (and if this does not match t= he Merkle hash cannot have been type64 malleated).

> I t= hink it's interesting to point out the two types of malleation that a bitco= in consensus validation logic should respect w.r.t block validity checks.= =C2=A0Like you said the first one on the merkle root committed in the heade= rs's `hashMerkleRoot` due to the lack of domain separation between leaf and= merkle tree nodes.

We call this type64 malleability = (or malleation where it is not only possible but occurs).

> T= he second one is the bip141 wtxid commitment in one of the coinbase transac= tion `scriptpubkey` output, which is itself covered by a txid in the merkle= tree.

While symmetry seems to imply that the witness commitment= would be malleable, just as the txs commitment, this is not the case. If t= he tx commitment is correct it is computationally infeasible for the witnes= s commitment to be malleated, as the witness commitment incorporates each f= ull tx (with witness, sentinel, and marker). As such the block identifier, = which relies only on the header and tx commitment, is a sufficient identifi= er. Yet it remains necessary to validate the witness commitment to ensure t= hat the correct witness data has been provided in the block message.
<= br />The second type of malleability, in addition to type64, is what we cal= l type32. This is the consequence of duplicated trailing sets of txs (and t= herefore tx hashes) in a block message. This is applicable to some but not = all blocks, as a function of the number of txs contained.

>&g= t; Caching identity in the case of invalidity is more interesting question = than it might seem.
>> Background: A fully-validated block has e= stablished identity in its block hash. However an invalid block message may= include the same block header, producing the same hash, but with any kind = of nonsense following the header. The purpose of the transaction and witnes= s commitments is of course to establish this identity, so these two checks = are therefore necessary even under checkpoint/milestone. And then of course= the two Merkle tree issues complicate the tx commitment (the integrity of = the witness commitment is assured by that of the tx commitment).
>&= gt;
>> So what does it mean to speak of a block hash derived fro= m:
>> (1) a block message with an unparseable header?
>&= gt; (2) a block message with parseable but invalid header?
>> (3= ) a block message with valid header but unparseable tx data?
>> = (4) a block message with valid header but parseable invalid uncommitted tx = data?
>> (5) a block message with valid header but parseable inv= alid malleated committed tx data?
>> (6) a block message with va= lid header but parseable invalid unmalleated committed tx data?
>&g= t; (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed va= lid tx data?
>> (9) a block message with valid header but unmall= eated committed valid tx data?
>>
>> Note that only t= he #9 p2p block message contains an actual Bitcoin block, the others are bo= gus messages. In all cases the message can be sha256 hashed to establish th= e identity of the *message*. And if one's objective is to reject repeating = bogus messages, this might be a useful strategy. It's already part of the p= 2p protocol, is orders of magnitude cheaper to produce than a Merkle root, = and has no identity issues.

> I think I mostly agree with the= identity issue as laid out so far, there is one caveat to add if you're co= nsidering identity caching as the problem solved. A validation node might h= ave to consider differently block messages processed if they connect on the= longest most PoW valid chain for which all blocks have been validated. Or = alternatively if they have to be added on a candidate longest most PoW vali= d chain.

Certainly an important consideration. We store both typ= es. Once there is a stronger candidate header chain we store the headers an= d proceed to obtaining the blocks (if we don't already have them). The bloc= ks are stored in the same table; the confirmed vs. candidate indexes simply= point to them as applicable. It is feasible (and has happened twice) for t= wo blocks to share the very same coinbase tx, even with either/all bip30/34= /90 active (and setting aside future issues here for the sake of simplicity= ). This remains only because two competing branches can have blocks at the = same height, and bip34 requires only height in the coinbase input script. T= his therefore implies the same transaction but distinct blocks. It is howev= er infeasible for one block to exist in multiple distinct chains. In order = for this to happen two blocks at the same height must have the same coinbas= e (ok), and also the same parent (ok). But this then means that they either= (1) have distinct identity due to another header property deviation, or (2= ) are the same block with the same parent and are therefore in just one cha= in. So I don't see an actual caveat. I'm not certain if this is the ambigui= ty that you were referring to. If not please feel free to clarify.
>> The concept of Bitcoin block hash as unique identifier for inva= lid p2p block messages is problematic. Apart from the malleation question, = what is the Bitcoin block hash for a message with unparseable data (#1 and = #3)? Such messages are trivial to produce and have no block hash.

> For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` = (in `src/node/connection_types.h`) where some preferential peering policy i= s applied in matters of block messages download.

We don't do thi= s and I don't see how it would be relevant. If a peer provides any invalid = message or otherwise violates the protocol it is simply dropped.

The "problematic" that I'm referring to is the reliance on the block hash = as a message identifier, because it does not identify the message and canno= t be useful in an effectively unlimited number of zero-cost cases.
>> What is the useful identifier for a block with malleated commit= ments (#5 and #8) or invalid commitments (#4 and #7) - valid txs or otherwi= se?

> The block header, as it commits to the transaction iden= tifier tree can be useful as much for #4 and #5.

#4 and #5 refer= to "uncommitted" and "malleated committed". It may not be clear, but "unco= mmitted" means that the tx commitment is not valid (Merkle root doesn't mat= ch the header's value) and "malleated committed" means that the (matching) = commitment cannot be relied upon because the txs represent malleation, inva= lidating the identifier. So neither of these are usable identifiers.
<= br />> On the bitcoin core side, about #7 the uncommitted valid tx data = can be already present in the validation cache from mempool acceptance. Abo= ut #8, the malleaed committed valid transactions shall be also committed in= the merkle root in headers.

It seems you may be referring to "u= nconfirmed" txs as opposed to "uncommitted" txs. This doesn't pertain to tx= storage or identifiers. Neither #7 nor #8 are usable for the same reasons.=

>> This seems reasonable at first glance, but given the l= ist of scenarios above, which does it apply to?

>> This se= ems reasonable at first glance, but given the list of scenarios above, whic= h does it apply to? Presumably the invalid header (#2) doesn't get this far= because of headers-first.
>> That leaves just invalid blocks wi= th useful block hash identifiers (#6). In all other cases the message is si= mply discarded. In this case the attempt is to move category #5 into catego= ry #6 by prohibiting 64 byte txs.

> Yes, it's moving from the= category #5 to the category #6. Note, transaction malleability can be a di= stinct issue than lack of domain separation.

I'm making no refer= ence to tx malleability. This concerns only Merkle tree (block hash) mallea= bility, the two types described in detail in the paper I referenced earlier= , here again:

https://lists.linuxfoundation.org/pipermail/bitcoi= n-dev/attachments/20190225/a27d8837/attachment-0001.pdf

>>= The requirement to "avoid re-downloading and re-validating it" is about pe= rformance, presumably minimizing initial block download/catch-up time. Ther= e is a > computational cost to producing 64 byte malleations and none fo= r any of the other bogus block message categories above, including the othe= r form of malleation. > Furthermore, 64 byte malleation has almost zero = cost to preclude. No hashing and not even true header or tx parsing are req= uired. Only a handful of bytes must be read > from the raw message befor= e it can be discarded presently.

>> That's actually far ch= eaper than any of the other scenarios that again, have no cost to produce. = The other type of malleation requires parsing all of the txs in the block a= nd > hashing and comparing some or all of them. In other words, if there= is an attack scenario, that must be addressed before this can be meaningfu= l. In fact all of the other bogus message scenarios (with tx data) will rem= ain more expensive to discard than this one.

> In practice on= the bitcoin core side, the bogus block message categories from #4 to #6 ar= e already mitigated by validation caching for transactions that have been r= eceived early. While libbitcoin has no mempool (at least in earlier version= s) transactions buffering can be done by bip152's HeadersAndShortIds messag= e.

Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx pool; we just don't store the txs in RAM.

> About #7 and #8: introducing a domain separation where 64-byte transactions are rejected makes it harder to exploit the #7 and #8 categories of bogus block messages. It is correct that Bitcoin Core might accept valid transaction data before the Merkle tree commitment has been verified.

I don't follow this. An invalid 64 byte tx consensus rule would definitely not make it harder to exploit block message invalidity. In fact it would just slow down validation by adding a redundant rule. Furthermore, as I have detailed in a previous message, caching invalidity does absolutely nothing to increase protection. In fact it makes the situation materially worse.

>> The problem arises from trying to optimize dismissal by storing an identifier. Just *producing* the identifier is orders of magnitude more costly than simply dismissing this bogus message. I can't imagine why any implementation would want to compute and store and retrieve and recompute and compare hashes when the alternative is just dismissing the bogus messages with no hashing at all.

>> Bogus messages will arrive, they do not even have to be requested. The simplest are dealt with by parse failure. What defines a parse is entirely subjective. Generally it's "structural" but nothing precludes incorporating a requirement for a necessary leading pattern in the stream, sort of like how the witness pattern is identified. If we were going to prioritize early dismissal this is where we would put it.

> I don't think this is that simple. While producing an identifier comes with a computational cost (e.g. a fixed 64-byte structured coinbase transaction), if the full node has a hierarchy of validation caches, as Bitcoin Core already does, the cost of bogus block messages can be slashed.

No, this is not the case. As I detailed in my previous message, there is no possible scenario where invalidation caching does anything but make the situation materially worse.

> On the other hand, just dealing with parse failure on the spot by introducing a leading pattern in the stream just inflates the size of p2p messages, and the transaction-relay bandwidth cost.

I think you misunderstood me. I am suggesting no change to serialization. I can see how it might be unclear, but I said, "nothing precludes incorporating a requirement for a necessary leading pattern in the stream." I meant that the parser can simply incorporate the *requirement* that the byte stream starts with a null input point. That identifies the malleation or invalidity without a single hash operation and while only reading a handful of bytes. No change to any messages.
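A rough Python sketch of such a parser requirement follows. It is an illustration under stated assumptions, not libbitcoin's code: the function names are hypothetical, and the layout assumed is the usual block message (80-byte header, varint tx count, then the first tx with a 4-byte version, an optional witness marker and flag, a varint input count, and a 36-byte previous output point).

```python
HEADER_SIZE = 80
# Null point: 32 zero bytes for the hash, 0xffffffff for the index.
NULL_POINT = bytes(32) + b"\xff\xff\xff\xff"


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a Bitcoin variable-length integer; return (value, next_pos)."""
    first = buf[pos]  # raises IndexError on a truncated buffer
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    end = pos + 1 + size
    return int.from_bytes(buf[pos + 1:end], "little"), end


def starts_with_null_point(block: bytes) -> bool:
    """True if the first tx's first input spends the null point.

    No hashing is performed and only a few fields are skipped over.
    """
    try:
        tx_count, pos = read_varint(block, HEADER_SIZE)
        if tx_count == 0:
            return False
        pos += 4  # skip the tx version
        if block[pos:pos + 2] == b"\x00\x01":  # witness marker and flag
            pos += 2
        input_count, pos = read_varint(block, pos)
        if input_count != 1:  # a coinbase has exactly one input
            return False
        return block[pos:pos + 36] == NULL_POINT
    except IndexError:
        return False


header = bytes(HEADER_SIZE)
version = (1).to_bytes(4, "little")
coinbase_prefix = version + b"\x01" + NULL_POINT
print(starts_with_null_point(header + b"\x01" + coinbase_prefix))
print(starts_with_null_point(header + b"\x01" + b"\xab" * 64))
```

The second call stands in for a 64-byte blob that is not shaped like a coinbase; it is rejected after reading the version and one varint byte.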

>> However, there is a tradeoff in terms of early dismissal. Looking up invalid hashes is a costly tradeoff, which becomes multiplied by every block validated. For example, expending 1 millisecond in hash/lookup to save 1 second of validation time in the failure case seems like a reasonable tradeoff, until you multiply across the whole chain. 1 ms becomes 14 minutes across the chain, just to save a second for each mallied block encountered. That means you need to have encountered 840 such mallied blocks just to break even. Early dismissing the block for non-null coinbase point (without hashing anything) would be on the order of 1000x faster than that (breakeven at 1 encounter). So why the block hash cache requirement? It cannot be applied to many scenarios, and cannot be optimal in this one.
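The arithmetic above can be checked with a few lines of Python. The chain length of 840,000 blocks is an assumption inferred from the "1 ms becomes 14 minutes" figure, and the 1 microsecond early-dismissal cost is simply the stated 1000x improvement applied to 1 ms. The function name is hypothetical.

```python
def break_even_encounters(blocks: int, cost_us: int, saved_us: int) -> float:
    """Mallied blocks needed before a per-block check pays for itself.

    blocks:   number of blocks validated (each pays the check cost)
    cost_us:  per-block cost of the check, in microseconds
    saved_us: validation time saved per mallied block, in microseconds
    """
    return blocks * cost_us / saved_us


CHAIN_BLOCKS = 840_000       # assumed chain length
SAVED_US = 1_000_000         # 1 second saved per mallied block

# Hash/lookup at 1 ms per block.
overhead_minutes = CHAIN_BLOCKS * 1_000 / 60_000_000
cache_break_even = break_even_encounters(CHAIN_BLOCKS, 1_000, SAVED_US)

# Early dismissal at roughly 1000x less, about 1 microsecond per block.
early_break_even = break_even_encounters(CHAIN_BLOCKS, 1, SAVED_US)

print(overhead_minutes, cache_break_even, early_break_even)
```

Under these assumptions the lookup costs 14 minutes in total and needs 840 encounters to break even, while the early check needs less than one.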

> I think what you're describing is more a classic time-space tradeoff, which is well known in the classic computer science literature. In my reasonable opinion, one should reason more about what security paradigm we wish for the bitcoin block-relay network and perduring decentralization, i.e. one where it's easy to verify block message proofs which could have been generated on specialized hardware with an asymmetric cost. Obviously encountering 840 such mallied blocks to make it break even doesn't make the math up to save on hash lookup, unless you can reduce the attack scenario in terms of adversaries' capabilities.

I'm referring to DoS mitigation (the only relevant security consideration here). I'm pointing out that invalidity caching is pointless in all cases, and in this case is the most pointless as type64 malleation is the cheapest of all invalidity to detect. I would prefer that all bogus blocks sent to my node are of this type. The worst types of invalidity detection have no mitigation and from a security standpoint are counterproductive to cache. I'm describing what overall is actually not a tradeoff. It's all negative and no positive.

Best,
Eric
