From mboxrd@z Thu Jan  1 00:00:00 1970
Delivery-date: Mon, 01 Jul 2024 22:03:42 -0700
Received: from mail-yb1-f186.google.com ([209.85.219.186])
	by mail.fairlystable.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	(Exim 4.94.2)
	(envelope-from <bitcoindev+bncBC3PT7FYWAMRBJEUR22AMGQE2LA5WUQ@googlegroups.com>)
	id 1sOVg4-0003Pg-CM
	for bitcoindev@gnusha.org; Mon, 01 Jul 2024 22:03:42 -0700
Received: by mail-yb1-f186.google.com with SMTP id 3f1490d57ef6-dc691f1f83asf1909934276.1
        for <bitcoindev@gnusha.org>; Mon, 01 Jul 2024 22:03:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:sender:from
         :to:cc:subject:date:message-id:reply-to;
        bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
        b=Aa690xSGATVAzXlzCZcYaPMjHa1PeCgjMHAO+V74mOIAOpIZ2MhjCdWnq1Z9iOu9rR
         n8QZC0PVzv8jRJT7AfFxmiOZ0PA8N8YEHcVyK7qxlGZFaPF6ucl5aGzFW28ykNFzSQqI
         K7wvrbjGtI0g0DHWoJIsCuXO5mXAU3Aw8I0siZ/WQu4q7+3advGSfDYHndpzbJOMIlW6
         2F0qcuFhGHr7Eu1IuXpDeNXcSoK4swJW/oGWSSiIP5IL698+tqe224u/119cbMLuyL8L
         0gFBHrhUfQLb2wAeWNIh3spY1LAD6AJgrJFycuRinGgGKMm7rUSoKvcjhPJ2GDxCdupa
         /ycA==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1719896614; x=1720501414; darn=gnusha.org;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:from:to:cc
         :subject:date:message-id:reply-to;
        bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
        b=imDVMNln+4Bjv2z7hdJiTBE17WjjjhAgi/hJSmzBmDpw6y37ypJC7lzkL5/o4fUXmy
         QRx7REOvFZpfRMXtNtwL8fkyyauAZsOUF8heaUTYkQHky1HmO5UyQElIKl2zMxLZIZG/
         IM/B3pKQg9ZHpA9SdJT8KRuPpubVk6KB7FtmLfZ7YXcm02oKy67JzY57akHf7ZgahPTZ
         tx2fvRMOO53asCA3l85r+r3Dx1AnNQmqRQCplj07mU5QyTXcZSXTVDWN4oHfn0fc5rVr
         6N86Y8p6+S44lkMhvcA4eIrbVDkAZtTL9N2/wji9In6HQo6yAupMz5lC4ukNO/iCzV0I
         bFlA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1719896614; x=1720501414;
        h=list-unsubscribe:list-subscribe:list-archive:list-help:list-post
         :list-id:mailing-list:precedence:x-original-sender:mime-version
         :subject:references:in-reply-to:message-id:to:from:date:x-beenthere
         :x-gm-message-state:sender:from:to:cc:subject:date:message-id
         :reply-to;
        bh=eBS2L6FdZw8kfftpl7Jjf151kJv2mrSQ4RRA3P0mmyw=;
        b=CFTYFGlTn6ii/3CJgOthF0IZ9gwtHgz4FN5KZ9tJjYzB/fB5aSlATewQlop4Wxhwoq
         068FSVXHFQlm0Zc00hgEjGH5LdNUIr0LLMEv44hB+4nvWXiQDi1h1nCJJipSs28R7F0B
         CCY0K+A5FCD/1dgtft1PuBs8ZT+zWL6N2p1QrMDa0r9R5xGs1OP/YyRT+pHX5C4uwgsv
         Z/v9kD8hUKQSa7gHt9urUwVAvupZnqDTaaDs6E2T0cme9H9+LmMtSlRVIlGVWHw9sN01
         FQvmhRCYLfUMVdQseewAX7SZgubi+wH/SY9QtaD50+/lnmIU5VOrh6Fjy3THt+RN20WO
         Qraw==
Sender: bitcoindev@googlegroups.com
X-Forwarded-Encrypted: i=1; AJvYcCXgMCdEwDaFGLNXi/OQQ+G5rw1srEZFEjjguUKOAyYdrwTOVz1C993vTiZWM8lucPU0GIZZNQnDc9GiqsgJw8sObzqfy64=
X-Gm-Message-State: AOJu0YwCGjpR8VTxzP92RWshAk5rh3BwgLotI0YdrngCv5wgDY8oRIiY
	eBP9MXcoLjEoZ/EUDQVVoi47HXztK3rBW7RM6fTGXXXo//93vVd7
X-Google-Smtp-Source: AGHT+IFRjvTmIYpVJKgh2S0Q357sDrV0V7L75NRvIdG4dOWX2YQSqk3vS25p9bU+ETXNwVVPJgzweg==
X-Received: by 2002:a25:abea:0:b0:e03:359a:6a54 with SMTP id 3f1490d57ef6-e036df3bcecmr5070667276.6.1719896613816;
        Mon, 01 Jul 2024 22:03:33 -0700 (PDT)
X-BeenThere: bitcoindev@googlegroups.com
Received: by 2002:a05:6902:1005:b0:e03:2217:3c88 with SMTP id
 3f1490d57ef6-e0356250cd3ls1670233276.1.-pod-prod-05-us; Mon, 01 Jul 2024
 22:03:32 -0700 (PDT)
X-Received: by 2002:a05:6902:188e:b0:dfb:1c1c:abf9 with SMTP id 3f1490d57ef6-e036eb1ec92mr406657276.2.1719896612408;
        Mon, 01 Jul 2024 22:03:32 -0700 (PDT)
Received: by 2002:a81:ae02:0:b0:627:7f59:2eee with SMTP id 00721157ae682-64d36134a0ams7b3;
        Mon, 1 Jul 2024 19:36:09 -0700 (PDT)
X-Received: by 2002:a05:6902:150d:b0:e03:31ec:8a1d with SMTP id 3f1490d57ef6-e036eae64bamr338066276.3.1719887769124;
        Mon, 01 Jul 2024 19:36:09 -0700 (PDT)
Date: Mon, 1 Jul 2024 19:36:08 -0700 (PDT)
From: Antoine Riard <antoine.riard@gmail.com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Message-Id: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
In-Reply-To: <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
References: <gnM89sIQ7MhDgI62JciQEGy63DassEv7YZAMhj0IEuIo0EdnafykF6RH4OqjTTHIHsIoZvC2MnTUzJI7EfET4o-UQoD-XAQRDcct994VarE=@protonmail.com>
 <72e83c31-408f-4c13-bff5-bf0789302e23n@googlegroups.com>
 <heKH68GFJr4Zuf6lBozPJrb-StyBJPMNvmZL0xvKFBnBGVA3fVSgTLdWc-_8igYWX8z3zCGvzflH-CsRv0QCJQcfwizNyYXlBJa_Kteb2zg=@protonmail.com>
 <5b0331a5-4e94-465d-a51d-02166e2c1937n@googlegroups.com>
 <yt1O1F7NiVj-WkmnYeta1fSqCYNFx8h6OiJaTBmwhmJ2MWAZkmmjPlUST6FM7t6_-2NwWKdglWh77vcnEKA8swiAnQCZJY2SSCAh4DOKt2I=@protonmail.com>
 <be78e733-6e9f-4f4e-8dc2-67b79ddbf677n@googlegroups.com>
 <jJLDrYTXvTgoslhl1n7Fk9-pL1mMC-0k6gtoniQINmioJpzgtqrJ_WqyFZkLltsCUusnQ4jZ6HbvRC-mGuaUlDi3kcqcFHALd10-JQl-FMY=@protonmail.com>
 <9a4c4151-36ed-425a-a535-aa2837919a04n@googlegroups.com>
 <3f0064f9-54bd-46a7-9d9a-c54b99aca7b2n@googlegroups.com>
 <26b7321b-cc64-44b9-bc95-a4d8feb701e5n@googlegroups.com>
 <CALZpt+EwVyaz1=A6hOOycqFGJs+zxyYYocZixTJgVmzZezUs9Q@mail.gmail.com>
 <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
 <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_22823_366698941.1719887768873"
X-Original-Sender: antoine.riard@gmail.com
Precedence: list
Mailing-list: list bitcoindev@googlegroups.com; contact bitcoindev+owners@googlegroups.com
List-ID: <bitcoindev.googlegroups.com>
X-Google-Group-Id: 786775582512
List-Post: <https://groups.google.com/group/bitcoindev/post>, <mailto:bitcoindev@googlegroups.com>
List-Help: <https://groups.google.com/support/>, <mailto:bitcoindev+help@googlegroups.com>
List-Archive: <https://groups.google.com/group/bitcoindev
List-Subscribe: <https://groups.google.com/group/bitcoindev/subscribe>, <mailto:bitcoindev+subscribe@googlegroups.com>
List-Unsubscribe: <mailto:googlegroups-manage+786775582512+unsubscribe@googlegroups.com>,
 <https://groups.google.com/group/bitcoindev/subscribe>
X-Spam-Score: -0.5 (/)

------=_Part_22823_366698941.1719887768873
Content-Type: multipart/alternative; 
	boundary="----=_Part_22824_1413333519.1719887768873"

------=_Part_22824_1413333519.1719887768873
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Eric,

> Ok, thanks for clarifying. I'm still not making the connection to=20
"checking a non-null [C] pointer" but that's prob on me.

A C pointer, which is a language idiome assigning to a memory address A the=
=20
value o memory address B can be 0 (or NULL a standard macro defined in=20
stddef.h).

Here a snippet example of linked list code checking the pointer=20
(`*begin_list`) is non null before the comparison operation to find the=20
target element list.

```
pointer_t       ft_list_find(pointer_t **start_list, void *data_ref, int=20
(*cmp)())
{
        while (*start_list)
        {
                if (cmp((*start_list)->data, data_ref) =3D=3D 0)
                        return (*start_list);
                *start_list =3D (*start_list)->next;
        }
        return (0);
}
```

While both libbitcoin and bitcoin core are both written in c++, you still=
=20
have underlying pointer derefencing playing out to access the coinbase
transaction, and all underlying implications in terms of memory management.

> Yes, a rough correlation but not necessarily equivalence. Note that=20
block.check has context free and contextual overrides.
>=20
> The 'bypass' parameter indicates a block under checkpoint or milestone=20
("assume valid"). In this case we must check Merkle root, witness=20
commitment, and both types of malleation - as the purpose is to establish=
=20
identity. Absent 'bypass' the typical checks are performed, and therefore a=
=20
malleation check is not required here. The "type64" malleation is subsumed=
=20
by the is_first_non_coinbase check and the "type32" malleation is subsumed=
=20
by the is_internal_double_spend check.

Yes, I understand it's not a 1-to-1 compatibility, just a rough logical=20
equivalence.

I think it's interesting to point out the two types of malleation that a=20
bitcoin consensus validation logic should respect w.r.t block validity=20
checks.

Like you said the first one on the merkle root committed in the headers's=
=20
`hashMerkleRoot` due to the lack of domain separation between leaf and=20
merkle tree nodes.
The second one is the bip141 wtxid commitment in one of the coinbase=20
transaction `scriptpubkey` output, which is itself covered by a txid in the=
=20
merkle tree.

> Caching identity in the case of invalidity is more interesting question=
=20
than it might seem.
>=20
> Background: A fully-validated block has established identity in its block=
=20
hash. However an invalid block message may include the same block header,=
=20
producing the same hash, but with any kind of nonsense following the=20
header. The purpose of the transaction and witness commitments is of course=
=20
to establish this identity, so these two checks are therefore necessary=20
even under checkpoint/milestone. And then of course the two Merkle tree=20
issues complicate the tx commitment (the integrity of the witness=20
commitment is assured by that of the tx commitment).
>=20
> So what does it mean to speak of a block hash derived from:
>=20
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted=
=20
tx data?
> (5) a block message with valid header but parseable invalid malleated=20
committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated=
=20
committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx=20
data?
> (9) a block message with valid header but unmalleated committed valid tx=
=20
data?
>=20
> Note that only the #9 p2p block message contains an actual Bitcoin block,=
=20
the others are bogus messages. In all cases the message can be sha256=20
hashed to establish the identity of the *message*. And if one's objective=
=20
is to reject repeating bogus messages, this might be a useful strategy.=20
It's already part of the p2p protocol, is orders of magnitude cheaper to=20
produce than a Merkle root, and has no identity issues.

I think I mostly agree with the identity issue as laid out so far, there is=
=20
one caveat to add if you're considering identity caching as the problem=20
solved.
A validation node might have to consider differently block messages=20
processed if they connect on the longest most PoW valid chain for which all=
=20
blocks have been validated. Or alternatively if they have to be added on a=
=20
candidate longest most PoW valid chain.

> The concept of Bitcoin block hash as unique identifier for invalid p2p=20
block messages is problematic. Apart from the malleation question, what is=
=20
the Bitcoin block
> hash for a message with unparseable data (#1 and #3)? Such messages are=
=20
trivial to produce and have no block hash.

For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` (in=20
`src/node/connection_types.h`) where some preferential peering policy is=20
applied in matters of block messages download.

> What is the useful identifier for a block with malleated commitments (#5=
=20
and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?

The block header, as it commits to the transaction identifier tree can be=
=20
useful as much for #4 and #5. On the bitcoin core side, about #7 the=20
uncommitted valid tx data can be already present in the validation cache=20
from mempool acceptance. About #8, the malleaed committed valid=20
transactions shall be also committed in the merkle root in headers.

> This seems reasonable at first glance, but given the list of scenarios=20
above, which does it apply to?

> This seems reasonable at first glance, but given the list of scenarios=20
above, which does it apply to? Presumably the invalid header (#2) doesn't=
=20
get this far because of headers-first.
> That leaves just invalid blocks with useful block hash identifiers (#6).=
=20
In all other cases the message is simply discarded. In this case the=20
attempt is to move category #5 into category #6 by prohibiting 64 byte txs.

Yes, it's moving from the category #5 to the category #6. Note, transaction=
=20
malleability can be a distinct issue than lack of domain separation.

> The requirement to "avoid re-downloading and re-validating it" is about=
=20
performance, presumably minimizing initial block download/catch-up time.=20
There is a > computational cost to producing 64 byte malleations and none=
=20
for any of the other bogus block message categories above, including the=20
other form of malleation. > Furthermore, 64 byte malleation has almost zero=
=20
cost to preclude. No hashing and not even true header or tx parsing are=20
required. Only a handful of bytes must be read > from the raw message=20
before it can be discarded presently.

> That's actually far cheaper than any of the other scenarios that again,=
=20
have no cost to produce. The other type of malleation requires parsing all=
=20
of the txs in the block and > hashing and comparing some or all of them. In=
=20
other words, if there is an attack scenario, that must be addressed before=
=20
this can be meaningful. In fact all of the other
> bogus message scenarios (with tx data) will remain more expensive to=20
discard than this one.

In practice on the bitcoin core side, the bogus block message categories=20
from #4 to #6 are already mitigated by validation caching for transactions=
=20
that have been received early. While libbitcoin has no mempool (at least in=
=20
earlier versions) transactions buffering can be done by bip152's=20
HeadersAndShortIds message.

About #7 and #8, introducing a domain separation where 64 bytes=20
transactions are rejected and making it harder to exploit #7 and #8=20
categories of bogus block messages.
This is correct that bitcoin core might accept valid transaction data=20
before the merkle tree commitment has been verified.

> The problem arises from trying to optimize dismissal by storing an=20
identifier. Just *producing* the identifier is orders of magnitude more=20
costly than simply dismissing this > bogus message. I can't imagine why any=
=20
implementation would want to compute and store and retrieve and recompute=
=20
and compare hashes when the alterative is just
> dismissing the bogus messages with no hashing at all.

> Bogus messages will arrive, they do not even have to be requested. The=20
simplest are dealt with by parse failure. What defines a parse is entirely=
=20
subjective. Generally it's
> "structural" but nothing precludes incorporating a requirement for a=20
necessary leading pattern in the stream, sort of like how the witness=20
pattern is identified. If we were
> going to prioritize early dismissal this is where we would put it.

I don't think this is that simple - While producing an identifier comes=20
with a computational cost (e.g fixed 64-byte structured coinbase=20
transaction), if the full node have a hierarchy of validation cache like=20
bitcoin core has already, the cost of bogus block messages can be slashed=
=20
down. On the other hand, just dealing with parse failure on the spot by=20
introducing a leading pattern in the stream just inflates the size of p2p=
=20
messages, and the transaction-relay bandwidth cost.

> However, there is a tradeoff in terms of early dismissal. Looking up=20
invalid hashes is a costly tradeoff, which becomes multiplied by every=20
block validated. For example,
> expending 1 millisecond in hash/lookup to save 1 second of validation=20
time in the failure case seems like a reasonable tradeoff, until you=20
multiply across the whole chain. > 1 ms becomes 14 minutes across the=20
chain, just to save a second for each mallied block encountered. That means=
=20
you need to have encountered 840 such mallied blocks > just to break even.=
=20
Early dismissing the block for non-null coinbase point (without hashing=20
anything) would be on the order of 1000x faster than that (breakeven at 1 >=
=20
encounter). So why the block hash cache requirement? It cannot be applied=
=20
to many scenarios, and cannot be optimal in this one.

I think what you're describing is more a classic time-space tradeoff which=
=20
is well-known in classic computer science litterature. In my reasonable=20
opinion, one should more reason under what is the security paradigm we wish=
=20
for bitcoin block-relay network and perduring decentralization, i.e one=20
where it's easy to verify block messages proofs which could have been=20
generated on specialized hardware with an asymmetric cost. Obviously=20
encountering 840 such malliead blocks to make it break even doesn't make=20
the math up to save on hash lookup, unless you can reduce the attack=20
scenario in terms of adversaries capabilities.

Best,
Antoine=20
Le samedi 29 juin 2024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit :

> Caching identity in the case of invalidity is more interesting question=
=20
> than it might seem.
>
> Background: A fully-validated block has established identity in its block=
=20
> hash. However an invalid block message may include the same block header,=
=20
> producing the same hash, but with any kind of nonsense following the=20
> header. The purpose of the transaction and witness commitments is of cour=
se=20
> to establish this identity, so these two checks are therefore necessary=
=20
> even under checkpoint/milestone. And then of course the two Merkle tree=
=20
> issues complicate the tx commitment (the integrity of the witness=20
> commitment is assured by that of the tx commitment).
>
> So what does it mean to speak of a block hash derived from:
>
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted t=
x=20
> data?
> (5) a block message with valid header but parseable invalid malleated=20
> committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated=
=20
> committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx=20
> data?
> (9) a block message with valid header but unmalleated committed valid tx=
=20
> data?
>
> Note that only the #9 p2p block message contains an actual Bitcoin block,=
=20
> the others are bogus messages. In all cases the message can be sha256=20
> hashed to establish the identity of the *message*. And if one's objective=
=20
> is to reject repeating bogus messages, this might be a useful strategy.=
=20
> It's already part of the p2p protocol, is orders of magnitude cheaper to=
=20
> produce than a Merkle root, and has no identity issues.
>
> The concept of Bitcoin block hash as unique identifier for invalid p2p=20
> block messages is problematic. Apart from the malleation question, what i=
s=20
> the Bitcoin block hash for a message with unparseable data (#1 and #3)?=
=20
> Such messages are trivial to produce and have no block hash. What is the=
=20
> useful identifier for a block with malleated commitments (#5 and #8) or=
=20
> invalid commitments (#4 and #7) - valid txs or otherwise?
>
> The stated objective for a consensus rule to invalidate all 64 byte txs i=
s:
>
> > being able to cache the hash of a (non-malleated) invalid block as=20
> permanently invalid to avoid re-downloading and re-validating it.
>
> This seems reasonable at first glance, but given the list of scenarios=20
> above, which does it apply to? Presumably the invalid header (#2) doesn't=
=20
> get this far because of headers-first. That leaves just invalid blocks wi=
th=20
> useful block hash identifiers (#6). In all other cases the message is=20
> simply discarded. In this case the attempt is to move category #5 into=20
> category #6 by prohibiting 64 byte txs.
>
> The requirement to "avoid re-downloading and re-validating it" is about=
=20
> performance, presumably minimizing initial block download/catch-up time.=
=20
> There is a computational cost to producing 64 byte malleations and none f=
or=20
> any of the other bogus block message categories above, including the othe=
r=20
> form of malleation. Furthermore, 64 byte malleation has almost zero cost =
to=20
> preclude. No hashing and not even true header or tx parsing are required.=
=20
> Only a handful of bytes must be read from the raw message before it can b=
e=20
> discarded presently.
>
> That's actually far cheaper than any of the other scenarios that again,=
=20
> have no cost to produce. The other type of malleation requires parsing al=
l=20
> of the txs in the block and hashing and comparing some or all of them. In=
=20
> other words, if there is an attack scenario, that must be addressed befor=
e=20
> this can be meaningful. In fact all of the other bogus message scenarios=
=20
> (with tx data) will remain more expensive to discard than this one.
>
> The problem arises from trying to optimize dismissal by storing an=20
> identifier. Just *producing* the identifier is orders of magnitude more=
=20
> costly than simply dismissing this bogus message. I can't imagine why any=
=20
> implementation would want to compute and store and retrieve and recompute=
=20
> and compare hashes when the alterative is just dismissing the bogus=20
> messages with no hashing at all.
>
> Bogus messages will arrive, they do not even have to be requested. The=20
> simplest are dealt with by parse failure. What defines a parse is entirel=
y=20
> subjective. Generally it's "structural" but nothing precludes incorporati=
ng=20
> a requirement for a necessary leading pattern in the stream, sort of like=
=20
> how the witness pattern is identified. If we were going to prioritize ear=
ly=20
> dismissal this is where we would put it.
>
> However, there is a tradeoff in terms of early dismissal. Looking up=20
> invalid hashes is a costly tradeoff, which becomes multiplied by every=20
> block validated. For example, expending 1 millisecond in hash/lookup to=
=20
> save 1 second of validation time in the failure case seems like a=20
> reasonable tradeoff, until you multiply across the whole chain. 1 ms=20
> becomes 14 minutes across the chain, just to save a second for each malli=
ed=20
> block encountered. That means you need to have encountered 840 such malli=
ed=20
> blocks just to break even. Early dismissing the block for non-null coinba=
se=20
> point (without hashing anything) would be on the order of 1000x faster th=
an=20
> that (breakeven at 1 encounter). So why the block hash cache requirement?=
=20
> It cannot be applied to many scenarios, and cannot be optimal in this one=
.
>
> Eric
>

--=20
You received this message because you are subscribed to the Google Groups "=
Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/=
bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com.

------=_Part_22824_1413333519.1719887768873
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Eric,<div><br /></div><div>&gt; Ok, thanks for clarifying. I'm still not=
 making the connection to "checking a non-null [C] pointer" but that's prob=
 on me.<br /><br />A C pointer, which is a language idiome assigning to a m=
emory address A the value o memory address B can be 0 (or NULL a standard m=
acro defined in stddef.h).<br /><br />Here a snippet example of linked list=
 code checking the pointer (`*begin_list`) is non null before the compariso=
n operation to find the target element list.<br /><br />```<br />pointer_t =
=C2=A0 =C2=A0 =C2=A0 ft_list_find(pointer_t **start_list, void *data_ref, i=
nt (*cmp)())<br />{<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (*start_list)<br=
 />=C2=A0 =C2=A0 =C2=A0 =C2=A0 {<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 if (cmp((*start_list)-&gt;data, data_ref) =3D=3D 0)<br=
 />=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 return (*start_list);<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 *start_list =3D (*start_list)-&gt;next;<br />=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 }<br />=C2=A0 =C2=A0 =C2=A0 =C2=A0 return (0);<br />}<=
/div><div>```</div><div><br /></div><div>While both libbitcoin and bitcoin =
core are both written in c++, you still have underlying pointer derefencing=
 playing out to access the coinbase<br />transaction, and all underlying im=
plications in terms of memory management.<br /><br />&gt; Yes, a rough corr=
elation but not necessarily equivalence. Note that block.check has context =
free and contextual overrides.<br />&gt; <br />&gt; The 'bypass' parameter =
indicates a block under checkpoint or milestone ("assume valid"). In this c=
ase we must check Merkle root, witness commitment, and both types of mallea=
tion - as the purpose is to establish identity. Absent 'bypass' the typical=
 checks are performed, and therefore a malleation check is not required her=
e. The "type64" malleation is subsumed by the is_first_non_coinbase check a=
nd the "type32" malleation is subsumed by the is_internal_double_spend chec=
k.<br /><br />Yes, I understand it's not a 1-to-1 compatibility, just a rou=
gh logical equivalence.<br /><br />I think it's interesting to point out th=
e two types of malleation that a bitcoin consensus validation logic should =
respect w.r.t block validity checks.<br /><br />Like you said the first one=
 on the merkle root committed in the headers's `hashMerkleRoot` due to the =
lack of domain separation between leaf and merkle tree nodes.<br />The seco=
nd one is the bip141 wtxid commitment in one of the coinbase transaction `s=
criptpubkey` output,=C2=A0which is itself covered by a txid in the merkle t=
ree.</div><div><br /></div><div>&gt; Caching identity in the case of invali=
dity is more interesting question than it might seem.<br />&gt; <br />&gt; =
Background: A fully-validated block has established identity in its block h=
ash. However an invalid block message may include the same block header, pr=
oducing the same hash, but with any kind of nonsense following the header. =
The purpose of the transaction and witness commitments is of course to esta=
blish this identity, so these two checks are therefore necessary even under=
 checkpoint/milestone. And then of course the two Merkle tree issues compli=
cate the tx commitment (the integrity of the witness commitment is assured =
by that of the tx commitment).<br />&gt; <br />&gt; So what does it mean to=
 speak of a block hash derived from:<br />&gt; <br />&gt; (1) a block messa=
ge with an unparseable header?<br />&gt; (2) a block message with parseable=
 but invalid header?<br />&gt; (3) a block message with valid header but un=
parseable tx data?<br />&gt; (4) a block message with valid header but pars=
eable invalid uncommitted tx data?<br />&gt; (5) a block message with valid=
 header but parseable invalid malleated committed tx data?<br />&gt; (6) a =
block message with valid header but parseable invalid unmalleated committed=
 tx data?<br />&gt; (7) a block message with valid header but uncommitted v=
alid tx data?<br />&gt; (8) a block message with valid header but malleated=
 committed valid tx data?<br />&gt; (9) a block message with valid header b=
ut unmalleated committed valid tx data?<br />&gt; <br />&gt; Note that only=
 the #9 p2p block message contains an actual Bitcoin block, the others are =
bogus messages. In all cases the message can be sha256 hashed to establish =
the identity of the *message*. And if one's objective is to reject repeatin=
g bogus messages, this might be a useful strategy. It's already part of the=
 p2p protocol, is orders of magnitude cheaper to produce than a Merkle root=
, and has no identity issues.<br /><br /></div><div>I think I mostly agree =
with the identity issue as laid out so far, there is one caveat to add if y=
ou're considering identity caching as the problem solved.<br />A validation=
 node might have to consider differently block messages processed if they c=
onnect on the longest most PoW valid chain for which all blocks have been v=
alidated. Or alternatively if they have to be added on a candidate longest =
most PoW valid chain.<br /><br />&gt; The concept of Bitcoin block hash as =
unique identifier for invalid p2p block messages is problematic. Apart from=
 the malleation question, what is the Bitcoin block<br />&gt; hash for a me=
ssage with unparseable data (#1 and #3)? Such messages are trivial to produ=
ce and have no block hash.<br /><br />For reasons, bitcoin core has the con=
cept of outbound `BLOCK_RELAY` (in `src/node/connection_types.h`) where som=
e preferential peering policy is applied in matters of block messages downl=
oad.<br /><br />&gt; What is the useful identifier for a block with malleat=
ed commitments (#5 and #8) or invalid commitments (#4 and #7) - valid txs o=
r otherwise?<br /><br />The block header, as it commits to the transaction =
identifier tree can be useful as much for #4 and #5. On the bitcoin core si=
de, about #7 the uncommitted valid tx data can be already present in the va=
lidation cache from mempool acceptance. About #8, the malleaed committed va=
lid transactions shall be also committed in the merkle root in headers.<br =
/><br />&gt; This seems reasonable at first glance, but given the list of s=
cenarios above, which does it apply to?</div><div><br /></div>&gt; This see=
ms reasonable at first glance, but given the list of scenarios above, which=
 does it apply to? Presumably the invalid header (#2) doesn't get this far =
because of headers-first.<br />&gt; That leaves just invalid blocks with us=
eful block hash identifiers (#6). In all other cases the message is simply =
discarded. In this case the attempt is to move category #5 into category #6=
 by prohibiting 64 byte txs.<br /><br />Yes, it's moving from the category =
#5 to the category #6. Note, transaction malleability can be a distinct iss=
ue than lack of domain separation.<br /><br />&gt; The requirement to "avoi=
d re-downloading and re-validating it" is about performance, presumably min=
imizing initial block download/catch-up time. There is a &gt; computational=
 cost to producing 64 byte malleations and none for any of the other bogus =
block message categories above, including the other form of malleation. &gt=
; Furthermore, 64 byte malleation has almost zero cost to preclude. No hash=
ing and not even true header or tx parsing are required. Only a handful of =
bytes must be read &gt; from the raw message before it can be discarded pre=
sently.<br /><br />&gt; That's actually far cheaper than any of the other s=
cenarios that again, have no cost to produce. The other type of malleation =
requires parsing all of the txs in the block and &gt; hashing and comparing=
 some or all of them. In other words, if there is an attack scenario, that =
must be addressed before this can be meaningful. In fact all of the other<d=
iv>&gt; bogus message scenarios (with tx data) will remain more expensive t=
o discard than this one.<br /><br />In practice on the bitcoin core side, t=
he bogus block message categories from #4 to #6 are already mitigated by va=
lidation caching for transactions that have been received early. While libb=
itcoin has no mempool (at least in earlier versions) transactions buffering=
 can be done by bip152's HeadersAndShortIds message.<br /><br />About #7 an=
d #8, introducing a domain separation where 64 bytes transactions are rejec=
ted and making it harder to exploit #7 and #8 categories of bogus block mes=
sages.<br /><div>This is correct that bitcoin core might accept valid trans=
action data before the merkle tree commitment has been verified.</div><div>=
<br /></div>&gt; The problem arises from trying to optimize dismissal by st=
oring an identifier. Just *producing* the identifier is orders of magnitude=
 more costly than simply dismissing this &gt; bogus message. I can't imagin=
e why any implementation would want to compute and store and retrieve and r=
ecompute and compare hashes when the alterative is just<div>&gt; dismissing=
 the bogus messages with no hashing at all.<br /><br />&gt; Bogus messages =
will arrive, they do not even have to be requested. The simplest are dealt =
with by parse failure. What defines a parse is entirely subjective. General=
ly it's<div>&gt; "structural" but nothing precludes incorporating a require=
ment for a necessary leading pattern in the stream, sort of like how the wi=
tness pattern is identified. If we were</div><div>&gt; going to prioritize =
early dismissal this is where we would put it.<div><br />I don't think this=
 is that simple - While producing an identifier comes with a computational =
cost (e.g fixed 64-byte structured coinbase transaction), if the full node =
have a hierarchy of validation cache like bitcoin core has already, the cos=
t of bogus block messages can be slashed down. On the other hand, just deal=
ing with parse failure on the spot by introducing a leading pattern in the =
stream just inflates the size of p2p messages, and the transaction-relay ba=
ndwidth cost.<br /><br />&gt; However, there is a tradeoff in terms of earl=
y dismissal. Looking up invalid hashes is a costly tradeoff, which becomes =
multiplied by every block validated. For example,</div><div>&gt; expending =
1 millisecond in hash/lookup to save 1 second of validation time in the fai=
lure case seems like a reasonable tradeoff, until you multiply across the w=
hole chain. &gt; 1 ms becomes 14 minutes across the chain, just to save a s=
econd for each mallied block encountered. That means you need to have encou=
ntered 840 such mallied blocks &gt; just to break even. Early dismissing th=
e block for non-null coinbase point (without hashing anything) would be on =
the order of 1000x faster than that (breakeven at 1 &gt; encounter). So why=
 the block hash cache requirement? It cannot be applied to many scenarios, =
and cannot be optimal in this one.<br /><br />I think what you're describin=
g is more a classic time-space tradeoff which is well-known in classic comp=
uter science litterature. In my reasonable opinion, one should more reason =
under what is the security paradigm we wish for bitcoin block-relay network=
 and perduring decentralization, i.e one where it's easy to verify block me=
ssages proofs which could have been generated on specialized hardware with =
an asymmetric cost. Obviously encountering 840 such malliead blocks to make=
 it break even doesn't make the math up to save on hash lookup, unless you =
can reduce the attack scenario in terms of adversaries capabilities.<br /><=
br />Best,<br /><div>Antoine=C2=A0</div></div></div></div></div><div class=
=3D"gmail_quote"><div dir=3D"auto" class=3D"gmail_attr">Le samedi 29 juin 2=
024 =C3=A0 21:42:23 UTC+1, Eric Voskuil a =C3=A9crit=C2=A0:<br/></div><bloc=
kquote class=3D"gmail_quote" style=3D"margin: 0 0 0 0.8ex; border-left: 1px=
 solid rgb(204, 204, 204); padding-left: 1ex;">Caching identity in the case=
 of invalidity is more interesting question than it might seem.<br><br>Back=
ground: A fully-validated block has established identity in its block hash.=
 However an invalid block message may include the same block header, produc=
ing the same hash, but with any kind of nonsense following the header. The =
purpose of the transaction and witness commitments is of course to establis=
h this identity, so these two checks are therefore necessary even under che=
ckpoint/milestone. And then of course the two Merkle tree issues complicate=
 the tx commitment (the integrity of the witness commitment is assured by t=
hat of the tx commitment).<br><br>So what does it mean to speak of a block =
hash derived from:<br><br>(1) a block message with an unparseable header?<b=
r>(2) a block message with parseable but invalid header?<br>(3) a block mes=
sage with valid header but unparseable tx data?<br>(4) a block message with=
 valid header but parseable invalid uncommitted tx data?<br>(5) a block mes=
sage with valid header but parseable invalid malleated committed tx data?<b=
r>(6) a block message with valid header but parseable invalid unmalleated c=
ommitted tx data?<br>(7) a block message with valid header but uncommitted =
valid tx data?<br>(8) a block message with valid header but malleated commi=
tted valid tx data?<br>(9) a block message with valid header but unmalleate=
d committed valid tx data?<br><br>Note that only the #9 p2p block message c=
ontains an actual Bitcoin block, the others are bogus messages. In all case=
s the message can be sha256 hashed to establish the identity of the *messag=
e*. And if one&#39;s objective is to reject repeating bogus messages, this =
might be a useful strategy. It&#39;s already part of the p2p protocol, is o=
rders of magnitude cheaper to produce than a Merkle root, and has no identi=
ty issues.<br><br>The concept of Bitcoin block hash as unique identifier fo=
r invalid p2p block messages is problematic. Apart from the malleation ques=
tion, what is the Bitcoin block hash for a message with unparseable data (#=
1 and #3)? Such messages are trivial to produce and have no block hash. Wha=
t is the useful identifier for a block with malleated commitments (#5 and #=
8) or invalid commitments (#4 and #7) - valid txs or otherwise?<br><br>The =
stated objective for a consensus rule to invalidate all 64 byte txs is:<br>=
<br>&gt; being able to cache the hash of a (non-malleated) invalid block as=
 permanently invalid to avoid re-downloading and re-validating it.<br><br>T=
his seems reasonable at first glance, but given the list of scenarios above=
, which does it apply to? Presumably the invalid header (#2) doesn&#39;t ge=
t this far because of headers-first. That leaves just invalid blocks with u=
seful block hash identifiers (#6). In all other cases the message is simply=
 discarded. In this case the attempt is to move category #5 into category #=
6 by prohibiting 64 byte txs.<br><br>The requirement to &quot;avoid re-down=
loading and re-validating it&quot; is about performance, presumably minimiz=
ing initial block download/catch-up time. There is a computational cost to =
producing 64 byte malleations and none for any of the other bogus block mes=
sage categories above, including the other form of malleation. Furthermore,=
 64 byte malleation has almost zero cost to preclude. No hashing and not ev=
en true header or tx parsing are required. Only a handful of bytes must be =
read from the raw message before it can be discarded presently.<br><br>That=
&#39;s actually far cheaper than any of the other scenarios that again, hav=
e no cost to produce. The other type of malleation requires parsing all of =
the txs in the block and hashing and comparing some or all of them. In othe=
r words, if there is an attack scenario, that must be addressed before this=
 can be meaningful. In fact all of the other bogus message scenarios (with =
tx data) will remain more expensive to discard than this one.<br><br>The pr=
oblem arises from trying to optimize dismissal by storing an identifier. Ju=
st *producing* the identifier is orders of magnitude more costly than simpl=
y dismissing this bogus message. I can&#39;t imagine why any implementation=
 would want to compute and store and retrieve and recompute and compare has=
hes when the alterative is just dismissing the bogus messages with no hashi=
ng at all.<br><br>Bogus messages will arrive, they do not even have to be r=
equested. The simplest are dealt with by parse failure. What defines a pars=
e is entirely subjective. Generally it&#39;s &quot;structural&quot; but not=
hing precludes incorporating a requirement for a necessary leading pattern =
in the stream, sort of like how the witness pattern is identified. If we we=
re going to prioritize early dismissal this is where we would put it.<br><b=
r>However, there is a tradeoff in terms of early dismissal. Looking up inva=
lid hashes is a costly tradeoff, which becomes multiplied by every block va=
lidated. For example, expending 1 millisecond in hash/lookup to save 1 seco=
nd of validation time in the failure case seems like a reasonable tradeoff,=
 until you multiply across the whole chain. 1 ms becomes 14 minutes across =
the chain, just to save a second for each mallied block encountered. That m=
eans you need to have encountered 840 such mallied blocks just to break eve=
n. Early dismissing the block for non-null coinbase point (without hashing =
anything) would be on the order of 1000x faster than that (breakeven at 1 e=
ncounter). So why the block hash cache requirement? It cannot be applied to=
 many scenarios, and cannot be optimal in this one.<br><br>Eric<br></blockq=
uote></div>

<p></p>

-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;Bitcoin Development Mailing List&quot; group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:bitcoindev+unsubscribe@googlegroups.com">bitcoind=
ev+unsubscribe@googlegroups.com</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/d/msgid/bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.=
com?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.com/d/msg=
id/bitcoindev/301c64c7-0f0f-476a-90c4-913659477276n%40googlegroups.com</a>.=
<br />

------=_Part_22824_1413333519.1719887768873--

------=_Part_22823_366698941.1719887768873--