Re: [bitcoin-dev] BIP 174 thoughts

From: matejcik <jan.matejek@satoshilabs.com>
To: Achow101 <achow101-lists@achow101.com>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP 174 thoughts
Date: Wed, 4 Jul 2018 15:19:11 +0200	[thread overview]
Message-ID: <c7a4476b-8643-3ddd-723b-1ff8b8910e36@satoshilabs.com> (raw)
In-Reply-To: <RdSjdFhvANrG9ve8bXVnqs68ih5_iVK11jdOAL6WoMI2358TdylR3H2SyGHQfByKwMYYOfIJIOq0l6clYf-az8_D_D-D7cByzqbyYt1nV4c=@achow101.com>

[-- Attachment #1.1: Type: text/plain, Size: 15797 bytes --]

hello,

we still have some concerns about the BIP as currently proposed - not
about the format or data contents, but more about strictness and
security properties. I have raised some in the previous e-mails, but
they might have been lost in the overall talk about format.

* Choosing from duplicate keys when combining.
We believe that "choose whichever value it wishes" is not a good
resolution strategy. We propose to either change this to "in case of
conflicts, software MUST reject the conflicting PSBTs", or explain in
more detail why picking at random is a safe choice.

* Signing records with unknown keys.
There's been some talk about this at start, but there should be a clear
strategy for Signers when unknown fields are encountered. We intend to
implement the rule: "will not sign an input with any unknown fields
present".
Maybe it is worth codifying this behavior in the standard, or maybe
there should be a way to mark a field as "optional" so that strict
Signers know they can _safely_ ignore the unknown field.

And two minor points:

* Fields with empty keys.
This might be inferred from the definition, but is probably worth
spelling out explicitly: If a field definition states that the key data
is empty, an implementation MUST enforce this and reject PSBTs that
contain non-empty data.
We suggest adding something to the effect of:
"If a key or value data in a field doesn't match the specified format,
the PSBT is invalid. In particular, if key data is specified as "none"
but the key contains data beyond the type specifier, implementation MUST
reject the PSBT."
(not sure about the languge, this should of course allow processing
unknown fields)

* "Combiner can detect inconsistencies"
Added in response to this comment [1], the current wording looks like
it's describing what the Combiner is _capable of_, as opposed to
prescribing what the combiner is _allowed to_ do.
We suggest changing to something like:
"For every field type that the Combiner understands, it MAY also refuse
to combine PSBTs that have inconsistencies in that field, or cause a
conflict when combined."

regards
m.

[1] https://github.com/bitcoin/bips/pull/694#discussion_r199232318

On 29.6.2018 21:12, Achow101 wrote:
> Hi,
> 
> I do not think that protobuf is the way to go for this. Not only is it another dependency
> which many wallets do not want to add (e.g. Armory has not added BIP 70 support because
> of its dependency on protobuf), but it is a more drastic change than the currently proposed
> changes. The point of this email thread isn't to rewrite and design a new BIP (which is effectively
> what is currently going on). The point is to modify and improve the current one. In particular,
> we do not want such drastic changes that people who have already implemented the current
> BIP would have to effectively rewrite everything from scratch again.
> 
> I believe that this discussion has become bikeshedding and is really no longer constructive. Neither
> of us are going to convince the other to use or not use protobuf. ASeeing how no one else
> has really participated in this discussion about protobuf and key uniqueness, I do not think
> that these suggested changes are really necessary nor useful to others. It boils down to personal preference
> rather than technical merit. As such, I have opened a PR to the BIPs repo (https://github.com/bitcoin/bips/pull/694)
> which contains the changes that I proposed in an earlier email.
> 
> Additionally, because there have been no objections to the currently proposed changes, I propose
> to move the BIP from Draft to Proposed status.
> 
> Andrew
> 
> 
> 
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> 
> On June 29, 2018 2:53 AM, matejcik via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
>> 
>>
>> Short version:
>>
>> -   I propose that conflicting "values" for the same "key" are considered
>>     
>>     invalid.
>>     
>> -   Let's not optimize for invalid data.
>> -   Given that, there's an open question on how to handle invalid data
>>     
>>     when encountered
>>     
>>     In general, I don't think it's possible to enforce correctness at the
>>     
>>     format level. You still need application level checks - and that calls
>>     
>>     into question what we gain by trying to do this on the format level.
>>     
>>     Long version:
>>     
>>     Let's look at this from a different angle.
>>     
>>     There are roughly two possible "modes" for the format with regard to
>>     
>>     possibly-conflicting data. Call them "permissive" and "restrictive".
>>     
>>     The spec says:
>>     
>>     """
>>     
>>     Keys within each scope should never be duplicated; all keys in the
>>     
>>     format are unique. PSBTs containing duplicate keys are invalid. However
>>     
>>     implementors will still need to handle events where keys are duplicated
>>     
>>     when combining transactions with duplicated fields. In this event, the
>>     
>>     software may choose whichever value it wishes.
>>     
>>     """
>>     
>>     The last sentence of this paragraph sets the mode to permissive:
>>     
>>     duplicate values are pretty much OK. If you see them, just pick one.
>>     
>>     You seem to argue that Combiners, in particular simple ones that don't
>>     
>>     understand field semantics, should merge keys permissively, but
>>     
>>     deduplicate values restrictively.
>>     
>>     IOW: if you receive two different values for the same key, just pick
>>     
>>     whichever, but $deity forbid you include both!
>>     
>>     This choice doesn't make sense to me.
>>     
>>     What would make sense is fully restrictive mode: receiving two
>>     
>>     different values for the same key is a fatal condition with no recovery.
>>     
>>     If you have a non-deterministic scheme, put a differentiator in the key.
>>     
>>     Or all the data, for that matter.
>>     
>>     (Incidentally, this puts key-aware and keyless Combiners on the same
>>     
>>     footing. As long as all participants uphold the protocol, different
>>     
>>     value = different key = different full record.)
>>     
>>     Given that, it's nice to have the Combiner perform the task of detecting
>>     
>>     this and failing. But not at all necessary. As the quoted paragraph
>>     
>>     correctly notes, consumers still need to handle PSBTs with duplicate keys.
>>     
>>     (In this context, your implied permissive/restrictive Combiner is
>>     
>>     optimized for dealing with invalid data. That seems like a wrong
>>     
>>     optimization.)
>>     
>>     A reasonable point to decide is whether the handling at the consumer
>>     
>>     should be permissive or restrictive. Personally I'm OK with either. I'd
>>     
>>     go with the following change:
>>     
>>     """
>>     
>>     In this event, the software MAY reject the transaction as invalid. If it
>>     
>>     decides to accept it, it MUST choose the last value encountered.
>>     
>>     """
>>     
>>     (deterministic way of choosing, instead of "whichever you like")
>>     
>>     We could also drop the first part, explicitly allowing consumers to
>>     
>>     pick, and simplifying the Combiner algorithm to `sort -u`.
>>     
>>     Note that this sort of "picking" will probably be implicit. I'd expect
>>     
>>     the consumer to look like this:
>>     
>>
>>     for key, value in parse(nextRecord()):
>>       data[key] = value
>>     
>>
>> Or we could drop the second part and switch MAY to MUST, for a fully
>>
>> restrictive mode - which, funnily enough, still lets the Combiner work
>>
>> as `sort -u`.
>>
>> To see why, remember that distinct values for the same key are not
>>
>> allowed in fully restrictive mode. If a Combiner encounters two
>>
>> conflicting values F(1) and F(2), it should fail -- but if it doesn't,
>>
>> it includes both and the same failure WILL happen on the fully
>>
>> restrictive consumer.
>>
>> This was (or is) my point of confusion re Combiners: the permissive key
>>
>> -   restrictive value mode of operation doesn't seem to help subsequent
>>     
>>     consumers in any way.
>>     
>>     Now, for the fully restrictive consumer, the key-value model is indeed
>>     
>>     advantageous (and this is the only scenario that I can imagine in which
>>     
>>     it is advantageous), because you can catch key duplication on the parser
>>     
>>     level.
>>     
>>     But as it turns out, it's not enough. Consider the following records:
>>     
>>     key(<PSBT_IN_REDEEM_SCRIPT> + abcde), value(<some redeem script>)
>>     
>>
>> and:
>>
>> key(<PSBT_IN_REDEEM_SCRIPT> + fghij), value(<some other redeem script>)
>>
>> A purely syntactic Combiner simply can't handle this case. The
>>
>> restrictive consumer needs to know whether the key is supposed to be
>>
>> repeating or not.
>>
>> We could fix this, e.g., by saying that repeating types must have high
>>
>> bit set and non-repeating must not. We also don't have to, because the
>>
>> worst failure here is that a consumer passes an invalid record to a
>>
>> subsequent one and the failure happens one step later.
>>
>> At this point it seems weird to be concerned about the "unique key"
>>
>> correctness, which is a very small subset of possibly invalid inputs. As
>>
>> a strict safety measure, I'd instead propose that a consumer MUST NOT
>>
>> operate on inputs or outputs, unless it understand ALL included fields -
>>
>> IOW, if you're signing a particular input, all fields in said input are
>>
>> mandatory. This prevents a situation where a simple Signer processes an
>>
>> input incorrectly based on incomplete set of fields, while still
>>
>> allowing Signers with different capabilities within the same PSBT.
>>
>> (The question here is whether to have either a flag or a reserved range
>>
>> for "optional fields" that can be safely ignored by consumers that don't
>>
>> understand them, but provide data for consumers who do.)
>>
>>>> To repeat and restate my central question: Why is it important,
>>>>
>>>> that an agent which doesn't understand a particular field
>>>>
>>>> structure, can nevertheless make decisions about its inclusion or
>>>>
>>>> omission from the result (based on a repeated prefix)?
>>>
>>> Again, because otherwise you may need a separate Combiner for each
>>>
>>> type of script involved. That would be unfortunate, and is very
>>>
>>> easily avoided.
>>
>> This is still confusing to me, and I would really like to get to the
>>
>> same page on this particular thing, because a lot of the debate hinges
>>
>> on it. I think I covered most of it above, but there are still pieces to
>>
>> clarify.
>>
>> As I understand it, the Combiner role (actually all the roles) is mostly
>>
>> an algorithm, with the implication that it can be performed
>>
>> independently by a separate agent, say a network node.
>>
>> So there's two types of Combiners:
>>
>> a) Combiner as a part of an intelligent consumer -- the usual scenario
>>
>> is a Creator/Combiner/Finalizer/Extractor being one participant, and
>>
>> Updater/Signers as other participants.
>>
>> In this case, the discussion of "simple Combiners" is actually talking
>>
>> about intelligent Combiners which don't understand new fields and must
>>
>> correctly pass them on. I argue that this can safely be done without
>>
>> loss of any important properties.
>>
>> b) Combiner as a separate service, with no understanding of semantics.
>>
>> Although parts of the debate seem to assume this scenario, I don't think
>>
>> it's worth considering. Again, do you have an usecase in mind for it?
>>
>> You also insist on enforcing a limited form of correctness on the
>>
>> Combiner level, but that is not worth it IMHO, as discussed above.
>>
>> Or am I missing something else?
>>
>>> Perhaps you want to avoid signing with keys that are already signed
>>>
>>> with? If you need to derive all the keys before even knowing what
>>>
>>> was already signed with, you've already performed 80% of the work.
>>
>> This wouldn't concern me at all, honestly. If the user sends an already
>>
>> signed PSBT to the same signer, IMHO it is OK to sign again; the
>>
>> slowdown is a fault of the user/workflow. You could argue that signing
>>
>> again is the valid response. Perhaps the Signer should even "consume"
>>
>> its keys and not pass them on after producing a signature? That seems
>>
>> like a sensible rule.
>>
>>> To your point: proto v2 afaik has no way to declare "whole record
>>>
>>> uniqueness", so either you drop that (which I think is unacceptable
>>>
>>> -   see the copy/sign/combine argument above), or you deal with it in
>>>     
>>>     your application code.
>>>     
>>
>> Yes. My argument is that "whole record uniqueness" isn't in fact an
>>
>> important property, because you need application-level checks anyway.
>>
>> Additionally, protobuf provides awareness of which fields are repeated
>>
>> and which aren't, and implicitly implements the "pick last" resolution
>>
>> strategy for duplicates.
>>
>> The simplest possible protobuf-based Combiner will:
>>
>> -   assume all fields are repeating
>> -   concatenate and parse
>> -   deduplicate and reserialize.
>>     
>>     More knowledgeable Combiner will intelligently handle non-repeating
>>     
>>     fields, but still has to assume that unknown fields are repeating and
>>     
>>     use the above algorithm.
>>     
>>     For "pick last" strategy, a consumer can simply parse the message and
>>     
>>     perform appropriate application-level checks.
>>     
>>     For "hard-fail" strategy, it must parse all fields as repeating and
>>     
>>     check that there's only one of those that are supposed to be unique.
>>     
>>     This is admittedly more work, and yes, protobuf is not perfectly suited
>>     
>>     for this task.
>>     
>>     But:
>>     
>>     One, this work must be done by hand anyway, if we go with a custom
>>     
>>     hand-parsed format. There is a protobuf implementation for every
>>     
>>     conceivable platform, we'll never have the same amount of BIP174 parsing
>>     
>>     code.
>>     
>>     (And if you're hand-writing a parser in order to avoid the dependency,
>>     
>>     you can modify it to do the checks at parser level. Note that this is
>>     
>>     not breaking the format! The modifed parser will consume well-formed
>>     
>>     protobuf and reject that which is valid protobuf but invalid bip174 - a
>>     
>>     correct behavior for a bip174 parser.)
>>     
>>     Two, it is my opinion that this is worth it in order to have a standard,
>>     
>>     well described, well studied and widely implemented format.
>>     
>>     Aside: I ha that there is no advantage to a record-set based
>>     
>>     custom format by itself, so IMHO the choice is between protobuf vs
>>     
>>     a custom key-value format. Additionally, it's even possible to implement
>>     
>>     a hand-parsable key-value format in terms of protobuf -- again, arguing
>>     
>>     that "standardness" of protobuf is valuable in itself.
>>     
>>     regards
>>     
>>     m.
>>     
>>
>> bitcoin-dev mailing list
>>
>> bitcoin-dev@lists.linuxfoundation.org
>>
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> 
> 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]