Date: Fri, 29 Jun 2018 15:12:27 -0400
From: Achow101
To: matejcik, Bitcoin Protocol Discussion
Subject: Re: [bitcoin-dev] BIP 174 thoughts

Hi,

I do not think that protobuf is the way to go for this. Not only is it another dependency which many wallets do not want to add (e.g. Armory has not added BIP 70 support because of its dependency on protobuf), but it is a more drastic change than the currently proposed changes. The point of this email thread isn't to rewrite and design a new BIP (which is effectively what is currently going on). The point is to modify and improve the current one. In particular, we do not want changes so drastic that people who have already implemented the current BIP would have to effectively rewrite everything from scratch again.

I believe that this discussion has become bikeshedding and is really no longer constructive. Neither of us is going to convince the other to use or not use protobuf. Seeing how no one else has really participated in this discussion about protobuf and key uniqueness, I do not think that these suggested changes are really necessary or useful to others. It boils down to personal preference rather than technical merit. As such, I have opened a PR to the BIPs repo (https://github.com/bitcoin/bips/pull/694) which contains the changes that I proposed in an earlier email.

Additionally, because there have been no objections to the currently proposed changes, I propose to move the BIP from Draft to Proposed status.
Andrew

------- Original Message -------
On June 29, 2018 2:53 AM, matejcik via bitcoin-dev wrote:

> Short version:
>
> - I propose that conflicting "values" for the same "key" are considered invalid.
> - Let's not optimize for invalid data.
> - Given that, there's an open question on how to handle invalid data when encountered.
>
> In general, I don't think it's possible to enforce correctness at the format level. You still need application level checks - and that calls into question what we gain by trying to do this on the format level.
>
> Long version:
>
> Let's look at this from a different angle.
>
> There are roughly two possible "modes" for the format with regard to possibly-conflicting data. Call them "permissive" and "restrictive".
>
> The spec says:
>
> """
> Keys within each scope should never be duplicated; all keys in the format are unique. PSBTs containing duplicate keys are invalid. However implementors will still need to handle events where keys are duplicated when combining transactions with duplicated fields. In this event, the software may choose whichever value it wishes.
> """
>
> The last sentence of this paragraph sets the mode to permissive: duplicate values are pretty much OK. If you see them, just pick one.
>
> You seem to argue that Combiners, in particular simple ones that don't understand field semantics, should merge keys permissively, but deduplicate values restrictively. IOW: if you receive two different values for the same key, just pick whichever, but $deity forbid you include both!
>
> This choice doesn't make sense to me.
>
> What would make sense is fully restrictive mode: receiving two different values for the same key is a fatal condition with no recovery. If you have a non-deterministic scheme, put a differentiator in the key. Or all the data, for that matter.
>
> (Incidentally, this puts key-aware and keyless Combiners on the same footing. As long as all participants uphold the protocol, different value = different key = different full record.)
>
> Given that, it's nice to have the Combiner perform the task of detecting this and failing. But not at all necessary. As the quoted paragraph correctly notes, consumers still need to handle PSBTs with duplicate keys.
>
> (In this context, your implied permissive/restrictive Combiner is optimized for dealing with invalid data. That seems like a wrong optimization.)
>
> A reasonable point to decide is whether the handling at the consumer should be permissive or restrictive. Personally I'm OK with either. I'd go with the following change:
>
> """
> In this event, the software MAY reject the transaction as invalid. If it decides to accept it, it MUST choose the last value encountered.
> """
>
> (deterministic way of choosing, instead of "whichever you like")
>
> We could also drop the first part, explicitly allowing consumers to pick, and simplifying the Combiner algorithm to `sort -u`.
>
> Note that this sort of "picking" will probably be implicit.
> I'd expect the consumer to look like this:
>
>     for key, value in parse(nextRecord()):
>         data[key] = value
>
> Or we could drop the second part and switch MAY to MUST, for a fully restrictive mode - which, funnily enough, still lets the Combiner work as `sort -u`.
>
> To see why, remember that distinct values for the same key are not allowed in fully restrictive mode. If a Combiner encounters two conflicting values F(1) and F(2), it should fail -- but if it doesn't, it includes both and the same failure WILL happen on the fully restrictive consumer.
>
> This was (or is) my point of confusion re Combiners: the permissive key - restrictive value mode of operation doesn't seem to help subsequent consumers in any way.
>
> Now, for the fully restrictive consumer, the key-value model is indeed advantageous (and this is the only scenario that I can imagine in which it is advantageous), because you can catch key duplication on the parser level.
>
> But as it turns out, it's not enough. Consider the following records:
>
> key(<type> + abcde), value(<data>)
>
> and:
>
> key(<type> + fghij), value(<data>)
>
> A purely syntactic Combiner simply can't handle this case. The restrictive consumer needs to know whether the key is supposed to be repeating or not.
>
> We could fix this, e.g., by saying that repeating types must have the high bit set and non-repeating must not. We also don't have to, because the worst failure here is that a consumer passes an invalid record to a subsequent one and the failure happens one step later.
>
> At this point it seems weird to be concerned about the "unique key" correctness, which is a very small subset of possibly invalid inputs. As a strict safety measure, I'd instead propose that a consumer MUST NOT operate on inputs or outputs unless it understands ALL included fields - IOW, if you're signing a particular input, all fields in said input are mandatory. This prevents a situation where a simple Signer processes an input incorrectly based on an incomplete set of fields, while still allowing Signers with different capabilities within the same PSBT.
>
> (The question here is whether to have either a flag or a reserved range for "optional fields" that can be safely ignored by consumers that don't understand them, but provide data for consumers who do.)
>
> > > To repeat and restate my central question: Why is it important that an agent which doesn't understand a particular field structure can nevertheless make decisions about its inclusion or omission from the result (based on a repeated prefix)?
> >
> > Again, because otherwise you may need a separate Combiner for each type of script involved. That would be unfortunate, and is very easily avoided.
>
> This is still confusing to me, and I would really like to get to the same page on this particular thing, because a lot of the debate hinges on it. I think I covered most of it above, but there are still pieces to clarify.
>
> As I understand it, the Combiner role (actually all the roles) is mostly an algorithm, with the implication that it can be performed independently by a separate agent, say a network node.
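> To make that concrete, here is a minimal sketch (Python, operating on already-parsed (key, value) byte pairs and ignoring per-input/per-output scoping; the helper names are illustrative, not taken from BIP 174) of the record-level Combiner and the fully restrictive consumer discussed above:
>
>     def combine(record_sets):
>         # `sort -u` style Combiner: deduplicate whole records, i.e. a
>         # record is dropped only if both key and value already occurred.
>         seen = set()
>         combined = []
>         for records in record_sets:
>             for key, value in records:
>                 if (key, value) not in seen:
>                     seen.add((key, value))
>                     combined.append((key, value))
>         return combined
>
>     def restrictive_consume(records):
>         # Fully restrictive consumer: conflicting values for the same
>         # key are a fatal condition with no recovery.
>         data = {}
>         for key, value in records:
>             if key in data and data[key] != value:
>                 raise ValueError("conflicting values for duplicate key")
>             data[key] = value
>         return data
>
> The Combiner here is purely syntactic; only the consumer is where conflicting values actually become fatal.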
> So there are two types of Combiners:
>
> a) Combiner as a part of an intelligent consumer -- the usual scenario is a Creator/Combiner/Finalizer/Extractor being one participant, and Updater/Signers as other participants.
>
> In this case, the discussion of "simple Combiners" is actually talking about intelligent Combiners which don't understand new fields and must correctly pass them on. I argue that this can safely be done without loss of any important properties.
>
> b) Combiner as a separate service, with no understanding of semantics.
>
> Although parts of the debate seem to assume this scenario, I don't think it's worth considering. Again, do you have a use case in mind for it?
>
> You also insist on enforcing a limited form of correctness on the Combiner level, but that is not worth it IMHO, as discussed above.
>
> Or am I missing something else?
>
> > Perhaps you want to avoid signing with keys that are already signed with? If you need to derive all the keys before even knowing what was already signed with, you've already performed 80% of the work.
>
> This wouldn't concern me at all, honestly. If the user sends an already signed PSBT to the same signer, IMHO it is OK to sign again; the slowdown is a fault of the user/workflow. You could argue that signing again is the valid response. Perhaps the Signer should even "consume" its keys and not pass them on after producing a signature? That seems like a sensible rule.
>
> > To your point: proto v2 afaik has no way to declare "whole record uniqueness", so either you drop that (which I think is unacceptable - see the copy/sign/combine argument above), or you deal with it in your application code.
>
> Yes. My argument is that "whole record uniqueness" isn't in fact an important property, because you need application-level checks anyway.
>
> Additionally, protobuf provides awareness of which fields are repeated and which aren't, and implicitly implements the "pick last" resolution strategy for duplicates.
>
> The simplest possible protobuf-based Combiner will:
>
> - assume all fields are repeating
> - concatenate and parse
> - deduplicate and reserialize.
>
> A more knowledgeable Combiner will intelligently handle non-repeating fields, but still has to assume that unknown fields are repeating and use the above algorithm.
>
> For the "pick last" strategy, a consumer can simply parse the message and perform appropriate application-level checks.
>
> For the "hard-fail" strategy, it must parse all fields as repeating and check that there's only one of those that are supposed to be unique. This is admittedly more work, and yes, protobuf is not perfectly suited for this task.
>
> But:
>
> One, this work must be done by hand anyway, if we go with a custom hand-parsed format. There is a protobuf implementation for every conceivable platform; we'll never have the same amount of BIP174 parsing code.
>
> (And if you're hand-writing a parser in order to avoid the dependency, you can modify it to do the checks at parser level. Note that this is not breaking the format! The modified parser will consume well-formed protobuf and reject that which is valid protobuf but invalid bip174 - a correct behavior for a bip174 parser.)
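> For illustration, a sketch of what such a parser-level check could look like (Python again; the single type byte and the already-split records are simplified stand-ins for the actual BIP 174 wire format, and the set of repeating types is hypothetical):
>
>     REPEATING_TYPES = {0x02}  # hypothetical: types allowed to repeat
>
>     def parse_scope(records):
>         # Reject duplicates of non-repeating types while parsing one
>         # scope: the serialization is well-formed, but it is invalid
>         # bip174, so the parser refuses it.
>         seen = set()
>         parsed = {}
>         for key, value in records:
>             rec_type = key[0]
>             if rec_type not in REPEATING_TYPES and rec_type in seen:
>                 raise ValueError("duplicate non-repeating record type")
>             seen.add(rec_type)
>             parsed.setdefault(rec_type, []).append((key, value))
>         return parsed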
> Two, it is my opinion that this is worth it in order to have a standard, well described, well studied and widely implemented format.
>
> Aside: I'd say that there is no advantage to a record-set based custom format by itself, so IMHO the choice is between protobuf vs a custom key-value format. Additionally, it's even possible to implement a hand-parsable key-value format in terms of protobuf -- again, arguing that the "standardness" of protobuf is valuable in itself.
>
> regards
> m.
>
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev