From: matejcik
To: Pieter Wuille
Cc: Bitcoin Dev
Subject: [bitcoin-dev] BIP 174 thoughts
Date: Fri, 29 Jun 2018 11:53:34 +0200
Message-ID: <95137ba3-1662-b75d-e55f-893d64c76059@satoshilabs.com>
References: <21a616f5-7a17-35b9-85ea-f779f20a6a2d@satoshilabs.com>
 <20180621195654.GC99379@coinkite.com>
 <881def14-696c-3207-cf6c-49f337ccf0d1@satoshilabs.com>

Short version:

- I propose that conflicting
"values" for the same "key" are considered invalid.
- Let's not optimize for invalid data.
- Given that, there's an open question on how to handle invalid data
  when encountered.

In general, I don't think it's possible to enforce correctness at the
format level. You still need application level checks - and that calls
into question what we gain by trying to do this on the format level.

Long version:

Let's look at this from a different angle.

There are roughly two possible "modes" for the format with regard to
possibly-conflicting data. Call them "permissive" and "restrictive".

The spec says:
"""
Keys within each scope should never be duplicated; all keys in the
format are unique. PSBTs containing duplicate keys are invalid. However
implementors will still need to handle events where keys are duplicated
when combining transactions with duplicated fields. In this event, the
software may choose whichever value it wishes.
"""
The last sentence of this paragraph sets the mode to permissive:
duplicate values are pretty much OK. If you see them, just pick one.

You seem to argue that Combiners, in particular simple ones that don't
understand field semantics, should merge _keys_ permissively, but
deduplicate _values_ restrictively.
IOW: if you receive two different values for the same key, just pick
whichever, but $deity forbid you include both!

This choice doesn't make sense to me.

What _would_ make sense is fully restrictive mode: receiving two
different values for the same key is a fatal condition with no
recovery. If you have a non-deterministic scheme, put a differentiator
in the key. Or all the data, for that matter.
(Incidentally, this puts key-aware and keyless Combiners on the same
footing. As long as all participants uphold the protocol, different
value = different key = different full record.)

Given that, it's nice to have the Combiner perform the task of detecting
this and failing. But not at all necessary.
As the quoted paragraph correctly notes, consumers *still need to
handle* PSBTs with duplicate keys.
(In this context, your implied permissive/restrictive Combiner is
optimized for dealing with invalid data. That seems like a wrong
optimization.)

A reasonable point to decide is whether the handling at the consumer
should be permissive or restrictive. Personally I'm OK with either. I'd
go with the following change:

"""
In this event, the software MAY reject the transaction as invalid. If it
decides to accept it, it MUST choose the last value encountered.
"""
(deterministic way of choosing, instead of "whichever you like")

We could also drop the first part, explicitly allowing consumers to
pick, and simplifying the Combiner algorithm to `sort -u`.
Note that this sort of "picking" will probably be implicit. I'd expect
the consumer to look like this:

```
for key, value in parse(nextRecord()):
    data[key] = value  # a later value silently overwrites an earlier one
```

Or we could drop the second part and switch MAY to MUST, for a fully
restrictive mode - which, funnily enough, still lets the Combiner work
as `sort -u`.
To see why, remember that distinct values for the same key are not
allowed in fully restrictive mode. If a Combiner encounters two
conflicting values F(1) and F(2), it should fail -- but if it doesn't,
it includes both and the same failure WILL happen on the fully
restrictive consumer.

This was (or is) my point of confusion re Combiners: the permissive key
+ restrictive value mode of operation doesn't seem to help subsequent
consumers in any way.

Now, for the fully restrictive consumer, the key-value model is indeed
advantageous (and this is the only scenario that I can imagine in which
it is advantageous), because you can catch key duplication on the parser
level.

But as it turns out, it's not enough. Consider the following records:
key( + abcde), value()

and:
key( + fghij), value()

A purely syntactic Combiner simply can't handle this case.
The restrictive consumer needs to know whether the key is supposed to
be repeating or not.
We could fix this, e.g., by saying that repeating types must have the
high bit set and non-repeating must not. We also don't have to, because
the worst failure here is that a consumer passes an invalid record to a
subsequent one and the failure happens one step later.

At this point it seems weird to be concerned about the "unique key"
correctness, which is a very small subset of possibly invalid inputs.
As a strict safety measure, I'd instead propose that a consumer MUST NOT
operate on inputs or outputs unless it understands ALL included fields -
IOW, if you're signing a particular input, all fields in said input are
mandatory. This prevents a situation where a simple Signer processes an
input incorrectly based on an incomplete set of fields, while still
allowing Signers with different capabilities within the same PSBT.
(The question here is whether to have either a flag or a reserved range
for "optional fields" that can be safely ignored by consumers that
don't understand them, but provide data for consumers who do.)

>> To repeat and restate my central question: Why is it important,
>> that an agent which doesn't understand a particular field
>> structure, can nevertheless make decisions about its inclusion or
>> omission from the result (based on a repeated prefix)?
>>
>
> Again, because otherwise you may need a separate Combiner for each
> type of script involved. That would be unfortunate, and is very
> easily avoided.

This is still confusing to me, and I would really like to get to the
same page on this particular thing, because a lot of the debate hinges
on it. I think I covered most of it above, but there are still pieces to
clarify.

As I understand it, the Combiner role (actually all the roles) is mostly
an algorithm, with the implication that it can be performed
independently by a separate agent, say a network node.
So there are two types of Combiners:

a) Combiner as a part of an intelligent consumer -- the usual scenario
is a Creator/Combiner/Finalizer/Extractor being one participant, and
Updater/Signers as other participants.

In this case, the discussion of "simple Combiners" is actually talking
about intelligent Combiners which don't understand new fields and must
correctly pass them on. I argue that this can safely be done without
loss of any important properties.

b) Combiner as a separate service, with no understanding of semantics.
Although parts of the debate seem to assume this scenario, I don't think
it's worth considering. Again, do you have a use case in mind for it?

You also insist on enforcing a limited form of correctness on the
Combiner level, but that is not worth it IMHO, as discussed above.

Or am I missing something else?

> Perhaps you want to avoid signing with keys that are already signed
> with? If you need to derive all the keys before even knowing what
> was already signed with, you've already performed 80% of the work.

This wouldn't concern me at all, honestly. If the user sends an already
signed PSBT to the same signer, IMHO it is OK to sign again; the
slowdown is a fault of the user/workflow. You could argue that signing
again is the valid response. Perhaps the Signer should even "consume"
its keys and not pass them on after producing a signature? That seems
like a sensible rule.

> To your point: proto v2 afaik has no way to declare "whole record
> uniqueness", so either you drop that (which I think is unacceptable
> - see the copy/sign/combine argument above), or you deal with it in
> your application code.

Yes. My argument is that "whole record uniqueness" isn't in fact an
important property, because you need application-level checks anyway.
Additionally, protobuf provides awareness of which fields are repeated
and which aren't, and implicitly implements the "pick last" resolution
strategy for duplicates.
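
To make the strategies discussed so far concrete, here is a minimal
sketch of a keyless `sort -u` Combiner together with a "pick last" and a
fully restrictive consumer. It is an illustration only: records are
modelled as plain (key, value) byte pairs rather than the BIP 174 wire
format or protobuf, and the names `Record`, `combine`,
`consume_pick_last` and `consume_strict` are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory record: a (key, value) pair of raw bytes.
# This is NOT the BIP 174 serialization, just a stand-in for it.
Record = Tuple[bytes, bytes]


def combine(*record_sets: Iterable[Record]) -> List[Record]:
    """Keyless Combiner, roughly `sort -u`: keep each distinct whole record once."""
    return sorted({rec for records in record_sets for rec in records})


def consume_pick_last(records: Iterable[Record]) -> Dict[bytes, bytes]:
    """Permissive consumer: a later value silently replaces an earlier one."""
    data: Dict[bytes, bytes] = {}
    for key, value in records:
        data[key] = value
    return data


def consume_strict(records: Iterable[Record]) -> Dict[bytes, bytes]:
    """Fully restrictive consumer: two different values for one key are fatal."""
    data: Dict[bytes, bytes] = {}
    for key, value in records:
        if key in data and data[key] != value:
            raise ValueError(f"conflicting values for key {key.hex()}")
        data[key] = value
    return data


# Two copies of the same PSBT-like record set that disagree on key 0x02.
a = [(b"\x01", b"tx"), (b"\x02", b"sig-A")]
b = [(b"\x01", b"tx"), (b"\x02", b"sig-B")]

# The Combiner keeps the shared record once and both conflicting records.
merged = combine(a, b)
```

With these inputs, `consume_pick_last(merged)` ends up with `sig-B` for
key 0x02, while `consume_strict(merged)` raises `ValueError`: the
Combiner did not need to understand anything, and the conflict still
surfaces at the restrictive consumer. Note that after a sorting
Combiner, "last" means last in sort order, not last received.
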
The simplest possible protobuf-based Combiner will:
- assume all fields are repeating
- concatenate and parse
- deduplicate and reserialize.

A more knowledgeable Combiner will intelligently handle non-repeating
fields, but still has to assume that unknown fields are repeating and
use the above algorithm.

For the "pick last" strategy, a consumer can simply parse the message
and perform appropriate application-level checks.

For the "hard-fail" strategy, it must parse all fields as repeating and
check that there's only one of those that are supposed to be unique.
This is admittedly more work, and yes, protobuf is not perfectly suited
for this task.

But:

One, this work must be done by hand anyway, if we go with a custom
hand-parsed format. There is a protobuf implementation for every
conceivable platform; we'll never have the same amount of BIP174 parsing
code.
(And if you're hand-writing a parser in order to avoid the dependency,
you can modify it to do the checks at parser level. Note that this is
not breaking the format! The modified parser will consume well-formed
protobuf and reject that which is valid protobuf but invalid BIP174 - a
correct behavior for a BIP174 parser.)

Two, it is my opinion that this is worth it in order to have a standard,
well described, well studied and widely implemented format.

Aside: I'd argue that there is no advantage to a record-set based custom
format by itself, so IMHO the choice is between protobuf vs a custom
key-value format. Additionally, it's even possible to implement a
hand-parsable key-value format in terms of protobuf -- again, arguing
that "standardness" of protobuf is valuable in itself.

regards
m.