* [Bitcoin-development] BIP proposal: Authenticated prefix trees
From: Mark Friedenbach @ 2013-12-20 1:47 UTC
To: Bitcoin Dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello fellow bitcoin developers. Included below is the first draft of a BIP for a new Merkle-compressed data structure. The need for this data structure arose out of the misnamed "Ultimate blockchain compression" project, but it has since been recognized to have many other applications.

In addition to this BIP I am preparing three additional BIPs describing the use of this data structure in stateless validation & mining, the UBC address index for "SPV+" operating modes, document timestamping and merged mining.

A Python implementation of this data structure is available here:

https://github.com/monetizeio/python-bitcoin

A C++ implementation is being worked on.

As per the BIP-1 procedure, I am submitting this rough draft to the community for discussion. I welcome all comments and criticisms of both form and content.

-Mark

==Abstract==

This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle hash tree] variant of the [http://en.wikipedia.org/wiki/Trie prefix-tree data structure], ideally suited for encoding key-value indices which support memory-efficient proofs.

==Motivation==

There are a number of applications which would benefit from having a data structure with the following properties:

* '''Arbitrary mapping of keys to values.''' A ''key'' can be any bytestring, and its ''value'' any other bytestring.

* '''Duplicate keys disallowed.''' Every key has one, and only one, value associated with it. Some applications demand assurance that no key value is reused, and that this constraint can be checked without requiring access to the entire data structure.
* '''Efficient look-up by key.''' The data structure should support sub-linear lookup operations with respect to the number of keys in the mapping. Logarithmic time, or time linear in the length of the key, should be achievable and would be sufficient for realistic applications.

* '''Merkle compression of mapping structure.''' It should be possible to produce a reduced description of the tree consisting of a single root hash value which is deterministically calculated from the mapping structure.

* '''Efficient proofs of inclusion.''' It should be possible to extract a proof of a key/value mapping whose size and verification time are, in the worst case, bounded by the length of the key.

* '''Computation of updates using local information.''' Given a set of inclusion proofs, it should be possible to calculate adjustments to the local mapping structure (update or deletion of included mappings, or insertion between two included mappings which are adjacent in the global structure).

Such applications include committed validation indices which enable stateless mining nodes, committed wallet indices which enable trustless querying of the unspent transaction output set by <code>scriptPubKey</code>, efficient document time-stamping, and secure & efficient merged mining. This BIP describes an authenticated prefix tree which has the above properties, but leaves the myriad applications to be formalized in future BIPs.

==Data structure==

This BIP defines a binary prefix tree. Such a structure provides a mapping of bitstrings (the ''keys'') to bytestrings (the ''values''). It is an acyclic binary tree which implicitly encodes keys within the traversal path: a "left" branch is a 0, and a "right" branch is a 1. Each node is reachable by exactly one path, and reading off the branches taken (0 for each left, 1 for each right) as one follows the path from root to target yields the node's key.
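As a minimal sketch of how a bytestring key becomes a traversal path, the helpers below expand a key into bits and find where two keys diverge. They assume bits are taken most-significant-bit first within each byte, which is consistent with the worked examples in this draft; the function names <code>key_bits</code>, <code>path_names</code> and <code>common_prefix</code> are hypothetical and not taken from the reference implementation.

```python
def key_bits(key: bytes) -> list[int]:
    """Expand a bytestring key into its root-to-node bit path.

    Bits are taken most-significant first within each byte (an assumption
    consistent with the worked examples). 0 = left branch, 1 = right branch.
    """
    return [(byte >> (7 - i)) & 1 for byte in key for i in range(8)]


def path_names(key: bytes) -> str:
    """Render the same path as a string of L/R steps, for readability."""
    return "".join("R" if bit else "L" for bit in key_bits(key))


def common_prefix(a: bytes, b: bytes) -> list[int]:
    """Bits shared by two keys: the path to the node where they diverge."""
    shared = []
    for x, y in zip(key_bits(a), key_bits(b)):
        if x != y:
            break
        shared.append(x)
    return shared


if __name__ == "__main__":
    print(key_bits(b"a"))          # 'a' = 0x61 = 0b01100001
    print(path_names(b"a"))
    print(common_prefix(b"Curie", b"Fleming"))
```

The shared prefixes line up with the interior nodes of the serialization example later in this draft: 'Curie' splits from 'Einstein' and 'Fleming' after 0b01000, and 'Einstein' splits from 'Fleming' after 0b010001.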
The particular binary prefix tree defined by this BIP is a hybrid PATRICIA / de la Brandais tree structure. [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a long sequence of non-branching nodes into a single interior node with a per-branch ''skip prefix''. This achieves significant savings in storage space, root hash calculation, and traversal time.

A de la Brandais trie achieves compression by only storing branches actually taken in a node. The space savings are minimal for a binary tree, but they bring the serialized size of a non-branching interior node under the SHA-256 block size, thereby reducing the number of hash operations required to perform updates and validate proofs.

This BIP describes the authenticated prefix tree and its many variations in terms of its serialized representation. Additional BIPs will describe the application of authenticated prefix trees to such applications as committed indices, document time-stamping, and merged mining.

==Serialization format==

As a hierarchical structure, the serialization of an entire tree is the serialization of its root node. A serialized node is the concatenation of five structures:

    node := flags || VARCHAR(extra) || value || left || right

The <code>flags</code> field is a single byte whose composite values determine the bytes that follow:

    flags = (left_flags  << 0) | (right_flags << 2) |
            (has_value   << 4) | (prune_left  << 5) |
            (prune_right << 6) | (prune_value << 7)

The <code>left_flags</code> and <code>right_flags</code> are special 2-bit enumeration fields. A value of 0 indicates that the node does not branch in this direction, and the corresponding <code>left</code> or <code>right</code> branch is missing (replaced with the empty string in the node serialization). A value of 1 indicates a single bit key prefix for this branch, implicitly 0 for <code>left</code> and 1 for <code>right</code>.
A value of 2 indicates up to 7 bits of additional skip prefix (beyond the implicit first bit, making 8 bits total), stored in a compact single-byte format. A value of 3 indicates a skip prefix with more than 7 additional bits, stored length-prefix encoded.

The single bit <code>has_value</code> indicates whether the node stores a data bytestring, the value associated with its key prefix. Since keys may be of any value or length, including one key being a prefix of another, it is possible for interior nodes as well as leaf nodes to have values associated with them, and therefore an explicit value-existence bit is required.

The remaining three bits are used for proof extraction, and are masked away prior to hash operations. <code>prune_left</code> indicates that the entire left branch has been pruned. <code>prune_right</code> has similar meaning for the right branch. If <code>has_value</code> is set, <code>prune_value</code> may be set to exclude the node's value from the encoded proof. This field is necessary for interior nodes, since it is possible that their values may be pruned while their children are not.

The <code>value</code> field is only present if the bit <code>flags.has_value</code> is set, in which case it is a <code>VARCHAR</code> bytestring:

    switch flags.has_value:
      case 0: value := ε
      case 1: value := VARCHAR(node.value)

The <code>extra</code> field is always present, and takes on a bytestring value defined by the particular application. Use of the <code>extra</code> field is application dependent, and will not be covered in this specification. It can be set to the empty bytestring (serialized as a single zero byte) if the application has no use for the <code>extra</code> field.

    extra := VARCHAR(calculate_extra(node))

The <code>left</code> and <code>right</code> non-terminals are only present if the corresponding <code>flags.left_flags</code> or <code>flags.right_flags</code> are non-zero.
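The flag layout and the simplest case of the node grammar (a leaf: a value and no branches) can be sketched in Python. This is a minimal illustration under stated assumptions, not the reference implementation: <code>VARINT</code> is assumed to be Bitcoin's CompactSize encoding and <code>VARCHAR</code> a CompactSize length followed by the raw bytes, which reproduces the leaf bytes in the serialization example below. The helper names are hypothetical.

```python
import struct


def varint(n: int) -> bytes:
    """Bitcoin CompactSize encoding (assumed to be the VARINT used here)."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def varchar(data: bytes) -> bytes:
    """Length-prefixed bytestring: VARINT(len) || data."""
    return varint(len(data)) + data


def pack_flags(left_flags: int, right_flags: int, has_value: bool = False,
               prune_left: bool = False, prune_right: bool = False,
               prune_value: bool = False) -> int:
    """Build the flags byte from its fields, following the bit layout above."""
    if not (0 <= left_flags <= 3 and 0 <= right_flags <= 3):
        raise ValueError("branch flags are 2-bit fields")
    return (left_flags << 0 | right_flags << 2 | int(has_value) << 4
            | int(prune_left) << 5 | int(prune_right) << 6
            | int(prune_value) << 7)


def unpack_flags(flags: int) -> dict:
    """Split a flags byte back into its named fields."""
    return {
        "left_flags": flags & 0x03,
        "right_flags": (flags >> 2) & 0x03,
        "has_value": bool(flags & 0x10),
        "prune_left": bool(flags & 0x20),
        "prune_right": bool(flags & 0x40),
        "prune_value": bool(flags & 0x80),
    }


# The three prune bits are masked away before hashing.
HASH_MASK = 0x1f


def serialize_leaf(value: bytes, extra: bytes = b"") -> bytes:
    """flags || VARCHAR(extra) || VARCHAR(value) for a node with no branches."""
    flags = pack_flags(0, 0, has_value=True)
    return bytes([flags]) + varchar(extra) + varchar(value)
```

For instance, <code>unpack_flags(0x0e)</code> gives left_flags 2 and right_flags 3 with no value, matching the root node of the serialization example; 0x2e, seen in the operational proof example, is the same root with <code>prune_left</code> set, and masking with 0x1f recovers 0x0e for hashing.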
The format depends on the value of this flags setting:

    switch branch_flags:
      case 0:
        branch := ε
      case 1:
        branch := branch_node_or_hash
      case 2:
        prefix = prefix >> 1
        branch := int_to_byte(1 << len(prefix) | bits_to_int(prefix)) || branch_node_or_hash
      case 3:
        prefix = prefix >> 1
        branch := VARINT(len(prefix) - 8) || bits_to_string(prefix) || branch_node_or_hash

<code>branch_flags</code> is a stand-in meant to describe either <code>left_flags</code> or <code>right_flags</code>, and likewise everywhere else in the above pseudocode <code>branch</code> can be replaced with either <code>left</code> or <code>right</code>. <code>prefix</code> is the key bits between the current node and the next branching, terminal, and/or leaf node, including the implicit leading bit for the branch (0 for the left branch, 1 for the right branch). In the above code, <code>len(prefix)</code> returns the number of bits in the bitstring, and <code>prefix >> 1</code> drops the first bit, reducing the size of the bitstring by one and renumbering the indices accordingly. Since case 3 applies only when at least 8 bits remain after the shift, <code>len(prefix) - 8</code> is never negative.

The function <code>int_to_byte</code> takes an integer in the range [0, 255] and returns the octet representing that value. This is a NOP in many languages, but present in this pseudocode so as to be explicit about what is going on.

The function <code>bits_to_int</code> interprets a sequence of bits as a little-endian integer value (the first bit is the least significant). This is analogous to the following code:

    def bits_to_int(bits):
        result = 0
        for idx, bit in enumerate(bits):
            if bit == 1:
                result |= 1 << idx
        return result

The function <code>bits_to_string</code> serializes a sequence of bits into a binary string. It uses little-endian bit and byte order, as demonstrated by the following code:

    def bits_to_string(bits):
        octets = bytearray((len(bits) + 7) // 8)
        for idx, bit in enumerate(bits):
            if bit == 1:
                octets[idx // 8] |= 1 << (idx % 8)
        return bytes(octets)

<code>branch_node_or_hash</code> is either the serialized child node or its SHA-256 hash and associated meta-data.
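Putting the branch cases together, the sketch below computes everything that precedes <code>branch_node_or_hash</code> for a branch, given its full prefix (including the implicit leading direction bit). It is illustrative only: it assumes <code>VARINT</code> is Bitcoin's CompactSize encoding and that key bytes expand to bits most-significant-bit first, and <code>encode_branch_prefix</code> is a hypothetical name. Under those assumptions it reproduces the prefix bytes 0xc3, 0x11, 0x1abb3a599a02 and 0x27938edab39c1a that appear in this draft's examples.

```python
import struct


def varint(n: int) -> bytes:
    """Bitcoin CompactSize encoding (assumed meaning of VARINT)."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def key_bits(key: bytes) -> list[int]:
    """Key bytes to bits, most significant bit of each byte first."""
    return [(b >> (7 - i)) & 1 for b in key for i in range(8)]


def bits_to_int(bits) -> int:
    """Little-endian: the first bit is the least significant."""
    return sum(1 << idx for idx, bit in enumerate(bits) if bit)


def bits_to_string(bits) -> bytes:
    """Little-endian bit and byte order, zero-padded to a whole byte."""
    octets = bytearray((len(bits) + 7) // 8)
    for idx, bit in enumerate(bits):
        if bit:
            octets[idx // 8] |= 1 << (idx % 8)
    return bytes(octets)


def encode_branch_prefix(prefix: list[int]) -> tuple[int, bytes]:
    """Return (branch_flags, prefix bytes) for a branch whose full prefix,
    including the implicit leading direction bit, is `prefix`."""
    if not prefix:
        return 0, b""                       # case 0: no branch
    if len(prefix) == 1:
        return 1, b""                       # case 1: only the implicit bit
    rest = prefix[1:]                       # prefix >> 1
    if len(rest) <= 7:                      # case 2: sentinel bit marks length
        return 2, bytes([1 << len(rest) | bits_to_int(rest)])
    return 3, varint(len(rest) - 8) + bits_to_string(rest)   # case 3
```

The single-byte form works because the sentinel <code>1 << len(prefix)</code> sits just above the prefix bits, so a decoder can recover the length from the position of the highest set bit; longer prefixes switch to an explicit length.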
Context determines which value to use: during digest calculations, disk/database serialization, and when the branch is pruned, the hash value is used and serialized in the same way as other SHA-256 values in the bitcoin protocol (note, however, that it is single-SHA-256, not the double-SHA-256 more commonly used in bitcoin). The number of terminal (value-containing) nodes and the serialized size in bytes of the fully unpruned branch are suffixed to the branch hash. When serializing a proof or snapshotting tree state and the branch is not pruned, the serialized child node is included directly, and the count and size are omitted as they can be derived from the serialization.

    if branch_pruned or SER_HASH:
        branch_node_or_hash := SHA-256(branch) || count(branch) || size(branch)
    else:
        branch_node_or_hash := serialize(branch)

As an example, here is the serialization of a prefix tree mapping the names of men and women of science to the year of their greatest publication:

    >>> dict = AuthTree()
    >>> dict['Curie'] = VARINT(1898)
    >>> dict['Einstein'] = VARINT(1905)
    >>> dict['Fleming'] = VARINT(1928)
    >>> dict['中本'] = VARINT(2009)
    >>> dict.serialize()
    # A bytestring, broken out into parts:

    # . Root node:
    0x0e                  # left_flags: 2, right_flags: 3, has_value: 0
    0x00                  # extra: ε
    # .l Inner node: 0b01000
    0x11                  # 0b01000
    0x07                  # left_flags: 3, right_flags: 1
    0x00                  # extra: ε
    # .l.l Inner node: 0b01000011 0b01110101 0b01110010 0b01101001  # 'C' 'u' 'r' 'i'
    #                  0b01100101                                   # 'e'
    0x1abb3a599a02        # 0b01101110101011100100110100101100101
    0x10                  # has_value: 1
    0x00                  # extra: ε
    0x03fd6a07            # value: VARINT(1898)
    # .l.r Inner node: 0b010001
    0x0f                  # left_flags: 3, right_flags: 3
    0x00                  # extra: ε
    # .l.r.l Inner node: 0b01000101 0b01101001 0b01101110 0b01110011  # 'E' 'i' 'n' 's'
    #                    0b01110100 0b01100101 0b01101001 0b01101110  # 't' 'e' 'i' 'n'
    0x312ded9c5d4c2ded00  # 0b1011010010110111
                          # 0b0011100110111010
                          # 0b0011001010110100
                          # 0b101101110
    0x10                  # has_value: 1
    0x00                  # extra: ε
    0x03fd7107            # value: VARINT(1905)
    # .l.r.r Inner node: 0b01000110 0b01101100 0b01100101 0b01101101  # 'F' 'l' 'e' 'm'
    #                    0b01101001 0b01101110 0b01100111             # 'i' 'n' 'g'
    0x296c4c6d2dedcc01    # 0b0011011000110010
                          # 0b1011011010110100
                          # 0b10110111001100111
    0x10                  # has_value: 1
    0x00                  # extra: ε
    0x03fd8807            # value: VARINT(1928)
    # .r Inner node: 0b11100100 0b10111000 0b10101101  # '中'
    #                0b11100110 0b10011100 0b10101100  # '本'
    0x27938edab39c1a      # 0b1100100101110001
                          # 0b0101101111001101
                          # 0b001110010101100
    0x10                  # has_value: 1
    0x00                  # extra: ε
    0x03fdd907            # value: VARINT(2009)

==Hashing==

There are two variations of the authenticated prefix tree presented in this draft BIP. They differ only in the way in which hash values of a node and its left/right branches are constructed. The variations, discussed below, trade off computational resources for the ability to compose operational proofs. Whether the performance hit is significant, and whether or not the added features are worth the tradeoff, depends very much on the application.

===Variation 1: Level-compressed hashing===

In this variation the referenced child node's hash is used in construction of an interior node's hash digest.
The interior node is serialized just as described (using the child node's digest instead of inline serialization), the resulting bytestring is passed through one round of SHA-256, and the digest that comes out of that is the hash value of the node. This is very efficient to calculate, requiring the absolute minimum number of SHA-256 hash operations, and achieving level-compression of computational resources in addition to reduction of space usage. For example:

    >>> dict = AuthTree()
    >>> dict['a'] = 0xff
    >>> dict.serialize()
    0x0200c3100001ff
    >>> dict.root
    AuthTreeNode(
        left_prefix = 0b01100001,
        left_hash   = 0xbafa0e2bba3396c5e9804b6cbe61be82bc442c1121aed81f8d5de36e9b20dc2f,
        left_count  = 1,
        left_size   = 4)
    >>> dict.hash
    0xb4837376022a7c9ddaa7d685ad183bcbd5d16c362b81fa293a7b9e911766cf3c

Assuming uniform distribution of key values, level-compressed hashing has time-complexity logarithmic with respect to the number of keys in the prefix tree. The disadvantage is that it is not possible in general to "rebase" an operational proof on top of a sibling, particularly if that sibling deletes branches that result in reorganization and level compression of internal nodes used by the rebased proof.

===Variation 2: Proof-updatable hashing===

In this variation, level-compressed branches are expanded into a series of chained single-branch internal nodes, each including the hash of its direct child. For a branch with a prefix N bits in length, this requires N chained hashes. Thanks to node-compression (excluding empty branches from the serialization), it is possible for each hash operation + padding to fit within a single SHA-256 block.

Note that the serialization semantics are unchanged! The variation only changes the procedure for calculating the hash values of interior nodes. The serialization format remains the same (modulo differing hash values standing in for pruned branches).
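To make the difference in hash counts concrete, here is a sketch that hashes the single-key tree <code>{'a': 0xff}</code> both ways. It rests on assumptions not spelled out in the draft: a hashed branch is written as <code>SHA-256(child) || VARINT(count) || VARINT(size)</code> with 1-byte CompactSize VARINTs, and each chained single-branch node carries no value and an empty <code>extra</code>. It does not claim to reproduce the digests printed in this draft, and all names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Single SHA-256, as the draft specifies (not bitcoin's double SHA-256)."""
    return hashlib.sha256(data).digest()


def branch_ref(child_hash: bytes, count: int, size: int) -> bytes:
    """Hashed branch stand-in: SHA-256(child) || VARINT(count) || VARINT(size).

    Assumes both values fit the 1-byte CompactSize form (< 0xfd).
    """
    assert count < 0xfd and size < 0xfd
    return child_hash + bytes([count, size])


LEAF = bytes.fromhex("100001ff")      # has_value=1, extra=ε, value=VARCHAR(0xff)
LEAF_HASH = sha256(LEAF)
KEY_BITS = [0, 1, 1, 0, 0, 0, 0, 1]   # key 'a' = 0x61, most significant bit first


def level_compressed_root() -> tuple[bytes, int]:
    """Variation 1: a single root node (left_flags=2, compact prefix 0xc3)."""
    node = bytes([0x02, 0x00, 0xc3]) + branch_ref(LEAF_HASH, 1, len(LEAF))
    return sha256(node), 1            # digest, hashes computed above the leaf


def proof_updatable_root() -> tuple[bytes, int, int]:
    """Variation 2: one single-bit node per key bit, hashed bottom-up."""
    digest, hashes, largest = LEAF_HASH, 0, 0
    for bit in reversed(KEY_BITS):
        flags = 0x04 if bit else 0x01  # right_flags=1 or left_flags=1
        node = bytes([flags, 0x00]) + branch_ref(digest, 1, len(LEAF))
        largest = max(largest, len(node))
        digest = sha256(node)
        hashes += 1
    return digest, hashes, largest
```

Here Variation 1 needs one hash above the leaf and Variation 2 needs eight, one per key bit. Each chained node is 36 bytes, comfortably inside the 55 bytes of message that a single padded SHA-256 block can hold.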
Using the earlier <code>dict['a'] = 0xff</code> example, calling <code>dict.hash</code> causes the following internal nodes to be constructed. Each node consumes one bit of the key 'a' (0b01100001), from the last bit at <code>node1</code>, which references the leaf holding the value, up to the first bit at the root, <code>node8</code>:

    >>> node1 = AuthTreeNode(right_prefix=0b1, right_hash=0xbafa0e2bba3396c5e9804b6cbe61be82bc442c1121aed81f8d5de36e9b20dc2f, right_count=1, right_size=4)
    >>> node2 = AuthTreeNode( left_prefix=0b0,  left_hash=node1.hash,  left_count=1,  left_size=4)
    >>> node3 = AuthTreeNode( left_prefix=0b0,  left_hash=node2.hash,  left_count=1,  left_size=4)
    >>> node4 = AuthTreeNode( left_prefix=0b0,  left_hash=node3.hash,  left_count=1,  left_size=4)
    >>> node5 = AuthTreeNode( left_prefix=0b0,  left_hash=node4.hash,  left_count=1,  left_size=4)
    >>> node6 = AuthTreeNode(right_prefix=0b1, right_hash=node5.hash, right_count=1, right_size=4)
    >>> node7 = AuthTreeNode(right_prefix=0b1, right_hash=node6.hash, right_count=1, right_size=4)
    >>> node8 = AuthTreeNode( left_prefix=0b0,  left_hash=node7.hash,  left_count=1,  left_size=4)
    >>> dict.hash == node8.hash
    True
    >>> dict.hash
    0xc3a9328eff06662ed9ff8e82aa9cc094d05f70f0953828ea8c643c4679213895

The advantage of proof-updatable hashing is that any operational proof may be "rebased" onto the tree resulting from a sibling proof, using only the information locally available in the proofs, even in the presence of deletion operations that result in level-compression of the serialized form. The disadvantage is performance: validating an updatable proof requires a number of hash operations lower-bounded by the length of the key in bits.

==Inclusion proofs==

An inclusion proof is a prefix tree pruned to contain a subset of its keys. The serialization of an inclusion proof takes the following form:

    inclusion_proof := variant || root_hash || root_node || checksum

<code>variant</code> is a single-byte value indicating the presence of level-compression (0 for proof-updatable hashing, 1 for level-compressed hashing). <code>root_hash</code> is the Merkle compression hash of the tree, the 32-byte SHA-256 hash of the root node.
<code>root_node</code> (called <code>tree</code> in the operational proof format) is the possibly pruned, serialized representation of the tree. And finally, <code>checksum</code> is the first 4 bytes of the SHA-256 hash of <code>variant</code>, <code>root_hash</code>, and <code>root_node</code>.

For ease of transport, the standard envelope for display of an inclusion proof is internet-standard base64 encoding in the following format:

    -----BEGIN INCLUSION PROOF-----
    ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0DgARBwAauzpZmgIQAAP9agcPADEt7Zxd
    TC3tABAAA/1xBylsTG0t7cwBEAAD/YgHJ5OO2rOcGhAAA/3ZByEg+2g=
    -----END INCLUSION PROOF-----

Decoded, it looks like this:

    0x01  # Level-compressed hashing
    # Merkle root:
    0x3ccf6617919e7b3a29fa812b366cd1d5cb3b18ad85d6a7da9d7c2a02767383b4
    # Serialized tree (unpruned):
    0x0e001107001abb3a599a02100003fd6a070f00312ded9c5d4c2ded00100003fd
    0x7107296c4c6d2dedcc01100003fd880727938edab39c1a100003fdd907
    # Checksum:
    0x2120fb68

==Operational proofs==

An operational proof is a list of insert/update and delete operations suffixed to an inclusion proof which contains the pathways necessary to perform the specified operations. The inclusion proof must contain the key values to be updated or deleted, and the nearest adjacent key values for each insertion. The serialization of an operational proof takes the following form:

    operational_proof := variant || root_hash || tree || VARLIST(delete*) || VARLIST(update*) || new_hash || checksum
    delete := VARCHAR(key)
    update := VARCHAR(key) || VARCHAR(value)

The first three fields, <code>variant</code>, <code>root_hash</code>, and <code>tree</code>, are the inclusion proof, and take the same values described in the previous section. <code>deletes</code> is a list of key values to be deleted; each key value in this list must exist in the inclusion proof.
<code>updates</code> is a list of key/value mappings which are to be inserted into the tree, possibly replacing any mapping for the key which already exists; either the key itself if it exists (update), or the two lexicographically closest keys on either side if it does not (insert), must be present in the inclusion proof. <code>new_hash</code> is the resulting Merkle root after the insertions, updates, and deletions are performed, and <code>checksum</code> is the initial 4 bytes of the SHA-256 hash of the preceding fields.

Just like inclusion proofs, an operational proof is encoded in base64 for display and transport. Here is an operational proof that deletes the key '中本' from the example tree:

    -----BEGIN OPERATIONAL PROOF-----
    ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0LgARaIsVaQi/GdhOPOgA8p4Pu4PiEfEg
    lcmy3j7bOc7hXw0DLSeTjtqznBoQAAP92QcBMOS4reacrACzuZJbyP7fqIOf5VEk4iarG4+uPoZC
    oun8BztQMQBy0LHVeSY=
    -----END OPERATIONAL PROOF-----

Decoded and broken into its constituent fields:

    0x01  # Level-compressed hashing
    # Original Merkle root:
    0x3ccf6617919e7b3a29fa812b366cd1d5cb3b18ad85d6a7da9d7c2a02767383b4
    # Serialized tree (included keys: '中本'):
    0x2e0011688b156908bf19d84e3ce800f29e0fbb83e211f12095c9b2de3edb39ce
    0xe15f0d032d27938edab39c1a100003fdd907
    # Deletion list ['中本']:
    0x01
    0x30e4b8ade69cac
    # Insertion list []:
    0x00
    # New Merkle root:
    0xb3b9925bc8fedfa8839fe55124e226ab1b8fae3e8642a2e9fc073b50310072d0
    # Checksum:
    0xb1d57926

~End of File~

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSs6HIAAoJEAdzVfsmodw4gooQAJm7XNsZjgdeTSpKIvUIU38f
tQx2FD08hQdLl48me5mDUbHJgGlYINsKAgoZ8Mqwi/kHEEYhuIlLIX1p6Ovigidb
21BiVoOLdG1egGOwxp17DuwYaDPTppFTlN9TBjZzW6WKc7+4aNvyc1KtrbHIhtj/
04ekFyAn4U5UH0ht7CI79j0u3Kp85p5D4PyYZB2m82mzti6OxpSM4tXlMkDW7ihg
QJwiZSjzejqTd7WF0zr0SLeGVRSN2A0dzUCoVsI98eIa3hkw2N4ae6dRkibyStOT
V8VEDvHArEDlvu8jiryajhsom5mvtOOclNDkVXWAf/Te4gj05iYdTIvNvDEJtqsP
XDbmw6GgV1kBLlLo0mp//t/+wr+nIvy+sVAP+eqtM/0vjaVXBkXxkUMqqNkrtVpB
f3whq7nFahssUMSoWE93jgob1ayAax2XUALVMAXYsJl7b2MqBGlhiTZ8FQZ+TW4A
tIpKeUprPmDvA18rO3SCbmLMQryZqYiH0sRyvUc5kvn3qCRHrISZNkEuK591eS+x
BO1eOluPzVqeXPPSK1jvGeY0FNJtwzbov4nI1mzOvzQHLCvkHn5PhUFCK5tL5tAe
b0Z5qwDV+SvVs7W1R7ejYBzEj77U1zuzZ9AtikOuvy+bNGrkIlpI49EyXHijm7C3
Q6JacTuI0PelYji2gaBJ
=BbDs
-----END PGP SIGNATURE-----
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
From: Jeremy Spilman @ 2013-12-20 6:48 UTC
To: Bitcoin Dev

Wow, there's a lot here to think about. I'm pretty sure I haven't grasped the full implications yet.

I see it proposes to also introduce additional BIPs describing the use of the data structure for stateless validation & mining, the UBC address index for "SPV+" operating modes, document timestamping and merged mining.

Can the BIP stand alone as a BIP without some specific changes to the protocol or end-user accessible features defined within it? It seems like an extremely useful data structure, but as I understand it the purpose of BIPs is defining interoperability points, not implementation details?

Unless the tree itself is becoming part of the protocol, it seems like its spec, test vectors, and reference implementation can live elsewhere, but I would love to read about BIPs which use this tree to accomplish some amazing scalability or security benefits.
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
From: Mark Friedenbach @ 2013-12-20 11:21 UTC
To: bitcoin-development

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jeremy, let me give a preview of the application-oriented BIPs I mentioned:

Stateless validation and mining involves prefixing transaction and block messages with proofs of their UTxO state changes. These are the "operational proofs" I describe in the draft, and they work on prefix trees whose root hashes are committed to the coinbase in a soft-fork upgrade of the validation rules.

"Ultimate blockchain compression" involves consensus over an address index, which can be queried over the p2p network by lightweight nodes. The structure of the index is an authenticated prefix tree, and the result of such a query is an inclusion proof.

Document time-stamping and this new method of merged mining use the same structure: a prefix tree whose root hash value is committed to a prunable output of the coinbase transaction. Document timestamp proofs and merged mining proof-of-works are inclusion proofs over this tree.

I hope that shows how the BIP directly affects interoperability of the bitcoin protocol and clients which use these applications. I released this BIP first to get some feedback on the structure itself, which will be used by all of the application-specific BIPs which follow.

Stepping back and speaking generically, the purpose of a BIP as I see it is to standardize details which affect interoperability between clients. In fact, at a cursory glance only about half of the BIPs deal with protocol issues directly; the rest deal with local / user-interface issues like key derivation or JSON-RPC APIs.
Even if none of the applications involved protocol changes, I still think BIPs like this would be of value in that they serve to standardize things which are or will seek to become commonly used and widely implemented.

Cheers,
Mark

On 12/19/2013 10:48 PM, Jeremy Spilman wrote:
> Wow there's a lot here to think about. I'm pretty sure I haven't
> grasped the full implications yet.
>
> I see it proposes to also introduce additional BIPs describing the
> use of the data stucture for stateless validation & mining, the UBC
> address index for "SPV+" operating modes, document timestamping and
> merged mining.
>
> Can the BIP stand alone as a BIP without some specific changes to
> the protocol or end-user accessible features defined within it? It
> seems like an extremely useful data stucture, but as I understand
> it the purpose of BIPS is defining interoperability points, not
> implementation details?
>
> Unless the tree itself is becoming part of the protocol, seems like
> its spec, test vectors, and reference implementation can live
> elsewhere, but I would love to read about BIPS which use this tree
> to accomplish some amazing scalability or security benefits.
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJStChCAAoJEAdzVfsmodw42DcQAIlgkukh5K/XYloIiT5pgaHS
xCZXtOvxpNUep8x35rvEO1ePjvPvUkbUE2jRw2se1rSMkhzw3PpHHtXV/gIOGqUe
WVKeeIM5pZX56sEcEdUQ1pTwB2rmtSNeyCuHl8fLatk8eLhcAHcpv/7esLuAjWCr
EE840s8+q3ltuzKi3nqxK84bwIohgSMKhncfonNp5lMAtug8Itqopq3DPDoxwiK/
qUwSz5UCEMH6oNHnywzhKGUhBErqo4q8IgAKcZYBZZ9n4BRjf4ngoCw9n5wCef8v
tyTvwrg0nSQTQa67cg7RCsY7SisrI9gaMvCYTSvEMKdw9X0aqAX1p0yZpTbV+dIr
Q2ZT6gJmg2sD22zKY1/58oq+PiNO+nRS81OG2znZofsIfhOVW0bIZAQ8+zZtFW40
vXxMuHCNieCK8e7f9A6LLv/Zz7rmNxdQ6cHBEL1nIs1Y4d1FpHJVI2LHi54QmVXf
C5PKF/e7K2eD3LZMNxS818rZaiJJ7qmpjS3rkG2owHyJHEhBJIlkYXfI1YCraT+b
R5AzAh2Oz0Nyb5ChP2VSaecJNjGvRMo7Z6HCytmgAGOEcDDZkxSv0kkprbvchqXx
XziFgA4iSajBKYWPiPLGMADfMPT6zd4fhDjyaN8+LPO3d3ZK1VwmQDLRQ3DxfeIP
RgchHR/pS77XI7hCFwtN
=ao17
-----END PGP SIGNATURE-----
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
From: Peter Todd @ 2013-12-20 13:17 UTC
To: Mark Friedenbach; +Cc: bitcoin-development

On Fri, Dec 20, 2013 at 03:21:38AM -0800, Mark Friedenbach wrote:
> Hi Jeremy, Let's give a preview of the application-oriented BIPs I
> mentioned:
>
> Stateless validation and mining involves prefixing transaction and
> block messages with proofs of their UTxO state changes. These are the
> "operational proofs" I describe in the draft, and they work on prefix
> trees whose root hashes committed to the coinbase in a soft-fork
> upgrade of the validation rules.
>
> "Ultimate blockchain compression" involves consensus over an address
> index, which can be queried over the p2p network by lightweight nodes.
> The structure of the index is an authenticated prefix tree, and the
> results of such a query is an an inclusion proof.

I've thought about this for a while and come to the conclusion that UTXO commitments are a really bad idea. I myself wanted to see them implemented about a year ago for fidelity bonded banks, but I've changed my mind and I hope you do too.

They force miners and every full node with SPV clients to store the entire UTXO set in perpetuity. This is bad by itself, but then they make it even worse by making Bitcoin really useful and convenient to use as a decentralized database; UTXO commitments make it easy and convenient to implement systems like Namecoin on top of Bitcoin, yet we don't have the UTXO expiration that might make such uses reasonable. Right now the UTXO set is reasonably small (~300MB), but that can and will change if we make it an attractive way to store data. UTXO commitments do exactly that.
You're also *not* giving users what they actually want: the transactions associated with their wallets. Even though Electrum could easily work via a pure UTXO database, they implemented transaction lookup instead; Electrum servers cough up every transaction associated with a user's wallet. If you're going to do that, it's just as easy to do per-block lookup trees which don't force the UTXO set to be stored.

There's also a more subtle issue: the security model of UTXO commitments sucks. It encourages wallets to essentially trust single confirmations because it's unlikely that nodes will want to store the multiple copies of the UTXO set required to provide proof of multiple confirmations. Basically the issue is when you start your wallet you get a proof of UTXO set for the most recent block; that's just one confirmation. To get more confirmations you have to wait for subsequent blocks, checking the set on each block. Per-block indexes, on the other hand, naturally lead wallets to count confirmations properly.

IMO you should take this technology to Namecoin instead. For them the fast lookups are probably worth the trade-offs, and they expire domains so the total set size doesn't grow unbounded.

-- 
'peter'[:-1]@petertodd.org
00000000000000028fd077fb1e33e942e3e875aa29cec134fed89d650242c577
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees
From: Mark Friedenbach @ 2013-12-20 18:41 UTC
To: Peter Todd; +Cc: bitcoin-development

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(Sorry Peter, this was meant for the whole list:)

On 12/20/2013 05:17 AM, Peter Todd wrote:
> I've thought about this for awhile and come to the conclusion that
> UTXO commitments are a really bad idea. I myself wanted to see them
> implemented about a year ago for fidelity bonded banks, but I've
> changed my mind and I hope you do too.
>
> They force miners and every full node with SPV clients to store the
> entire UTXO set in perpetuity.

This is incorrect. If the slower proof-updatable hashes are used, then mining only requires what I've called "operational proofs" to be attached to received transactions and blocks. Access to the UTXO set is required to make new transactions, at least for the outputs of the transaction, but I do not believe this is as significant a problem as you do. It is a service that can be outsourced for a minimal fee: include an explicit output of the necessary amount to a scriptPubKey specified by the archival node, and they will make sure the proper proofs are attached.

> This is bad by itself, but then they make it even worse by making
> Bitcoin really useful and convenient to use as a decentralized
> database; UTXO commitments make it easy and convenient to
> implement systems like Namecoin on top of Bitcoin, yet we don't
> have the UTXO expiration that might make such uses reasonable.
> Right now the UTXO set is reasonable small - ~300MB - but that can
> and will change if we make it an attractive way to store data.
> UTXO commitments do exactly that.

You might have to explain this to me, but it is not clear to me how the validation index could be twisted into providing a Namecoin-like system.
Nor the address index, which I presume is what you are referring to. Namecoin works by assigning domains to outputs, and then tracking ownership and configuration of that domain through chains of outputs. But the UTXO set doesn't contain connecting information. At best it would be a glorified and expensive time-stamper, unattractive because there are already better solutions.

> You're also *not* giving users what they actually want: the
> transactions associated with their wallets. Even though Electrum
> could easily work via a pure UTXO database they implemented
> transaction lookup instead; Electrum servers cough up every
> transaction associated with a user's wallet. If you're going to do
> that, it's just as easy to do per-block lookup trees which don't
> force the UTXO set to be stored.

At the cost of having the supposedly lightweight client query for each of its coins on every single block, to construct a negative proof-of-spend.

> There's also a more subtle issue: the security model of UTXO
> commitments sucks. It encourages wallets to essentially trust
> single confirmations because it's unlikely that nodes will want to
> store the multiple copies of the UTXO set required to provide
> proof of multiple confirmations. Basically the issue is when you
> start your wallet you get a proof of UTXO set for the most recent
> block; that's just one confirmation. To get more confirmations you
> have to wait for subsequent blocks, checking the set on each block.
> Per block indexes on the other hand naturally lead wallets to
> count confirmations properly.

I don't think this is true, or at least you are not considering available optimizations. You certainly don't need to store multiple copies of the UTXO set. I'm a little confused as to the exact situation you are describing. When a key is loaded into a wallet, or a wallet comes online after a significant absence, it looks for coins in the current UTXO set.
If any coins are found, their attached transaction record has a block height field, so the confirmation count can be derived from that. As blocks go by, that count naturally increases. I'm not sure how this is different from the current situation.
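Mark's point, that the confirmation count falls out of the block height recorded with each UTXO entry, is simple arithmetic. A minimal sketch, assuming the stored height is that of the block which included the transaction; `confirmations` is a hypothetical helper, not part of any existing API:

```python
def confirmations(tip_height: int, coin_height: int) -> int:
    """Confirmations for a coin whose transaction was included at coin_height,
    given the current chain tip height. A coin in the tip block itself has one.
    (Hypothetical helper; assumes the UTXO record stores the inclusion height.)"""
    if coin_height > tip_height:
        raise ValueError("coin height is above the current tip")
    return tip_height - coin_height + 1

# A wallet coming online at tip 275005 finds a coin recorded at height 275000.
print(confirmations(275005, 275000))  # 6
```

Each new block raises `tip_height` by one, so the count grows with it, which is the behavior Mark describes.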
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Peter Todd @ 2013-12-20 10:48 UTC To: Mark Friedenbach; +Cc: Bitcoin Dev On Thu, Dec 19, 2013 at 05:47:52PM -0800, Mark Friedenbach wrote: > This BIP describes the authenticated prefix tree and its many > variations in terms of its serialized representation. Additional BIPs > describe the application of authenticated prefix trees to such > applications as committed indices, document time-stamping, and merged > mining. Could you expand more on how prefix trees could be used for time-stamping and merged mining? > >>> dict = AuthTree() > >>> dict['Curie'] = VARINT(1898) > >>> dict['Einstein'] = VARINT(1905) > >>> dict['Fleming'] = VARINT(1928) > >>> dict['中本'] = VARINT(2009) I'd be inclined to leave the unicode out of the code examples as many editors and shells still don't copy-and-paste it nicely. Using it in BIP documents themselves is fine and often has advantages re: typesetting, but using it in crypto examples like this just makes it harder to reproduce the results by hand unnecessarily. -- 'peter'[:-1]@petertodd.org 0000000000000002d7a0c56ae2c5b2b3322d5017cfef847455d4d86a6bc12280
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Peter Todd @ 2013-12-20 12:47 UTC To: Mark Friedenbach, bitcoin-development On Fri, Dec 20, 2013 at 03:10:50AM -0800, Mark Friedenbach wrote: > On 12/20/2013 02:48 AM, Peter Todd wrote: > > On Thu, Dec 19, 2013 at 05:47:52PM -0800, Mark Friedenbach wrote: > >> This BIP describes the authenticated prefix tree and its many > >> variations in terms of its serialized representation. Additional > >> BIPs describe the application of authenticated prefix trees to > >> such applications as committed indices, document time-stamping, > >> and merged mining. > > > > Could you expand more on how prefix trees could be used for > > time-stamping and merged mining? > > The root hash of a prefix tree is placed in the coinbase at a location > standardized by convention. Right, the last txout, in an OP_RETURN, like we discussed. > For document time-stamping, the key can be > the hash of the document. Don't you mean the value is the hash of the document and the key is irrelevant? > For merged mining, the key is the hash of > the genesis block of the altchain, and the value is the hash of the > aux-pow (for p2pool, the share hash). What's the advantage over the direction-based system I proposed before? It seems to me the code required to validate the proof is significantly more complex in your scheme. http://www.mail-archive.com/bitcoin-development@lists.sourceforge.net/msg03149.html > In the system I have in mind this adds 43 bytes to the coinbase > transaction, By 43 bytes, you mean the whole OP_RETURN txout, right?
> >>>>> dict = AuthTree() > >>>>> dict['Curie'] = VARINT(1898) > >>>>> dict['Einstein'] = VARINT(1905) > >>>>> dict['Fleming'] = VARINT(1928) > >>>>> dict['中本'] = VARINT(2009) > > > > I'd be inclined to leave the unicode out of the code examples as > > many editors and shells still don't copy-and-paste it nicely. Using > > it in BIP documents themselves is fine and often has advantages re: > > typesetting, but using it in crypto examples like this just makes > > it harder to reproduce the results by hand unnecessarily. > > Thanks for the feedback; I rather agree. When I was creating that > example, for some reason I wanted the right branch of the root node to > be used, which is difficult when only 7-bit ASCII keys are used. But I > don't think the illustrative point I had in mind ended up being > particularly relevant, so I'll rework it. That example is Python, so I'd suggest just using escape sequences myself. You probably should also include the "b" prefix to make the strings explicitly binary for py3 compatibility, i.e. dict[b'\xbe\xef']. -- 'peter'[:-1]@petertodd.org 000000000000000216e3750a9ad9584395352d728a3c543844eab3bfc9ce1073
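Peter's suggestion can be illustrated with a short Python 3 sketch: the non-ASCII key from the draft written as an escaped byte literal, alongside the ASCII keys as `b''` literals. It also shows why that key is the one reaching the root's right branch, assuming (as the draft's 'Curie' prefix 0b01000 implies) that key bits are walked most significant bit first. The names `KEY_NAKAMOTO` and `first_bit` are illustrative only.

```python
# Non-ASCII key written as an escaped byte literal, per Peter's suggestion,
# so the example survives copy-and-paste into any editor or shell.
KEY_NAKAMOTO = b"\xe4\xb8\xad\xe6\x9c\xac"  # UTF-8 encoding of the two-character key
assert "中本".encode("utf-8") == KEY_NAKAMOTO

# ASCII keys only need the b'' prefix to be explicit bytes under Python 3.
keys = [b"Curie", b"Einstein", b"Fleming", KEY_NAKAMOTO]

def first_bit(key: bytes) -> int:
    """First traversal bit (assumed MSB-first): 0 = left branch, 1 = right."""
    return key[0] >> 7

for key in keys:
    print(key, "right" if first_bit(key) else "left")
```

Every 7-bit ASCII byte is below 0x80, so a tree built only from ASCII keys never uses the root's right branch; any key whose first byte is 0x80 or above, escaped or not, would serve Mark's original purpose.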
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Gregory Maxwell @ 2013-12-20 19:48 UTC To: Mark Friedenbach; +Cc: Bitcoin Dev On Thu, Dec 19, 2013 at 5:47 PM, Mark Friedenbach <mark@monetize.io> wrote: > Hello fellow bitcoin developers. Included below is the first draft of > a BIP for a new Merkle-compressed data structure. The need for this > data structure arose out of the misnamed "Ultimate blockchain > compression" project, but it has since been recognized to have many > other applications. A couple of very early comments: I shared some of these with you on IRC, but I thought I'd post them to make them less likely to get lost. What's a VARCHAR()? A zero-terminated string? A length-prefixed string? How is the length encoded? Hopefully not in a way that has redundancy, since things that don't survive a serialization round trip are a major trap. Is the 'middle' the best place for the extradata? Have you contemplated the possibility that some applications might use midstate compression? On that general subject, since the structure here pretty much always guarantees two compression function invocations, SHA512/256 might actually be faster in this application. Re: using sha256 instead of sha256^2, we need to think carefully about the implications of Merkle-Damgard generic length extension attacks. It would be unfortunate to introduce them here, even though they're currently mostly theoretical for sha256.
WRT hash function performance, hash functions are so ludicrously fast (and will be more so as processors get SHA2 instructions) that the performance of the raw compression function would hardly ever be a performance consideration unless you're using a slow interpreted language (... and that sounds like a personal problem to me). So I don't think CPU performance should be a major consideration in this BIP. What I do think should be a consideration is the cost of validating the structure under a zero-knowledge proof. An example application is a blind proof for a SIN or a proof of how much coin you control... or even a proof that a block was a correctly validated one, and in these cases additional compression function calls are indeed pretty expensive. But they're not the only cost: any conditional logic in the hash tree evaluation is expensive, and in particular, I think that any place where data from children will be combined with a variable offset (especially if it's not word aligned) would potentially be rather expensive. I'm unconvinced about the prefix tree compressed applications, since they break compact update proofs. If we used them in the Bitcoin network they could only be used for data where all verifying nodes had all their data under the tree. I think they add a lot of complexity to the BIP (especially for people reading the wrong section), so perhaps they should be split into another document? In any case, I want to thank you for taking the time to write this up. You've been working on this stuff for a while and I think it will lead to useful results, even if we don't end up using it as originally envisioned.
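Greg's round-trip concern can be made concrete. Mark's reply describes VARCHAR as a length-prefixed string using the shortest-representation VARINT, as for scripts in transactions; the sketch below assumes that means Bitcoin's CompactSize encoding, and its decoder rejects any non-shortest length so that decoding and re-encoding always reproduces the original bytes. All function names are hypothetical and are not the python-bitcoin API.

```python
import struct

def encode_varint(n: int) -> bytes:
    """CompactSize VARINT (assumed), always in its shortest form."""
    if n < 0:
        raise ValueError("VARINT cannot encode negative values")
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

# prefix byte -> (payload size, struct format, smallest value needing this width)
_WIDE_FORMS = {0xFD: (2, "<H", 0xFD),
               0xFE: (4, "<I", 0x10000),
               0xFF: (8, "<Q", 0x100000000)}

def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_pos); reject encodings that are not the shortest form."""
    prefix = data[pos]
    if prefix < 0xFD:
        return prefix, pos + 1
    size, fmt, minimum = _WIDE_FORMS[prefix]
    if pos + 1 + size > len(data):
        raise ValueError("truncated VARINT")
    (value,) = struct.unpack_from(fmt, data, pos + 1)
    if value < minimum:
        raise ValueError("non-canonical VARINT")
    return value, pos + 1 + size

def encode_varchar(payload: bytes) -> bytes:
    """Length-prefixed bytestring: VARINT(len(payload)) || payload."""
    return encode_varint(len(payload)) + payload

def decode_varchar(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    length, pos = decode_varint(data, pos)
    if pos + length > len(data):
        raise ValueError("truncated VARCHAR")
    return data[pos:pos + length], pos + length

# The 'Curie' value field from the draft's example: VARCHAR(VARINT(1898)).
print(encode_varchar(encode_varint(1898)).hex())  # 03fd6a07
```

The rejection check is the property Greg is asking for: a three-byte `0xfd 0x10 0x00` spelling of 16 is refused rather than accepted, so two different byte strings can never describe the same VARCHAR.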
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Mark Friedenbach @ 2013-12-20 22:04 UTC To: Gregory Maxwell, Bitcoin Dev On 12/20/2013 11:48 AM, Gregory Maxwell wrote: > A couple of very early comments: I shared some of these with you on > IRC, but I thought I'd post them to make them less likely to get > lost. I got your input on IRC, but thank you for posting to the list so that others can see and review. > What's a VARCHAR()? A zero-terminated string? A length-prefixed > string? How is the length encoded? Hopefully not in a way that > has redundancy, since things that don't survive a serialization > round trip are a major trap. A length-prefixed string, using the shortest-representation VARINT for the length, the same way scripts are serialized in transactions. > Is the 'middle' the best place for the extradata? Have you > contemplated the possibility that some applications might use > midstate compression? Yes, I considered midstate compression, which is why the branch hashes come last, but the placement of "extra" was an oversight. In every application I've considered it's either not used (and therefore a single byte), or updated whenever the node or its children are updated. Honestly, I don't expect midstate compression to offer much, since in the nodes that are updated frequently it is unlikely that there will be enough static data at the front to fill even a 512-bit block of the smaller hash function. But it doesn't hurt to prepare just in case. I'll move it to the end. > On that general subject, since the structure here pretty much > always guarantees two compression function invocations, SHA512/256 > might actually be faster in this application. Yes, this is a great suggestion.
Moving to SHA-512/256 will let most inner nodes fit inside a single block, so long as the "extra" field is not too long. Also, apparently SHA-512 is faster on 64-bit CPUs, which is a nice advantage; I didn't know that. I'm concerned about speed, but I did not go with a faster hash function because users are more likely to have hardware acceleration for the SHA-2 family. > Re: using sha256 instead of sha256^2, we need to think carefully > about the implications of Merkle-Damgard generic length extension > attacks. It would be unfortunate to introduce them here, even > though they're currently mostly theoretical for sha256. The serialization format encodes lengths in such a way that you cannot extend the data structure merely by appending bits. You would have to change the prior, already hashed bits as well. I believe this makes it immune to length extension attacks. > WRT hash function performance, hash functions are so ludicrously > fast (and will be more so as processors get SHA2 instructions) that > the performance of the raw compression function would hardly ever > be a performance consideration unless you're using a slow > interpreted language (... and that sounds like a personal problem > to me). So I don't think CPU performance should be a major > consideration in this BIP. Well... the UTXO tree is big. Let's assume 5,000 transactions per block, with an average of 3 inputs/outputs per transaction. This is close to the worst-case scenario with the current block size. That's 15,000 insert, update, or delete operations. The number of hashes required when level-compression is used is log2 of the number of items in the tree, which for bitcoin is currently about 2.5 million transactions. So that's about 21 hashes per input/output, or 315,000 hash operations. A CPU is able to do about 100,000 hashes per second per core, so that'll probably take about a second on a modern 4- or 8-core machine.
For updatable proofs, the number of hash operations is equal to the number of bits in the key, which for the validation index is always 256. That means 3.84 million hashes, or about 10 seconds on a 4-core machine. The numbers for the wallet index are worse: it scales with the number of outputs, which is necessarily larger, and its keys are longer. This is not an insignificant cost in the near term, although it is the type of operation that could easily be offloaded to a GPU or FPGA. > What I do think should be a consideration is the cost of > validating the structure under a zero-knowledge proof. An example > application is a blind proof for a SIN or a proof of how much coin > you control... or even a proof that a block was a correctly > validated one, and in these cases additional compression function > calls are indeed pretty expensive. But they're not the only cost: > any conditional logic in the hash tree evaluation is expensive, and in > particular, I think that any place where data from children will be > combined with a variable offset (especially if it's not word > aligned) would potentially be rather expensive. This is something I know less about, and I welcome constructive input. There is *no* reason that the hash serialization needs to have fancy space-saving features. You could even make the SER_HASH node serialization into fixed-size, word-aligned data structures. But this is absolutely not my field, and I may need some hand-holding. Do the fields need to be at fixed offsets? With fixed widths? Should I put variable-length stuff like the level-compressed prefixes and value data at the end (midstate be damned) to keep fixed offsets? What's the expected word alignment, 32-bit or 64-bit? > I'm unconvinced about the prefix tree compressed applications, > since they break compact update proofs. If we used them in the > Bitcoin network they could only be used for data where all > verifying nodes had all their data under the tree.
I think they add > a lot of complexity to the BIP (especially for people reading the wrong > section), so perhaps they should be split into another document? I believe what you mean by "compact update proofs" is what I call "updatable proofs", where level compression is only used in the disk and network serialization. These are what I propose to use for the validation and wallet indexes, if the computational costs can be brought under control, because they allow composable proofs. Unlike a time-ordered index, this does require that someone, somewhere, has random access to the entire UTXO set, since you can't predict in advance what your txid will be. But this is a matter of tradeoffs, and I don't believe at this time that there is a clear-cut advantage of one approach over the other. I'm pursuing a txid-indexed UTXO set because it most closely matches the current operational model. That said, you still want level-compression within the serialization format itself, if for no other reason than to keep proof sizes small. > In any case, I want to thank you for taking the time to write > this up. You've been working on this stuff for a while and I think > it will lead to useful results, even if we don't end up using > it as originally envisioned.
Thanks, Mark
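Mark's back-of-the-envelope hash counts in the reply above can be reproduced in a few lines. The inputs are his stated late-2013 assumptions, not measurements, and rounding log2 of the UTXO count to a whole number of hashes per operation is a simplification of this sketch:

```python
import math

# Assumptions quoted from Mark's reply (late-2013 figures, not measurements)
TX_PER_BLOCK = 5_000
IO_PER_TX = 3                   # inputs/outputs touched per transaction
UTXO_ITEMS = 2_500_000          # transactions in the UTXO set
HASHES_PER_SEC_PER_CORE = 100_000
CORES = 4
KEY_BITS = 256                  # validation-index keys are 256-bit txids

ops = TX_PER_BLOCK * IO_PER_TX  # insert/update/delete operations per block

# Variation 1 (level-compressed): about log2(N) hashes per operation.
level_compressed = ops * round(math.log2(UTXO_ITEMS))
# Variation 2 (proof-updatable): one hash per key bit.
proof_updatable = ops * KEY_BITS

for name, hashes in [("level-compressed", level_compressed),
                     ("proof-updatable", proof_updatable)]:
    seconds = hashes / (HASHES_PER_SEC_PER_CORE * CORES)
    print(f"{name}: {hashes:,} hashes, ~{seconds:.1f} s on {CORES} cores")
```

This prints 315,000 hashes (about 0.8 s) and 3,840,000 hashes (about 9.6 s). The roughly 12x gap, 256 hashes per operation versus about 21, is the price of proof-updatability being weighed in this exchange.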
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Thomas Voegtlin @ 2014-01-05 18:43 UTC To: bitcoin-development Hello and happy new year to this mailing list! Thank you, Mark, for the incredible work you've been doing on this. I am following this very closely, because it is of primary importance for Electrum. I have written a Python-levelDB implementation of this UTXO hashtree, which is currently being tested and will be added to Electrum servers. My implementation follows Alan Reiner's idea of storing the tree as items in a key-value database. I believe that a C++ implementation like yours will be at least an order of magnitude faster, and I am looking forward to it. I too believe that BIPs should define interoperability points, but probably not implementation details. For the UTXO hashtree, this means that a BIP should at least specify how the root hash is constructed. This might be the only thing that needs to be specified. However, I see no pressing need to write a BIP yet; it might be preferable to implement and test different options first, and learn from that. Thomas On 20/12/2013 02:47, Mark Friedenbach wrote: > Hello fellow bitcoin developers. Included below is the first draft of > a BIP for a new Merkle-compressed data structure. The need for this > data structure arose out of the misnamed "Ultimate blockchain > compression" project, but it has since been recognized to have many > other applications.
> > In addition to this BIP I am preparing three additional BIPs > describing the use of this data structure in stateless validation & > mining, the UBC address index for "SPV+" operating modes, document > timestamping and merged mining. > > A Python implementation of this data structure is available here: > > https://github.com/monetizeio/python-bitcoin > > A C++ implementation is being worked on. > > As per the BIP-1 procedure, I am submitting this rough draft to the > community for discussion. I welcome all comments and criticisms of > both form and content. > > -Mark > > > ==Abstract== > > This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle > hash tree] variant of the [http://en.wikipedia.org/wiki/Trie > prefix-tree data structure], ideally suited for encoding key-value > indices which support memory-efficient proofs. > > ==Motivation== > > There are a number of applications which would benefit from having a > data structure with the following properties: > > * '''Arbitrary mapping of keys to values.''' A ''key'' can be any > bytestring, and its ''value'' any other bytestring. > * '''Duplicate keys disallowed.''' Every key has one, and only one > value associated with it. Some applications demand assurance that no > key value is reused, and that this constraint can be checked without > requiring access to the entire data structure. > * '''Efficient look-up by key.''' The data structure should support > sub-linear lookup operations with respect to the number of keys in the > mapping. Logarithmic time or linear with respect to the length of the > key should be achievable and would be sufficient for realistic > applications. > * '''Merkle compression of mapping structure.''' It should be possible > to produce a reduced description of the tree consisting of a single > root hash value which is deterministically calculated from the mapping > structure.
> * '''Efficient proofs of inclusion.''' It should be possible to > extract a proof of key/value mapping which is limited in size and > verification time by the length of the key in the worst case. > * '''Computation of updates using local information.''' Given a set of > inclusion proofs, it should be possible to calculate adjustments to > the local mapping structure (update or deletion of included mappings, > or insertion between two included mappings which are adjacent in the > global structure). > > Such applications include committed validation indices which enable > stateless mining nodes, committed wallet indices which enable > trust-less querying of the unspent transaction output set by > <code>scriptPubKey</code>, efficient document time-stamping, and > secure & efficient merged mining. This BIP describes an authenticated > prefix tree which has the above properties, but leaves the myriad > applications to be formalized in future BIPs. > > ==Data structure== > > This BIP defines a binary prefix tree. Such a structure provides a > mapping of bitstrings (the ''keys'') to bytestrings (the ''values''). > It is an acyclic binary tree which implicitly encodes keys within the > traversal path -- a "left" branch is a 0, and a "right" branch is a 1. > Each node is reachable by only one unique path, and reading off the > branches taken (0 for each left, 1 for each right) as one follows the > path from root to target yields the node's key. > > The particular binary prefix tree defined by this BIP is a hybrid > PATRICIA / de la Brandais tree structure. > [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a > long sequence of non-branching nodes into a single interior node with > a per-branch ''skip prefix''. This achieves significant savings in > storage space, root hash calculation, and traversal time. > > A de la Brandais trie achieves compression by only storing branches > actually taken in a node. 
> The space savings are minimal for a binary > tree, but they place the serialized size of a non-branching interior node > under the SHA-256 block size, thereby reducing the number of hash > operations required to perform updates and validate proofs. > > This BIP describes the authenticated prefix tree and its many > variations in terms of its serialized representation. Additional BIPs > describe the application of authenticated prefix trees to such > applications as committed indices, document time-stamping, and merged > mining. > > ==Serialization format== > > As a hierarchical structure, the serialization of an entire tree is > the serialization of its root node. A serialized node is the > concatenation of five structures: > > node := flags || VARCHAR(extra) || value || left || right > > The <code>flags</code> field is a single byte whose composite values > determine the bytes that follow. > > flags = (left_flags << 0) | > (right_flags << 2) | > (has_value << 4) | > (prune_left << 5) | > (prune_right << 6) | > (prune_value << 7) > > The <code>left_flags</code> and <code>right_flags</code> are special > 2-bit enumeration fields. A value of 0 indicates that the node does > not branch in this direction, and the corresponding <code>left</code> > or <code>right</code> branch is missing (replaced with the empty > string in the node serialization). A value of 1 indicates a single-bit > key prefix for this branch, implicitly 0 for <code>left</code> and 1 > for <code>right</code>. A 2 indicates that up to 7 bits of additional skip > prefix (beyond the implicit first bit, making 8 bits total) are stored > in a compact single-byte format. A 3 indicates a skip prefix with > more than 7 additional bits, stored length-prefix encoded. > > The single bit <code>has_value</code> indicates whether the node > stores a data bytestring, the value associated with its key prefix.
> Since keys may be any value or length, including one key being a > prefix of another, it is possible for interior nodes in addition to > leaf nodes to have values associated with them, and therefore an > explicit value-existence bit is required. > > The remaining three bits are used for proof extraction, and are masked > away prior to hash operations. <code>prune_left</code> indicates that > the entire left branch has been pruned. <code>prune_right</code> has > the same meaning for the right branch. If <code>has_value</code> is > set, <code>prune_value</code> may be set to exclude the node's value > from the encoded proof. This is a necessary field for interior nodes, since > their values may be pruned while their children > are not. > > The <code>value</code> field is only present if the bit > <code>flags.has_value</code> is set, in which case it is a > <code>VARCHAR</code> bytestring: > > switch flags.has_value: > case 0: > value := ε > case 1: > value := VARCHAR(node.value) > > The <code>extra</code> field is always present, and takes on a > bytestring value defined by the particular application. Use of the > <code>extra</code> field is application dependent, and will not be > covered in this specification. It can be set to the empty bytestring > (serialized as a single zero byte) if the application has no use for > the <code>extra</code> field. > > extra := VARCHAR(calculate_extra(node)) > > The <code>left</code> and <code>right</code> non-terminals are only > present if the corresponding <code>flags.left_flags</code> or > <code>flags.right_flags</code> are non-zero.
The format depends on the > value of this flags setting: > > switch branch_flags: > case 0: > branch := ε > case 1: > branch := branch_node_or_hash > case 2: > prefix = prefix >> 1 > branch := int_to_byte(1 << len(prefix) | bits_to_int(prefix)) || > branch_node_or_hash > case 3: > prefix = prefix >> 1 > branch := VARINT(len(prefix) - 8) || > bits_to_string(prefix) || > branch_node_or_hash > > <code>branch_flags</code> is a stand-in meant to describe either > <code>left_flags</code> or <code>right_flags</code>, and likewise > everywhere else in the above pseudocode <code>branch</code> can be > replaced with either <code>left</code> or <code>right</code>. > > <code>prefix</code> is the key bits between the current node and the > next branching, terminal, and/or leaf node, including the implicit > leading bit for the branch (0 for the left branch, 1 for the right > branch). In the above code, <code>len(prefix)</code> returns the > number of bits in the bitstring, and <code>prefix >> 1</code> drops > the first bit, reducing the size of the bitstring by one and > renumbering the indices accordingly. > > The function <code>int_to_byte</code> takes an integer in the range > [0, 255] and returns the octet representing that value. This is a NOP > in many languages, but is present in this pseudocode so as to be explicit > about what is going on. > > The function <code>bits_to_int</code> interprets a sequence of bits as > a little-endian integer value. This is analogous to the following > pseudocode: > > def bits_to_int(bits): > result = 0 > for idx in range(len(bits)): > if bits[idx] == 1: > result |= 1 << idx > return result > > The function <code>bits_to_string</code> serializes a sequence of bits > into a binary string.
It uses little-endian bit and byte order, as > demonstrated by the following pseudocode: > > def bits_to_string(bits): > bytes = [0] * ceil(len(bits) / 8) > for idx in range(len(bits)): > if bits[idx] == 1: > bytes[idx // 8] |= 1 << (idx % 8) > return map(int_to_byte, bytes) > > <code>branch_node_or_hash</code> is either the serialized child node > or its SHA-256 hash and associated metadata. Context determines which > value to use: during digest calculations, disk/database serialization, > and when the branch is pruned, the hash value is used and serialized in > the same way as other SHA-256 values in the bitcoin protocol (note > however that it is single-SHA-256, not the double-SHA-256 more > commonly used in bitcoin). The number of terminal (value-containing) > nodes and the serialized size in bytes of the fully unpruned branch > are suffixed to the branch hash. When serializing a proof or > snapshotting tree state and the branch is not pruned, the serialized > child node is included directly and the count and size are omitted, as > they can be derived from the serialization. > > if branch_pruned or SER_HASH: > branch_node_or_hash := SHA-256(branch) || > count(branch) || > size(branch) > else: > branch_node_or_hash := serialize(branch) > > As an example, here is the serialization of a prefix tree mapping the > names of men and women of science to the year of their greatest publication: > > >>> dict = AuthTree() > >>> dict['Curie'] = VARINT(1898) > >>> dict['Einstein'] = VARINT(1905) > >>> dict['Fleming'] = VARINT(1928) > >>> dict['中本'] = VARINT(2009) > >>> dict.serialize() > # A bytestring, broken out into parts: > > # .
Root node: > 0x0e # left_flags: 2, right_flags: 3, has_value: 0 > 0x00 # extra: ε > > # .l Inner node: 0b01000 > 0x11 # 0b01000 > 0x07 # left_flags: 3, right_flags: 1 > 0x00 # extra: ε > > # .l.l Inner node: 0b01000011 0b01110101 0b01110010 0b01101001 > # 'C' 'u' 'r' 'i' > # 0b01100101 > # 'e' > 0x1abb3a599a02 # 0b01101110101011100100110100101100101 > 0x10 # has_value: 1 > 0x00 # extra: ε > 0x03fd6a07 # value: VARINT(1898) > > # .l.r Inner node: 0b010001 > 0x0f # left_flags: 3, right_flags: 3 > 0x00 # extra: ε > > # .l.r.l Inner node: 0b01000101 0b01101001 0b01101110 0b01110011 > # 'E' 'i' 'n' 's' > # 0b01110100 0b01100101 0b01101001 0b01101110 > # 't' 'e' 'i' 'n' > 0x312ded9c5d4c2ded00 # 0b1011010010110111 > # 0b0011100110111010 > # 0b0011001010110100 > # 0b101101110 > 0x10 # has_value: 1 > 0x00 # extra: ε > 0x03fd7107 # value: VARINT(1905) > > # .l.r.r Inner node: 0b01000110 0b01101100 0b01100101 0b01101101 > # 'F' 'l' 'e' 'm' > # 0b01101001 0b01101110 0b01100111 > # 'i' 'n' 'g' > 0x296c4c6d2dedcc01 # 0b0011011000110010 > # 0b1011011010110100 > # 0b10110111001100111 > 0x10 # has_value: 1 > 0x00 # extra: ε > 0x03fd8807 # value: VARINT(1928) > > # .r Inner node: 0b11100100 0b10111000 0b10101101 > # '中' > # 0b11100110 0b10011100 0b10101100 > # '本' > 0x27938edab39c1a # 0b1100100101110001 > # 0b0101101111001101 > # 0b001110010101100 > 0x10 # has_value: 1 > 0x00 # extra: ε > 0x03fdd907 # value: VARINT(2009) > > ==Hashing== > > There are two variations of the authenticated prefix tree presented in > this draft BIP. They differ only in the way in which hash values of a > node and its left/right branches are constructed. The variations, > discussed below, trade off computational resources for the ability to > compose operational proofs. Whether the performance hit is > significant, and whether or not the added features are worth the > tradeoff, depends very much on the application. >
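The flag layout and branch-prefix rules quoted in this draft can be checked against its own worked examples. What follows is a minimal sketch, not the reference implementation, under these assumptions: key bits are walked most significant bit first (consistent with the 'Curie' prefix 0b01000), lengths use Bitcoin's CompactSize VARINT, and flag value 3 stores the remaining prefix length minus 8 once the implicit bit is dropped, which is what the example's length bytes 0x1a, 0x31, 0x29 and 0x27 imply. Only inline (unpruned) serialization is modeled; hashed branches with their count and size suffix are left out, and all function names are hypothetical.

```python
import struct

def encode_varint(n: int) -> bytes:
    """CompactSize VARINT in shortest form (assumed length encoding)."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def key_bits(key: bytes) -> list[int]:
    """Traversal bits, most significant bit of each byte first (0 = left, 1 = right)."""
    return [(byte >> (7 - i)) & 1 for byte in key for i in range(8)]

def pack_flags(left_flags: int, right_flags: int, has_value: int,
               prune_left: int = 0, prune_right: int = 0, prune_value: int = 0) -> int:
    return (left_flags | right_flags << 2 | has_value << 4
            | prune_left << 5 | prune_right << 6 | prune_value << 7)

def unpack_flags(flags: int) -> dict[str, int]:
    return {"left_flags": flags & 3, "right_flags": (flags >> 2) & 3,
            "has_value": (flags >> 4) & 1, "prune_left": (flags >> 5) & 1,
            "prune_right": (flags >> 6) & 1, "prune_value": (flags >> 7) & 1}

def bits_to_int(bits: list[int]) -> int:
    """Little-endian: bits[0] is the least significant bit."""
    return sum(bit << idx for idx, bit in enumerate(bits))

def bits_to_string(bits: list[int]) -> bytes:
    """Little-endian bit and byte order, zero-padded to whole bytes."""
    out = bytearray((len(bits) + 7) // 8)
    for idx, bit in enumerate(bits):
        if bit:
            out[idx // 8] |= 1 << (idx % 8)
    return bytes(out)

def encode_branch_prefix(prefix: list[int]) -> tuple[int, bytes]:
    """Return (branch_flags, prefix bytes) for a non-empty branch prefix,
    where prefix[0] is the implicit branch bit and is never stored."""
    rest = prefix[1:]
    if not rest:
        return 1, b""
    if len(rest) <= 7:
        # a sentinel bit just above the prefix bits records the prefix length
        return 2, bytes([1 << len(rest) | bits_to_int(rest)])
    return 3, encode_varint(len(rest) - 8) + bits_to_string(rest)

def serialize_leaf(value: bytes, extra: bytes = b"") -> bytes:
    """Node with no branches: flags || VARCHAR(extra) || VARCHAR(value)."""
    return (bytes([pack_flags(0, 0, 1)]) + encode_varint(len(extra)) + extra
            + encode_varint(len(value)) + value)

# Single-key tree {b'a': b'\xff'} with its child serialized inline:
# root flags, empty extra, left-branch prefix, then the leaf node.
branch_flags, prefix_bytes = encode_branch_prefix(key_bits(b"a"))
tree = (bytes([pack_flags(branch_flags, 0, 0)]) + encode_varint(0)
        + prefix_bytes + serialize_leaf(b"\xff"))
print(tree.hex())  # 0200c3100001ff
```

Under the same assumptions the sketch also reproduces the 0x11, 0x1abb3a599a02 and 0x27938edab39c1a prefix encodings from the four-key example, and decoding the root flag byte 0x0e gives left_flags 2, right_flags 3 and has_value 0.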
> > ===Variation 1: Level-compressed hashing=== > > In this variation the referenced child node's hash is used in > construction of an interior node's hash digest. The interior node is > serialized just as described (using the child node's digest instead of > inline serialization), the resulting bytestring is passed through one > round of SHA-256, and the digest that comes out of that is the hash > value of the node. This is very efficient to calculate, requiring the > absolute minimum number of SHA-256 hash operations, and achieving > level-compression of computational resources in addition to reduction > of space usage. > > For example: > > >>> dict = AuthTree() > >>> dict['a'] = 0xff > >>> dict.serialize() > 0x0200c3100001ff > >>> dict.root > AuthTreeNode( > left_prefix = 0b01100001, > left_hash = > 0xbafa0e2bba3396c5e9804b6cbe61be82bc442c1121aed81f8d5de36e9b20dc2f, > left_count = 1, > left_size = 4) > >>> dict.hash > 0xb4837376022a7c9ddaa7d685ad183bcbd5d16c362b81fa293a7b9e911766cf3c > > Assuming uniform distribution of key values, level-compressed hashing > has time-complexity logarithmic with respect to the number of keys in > the prefix tree. The disadvantage is that it is not possible in > general to "rebase" an operational proof on top of a sibling, > particularly if that sibling deletes branches that result in > reorganization and level compression of internal nodes used by the > rebased proof. > > ===Variation 2: Proof-updatable hashing=== > > In this variation, level-compressed branches are expanded into a > series of chained single-branch internal nodes, each including the > hash of its direct child. For a branch with a prefix N bits in length, > this requires N chained hashes. Thanks to node-compression (excluding > empty branches from the serialization), it is possible for each hash > operation + padding to fit within a single SHA-256 block. > > Note that the serialization semantics are unchanged!
The variation > only changes the procedure for calculating the hash values of interior > nodes. The serialization format remains the same (modulo differing > hash values standing in for pruned branches). > > Using the above example, calling <code>dict.hash</code> causes the > following internal nodes to be constructed: > > >>> node1 = AuthTreeNode( > right_prefix = 0b1, > right_hash = > 0xbafa0e2bba3396c5e9804b6cbe61be82bc442c1121aed81f8d5de36e9b20dc2f, > right_count = 1, > right_size = 4) > >>> node2 = AuthTreeNode( left_prefix=0b0, left_hash=node1.hash, > left_count=1, left_size=4) > >>> node3 = AuthTreeNode( left_prefix=0b0, left_hash=node2.hash, > left_count=1, left_size=4) > >>> node4 = AuthTreeNode( left_prefix=0b0, left_hash=node3.hash, > left_count=1, left_size=4) > >>> node5 = AuthTreeNode( left_prefix=0b0, left_hash=node4.hash, > left_count=1, left_size=4) > >>> node6 = AuthTreeNode(right_prefix=0b1, right_hash=node5.hash, > right_count=1, right_size=4) > >>> node7 = AuthTreeNode(right_prefix=0b1, right_hash=node6.hash, > right_count=1, right_size=4) > >>> node8 = AuthTreeNode( left_prefix=0b0, left_hash=node7.hash, > left_count=1, left_size=4, > value=0xff) > >>> dict.hash == node8.hash > True > >>> dict.hash > 0xc3a9328eff06662ed9ff8e82aa9cc094d05f70f0953828ea8c643c4679213895 > > The advantage of proof-updatable hashing is that any operational proof > may be "rebased" onto the tree resulting from a sibling proof, using > only the information locally available in the proofs, even in the > presence of deletion operations that result in level-compression of > the serialized form. The disadvantage is performance: validating an > updatable proof requires a number of hash operations lower-bounded by > the length of the key in bits. > > ==Inclusion proofs== > > An inclusion proof is a prefix tree pruned to contain a subset of its > keys. 
The serialization of an inclusion proof takes the following form: > > inclusion_proof := variant || root_hash || root_node || checksum > > Where <code>variant</code> is a single-byte value indicating the > presence of level-compression (0 for proof-updatable hashing, 1 for > level-compressed hashing). <code>root_hash</code> is the Merkle > compression hash of the tree, the 32-byte SHA-256 hash of the root > node. <code>root_node</code> is the possibly pruned, serialized > representation of the tree. And finally, <code>checksum</code> is the > first 4 bytes of the SHA-256 hash of <code>variant</code>, > <code>root_hash</code>, and <code>root_node</code>. > > For ease of transport, the standard envelope for display of an > inclusion proof is internet-standard base64 encoding in the following > format: > > - -----BEGIN INCLUSION PROOF----- > ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0DgARBwAauzpZmgIQAAP9agcPADEt7Zxd > TC3tABAAA/1xBylsTG0t7cwBEAAD/YgHJ5OO2rOcGhAAA/3ZByEg+2g= > - -----END INCLUSION PROOF----- > > Decoded, it looks like this: > > 0x01 # Level-compressed hashing > # Merkle root: > 0x3ccf6617919e7b3a29fa812b366cd1d5cb3b18ad85d6a7da9d7c2a02767383b4 > # Serialized tree (unpruned): > 0x0e001107001abb3a599a02100003fd6a070f00312ded9c5d4c2ded00100003fd > 0x7107296c4c6d2dedcc01100003fd880727938edab39c1a100003fdd907 > # Checksum: > 0x2120fb68 > > ==Operational proofs== > > An operational proof is a list of insert/update and delete operations > suffixed to an inclusion proof which contains the pathways necessary > to perform the specified operations. The inclusion proof must contain > the key values to be updated or deleted, and the nearest adjacent key > values for each insertion.
The serialization of an operational proof > takes the following form: > > operational_proof := variant || root_hash || tree || > VARLIST(delete*) || VARLIST(update*) || > new_hash || checksum > > delete := VARCHAR(key) > update := VARCHAR(key) || VARCHAR(value) > > The first three fields, <code>variant</code>, <code>root_hash</code>, > and <code>tree</code> are the inclusion proof, and take the same > values described in the previous section. <code>deletes</code> is a > list of key values to be deleted; each key value in this list must > exist in the inclusion proof. <code>updates</code> is a list of key, > value mappings which are to be inserted into the tree, possibly > replacing any mapping for the key which already exists; either the key > itself if it exists (update), or the two lexicographically closest > keys on either side if it does not (insert) must be present in the > inclusion proof. <code>new_hash</code> is the resulting Merkle root > after the insertions, updates, and deletions are performed, and > <code>checksum</code> is the initial 4 bytes of the SHA-256 hash of > the preceding fields. > > Just like inclusion proofs, an operational proof is encoded in base64 > for display and transport.
Here is an operational proof against the same tree, deleting the key '中本': > > - -----BEGIN OPERATIONAL PROOF----- > ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0LgARaIsVaQi/GdhOPOgA8p4Pu4PiEfEg > lcmy3j7bOc7hXw0DLSeTjtqznBoQAAP92QcBMOS4reacrACzuZJbyP7fqIOf5VEk4iarG4+uPoZC > oun8BztQMQBy0LHVeSY= > - -----END OPERATIONAL PROOF----- > > Decoded and broken into its constituent fields: > > 0x01 # Level-compressed hashing > # Original Merkle root: > 0x3ccf6617919e7b3a29fa812b366cd1d5cb3b18ad85d6a7da9d7c2a02767383b4 > # Serialized tree (included keys: '中本'): > 0x2e0011688b156908bf19d84e3ce800f29e0fbb83e211f12095c9b2de3edb39ce > 0xe15f0d032d27938edab39c1a100003fdd907 > # Deletion list ['中本']: > 0x01 > 0x30e4b8ade69cac > # Insertion list []: > 0x00 > # New Merkle root: > 0xb3b9925bc8fedfa8839fe55124e226ab1b8fae3e8642a2e9fc073b50310072d0 > # Checksum: > 0xb1d57926 > > ~End of File~ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.14 (GNU/Linux) > Comment: GPGTools - http://gpgtools.org > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQIcBAEBAgAGBQJSs6HIAAoJEAdzVfsmodw4gooQAJm7XNsZjgdeTSpKIvUIU38f > tQx2FD08hQdLl48me5mDUbHJgGlYINsKAgoZ8Mqwi/kHEEYhuIlLIX1p6Ovigidb > 21BiVoOLdG1egGOwxp17DuwYaDPTppFTlN9TBjZzW6WKc7+4aNvyc1KtrbHIhtj/ > 04ekFyAn4U5UH0ht7CI79j0u3Kp85p5D4PyYZB2m82mzti6OxpSM4tXlMkDW7ihg > QJwiZSjzejqTd7WF0zr0SLeGVRSN2A0dzUCoVsI98eIa3hkw2N4ae6dRkibyStOT > V8VEDvHArEDlvu8jiryajhsom5mvtOOclNDkVXWAf/Te4gj05iYdTIvNvDEJtqsP > XDbmw6GgV1kBLlLo0mp//t/+wr+nIvy+sVAP+eqtM/0vjaVXBkXxkUMqqNkrtVpB > f3whq7nFahssUMSoWE93jgob1ayAax2XUALVMAXYsJl7b2MqBGlhiTZ8FQZ+TW4A > tIpKeUprPmDvA18rO3SCbmLMQryZqYiH0sRyvUc5kvn3qCRHrISZNkEuK591eS+x > BO1eOluPzVqeXPPSK1jvGeY0FNJtwzbov4nI1mzOvzQHLCvkHn5PhUFCK5tL5tAe > b0Z5qwDV+SvVs7W1R7ejYBzEj77U1zuzZ9AtikOuvy+bNGrkIlpI49EyXHijm7C3 > Q6JacTuI0PelYji2gaBJ > =BbDs > -----END PGP SIGNATURE----- >
> _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development
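Both proof envelopes in the draft share the same outer layout: a variant byte, a 32-byte root hash, a variable-length body, and a 4-byte checksum over everything before it. A minimal Python sketch for unwrapping them, using only the standard `base64` and `hashlib` modules, might look like this. It treats the checksum as the first 4 bytes of a single SHA-256, as the draft describes, and reports whether it matches rather than assuming so. The function names `dearmor` and `split_proof` are hypothetical.

```python
import base64
import hashlib

INCLUSION_PROOF = """\
- -----BEGIN INCLUSION PROOF-----
ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0DgARBwAauzpZmgIQAAP9agcPADEt7Zxd
TC3tABAAA/1xBylsTG0t7cwBEAAD/YgHJ5OO2rOcGhAAA/3ZByEg+2g=
- -----END INCLUSION PROOF-----
"""

OPERATIONAL_PROOF = """\
- -----BEGIN OPERATIONAL PROOF-----
ATzPZheRnns6KfqBKzZs0dXLOxithdan2p18KgJ2c4O0LgARaIsVaQi/GdhOPOgA8p4Pu4PiEfEg
lcmy3j7bOc7hXw0DLSeTjtqznBoQAAP92QcBMOS4reacrACzuZJbyP7fqIOf5VEk4iarG4+uPoZC
oun8BztQMQBy0LHVeSY=
- -----END OPERATIONAL PROOF-----
"""


def dearmor(text):
    """Drop the BEGIN/END lines (with any '- ' dash-escaping) and decode base64."""
    payload = "".join(line.strip() for line in text.splitlines()
                      if line.strip() and "-----" not in line)
    return base64.b64decode(payload)


def split_proof(raw):
    """Split a decoded proof into (variant, root_hash, body, checksum, ok).

    body is the serialized tree for an inclusion proof, or
    tree || deletes || updates || new_hash for an operational proof.
    ok reports whether the checksum equals the first 4 bytes of a single
    SHA-256 over all preceding bytes.
    """
    variant, root_hash = raw[0], raw[1:33]
    body, checksum = raw[33:-4], raw[-4:]
    ok = hashlib.sha256(raw[:-4]).digest()[:4] == checksum
    return variant, root_hash, body, checksum, ok


variant, root_hash, tree, checksum, ok = split_proof(dearmor(INCLUSION_PROOF))
print("level-compressed" if variant == 1 else "proof-updatable")
print("root:", root_hash.hex(), "checksum:", checksum.hex(), "matches:", ok)

_, _, body, _, ok = split_proof(dearmor(OPERATIONAL_PROOF))
print("new root:", body[-32:].hex(), "matches:", ok)
```

Because `new_hash` immediately precedes the checksum in an operational proof, the last 32 bytes of the body are the new Merkle root; parsing the tree and the deletion and update lists in between requires a full deserializer, which this sketch does not attempt.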
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Peter Todd @ 2014-01-06 18:13 UTC To: Thomas Voegtlin; +Cc: bitcoin-development On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: > Hello and happy new year to this mailing list! > > > Thank you Mark for the incredible work you've been doing on this. > I am following this very closely, because it is of primary importance > for Electrum. > > I have written a Python-levelDB implementation of this UTXO hashtree, > which is currently being tested, and will be added to Electrum servers. Along the lines of my recent post on blockchain data: Is it going to be possible to do partial prefix queries on that tree? Also have you considered creating per-block indexes of all scriptPubKeys, spent or unspent, queryable via the same partial prefix method? > I too believe that BIPs should define interoperability points, but probably > not implementation details. For the UTXO hashtree, this means that a BIP > should at least specify how the root hash is constructed. This might be the > only thing that needs to be specified. > > However, I see no pressing issue with writing a BIP; it might be preferable > to implement and test different options first, and learn from that. It'd be very good to test this stuff thoroughly on Electrum first and get a feel for the performance and usability before any soft-fork to make it a miner commitment. Similarly a C++ implementation should be simply added to Bitcoin Core as a bloom filter replacement and made available over the P2P network.
-- 'peter'[:-1]@petertodd.org 000000000000000009bc28e08b41a74801c5878bf87978c2486aee7ed8a85778
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Mark Friedenbach @ 2014-01-07 0:21 UTC To: bitcoin-development -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/06/2014 10:13 AM, Peter Todd wrote: > On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: >> I have written a Python-levelDB implementation of this UTXO >> hashtree, which is currently being tested, and will be added to >> Electrum servers. > > Along the lines of my recent post on blockchain data: > > Is it going to be possible to do partial prefix queries on that > tree? There's really two tree structures being talked about here. Correct me if I'm wrong Thomas, but last time I looked at your code it was still implementing a 256-way PATRICIA trie, correct? This structure lends itself to indexing either scriptPubKey or H(scriptPubKey) with approximately the same performance characteristics, and in the "Ultimate blockchain compression" thread there is much debate about which to use. In the process of experimentation I've since moved from a 256-way PATRICIA trie to a bitwise, non-level-compressed trie structure - what I call proof-updatable trees in the BIP. These have the advantage of allowing stateless application of one proof to another, and as consequence enable mining & mempool operations without access to the UTXO set, so long as proofs are initially provided in the transaction & block wire format. The "disadvantage" is that performance is closely tied to key length, making H(scriptPubKey) the much more desirable option. I'm sure you see that as an advantage, however :) > Also have you considered creating per-block indexes of all > scriptPubKeys, spent or unspent, queryable via the same partial > prefix method?
This would be quite easy to do, separate from the UTXO structure but using the same trie format. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSy0iFAAoJEAdzVfsmodw434MQAIA/fDYT7SfMtfLEgDQKhXCn slRqFEx/HXjvgHHSYnbr9V+8LrGzNvT2ImebbV9ge8VlziAFNGIUq2EYhFs4kHWu GVm9aL8Jj/27SvM0tRwr9n2XIifKOh2sVINAjbv+UwPv/O+cULU95/b53DEF6aqI OWxioOR50TPe4t9AevAGVypNLm1DsyDdymhO9xyBN92xGTNj5QKL5hHG3kcsLIl1 7KaxO0w4UC2sdSGj9FeyH1b0zYg8FlzjJHc1CUshHwUwyYo8LRJtRypL5lrayERg Er/kIGEDovcenNBW8G79l+8VKPfB/lMTssT2pDiQL+1e1fg46CIQxHSyap2JSFTE jgleRk/+1NK/ZjOQ8dEBPZK3TE1WY3qlm/ekjG/8W5kXqcxzFBoAkeBNXuJ/8UMi mKe+DTmbp0xnvLO1p+hpugXKfrQSpcFL+ZvJHlFS1lz7O1N3WvuDCNP9El+L6ueM nFzjr1NTnX0z4vYtscI7qBKVqUrB7Z84c3O/lSYpw4Jilxl4trzV4cn7+AF7KWGM ktR9JJeIoNcJ2Zx4EpRp6OSwhtLkWZyLpPnidQ2p6ev2ytXpTpGsW/i5XS2w57UD 2IG5E0Q7Xzvd58lI/YollWQcagVOZdyzYXa+wVZoFQ6gLF47andpUmtUJOhI7gxv T/rWhPhkTMUn8TdvUcV/ =N9zM -----END PGP SIGNATURE-----
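To make the key-length cost of proof-updatable trees concrete: under Variation 2 of the draft, a branch with an N-bit prefix is hashed as N chained single-branch nodes, one single-SHA-256 per bit. The Python sketch below reproduces only that chaining shape. The per-node byte encoding it uses (a direction byte, the child digest, and 4-byte count and size fields) is a hypothetical stand-in for the draft's serialization, so its digests will not match the draft's example values, and `chain_hash` is an illustrative name.

```python
import hashlib


def chain_hash(prefix_bits, child_digest, count, size):
    """Hash an N-bit level-compressed branch as N single-branch nodes.

    The loop starts at the bit adjacent to the child and works back up
    toward the parent, as in the draft's node1..node8 example. The node
    encoding (direction byte || child digest || count || size) is a
    hypothetical stand-in. Returns (top digest, number of SHA-256 calls).
    """
    digest = child_digest
    hash_ops = 0
    for bit in reversed(prefix_bits):
        node = (bytes([bit]) + digest
                + count.to_bytes(4, "little") + size.to_bytes(4, "little"))
        digest = hashlib.sha256(node).digest()  # one single SHA-256 per node
        hash_ops += 1
    return digest, hash_ops


leaf = hashlib.sha256(b"leaf").digest()  # placeholder child digest
prefix = [0, 1, 1, 0, 0, 0, 0, 1]        # the prefix 0b01100001 from the draft
_, ops = chain_hash(prefix, leaf, count=1, size=4)
print(ops)  # 8, matching node1 through node8 in the draft's example

_, ops = chain_hash([0] * 256, leaf, count=1, size=4)
print(ops)  # 256 for a path spelled out by a 256-bit key
```

Since the draft lower-bounds the hash count for an updatable proof by the key length in bits, a 256-bit key such as SHA-256(scriptPubKey) implies at least 256 hash operations along each path.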
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Thomas Voegtlin @ 2014-01-07 6:31 UTC To: bitcoin-development On 07/01/2014 01:21, Mark Friedenbach wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/06/2014 10:13 AM, Peter Todd wrote: >> On Sun, Jan 05, 2014 at 07:43:58PM +0100, Thomas Voegtlin wrote: >>> I have written a Python-levelDB implementation of this UTXO >>> hashtree, which is currently being tested, and will be added to >>> Electrum servers. >> Along the lines of my recent post on blockchain data: >> >> Is it going to be possible to do partial prefix queries on that >> tree? > There's really two tree structures being talked about here. Correct me > if I'm wrong Thomas, but last time I looked at your code it was still > implementing a 256-way PATRICIA trie, correct? This structure lends > itself to indexing either scriptPubKey or H(scriptPubKey) with > approximately the same performance characteristics, and in the > "Ultimate blockchain compression" thread there is much debate about > which to use. You are right. The 256-way branching follows from the fact that the tree was implemented using a key-value database operating with byte strings (leveldb). With this implementation constraint, a different branching would probably be possible but wasteful. My recent code creates one leaf per unspent, and uses 56-byte keys built as: H(scriptPubKey) + txid + txpos (This is not pushed yet, it needs cleanup. Previous code created one leaf per address) Partial prefix queries are possible with database iterators. > In the process of experimentation I've since moved from a 256-way > PATRICIA trie to a bitwise, non-level-compressed trie structure - what > I call proof-updatable trees in the BIP.
These have the advantage of > allowing stateless application of one proof to another, and as > consequence enable mining & mempool operations without access to the > UTXO set, so long as proofs are initially provided in the transaction > & block wire format. I see the advantage of doing that, but this looks really far-fetched.. My understanding is that it would require a complete change in the way clients and miners work. Could such a change be brought iteratively?
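Thomas's key layout and the iterator-based prefix query he mentions can be sketched in Python. The sketch assumes a 20-byte H(scriptPubKey), which together with a 32-byte txid and a 4-byte txpos is consistent with the 56-byte total he gives, and it packs txpos big-endian; both are assumptions. A sorted list searched with the standard `bisect` module stands in for a LevelDB iterator that seeks to the prefix and scans forward. The names `utxo_key` and `prefix_scan` are hypothetical.

```python
import bisect
import os


def utxo_key(script_hash, txid, txpos):
    """Build a 56-byte key: H(scriptPubKey) || txid || txpos.

    Assumes a 20-byte script hash and a 32-byte txid; txpos is packed as
    4 big-endian bytes so outputs of one transaction sort by index.
    """
    if len(script_hash) != 20 or len(txid) != 32:
        raise ValueError("expected a 20-byte script hash and a 32-byte txid")
    return script_hash + txid + txpos.to_bytes(4, "big")


def prefix_scan(sorted_keys, prefix):
    """Yield keys starting with prefix: seek to it, then scan forward."""
    i = bisect.bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1


alice, bob = bytes(20), b"\x01" * 20  # placeholder script hashes
keys = sorted([
    utxo_key(alice, os.urandom(32), 0),
    utxo_key(bob, os.urandom(32), 1),
    utxo_key(alice, os.urandom(32), 2),
])
print(len(list(prefix_scan(keys, alice))))      # 2: both of alice's outputs
print(len(list(prefix_scan(keys, alice[:2]))))  # partial prefix, still 2
```

Because every output for one script hash shares the same 20-byte prefix, those outputs are contiguous in key order; a shorter byte prefix selects every script hash that begins with it.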
* Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees From: Mark Friedenbach @ 2014-01-08 1:04 UTC To: Thomas Voegtlin, Bitcoin Dev -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/06/2014 10:31 PM, Thomas Voegtlin wrote: > You are right. The 256-way branching follows from the fact that the > tree was implemented using a key-value database operating with byte > strings (leveldb). With this implementation constraint, a different > branching would probably be possible but wasteful. Not really. Just use a suffix to determine the number of bits used in the final key byte. For example, the string "abc" would have the key 0x61626308 // "abc\x08" Dropping the final bit would mean masking it off and having a different terminating value: 0x61626207 // "abb\x07" That way you keep the lexical ordering of keys necessary for database iteration, and the efficient binary encoding. > I see the advantage of doing that, but this looks really > far-fetched.. My understanding is that it would require a complete > change in the way clients and miners work. Could such a change be > brought iteratively? It is an iterative change, I believe. You might be confusing this idea with Peter Todd's TXO commitment proposal using MMR trees, which is a drastic change with its own set of tradeoffs. Just to be clear, here's what I'm proposing: 1) Restructure the current UTXO index to be a Merkle tree, basically by splitting coins into individual outputs and adding interior nodes to the leveldb database. 2) Add hash commitments of this structure to the coinbase. It's still mapping txid's to unspent outputs, just as before - this has nothing to do with the script keyed "wallet index."
It's just now nodes can prefix optional proofs to block or transaction messages which prove by reference to the current best block's hash the spend status of the inputs of a transaction, or all the inputs of all the transactions of a block. If the more expensive proof-updatable hashing is used, then these proofs can even be composed or "rebased" onto a new block by applying the contents of an "operational proof" representing the diff between two blocks / the application of a series of transactions. This means that a node which does not have access to the UTXO set can nevertheless receive transactions or entire blocks with prefixed proofs and check the validity of the transaction with just the information available (proof + transaction contents). All that is required after the above soft-fork is a protocol version update and/or a service bit to indicate the ability to send or receive proof-prefixed messages. I'd call that an incremental update. [Aside: adding the wallet index requires storing the entire UTXO set in duplicated form, indexed this time by scriptPubKey or H(scriptPubKey), and including proofs of this structure as well. It is unlikely that any soft-fork would occur forcing consensus over the wallet index, but it could be done as a meta-chain or as an index covering just the contents of the block.] 
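The bit-count suffix Mark describes at the top of this message can be sketched in a few lines of Python. It assumes the dropped "final bit" is the least-significant bit of the last byte, which is what the "abc" to "abb" example implies. The function name `encode_bit_key` and the choice to encode an empty key as a lone 0x00 byte are illustrative assumptions.

```python
def encode_bit_key(data, nbits):
    """Encode the first nbits of data as a byte-oriented database key.

    Unused low-order bits of the final byte are masked to zero, and one
    trailing byte records how many bits of that final byte are in use
    (8 for a whole byte). An empty key becomes a lone 0x00 byte, which
    is an assumption of this sketch.
    """
    if not 0 <= nbits <= 8 * len(data):
        raise ValueError("nbits out of range")
    if nbits == 0:
        return b"\x00"
    nbytes = (nbits + 7) // 8
    used = nbits - 8 * (nbytes - 1)         # 1..8 bits used in the last byte
    key = bytearray(data[:nbytes])
    key[-1] &= (0xFF << (8 - used)) & 0xFF  # keep only the top `used` bits
    return bytes(key) + bytes([used])


print(encode_bit_key(b"abc", 24).hex())  # 61626308
print(encode_bit_key(b"abc", 23).hex())  # 61626207
```

The two printed keys reproduce the 0x61626308 and 0x61626207 values from the message.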
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSzKQ2AAoJEAdzVfsmodw4hyoQAJ0f6P3ijZCEw7IPd/RcrmkI Viv4j17ZyAAcbNUplvjzhr/tIIKYPg51ltvfkp8cGRHgez88QsljzvM8B5n+nbPa jaaI6eiJ3AU1bR8hWYKtlXFwMvRjyr3ofl8hhTvYptGv9x3/Tr+2FwxIRY0413m6 2h95vItsvBs8v7clqLoBEqx9uyUpsH3+J32V4oGubrNAFXh1oOHi4Ban+TOKYaQV GHZaIZ3bVAvcMd5riaWSPUPLHwJnxQ8w6SlVRy2UNUPe+9yTuy4n1GW4vk4WHvop FgZFrM3LBmh1MhlYHRdEUUtwk3mfDuGbfW5UJVMri0Nis1PsXr5VK4qQaMbd/9e6 M2uWKslY9QCnzMajnHen9OwotteAJy2I1KHVcxXb0tFqrvqZ6o/auIe0G4VdKYuI XfNF3mokX93tiSflmphDba6qgB/W+Y6UD2gG2AeFuMGhFF/Hy62pVC6Zx7PKZ3vL Kh27rKkO/0FJau2JCQm5xBiQgCnKghqOiHefY3o+l+Y9kJ8fXKWCuwJ0lJ3LxZ2u 8H6sp6Jm9Ct9L90wSn7VmmI5H3bRe8sa7sylH4BR2T6jP3/tKDYTEeNWj+F9FfO1 FxsjYrjAyv1HxYYKd/Y1svEVSsKMv3a2SR9pF36ynBABdFjvx+oEuCyCO4tspFe6 15eA1QoMKvEQe/Ww5kRC =L9WT -----END PGP SIGNATURE-----
Thread overview: 14+ messages 2013-12-20 1:47 [Bitcoin-development] BIP proposal: Authenticated prefix trees Mark Friedenbach 2013-12-20 6:48 ` Jeremy Spilman 2013-12-20 11:21 ` Mark Friedenbach 2013-12-20 13:17 ` Peter Todd 2013-12-20 18:41 ` Mark Friedenbach 2013-12-20 10:48 ` Peter Todd [not found] ` <52B425BA.6060304@monetize.io> 2013-12-20 12:47 ` Peter Todd 2013-12-20 19:48 ` Gregory Maxwell 2013-12-20 22:04 ` Mark Friedenbach 2014-01-05 18:43 ` Thomas Voegtlin 2014-01-06 18:13 ` Peter Todd 2014-01-07 0:21 ` Mark Friedenbach 2014-01-07 6:31 ` Thomas Voegtlin 2014-01-08 1:04 ` Mark Friedenbach