* [bitcoin-dev] PubRef - Script OP Code For Public Data References @ 2019-07-19 6:05 Mike Brooks 2019-07-19 15:17 ` William Casarin ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Mike Brooks @ 2019-07-19 6:05 UTC (permalink / raw) To: bitcoin-dev [-- Attachment #1: Type: text/plain, Size: 4739 bytes --] I noticed that this Merkle tree we are building is made more expensive by including repetitive data. So, this proposal draws inspiration from LZ78, an algorithm which constructs a dictionary of repetitive strings which are then referred to by their index. What if the blockchain already built a dictionary for us, and now we just need to index it? --- Abstract This BIP describes how a new OP code can be used to construct smaller, more compact transactions. With a public reference (PubRef), a newly created transaction can reuse elements of a previously confirmed transaction by representing this information with a smaller numeric offset or “pointer”. Motivation Giving scripts the ability to refer to data on the blockchain will reduce transaction sizes because key material does not have to be repeated in every Script. Users of the network are rewarded with smaller transaction sizes, and miners are able to fit more transactions into new blocks. Pointers are a common feature and it felt like this was missing from Bitcoin Script. Specification This BIP defines a new Script opcode that can be introduced with BIP-0141 (Segregated Witness (Consensus layer)). This new opcode is as follows: Word Opcode Hex Input Output Description OP_PUBREF4 TBD TBD uint4 data The next four bytes are an integer reference to a previously defined PUSHDATA Let there be an ordered list of all invocations of PUSHDATA (opcodes 0x4c, 0x4d, 0x4e) across all scripts starting from the genesis block: [t0,t1,...,tn]. With this list a newly created script can refer to a specific PUSHDATA that was used in any previously confirmed block. 
The values referenced by this list are immutable and uniform to all observers. Let's assume there is some transaction containing a ScriptSig and outputScript pair; here is what an input and an output script would look like: ScriptSig PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101] PUSHDATA(65)[04bca69c59dc7a6d8ef4d3043bdcb626e9e29837b9beb143168938ae8165848bfc788d6ff4cdf1ef843e6a9ccda988b323d12a367dd758261dd27a63f18f56ce77] outputScript DUP HASH160 PUSHDATA(20)[dd6cce9f255a8cc17bda8ba0373df8e861cb866e] EQUALVERIFY CHECKSIG The ScriptSig of an input typically contains the full public key, which is ~65 bytes. Outputs will typically contain a hash of the public key, which is 20 bytes. Using PubRef, the public-key material shown above no longer needs to be repeated; instead, the needed values can be resolved from a previously confirmed transaction. The signature of the input must still be unique; however, the public key can be replaced with a call to PUBREF, as shown below: ScriptSig PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101] PUBREF[43123] outputScript DUP HASH160 PUBREF[12123] EQUALVERIFY CHECKSIG The above call to PUBREF removed the need to include the full public key, or a hash of the public key, in a given transaction. This is only possible because these values were used previously in a confirmed block. If, for example, a user is sending bitcoins to a newly formed address, then no PUBREF can be created - and an outputScript using PUSHDATA would need to be used at least once. If a newly created wallet has never been used on the Bitcoin network, then the full public key will need to be included in the ScriptSig. 
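As a rough back-of-the-envelope check of the substitution above (assumed encodings, since the serialization of OP_PUBREF4 is listed as TBD in the specification):

```python
# Rough size arithmetic for replacing a push with a PUBREF. The 5-byte
# cost of a PUBREF (1 opcode byte + 4-byte index) is an assumption,
# since the opcode's exact encoding is TBD in the proposal.
PUSH_OVERHEAD = 1            # one length byte for pushes < 76 bytes
PUBREF_SIZE = 1 + 4          # assumed: opcode byte + uint32 index

pubkey_push = PUSH_OVERHEAD + 65   # PUSHDATA(65)[<full public key>]
hash_push = PUSH_OVERHEAD + 20     # PUSHDATA(20)[<pubkey hash>]

saved_in_scriptsig = pubkey_push - PUBREF_SIZE   # 66 - 5 = 61 bytes
saved_in_output = hash_push - PUBREF_SIZE        # 21 - 5 = 16 bytes
print(saved_in_scriptsig, saved_in_output)       # 61 16
```

Under these assumptions the bulk of the savings comes from the ScriptSig side, where the full public key is replaced.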
Once these transactions have been confirmed, then these values will be indexed and any future script can refer to these public-key values with a PUBREF operation. PubRef is not susceptible to malleability attacks because the blockchain is immutable. The PUSHDATA operation can store at most 520 bytes on the stack, therefore a single PUBREF operation can never obtain more than 520 bytes. In order for a client to make use of the PUBREF operations they’ll need access to a database that can look up public keys and resolve their PUBREF index. A value can be resolved to an index with a hash-table lookup in O(1) constant time. Additionally, all instances of PUSHDATA can be indexed as an ordered list; resolution of a PUBREF index to the intended value would be an O(1) array lookup. Although the data needed to build and resolve public references is already included with every full node, additional computational effort is needed to build and maintain these indices - a tradeoff which provides smaller transaction sizes and relieves the need to store repetitive data on the blockchain. [-- Attachment #2: Type: text/html, Size: 18581 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
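The two O(1) indices the proposal describes - an ordered array for validators resolving a PUBREF, and a hash table for wallets creating one - can be sketched in a few lines. Names here (PushDataIndex, add_push, resolve, find) are illustrative only, not from any real implementation:

```python
# Hypothetical sketch of the two indices the proposal describes,
# assuming the node can iterate every PUSHDATA payload in confirmed
# order. Illustrative only; not from any real Bitcoin implementation.
class PushDataIndex:
    def __init__(self):
        self.by_offset = []   # ordered list: PUBREF index -> payload
        self.by_value = {}    # hash table: payload -> first PUBREF index

    def add_push(self, payload: bytes) -> int:
        """Record one PUSHDATA invocation; return its global index."""
        idx = len(self.by_offset)
        self.by_offset.append(payload)
        # keep only the first occurrence so references are deterministic
        self.by_value.setdefault(payload, idx)
        return idx

    def resolve(self, pubref: int) -> bytes:
        """O(1) array lookup, as a validator would perform."""
        return self.by_offset[pubref]

    def find(self, payload: bytes):
        """O(1) hash lookup, as a wallet creating a PUBREF would perform."""
        return self.by_value.get(payload)

index = PushDataIndex()
h = bytes.fromhex("dd6cce9f255a8cc17bda8ba0373df8e861cb866e")
i = index.add_push(h)
assert index.resolve(i) == h and index.find(h) == i
```

The memory and storage cost of maintaining these structures over the whole chain is exactly the tradeoff debated later in this thread.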
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks @ 2019-07-19 15:17 ` William Casarin 2019-07-19 19:17 ` Pieter Wuille 2019-07-19 17:45 ` Yuval Kogman 2019-07-19 18:07 ` ZmnSCPxj 2 siblings, 1 reply; 13+ messages in thread From: William Casarin @ 2019-07-19 15:17 UTC (permalink / raw) To: Mike Brooks, bitcoin-dev Hello Mike, Mike Brooks via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> writes: > Motivation > > Giving scripts the ability to refer to data on the blockchain will reduce > transaction sizes because key material does not have to be repeated in > every Script. Users of the network are rewarded with smaller transaction > sizes, and miners are able to fit more transactions into new blocks. > Pointers are a common feature and it felt like this was missing from > Bitcoin Script. This would incentivize address re-use which would be bad for fungibility. It appears you're trying to optimize a use case which is already discouraged :( Cheers, Will -- https://jb55.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 15:17 ` William Casarin @ 2019-07-19 19:17 ` Pieter Wuille 0 siblings, 0 replies; 13+ messages in thread From: Pieter Wuille @ 2019-07-19 19:17 UTC (permalink / raw) To: William Casarin, Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 1026 bytes --] On Fri, Jul 19, 2019, 12:13 William Casarin via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > > Hello Mike, > > Mike Brooks via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> > writes: > > > Motivation > > > > Giving scripts the ability to refer to data on the blockchain will reduce > > transaction sizes because key material does not have to be repeated in > > every Script. Users of the network are rewarded with smaller transaction > > sizes, and miners are able to fit more transactions into new blocks. > > Pointers are a common feature and it felt like this was missing from > > Bitcoin Script. > > This would incentivize address re-use which would be bad for > fungibility. It appears you're trying to optimize a use case which is > already discouraged :( > Furthermore, right now block validation does not require access to the whole historical chain (only to the set of unspent outputs), so a change like this would massively increase storage requirements for validation. Cheers, -- Pieter [-- Attachment #2: Type: text/html, Size: 1849 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks 2019-07-19 15:17 ` William Casarin @ 2019-07-19 17:45 ` Yuval Kogman [not found] ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com> 2019-07-19 18:07 ` ZmnSCPxj 2 siblings, 1 reply; 13+ messages in thread From: Yuval Kogman @ 2019-07-19 17:45 UTC (permalink / raw) To: Mike Brooks, Bitcoin Protocol Discussion [-- Attachment #1: Type: text/plain, Size: 1720 bytes --] Hi, On Fri, 19 Jul 2019 at 14:00, Mike Brooks via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: Giving scripts the ability to refer to data on the blockchain will reduce > transaction sizes because key material does not have to be repeated in > every Script. > Given that address reuse is discouraged, and as far as I know is predominantly utilized for customer deposit addresses by exchanges, many of which have not invested resources in batching withdrawals or consolidating small UTXOs, I am skeptical that such a feature would actually be utilized by users for whom a potential use exists, especially as mining fees are usually pushed onto customers anyway. Unless extensively utilized such that costs outweigh benefits, this change would impose an externality on validating nodes: With this list a newly created script can refer to a specific PUSHDATA that > was used in any previously confirmed block. > This would make pruning impossible, and also relaxes the bounds on validation costs since it would allow random reads on all historical data as opposed to just the UTXO set. Although it would do nothing for block capacity, perhaps this redundancy might be better addressed as opt-in functionality in the p2p layer? It might help with IBD, though at least in my experience peer latency (as opposed to throughput) is the limiting factor, and as far as I can tell this would increase it. 
Somewhat relatedly, transaction IDs are another type of pointer which might benefit from being encoded as a (block height, offset). However, here too it seems to me like the complexity is substantial, potentially introducing new DoS vectors, while saving several bytes per input at most. Regards, Yuval [-- Attachment #2: Type: text/html, Size: 3061 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com>]
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References [not found] ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com> @ 2019-07-19 22:48 ` Yuval Kogman 2019-07-24 19:49 ` Mike Brooks 0 siblings, 1 reply; 13+ messages in thread From: Yuval Kogman @ 2019-07-19 22:48 UTC (permalink / raw) To: Mike Brooks, bitcoin-dev [-- Attachment #1: Type: text/plain, Size: 2649 bytes --] Hi, On Fri, 19 Jul 2019 at 17:49, Mike Brooks <m@ib.tc> wrote: > Users will adopt whatever the client accepts - this feature would be > transparent. > My skepticism was based in an assumption on my part that most such data is produced by actors with a track record of neglecting transaction efficiency. I'd be happy to be proven wrong, but considering say usage rates of native witness outputs, or the fraction of transactions with more than 2 outputs so far I see little evidence in support of widespread adoption of cost saving measures. Assuming this is intended as a new script version, all fully validating nodes need to commit to keeping all data indefinitely before they can enforce the rules that make transactions benefiting from this opcode safe to broadcast. That said, if successful, the main concern is still that of address reuse - currently there is no incentive in the protocol to do that, and with BIP32 wallets fewer reasons to do it as well, but this proposal introduces such an incentive for users which would otherwise generate new addresses (disregarding the ones that already reuse addresses anyway), and this is problematic for privacy and fungibility. Since address reuse has privacy concerns, I think it's important to draw a distinction between clients accepting and producing such transactions, if the latter were done transparently that would be very concerning IMO, and even the former would not be transparent for users who run currently pruning nodes. I'm not sure how an O(1) time complexity leads to DoS, that seems like a > very creative jump. 
> For what it's worth, that was in reference to hypothetical deduplication > only at the p2p layer, similar to compact blocks, but on further reflection > I'd like to retract that, since both scenarios which I had in mind seem > easily mitigated. Based on this response, it makes me want to calculate the savings, what > if it was a staggering reduction in the tree size? > Assuming past address reuse rates are predictive this only places an upper > bound on the potential size savings, so personally I would not find that > very compelling. Even if the realized size savings would be substantial, > I'm not convinced the benefits outweigh the downsides (increased validation > costs, monotonically growing unprunable data, and direct incentive for > address reuse), especially when compared to other methods/proposals that > can reduce on-chain footprint, generally improve privacy, and reduce > validation costs for others (e.g. batching, lightning, MAST for > sufficiently large scripts, Schnorr signatures (musig, adaptor signatures), > {tap,graft}root). Regards, Yuval [-- Attachment #2: Type: text/html, Size: 3789 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 22:48 ` Yuval Kogman @ 2019-07-24 19:49 ` Mike Brooks 0 siblings, 0 replies; 13+ messages in thread From: Mike Brooks @ 2019-07-24 19:49 UTC (permalink / raw) To: Yuval Kogman; +Cc: bitcoin-dev [-- Attachment #1: Type: text/plain, Size: 4758 bytes --] Fungibility is a good question for inductive reasoning. After all, what is a claim without some rigor? There exists some set of wallets 'w' which contain a balance of satoshi too small to recover because it doesn't meet the minimum transaction fee required to confirm the transaction. These wallets are un-spendable by their very nature - and quantifying un-spendable wallets is one way to measure the fungibility of Bitcoin. The number of un-spendable wallets can be quantified as follows: Let 'p' equal the current price per bit for a transaction Let 'n' equal the number of bits in the transaction Let '[a]' be the set of all wallets with a balance Let '[w]' be the set of un-spendable wallets [w0] = { w in [a] : balance(w) < p*n } Now, let's introduce a savings measure 'k' which is a constant flat savings per transaction. [w1] = { w in [a] : balance(w) < p*(n - k0) } As the savings 'k' increases, the set '[w]' shrinks in size: len([w0]) >= len([w1]) ... >= len([wk]) In this case 'k0' could be the savings from moving a P2PKH transaction to a 233-byte SegWit transaction, and 'k1' could be the savings from a 148-byte SegWit+PubRef transaction, and the same model can be used for some future transaction type that is even smaller. As the savings 'k' approaches infinity the set of un-spendable wallets approaches zero. If a user is privacy-aware and chooses to use single-use p2sh for all transactions, then they can still gain from PubRef because block-weight should be decreased for everyone. There are cases where a user or merchant might want to reuse their p2sh hash - and PubRef can provide savings. 
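A toy illustration of this threshold model, with made-up balances and an assumed fee rate (the 233- and 148-byte sizes are the ones quoted in this thread):

```python
# Toy model of the un-spendable wallet argument, with made-up numbers.
# A wallet counts as un-spendable when its balance is below the fee p*n
# for a transaction of size n.
p = 2                                  # assumed fee rate (per byte here)
balances = [50, 120, 300, 900, 5000]   # hypothetical balances, in satoshi

def unspendable(balances, p, n):
    return [b for b in balances if b < p * n]

n_segwit = 233   # transaction sizes quoted in the thread
n_pubref = 148

w0 = unspendable(balances, p, n_segwit)   # fee threshold: 466 sat
w1 = unspendable(balances, p, n_pubref)   # fee threshold: 296 sat
assert len(w1) <= len(w0)  # smaller transactions strand fewer wallets
print(len(w0), len(w1))    # 3 2
```

As in the argument above, shrinking the transaction lowers the fee threshold and moves some wallets out of the un-spendable set.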
Additionally, the resulting p2sh script could be a multi-sig transaction which could still benefit from PubRef compression. PubRef improves the fungibility of Bitcoin in two different ways - by reducing the cost of transactions and by increasing the maximum number of transactions that can be confirmed in a block. Which is pretty hot - QED. On Fri, Jul 19, 2019 at 3:48 PM Yuval Kogman <nothingmuch@woobling.org> wrote: > Hi, > > On Fri, 19 Jul 2019 at 17:49, Mike Brooks <m@ib.tc> wrote: > >> Users will adopt whatever the client accepts - this feature would be >> transparent. >> > > My skepticism was based in an assumption on my part that most such data is > produced by actors with a track record of neglecting transaction > efficiency. I'd be happy to be proven wrong, but considering say usage > rates of native witness outputs, or the fraction of transactions with more > than 2 outputs so far I see little evidence in support of widespread > adoption of cost saving measures. Assuming this is intended as a new script > version, all fully validating nodes need to commit to keeping all data > indefinitely before they can enforce the rules that make transactions > benefiting from this opcode safe to broadcast. > > That said, if successful, the main concern is still that of address reuse > - currently there is no incentive in the protocol to do that, and with > BIP32 wallets fewer reasons to do it as well, but this proposal introduces > such an incentive for users which would otherwise generate new addresses > (disregarding the ones that already reuse addresses anyway), and this is > problematic for privacy and fungibility. > > Since address reuse has privacy concerns, I think it's important to draw a > distinction between clients accepting and producing such transactions, if > the latter were done transparently that would be very concerning IMO, and > even the former would not be transparent for users who run currently > pruning nodes. 
> > I'm not sure how an O(1) time complexity leads to DoS, that seems like a >> very creative jump. >> > > For what it's worth, that was in reference to hypothetical deduplication > only at the p2p layer, similar to compact blocks, but on further reflection > I'd like to retract that, as since both scenarios which I had in mind seem > easily mitigated. > > Based on this response, it makes me want to calculate the savings, what >> if it was a staggering reduction in the tree size? >> > > Assuming past address reuse rates are predictive this only places an upper > bound on the potential size savings, so personally I would not find that > very compelling. Even if the realized size savings would be substantial, > I'm not convinced the benefits outweigh the downsides (increased validation > costs, monotonically growing unprunable data, and direct incentive for > address reuse), especially when compared to other methods/proposals that > can reduce on chain footprint generally improve privacy while reducing > validation costs for others (e.g. batching, lightning, MAST for > sufficiently large scripts, Schnorr signatures (musig, adaptor signatures), > {tap,graft}root, ). > > Regards, > Yuval > [-- Attachment #2: Type: text/html, Size: 6471 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks 2019-07-19 15:17 ` William Casarin 2019-07-19 17:45 ` Yuval Kogman @ 2019-07-19 18:07 ` ZmnSCPxj 2019-07-27 20:03 ` Mike Brooks 2 siblings, 1 reply; 13+ messages in thread From: ZmnSCPxj @ 2019-07-19 18:07 UTC (permalink / raw) To: Mike Brooks, Bitcoin Protocol Discussion Good morning Mike, > PubRef is not susceptible to malleability attacks because the blockchain is immutable. This is not quite accurate. While very old blocks are indeed immutable-in-practice, chain tips are not, and are often replaced. At least specify that such data can only be referred to if buried under 100 blocks. -- There are a number of other issues: * It strongly encourages pubkey reuse, reducing privacy. * There is a design-decision wherein a SCRIPT can only access data in the transaction that triggers its execution. In particular, it cannot access data in the block the transaction is in, or in past blocks. For example, `OP_CHECKLOCKTIMEVERIFY` does not check the blockheight of the block that the transaction is confirmed in, but instead checks only `nLockTime`, a field in the transaction. * This lets us run SCRIPT in isolation on a transaction, exactly one time, when the transaction is about to be put into our mempool. When a new block arrives, transactions in our mempool that are in the block do not need to have their SCRIPTs re-executed or re-validated. > In order for a client to make use of the PUBREF operations they’ll need access to a database that look up public-keys and resolve their PUBREF index. A value can be resolved to an index with a hash-table lookup in O(1) constant time. Additionally, all instances of PUSHDATA can be indexed as an ordered list, resolution of a PUBREF index to the intended value would be an O(1) array lookup. 
Although the data needed to build and resolve public references is already included with every full node, additional computational effort is needed to build and maintain these indices - a tradeoff which provides smaller transaction sizes and relieving the need to store repetitive data on the blockchain. This is not only necessary at the creator of the transaction --- it is also necessary at every validator. In particular, consider existing pruning nodes, which cannot refer to previous block data. We would need to have another new database containing every `PUSHDATA` in existence. And existing pruning nodes would need to restart from genesis, as this database would not exist yet. Regards, ZmnSCPxj ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-19 18:07 ` ZmnSCPxj @ 2019-07-27 20:03 ` Mike Brooks 2019-07-29 1:46 ` ZmnSCPxj 0 siblings, 1 reply; 13+ messages in thread From: Mike Brooks @ 2019-07-27 20:03 UTC (permalink / raw) To: ZmnSCPxj, pieter.wuille; +Cc: Bitcoin Protocol Discussion [-- Attachment #1: Type: text/plain, Size: 4765 bytes --] Hey ZmnSCPxj, As to your first point: I wasn't aware there was so much volatility at the tip - and 100 blocks is quite the difference! I agree no one could reference a transaction in newly formed blocks, but I'm curious how this number was chosen. Do you have any documentation or code that you can share related to how re-orgs are handled? Do we have a kind of 'consensus checkpoint' when a re-org is no longer possible? This is a very interesting topic. > * It strongly encourages pubkey reuse, reducing privacy. Privacy-aware users are free to have single-use p2sh transactions, and they are free to use the same SCRIPT opcodes we have now. Adding an extra opcode helps with the utility of SCRIPT by compressing the smallest SegWit transactions by a further ~36%, from 233 bytes to 148 bytes. Cost savings is a great utility - and it need not undermine anyone's privacy. The resulting p2sh SCRIPT could end up using public key material that could be compressed with a PubRef - everyone wins. > * There is a design-decision wherein a SCRIPT can only access data in the transaction that triggers its execution. In order for a compression algorithm like LZ78 to be written in a stack-based language like SCRIPT, there needs to be pointer arithmetic to refer back to the dictionary or a larger application state. If Bitcoin's entire stack were made available to the SCRIPT language as an application state, then LZ78-like compression could be accomplished using PubRef. If a Script can reuse a PUSHDATA, then transactions will be less repetitious... and this isn't free. There is a cost in supporting this opcode. 
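To make the LZ78 analogy concrete, here is the textbook dictionary-building step in a few lines - generic LZ78 over a string, not anything executable by Bitcoin Script:

```python
# Textbook LZ78 dictionary construction, shown only to make the analogy
# concrete: repeated phrases are replaced by (dictionary index, next
# char) pairs, much as PubRef replaces repeated pushes with an index.
def lz78(data: str):
    dictionary = {"": 0}
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending a known phrase
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # flush a leftover phrase; every proper prefix is in the dictionary
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

print(lz78("ABAABA"))  # [(0, 'A'), (0, 'B'), (1, 'A'), (2, 'A')]
```

The blockchain's history of PUSHDATA payloads plays the role of the dictionary here; the PUBREF index plays the role of the back-reference.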
Giving the SCRIPT language access to more data opens the door for interesting algorithms, not just LZ78. This is interesting to discuss how this application state could be brought to the language. It strikes me that your concerns(ZmnSCPxj), as well as the topic of pruning brought up by others (including Pieter Wuille) could be fixed by the creation of a side-chain of indexes. A validator would not need a hash table which is only needed for O(1) PUBREF creation, these nodes need not be burdened with this added index. A validator only needs an array of PUSHDATA elements and can then validate any given SCRIPT at O(1). Just a thought. Best Regards, Mike On Fri, Jul 19, 2019 at 11:08 AM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote: > Good morning Mike, > > > PubRef is not susceptible to malleability attacks because the blockchain > is immutable. > > This is not quite accurate. > While very old blocks are indeed immutable-in-practice, chain tips are > not, and are often replaced. > At least specify that such data can only be referred to if buried under > 100 blocks. > > -- > > There are a number of other issues: > > * It strongly encourages pubkey reuse, reducing privacy. > * There is a design-decision wherein a SCRIPT can only access data in the > transaction that triggers its execution. > In particular, it cannot access data in the block the transaction is in, > or in past blocks. > For example, `OP_CHECKLOCKTIMEVERIFY` does not check the blockheight of > the block that the transaction is confirmed in, but instead checks only > `nLockTime`, a field in the transaction. > * This lets us run SCRIPT in isolation on a transaction, exactly one > time, when the transaction is about to be put into our mempool. > When a new block arrives, transactions in our mempool that are in the > block do not need to have their SCRIPTs re-executed or re-validated. 
> > > In order for a client to make use of the PUBREF operations they’ll need > access to a database that look up public-keys and resolve their PUBREF > index. A value can be resolved to an index with a hash-table lookup in > O(1) constant time. Additionally, all instances of PUSHDATA can be indexed > as an ordered list, resolution of a PUBREF index to the intended value > would be an O(1) array lookup. Although the data needed to build and > resolve public references is already included with every full node, > additional computational effort is needed to build and maintain these > indices - a tradeoff which provides smaller transaction sizes and relieving > the need to store repetitive data on the blockchain. > > This is not only necessary at the creator of the transaction --- it is > also necessary at every validator. > > In particular, consider existing pruning nodes, which cannot refer to > previous block data. > > We would need to have another new database containing every `PUSHDATA` in > existence. > And existing pruning nodes would need to restart from genesis, as this > database would not exist yet. > > Regards, > ZmnSCPxj > [-- Attachment #2: Type: text/html, Size: 5264 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-27 20:03 ` Mike Brooks @ 2019-07-29 1:46 ` ZmnSCPxj 2019-07-29 2:19 ` Mike Brooks 0 siblings, 1 reply; 13+ messages in thread From: ZmnSCPxj @ 2019-07-29 1:46 UTC (permalink / raw) To: Mike Brooks; +Cc: Bitcoin Protocol Discussion, pieter.wuille Good morning Mike, Sent with ProtonMail Secure Email. ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Sunday, July 28, 2019 4:03 AM, Mike Brooks <m@ib.tc> wrote: > Hey ZmnSCPxj, > > As to your first point. I wasn't aware there was so much volatility at the tip, also 100 blocks is quite the difference! I agree no one could references a transaction in a newly formed blocks, but I'm curious how this number was chosen. Do you have any documentation or code that you can share related to how re-orgs are handled? Do we have a kind of 'consensus checkpoint' when a re-org is no longer possible? This is a very interesting topic. > Miner coinbases need 100 blocks for maturity, which is the basis of my suggestion to use 100 blocks. It might be too high, but I doubt there will be good reason to be less than 100. There is a potential for a targeted attack where a large payout going to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction finds that recently-confirmed transaction is replaced with one that pays to a different public key, via a history-rewrite attack. Such an attack is doable by miners, and if we consider that we accept 100 blocks for miner coinbase maturity as "acceptably low risk" against miner shenanigans, then we might consider that 100 blocks might be acceptable for this also. Whether 100 is too high or not largely depends on your risk appetite. > A validator only needs an array of PUSHDATA elements and can then validate any given SCRIPT at O(1). Data derived from > 220Gb of perpetually-growing blockchain is hardly, to my mind, "only needs an array". 
Such an array would not fit in memory for many devices that today are practical for running fullnodes. It is keeping that array and indexing it which is the problem, i.e. the devil is in the details. Reiterating also, current pruned nodes did not retain that data and would be forced to re-download the entire blockchain. Unless you propose that we can refer only to `OP_PUSHDATA` payloads that appear after activation of `OP_PUBREF`. Regards, ZmnSCPxj ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-29 1:46 ` ZmnSCPxj @ 2019-07-29 2:19 ` Mike Brooks 2019-07-29 2:49 ` ZmnSCPxj 0 siblings, 1 reply; 13+ messages in thread From: Mike Brooks @ 2019-07-29 2:19 UTC (permalink / raw) To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion, pieter.wuille [-- Attachment #1: Type: text/plain, Size: 4072 bytes --] ZmnSCPxj, I think that this implication affects other applications built on the blockchain, not just the PubRef proposal: > There is a potential for a targeted attack where a large payout going to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction finds that recently-confirmed transaction is replaced with one that pays to a different public key, via a history-rewrite attack. > Such an attack is doable by miners, and if we consider that we accept 100 blocks for miner coinbase maturity as "acceptably low risk" against miner shenanigans, then we might consider that 100 blocks might be acceptable for this also. > Whether 100 is too high or not largely depends on your risk appetite. I agree 100% this attack is unexpected and very interesting. However, I find the arbitrary '100' to be unsatisfying - I'll have to do some more digging. It would be interesting to trigger this on the testnet to see what happens. Do you know if anyone has pushed these limits? I am so taken by this attack I might attempt it. > Data derived from > 220Gb of perpetually-growing blockchain is hardly, to my mind, "only needs an array". There are other open source projects that have to deal with larger data sets and have accounted for the real-world limits on computability. Apache HTTPD's Bucket-Brigade comes to mind, which has been well tested and can account for limited RAM when accessing linear data structures. For a more general-purpose utility, leveldb (BSD-licensed) provides random access to arbitrary data collections. Pruning can also be a real asset for PubRef. 
If all transactions for a wallet have been pruned, then there is no need to index this PubRef - a validator can safely skip over it. Best Regards, Mike On Sun, Jul 28, 2019 at 6:46 PM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote: > Good morning Mike, > > > Sent with ProtonMail Secure Email. > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > On Sunday, July 28, 2019 4:03 AM, Mike Brooks <m@ib.tc> wrote: > > > Hey ZmnSCPxj, > > > > As to your first point. I wasn't aware there was so much volatility at > the tip, also 100 blocks is quite the difference! I agree no one could > references a transaction in a newly formed blocks, but I'm curious how this > number was chosen. Do you have any documentation or code that you can share > related to how re-orgs are handled? Do we have a kind of 'consensus > checkpoint' when a re-org is no longer possible? This is a very interesting > topic. > > > > Miner coinbases need 100 blocks for maturity, which is the basis of my > suggestion to use 100 blocks. > It might be too high, but I doubt there will be good reason to be less > than 100. > > There is a potential for a targeted attack where a large payout going to a > `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction > finds that recently-confirmed transaction is replaced with one that pays to > a different public key, via a history-rewrite attack. > Such an attack is doable by miners, and if we consider that we accept 100 > blocks for miner coinbase maturity as "acceptably low risk" against miner > shenanigans, then we might consider that 100 blocks might be acceptable for > this also. > Whether 100 is too high or not largely depends on your risk appetite. > > > A validator only needs an array of PUSHDATA elements and can then > validate any given SCRIPT at O(1). > > Data derived from > 220Gb of perpetually-growing blockchain is hardly, to > my mind, "only needs an array". 
> Such an array would not fit in memory for many devices that today are > practical for running fullnodes. > It is keeping that array and indexing it which is the problem, i.e. the > devil in the details. > > Reiterating also, current pruned nodes did not retain that data and would > be forced to re-download the entire blockchain. > Unless you propose that we can refer only to `OP_PUSHDATA` after > activation of `OP_PUBREF`. > > Regards, > ZmnSCPxj
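The "only needs an array" claim and ZmnSCPxj's objection about keeping and indexing it can be made concrete with a minimal sketch. All names here are hypothetical; a production validator would need a disk-backed store (such as the leveldb mentioned above), since the ordered list of PUSHDATA invocations grows with the blockchain itself.

```python
# Minimal sketch of the global, ordered PUSHDATA list that PubRef assumes.
# In-memory only; a real node would need disk-backed storage.

class PubRefIndex:
    def __init__(self):
        self._items = []  # every PUSHDATA payload, in confirmation order

    def record_pushdata(self, payload: bytes) -> int:
        """Append a payload during block validation; return its offset."""
        self._items.append(payload)
        return len(self._items) - 1

    def resolve(self, offset: int) -> bytes:
        """OP_PUBREF4 semantics: look up a previously confirmed payload."""
        if not 0 <= offset < len(self._items):
            raise ValueError("PUBREF to an unknown PUSHDATA offset")
        return self._items[offset]
```

With an index like this, a script element such as PUBREF[43123] would resolve via `idx.resolve(43123)` to whatever payload was pushed at that offset - O(1) lookup, at the cost of maintaining the ever-growing list.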
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-29 2:19 ` Mike Brooks @ 2019-07-29 2:49 ` ZmnSCPxj 2019-07-29 3:07 ` Mike Brooks 0 siblings, 1 reply; 13+ messages in thread From: ZmnSCPxj @ 2019-07-29 2:49 UTC (permalink / raw) To: Mike Brooks; +Cc: Bitcoin Protocol Discussion, pieter.wuille Good morning Mike, > I think that this implication affects other applications built on the blockchain, not just the PubRef proposal: > I believe not? Current applications use txids to refer to previous transactions, so even a short-ranged history rewrite will mostly not affect them --- they can just rebroadcast the transactions they are spending and get those reconfirmed again. There is admittedly a risk of double-spending, but each individual application can just spend deeply-confirmed transactions, and tune what it considers "deeply-confirmed" depending on how large the value being spent is. The point is that history rewrites are costly, but if the value being put in a `scriptPubKey` that uses `OP_PUBREF` is large enough, it may justify the cost of history rewrites --- but if the value is small, the individual application (which refers to transactions by their txid anyway) can generally assume miners will not bother to history-rewrite. Since `OP_PUBREF` would be a consensus rule, we need to select a "deeply-confirmed" point that is deep enough for *all* cases, unlike applications **on top of the blockchain** which can tune their rule of "deeply-confirmed" based on value. Thus my suggestion to use 100, which we consider "deep enough" to risk allowing miners to sell their coins. Lightning uses a "short channel ID" which is basically an index of block number + index of transaction + index of output to refer to channels. This is not a problem, however, even in case of short-ranged history rewrites. The short channel ID is only used for public routing. 
Between the channel counterparties, no security is based on short channel ID being stable; it just loses you potential routing fees from the channel (and can be fixed by increasing your "deeply-confirmed" enough level before you announce the channel for public routing). > > There is a potential for a targeted attack where a large payout going to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction finds that recently-confirmed transaction is replaced with one that pays to a different public key, via a history-rewrite attack. > > Such an attack is doable by miners, and if we consider that we accept 100 blocks for miner coinbase maturity as "acceptably low risk" against miner shenanigans, then we might consider that 100 blocks might be acceptable for this also. > > Whether 100 is too high or not largely depends on your risk appetite. > > I agree 100% this attack is unexpected and very interesting. It is precisely because of this possibility that we tend to avoid making SCRIPT validity dependent on anything that is not in the transaction. We would have to re-evaluate the SCRIPT every time there is a chain tip reorganization (increasing validation CPU load), unless we do something like "only allow `OP_PUBREF` to data that is more than 100 blocks confirmed". > However, I find the arbitrary '100' to be unsatisfying - I'll have to do some more digging. It would be interesting to trigger this on the testnet to see what happens. Do you know if anyone has pushed these limits? I am so taken by this attack I might attempt it. > > > Data derived from > 220Gb of perpetually-growing blockchain is hardly, to my mind, "only needs an array". > > There are other open source projects that have to deal with larger data sets and have accounted for the real-world limits on computability. Apache HTTPD's Bucket-Brigade comes to mind, which has been well tested and can account for limited RAM when accessing linear data structures. 
> For a more general purpose utility leveldb (bsd-license) provides random access to arbitrary data collections. Which is the point: we need to use something, the details need to be considered during implementation, implementation details may leak in the effective spec (e.g. DER-encoding), etc. > Pruning can also be a real asset for PubRef. If all transactions for a wallet have been pruned, then there is no need to index this PubRef - a validator can safely skip over it. What? The problem with transactions being pruned is that the data in them might now be used in a *future* `OP_PUBREF`. Further, pruned nodes are still full validators --- transactions may be pruned, but the pruned node will ***still*** validate any `OP_PUBREF` it uses, because it is still a full validator, it just does not archive old blocks in local storage. Regards, ZmnSCPxj
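The rule ZmnSCPxj proposes above - "only allow `OP_PUBREF` to data that is more than 100 blocks confirmed" - reduces to a simple depth check at validation time. The constant and function names below are illustrative, not from any implementation:

```python
# Sketch of the "deeply-confirmed" rule: an OP_PUBREF target is usable
# only once it is buried at least as deep as coinbase maturity.

COINBASE_MATURITY = 100  # the depth suggested by analogy to coinbases

def pubref_target_is_final(target_height: int, tip_height: int,
                           min_depth: int = COINBASE_MATURITY) -> bool:
    """True when the referenced PUSHDATA is deep enough that a chain-tip
    reorganization can no longer invalidate the referring script."""
    confirmations = tip_height - target_height + 1
    return confirmations > min_depth
```

This is also why the rule helps CPU load: scripts whose references pass this check never need re-evaluation on a reorg shallower than `min_depth`.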
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-29 2:49 ` ZmnSCPxj @ 2019-07-29 3:07 ` Mike Brooks 2019-07-29 3:39 ` ZmnSCPxj 0 siblings, 1 reply; 13+ messages in thread From: Mike Brooks @ 2019-07-29 3:07 UTC (permalink / raw) To: ZmnSCPxj, bitcoin-dev ZmnSCPxj, > Lightning uses a "short channel ID" which is basically an index of block number + index of transaction + index of output to refer to channels. Oh wow, this is very similar to the PUBREF proposal. In fact, the OP_PUBREF4 operation could be modified to take the tuple: (block number, index of transaction, index of PUSHDATA) and it would be functionally equivalent. It looks like the construction of the short channel ID was chosen for the performance needed to resolve the lookup. > The problem with transactions being pruned is that the data in them might now be used in a *future* `OP_PUBREF`. I can see how pruning is needed for scalability, and pruning can be made compatible with a reference to a transaction. If a transaction is pruned, then the key material used in a pruned block's PUSHDATA operations is of no value. A user of the network shouldn't need to make this kind of PUBREF, and if a user did want to bring a wallet back from the dead - then the utility of PUBREF wouldn't be available to them. Best Regards, Mike On Sun, Jul 28, 2019 at 7:49 PM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote: > Good morning Mike, > > > I think that this implication affects other applications built on the > blockchain, not just the PubRef proposal: > > > > I believe not? > Current applications use txids to refer to previous transactions, so even > a short-ranged history rewrite will mostly not affect them --- they can > just rebroadcast the transactions they are spending and get those > reconfirmed again. 
> There is admittedly a risk of double-spending, but each individual > application can just spend deeply-confirmed transactions, and tune what it > considers "deeply-confirmed" depending on how large the value being spent > is. > The point is that history rewrites are costly, but if the value being put > in a `scriptPubKey` that uses `OP_PUBREF` is large enough, it may justify > the cost of history rewrites --- but if the value is small, the individual > application (which refers to transactions by their txid anyway) can > generally assume miners will not bother to history-rewrite. > > Since `OP_PUBREF` would be a consensus rule, we need to select a > "deeply-confirmed" point that is deep enough for *all* cases, unlike > applications **on top of the blockchain** which can tune their rule of > "deeply-confirmed" based on value. > Thus my suggestion to use 100, which we consider "deep enough" to risk > allowing miners to sell their coins. > > Lightning uses a "short channel ID" which is basically an index of block > number + index of transaction + index of output to refer to channels. > This is not a problem, however, even in case of short-ranged history > rewrites. > The short channel ID is only used for public routing. > Between the channel counterparties, no security is based on short channel > ID being stable; it just loses you potential routing fees from the channel > (and can be fixed by increasing your "deeply-confirmed" enough level before > you announce the channel for public routing). > > > > There is a potential for a targeted attack where a large payout going > to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed > transaction finds that recently-confirmed transaction is replaced with one > that pays to a different public key, via a history-rewrite attack. 
> > > Such an attack is doable by miners, and if we consider that we accept > 100 blocks for miner coinbase maturity as "acceptably low risk" against > miner shenanigans, then we might consider that 100 blocks might be > acceptable for this also. > > > Whether 100 is too high or not largely depends on your risk appetite. > > > > I agree 100% this attack is unexpected and very interesting. > > It is precisely because of this possibility that we tend to avoid making > SCRIPT validity dependent on anything that is not in the transaction. > We would have to re-evaluate the SCRIPT every time there is a chain tip > reorganization (increasing validation CPU load), unless we do something > like "only allow `OP_PUBREF` to data that is more than 100 blocks > confirmed". > > > However, I find the arbitrary '100' to be unsatisfying - I'll have to > do some more digging. It would be interesting to trigger this on the > testnet to see what happens. Do you know if anyone has pushed these > limits? I am so taken by this attack I might attempt it. > > > > > Data derived from > 220Gb of perpetually-growing blockchain is > hardly, to my mind, "only needs an array". > > > > There are other open source projects that have to deal with larger data > sets and have accounted for the real-world limits on computability. Apache > HTTPD's Bucket-Brigade comes to mind, which has been well tested and can > account for limited RAM when accessing linear data structures. For a more > general purpose utility leveldb (bsd-license) provides random access to > arbitrary data collections. > > Which is the point: we need to use something, the details need to be > considered during implementation, implementation details may leak in the > effective spec (e.g. DER-encoding), etc. > > > Pruning can also be a real asset for PubRef. If all transactions for a > wallet have been pruned, then there is no need to index this PubRef - a > validator can safely skip over it. > > What? 
> The problem with transactions being pruned is that the data in them might > now be used in a *future* `OP_PUBREF`. > > Further, pruned nodes are still full validators --- transactions may be > pruned, but the pruned node will ***still*** validate any `OP_PUBREF` it > uses, because it is still a full validator, it just does not archive old > blocks in local storage. > > Regards, > ZmnSCPxj
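The short channel ID discussed in this exchange packs the (block height, transaction index, output index) tuple into a single 8-byte value, using 3 + 3 + 2 bytes per the Lightning convention; a PUBREF tuple of (block, transaction index, PUSHDATA index) could be packed the same way. A sketch of that encoding (function names are illustrative):

```python
# Pack/unpack a Lightning-style short channel ID:
# 3-byte block height | 3-byte tx index | 2-byte output index.

def encode_scid(block: int, tx_index: int, output: int) -> int:
    assert block < 2**24 and tx_index < 2**24 and output < 2**16
    return (block << 40) | (tx_index << 16) | output

def decode_scid(scid: int) -> tuple:
    return scid >> 40, (scid >> 16) & 0xFFFFFF, scid & 0xFFFF
```

The fixed-width layout is what makes the lookup fast: the reference is a direct coordinate into block storage rather than a hash that must be searched for.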
* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References 2019-07-29 3:07 ` Mike Brooks @ 2019-07-29 3:39 ` ZmnSCPxj 0 siblings, 0 replies; 13+ messages in thread From: ZmnSCPxj @ 2019-07-29 3:39 UTC (permalink / raw) To: Mike Brooks; +Cc: bitcoin-dev Good morning Mike, > > The problem with transactions being pruned is that the data in them might now be used in a *future* `OP_PUBREF`. > > I can see how pruning is needed for scalability, and pruning can be made compatible with a reference to a transaction. If a transaction is pruned, then the key material used in a pruned block's PUSHDATA operations is of no value. A user of the network shouldn't need to make this kind of PUBREF, and if a user did want to bring a wallet back from the dead - then the utility of PUBREF wouldn't be available to them. I believe you misunderstand the point completely. Currently Bitcoin Core has a mode, called pruning, where: 1. It validates ***all*** transactions. 2. It throws away the transaction data of a block ***after*** validation. 3. It keeps only the UTXO set of ***all*** addresses, not just a specific wallet. Given the above, if a transaction in block 1000 `OP_PUBREF`s to an `OP_PUSHDATA` in block 100, how does the pruned validator get access to this data (which was pruned after the pruned validator handled block 100)? Note that pruned nodes ***are*** fullnodes and validate all transactions, not just those that involve their wallet. Regards, ZmnSCPxj
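The failure mode described in this message can be illustrated with a toy model: the pruned node validates every block, then discards the block's data, so a later PUBREF into a pruned block has nothing to resolve against. This is a sketch only; the class and method names are hypothetical.

```python
# Toy model of a pruned full node: validate everything, keep nothing
# but the UTXO-era state, and fail on a PUBREF into discarded data.

class PrunedNode:
    def __init__(self):
        self._pushdata_by_height = {}  # height -> PUSHDATA payloads

    def validate_block(self, height: int, pushdatas: list) -> None:
        # Step 1: validate all transactions (data is still on hand here).
        self._pushdata_by_height[height] = list(pushdatas)

    def prune(self, height: int) -> None:
        # Step 2: throw away the block's transaction data after validation.
        self._pushdata_by_height.pop(height, None)

    def resolve_pubref(self, height: int, index: int) -> bytes:
        # A PUBREF in a later block needs the old payload -- but it is gone.
        if height not in self._pushdata_by_height:
            raise LookupError("block %d was pruned; PUBREF unresolvable" % height)
        return self._pushdata_by_height[height][index]
```

Running the scenario from the message - validate block 100, prune it, then try to resolve a reference from block 1000 - raises the lookup failure, which is exactly why pruned validators would either need to retain all PUSHDATA payloads or re-download pruned blocks.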
Thread overview: 13+ messages
2019-07-19  6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks
2019-07-19 15:17 ` William Casarin
2019-07-19 19:17 ` Pieter Wuille
2019-07-19 17:45 ` Yuval Kogman
[not found] ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com>
2019-07-19 22:48 ` Yuval Kogman
2019-07-24 19:49 ` Mike Brooks
2019-07-19 18:07 ` ZmnSCPxj
2019-07-27 20:03 ` Mike Brooks
2019-07-29  1:46 ` ZmnSCPxj
2019-07-29  2:19 ` Mike Brooks
2019-07-29  2:49 ` ZmnSCPxj
2019-07-29  3:07 ` Mike Brooks
2019-07-29  3:39 ` ZmnSCPxj