* [Bitcoin-development] NODE_BLOOM service bit @ 2014-06-06 8:19 Peter Todd 2014-06-06 8:48 ` Adam Back 0 siblings, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-06 8:19 UTC (permalink / raw) To: bitcoin-development, Gregory Maxwell [-- Attachment #1: Type: text/plain, Size: 1740 bytes --] BIP: https://github.com/petertodd/bips/blob/bip-node-bloom/bip-node-bloom.mediawiki Pull-req: https://github.com/bitcoin/bitcoin/pull/2900 Pretty simple really: service bit NODE_BLOOM is defined to allow nodes to advertise to their peers that they support bloom filters. The network protocol version number is also bumped. Recommended behavior for nodes that do not support bloom is to simply disconnect peers who send a filter* message anyway to let them quickly find another peer. Rational: Bloom filters are not always supported or appropriate. For instance no node implementation other than Bitcoin Core supports them, e.g btcd and obelisk. (which ironically implement this BIP already, modulo the version number bump) In the long term bloom filters will be obsoleted eventually as they have poor scaling properties - problematic with blocksize increases - and are incompatible with UTXO/TXO commitments, which will use prefix-based filtering instead. (already being implemented in electrum and obelisk) In the short term bloom filters have high IO loads, which have lead to DoS attacks, and are not an optimal use of resources for nodes which are IO constrained rather than bandwidth constrained. (common in VPS setups which could better help the network by serving full blocks) Adding NODE_BLOOM as a service bit now will save us some hassle later down the road, reflects what actual implementations do anyway, has been already deployed on Litecoin, Dogecoin, and a zillion other alts with no issues, (including SPV client support) and is a trivial patch. Gregory Maxwell: Please assign a BIP # -- 'peter'[:-1]@petertodd.org 000000000000000049066dab2483c9e069656239f5782da204bd4995bd92c19f [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] NODE_BLOOM service bit 2014-06-06 8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd @ 2014-06-06 8:48 ` Adam Back 2014-06-06 9:03 ` Gregory Maxwell 2014-06-06 9:04 ` Peter Todd 0 siblings, 2 replies; 20+ messages in thread From: Adam Back @ 2014-06-06 8:48 UTC (permalink / raw) To: Peter Todd; +Cc: bitcoin-development Advertising NODE BLOOM as a service sounds good. But the critique of bloom filters, I am not so sure prefix filters are better. Prefix filters offer questionable privacy tradeoffs in my opinion. Same problem as with stealth address proposed use of prefixes. All for scalability, efficiency and decentralization but not ideally at the expense of nuking privacy. The effects on privacy are cumulative, and affect everyone not just the user. Same pattern of local decision, global effect as with reused addresses. Adam On Fri, Jun 06, 2014 at 04:19:33AM -0400, Peter Todd wrote: >In the short term bloom filters have high IO loads, which have lead to >DoS attacks, and are not an optimal use of resources for nodes which are >IO constrained rather than bandwidth constrained. (common in VPS setups >which could better help the network by serving full blocks) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] NODE_BLOOM service bit 2014-06-06 8:48 ` Adam Back @ 2014-06-06 9:03 ` Gregory Maxwell 2014-06-06 9:11 ` Peter Todd 2014-06-06 9:04 ` Peter Todd 1 sibling, 1 reply; 20+ messages in thread From: Gregory Maxwell @ 2014-06-06 9:03 UTC (permalink / raw) To: Adam Back; +Cc: Bitcoin Development On Fri, Jun 6, 2014 at 1:48 AM, Adam Back <adam@cypherspace.org> wrote: > Advertising NODE BLOOM as a service sounds good. > > But the critique of bloom filters, I am not so sure prefix filters are > better. Prefix filters offer questionable privacy tradeoffs in my opinion. > Same problem as with stealth address proposed use of prefixes. > > All for scalability, efficiency and decentralization but not ideally at the > expense of nuking privacy. The effects on privacy are cumulative, and > affect everyone not just the user. Same pattern of local decision, global > effect as with reused addresses. The performance Bytecoin/Monero/Fantom/etc. systems that use ECDH addresses for all transactions seem to be suggesting that the prefixes aren't really needed. At least with current system rules doing the ECDH for each transaction seems pretty reasonable. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] NODE_BLOOM service bit 2014-06-06 9:03 ` Gregory Maxwell @ 2014-06-06 9:11 ` Peter Todd 0 siblings, 0 replies; 20+ messages in thread From: Peter Todd @ 2014-06-06 9:11 UTC (permalink / raw) To: Gregory Maxwell; +Cc: Bitcoin Development [-- Attachment #1: Type: text/plain, Size: 1449 bytes --] On Fri, Jun 06, 2014 at 02:03:29AM -0700, Gregory Maxwell wrote: > On Fri, Jun 6, 2014 at 1:48 AM, Adam Back <adam@cypherspace.org> wrote: > > Advertising NODE BLOOM as a service sounds good. > > > > But the critique of bloom filters, I am not so sure prefix filters are > > better. Prefix filters offer questionable privacy tradeoffs in my opinion. > > Same problem as with stealth address proposed use of prefixes. > > > > All for scalability, efficiency and decentralization but not ideally at the > > expense of nuking privacy. The effects on privacy are cumulative, and > > affect everyone not just the user. Same pattern of local decision, global > > effect as with reused addresses. > > The performance Bytecoin/Monero/Fantom/etc. systems that use ECDH > addresses for all transactions seem to be suggesting that the prefixes > aren't really needed. > > At least with current system rules doing the ECDH for each transaction > seems pretty reasonable. Yup. Obelisk's indexing is sufficiently fast that they hadn't even bothered making Dark Wallet store transaction information between sessions until recently. Prefix brute-forcing isn't yet implemented, although prefix filters is being implemented for lookups in general. (at the very least it gives the server operators some valuable plausible deniability) -- 'peter'[:-1]@petertodd.org 00000000000000003a68ee16d702ca5dd5547fb4aead910a004747cb06241dd6 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] NODE_BLOOM service bit 2014-06-06 8:48 ` Adam Back 2014-06-06 9:03 ` Gregory Maxwell @ 2014-06-06 9:04 ` Peter Todd 2014-06-06 10:45 ` Adam Back 1 sibling, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-06 9:04 UTC (permalink / raw) To: Adam Back; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 1707 bytes --] On Fri, Jun 06, 2014 at 10:48:52AM +0200, Adam Back wrote: > Advertising NODE BLOOM as a service sounds good. > > But the critique of bloom filters, I am not so sure prefix filters are > better. Prefix filters offer questionable privacy tradeoffs in my > opinion. Same problem as with stealth address proposed use of > prefixes. That's assuming you're doing the proposed prefix brute forcing - if you don't do that they have privacy equal or better than bloom filters, but with better scalability. In particular that better scalability lets you efficiently query multiple servers for blockchain data, only giving up info on a subset of the addresses in your wallet to each server. This can be a significant improvement to bloom filters if your attacker is running logging nodes to try to, say, deanonymize CoinJoin transactions. > All for scalability, efficiency and decentralization but not ideally at the > expense of nuking privacy. The effects on privacy are cumulative, and > affect everyone not just the user. Same pattern of local decision, global > effect as with reused addresses. Indeed. But again, remember that the existing systems suck too; prefix-brute forcing is a engineering tradeoff implementable with existing and well understood technology. Now if you want to come up with something better and write code, please do! I'm sure the math exists; what doesn't exist is robust and well tested code in multiple languages. Stealth addresses at least have been designed so that future blockchain filter upgrades can be added in a backwards compatible way. -- 'peter'[:-1]@petertodd.org 00000000000000003a68ee16d702ca5dd5547fb4aead910a004747cb06241dd6 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] NODE_BLOOM service bit 2014-06-06 9:04 ` Peter Todd @ 2014-06-06 10:45 ` Adam Back 2014-06-06 16:46 ` [Bitcoin-development] Bloom bait Peter Todd 0 siblings, 1 reply; 20+ messages in thread From: Adam Back @ 2014-06-06 10:45 UTC (permalink / raw) To: Peter Todd; +Cc: bitcoin-development On Fri, Jun 06, 2014 at 05:04:41AM -0400, Peter Todd wrote: >On Fri, Jun 06, 2014 at 10:48:52AM +0200, Adam Back wrote: >> Prefix filters offer questionable privacy tradeoffs in my opinion. Same >> problem as with stealth address proposed use of prefixes. > >That's assuming you're doing the proposed prefix brute forcing As I recall prefix brute forcing was a bit twiddle saving, ie searching for EDH key that has the users public prefix. That does not improve privacy over an explicit prefix, it saves a byte or so at the expense of average 128 EDH exchanges to send and provides also some probably relatively ineffective plausible deniability by enabling the transaction to be indistinguishable from some other (not very common) types of transaction. >don't do that they have privacy equal or better than bloom filters, but >with better scalability. either its full node only where prefixes are not used, which is less scalable than bloom; or prefixes are used explicitly or implicitly (brute-force) and either way privacy is weakened by the extra correlation hook provided by elimination from the network graph of payments with mismatched prefixes. >In particular that better scalability lets you efficiently query multiple >servers for blockchain data, only giving up info on a subset of the >addresses in your wallet to each server. This can be a significant >improvement to bloom filters if your attacker is running logging nodes to >try to, say, deanonymize CoinJoin transactions. We need to consider the two types of privacy involved. Privacy from the full node an SPV client is relying on to find its payments, vs privacy from analysis of the public transaction graph. The latter is more damaging. Better to design for privacy against future analysis of public info, than privacy by argument to select non-hostile nodes. Tor has changed recently to account for the fact that chosing fresh random nodes is actually worse. ie you have a probability of identity/address identification per route/node, and repeatedly selecting routes/nodes just cumulatively increases the chance you'll be identified. Better to pick a random node, identify it and stick to it and hope you chose well. >Now if you want to come up with something better and write code, please >do! I'm sure the math exists; what doesn't exist is robust and well >tested code in multiple languages. Maybe other simpler, but yes there was the proof of concept that the math exists in the form of the weil pairing. https://bitcointalk.org/index.php?topic=431756.new But what problem are we trying to solve here? Is it an immediate problem? Maybe better to figure out a more privacy compatible solution which will take longer, than let coding drive protocol. Adam ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 10:45 ` Adam Back @ 2014-06-06 16:46 ` Peter Todd 2014-06-06 16:58 ` Gregory Maxwell 0 siblings, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-06 16:46 UTC (permalink / raw) To: Adam Back; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 4080 bytes --] On Fri, Jun 06, 2014 at 12:45:43PM +0200, Adam Back wrote: (changed subject line as this discussion has nothing to do with NODE_BLOOM) > As I recall prefix brute forcing was a bit twiddle saving, ie searching for > EDH key that has the users public prefix. That does not improve privacy > over an explicit prefix, it saves a byte or so at the expense of average 128 > EDH exchanges to send and provides also some probably relatively ineffective > plausible deniability by enabling the transaction to be indistinguishable > from some other (not very common) types of transaction. I think you should re-read my original proposal; there's a whole host of misunderstandings above, for instance I have no idea where you got the idea that it has anything to do with "saving a byte" came from, or where the number 128 came from. > >don't do that they have privacy equal or better than bloom filters, but > >with better scalability. > > either its full node only where prefixes are not used, which is less > scalable than bloom; or prefixes are used explicitly or implicitly > (brute-force) and either way privacy is weakened by the extra correlation > hook provided by elimination from the network graph of payments with > mismatched prefixes. Again, you have a misunderstanding. Both bloom filters and prefix filters are just ways of giving a peer a probabalistic filter to match transactions against. Where they differ is that bloom filters has O(n) scaling, where n is the size of a block, and prefix filters have O(log n) scaling with slightly(1) higher k. Again, if you *don't* use brute forcing in conjunction with prefixes they have no different transactional graph privacy than bloom filters, but the better scalability lets you do things like split your queries across multiple peers that give you better protection against hostile nodes. Additionally prefix filters are compatible with future miner committed indexes to make the data authenticated. 1) see Amir's experience implementing prefix lookup in Obelisk > We need to consider the two types of privacy involved. Privacy from the > full node an SPV client is relying on to find its payments, vs privacy from > analysis of the public transaction graph. Agreed > The latter is more damaging. Maybe! If adversaries are operating a significant fraction of the peers you are connecting to the current design of bloom filters + HD wallets results in situations where those adversaries have better transactional graph information than the alternative. > Better to design for privacy against future analysis of > public info, than > privacy by argument to select non-hostile nodes. Tor has changed recently > to account for the fact that chosing fresh random nodes is actually > worse. ie you have a probability of identity/address identification > per route/node, > and repeatedly selecting routes/nodes just cumulatively increases the chance > you'll be identified. Better to pick a random node, identify it and stick > to it and hope you chose well. That's basically what Electrum and Obelisk already do - by default you connect to a relatively small set of blockchain data servers operated by well known people and use the same server repeatedly. Applying that to the P2P network however is tricky as there is a huge amount of churn in the nodes: #bitcoin-dev/14/06/14-06-06.log:11:18 < hearn> bitcoinj can't use [service bits] as it relies on DNS seeds and that is unlikely to change any time soon due to the general churn rate in the network making it hard to bootstrap quickly using just remembered sets of IPs. > Maybe other simpler, but yes there was the proof of concept that the math > exists in the form of the weil pairing. > > https://bitcointalk.org/index.php?topic=431756.new I know, where can I find running code? Remember that a bug can easily lose thousands of dollars worth of Bitcoins. -- 'peter'[:-1]@petertodd.org 00000000000000001d2af1653c415b7801ce4c9b18ac7e87bef597e652b203e6 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 16:46 ` [Bitcoin-development] Bloom bait Peter Todd @ 2014-06-06 16:58 ` Gregory Maxwell 2014-06-06 17:05 ` Peter Todd 0 siblings, 1 reply; 20+ messages in thread From: Gregory Maxwell @ 2014-06-06 16:58 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Development On Fri, Jun 6, 2014 at 9:46 AM, Peter Todd <pete@petertodd.org> wrote: > transactions against. Where they differ is that bloom filters has O(n) > scaling, where n is the size of a block, and prefix filters have O(log n) > scaling with slightly(1) higher k. Again, if you *don't* use brute forcing > in conjunction with prefixes they have no different transactional graph > privacy than bloom filters, Huh? How are you thinking that something that gets put in transactions and burned forever into the blockchain that lets you (statically) link txout ownership is "no different" from something which is shared directly with a couple peers, potentially peers you trust and which are run by yourself or your organization? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 16:58 ` Gregory Maxwell @ 2014-06-06 17:05 ` Peter Todd 2014-06-06 17:10 ` Gregory Maxwell 0 siblings, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-06 17:05 UTC (permalink / raw) To: Gregory Maxwell; +Cc: Bitcoin Development [-- Attachment #1: Type: text/plain, Size: 1230 bytes --] On Fri, Jun 06, 2014 at 09:58:19AM -0700, Gregory Maxwell wrote: > On Fri, Jun 6, 2014 at 9:46 AM, Peter Todd <pete@petertodd.org> wrote: > > transactions against. Where they differ is that bloom filters has O(n) > > scaling, where n is the size of a block, and prefix filters have O(log n) > > scaling with slightly(1) higher k. Again, if you *don't* use brute forcing > > in conjunction with prefixes they have no different transactional graph > > privacy than bloom filters, > > Huh? How are you thinking that something that gets put in transactions > and burned forever into the blockchain that lets you (statically) link > txout ownership is "no different" from something which is shared > directly with a couple peers, potentially peers you trust and which > are run by yourself or your organization? Again, you *don't* have to use brute-force prefix selection. You can just as easily give your peer multiple prefixes, each of which corresponds at least one address in your wallet with some false positive rate. I explained all this in detail in my original blockchain data privacy writeup months ago. -- 'peter'[:-1]@petertodd.org 000000000000000029d945c3832c7f4afabce11e6cb1c27b6f5e8c0f2bbb356c [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 17:05 ` Peter Todd @ 2014-06-06 17:10 ` Gregory Maxwell 2014-06-06 17:45 ` Peter Todd 0 siblings, 1 reply; 20+ messages in thread From: Gregory Maxwell @ 2014-06-06 17:10 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Development On Fri, Jun 6, 2014 at 10:05 AM, Peter Todd <pete@petertodd.org> wrote: > Again, you *don't* have to use brute-force prefix selection. You can > just as easily give your peer multiple prefixes, each of which > corresponds at least one address in your wallet with some false positive > rate. I explained all this in detail in my original blockchain data > privacy writeup months ago. I'm not trying to pick nits about all the options, I just found it surprising that you were saying that data published in a super public manner is no different than something used between nodes. > I explained all this in detail in my original blockchain data privacy writeup months ago. Communication is a two way street, Adam and I (and others) are earnestly trying— that we're not following your arguments may be a suggestion that they need to be communicated somewhat differently. I'm still failing to see the usefulness of having any prefix filtering for DH-private outputs. It really complicates the security story— in particular you don't know _now_ what activities will turn your prior information leaks into compromising ones retrospectivelly, and doesn't seem at very necessary for scanning performance. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 17:10 ` Gregory Maxwell @ 2014-06-06 17:45 ` Peter Todd 2014-06-07 11:22 ` Mike Hearn 0 siblings, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-06 17:45 UTC (permalink / raw) To: Gregory Maxwell; +Cc: Bitcoin Development [-- Attachment #1: Type: text/plain, Size: 2447 bytes --] On Fri, Jun 06, 2014 at 10:10:51AM -0700, Gregory Maxwell wrote: > On Fri, Jun 6, 2014 at 10:05 AM, Peter Todd <pete@petertodd.org> wrote: > > Again, you *don't* have to use brute-force prefix selection. You can > > just as easily give your peer multiple prefixes, each of which > > corresponds at least one address in your wallet with some false positive > > rate. I explained all this in detail in my original blockchain data > > privacy writeup months ago. > > I'm not trying to pick nits about all the options, I just found it > surprising that you were saying that data published in a super public > manner is no different than something used between nodes. Because I was designing a system under the assumption that you were highly likely to connect to an attacker at some point, and the trade-off available with the available math was to either give very detailed info to that attacker, or give away some probabalistic info to everyone. > > I explained all this in detail in my original blockchain data privacy writeup months ago. > > Communication is a two way street, Adam and I (and others) are > earnestly trying— that we're not following your arguments may be a > suggestion that they need to be communicated somewhat differently. Quite likely - I think most of this disagreement stems from the fact that we have different starting assumptions. In particular my assumption that you are likely to end up connecting to an attacker logging data, and my desire to have a standard that can be implemented with existing cryptographic primatives. Remember that I'm spending a lot of time working with wallet authors; they have approximately zero interest in standards that require crypto any more fancy than HD wallets do. > I'm still failing to see the usefulness of having any prefix filtering > for DH-private outputs. It really complicates the security story— in > particular you don't know _now_ what activities will turn your prior > information leaks into compromising ones retrospectivelly, and doesn't > seem at very necessary for scanning performance. Scanning performance is different from bandwidth performance. Prefix brute-forcing was designed to address the latter concern for cases where you are bandwidth limited and don't have a trusted peer to do the scanning for you. -- 'peter'[:-1]@petertodd.org 00000000000000001c5e0fca7bd6d96211a37543c1d0cc2f594c15423ee3cdd8 [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-06 17:45 ` Peter Todd @ 2014-06-07 11:22 ` Mike Hearn 2014-06-07 19:44 ` Alan Reiner 2014-06-08 21:35 ` Peter Todd 0 siblings, 2 replies; 20+ messages in thread From: Mike Hearn @ 2014-06-07 11:22 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 471 bytes --] You can send different bloom filters to different peers too, so I'm not sure why you're listing subsetting as a unique advantage of prefix filters. The main advantage of prefix filters seems to be faster lookups if the node is calculating a sorted index for each block, and the utxo commitment stuff, both of those would be cool but involve imposing extra costs on nodes. We lack models that let us understand the tradeoffs involved in various indexing schemes, I feel. [-- Attachment #2: Type: text/html, Size: 517 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-07 11:22 ` Mike Hearn @ 2014-06-07 19:44 ` Alan Reiner 2014-06-08 21:45 ` Peter Todd 2014-06-08 21:35 ` Peter Todd 1 sibling, 1 reply; 20+ messages in thread From: Alan Reiner @ 2014-06-07 19:44 UTC (permalink / raw) To: bitcoin-development On 06/07/2014 07:22 AM, Mike Hearn wrote: > > You can send different bloom filters to different peers too, so I'm > not sure why you're listing subsetting as a unique advantage of prefix > filters. > > Please let me know if we've gone down this path before, but it would seem that the more different bloom filters you create, the more information you give away. It would be most useful to create a single bloom filter that captures every address you ever intend to use (say a look ahead of 1000 addresses), and then only ever communicate that. Once people see multiple filters that you produce, they can start looking at the intersection of them to reduce the identity space. I would expect that after enough bloom variants, they could figure out a perfect subset of blockchain addresses in your wallet. (I suppose you could intentionally select an extra 20% addresses to include in every bloom filter, but it's a hack). Similarly, if you keep updating your bloom filter to include more addresses, the difference in what passes through the previous one and the new one gives away information about new addresses you created. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-07 19:44 ` Alan Reiner @ 2014-06-08 21:45 ` Peter Todd 2014-06-10 10:41 ` Mike Hearn 0 siblings, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-08 21:45 UTC (permalink / raw) To: Alan Reiner; +Cc: bitcoin-development [-- Attachment #1: Type: text/plain, Size: 2347 bytes --] On Sat, Jun 07, 2014 at 03:44:07PM -0400, Alan Reiner wrote: > > On 06/07/2014 07:22 AM, Mike Hearn wrote: > > > > You can send different bloom filters to different peers too, so I'm > > not sure why you're listing subsetting as a unique advantage of prefix > > filters. > > > > > > Please let me know if we've gone down this path before, but it would > seem that the more different bloom filters you create, the more > information you give away. It would be most useful to create a single > bloom filter that captures every address you ever intend to use (say a > look ahead of 1000 addresses), and then only ever communicate that. > Once people see multiple filters that you produce, they can start > looking at the intersection of them to reduce the identity space. I > would expect that after enough bloom variants, they could figure out a > perfect subset of blockchain addresses in your wallet. (I suppose you > could intentionally select an extra 20% addresses to include in every > bloom filter, but it's a hack). > > Similarly, if you keep updating your bloom filter to include more > addresses, the difference in what passes through the previous one and > the new one gives away information about new addresses you created. You're completely correct. You can use the same nTweak value for each filter and then slice up the filters bitwise, but then you end up linking every query you make to your identity unless you just used a fixed nTweak that everyone else uses. (e.g. zero) If you do that, you still have the problem that you're greatly increasing the load on the network. In any case, all this shows is that in the future bloom filters will very likely go away eventually, and to make that upgrade path smooth it only makes sense to define a way for nodes to let others know whether or not bloom is supported. A NODE_BLOOM service bit is a very reasonable and simple way to do exactly that, and is defacto what implementations that don't support bloom filters do anyway. Note BTW that re: DNS seeds, once the NODE_BLOOM BIP is accepted and the NODE_BLOOM patch merged into bitcoind, I'll write a patch for sipa's seeder to make it only return seeds with bloom filter support. -- 'peter'[:-1]@petertodd.org 00000000000000003afb1fdf0867fc063775e69f9ae79870bb8727f25b49e88f [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-08 21:45 ` Peter Todd @ 2014-06-10 10:41 ` Mike Hearn 0 siblings, 0 replies; 20+ messages in thread From: Mike Hearn @ 2014-06-10 10:41 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 672 bytes --] > > A NODE_BLOOM service bit is a very reasonable > and simple way to do exactly that, and is defacto what implementations > that don't support bloom filters do anyway. > BTW, I find it curious that any nodes have code to disconnect peers that send Bloom filters. It shouldn't be necessary. Bitcoinj is the only large scale user of filtering and it will disconnect itself if a peer advertises support for a version lower than 70000. If a node advertises support for this version or higher then it is supposed to implement BIP37. It sounds like some node authors decided to advertise support for a protocol version they didn't bother implementing, which would be a bug. [-- Attachment #2: Type: text/html, Size: 1001 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-07 11:22 ` Mike Hearn 2014-06-07 19:44 ` Alan Reiner @ 2014-06-08 21:35 ` Peter Todd 2014-06-10 10:38 ` Mike Hearn 1 sibling, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-08 21:35 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 866 bytes --] On Sat, Jun 07, 2014 at 07:22:56PM +0800, Mike Hearn wrote: > You can send different bloom filters to different peers too, so I'm not > sure why you're listing subsetting as a unique advantage of prefix filters. As I explained in the email you're replying to and didn't quote, bloom filters has O(n) cost per query, so sending different bloom filters to different peers for privacy reasons costs the network significant disk IO resources. If I were to actually implement it it'd look like a DoS attack on the network. Essentially with bloom filters you have to make a tradeoff between scalability and privacy; with prefix filters you don't have to make that ugly tradeoff. Notably that tradeoff gets worse if we ever increase the Bitcoin blocksize. -- 'peter'[:-1]@petertodd.org 00000000000000003afb1fdf0867fc063775e69f9ae79870bb8727f25b49e88f [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-08 21:35 ` Peter Todd @ 2014-06-10 10:38 ` Mike Hearn 2014-06-10 13:02 ` Jeff Garzik 2014-06-10 17:08 ` Peter Todd 0 siblings, 2 replies; 20+ messages in thread From: Mike Hearn @ 2014-06-10 10:38 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 3107 bytes --] > > As I explained in the email you're replying to and didn't quote, bloom > filters has O(n) cost per query, so sending different bloom filters to > different peers for privacy reasons costs the network significant disk > IO resources. If I were to actually implement it it'd look like a DoS > attack on the network. > DoS attack? Nice try. Performance is subtle, disk iops especially so. I suspect you'd find - if you implemented it - that for the kinds of loads Bitcoin is processing both today and tomorrow prefix filtering either doesn't save any disk seeks or actively makes it worse. Consider a client that is syncing the last 24 hours of chain. bitcoind pre-allocates space for blocks in large chunks, so most blocks are laid out sequentially on disk. Almost all the cost of a disk read is rotational latency. Once the head is in place data can be read in very fast and modern kernels will attempt to adaptively read ahead in order to exploit this, especially if a program seems to be working through a disk file sequentially. The work of Bloom filtering parts of the chain for this client boils down to a handful of disk seeks at best and the rest of the work is all CPU/memory bound as the block is parsed into objects and tested against the filter. A smarter filtering implementation than ours could do SAX-style parsing of the block and avoid the overhead of turning it all into objects. Now consider a prefix filtering implementation. You need to calculate a sorted list of all the data elements and tx hashes in the block, that maps to the location in the block where the tx data can be found. These per-block indexes take up extra disk space and, realistically, would likely be implemented using LevelDB as that's a tool which is designed for creating and using these kinds of tables, so then you're both loading the block data itself (blocks are sized about right currently to always fit in the default kernel readahead window) AND also seeking through the indexes, and building them too. A smart implementation might try and pack the index next to each block so it's possible to load both at once with a single seek, but that would probably be more work, as it'd force building of the index to be synchronous with saving the block to disk thus slowing down block relay. In contrast a LevelDB based index would do the bulk of the index-building work on a separate core. At *some* block size and client load the additional data storage and increased pressure on the page cache would probably make it worthwhile. But I find it unlikely to be true at current traffic levels, or double or triple today's levels. So I'd rather we spend our very limited collective time on finding ways to increase usage rather than worrying about resources which are not presently scarce. (as an aside, some of the above analysis would be invalidated if most nodes end up running on SSDs, but I doubt most are. It'd be neat to export storage tech via some kind of stats message - LevelDB is designed for HDDs not SSDs so at some point a new storage subsystem might make sense if the network switched over). [-- Attachment #2: Type: text/html, Size: 3555 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-10 10:38 ` Mike Hearn @ 2014-06-10 13:02 ` Jeff Garzik 2014-06-10 17:08 ` Peter Todd 1 sibling, 0 replies; 20+ messages in thread From: Jeff Garzik @ 2014-06-10 13:02 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev Most of this description of disk activity is true, but it omits one key point: Total cached data (working set). It is a binary, first order question: are you hitting pagecache, or the disk? When nodes act as archival data sources, the pagecache pressure is immense. When nodes just primarily serve recent blocks, that data is being served out of pagecache. As I directly observed running public nodes, the disks were running constantly, impacting all clients, even clients downloading only recent blocks. Luckily, headers are served out of RAM, so that part of the sync is always fast. NODE_BLOOM -- and block download in general -- will tend to be slower than it could be, due to the working set almost always being larger than available pagecache. Fix that problem, NODE_BLOOM will always operate out of pagecache, and disk activity will not be an issue. Once you start hitting the disk, you've already lost. On Tue, Jun 10, 2014 at 6:38 AM, Mike Hearn <mike@plan99.net> wrote: >> As I explained in the email you're replying to and didn't quote, bloom >> filters has O(n) cost per query, so sending different bloom filters to >> different peers for privacy reasons costs the network significant disk >> IO resources. If I were to actually implement it it'd look like a DoS >> attack on the network. > > > DoS attack? Nice try. > > Performance is subtle, disk iops especially so. I suspect you'd find - if > you implemented it - that for the kinds of loads Bitcoin is processing both > today and tomorrow prefix filtering either doesn't save any disk seeks or > actively makes it worse. > > Consider a client that is syncing the last 24 hours of chain. bitcoind > pre-allocates space for blocks in large chunks, so most blocks are laid out > sequentially on disk. Almost all the cost of a disk read is rotational > latency. Once the head is in place data can be read in very fast and modern > kernels will attempt to adaptively read ahead in order to exploit this, > especially if a program seems to be working through a disk file > sequentially. The work of Bloom filtering parts of the chain for this client > boils down to a handful of disk seeks at best and the rest of the work is > all CPU/memory bound as the block is parsed into objects and tested against > the filter. A smarter filtering implementation than ours could do SAX-style > parsing of the block and avoid the overhead of turning it all into objects. > > Now consider a prefix filtering implementation. You need to calculate a > sorted list of all the data elements and tx hashes in the block, that maps > to the location in the block where the tx data can be found. These per-block > indexes take up extra disk space and, realistically, would likely be > implemented using LevelDB as that's a tool which is designed for creating > and using these kinds of tables, so then you're both loading the block data > itself (blocks are sized about right currently to always fit in the default > kernel readahead window) AND also seeking through the indexes, and building > them too. A smart implementation might try and pack the index next to each > block so it's possible to load both at once with a single seek, but that > would probably be more work, as it'd force building of the index to be > synchronous with saving the block to disk thus slowing down block relay. In > contrast a LevelDB based index would do the bulk of the index-building work > on a separate core. > > At some block size and client load the additional data storage and increased > pressure on the page cache would probably make it worthwhile. But I find it > unlikely to be true at current traffic levels, or double or triple today's > levels. So I'd rather we spend our very limited collective time on finding > ways to increase usage rather than worrying about resources which are not > presently scarce. > > (as an aside, some of the above analysis would be invalidated if most nodes > end up running on SSDs, but I doubt most are. It'd be neat to export storage > tech via some kind of stats message - LevelDB is designed for HDDs not SSDs > so at some point a new storage subsystem might make sense if the network > switched over). > > > > ------------------------------------------------------------------------------ > HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions > Find What Matters Most in Your Big Data with HPCC Systems > Open Source. Fast. Scalable. Simple. Ideal for Dirty Data. > Leverages Graph Analysis for Fast Processing & Easy Data Exploration > http://p.sf.net/sfu/hpccsystems > _______________________________________________ > Bitcoin-development mailing list > Bitcoin-development@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bitcoin-development > -- Jeff Garzik Bitcoin core developer and open source evangelist BitPay, Inc. https://bitpay.com/ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-10 10:38 ` Mike Hearn 2014-06-10 13:02 ` Jeff Garzik @ 2014-06-10 17:08 ` Peter Todd 2014-06-11 8:57 ` Mike Hearn 1 sibling, 1 reply; 20+ messages in thread From: Peter Todd @ 2014-06-10 17:08 UTC (permalink / raw) To: Mike Hearn, Jeff Garzik; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 4389 bytes --] On Tue, Jun 10, 2014 at 06:38:23PM +0800, Mike Hearn wrote: > > > > As I explained in the email you're replying to and didn't quote, bloom > > filters has O(n) cost per query, so sending different bloom filters to > > different peers for privacy reasons costs the network significant disk > > IO resources. If I were to actually implement it it'd look like a DoS > > attack on the network. > > > > DoS attack? Nice try. Suppose I wrote an single address lookup tool for Android that connected to multiple peers and used bloom filters to find the history of a specific address. Of course, I don't want to use too much bandwidth being on mobile, so I'll use as specific a bloom filter as possible. I might even connect to multiple peers to speed up the lookup. Is this any different from my bloom filter IO attack code? Nope. Hence, splitting up bloom filter requests for better privacy will certainly look like a DoS attack and will certainly greatly increase the load on the network. > Now consider a prefix filtering implementation. You need to calculate a > sorted list of all the data elements and tx hashes in the block, that maps > to the location in the block where the tx data can be found. These > per-block indexes take up extra disk space and, realistically, would likely > be implemented using LevelDB as that's a tool which is designed for > creating and using these kinds of tables, so then you're both loading the > block data itself (blocks are sized about right currently to always fit in > the default kernel readahead window) AND also seeking through the indexes, > and building them too. A smart implementation might try and pack the index > next to each block so it's possible to load both at once with a single > seek, but that would probably be more work, as it'd force building of the > index to be synchronous with saving the block to disk thus slowing down > block relay. In contrast a LevelDB based index would do the bulk of the > index-building work on a separate core. That's exactly the kinds of optimizations obelisk is implementing to make its prefix lookup database fast. Also those optimizations are situation dependent, for instance "packing the index next to each block" is irrelevant if you put archival blockchain data on a slow HD, and indexes on a fast SSD, something some obelisk servers do. More to the point, your showing quite clearly there isn't just one optimal way to do it. Applying a bloom filter, or a prefix filter, or some as yet unknown filter, to blockchain data is a service and that service has different tradeoffs compared to just serving up archival block history. There is zero reason not to make that service something you advertise with NODE_BLOOM - after all, you already have the code in bitcoinj to do the exact same thing by checking the advertised protocol version. On Tue, Jun 10, 2014 at 09:02:00AM -0400, Jeff Garzik wrote: > Most of this description of disk activity is true, but it omits one > key point: Total cached data (working set). It is a binary, first > order question: are you hitting pagecache, or the disk? When nodes > act as archival data sources, the pagecache pressure is immense. When > nodes just primarily serve recent blocks, that data is being served > out of pagecache. As I directly observed running public nodes, the > disks were running constantly, impacting all clients, even clients > downloading only recent blocks. > > Luckily, headers are served out of RAM, so that part of the sync is always fast. > > NODE_BLOOM -- and block download in general -- will tend to be slower > than it could be, due to the working set almost always being larger > than available pagecache. Fix that problem, NODE_BLOOM will always > operate out of pagecache, and disk activity will not be an issue. > > Once you start hitting the disk, you've already lost. Yup. I discussed this with Matt Corallo at the financial crypto conference a few months back and he made the same point. Unfortunately we'll need an upgrade to let nodes advertise ranges of blocks to begin to fix that issue, and even then it still shows quite clearly how it's not optimal if we force everyone to share blockchain data in the same way. -- 'peter'[:-1]@petertodd.org 000000000000000023c7fc084ed84b891cc2fa90e4a34708d6b2370d3ec1c85d [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 685 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Bitcoin-development] Bloom bait 2014-06-10 17:08 ` Peter Todd @ 2014-06-11 8:57 ` Mike Hearn 0 siblings, 0 replies; 20+ messages in thread From: Mike Hearn @ 2014-06-11 8:57 UTC (permalink / raw) To: Peter Todd; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 1823 bytes --] > > Is this any different from my bloom filter IO attack code? Nope. > It's obviously different; a thin client trying to obtain more privacy is not attempting to deny service to anyone. You can't simply state that a feature which uses resources for a legitimate reason is a DoS attack, that's a spurious definition that would reclassify innocuous things like web browser prefetching as DoS attacks too. With respect to the work you're proposing, trying to retroactively make Bloom filtering an optional part of the protocol is just wasting people's time at this point: - There is no evidence there's an actual problem today. - There is no evidence there will be a problem any time soon, given the meagre levels of traffic growth we have. - It involves a complicated rollout strategy that would create work for many people. - Gavin, Wladimir and myself have all concluded it's not worth the cost. - The only justification you have provided is that two node implementations hardly anyone uses can't be bothered to implement Bloom filtering, but want to advertise support in their ver message anyway. They can simply lower the number they advertise in their ver message. That said, if you want to implement better support for archival nodes then that'd be great. The way to do it would be either to implement block ranges in the addr announcements as has been discussed many times before, or perhaps introduce a new protocol optimised for serving the chain. Nodes that are CPU limited won't want to take part in the rest of the P2P protocol anyway, it's not just Bloom filtering that uses CPU time but also block and tx relay. But until you have done these things, please stop attempting to reclassify any feature you can imagine a more efficient version of as an "attack". It's just silly. [-- Attachment #2: Type: text/html, Size: 2284 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-06-11 8:57 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-06-06 8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd 2014-06-06 8:48 ` Adam Back 2014-06-06 9:03 ` Gregory Maxwell 2014-06-06 9:11 ` Peter Todd 2014-06-06 9:04 ` Peter Todd 2014-06-06 10:45 ` Adam Back 2014-06-06 16:46 ` [Bitcoin-development] Bloom bait Peter Todd 2014-06-06 16:58 ` Gregory Maxwell 2014-06-06 17:05 ` Peter Todd 2014-06-06 17:10 ` Gregory Maxwell 2014-06-06 17:45 ` Peter Todd 2014-06-07 11:22 ` Mike Hearn 2014-06-07 19:44 ` Alan Reiner 2014-06-08 21:45 ` Peter Todd 2014-06-10 10:41 ` Mike Hearn 2014-06-08 21:35 ` Peter Todd 2014-06-10 10:38 ` Mike Hearn 2014-06-10 13:02 ` Jeff Garzik 2014-06-10 17:08 ` Peter Todd 2014-06-11 8:57 ` Mike Hearn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox