[Bitcoin-development] NODE_BLOOM service bit

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

* [Bitcoin-development] NODE_BLOOM service bit
@ 2014-06-06  8:19 Peter Todd
  2014-06-06  8:48 ` Adam Back
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-06  8:19 UTC (permalink / raw)
  To: bitcoin-development, Gregory Maxwell

[-- Attachment #1: Type: text/plain, Size: 1740 bytes --]

BIP: https://github.com/petertodd/bips/blob/bip-node-bloom/bip-node-bloom.mediawiki

Pull-req: https://github.com/bitcoin/bitcoin/pull/2900

Pretty simple really: service bit NODE_BLOOM is defined to allow nodes
to advertise to their peers that they support bloom filters. The network
protocol version number is also bumped. Recommended behavior for nodes
that do not support bloom is to simply disconnect peers who send a
filter* message anyway to let them quickly find another peer.

Rational: Bloom filters are not always supported or appropriate. For
instance no node implementation other than Bitcoin Core supports them,
e.g btcd and obelisk. (which ironically implement this BIP already,
modulo the version number bump) In the long term bloom filters will be
obsoleted eventually as they have poor scaling properties - problematic
with blocksize increases - and are incompatible with UTXO/TXO
commitments, which will use prefix-based filtering instead. (already
being implemented in electrum and obelisk)

In the short term bloom filters have high IO loads, which have lead to
DoS attacks, and are not an optimal use of resources for nodes which are
IO constrained rather than bandwidth constrained. (common in VPS setups
which could better help the network by serving full blocks)

Adding NODE_BLOOM as a service bit now will save us some hassle later
down the road, reflects what actual implementations do anyway, has been
already deployed on Litecoin, Dogecoin, and a zillion other alts with no
issues, (including SPV client support) and is a trivial patch.

Gregory Maxwell: Please assign a BIP #

-- 
'peter'[:-1]@petertodd.org
000000000000000049066dab2483c9e069656239f5782da204bd4995bd92c19f

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] NODE_BLOOM service bit
  2014-06-06  8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd
@ 2014-06-06  8:48 ` Adam Back
  2014-06-06  9:03   ` Gregory Maxwell
  2014-06-06  9:04   ` Peter Todd
  0 siblings, 2 replies; 20+ messages in thread
From: Adam Back @ 2014-06-06  8:48 UTC (permalink / raw)
  To: Peter Todd; +Cc: bitcoin-development

Advertising NODE BLOOM as a service sounds good.

But the critique of bloom filters, I am not so sure prefix filters are
better.  Prefix filters offer questionable privacy tradeoffs in my opinion. 
Same problem as with stealth address proposed use of prefixes.

All for scalability, efficiency and decentralization but not ideally at the
expense of nuking privacy.  The effects on privacy are cumulative, and
affect everyone not just the user.  Same pattern of local decision, global
effect as with reused addresses.

Adam

On Fri, Jun 06, 2014 at 04:19:33AM -0400, Peter Todd wrote:
>In the short term bloom filters have high IO loads, which have lead to
>DoS attacks, and are not an optimal use of resources for nodes which are
>IO constrained rather than bandwidth constrained. (common in VPS setups
>which could better help the network by serving full blocks)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] NODE_BLOOM service bit
  2014-06-06  8:48 ` Adam Back
@ 2014-06-06  9:03   ` Gregory Maxwell
  2014-06-06  9:11     ` Peter Todd
  2014-06-06  9:04   ` Peter Todd
  1 sibling, 1 reply; 20+ messages in thread
From: Gregory Maxwell @ 2014-06-06  9:03 UTC (permalink / raw)
  To: Adam Back; +Cc: Bitcoin Development

On Fri, Jun 6, 2014 at 1:48 AM, Adam Back <adam@cypherspace.org> wrote:
> Advertising NODE BLOOM as a service sounds good.
>
> But the critique of bloom filters, I am not so sure prefix filters are
> better.  Prefix filters offer questionable privacy tradeoffs in my opinion.
> Same problem as with stealth address proposed use of prefixes.
>
> All for scalability, efficiency and decentralization but not ideally at the
> expense of nuking privacy.  The effects on privacy are cumulative, and
> affect everyone not just the user.  Same pattern of local decision, global
> effect as with reused addresses.

The performance Bytecoin/Monero/Fantom/etc. systems that use ECDH
addresses for all transactions seem to be suggesting that the prefixes
aren't really needed.

At least with current system rules doing the ECDH for each transaction
seems pretty reasonable.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] NODE_BLOOM service bit
  2014-06-06  9:03   ` Gregory Maxwell
@ 2014-06-06  9:11     ` Peter Todd
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Todd @ 2014-06-06  9:11 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]

On Fri, Jun 06, 2014 at 02:03:29AM -0700, Gregory Maxwell wrote:
> On Fri, Jun 6, 2014 at 1:48 AM, Adam Back <adam@cypherspace.org> wrote:
> > Advertising NODE BLOOM as a service sounds good.
> >
> > But the critique of bloom filters, I am not so sure prefix filters are
> > better.  Prefix filters offer questionable privacy tradeoffs in my opinion.
> > Same problem as with stealth address proposed use of prefixes.
> >
> > All for scalability, efficiency and decentralization but not ideally at the
> > expense of nuking privacy.  The effects on privacy are cumulative, and
> > affect everyone not just the user.  Same pattern of local decision, global
> > effect as with reused addresses.
> 
> The performance Bytecoin/Monero/Fantom/etc. systems that use ECDH
> addresses for all transactions seem to be suggesting that the prefixes
> aren't really needed.
> 
> At least with current system rules doing the ECDH for each transaction
> seems pretty reasonable.

Yup. Obelisk's indexing is sufficiently fast that they hadn't even
bothered making Dark Wallet store transaction information between
sessions until recently. Prefix brute-forcing isn't yet implemented,
although prefix filters is being implemented for lookups in general. (at
the very least it gives the server operators some valuable plausible
deniability)

-- 
'peter'[:-1]@petertodd.org
00000000000000003a68ee16d702ca5dd5547fb4aead910a004747cb06241dd6

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] NODE_BLOOM service bit
  2014-06-06  8:48 ` Adam Back
  2014-06-06  9:03   ` Gregory Maxwell
@ 2014-06-06  9:04   ` Peter Todd
  2014-06-06 10:45     ` Adam Back
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-06  9:04 UTC (permalink / raw)
  To: Adam Back; +Cc: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]

On Fri, Jun 06, 2014 at 10:48:52AM +0200, Adam Back wrote:
> Advertising NODE BLOOM as a service sounds good.
> 
> But the critique of bloom filters, I am not so sure prefix filters are
> better.  Prefix filters offer questionable privacy tradeoffs in my
> opinion. Same problem as with stealth address proposed use of
> prefixes.

That's assuming you're doing the proposed prefix brute forcing - if you
don't do that they have privacy equal or better than bloom filters, but
with better scalability. In particular that better scalability lets you
efficiently query multiple servers for blockchain data, only giving up
info on a subset of the addresses in your wallet to each server. This
can be a significant improvement to bloom filters if your attacker is
running logging nodes to try to, say, deanonymize CoinJoin transactions.

> All for scalability, efficiency and decentralization but not ideally at the
> expense of nuking privacy.  The effects on privacy are cumulative, and
> affect everyone not just the user.  Same pattern of local decision, global
> effect as with reused addresses.

Indeed. But again, remember that the existing systems suck too;
prefix-brute forcing is a engineering tradeoff implementable with
existing and well understood technology.

Now if you want to come up with something better and write code, please
do! I'm sure the math exists; what doesn't exist is robust and well
tested code in multiple languages. Stealth addresses at least have been
designed so that future blockchain filter upgrades can be added in a
backwards compatible way.

-- 
'peter'[:-1]@petertodd.org
00000000000000003a68ee16d702ca5dd5547fb4aead910a004747cb06241dd6

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] NODE_BLOOM service bit
  2014-06-06  9:04   ` Peter Todd
@ 2014-06-06 10:45     ` Adam Back
  2014-06-06 16:46       ` [Bitcoin-development] Bloom bait Peter Todd
  0 siblings, 1 reply; 20+ messages in thread
From: Adam Back @ 2014-06-06 10:45 UTC (permalink / raw)
  To: Peter Todd; +Cc: bitcoin-development

On Fri, Jun 06, 2014 at 05:04:41AM -0400, Peter Todd wrote:
>On Fri, Jun 06, 2014 at 10:48:52AM +0200, Adam Back wrote:
>> Prefix filters offer questionable privacy tradeoffs in my opinion.  Same
>> problem as with stealth address proposed use of prefixes.
>
>That's assuming you're doing the proposed prefix brute forcing

As I recall prefix brute forcing was a bit twiddle saving, ie searching for
EDH key that has the users public prefix.  That does not improve privacy
over an explicit prefix, it saves a byte or so at the expense of average 128
EDH exchanges to send and provides also some probably relatively ineffective
plausible deniability by enabling the transaction to be indistinguishable
from some other (not very common) types of transaction.

>don't do that they have privacy equal or better than bloom filters, but
>with better scalability. 

either its full node only where prefixes are not used, which is less
scalable than bloom; or prefixes are used explicitly or implicitly
(brute-force) and either way privacy is weakened by the extra correlation
hook provided by elimination from the network graph of payments with
mismatched prefixes.

>In particular that better scalability lets you efficiently query multiple
>servers for blockchain data, only giving up info on a subset of the
>addresses in your wallet to each server.  This can be a significant
>improvement to bloom filters if your attacker is running logging nodes to
>try to, say, deanonymize CoinJoin transactions.

We need to consider the two types of privacy involved.  Privacy from the
full node an SPV client is relying on to find its payments, vs privacy from
analysis of the public transaction graph.  The latter is more damaging. 
Better to design for privacy against future analysis of public info, than
privacy by argument to select non-hostile nodes.  Tor has changed recently
to account for the fact that chosing fresh random nodes is actually worse. 
ie you have a probability of identity/address identification per route/node,
and repeatedly selecting routes/nodes just cumulatively increases the chance
you'll be identified.  Better to pick a random node, identify it and stick
to it and hope you chose well.

>Now if you want to come up with something better and write code, please
>do! I'm sure the math exists; what doesn't exist is robust and well
>tested code in multiple languages. 

Maybe other simpler, but yes there was the proof of concept that the math
exists in the form of the weil pairing.

https://bitcointalk.org/index.php?topic=431756.new

But what problem are we trying to solve here?  Is it an immediate problem? 
Maybe better to figure out a more privacy compatible solution which will
take longer, than let coding drive protocol.

Adam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 10:45     ` Adam Back
@ 2014-06-06 16:46       ` Peter Todd
  2014-06-06 16:58         ` Gregory Maxwell
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-06 16:46 UTC (permalink / raw)
  To: Adam Back; +Cc: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 4080 bytes --]

On Fri, Jun 06, 2014 at 12:45:43PM +0200, Adam Back wrote:

(changed subject line as this discussion has nothing to do with
NODE_BLOOM)

> As I recall prefix brute forcing was a bit twiddle saving, ie searching for
> EDH key that has the users public prefix.  That does not improve privacy
> over an explicit prefix, it saves a byte or so at the expense of average 128
> EDH exchanges to send and provides also some probably relatively ineffective
> plausible deniability by enabling the transaction to be indistinguishable
> from some other (not very common) types of transaction.

I think you should re-read my original proposal; there's a whole host of
misunderstandings above, for instance I have no idea where you got the
idea that it has anything to do with "saving a byte" came from, or where
the number 128 came from.

> >don't do that they have privacy equal or better than bloom filters, but
> >with better scalability.
> 
> either its full node only where prefixes are not used, which is less
> scalable than bloom; or prefixes are used explicitly or implicitly
> (brute-force) and either way privacy is weakened by the extra correlation
> hook provided by elimination from the network graph of payments with
> mismatched prefixes.

Again, you have a misunderstanding. Both bloom filters and prefix
filters are just ways of giving a peer a probabalistic filter to match
transactions against. Where they differ is that bloom filters has O(n)
scaling, where n is the size of a block, and prefix filters have O(log n)
scaling with slightly(1) higher k. Again, if you *don't* use brute forcing
in conjunction with prefixes they have no different transactional graph
privacy than bloom filters, but the better scalability lets you do
things like split your queries across multiple peers that give you
better protection against hostile nodes.  Additionally prefix filters
are compatible with future miner committed indexes to make the data
authenticated.

1) see Amir's experience implementing prefix lookup in Obelisk

> We need to consider the two types of privacy involved.  Privacy from the
> full node an SPV client is relying on to find its payments, vs privacy from
> analysis of the public transaction graph.

Agreed

> The latter is more damaging.

Maybe! If adversaries are operating a significant fraction of the peers
you are connecting to the current design of bloom filters + HD wallets
results in situations where those adversaries have better transactional
graph information than the alternative.

> Better to design for privacy against future analysis of
> public info, than
> privacy by argument to select non-hostile nodes.  Tor has changed recently
> to account for the fact that chosing fresh random nodes is actually
> worse. ie you have a probability of identity/address identification
> per route/node,
> and repeatedly selecting routes/nodes just cumulatively increases the chance
> you'll be identified.  Better to pick a random node, identify it and stick
> to it and hope you chose well.

That's basically what Electrum and Obelisk already do - by default you
connect to a relatively small set of blockchain data servers operated by
well known people and use the same server repeatedly.

Applying that to the P2P network however is tricky as there is a huge
amount of churn in the nodes:

    #bitcoin-dev/14/06/14-06-06.log:11:18 < hearn> bitcoinj can't use
    [service bits] as it relies on DNS seeds and that is unlikely to change
    any time soon due to the general churn rate in the network making it
    hard to bootstrap quickly using just remembered sets of IPs.

> Maybe other simpler, but yes there was the proof of concept that the math
> exists in the form of the weil pairing.
> 
> https://bitcointalk.org/index.php?topic=431756.new

I know, where can I find running code? Remember that a bug can easily
lose thousands of dollars worth of Bitcoins.

-- 
'peter'[:-1]@petertodd.org
00000000000000001d2af1653c415b7801ce4c9b18ac7e87bef597e652b203e6

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 16:46       ` [Bitcoin-development] Bloom bait Peter Todd
@ 2014-06-06 16:58         ` Gregory Maxwell
  2014-06-06 17:05           ` Peter Todd
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Maxwell @ 2014-06-06 16:58 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Development

On Fri, Jun 6, 2014 at 9:46 AM, Peter Todd <pete@petertodd.org> wrote:
> transactions against. Where they differ is that bloom filters has O(n)
> scaling, where n is the size of a block, and prefix filters have O(log n)
> scaling with slightly(1) higher k. Again, if you *don't* use brute forcing
> in conjunction with prefixes they have no different transactional graph
> privacy than bloom filters,

Huh? How are you thinking that something that gets put in transactions
and burned forever into the blockchain that lets you (statically) link
txout ownership is "no different" from something which is shared
directly with a couple peers, potentially peers you trust and which
are run by yourself or your organization?



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 16:58         ` Gregory Maxwell
@ 2014-06-06 17:05           ` Peter Todd
  2014-06-06 17:10             ` Gregory Maxwell
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-06 17:05 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Fri, Jun 06, 2014 at 09:58:19AM -0700, Gregory Maxwell wrote:
> On Fri, Jun 6, 2014 at 9:46 AM, Peter Todd <pete@petertodd.org> wrote:
> > transactions against. Where they differ is that bloom filters has O(n)
> > scaling, where n is the size of a block, and prefix filters have O(log n)
> > scaling with slightly(1) higher k. Again, if you *don't* use brute forcing
> > in conjunction with prefixes they have no different transactional graph
> > privacy than bloom filters,
> 
> Huh? How are you thinking that something that gets put in transactions
> and burned forever into the blockchain that lets you (statically) link
> txout ownership is "no different" from something which is shared
> directly with a couple peers, potentially peers you trust and which
> are run by yourself or your organization?

Again, you *don't* have to use brute-force prefix selection. You can
just as easily give your peer multiple prefixes, each of which
corresponds at least one address in your wallet with some false positive
rate. I explained all this in detail in my original blockchain data
privacy writeup months ago.

-- 
'peter'[:-1]@petertodd.org
000000000000000029d945c3832c7f4afabce11e6cb1c27b6f5e8c0f2bbb356c

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 17:05           ` Peter Todd
@ 2014-06-06 17:10             ` Gregory Maxwell
  2014-06-06 17:45               ` Peter Todd
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Maxwell @ 2014-06-06 17:10 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Development

On Fri, Jun 6, 2014 at 10:05 AM, Peter Todd <pete@petertodd.org> wrote:
> Again, you *don't* have to use brute-force prefix selection. You can
> just as easily give your peer multiple prefixes, each of which
> corresponds at least one address in your wallet with some false positive
> rate. I explained all this in detail in my original blockchain data
> privacy writeup months ago.

I'm not trying to pick nits about all the options,  I just found it
surprising that you were saying that data published in a super public
manner is no different than something used between nodes.

> I explained all this in detail in my original blockchain data privacy writeup months ago.

Communication is a two way street, Adam and I (and others) are
earnestly trying— that we're not following your arguments may be a
suggestion that they need to be communicated somewhat differently.

I'm still failing to see the usefulness of having any prefix filtering
for DH-private outputs. It really complicates the security story— in
particular you don't know _now_ what activities will turn your prior
information leaks into compromising ones retrospectivelly, and doesn't
seem at very necessary for scanning performance.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 17:10             ` Gregory Maxwell
@ 2014-06-06 17:45               ` Peter Todd
  2014-06-07 11:22                 ` Mike Hearn
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-06 17:45 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

On Fri, Jun 06, 2014 at 10:10:51AM -0700, Gregory Maxwell wrote:
> On Fri, Jun 6, 2014 at 10:05 AM, Peter Todd <pete@petertodd.org> wrote:
> > Again, you *don't* have to use brute-force prefix selection. You can
> > just as easily give your peer multiple prefixes, each of which
> > corresponds at least one address in your wallet with some false positive
> > rate. I explained all this in detail in my original blockchain data
> > privacy writeup months ago.
> 
> I'm not trying to pick nits about all the options,  I just found it
> surprising that you were saying that data published in a super public
> manner is no different than something used between nodes.

Because I was designing a system under the assumption that you were
highly likely to connect to an attacker at some point, and the trade-off
available with the available math was to either give very detailed info
to that attacker, or give away some probabalistic info to everyone.

> > I explained all this in detail in my original blockchain data privacy writeup months ago.
> 
> Communication is a two way street, Adam and I (and others) are
> earnestly trying— that we're not following your arguments may be a
> suggestion that they need to be communicated somewhat differently.

Quite likely - I think most of this disagreement stems from the fact
that we have different starting assumptions. In particular my assumption
that you are likely to end up connecting to an attacker logging data,
and my desire to have a standard that can be implemented with existing
cryptographic primatives. Remember that I'm spending a lot of time
working with wallet authors; they have approximately zero interest in
standards that require crypto any more fancy than HD wallets do.

> I'm still failing to see the usefulness of having any prefix filtering
> for DH-private outputs. It really complicates the security story— in
> particular you don't know _now_ what activities will turn your prior
> information leaks into compromising ones retrospectivelly, and doesn't
> seem at very necessary for scanning performance.

Scanning performance is different from bandwidth performance. Prefix
brute-forcing was designed to address the latter concern for cases where
you are bandwidth limited and don't have a trusted peer to do the
scanning for you.

-- 
'peter'[:-1]@petertodd.org
00000000000000001c5e0fca7bd6d96211a37543c1d0cc2f594c15423ee3cdd8

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-06 17:45               ` Peter Todd
@ 2014-06-07 11:22                 ` Mike Hearn
  2014-06-07 19:44                   ` Alan Reiner
  2014-06-08 21:35                   ` Peter Todd
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Hearn @ 2014-06-07 11:22 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 471 bytes --]

You can send different bloom filters to different peers too, so I'm not
sure why you're listing subsetting as a unique advantage of prefix filters.

The main advantage of prefix filters seems to be faster lookups if the node
is calculating a sorted index for each block, and the utxo commitment
stuff, both of those would be cool but involve imposing extra costs on
nodes. We lack models that let us understand the tradeoffs involved in
various indexing schemes, I feel.

[-- Attachment #2: Type: text/html, Size: 517 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-07 11:22                 ` Mike Hearn
@ 2014-06-07 19:44                   ` Alan Reiner
  2014-06-08 21:45                     ` Peter Todd
  2014-06-08 21:35                   ` Peter Todd
  1 sibling, 1 reply; 20+ messages in thread
From: Alan Reiner @ 2014-06-07 19:44 UTC (permalink / raw)
  To: bitcoin-development

On 06/07/2014 07:22 AM, Mike Hearn wrote:
>
> You can send different bloom filters to different peers too, so I'm
> not sure why you're listing subsetting as a unique advantage of prefix
> filters.
>
>

Please let me know if we've gone down this path before, but it would
seem that the more different bloom filters you create, the more
information you give away.  It would be most useful to create a single
bloom filter that captures every address you ever intend to use (say a
look ahead of 1000 addresses), and then only ever communicate that. 
Once people see multiple filters that you produce, they can start
looking at the intersection of them to reduce the identity space.  I
would expect that after enough bloom variants, they could figure out a
perfect subset of blockchain addresses in your wallet.  (I suppose you
could intentionally select an extra 20% addresses to include in every
bloom filter, but it's a hack).

Similarly, if you keep updating your bloom filter to include more
addresses, the difference in what passes through the previous one and
the new one gives away information about new addresses you created.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-07 19:44                   ` Alan Reiner
@ 2014-06-08 21:45                     ` Peter Todd
  2014-06-10 10:41                       ` Mike Hearn
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-08 21:45 UTC (permalink / raw)
  To: Alan Reiner; +Cc: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 2347 bytes --]

On Sat, Jun 07, 2014 at 03:44:07PM -0400, Alan Reiner wrote:
> 
> On 06/07/2014 07:22 AM, Mike Hearn wrote:
> >
> > You can send different bloom filters to different peers too, so I'm
> > not sure why you're listing subsetting as a unique advantage of prefix
> > filters.
> >
> >
> 
> Please let me know if we've gone down this path before, but it would
> seem that the more different bloom filters you create, the more
> information you give away.  It would be most useful to create a single
> bloom filter that captures every address you ever intend to use (say a
> look ahead of 1000 addresses), and then only ever communicate that. 
> Once people see multiple filters that you produce, they can start
> looking at the intersection of them to reduce the identity space.  I
> would expect that after enough bloom variants, they could figure out a
> perfect subset of blockchain addresses in your wallet.  (I suppose you
> could intentionally select an extra 20% addresses to include in every
> bloom filter, but it's a hack).
> 
> Similarly, if you keep updating your bloom filter to include more
> addresses, the difference in what passes through the previous one and
> the new one gives away information about new addresses you created.

You're completely correct. You can use the same nTweak value for each
filter and then slice up the filters bitwise, but then you end up
linking every query you make to your identity unless you just used a
fixed nTweak that everyone else uses.  (e.g. zero) If you do that, you
still have the problem that you're greatly increasing the load on the
network.

In any case, all this shows is that in the future bloom filters will
very likely go away eventually, and to make that upgrade path smooth it
only makes sense to define a way for nodes to let others know whether or
not bloom is supported. A NODE_BLOOM service bit is a very reasonable
and simple way to do exactly that, and is defacto what implementations
that don't support bloom filters do anyway.

Note BTW that re: DNS seeds, once the NODE_BLOOM BIP is accepted and the
NODE_BLOOM patch merged into bitcoind, I'll write a patch for sipa's
seeder to make it only return seeds with bloom filter support.

-- 
'peter'[:-1]@petertodd.org
00000000000000003afb1fdf0867fc063775e69f9ae79870bb8727f25b49e88f

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-08 21:45                     ` Peter Todd
@ 2014-06-10 10:41                       ` Mike Hearn
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Hearn @ 2014-06-10 10:41 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

>
>  A NODE_BLOOM service bit is a very reasonable
> and simple way to do exactly that, and is defacto what implementations
> that don't support bloom filters do anyway.
>

BTW, I find it curious that any nodes have code to disconnect peers that
send Bloom filters. It shouldn't be necessary. Bitcoinj is the only large
scale user of filtering and it will disconnect itself if a peer advertises
support for a version lower than 70000. If a node advertises support for
this version or higher then it is supposed to implement BIP37.

It sounds like some node authors decided to advertise support for a
protocol version they didn't bother implementing, which would be a bug.

[-- Attachment #2: Type: text/html, Size: 1001 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-07 11:22                 ` Mike Hearn
  2014-06-07 19:44                   ` Alan Reiner
@ 2014-06-08 21:35                   ` Peter Todd
  2014-06-10 10:38                     ` Mike Hearn
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-08 21:35 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

On Sat, Jun 07, 2014 at 07:22:56PM +0800, Mike Hearn wrote:
> You can send different bloom filters to different peers too, so I'm not
> sure why you're listing subsetting as a unique advantage of prefix filters.

As I explained in the email you're replying to and didn't quote, bloom
filters has O(n) cost per query, so sending different bloom filters to
different peers for privacy reasons costs the network significant disk
IO resources. If I were to actually implement it it'd look like a DoS
attack on the network.

Essentially with bloom filters you have to make a tradeoff between
scalability and privacy; with prefix filters you don't have to make that
ugly tradeoff. Notably that tradeoff gets worse if we ever increase the
Bitcoin blocksize.

-- 
'peter'[:-1]@petertodd.org
00000000000000003afb1fdf0867fc063775e69f9ae79870bb8727f25b49e88f

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-08 21:35                   ` Peter Todd
@ 2014-06-10 10:38                     ` Mike Hearn
  2014-06-10 13:02                       ` Jeff Garzik
  2014-06-10 17:08                       ` Peter Todd
  0 siblings, 2 replies; 20+ messages in thread
From: Mike Hearn @ 2014-06-10 10:38 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 3107 bytes --]

>
> As I explained in the email you're replying to and didn't quote, bloom
> filters has O(n) cost per query, so sending different bloom filters to
> different peers for privacy reasons costs the network significant disk
> IO resources. If I were to actually implement it it'd look like a DoS
> attack on the network.
>

DoS attack? Nice try.

Performance is subtle, disk iops especially so. I suspect you'd find - if
you implemented it - that for the kinds of loads Bitcoin is processing both
today and tomorrow prefix filtering either doesn't save any disk seeks or
actively makes it worse.

Consider a client that is syncing the last 24 hours of chain. bitcoind
pre-allocates space for blocks in large chunks, so most blocks are laid out
sequentially on disk. Almost all the cost of a disk read is rotational
latency. Once the head is in place data can be read in very fast and modern
kernels will attempt to adaptively read ahead in order to exploit this,
especially if a program seems to be working through a disk file
sequentially. The work of Bloom filtering parts of the chain for this
client boils down to a handful of disk seeks at best and the rest of the
work is all CPU/memory bound as the block is parsed into objects and tested
against the filter. A smarter filtering implementation than ours could do
SAX-style parsing of the block and avoid the overhead of turning it all
into objects.

Now consider a prefix filtering implementation. You need to calculate a
sorted list of all the data elements and tx hashes in the block, that maps
to the location in the block where the tx data can be found. These
per-block indexes take up extra disk space and, realistically, would likely
be implemented using LevelDB as that's a tool which is designed for
creating and using these kinds of tables, so then you're both loading the
block data itself (blocks are sized about right currently to always fit in
the default kernel readahead window) AND also seeking through the indexes,
and building them too. A smart implementation might try and pack the index
next to each block so it's possible to load both at once with a single
seek, but that would probably be more work, as it'd force building of the
index to be synchronous with saving the block to disk thus slowing down
block relay. In contrast a LevelDB based index would do the bulk of the
index-building work on a separate core.

At *some* block size and client load the additional data storage and
increased pressure on the page cache would probably make it worthwhile. But
I find it unlikely to be true at current traffic levels, or double or
triple today's levels. So I'd rather we spend our very limited collective
time on finding ways to increase usage rather than worrying about resources
which are not presently scarce.

(as an aside, some of the above analysis would be invalidated if most nodes
end up running on SSDs, but I doubt most are. It'd be neat to export
storage tech via some kind of stats message - LevelDB is designed for HDDs
not SSDs so at some point a new storage subsystem might make sense if the
network switched over).

[-- Attachment #2: Type: text/html, Size: 3555 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-10 10:38                     ` Mike Hearn
@ 2014-06-10 13:02                       ` Jeff Garzik
  2014-06-10 17:08                       ` Peter Todd
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Garzik @ 2014-06-10 13:02 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

Most of this description of disk activity is true, but it omits one
key point:  Total cached data (working set).  It is a binary, first
order question:  are you hitting pagecache, or the disk?  When nodes
act as archival data sources, the pagecache pressure is immense.  When
nodes just primarily serve recent blocks, that data is being served
out of pagecache. As I directly observed running public nodes, the
disks were running constantly, impacting all clients, even clients
downloading only recent blocks.

Luckily, headers are served out of RAM, so that part of the sync is always fast.

NODE_BLOOM -- and block download in general -- will tend to be slower
than it could be, due to the working set almost always being larger
than available pagecache.  Fix that problem, NODE_BLOOM will always
operate out of pagecache, and disk activity will not be an issue.

Once you start hitting the disk, you've already lost.



On Tue, Jun 10, 2014 at 6:38 AM, Mike Hearn <mike@plan99.net> wrote:
>> As I explained in the email you're replying to and didn't quote, bloom
>> filters has O(n) cost per query, so sending different bloom filters to
>> different peers for privacy reasons costs the network significant disk
>> IO resources. If I were to actually implement it it'd look like a DoS
>> attack on the network.
>
>
> DoS attack? Nice try.
>
> Performance is subtle, disk iops especially so. I suspect you'd find - if
> you implemented it - that for the kinds of loads Bitcoin is processing both
> today and tomorrow prefix filtering either doesn't save any disk seeks or
> actively makes it worse.
>
> Consider a client that is syncing the last 24 hours of chain. bitcoind
> pre-allocates space for blocks in large chunks, so most blocks are laid out
> sequentially on disk. Almost all the cost of a disk read is rotational
> latency. Once the head is in place data can be read in very fast and modern
> kernels will attempt to adaptively read ahead in order to exploit this,
> especially if a program seems to be working through a disk file
> sequentially. The work of Bloom filtering parts of the chain for this client
> boils down to a handful of disk seeks at best and the rest of the work is
> all CPU/memory bound as the block is parsed into objects and tested against
> the filter. A smarter filtering implementation than ours could do SAX-style
> parsing of the block and avoid the overhead of turning it all into objects.
>
> Now consider a prefix filtering implementation. You need to calculate a
> sorted list of all the data elements and tx hashes in the block, that maps
> to the location in the block where the tx data can be found. These per-block
> indexes take up extra disk space and, realistically, would likely be
> implemented using LevelDB as that's a tool which is designed for creating
> and using these kinds of tables, so then you're both loading the block data
> itself (blocks are sized about right currently to always fit in the default
> kernel readahead window) AND also seeking through the indexes, and building
> them too. A smart implementation might try and pack the index next to each
> block so it's possible to load both at once with a single seek, but that
> would probably be more work, as it'd force building of the index to be
> synchronous with saving the block to disk thus slowing down block relay. In
> contrast a LevelDB based index would do the bulk of the index-building work
> on a separate core.
>
> At some block size and client load the additional data storage and increased
> pressure on the page cache would probably make it worthwhile. But I find it
> unlikely to be true at current traffic levels, or double or triple today's
> levels. So I'd rather we spend our very limited collective time on finding
> ways to increase usage rather than worrying about resources which are not
> presently scarce.
>
> (as an aside, some of the above analysis would be invalidated if most nodes
> end up running on SSDs, but I doubt most are. It'd be neat to export storage
> tech via some kind of stats message - LevelDB is designed for HDDs not SSDs
> so at some point a new storage subsystem might make sense if the network
> switched over).
>
>
>
> ------------------------------------------------------------------------------
> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
> Find What Matters Most in Your Big Data with HPCC Systems
> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
> http://p.sf.net/sfu/hpccsystems
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>



-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-10 10:38                     ` Mike Hearn
  2014-06-10 13:02                       ` Jeff Garzik
@ 2014-06-10 17:08                       ` Peter Todd
  2014-06-11  8:57                         ` Mike Hearn
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Todd @ 2014-06-10 17:08 UTC (permalink / raw)
  To: Mike Hearn, Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 4389 bytes --]

On Tue, Jun 10, 2014 at 06:38:23PM +0800, Mike Hearn wrote:
> >
> > As I explained in the email you're replying to and didn't quote, bloom
> > filters has O(n) cost per query, so sending different bloom filters to
> > different peers for privacy reasons costs the network significant disk
> > IO resources. If I were to actually implement it it'd look like a DoS
> > attack on the network.
> >
> 
> DoS attack? Nice try.

Suppose I wrote an single address lookup tool for Android that connected
to multiple peers and used bloom filters to find the history of a
specific address. Of course, I don't want to use too much bandwidth
being on mobile, so I'll use as specific a bloom filter as possible. I
might even connect to multiple peers to speed up the lookup.

Is this any different from my bloom filter IO attack code? Nope. Hence,
splitting up bloom filter requests for better privacy will certainly
look like a DoS attack and will certainly greatly increase the load on
the network.

> Now consider a prefix filtering implementation. You need to calculate a
> sorted list of all the data elements and tx hashes in the block, that maps
> to the location in the block where the tx data can be found. These
> per-block indexes take up extra disk space and, realistically, would likely
> be implemented using LevelDB as that's a tool which is designed for
> creating and using these kinds of tables, so then you're both loading the
> block data itself (blocks are sized about right currently to always fit in
> the default kernel readahead window) AND also seeking through the indexes,
> and building them too. A smart implementation might try and pack the index
> next to each block so it's possible to load both at once with a single
> seek, but that would probably be more work, as it'd force building of the
> index to be synchronous with saving the block to disk thus slowing down
> block relay. In contrast a LevelDB based index would do the bulk of the
> index-building work on a separate core.

That's exactly the kinds of optimizations obelisk is implementing to
make its prefix lookup database fast. Also those optimizations are
situation dependent, for instance "packing the index next to each block"
is irrelevant if you put archival blockchain data on a slow HD, and
indexes on a fast SSD, something some obelisk servers do.

More to the point, your showing quite clearly there isn't just one
optimal way to do it. Applying a bloom filter, or a prefix filter, or
some as yet unknown filter, to blockchain data is a service and that
service has different tradeoffs compared to just serving up archival
block history. There is zero reason not to make that service something
you advertise with NODE_BLOOM - after all, you already have the code in
bitcoinj to do the exact same thing by checking the advertised protocol
version.

On Tue, Jun 10, 2014 at 09:02:00AM -0400, Jeff Garzik wrote:
> Most of this description of disk activity is true, but it omits one
> key point:  Total cached data (working set).  It is a binary, first
> order question:  are you hitting pagecache, or the disk?  When nodes
> act as archival data sources, the pagecache pressure is immense.  When
> nodes just primarily serve recent blocks, that data is being served
> out of pagecache. As I directly observed running public nodes, the
> disks were running constantly, impacting all clients, even clients
> downloading only recent blocks.
> 
> Luckily, headers are served out of RAM, so that part of the sync is always fast.
> 
> NODE_BLOOM -- and block download in general -- will tend to be slower
> than it could be, due to the working set almost always being larger
> than available pagecache.  Fix that problem, NODE_BLOOM will always
> operate out of pagecache, and disk activity will not be an issue.
> 
> Once you start hitting the disk, you've already lost.

Yup. I discussed this with Matt Corallo at the financial crypto
conference a few months back and he made the same point. Unfortunately
we'll need an upgrade to let nodes advertise ranges of blocks to begin
to fix that issue, and even then it still shows quite clearly how it's
not optimal if we force everyone to share blockchain data in the same
way.

-- 
'peter'[:-1]@petertodd.org
000000000000000023c7fc084ed84b891cc2fa90e4a34708d6b2370d3ec1c85d

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bitcoin-development] Bloom bait
  2014-06-10 17:08                       ` Peter Todd
@ 2014-06-11  8:57                         ` Mike Hearn
  0 siblings, 0 replies; 20+ messages in thread
From: Mike Hearn @ 2014-06-11  8:57 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1823 bytes --]

>
> Is this any different from my bloom filter IO attack code? Nope.
>

It's obviously different; a thin client trying to obtain more privacy is
not attempting to deny service to anyone. You can't simply state that a
feature which uses resources for a legitimate reason is a DoS attack,
that's a spurious definition that would reclassify innocuous things like
web browser prefetching as DoS attacks too.

With respect to the work you're proposing, trying to retroactively make
Bloom filtering an optional part of the protocol is just wasting people's
time at this point:

   - There is no evidence there's an actual problem today.
   - There is no evidence there will be a problem any time soon, given the
   meagre levels of traffic growth we have.
   - It involves a complicated rollout strategy that would create work for
   many people.
   - Gavin, Wladimir and myself have all concluded it's not worth the cost.
   - The only justification you have provided is that two node
   implementations hardly anyone uses can't be bothered to implement Bloom
   filtering, but want to advertise support in their ver message anyway. They
   can simply lower the number they advertise in their ver message.

That said, if you want to implement better support for archival nodes then
that'd be great. The way to do it would be either to implement block ranges
in the addr announcements as has been discussed many times before, or
perhaps introduce a new protocol optimised for serving the chain. Nodes
that are CPU limited won't want to take part in the rest of the P2P
protocol anyway, it's not just Bloom filtering that uses CPU time but also
block and tx relay.

But until you have done these things, please stop attempting to reclassify
any feature you can imagine a more efficient version of as an "attack".
It's just silly.

[-- Attachment #2: Type: text/html, Size: 2284 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-06-11  8:57 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-06  8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd
2014-06-06  8:48 ` Adam Back
2014-06-06  9:03   ` Gregory Maxwell
2014-06-06  9:11     ` Peter Todd
2014-06-06  9:04   ` Peter Todd
2014-06-06 10:45     ` Adam Back
2014-06-06 16:46       ` [Bitcoin-development] Bloom bait Peter Todd
2014-06-06 16:58         ` Gregory Maxwell
2014-06-06 17:05           ` Peter Todd
2014-06-06 17:10             ` Gregory Maxwell
2014-06-06 17:45               ` Peter Todd
2014-06-07 11:22                 ` Mike Hearn
2014-06-07 19:44                   ` Alan Reiner
2014-06-08 21:45                     ` Peter Todd
2014-06-10 10:41                       ` Mike Hearn
2014-06-08 21:35                   ` Peter Todd
2014-06-10 10:38                     ` Mike Hearn
2014-06-10 13:02                       ` Jeff Garzik
2014-06-10 17:08                       ` Peter Todd
2014-06-11  8:57                         ` Mike Hearn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox