* [Bitcoin-development] bloom filtering, privacy
@ 2015-02-20 12:44 Adam Back
  2015-02-20 16:18 ` Wladimir
  2015-02-20 16:54 ` Mike Hearn
  0 siblings, 2 replies; 19+ messages in thread

From: Adam Back @ 2015-02-20 12:44 UTC (permalink / raw)
To: Bitcoin Dev

I saw there was some discussion of this topic on the bitcoinj list. (I don't think I can post there without subscribing.) Someone had posted about the lack of privacy provided by the current implementation parameters and real-world factors, similar to what is described in this academic paper:
http://eprint.iacr.org/2014/763.pdf

Mike had posted a detailed response on the topic, on why it's complex and becomes bandwidth-inefficient to improve it usefully:
https://groups.google.com/forum/#!msg/bitcoinj/Ys13qkTwcNg/9qxnhwnkeoIJ

The basic summary, I think, is that it's not even intended to provide any practical privacy protection; it's just about compacting the query for a set of addresses.

So I was wondering: what about changing to committing a bloom filter of the addresses in the block? It seems surprising no one thought of it that way before (as it seems obvious when you hear it), but that seems to address the privacy issues, as the user can fetch the block bloom filters and then scan them in complete privacy. (Someone appeared on bitcoin-wizards IRC a while back and made this observation.)

From there it's a question of fetching the candidate TXOs.

Am I missing anything?

Adam

^ permalink raw reply	[flat|nested] 19+ messages in thread
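[Editorially, the committed-filter idea Adam sketches can be illustrated with a toy model. This is not any deployed protocol: the filter parameters, the bare SHA-256 commitment, and the address strings are all illustrative assumptions. A miner builds a bloom filter over the block's addresses and commits to it; the client verifies the filter against the commitment and scans its wallet entirely offline, so no node learns which addresses it cares about.]

```python
# Toy sketch of a committed per-block bloom filter (illustrative only).
import hashlib

class Bloom:
    def __init__(self, m_bits=8192, k=7):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # k index positions derived from k salted hashes
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Miner side: build the filter over the block's addresses, commit to it.
block_addresses = [b"addr-alice", b"addr-bob", b"addr-carol"]
f = Bloom()
for a in block_addresses:
    f.add(a)
commitment = hashlib.sha256(bytes(f.bits)).hexdigest()  # would go in the block

# Client side: check the downloaded filter against the commitment, then
# scan wallet addresses locally -- nothing about the wallet leaves the device.
assert hashlib.sha256(bytes(f.bits)).hexdigest() == commitment
wallet = [b"addr-bob", b"addr-dave"]
hits = [a for a in wallet if a in f]   # bob is a real hit; dave could only be a FP
```

The key property is that membership testing happens after download, so the query set is never revealed; false positives are resolved in the later fetch step the thread goes on to discuss.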
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 12:44 [Bitcoin-development] bloom filtering, privacy Adam Back
@ 2015-02-20 16:18 ` Wladimir
  2015-02-20 16:38   ` Tamas Blummer
  2015-02-20 16:54 ` Mike Hearn
  1 sibling, 1 reply; 19+ messages in thread

From: Wladimir @ 2015-02-20 16:18 UTC (permalink / raw)
To: Adam Back; +Cc: Bitcoin Dev

Hello Adam,

On Fri, 20 Feb 2015, Adam Back wrote:

> So I was wondering what about changing to committing a bloom filter of
> the addresses in the block. Its seems surprising no one thought of it
> that way before (as it seems obvious when you hear it) but that seems
> to address the privacy issues as the user can fetch the block bloom
> filters and then scan it in complete privacy. (Someone appeared on
> bitcoin wizards IRC a while back and made this observation.)

I have heard this idea of inverting the bloom filter before (possibly in #bitcoin-wizards), and as I see it, it would indeed improve privacy.

Apart from privacy, it would also lower the burden on nodes. A block scan with a bloom filter is effectively a cheap DoS on a node.

In addition, it would avoid the 'transaction withholding attack' that is possible with the current bloom filtering, at least if the filter is committed to in, e.g., the block header.

The drawback is overhead: the per-block bloom filter will have a significant size (to avoid false positives), and the client has to fetch the entire blocks that contain its transactions. I don't think that is so bad in practice; after all, the percentage of blocks that have transactions for a given wallet will generally be low, so the block size is amortized in a way. Of course, if the block size were increased, this would get worse.

Wladimir

^ permalink raw reply	[flat|nested] 19+ messages in thread
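[Wladimir's size concern can be quantified with the standard bloom-filter sizing formulas, m = -n·ln(p)/ln(2)² bits and k = (m/n)·ln(2) hash functions. The numbers below — 2000 distinct scriptPubKeys per block and a 0.01% false-positive target — are illustrative assumptions, not measured chain statistics.]

```python
# Standard bloom-filter parameter formulas, applied to a hypothetical block.
import math

def bloom_params(n_items: int, fp_rate: float):
    """Optimal filter size in bits and hash-function count for a target FP rate."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k

n = 2000    # assumed distinct scriptPubKeys in one block
p = 1e-4    # target false-positive rate
m_bits, k = bloom_params(n, p)
print(m_bits // 8, "bytes,", k, "hash functions")
```

Under these assumptions the per-block filter is roughly 4.8 kB — small next to a full block, but as Wladimir notes it is a fixed extra download for every block, and it grows linearly with block size.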
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 16:18 ` Wladimir
@ 2015-02-20 16:38   ` Tamas Blummer
  0 siblings, 0 replies; 19+ messages in thread

From: Tamas Blummer @ 2015-02-20 16:38 UTC (permalink / raw)
To: Wladimir; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 570 bytes --]

On Feb 20, 2015, at 5:18 PM, Wladimir <laanwj@gmail.com> wrote:
> On Fri, 20 Feb 2015, Adam Back wrote:
>
>> So I was wondering what about changing to committing a bloom filter of
>> the addresses in the block. Its seems surprising no one thought of it
>> that way before (as it seems obvious when you hear it) but that seems

Such a bloom filter was present in the Bits of Proof block store in its last public version, so the idea is obvious, but not new. It supported scanning for BIP32 addresses well, since the query set extends as the scan progresses.

Tamas Blummer

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 496 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 12:44 [Bitcoin-development] bloom filtering, privacy Adam Back
  2015-02-20 16:18 ` Wladimir
@ 2015-02-20 16:54 ` Mike Hearn
  2015-02-20 17:35   ` Adam Back
  2015-02-20 17:50   ` Gregory Maxwell
  1 sibling, 2 replies; 19+ messages in thread

From: Mike Hearn @ 2015-02-20 16:54 UTC (permalink / raw)
To: Adam Back; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2616 bytes --]

Hey Adam,

> Mike had posted a detailed response on the topic on why its complex
> and becomes bandwidth inefficient to improve it usefully.

To clarify, we *could* improve privacy and still preserve usefully high performance; it's just a lot of complicated programming work. You need to find out from the OS how much bandwidth you have to play with, for example, and do all the very complex tracking to surf the wave and keep yourself in roughly the right place.

> The basic summary of which I think is that its not even intended to
> provide any practical privacy protection, its just about compacting
> the query for a set of addresses.

The original intent of Bloom filtering was to allow both. We want our cake and we want to eat it.

The protocol can still do that, with sufficiently smart clients. The problem is that being sufficiently smart in this regard has never come to the top of the TODO list - users are always complaining about other things, so those are what get priority.

It's not IMO a protocol issue per se. It's a code complexity and manpower issue.

> Its seems surprising no one thought of it
> that way before (as it seems obvious when you hear it) but that seems
> to address the privacy issues as the user can fetch the block bloom
> filters and then scan it in complete privacy.

And then what? So you know the block matches. But with reasonable FP rates every block will match at least a few transactions (this is already the case - the FP rate is low but high enough that we get back FPs on nearly every block). So you end up downloading every block? That won't work.

Eventually, wallets need to stop doing linear scans of the entire block chain to find tx data. That worked fine when blocks were 10kb; it's still working OK even though we scaled through two orders of magnitude, but we can imagine that if we reach 10mb blocks this whole approach will just be too slow.

The main reason wallets are scanning the chain today (beyond the lack of protocol support for querying the UTXO set by script) is that they want to show users time-ordered lists of transactions. Financial apps should show you payment histories, everyone knows this, and without knowing roughly when a tx happened and which inputs/outputs were mine, providing a useful rendering is hard. Even with this data the UI is pretty useless, but at least it's not actually missing.

By combining Subspace and BIP70 we can finally replace the payments-list UI with proper metadata that isn't extracted from the block chain, and at that point non-scanning architectures become a lot more deployable.

[-- Attachment #2: Type: text/html, Size: 3336 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
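[Mike's claim that "every block will match at least a few transactions" follows from simple arithmetic: with a per-element false-positive rate p and n filter probes per block, the probability a block produces no false positive at all is (1-p)^n. The values of p and n below are illustrative assumptions, not actual bitcoinj parameters.]

```python
# Why FP matches occur on nearly every block: (1-p)^n shrinks fast in n.
p = 0.001    # assumed per-element false-positive rate of the wallet's filter
n = 4000     # assumed number of filter probes per block (inputs + outputs)

prob_clean_block = (1 - p) ** n   # chance of zero false positives in a block
expected_fps = p * n              # mean false positives per block

print(f"P(no FP in block) = {prob_clean_block:.4f}, "
      f"expected FPs per block = {expected_fps:.1f}")
```

Even with these modest numbers only about 2% of blocks come back clean, which is Mike's point: a per-block pre-filter whose FP rate is tuned for privacy ends up flagging essentially every block for download.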
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 16:54 ` Mike Hearn
@ 2015-02-20 17:35   ` Adam Back
  2015-02-20 17:43     ` Mike Hearn
  2015-02-20 17:50   ` Gregory Maxwell
  1 sibling, 1 reply; 19+ messages in thread

From: Adam Back @ 2015-02-20 17:35 UTC (permalink / raw)
To: Mike Hearn; +Cc: Bitcoin Dev

Mike Hearn wrote:
> Adam Back wrote:
>> Its seems surprising no one thought of it
>> that way before (as it seems obvious when you hear it) but that seems
>> to address the privacy issues as the user can fetch the block bloom
>> filters and then scan it in complete privacy.
>
> And then what? So you know the block matches. But with reasonable FP
> rates every block will match at least a few transactions (this is already
> the case - the FP rate is low but high enough that we get back FPs on
> nearly every block). So you end up downloading every block?

I mean that because the user is scanning locally, he can binary-search which subset of addresses from his wallet is possibly in the block, and then request those specific addresses. Some will be false positives and some real, but with the bloom commitment (and a UTXO commitment organised as a trie) he can verify that the positive hits are correct via the merkle path, and that the false positives are not being wrongly withheld, by obtaining a merkle-path proof that they are not in the trie.

Adam

On 20 February 2015 at 16:54, Mike Hearn <mike@plan99.net> wrote:
> Hey Adam,
>
>> Mike had posted a detailed response on the topic on why its complex
>> and becomes bandwidth inefficient to improve it usefully.
>
> To clarify, we could improve privacy and still preserve usefully high
> performance, it's just a lot of complicated programming work. You need to
> find out from the OS how much bandwidth you have to play with, for example,
> and do all the very complex tracking to surf the wave and keep yourself in
> roughly the right place.
>
>> The basic summary of which I think is that its not even intended to
>> provide any practical privacy protection, its just about compacting
>> the query for a set of addresses.
>
> The original intent of Bloom filtering was to allow both. We want our cake
> and we want to eat it.
>
> The protocol can still do that, with sufficiently smart clients. The problem
> is that being sufficiently smart in this regard has never come to the top of
> the TODO list - users are always complaining about other things, so those
> things are what gets priority.
>
> It's not IMO a protocol issue per se. It's a code complexity and manpower
> issue.
>
>> Its seems surprising no one thought of it
>> that way before (as it seems obvious when you hear it) but that seems
>> to address the privacy issues as the user can fetch the block bloom
>> filters and then scan it in complete privacy.
>
> And then what? So you know the block matches. But with reasonable FP rates
> every block will match at least a few transactions (this is already the case
> - the FP rate is low but high enough that we get back FPs on nearly every
> block). So you end up downloading every block? That won't work.
>
> Eventually, wallets need to stop doing linear scans of the entire block
> chain to find tx data. That worked fine when blocks were 10kb, it's still
> working OK even though we scaled through two orders of magnitude, but we can
> imagine that if we reach 10mb blocks then this whole approach will just be
> too slow.
>
> The main reason wallets are scanning the chain today (beyond lack of
> protocol support for querying the UTXO set by script), is that they want to
> show users time-ordered lists of transactions. Financial apps should show
> you payment histories, everyone knows this, and without knowing roughly when
> a tx happened and which inputs/outputs were mine, providing a useful
> rendering is hard. Even with this data the UI is pretty useless, but at
> least it's not actually missing.
>
> By combining Subspace and BIP70 we can finally replace the payments list UI
> with actual proper metadata that isn't extracted from the block chain, and
> at that point non-scanning architectures become a lot more deployable.

^ permalink raw reply	[flat|nested] 19+ messages in thread
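[The "verify the positive hits via the merkle path" step Adam relies on is ordinary SPV inclusion proving. A minimal sketch, with simplifications: single SHA-256 rather than Bitcoin's double-SHA-256 and byte-order conventions, and toy byte strings standing in for transactions.]

```python
# Minimal merkle inclusion proof: prove a leaf is under a committed root.
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])               # duplicate last, as Bitcoin does
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(leaves, index):
    """Sibling hashes (with side flags) proving leaves[index] is in the tree."""
    level, path = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        path.append((level[sib], sib < index))    # flag: sibling is on the left
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify(leaf, path, root):
    acc = h(leaf)
    for sibling, sib_is_left in path:
        acc = h(sibling + acc) if sib_is_left else h(acc + sibling)
    return acc == root

txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(txs)            # this is what the block header commits to
proof = merkle_path(txs, 2)        # full node supplies tx2 plus this path
assert verify(b"tx2", proof, root)
```

The path is O(log n) hashes, which is why serving "transaction + merkle path" is cheap for the full node and trustlessly checkable by the client.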
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 17:35   ` Adam Back
@ 2015-02-20 17:43     ` Mike Hearn
  2015-02-20 17:59       ` Adam Back
  0 siblings, 1 reply; 19+ messages in thread

From: Mike Hearn @ 2015-02-20 17:43 UTC (permalink / raw)
To: Adam Back; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

Ah, I see; I didn't catch that this scheme relies on UTXO commitments (presumably with Mark's PATRICIA tree system?).

If you're doing a binary search over block contents, does that imply multiple protocol round trips per synced block? I'm still having trouble visualising how this works. Perhaps you could write down an example run for me.

How does it interact with the need to download chains rather than individual transactions, and to do so without round-tripping to the remote node for each block? Bloom filtering currently pulls down blocks in batches without much client/server interaction, and that is useful for performance.

Like I said, I'd rather just junk the whole notion of chain scanning and get to a point where clients are only syncing headers. If nodes were calculating a script->(outpoint, merkle branch) map in LevelDB and allowing range queries over it, then you could quickly pull down relevant UTXOs along with the paths that indicate they did at one point exist. Nodes can still withhold evidence that those outputs were spent, but the same is true today and in practice this doesn't seem to be an issue.

The primary advantage of that approach is that it does not require a change to the consensus rules. But there are lots of unanswered questions about how it interacts with HD lookahead and so on.

[-- Attachment #2: Type: text/html, Size: 1560 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 17:43     ` Mike Hearn
@ 2015-02-20 17:59       ` Adam Back
  2015-02-20 18:10         ` Mike Hearn
  2015-02-20 18:20         ` Gregory Maxwell
  0 siblings, 2 replies; 19+ messages in thread

From: Adam Back @ 2015-02-20 17:59 UTC (permalink / raw)
To: Mike Hearn; +Cc: Bitcoin Dev

The idea is not mine; some random guy appeared in #bitcoin-wizards one day and said something about it, and lots of people reacted: wow, why didn't we think of that before.

It goes something like this: each block contains a commitment to a bloom filter that has all of the addresses in the block stored in it. Now the user downloads the headers and bloom data for all blocks. They know the bloom data is correct in an SPV sense because of the commitment. They can scan it offline and locally by searching for addresses from their wallet in it. (I'm not sure offhand what the most efficient strategy is; it's probably pretty fast locally anyway.) Now they know, modulo false positives, which of their addresses may be in the block.

So now they ask a full node for merkle paths + transactions for the addresses from the UTXO set from the block(s) that it was found in.

Separately, UTXO commitments could optionally be combined to improve security in two ways:

- the normal SPV increase: you can also see that the transaction is actually in the last block's UTXO set.

- to avoid withholding by the full node: if the UTXO commitment is a trie (sorted), the client can expect merkle paths to the lexically adjacent nodes on either side of where the claimed missing address would be, as a proof that there really are no transactions for that address in the block (distinguishing a false positive from node withholding).

Adam

On 20 February 2015 at 17:43, Mike Hearn <mike@plan99.net> wrote:
> Ah, I see, I didn't catch that this scheme relies on UTXO commitments
> (presumably with Mark's PATRICIA tree system?).
>
> If you're doing a binary search over block contents then does that imply
> multiple protocol round trips per synced block? I'm still having trouble
> visualising how this works. Perhaps you could write down an example run for
> me.
>
> How does it interact with the need to download chains rather than individual
> transactions, and do so without round-tripping to the remote node for each
> block? Bloom filtering currently pulls down blocks in batches without much
> client/server interaction and that is useful for performance.
>
> Like I said, I'd rather just junk the whole notion of chain scanning and get
> to a point where clients are only syncing headers. If nodes were calculating
> a script->(outpoint, merkle branch) map in LevelDB and allowing range
> queries over it, then you could quickly pull down relevant UTXOs along with
> the paths that indicated they did at one point exist. Nodes can still
> withhold evidence that those outputs were spent, but the same is true today
> and in practice this doesn't seem to be an issue.
>
> The primary advantage of that approach is it does not require a change to
> the consensus rules. But there are lots of unanswered questions about how it
> interacts with HD lookahead and so on.

^ permalink raw reply	[flat|nested] 19+ messages in thread
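[The withholding defence Adam describes — proving an address is *absent* from a sorted commitment by revealing the two lexically adjacent committed entries that bracket it — can be sketched over a sorted merkle tree. This is a simplification: a real design would use a trie (e.g. a PATRICIA tree), handle queries outside the key range, and use Bitcoin's hashing conventions; none of that is shown here.]

```python
# Non-membership proof over a sorted merkle commitment (toy sketch).
import hashlib

def h(b): return hashlib.sha256(b).digest()

def build_levels(leaves):
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1][:]
        if len(lvl) % 2:
            lvl.append(lvl[-1])
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def path(levels, idx):
    p = []
    for lvl in levels[:-1]:
        lvl = lvl[:] + ([lvl[-1]] if len(lvl) % 2 else [])
        sib = idx ^ 1
        p.append((lvl[sib], sib < idx))   # flag: sibling is on the left
        idx //= 2
    return p

def verify(leaf, pth, root):
    acc = h(leaf)
    for sib, left in pth:
        acc = h(sib + acc) if left else h(acc + sib)
    return acc == root

def index_of(pth):
    # recover the leaf index from the proof's side flags
    return sum(1 << lvl for lvl, (_, is_left) in enumerate(pth) if is_left)

addrs = sorted([b"addr-a", b"addr-c", b"addr-e", b"addr-g"])   # sorted commitment
levels = build_levels(addrs)
root = levels[-1][0]

def prove_absent(query):
    i = next(j for j, a in enumerate(addrs) if a > query)   # first entry > query
    return (addrs[i - 1], path(levels, i - 1)), (addrs[i], path(levels, i))

def check_absent(query, proof, root):
    (lo_leaf, lo_p), (hi_leaf, hi_p) = proof
    return (lo_leaf < query < hi_leaf
            and index_of(hi_p) == index_of(lo_p) + 1   # truly adjacent leaves
            and verify(lo_leaf, lo_p, root)
            and verify(hi_leaf, hi_p, root))

proof = prove_absent(b"addr-d")
assert check_absent(b"addr-d", proof, root)
```

The adjacency check is what makes the proof sound: because the committed entries are sorted and the two revealed leaves are consecutive, nothing for the queried address can exist between them, so a false positive is distinguishable from withholding.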
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 17:59       ` Adam Back
@ 2015-02-20 18:10         ` Mike Hearn
  0 siblings, 0 replies; 19+ messages in thread

From: Mike Hearn @ 2015-02-20 18:10 UTC (permalink / raw)
To: Adam Back; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

> So now they ask a full node for merkle paths + transactions for the
> addresses from the UTXO set from the block(s) that it was found in.

This is the part where I get lost. How does this improve privacy? If I have to specify which addresses are mine in this block to get the tx data, the node learns which addresses are mine at this point, no?

Also, are you saying each block needs a record of the entire UTXO set at the time the block was made? I'm not sure how to parse this sentence.

Could you please walk me through precisely what happens and what data is sent, once I learn that a block has interesting data in it?

[-- Attachment #2: Type: text/html, Size: 934 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 17:59       ` Adam Back
  2015-02-20 18:10         ` Mike Hearn
@ 2015-02-20 18:20         ` Gregory Maxwell
  2015-02-20 19:03           ` Mike Hearn
  1 sibling, 1 reply; 19+ messages in thread

From: Gregory Maxwell @ 2015-02-20 18:20 UTC (permalink / raw)
To: Adam Back; +Cc: Bitcoin Dev

On Fri, Feb 20, 2015 at 5:59 PM, Adam Back <adam@cypherspace.org> wrote:
> So now they ask a full node for merkle paths + transactions for the
> addresses from the UTXO set from the block(s) that it was found in.

Use of the words "UTXO set" here is probably confusing people, as it's likely to make them think of the complete verification state. In this case it's simply referring to block-local data (and thus avoids the large IO overheads of an actual UTXO set).

It's a straightforward idea: there is a scriptpubkey bitmap per block which is committed. Users can request the map and be SPV-confident that they received a faithful one. If there are hits, they can request the block and be confident there was no censoring.

It's possible to tree-structure additional layers onto the bitmap, so one can query incrementally to trade off map size against false-request overhead; it's not clear to me how much of a win this would be for normal parameters.

It's also possible to extract the txout list for the whole block and commit to that too, so it's possible to request just the outputs and get a faithful copy of them, which is _much_ smaller than the block overall.

^ permalink raw reply	[flat|nested] 19+ messages in thread
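[Greg's "tree structured additional layers" can be sketched as a two-level filter hierarchy: a small committed top-level filter over the whole block plus finer filters per chunk of transactions. A client fetches the cheap top layer first and requests a chunk's finer filter only on a hit, trading map size against false-request overhead. All parameters and the chunking are illustrative assumptions.]

```python
# Two-level filter hierarchy: query the coarse layer, descend only on hits.
import hashlib

def positions(item, m, k=4):
    for i in range(k):
        d = hashlib.sha256(bytes([i]) + item).digest()
        yield int.from_bytes(d[:4], "big") % m

def make_filter(items, m):
    bits = bytearray(m)                 # one byte per bit, for clarity
    for it in items:
        for p in positions(it, m):
            bits[p] = 1
    return bits

def member(item, bits):
    return all(bits[p] for p in positions(item, len(bits)))

# A block's scriptpubkeys split into 3 chunks of transactions.
block = [[b"s1", b"s2"], [b"s3", b"s4"], [b"s5", b"s6"]]
top = make_filter([s for chunk in block for s in chunk], 256)   # small, fetched always
per_chunk = [make_filter(chunk, 128) for chunk in block]        # fetched on demand

def query(script):
    if not member(script, top):
        return []            # one cheap download gives a definite miss
    return [i for i, f in enumerate(per_chunk) if member(script, f)]

print(query(b"s4"))          # contains chunk index 1, plus any FP chunks
```

A miss at the top layer costs one tiny download; only the (rare) hits pay for the deeper layers, which is the size-versus-overhead trade-off Greg describes.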
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 18:20         ` Gregory Maxwell
@ 2015-02-20 19:03           ` Mike Hearn
  2015-02-21  5:12             ` Adam Back
  0 siblings, 1 reply; 19+ messages in thread

From: Mike Hearn @ 2015-02-20 19:03 UTC (permalink / raw)
To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2713 bytes --]

> It's a straight forward idea: there is a scriptpubkey bitmap per block
> which is committed. Users can request the map, and be SPV confident
> that they received a faithful one. If there are hits, they can request
> the block and be confident there was no censoring.

OK, I see now, thanks Gregory. You're right, the use of "UTXO set" in that context was confusing me.

If I go back to when we first did Bloom filtering and imagine the same proposal being made, I guess I would have been wondering about the following issues. Perhaps they have solutions:

1. Miners have to upgrade to generate the per-block filters. Any block that doesn't have such a filter still has to be downloaded in full. So it would have taken quite a while for the bandwidth savings to materialise.

2. If checking the filter isn't a consensus rule, any miner who builds a wrong filter breaks the entire SPV userbase. With per-node filtering, a malicious or broken node breaks an individual sync, but if the wallet user notices they don't seem to be properly synchronised they can just press "Rescan chain" and most likely get fixed. In practice broken nodes have never been reported, but it's worth considering.

3. Downloading full blocks is still a lot of data. If you have a wallet that receives tips a couple of times per day, and you open your wallet once per week, then with 1mb blocks you would be downloading ~14mb of data each time. Pretty pokey even on a desktop. Much sadness if you're on mobile.

4. Header size is constant, but per-block filters wouldn't be. So even the null sync would download more data as the network got busier. Of course Bloom filtering has the same scaling problem, but only between hard disk -> RAM -> CPU, rather than across the network.

5. Is it really more private? Imagine we see a hit in block 100, so we download the full block and find our transaction. But now we must also learn when that transaction is spent, so we can keep the wallet-local UTXO set up to date. So we scan forward and see another hit in block 105, so now we download block 105. The remote node can now select all transactions in block 105 that spend transactions in block 100 and narrow down which txs are ours. If we are watching a wallet that does multi-sends, then it feels like this problem gets much worse.

I'd really like to find a solution that has O(1) scaling on sync. If we see nodes just as sources of UTXO data, then shovelling the client (tx, existing merkle path) pairs keyed off script prefixes would (with one additional index) be O(1) and provide the same security guarantees as Bloom filtering today. It creates lots of other problems to solve, but at least it would scale into the foreseeable future.

[-- Attachment #2: Type: text/html, Size: 3229 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread
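[Mike's O(1) alternative — a sorted script -> (outpoint, merkle branch) index served via range queries over script *prefixes* — can be sketched with an in-memory sorted list; a node would use LevelDB as he says. The index layout, the 2-byte prefix, and all keys/values below are assumptions for illustration, not a specified protocol. Querying a shared prefix rather than an exact script is what gives the client cover traffic.]

```python
# Prefix range query over a sorted script -> (outpoint, merkle branch) index.
import bisect

# Sorted (script, (outpoint, merkle_branch)) pairs, as a node might index them.
index = sorted([
    (b"\x51\x02-scriptA", ("txid1:0", "<branch1>")),
    (b"\x51\x03-scriptB", ("txid2:1", "<branch2>")),
    (b"\x76\xa9-scriptC", ("txid3:0", "<branch3>")),
])
keys = [k for k, _ in index]

def range_query(prefix: bytes):
    """Return every (outpoint, branch) whose script starts with prefix."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + b"\xff" * 30)  # upper bound of the bucket
    return [v for _, v in index[lo:hi]]

# A client whose script starts with 0x51 0x02 asks for that whole bucket:
# it receives its own entry plus any unrelated scripts sharing the prefix.
print(range_query(b"\x51\x02"))
```

A shorter prefix means a bigger bucket: more bandwidth, more plausible deniability. That is the same bandwidth/privacy dial as the bloom FP rate, but the cost is O(bucket size) per query rather than a scan of the chain.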
* Re: [Bitcoin-development] bloom filtering, privacy
  2015-02-20 19:03           ` Mike Hearn
@ 2015-02-21  5:12             ` Adam Back
  2015-02-21 13:28               ` Mike Hearn
  0 siblings, 1 reply; 19+ messages in thread

From: Adam Back @ 2015-02-21 5:12 UTC (permalink / raw)
To: Mike Hearn; +Cc: Bitcoin Dev

It seems Greg and I may be saying different things. Maybe I am misunderstanding something at the wire level or in size/scalability, but what I was trying to say is, I think, simpler.

By UTXO commitment I mean a merkle tree of unspent addresses is committed in each block. Also, a bloom filter containing the addresses in the block is committed.

Now the user downloads the bloom filter for each block and searches it locally. They see which of their addresses may be in the block (with some false positives). Then they make fresh random connections to different nodes and request download of the respective individual transactions from the full node. The node can respond with either:

a) here is the transaction and here is its merkle path in the merkle tree (this is the way SPV works today); or

b) there is no such transaction, this is a false positive, and here is a pair of merkle trie paths in the UTXO commitment (a trie) that proves the full node is not withholding and it's true that no such transaction is in the block.

Additionally, with UTXO commitments, in case a) the user can keep up to date with the chain tip and request from the full node a merkle path in the UTXO commitment to show that the coin is still unspent. (Otherwise you get long-range attacks, where someone can grind away until they belatedly find a PoW on an old low-hashrate block with the UTXO and fool an SPV node they know the address for into accepting a spend of something long since spent.)

About privacy: the node can make different random connections to different nodes to fetch addresses. Nothing is leaked by downloading the bloom filter. Scanning happens locally. The full node can't correlate the addresses as belonging to the same person by correlating the download requests for them, because they are made via different nodes. It's not a surprise, nor privacy-revealing, that someone would want to check receipt of funds. The limit is the interactive nature of Tor, which isn't very secure against pervasive monitoring. But for basic, full-node-passive-attack-resistant privacy, pretty good.

Contrast with increasing the false positive rate on bloom queries: here the full node can test correlation (modulo the false positive ratio) and combine that with network flow analysis to narrow down who the user might be. Plus query size and privacy are in conflict. Plus the query size has to be continually retuned to even create a reliable false positive rate relative to the current UTXO set. Is that even happening now (other than parameter sets)?

About the bitmap:

> Greg Maxwell wrote:
>> there is a scriptpubkey bitmap per block
>> which is committed. Users can request the map, and be SPV confident
>> that they received a faithful one. If there are hits, they can request
>> the block and be confident there was no censoring.

How does the SPV client know what the bits in this map mean, in order to scan it? I presume these would be one bit per address, and one would need to know the full UTXO set in order to know what's in there. I am not sure an SPV node would want the hit of keeping up to date with the full UTXO set?

(s/address/scriptpubkey/ throughout, for accuracy.)

Adam

On 20 February 2015 at 19:03, Mike Hearn <mike@plan99.net> wrote:
>> It's a straight forward idea: there is a scriptpubkey bitmap per block
>> which is committed. Users can request the map, and be SPV confident
>> that they received a faithful one. If there are hits, they can request
>> the block and be confident there was no censoring.
>
> OK, I see now, thanks Gregory. You're right, the use of UTXO set in that
> context was confusing me.
>
> If I go back to when we first did Bloom filtering and imagine the same
> proposal being made, I guess I would have been wondering about the following
> issues. Perhaps they have solutions:
>
> 1. Miners have to upgrade to generate the per-block filters. Any block that
> doesn't have such a filter has to be downloaded in full, still. So it would
> have taken quite a while for the bandwidth savings to materialise.
>
> 2. If checking the filter isn't a consensus rule, any miner who builds a
> wrong filter breaks the entire SPV userbase. With per-node filtering, a
> malicious or wrong node breaks an individual sync, but if the wallet user
> notices they don't seem to be properly synchronised they can just press
> "Rescan chain" and most likely get fixed. In practice broken nodes have
> never been reported, but it's worth considering.
>
> 3. Downloading full blocks is still a lot of data. If you have a wallet that
> receives tips a couple of times per day, and you open your wallet once per
> week, then with 1mb blocks you would be downloading ~14mb of data each time.
> Pretty pokey even on a desktop. Much sadness if you're on mobile.
>
> 4. Header size is constant, but per-block filters wouldn't be. So even the
> null sync would download more data as the network got busier. Of course
> Bloom filtering has the same scaling problem, but only between hard disk ->
> RAM -> CPU rather than across the network.
>
> 5. Is it really more private? Imagine we see a hit in block 100, so we
> download the full block and find our transaction. But now we must also learn
> when that transaction is spent, so we can keep the wallet-local UTXO set up
> to date. So we scan forward and see another hit in block 105, so now we
> download block 105. The remote node can now select all transactions in block
> 105 that spend transactions in block 100 and narrow down which txs are ours.
> If we are watching a wallet that does multi-sends then it feels like this
> problem gets much worse.
>
> I'd really like to find a solution that has O(1) scaling on sync. If we see
> nodes just as sources of UTXO data then shovelling the client (tx, existing
> merkle path) pairs keyed off script prefixes would (with one additional
> index) be O(1) and provide the same security guarantees as Bloom filtering
> today. It creates lots of other problems to solve, but at least it would
> scale into the forseeable future.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-21 5:12 ` Adam Back @ 2015-02-21 13:28 ` Mike Hearn 2015-02-21 14:30 ` Adam Back 0 siblings, 1 reply; 19+ messages in thread From: Mike Hearn @ 2015-02-21 13:28 UTC (permalink / raw) To: Adam Back; +Cc: Bitcoin Dev [-- Attachment #1: Type: text/plain, Size: 5333 bytes --] Let's put the UTXO commitments/anti-fraud proofs to one side for a moment. I would like to see them happen one day, but they aren't critical to these protocols and are just proving to be a distraction. > Then they make fresh random connections to different nodes and request > download of the respective individual transactions from the full node. > ... About privacy the node can make different random connections to > different nodes to fetch addresses ..... The full node cant > correlate the addresses as belonging to the same person by correlating > the download requests for them, because they are made via different > nodes. Apologies for the wall of text, but I don't think this will work nor solve any real problem. And I must justify such a strong statement clearly. *First: technical issues* When you download the per-block Bloom filter and test, what you get back is a set of script elements (addresses, keys, OP_RETURN tags etc). But then in the next step you are saying that you connect to random peers and request individual transactions. We don't know that at this point. All we know are a set of addresses that possibly matched. So I think what you mean is "wallets connect to random peers and request transactions in block N that match a given set of addresses". This is what Bloom filtering already does, of course. Doing the test against the per-block filter first doesn't seem to buy us much because with thousands of transactions per block, even a very tiny FP rate will still trigger a match on every single one. 
The second problem I see is that we can't do this in parallel because of the following edge case: wallet contains key K and someone sends it money using an OP_CHECKSIG output. The input which spends this output does not contain any predictable data, thus we do not know what to look for in the following blocks to detect a spend of it until we have seen the first transaction and know its hash. In practice this means we must either scan through the chain in sequence and update our matching criteria if we see such an output (this is what the Bloom filtering protocol already does server-side), or we must constrain the user such that output scripts always force repetition of predictable data - this is what mostly happens today due to pay-to-address outputs, but not always, and correctness is more important than completeness. If we can't do it in parallel then we must suffer a node round-trip for every single block we traverse, because we can't request long runs of blocks with a single command. That latency will kill performance dead. It's a non starter. But let's imagine we don't care about OP_CHECKSIG outputs and are willing to ignore them. There are cases where they are the best and most efficient technical solution, but let's put that to one side. The primary difference after making the above changes are that no one node gets a filter containing *all* our keys and addresses. I don't think a per block pre-test filter would gain us much efficiency so from a privacy perspective this is what it boils down to - sharding of the scan. But we can already do this with the current Bloom filtering protocol. BitcoinJ doesn't do so because having multiple parallel scans uses up network IOPs which are a resource of unknown quantity, and because stepping through the chain in parallel with multiple peers complicates the chain sync implementation quite a bit. *Second: this doesn't solve any real problem* Who cares about collecting Bloom filters off the wire? Commercial fraudsters? 
Doubtful. There are much easier ways to steal money.

Spies? Yes! Without a doubt NSA/GCHQ are building or have built databases of IP addresses to Bitcoin addresses and are correlating it via XKEYSCORE with other identifiable information.

However, just requesting data from different nodes doesn't help with that, because they are doing DPI and can still see all the connections, so can still combine all the filters or received transactions.

Ah, you say, but we're requesting everything via Tor.

Yes, about that. We've implemented that already. Some wallets even use it by default, like Alon & Chris' Bitcoin Authenticator wallet. It's just one line of code to activate. Unfortunately there are severe practical problems to using Tor:

1. If you don't have a warm consensus then booting it up is very slow. We're already slower than our competitors like blockchain.info and VISA/MasterCard; we can't make this any worse. This one is possibly not that big a deal and can be solved with more technical tricks.

2. Bitcoin Core's DoS strategy means anyone can block all of Tor quite trivially. So we'd need some complicated fallback mechanism to disable Tor remotely, in case someone did this.

3. Bitcoin wire traffic isn't encrypted or authenticated, so it makes it much easier for trolls to tamper with lots of wire traffic at once, whereas without Tor it's much harder.

Let's ignore the fact that the Tor project insists on poking the law enforcement bear with rusty nails, and has been receiving tipoffs about plans to seize directory authorities. How much Bitcoin wallets should rely on Tor sticking around is a debate for some other time.

There's a much simpler way to fix all of this - add opportunistic encryption to the wire protocol.

[-- Attachment #2: Type: text/html, Size: 6413 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
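The sequential scan Mike describes above - updating the match criteria whenever a matching output is found, so that its otherwise-unpredictable spending input can be recognised later - can be sketched as follows. The transaction representation (dicts with hash, inputs as outpoints, outputs as scripts) is a simplified stand-in, not the real wire format:

```python
# Sequential chain scan: when an output pays to a watched script, add
# its outpoint to the watch set so the later spend (whose input data is
# unpredictable in advance) can be detected. Blocks must be processed
# in chain order, which is why this cannot be parallelised naively.
def scan_chain(blocks, watched_scripts):
    matches = []
    watched_outpoints = set()
    for block in blocks:                       # must be in chain order
        for tx in block:
            hit = False
            for i, script in enumerate(tx["outputs"]):
                if script in watched_scripts:
                    hit = True
                    watched_outpoints.add((tx["hash"], i))  # watch for spend
            for outpoint in tx["inputs"]:
                if outpoint in watched_outpoints:
                    hit = True                 # spend of an earlier match
            if hit:
                matches.append(tx["hash"])
    return matches

# Tiny example: tx A pays watched script "K"; tx B later spends it.
blocks = [
    [{"hash": "A", "inputs": [], "outputs": ["K"]}],
    [{"hash": "B", "inputs": [("A", 0)], "outputs": ["other"]}],
]
print(scan_chain(blocks, {"K"}))  # ['A', 'B']
```

Note that tx B is found only because the scan of the earlier block added outpoint (A, 0) to the watch set - exactly the dependency that forces the round trip per block.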
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-21 13:28 ` Mike Hearn @ 2015-02-21 14:30 ` Adam Back 2015-02-21 14:45 ` Mike Hearn 0 siblings, 1 reply; 19+ messages in thread
From: Adam Back @ 2015-02-21 14:30 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev

If you want to be constructive and index transactions that are not p2sh but non-simple and contain a checksig so the address is visible, you could do that with a block bloom filter also.

I wasn't sure if the comments about the need to batch requests were about downloading headers & filters, or about transactions. There is no harm in downloading headers & bloom filters without Tor - no identity or addresses are revealed by doing so. So over Tor you would just be fetching transactions that match the address. For downloading transactions, unless you frequently receive transactions you won't be fetching every block. Or are you assuming bloom filters dialled up to the point of huge false positives? You said otherwise.

Mid-term I'd say you want some basic request tunneling as part of bitcoin, that maybe isn't Tor, to avoid sharing its fate if Tor controversies are a risk to the Tor service. Some of the bitcoin-Tor specific weak points could maybe then be addressed.

Relatedly, I think bitcoin could do with a store-and-forward message bus with privacy and strong reliability via redundancy (though less redundancy than consensus, where all nodes must receive, agree and store forever). That provides an efficient store-and-forward SPV-receivable stealth-address solution that doesn't suck: send the recipient their payment, and if they like it they broadcast it themselves. As a bonus, store-and-forward message mixes are better able to provide meaningful network privacy than interactive privacy networks.
You could spend over the same channel, too.

You seem to be saying at one point that Tor is useless against the pervasive-eavesdropper threat model (which I am not sure I agree with - minimally it makes them work for the info and adds uncertainty, and while I have not been paying super close attention I think some of the Snowden releases suggest Tor is a net win), and secondly that other types of attackers are disinterested (how do we know that?), or maybe that you don't care about privacy against them (maybe some users do!).

It would certainly be nice to get real privacy from a wider range of attackers, but doing nothing (the current situation) is clearly worse; using block bloom filters we'd make the pervasive case harder work, and the nosy full node would learn nothing.

Adam

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-21 14:30 ` Adam Back @ 2015-02-21 14:45 ` Mike Hearn 0 siblings, 0 replies; 19+ messages in thread
From: Mike Hearn @ 2015-02-21 14:45 UTC (permalink / raw) To: Adam Back; +Cc: Bitcoin Dev
[-- Attachment #1: Type: text/plain, Size: 2757 bytes --]

> For downloading transactions unless you frequently receive
> transactions you wont be fetching every block. Or are you assuming
> bloom filters dialled up to the point of huge false positives? You
> said otherwise.

Well, what I mean is, bitcoinj already gets criticised for having very low FP rates, but even with those rates we're applying them to hundreds of thousands of transactions per sync. So it's still enough FPs to trigger at least one per block, often several, yet people tell us this isn't enough to give meaningful privacy.

> Relatedly I think bitcoin could do with a store-and-forward message
> bus with privacy and strong reliability via redundancy (but less
> redundancy maybe than consensus all-nodes must receiving and agree and
> store forever).

Yup, see here:

https://www.bitcoinauthenticator.org/subspace/
https://groups.google.com/forum/#!topic/bitcoinj/_S15jo5mcDI

Subspace looks like it's developing into what we need.

> You seem to be saying at one point that Tor is useless against
> pervasive eavesdropper threat model

No, Tor is effective in that threat model. What I meant is that without Tor, someone doing wire intercepts isn't going to be fazed by using multiple peers together, and with Tor it's not clear that syncing from multiple peers in parallel gives much of an additional win. Also, getting Tor practical enough to activate by default is tricky, though the same people who are doing Subspace are trying it out to see what happens.

> secondly that other types of attackers are disinterested (how do we know
> that?) or maybe that you dont care about privacy vs them (maybe some users do!)
Some of my opinions are based on experience of HTTPS deployments, where many of the same issues apply.

> It would certainly be nice to get real privacy from a wider range of
> attackers but nothing (current situation) is clearly worse; using
> block bloom filters we'd make the pervasive case harder work, and the
> nosy full node learn nothing.

Yes, but what's the best way to fix that? The calculation goes like this: we have ~80 hours of hacking time to spend on privacy this quarter. Do we:

a) Do wire encryption
b) Make Bloom filter clients smarter
c) Optimise Tor
d) Do a new PIR protocol from scratch, and possibly run out of time having failed to launch

Of these, (d) is the least appealing to me, especially because I don't feel like submitting SPV-related stuff to Bitcoin Core any more. If I were to work on the protocol it'd be in the context of Bitcoin XT, which rules out consensus changes or other things that rely on miners.

Wire encryption would probably raise the bar for our spooky friends quite a lot, with minimal effort. The ROI looks good, compared to more complex PIR.

[-- Attachment #2: Type: text/html, Size: 4283 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-20 16:54 ` Mike Hearn 2015-02-20 17:35 ` Adam Back @ 2015-02-20 17:50 ` Gregory Maxwell 2015-02-20 17:53 ` Mike Hearn 1 sibling, 1 reply; 19+ messages in thread
From: Gregory Maxwell @ 2015-02-20 17:50 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev

On Fri, Feb 20, 2015 at 4:54 PM, Mike Hearn <mike@plan99.net> wrote:
> And then what? So you know the block matches. But with reasonable FP rates
> every block will match at least a few transactions (this is already the case

This approach needs a filter set with a lower FP rate. It doesn't depend on having a high FP rate for privacy (which is good, since counting on filter false positives seems to more or less fail to deliver actual privacy in any case). Larger filters mean a somewhat higher baseline bandwidth, though when users do not reuse addresses and have more addresses than there are txouts in the block, the gap is narrower.

> Ah, I see, I didn't catch that this scheme relies on UTXO commitments

This is talking about a committed bloom filter, not a committed UTXO set.

^ permalink raw reply [flat|nested] 19+ messages in thread
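The size/FP-rate tradeoff Gregory refers to follows the standard Bloom filter sizing formula, m = -n·ln(p)/(ln 2)^2 bits for n elements at false-positive rate p, or about 1.44·log2(1/p) bits per element. A short sketch (the 7,500-element block and 0.1% rate are illustrative assumptions):

```python
import math

# Optimal Bloom filter size in bits for n elements at FP rate p:
#   m = -n * ln(p) / (ln 2)^2,  with  k = (m/n) * ln 2  hash functions.
def bloom_bits(n, p):
    return -n * math.log(p) / (math.log(2) ** 2)

def bloom_hashes(n, p):
    return (bloom_bits(n, p) / n) * math.log(2)

# Illustrative: ~7,500 matchable elements in a block at p = 0.1%.
print(bloom_bits(7500, 0.001) / 8 / 1024)   # ~13.2 KiB per block's filter
print(round(bloom_hashes(7500, 0.001)))     # 10 hash functions
```

Because the per-element cost grows only logarithmically in 1/p, driving the FP rate down (as this approach needs) costs roughly 1.44 extra bits per element per halving of p, which is where the "somewhat higher baseline bandwidth" comes from.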
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-20 17:50 ` Gregory Maxwell @ 2015-02-20 17:53 ` Mike Hearn 2015-02-21 16:03 ` Chris Pacia 0 siblings, 1 reply; 19+ messages in thread
From: Mike Hearn @ 2015-02-20 17:53 UTC (permalink / raw) To: Gregory Maxwell; +Cc: Bitcoin Dev
[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

> This is talking about a committed bloom filter. Not a committed UTXO set.

I read the following comment to mean it requires the UTXO commitments. Otherwise I'm not sure how you prove absence of withholding with just the current block structures plus an extra filter included in the block:

> but with the bloom commitment (and UTXO trie organised commitment) he
> can verify that the positive hits are correct via the merkle path, and
> that the false positives are not being wrongly withheld by obtaining
> merkle path proof that they are not in the trie

[-- Attachment #2: Type: text/html, Size: 1270 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
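The merkle-path check mentioned in the quoted text ("verify that the positive hits are correct via the merkle path") is the standard inclusion-proof fold. A minimal sketch using Bitcoin's double-SHA-256; real serialisation and byte-order details are omitted, and the helper names are illustrative:

```python
import hashlib

def dsha256(data):
    # Bitcoin's hash function: SHA-256 applied twice.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Fold a leaf hash up the tree, concatenating with each sibling on the
# stated side; the proof is valid iff the result equals the known root.
def verify_merkle_path(leaf, path, root):
    h = leaf
    for sibling, sibling_is_right in path:
        h = dsha256(h + sibling) if sibling_is_right else dsha256(sibling + h)
    return h == root

# Tiny two-leaf example: root = H(H(leaf_a || leaf_b)).
leaf_a, leaf_b = dsha256(b"tx-a"), dsha256(b"tx-b")
root = dsha256(leaf_a + leaf_b)
print(verify_merkle_path(leaf_a, [(leaf_b, True)], root))  # True
```

This handles the "positive hits are correct" half; proving that a false positive was *not* withheld is the harder half, and is exactly what requires the sorted UTXO-trie commitment discussed above, since a plain merkle tree cannot prove absence.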
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-20 17:53 ` Mike Hearn @ 2015-02-21 16:03 ` Chris Pacia 2015-02-21 16:47 ` Mike Hearn 0 siblings, 1 reply; 19+ messages in thread
From: Chris Pacia @ 2015-02-21 16:03 UTC (permalink / raw) To: bitcoin-development
[-- Attachment #1: Type: text/plain, Size: 1796 bytes --]

Adam seems to be making sense to me. Only querying a single node when an address in my wallet matches the block filter seems to be pretty efficient. The downside is it relies entirely on Tor for privacy, but then again it's not the only aspect of SPV clients that requires it for privacy (there's broadcasting, for example). And on a related note, if we eventually do end up receiving BIP70 payments directly, we still need to query for block inclusion, and that would seem to be an easy way to do it.

On 02/20/2015 12:53 PM, Mike Hearn wrote:
>> This is talking about a committed bloom filter. Not a committed
>> UTXO set.
>
> I read the following comment to mean it requires the UTXO commitments.
> Otherwise I'm not sure how you prove absence of withholding with just
> current block structures+an extra filter included in the block:
>
>> but with the bloom commitment (and UTXO trie organised commitment) he
>> can verify that the positive hits are correct via the merkle path, and
>> that the false positives are not being wrongly withheld by obtaining
>> merkle path proof that they are not in the trie

[-- Attachment #2: Type: text/html, Size: 3819 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-21 16:03 ` Chris Pacia @ 2015-02-21 16:47 ` Mike Hearn 2015-02-21 18:38 ` Chris Pacia 0 siblings, 1 reply; 19+ messages in thread
From: Mike Hearn @ 2015-02-21 16:47 UTC (permalink / raw) To: Chris Pacia; +Cc: Bitcoin Dev
[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

> Adam seems to be making sense to me. Only querying a single node when an
> address in my wallet matches the block filter seems to be pretty efficient.

No, I think it's less efficient (for the client).

Quick sums: blocks with 1,500 transactions in them are common today. But Bitcoin is growing. Let's imagine a system 10x larger than today - that doesn't seem implausible to reach in the next 5-10 years - so 15,000 transactions. Each transaction has multiple elements we might want to match (addresses, keys, etc). Let's say the average tx contains 5 unique keys/elements. That's the base case of {1 input, 2 outputs} without address reuse, fudged up a bit for multi-sends and then down a bit again for address reuse.

15,000 * 5 = 75,000 unique elements per block. With an FP rate of 0.1% we get:

http://hur.st/bloomfilter?n=75000&p=0.001

That's 131.63 KB of extra overhead per block. There are 144 blocks in a day, so that's 18 MB of data per day's worth of sync to pull down over the network. If you don't start your wallet for a week, that's 126 megabytes of data just to get started.

Affordable, yes (in the west). Fast enough to be competitive? Doubtful. I don't believe that even in five years mobiles will be pulling down and processing that much data within a few seconds, not even in developed countries.

But like I said, I don't see why it matters. Anyone who is watching the wire close to you still learns which transactions are yours, so it doesn't stop intelligence agencies.
Anyone who is running a node learns which transactions in the requested block were yours and thus can follow the tx chain to learn which other transactions might be yours too, no different to today. If you connect to a single node and say "give me the transactions sending money to key A in block N", it doesn't matter if you then don't request block N+6 from the same peer - they know you will request it eventually anyway, just by virtue of the fact that one of the transactions they gave you was spent in that block. [-- Attachment #2: Type: text/html, Size: 2695 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
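Mike's back-of-envelope figures above can be reproduced from the standard Bloom sizing formula. The 10x-scaled block, 5 elements per transaction and 0.1% FP rate are his stated assumptions:

```python
import math

# Re-deriving the committed-filter overhead estimate from the thread.
def bloom_bytes(n, p):
    # Optimal filter size: m = -n * ln(p) / (ln 2)^2 bits, in bytes.
    return -n * math.log(p) / (math.log(2) ** 2) / 8

elements = 15_000 * 5                       # 10x-scaled block, ~5 elements/tx
per_block_kb = bloom_bytes(elements, 0.001) / 1024
per_day_mb = per_block_kb * 144 / 1024      # 144 blocks per day
per_week_mb = per_day_mb * 7

print(round(per_block_kb, 2))   # ~131.63 KB per block
print(round(per_day_mb, 1))     # ~18.5 MB per day of sync
print(round(per_week_mb))       # ~130 MB for a week of catch-up
```

The per-block and per-day numbers match the message above; the weekly total comes out slightly above the quoted 126 MB because that figure rounds the daily total down to 18 MB before multiplying.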
* Re: [Bitcoin-development] bloom filtering, privacy 2015-02-21 16:47 ` Mike Hearn @ 2015-02-21 18:38 ` Chris Pacia 0 siblings, 0 replies; 19+ messages in thread
From: Chris Pacia @ 2015-02-21 18:38 UTC (permalink / raw) To: Mike Hearn; +Cc: Bitcoin Dev
[-- Attachment #1: Type: text/plain, Size: 2212 bytes --]

Yeah, that overhead is pretty high. I wasn't thinking about 10 years out.

[-- Attachment #2: Type: text/html, Size: 3155 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-02-21 18:38 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-02-20 12:44 [Bitcoin-development] bloom filtering, privacy Adam Back 2015-02-20 16:18 ` Wladimir 2015-02-20 16:38 ` Tamas Blummer 2015-02-20 16:54 ` Mike Hearn 2015-02-20 17:35 ` Adam Back 2015-02-20 17:43 ` Mike Hearn 2015-02-20 17:59 ` Adam Back 2015-02-20 18:10 ` Mike Hearn 2015-02-20 18:20 ` Gregory Maxwell 2015-02-20 19:03 ` Mike Hearn 2015-02-21 5:12 ` Adam Back 2015-02-21 13:28 ` Mike Hearn 2015-02-21 14:30 ` Adam Back 2015-02-21 14:45 ` Mike Hearn 2015-02-20 17:50 ` Gregory Maxwell 2015-02-20 17:53 ` Mike Hearn 2015-02-21 16:03 ` Chris Pacia 2015-02-21 16:47 ` Mike Hearn 2015-02-21 18:38 ` Chris Pacia