public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [bitcoin-dev] BIP 158 Flexibility and Filter Size
@ 2018-05-17 15:25 Matt Corallo
  2018-05-17 15:43 ` Peter Todd
                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Matt Corallo @ 2018-05-17 15:25 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion

BIP 158 currently includes the following in the "basic" filter: 1)
txids, 2) output scripts, 3) input prevouts.

I believe (1) could be skipped entirely - there is almost no reason why
you'd not be able to filter for, eg, the set of output scripts in a
transaction you know about and (2) and (3) may want to be split out -
many wallets may wish to just find transactions paying to them, as
transactions spending from their outputs should generally be things
they've created.

In general, I'm concerned about the size of the filters making existing
SPV clients less willing to adopt BIP 158 instead of the existing bloom
filter garbage and would like to see a further exploration of ways to
split out filters to make them less bandwidth intensive. Some further
ideas we should probably play with before finalizing moving forward is
providing filters for certain script templates, eg being able to only
get outputs that are segwit version X or other similar ideas.

Matt


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 15:25 [bitcoin-dev] BIP 158 Flexibility and Filter Size Matt Corallo
@ 2018-05-17 15:43 ` Peter Todd
  2018-05-17 15:46   ` Matt Corallo
  2018-05-17 16:36 ` Gregory Maxwell
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Peter Todd @ 2018-05-17 15:43 UTC (permalink / raw)
  To: Matt Corallo, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On Thu, May 17, 2018 at 11:25:12AM -0400, Matt Corallo via bitcoin-dev wrote:
> BIP 158 currently includes the following in the "basic" filter: 1)
> txids, 2) output scripts, 3) input prevouts.
> 
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about and (2) and (3) may want to be split out -
> many wallets may wish to just find transactions paying to them, as
> transactions spending from their outputs should generally be things
> they've created.

So I think we have two cases where wallets want to find txs spending from their
outputs:

1) Waiting for a confirmation
2) Detecting theft

The former can be turned off once there are no expected unconfirmed
transactions.

As for the latter, this is probably a valuable thing for wallets to do. Modulo
reorgs, reducing the frequency that you check for stolen funds doesn't decrease
total bandwidth cost - it's one filter match per block regardless - but perhaps
the real-world bandwidth cost can be reduced by, say, waiting for a wifi
connection rather than using cellular data.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 15:43 ` Peter Todd
@ 2018-05-17 15:46   ` Matt Corallo
  0 siblings, 0 replies; 60+ messages in thread
From: Matt Corallo @ 2018-05-17 15:46 UTC (permalink / raw)
  To: Peter Todd, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

(1) can be accomplished by filtering for the set of outputs in the transaction you created. I agree (2) would ideally be done to avoid issues with a copied wallet (theft or not), but I am worried about the size of the filters themselves, not just the size of the blocks downloaded after a match.

On May 17, 2018 3:43:15 PM UTC, Peter Todd <pete@petertodd.org> wrote:
>On Thu, May 17, 2018 at 11:25:12AM -0400, Matt Corallo via bitcoin-dev
>wrote:
>> BIP 158 currently includes the following in the "basic" filter: 1)
>> txids, 2) output scripts, 3) input prevouts.
>> 
>> I believe (1) could be skipped entirely - there is almost no reason
>why
>> you'd not be able to filter for, eg, the set of output scripts in a
>> transaction you know about and (2) and (3) may want to be split out -
>> many wallets may wish to just find transactions paying to them, as
>> transactions spending from their outputs should generally be things
>> they've created.
>
>So I think we have two cases where wallets want to find txs spending
>from their
>outputs:
>
>1) Waiting for a confirmation
>2) Detecting theft
>
>The former can be turned off once there are no expected unconfirmed
>transactions.
>
>As for the latter, this is probably a valuable thing for wallets to do.
>Modulo
>reorgs, reducing the frequency that you check for stolen funds doesn't
>decrease
>total bandwidth cost - it's one filter match per block regardless - but
>perhaps
>the real-world bandwidth cost can be reduced by, say, waiting for a
>wifi
>connection rather than using cellular data.
>
>-- 
>https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Type: text/html, Size: 1967 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 15:25 [bitcoin-dev] BIP 158 Flexibility and Filter Size Matt Corallo
  2018-05-17 15:43 ` Peter Todd
@ 2018-05-17 16:36 ` Gregory Maxwell
  2018-05-17 16:59   ` Matt Corallo
                     ` (2 more replies)
  2018-05-18  6:28 ` Karl-Johan Alm
  2018-05-19  2:51 ` Olaoluwa Osuntokun
  3 siblings, 3 replies; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-17 16:36 UTC (permalink / raw)
  To: Matt Corallo, Bitcoin Protocol Discussion

On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about

I think this is convincing for the txids themselves.

What about also making input prevouts filter based on the scriptpubkey
being _spent_?  Layering wise in the processing it's a bit ugly, but
if you validated the block you have the data needed.

This would eliminate the multiple data type mixing entirely.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 16:36 ` Gregory Maxwell
@ 2018-05-17 16:59   ` Matt Corallo
  2018-05-17 18:34     ` Gregory Maxwell
  2018-05-17 18:34     ` Gregory Maxwell
  2018-05-18  8:46   ` Riccardo Casatta
  2018-05-19  2:57   ` Olaoluwa Osuntokun
  2 siblings, 2 replies; 60+ messages in thread
From: Matt Corallo @ 2018-05-17 16:59 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

Yea I generally would really prefer something like that but it
significantly complicates the download logic - currently clients can
easily cross-check a filter in case they differ between providers by
downloading the block. If we instead went with the script being spent
they would have to be provided all previous transactions (potentially
compressed via midstate) as well, making it potentially infeasible to
identify the offending node while remaining a lightweight client. Maybe
there is some other reasonable download logic to replace it with, however.

Matt

On 05/17/18 12:36, Gregory Maxwell wrote:
> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> I believe (1) could be skipped entirely - there is almost no reason why
>> you'd not be able to filter for, eg, the set of output scripts in a
>> transaction you know about
> 
> I think this is convincing for the txids themselves.
> 
> What about also making input prevouts filter based on the scriptpubkey
> being _spent_?  Layering wise in the processing it's a bit ugly, but
> if you validated the block you have the data needed.
> 
> This would eliminate the multiple data type mixing entirely.
> 


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 16:59   ` Matt Corallo
  2018-05-17 18:34     ` Gregory Maxwell
@ 2018-05-17 18:34     ` Gregory Maxwell
  2018-05-17 20:19       ` Jim Posen
  1 sibling, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-17 18:34 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Protocol Discussion

On Thu, May 17, 2018 at 4:59 PM, Matt Corallo <lf-lists@mattcorallo.com> wrote:
> Yea I generally would really prefer something like that but it
> significantly complicates the download logic - currently clients can
> easily cross-check [...] Maybe
> there is some other reasonable download logic to replace it with, however.

I think lite clients cross checking is something which very likely
will never be implemented by anyone, and probably not stay working
(due to under-usage) if it is implemented.  This thought is driven by
three things  (1) the bandwidth overhead of performing the check, (2)
thinking about the network-interacting-state-machine complexity of it,
and by the multitude of sanity checks that lite clients already don't
implement (e.g. when a lite client noticed a split tip it could ask
peers for the respective blocks and check at least the stateless
checks, but none has ever done that), and...

(3) that kind of checking would be moot if the filters were committed
and validated... and the commitment would be both much simpler to
check for lite clients and provide much stronger protection against
malicious peers.

My expectation is that eventually one of these filter-map designs
would become committed-- not after we already had it deployed and had
worked out the design to the n-th generation (just as your proposed
revisions are doing to the initial proposal), but eventually.

Also, even without this change clients can still do that "are multiple
peers telling me the same thing or different things" kind of checking,
which I expect is the strongest testing we'd actually see them
implement absent a commitment.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 16:59   ` Matt Corallo
@ 2018-05-17 18:34     ` Gregory Maxwell
  2018-05-17 18:34     ` Gregory Maxwell
  1 sibling, 0 replies; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-17 18:34 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Protocol Discussion

On Thu, May 17, 2018 at 4:59 PM, Matt Corallo <lf-lists@mattcorallo.com> wrote:
> Yea I generally would really prefer something like that but it
> significantly complicates the download logic - currently clients can
> easily cross-check [...] Maybe
> there is some other reasonable download logic to replace it with, however.

I think lite clients cross checking is something which very likely
will never be implemented by anyone, and probably not stay working
(due to under-usage) if it is implemented.  This thought is driven by
three things  (1) the bandwidth overhead of performing the check, (2)
thinking about the network-interacting-state-machine complexity of it,
and by the multitude of sanity checks that lite clients already don't
implement (e.g. when a lite client noticed a split tip it could ask
peers for the respective blocks and check at least the stateless
checks, but none has ever done that), and...

(3) that kind of checking would be moot if the filters were committed
and validated... and the commitment would be both much simpler to
check for lite clients and provide much stronger protection against
malicious peers.

My expectation is that eventually one of these filter-map designs
would become committed-- not after we already had it deployed and had
worked out the design to the n-th generation (just as your proposed
revisions are doing to the initial proposal), but eventually.

Also, even without this change clients can still do that "are multiple
peers telling me the same thing or different things" kind of checking,
which I expect is the strongest testing we'd actually see them
implement absent a commitment.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 18:34     ` Gregory Maxwell
@ 2018-05-17 20:19       ` Jim Posen
  2018-05-17 20:45         ` Gregory Maxwell
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Posen @ 2018-05-17 20:19 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2082 bytes --]

>
> I think lite clients cross checking is something which very likely
> will never be implemented by anyone, and probably not stay working
> (due to under-usage) if it is implemented.  This thought is driven by
> three things  (1) the bandwidth overhead of performing the check, (2)
> thinking about the network-interacting-state-machine complexity of it,
> and by the multitude of sanity checks that lite clients already don't
> implement (e.g. when a lite client noticed a split tip it could ask
> peers for the respective blocks and check at least the stateless
> checks, but none has ever done that), and...
>

In my opinion, it's overly pessimistic to design the protocol in an
insecure way because some light clients historically have taken shortcuts.
If the protocol can provide clients the option of getting additional
security, it should.

On the general topic, Peter makes a good point that in many cases filtering
by txid of spending transaction may be preferable to filtering by outpoint
spend, which has the nice benefit that there are obviously fewer txs in a
block than txins. This wouldn't work for malleable transactions though.

I'm open to the idea of splitting the basic filter into three separate
filters based on data type, but there are some bandwidth concerns. First,
the GCS encoding gets better compression with a greater number of elements,
though as I recall in my analysis, that starts to tail off at ~1000
elements per filter with P=20, in which case it's not as much of a concern
given current block sizes. The other is that clients need to download
and/or store the filter header chain for each filter type, which are 32
bytes each per block. So if a client is expected to download all three
filter types anyway, or even two of three, it's less efficient in these
terms. It would be possible though to split the filters themselves, but
still have the basic filter header cover all three filters. This would mean
that full nodes could not support just a subset of the basic filters --
they'd have to compute all of them to compute the filter header.

[-- Attachment #2: Type: text/html, Size: 2847 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 20:19       ` Jim Posen
@ 2018-05-17 20:45         ` Gregory Maxwell
  2018-05-17 21:27           ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-17 20:45 UTC (permalink / raw)
  To: Jim Posen; +Cc: Bitcoin Protocol Discussion

On Thu, May 17, 2018 at 8:19 PM, Jim Posen <jim.posen@gmail.com> wrote:
> In my opinion, it's overly pessimistic to design the protocol in an insecure
> way because some light clients historically have taken shortcuts.

Any non-commited form is inherently insecure.  A nearby network
attacker (or eclipse attacker) or whatnot can moot whatever kind of
comparisons you make, and non-comparison based validation doesn't seem
like it would be useful without mooting all the bandwidth improvements
unless I'm missing something.

It isn't a question of 'some lite clients' -- I am aware of no
implementation of these kinds of measures in any cryptocurrency ever.

The same kind of comparison to the block could have been done with
BIP37 filtering, but no one has implemented that. (similarly, the
whitepaper suggests doing that for all network rules when a
disagreement has been seen, though that isn't practical for all
network rules it could be done for many of them-- but again no
implementation or AFAIK any interest in implementing that)

> If the
> protocol can provide clients the option of getting additional security, it
> should.

Sure, but at what cost?   And "additional" while nice doesn't
necessarily translate into a meaningful increase in delivered security
for any particular application.

I think we might be speaking too generally here.

What I'm suggesting would still allow a lite client to verify that
multiple parties are offering the same map for a given block (by
asking them for the map hash). It would still allow a future
commitment so that lite client could verify that the hashpower they're
hearing from agrees that the map they got is the correct corresponding
map for the block. It would still allow downloading a block and
verifying that all the outpoints in the block were included.  So still
a lot better than BIP37.

What it would not permit is for a lite client to download a whole
block and completely verify the filter (they could only tell if the
filter at least told them about all the outputs in the block, but if
extra bits were set or inputs were omitted, they couldn't tell).

But in exchange the filters for a given FP rate would be probably
about half the current size (actual measurements would be needed
because the figure depends on much scriptpubkey reuse there is, it
probably could be anywhere between 1/3 and 2/3rd).  In some
applications it would likely have better anonymity properties as well,
because a client that always filters for both an output and and input
as distinct items (and then leaks matches by fetching blocks) is more
distinguishable.

I think this trade-off is at leat worth considering because if you
always verify by downloading you wash out the bandwidth gains, strong
verification will eventually need a commitment in any case.  A client
can still partially verify, and can still multi-party comparison
verify.  ... and a big reduction in filter bandwidth

Monitoring inputs by scriptPubkey vs input-txid also has a massive
advantage for parallel filtering:  You can usually known your pubkeys
well in advance, but if you have to change what you're watching block
 N+1 for based on the txids that paid you in N you can't filter them
in parallel.

> On the general topic, Peter makes a good point that in many cases filtering
> by txid of spending transaction may be preferable to filtering by outpoint
> spend, which has the nice benefit that there are obviously fewer txs in a
> block than txins. This wouldn't work for malleable transactions though.

I think Peter missed Matt's point that you can monitor for a specific
transaction's confirmation by monitoring for any of the outpoints that
transaction contains. Because the txid commits to the outpoints there
shouldn't be any case where the txid is knowable but (an) outpoint is
not.  Removal of the txid and monitoring for any one of the outputs
should be a strict reduction in the false positive rate for a given
filter size (the filter will contain strictly fewer elements and the
client will match for the same (or usually, fewer) number).

I _think_ dropping txids as matt suggests is an obvious win that costs
nothing.  Replacing inputs with scripts as I suggested has some
trade-offs.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 20:45         ` Gregory Maxwell
@ 2018-05-17 21:27           ` Jim Posen
  2018-05-19  3:12             ` Olaoluwa Osuntokun
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Posen @ 2018-05-17 21:27 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5188 bytes --]

>
> It isn't a question of 'some lite clients' -- I am aware of no
> implementation of these kinds of measures in any cryptocurrency ever.
>

Doesn't mean there can't or shouldn't be a first. :-)


> The same kind of comparison to the block could have been done with
> BIP37 filtering, but no one has implemented that. (similarly, the
> whitepaper suggests doing that for all network rules when a
> disagreement has been seen, though that isn't practical for all
> network rules it could be done for many of them-- but again no
> implementation or AFAIK any interest in implementing that)


Correct me if I'm wrong, but I don't think it's true that the same could be
done for BIP 37. With BIP 37, one would have to download every partial
block from every peer to determine if there is a difference between them.
With BIP 157, you only download a 32 byte filter header from every peer
(because filters are deterministic), and using that commitment can
determine whether there's a conflict requiring further interrogation. The
difference in overhead makes checking for conflicts with BIP 157 practical,
whereas it's not as practical with BIP 37.


> Sure, but at what cost?   And "additional" while nice doesn't
> necessarily translate into a meaningful increase in delivered security
> for any particular application.
>
> I think we might be speaking too generally here.
>

Sure. The security model that BIP 157 now allows is that a light client with*
at least one honest peer serving filters* can get the correct information
about the chain. No, this does not prevent against total eclipse attacks,
but I think it's a much stronger security guarantee than requiring all
peers or even a majority of peers to be honest. In a decentralized network
that stores money, I think there's a big difference between those security
models.


> But in exchange the filters for a given FP rate would be probably
> about half the current size (actual measurements would be needed
> because the figure depends on much scriptpubkey reuse there is, it
> probably could be anywhere between 1/3 and 2/3rd).
>

This does not seem right. Let's assume txids are removed because they are
not relevant to this particular point. The difference as I understand it is
whether to include in the filter serialized outpoints for inputs or
serialized prev scriptPubkeys for inputs. When hashed these are the same
size, and there's an equal number of them (one per input in a block). So
the only savings comes from deduping the prev scriptPubkeys with each other
and with the scriptPubkeys in the block's outputs. So it comes down
entirely to how much address reuse there is on the chain.


> Monitoring inputs by scriptPubkey vs input-txid also has a massive
> advantage for parallel filtering:  You can usually known your pubkeys
> well in advance, but if you have to change what you're watching block
>  N+1 for based on the txids that paid you in N you can't filter them
> in parallel.
>

Yes, I'll grant that this is a benefit of your suggestion.


> I think Peter missed Matt's point that you can monitor for a specific
> transaction's confirmation by monitoring for any of the outpoints that
> transaction contains. Because the txid commits to the outpoints there
> shouldn't be any case where the txid is knowable but (an) outpoint is
> not.  Removal of the txid and monitoring for any one of the outputs
> should be a strict reduction in the false positive rate for a given
> filter size (the filter will contain strictly fewer elements and the
> client will match for the same (or usually, fewer) number).
>
> I _think_ dropping txids as matt suggests is an obvious win that costs
> nothing.  Replacing inputs with scripts as I suggested has some
> trade-offs.
>

I may have interpreted this differently. So wallets need a way to know when
the transactions they send get confirmed (for obvious usability reasons and
so for automatic fee-bumping). One way is to match the spent outpoints
against the filter, which I think of as the standard. Another would be to
match the txid of the spending transaction against the first, which only
works if the transaction is not malleable. Another would be to match the
change output script against the first, assuming the wallet does not reuse
change addresses and that the spending transaction does in fact have a
change output.

Now lets say these pieces of data, txids, output scripts, and spent
outpoints are in three separate filters that a wallet can download
separately or choose not to download. The spent outpoint method is the most
reliable and has no caviats. It also allows for theft detection as Peter
notes, which is a very nice property indeed. If the wallet uses the txid
matching though, the txid filter would be smaller because there are fewer
txids per block than inputs. So there could be some bandwidth savings to
that approach. The change output watching is probably the nicest in some
ways because the client needs the output filter anyway. If the transaction
has no change output with a unique script, the client could watch for any
of the other outputs on the spending tx, but may get more false positives
depending on the degree of address reuse.

[-- Attachment #2: Type: text/html, Size: 10705 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 15:25 [bitcoin-dev] BIP 158 Flexibility and Filter Size Matt Corallo
  2018-05-17 15:43 ` Peter Todd
  2018-05-17 16:36 ` Gregory Maxwell
@ 2018-05-18  6:28 ` Karl-Johan Alm
  2018-06-04  8:42   ` Riccardo Casatta
  2018-05-19  2:51 ` Olaoluwa Osuntokun
  3 siblings, 1 reply; 60+ messages in thread
From: Karl-Johan Alm @ 2018-05-18  6:28 UTC (permalink / raw)
  To: Matt Corallo, Bitcoin Protocol Discussion

On Fri, May 18, 2018 at 12:25 AM, Matt Corallo via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> In general, I'm concerned about the size of the filters making existing
> SPV clients less willing to adopt BIP 158 instead of the existing bloom
> filter garbage and would like to see a further exploration of ways to
> split out filters to make them less bandwidth intensive. Some further
> ideas we should probably play with before finalizing moving forward is
> providing filters for certain script templates, eg being able to only
> get outputs that are segwit version X or other similar ideas.

There is also the idea of multi-block filters. The idea is that light
clients would download a pair of filters for blocks X..X+255 and
X+256..X+511, check if they have any matches and then grab pairs for
any that matched, e.g. X..X+127 & X+128..X+255 if left matched, and
iterate down until it ran out of hits-in-a-row or it got down to
single-block level.

This has an added benefit where you can accept a slightly higher false
positive rate for bigger ranges, because the probability of a specific
entry having a false positive in each filter is (empirically speaking)
independent. I.e. with a FP probability of 1% in the 256 range block
and a FP probability of 0.1% in the 128 range block would mean the
probability is actually 0.001%.

Wrote about this here: https://bc-2.jp/bfd-profile.pdf (but the filter
type is different in my experiments)


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 16:36 ` Gregory Maxwell
  2018-05-17 16:59   ` Matt Corallo
@ 2018-05-18  8:46   ` Riccardo Casatta
  2018-05-19  3:08     ` Olaoluwa Osuntokun
  2018-05-19  2:57   ` Olaoluwa Osuntokun
  2 siblings, 1 reply; 60+ messages in thread
From: Riccardo Casatta @ 2018-05-18  8:46 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]

Another parameter which heavily affects filter size is the false positive
rate which is empirically set
<https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki#construction>
to 2^-20
The BIP recall some go code
<https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go>
for how the parameter has been selected which I can hardly understand and
run, it's totally my fault but if possible I would really like more details
on the process, like charts and explanations (for example, which is the
number of elements to search for which the filter has been optimized for?)

Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
will shrink the total filter size by gigabytes at the cost of having to
wastefully download just some megabytes of blocks.


2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org>:

> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > I believe (1) could be skipped entirely - there is almost no reason why
> > you'd not be able to filter for, eg, the set of output scripts in a
> > transaction you know about
>
> I think this is convincing for the txids themselves.
>
> What about also making input prevouts filter based on the scriptpubkey
> being _spent_?  Layering wise in the processing it's a bit ugly, but
> if you validated the block you have the data needed.
>
> This would eliminate the multiple data type mixing entirely.
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>



-- 
Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>

[-- Attachment #2: Type: text/html, Size: 2735 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 15:25 [bitcoin-dev] BIP 158 Flexibility and Filter Size Matt Corallo
                   ` (2 preceding siblings ...)
  2018-05-18  6:28 ` Karl-Johan Alm
@ 2018-05-19  2:51 ` Olaoluwa Osuntokun
  3 siblings, 0 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-19  2:51 UTC (permalink / raw)
  To: Matt Corallo, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5888 bytes --]

Matt wrote:
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about

Depending on the use-case, the txid is more precise than searching for the
output script as it doesn't need to deal with duplicated output scripts. To
my knowledge, lnd is the only major project that currently utilizes BIP
157+158. At this point, we use the txid in the regular filter for
confirmations (channel confirmed, sweep tx confirmed, cltv confirmed, etc).
Switching to use output scripts instead wouldn't be _too_ invasive w.r.t
changes required in the codebase, only the need to deal with output script
duplication could be annoying.

> (2) and (3) may want to be split out - many wallets may wish to just find
> transactions paying to them, as transactions spending from their outputs
> should generally be things they've created.

FWIW, in the "rescan after importing by seed phrase" both are needed in
order to ensure the wallet ends up with the proper output set after the
scan. In lnd we actively use both (2) to detect deposits to the internal
wallet, and (3) to be notified when our channel outputs are spent on-chain
(and also generally when any of our special scripts are spent).

> In general, I'm concerned about the size of the filters making existing
SPV
> clients less willing to adopt BIP 158 instead of the existing bloom filter
> garbage and would like to see a further exploration of ways to split out
> filters to make them less bandwidth intensive.

Agreed that the current filter size may prevent adoption amongst wallets.
However, the other factor that will likely prevent adoption amongst current
BIP-37 mobile wallets is the lack of support for notifying _unconfirmed_
transactions. When we drafted up the protocol last year and asked around,
this was one of the major points of contention amongst existing mobile
wallets that utilize BIP 37.

On the other hand, the two "popular" BIP 37 wallets I'm aware of
(Breadwallet, and Andreas Schildbach's Bitcoin Wallet) have lagged massively
behind the existing set of wallet related protocol upgrades. For example,
neither of them have released versions of their applications that take
advantage of segwit in any manner. Breadwallet has more or less "pivoted"
(they did an ICO and have a token) and instead is prioritizing things like
adding random ICO tokens over catching up with the latest protocol updates.
Based on this behavior, even if the filter sizes were even _more_ bandwidth
efficient that BIP 37, I don't think they'd adopt the protocol.

> Some further ideas we should probably play with before finalizing moving
> forward is providing filters for certain script templates, eg being able
to
> only get outputs that are segwit version X or other similar ideas.

Why should this block active deployment of BIP 157+158 as is now? As
defined, the protocol already allows future updates to add additional filter
types. Before the filters are committed, each filter type requires a new
filter header. We could move to a single filter header that commits to the
hashes of _all_ filters, but that would mean that a node couldn't serve the
headers unless they had all currently defined features, defeating the
optionality offered.

Additionally, more filters entails more disk utilization for nodes serving
these filters. Nodes have the option to instead create the filters at "query
time", but then this counters the benefit of simply slinging the filters
from disk (or a memory map or w/e). IMO, it's a desirable feature that
serving light clients no longer requires active CPU+I/O and instead just
passive I/O (nodes could even write the filters to disk in protocol msg
format).

To get a feel for the current filter sizes, a txid-only filter size, and a
regular filter w/o txid's, I ran some stats on the last 10k blocks:

regular size:    217107653  bytes
regular avg:     21710.7653 bytes
regular median:  22332      bytes
regular max:     61901      bytes

txid-only size:    34518463  bytes
txid-only avg:     3451.8463 bytes
txid-only median:  3258      bytes
txid-only max:     10193     bytes

reg-no-txid size:    182663961  bytes
reg-no-txid avg:     18266.3961 bytes
reg-no-txid median:  19198      bytes
reg-no-txid max:     60172      bytes

So the median regular filter size over the past 10k blocks is 20KB. If we
extract the txid from the regular filter and add a txid-only filter, the
median size of that is 3.2KB. Finally, the median size of a modified regular
filter (no txid) is 19KB.

-- Laolu


On Thu, May 17, 2018 at 8:33 AM Matt Corallo via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> BIP 158 currently includes the following in the "basic" filter: 1)
> txids, 2) output scripts, 3) input prevouts.
>
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about and (2) and (3) may want to be split out -
> many wallets may wish to just find transactions paying to them, as
> transactions spending from their outputs should generally be things
> they've created.
>
> In general, I'm concerned about the size of the filters making existing
> SPV clients less willing to adopt BIP 158 instead of the existing bloom
> filter garbage and would like to see a further exploration of ways to
> split out filters to make them less bandwidth intensive. Some further
> ideas we should probably play with before finalizing moving forward is
> providing filters for certain script templates, eg being able to only
> get outputs that are segwit version X or other similar ideas.
>
> Matt
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 7573 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 16:36 ` Gregory Maxwell
  2018-05-17 16:59   ` Matt Corallo
  2018-05-18  8:46   ` Riccardo Casatta
@ 2018-05-19  2:57   ` Olaoluwa Osuntokun
  2018-05-19  3:06     ` Pieter Wuille
  2 siblings, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-19  2:57 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

Greg wrote:
> What about also making input prevouts filter based on the scriptpubkey
being
> _spent_?  Layering wise in the processing it's a bit ugly, but if you
> validated the block you have the data needed.

AFAICT, this would mean that in order for a new node to catch up the filter
index (index all historical blocks), they'd either need to: build up a
utxo-set in memory during indexing, or would require a txindex in order to
look up the prev out's script. The first option increases the memory load
during indexing, and the second requires nodes to have a transaction index
(and would also add considerable I/O load). When proceeding from tip, this
doesn't add any additional load assuming that your synchronously index the
block as you validate it, otherwise the utxo set will already have been
updated (the spent scripts removed).

I have a script running to compare the filter sizes assuming the regular
filter switches to include the prev out's script rather than the prev
outpoint itself. The script hasn't yet finished (due to the increased I/O
load to look up the scripts when indexing), but I'll report back once it's
finished.

-- Laolu


On Thu, May 17, 2018 at 9:37 AM Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > I believe (1) could be skipped entirely - there is almost no reason why
> > you'd not be able to filter for, eg, the set of output scripts in a
> > transaction you know about
>
> I think this is convincing for the txids themselves.
>
> What about also making input prevouts filter based on the scriptpubkey
> being _spent_?  Layering wise in the processing it's a bit ugly, but
> if you validated the block you have the data needed.
>
> This would eliminate the multiple data type mixing entirely.
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 2952 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-19  2:57   ` Olaoluwa Osuntokun
@ 2018-05-19  3:06     ` Pieter Wuille
  2018-05-22  1:15       ` Olaoluwa Osuntokun
  0 siblings, 1 reply; 60+ messages in thread
From: Pieter Wuille @ 2018-05-19  3:06 UTC (permalink / raw)
  To: Olaoluwa Osuntokun, Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On Fri, May 18, 2018, 19:57 Olaoluwa Osuntokun via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Greg wrote:
> > What about also making input prevouts filter based on the scriptpubkey
> being
> > _spent_?  Layering wise in the processing it's a bit ugly, but if you
> > validated the block you have the data needed.
>
> AFAICT, this would mean that in order for a new node to catch up the filter
> index (index all historical blocks), they'd either need to: build up a
> utxo-set in memory during indexing, or would require a txindex in order to
> look up the prev out's script. The first option increases the memory load
> during indexing, and the second requires nodes to have a transaction index
> (and would also add considerable I/O load). When proceeding from tip, this
> doesn't add any additional load assuming that your synchronously index the
> block as you validate it, otherwise the utxo set will already have been
> updated (the spent scripts removed).
>

I was wondering about that too, but it turns out that isn't necessary. At
least in Bitcoin Core, all the data needed for such a filter is in the
block + undo files (the latter contain the scriptPubKeys of the outputs
being spent).

I have a script running to compare the filter sizes assuming the regular
> filter switches to include the prev out's script rather than the prev
> outpoint itself. The script hasn't yet finished (due to the increased I/O
> load to look up the scripts when indexing), but I'll report back once it's
> finished.
>

That's very helpful, thank you.

Cheers,

-- 
Pieter

[-- Attachment #2: Type: text/html, Size: 2692 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-18  8:46   ` Riccardo Casatta
@ 2018-05-19  3:08     ` Olaoluwa Osuntokun
  0 siblings, 0 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-19  3:08 UTC (permalink / raw)
  To: Riccardo Casatta, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3509 bytes --]

Riccardo wrote:
> The BIP recall some go code for how the parameter has been selected which
> I can hardly understand and run

The code you're linking to is for generating test vectors (to allow
implementations to check the correctness of their gcs filters. The name of
the file is 'gentestvectors.go'. It produces CSV files which contain test
vectors of various testnet blocks and at various false positive rates.

> it's totally my fault but if possible I would really like more details on
> the process, like charts and explanations

When we published the BIP draft last year (wow, time flies!), we put up code
(as well as an interactive website) showing the process we used to arrive at
the current false positive rate. The aim was to minimize the bandwidth
required to download each filter plus the expected bandwidth from
downloading "large-ish" full segwit blocks. The code simulated a few wallet
types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet.
One could also model the selection as a Bernoulli process where we attempt
to compute the probability that after k queries (let's say you have k
addresses) we have k "successes". A success would mean the queries item
wasn't found in the filter, while a failure is a filter match (false
positive or not). A failure in the process requires fetching the entire
block.

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Another parameter which heavily affects filter size is the false positive
> rate which is empirically set
> <https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki#construction>
> to 2^-20
> The BIP recall some go code
> <https://github.com/Roasbeef/bips/blob/83b83c78e189be898573e0bfe936dd0c9b99ecb9/gcs_light_client/gentestvectors.go>
> for how the parameter has been selected which I can hardly understand and
> run, it's totally my fault but if possible I would really like more details
> on the process, like charts and explanations (for example, which is the
> number of elements to search for which the filter has been optimized for?)
>
> Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
> will shrink the total filter size by gigabytes at the cost of having to
> wastefully download just some megabytes of blocks.
>
>
> 2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org>:
>
>> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > I believe (1) could be skipped entirely - there is almost no reason why
>> > you'd not be able to filter for, eg, the set of output scripts in a
>> > transaction you know about
>>
>> I think this is convincing for the txids themselves.
>>
>> What about also making input prevouts filter based on the scriptpubkey
>> being _spent_?  Layering wise in the processing it's a bit ugly, but
>> if you validated the block you have the data needed.
>>
>> This would eliminate the multiple data type mixing entirely.
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
>
>
> --
> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 5342 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-17 21:27           ` Jim Posen
@ 2018-05-19  3:12             ` Olaoluwa Osuntokun
  2018-05-21  8:35               ` Johan Torås Halseth
  0 siblings, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-19  3:12 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

On Thu, May 17, 2018 at 2:44 PM Jim Posen via bitcoin-dev <bitcoin-

> Monitoring inputs by scriptPubkey vs input-txid also has a massive
>> advantage for parallel filtering:  You can usually known your pubkeys
>> well in advance, but if you have to change what you're watching block
>>  N+1 for based on the txids that paid you in N you can't filter them
>> in parallel.
>>
>
> Yes, I'll grant that this is a benefit of your suggestion.
>

Yeah parallel filtering would be pretty nice. We've implemented a serial
filtering for btcwallet [1] for the use-case of rescanning after a seed
phrase import. Parallel filtering would help here, but also we don't yet
take advantage of batch querying for the filters themselves. This would
speed up the scanning by quite a bit.

I really like the filtering model though, it really simplifies the code,
and we can leverage identical logic for btcd (which has RPCs to fetch the
filters) as well.

[1]:
https://github.com/Roasbeef/btcwallet/blob/master/chain/neutrino.go#L180

[-- Attachment #2: Type: text/html, Size: 1701 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-19  3:12             ` Olaoluwa Osuntokun
@ 2018-05-21  8:35               ` Johan Torås Halseth
  2018-05-22  1:16                 ` Olaoluwa Osuntokun
  0 siblings, 1 reply; 60+ messages in thread
From: Johan Torås Halseth @ 2018-05-21  8:35 UTC (permalink / raw)
  To: Olaoluwa Osuntokun; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3249 bytes --]

Hi all,
Most light wallets will want to download the minimum amount of data required to operate, which means they would ideally download the smallest possible filters containing the subset of elements they need.
What if instead of trying to decide up front which subset of elements will be most useful to include in the filters, and the size tradeoff, we let the full-node decide which subsets of elements it serves filters for?

For instance, a full node would advertise that it could serve filters for the subsets 110 (txid+script+outpoint), 100 (txid only), 011 ( script+outpoint) etc. A light client could then choose to download the minimal filter type covering its needs.
The obvious benefit of this would be minimal bandwidth usage for the light client, but there are also some less obvious ones. We wouldn’t have to decide up front what each filter type should contain, only the possible elements a filter can contain (more can be added later without breaking existing clients). This, I think, would let the most served filter types grow organically, with full-node implementations coming with sane defaults for served filter types (maybe even all possible types as long as the number of elements is small), letting their operator add/remove types at will.
The main disadvantage of this as I see it, is that there’s an exponential blowup in the number of possible filter types in the number of element types. However, this would let us start out small with only the elements we need, and in the worst case the node operators just choose to serve the subsets corresponding to what now is called “regular” + “extended” filters anyway, requiring no more resources.
This would also give us some data on what is the most widely used filter types, which could be useful in making the decision on what should be part of filters to eventually commit to in blocks.
- Johan On Sat, May 19, 2018 at 5:12, Olaoluwa Osuntokun via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
On Thu, May 17, 2018 at 2:44 PM Jim Posen via bitcoin-dev <bitcoin- Monitoring inputs by scriptPubkey vs input-txid also has a massive
advantage for parallel filtering: You can usually known your pubkeys
well in advance, but if you have to change what you're watching block
N+1 for based on the txids that paid you in N you can't filter them
in parallel.

Yes, I'll grant that this is a benefit of your suggestion.
Yeah parallel filtering would be pretty nice. We've implemented a serial filtering for btcwallet [1] for the use-case of rescanning after a seed phrase import. Parallel filtering would help here, but also we don't yet take advantage of batch querying for the filters themselves. This would speed up the scanning by quite a bit.
I really like the filtering model though, it really simplifies the code, and we can leverage identical logic for btcd (which has RPCs to fetch the filters) as well.
[1]: https://github.com/Roasbeef/btcwallet/blob/master/chain/neutrino.go#L180 [https://github.com/Roasbeef/btcwallet/blob/master/chain/neutrino.go#L180]
_______________________________________________ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

[-- Attachment #2: Type: text/html, Size: 4734 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-19  3:06     ` Pieter Wuille
@ 2018-05-22  1:15       ` Olaoluwa Osuntokun
  0 siblings, 0 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-22  1:15 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 4101 bytes --]

Hi Y'all,

The script finished a few days ago with the following results:

reg-filter-prev-script total size:  161236078  bytes
reg-filter-prev-script avg:         16123.6078 bytes
reg-filter-prev-script median:      16584      bytes
reg-filter-prev-script max:         59480      bytes

Compared to the original median size of the same block range, but with the
current filter (has both txid, prev outpoint, output scripts), we see a
roughly 34% reduction in filter size (current median is 22258 bytes).
Compared to the suggested modified filter (no txid, prev outpoint, output
scripts), we see a 15% reduction in size (median of that was 19198 bytes).
This shows that script re-use is still pretty prevalent in the chain as of
recent.

One thing that occurred to me, is that on the application level, switching
to the input prev output script can make things a bit awkward. Observe that
when looking for matches in the filter, upon a match, one would need access
to an additional (outpoint -> script) map in order to locate _which_
particular transaction matched w/o access to an up-to-date UTOX set. In
contrast, as is atm, one can locate the matching transaction with no
additional information (as we're matching on the outpoint).

At this point, if we feel filter sizes need to drop further, then we may
need to consider raising the false positive rate.

Does anyone have any estimates or direct measures w.r.t how much bandwidth
current BIP 37 light clients consume? It would be nice to have a direct
comparison. We'd need to consider the size of their base bloom filter, the
accumulated bandwidth as a result of repeated filterload commands (to adjust
the fp rate), and also the overhead of receiving the merkle branch and
transactions in distinct messages (both due to matches and false positives).

Finally, I'd be open to removing the current "extended" filter from the BIP
as is all together for now. If a compelling use case for being able to
filter the sigScript/witness arises, then we can examine re-adding it with a
distinct service bit. After all it would be harder to phase out the filter
once wider deployment was already reached. Similarly, if the 16% savings
achieved by removing the txid is attractive, then we can create an
additional
filter just for the txids to allow those applications which need the
information to seek out that extra filter.

-- Laolu


On Fri, May 18, 2018 at 8:06 PM Pieter Wuille <pieter.wuille@gmail.com>
wrote:

> On Fri, May 18, 2018, 19:57 Olaoluwa Osuntokun via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> Greg wrote:
>> > What about also making input prevouts filter based on the scriptpubkey
>> being
>> > _spent_?  Layering wise in the processing it's a bit ugly, but if you
>> > validated the block you have the data needed.
>>
>> AFAICT, this would mean that in order for a new node to catch up the
>> filter
>> index (index all historical blocks), they'd either need to: build up a
>> utxo-set in memory during indexing, or would require a txindex in order to
>> look up the prev out's script. The first option increases the memory load
>> during indexing, and the second requires nodes to have a transaction index
>> (and would also add considerable I/O load). When proceeding from tip, this
>> doesn't add any additional load assuming that your synchronously index the
>> block as you validate it, otherwise the utxo set will already have been
>> updated (the spent scripts removed).
>>
>
> I was wondering about that too, but it turns out that isn't necessary. At
> least in Bitcoin Core, all the data needed for such a filter is in the
> block + undo files (the latter contain the scriptPubKeys of the outputs
> being spent).
>
> I have a script running to compare the filter sizes assuming the regular
>> filter switches to include the prev out's script rather than the prev
>> outpoint itself. The script hasn't yet finished (due to the increased I/O
>> load to look up the scripts when indexing), but I'll report back once it's
>> finished.
>>
>
> That's very helpful, thank you.
>
> Cheers,
>
> --
> Pieter
>
>

[-- Attachment #2: Type: text/html, Size: 6050 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-21  8:35               ` Johan Torås Halseth
@ 2018-05-22  1:16                 ` Olaoluwa Osuntokun
  2018-05-22  9:23                   ` Johan Torås Halseth
  0 siblings, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-22  1:16 UTC (permalink / raw)
  To: Johan Torås Halseth; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3923 bytes --]

> What if instead of trying to decide up front which subset of elements will
> be most useful to include in the filters, and the size tradeoff, we let
the
> full-node decide which subsets of elements it serves filters for?

This is already the case. The current "track" is to add new service bits
(while we're in the uncommitted phase) to introduce new fitler types. Light
clients can then filter out nodes before even connecting to them.

-- Laolu

On Mon, May 21, 2018 at 1:35 AM Johan Torås Halseth <johanth@gmail.com>
wrote:

> Hi all,
>
> Most light wallets will want to download the minimum amount of data
> required to operate, which means they would ideally download the smallest
> possible filters containing the subset of elements they need.
>
> What if instead of trying to decide up front which subset of elements will
> be most useful to include in the filters, and the size tradeoff, we let the
> full-node decide which subsets of elements it serves filters for?
>
> For instance, a full node would advertise that it could serve filters for
> the subsets 110 (txid+script+outpoint), 100 (txid only), 011 (script+outpoint)
> etc. A light client could then choose to download the minimal filter type
> covering its needs.
>
> The obvious benefit of this would be minimal bandwidth usage for the light
> client, but there are also some less obvious ones. We wouldn’t have to
> decide up front what each filter type should contain, only the possible
> elements a filter can contain (more can be added later without breaking
> existing clients). This, I think, would let the most served filter types
> grow organically, with full-node implementations coming with sane defaults
> for served filter types (maybe even all possible types as long as the
> number of elements is small), letting their operator add/remove types at
> will.
>
> The main disadvantage of this as I see it, is that there’s an exponential
> blowup in the number of possible filter types in the number of element
> types. However, this would let us start out small with only the elements we
> need, and in the worst case the node operators just choose to serve the
> subsets corresponding to what now is called “regular” + “extended” filters
> anyway, requiring no more resources.
>
> This would also give us some data on what is the most widely used filter
> types, which could be useful in making the decision on what should be part
> of filters to eventually commit to in blocks.
>
> - Johan
> On Sat, May 19, 2018 at 5:12, Olaoluwa Osuntokun via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> On Thu, May 17, 2018 at 2:44 PM Jim Posen via bitcoin-dev <bitcoin-
>
>> Monitoring inputs by scriptPubkey vs input-txid also has a massive
>>> advantage for parallel filtering: You can usually known your pubkeys
>>> well in advance, but if you have to change what you're watching block
>>> N+1 for based on the txids that paid you in N you can't filter them
>>> in parallel.
>>>
>>
>> Yes, I'll grant that this is a benefit of your suggestion.
>>
>
> Yeah parallel filtering would be pretty nice. We've implemented a serial
> filtering for btcwallet [1] for the use-case of rescanning after a seed
> phrase import. Parallel filtering would help here, but also we don't yet
> take advantage of batch querying for the filters themselves. This would
> speed up the scanning by quite a bit.
>
> I really like the filtering model though, it really simplifies the code,
> and we can leverage identical logic for btcd (which has RPCs to fetch the
> filters) as well.
>
> [1]:
> https://github.com/Roasbeef/btcwallet/blob/master/chain/neutrino.go#L180
>
> _______________________________________________ bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>

[-- Attachment #2: Type: text/html, Size: 6573 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-22  1:16                 ` Olaoluwa Osuntokun
@ 2018-05-22  9:23                   ` Johan Torås Halseth
  2018-05-23  0:42                     ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Johan Torås Halseth @ 2018-05-22  9:23 UTC (permalink / raw)
  To: Olaoluwa Osuntokun; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4655 bytes --]

Maybe I didn't make it clear, but the distinction is that the current track
allocates
one service bit for each "filter type", where it has to be agreed upon up
front what
elements such a filter type contains.

My suggestion was to advertise a bitfield for each filter type the node
serves,
where the bitfield indicates what elements are part of the filters. This
essentially
removes the notion of decided filter types and instead leaves the decision
to
full-nodes.

This would require a "getcftypes" message, of course.

- Johan


On Tue, May 22, 2018 at 3:16 AM, Olaoluwa Osuntokun <laolu32@gmail.com>
wrote:

> > What if instead of trying to decide up front which subset of elements
> will
> > be most useful to include in the filters, and the size tradeoff, we let
> the
> > full-node decide which subsets of elements it serves filters for?
>
> This is already the case. The current "track" is to add new service bits
> (while we're in the uncommitted phase) to introduce new fitler types. Light
> clients can then filter out nodes before even connecting to them.
>
> -- Laolu
>
> On Mon, May 21, 2018 at 1:35 AM Johan Torås Halseth <johanth@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Most light wallets will want to download the minimum amount of data
>> required to operate, which means they would ideally download the smallest
>> possible filters containing the subset of elements they need.
>>
>> What if instead of trying to decide up front which subset of elements
>> will be most useful to include in the filters, and the size tradeoff, we
>> let the full-node decide which subsets of elements it serves filters for?
>>
>> For instance, a full node would advertise that it could serve filters for
>> the subsets 110 (txid+script+outpoint), 100 (txid only), 011 (script+outpoint)
>> etc. A light client could then choose to download the minimal filter type
>> covering its needs.
>>
>> The obvious benefit of this would be minimal bandwidth usage for the
>> light client, but there are also some less obvious ones. We wouldn’t have
>> to decide up front what each filter type should contain, only the possible
>> elements a filter can contain (more can be added later without breaking
>> existing clients). This, I think, would let the most served filter types
>> grow organically, with full-node implementations coming with sane defaults
>> for served filter types (maybe even all possible types as long as the
>> number of elements is small), letting their operator add/remove types at
>> will.
>>
>> The main disadvantage of this as I see it, is that there’s an exponential
>> blowup in the number of possible filter types in the number of element
>> types. However, this would let us start out small with only the elements we
>> need, and in the worst case the node operators just choose to serve the
>> subsets corresponding to what now is called “regular” + “extended” filters
>> anyway, requiring no more resources.
>>
>> This would also give us some data on what is the most widely used filter
>> types, which could be useful in making the decision on what should be part
>> of filters to eventually commit to in blocks.
>>
>> - Johan
>> On Sat, May 19, 2018 at 5:12, Olaoluwa Osuntokun via bitcoin-dev <
>> bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>> On Thu, May 17, 2018 at 2:44 PM Jim Posen via bitcoin-dev <bitcoin-
>>
>>> Monitoring inputs by scriptPubkey vs input-txid also has a massive
>>>> advantage for parallel filtering: You can usually known your pubkeys
>>>> well in advance, but if you have to change what you're watching block
>>>> N+1 for based on the txids that paid you in N you can't filter them
>>>> in parallel.
>>>>
>>>
>>> Yes, I'll grant that this is a benefit of your suggestion.
>>>
>>
>> Yeah parallel filtering would be pretty nice. We've implemented a serial
>> filtering for btcwallet [1] for the use-case of rescanning after a seed
>> phrase import. Parallel filtering would help here, but also we don't yet
>> take advantage of batch querying for the filters themselves. This would
>> speed up the scanning by quite a bit.
>>
>> I really like the filtering model though, it really simplifies the code,
>> and we can leverage identical logic for btcd (which has RPCs to fetch the
>> filters) as well.
>>
>> [1]: https://github.com/Roasbeef/btcwallet/blob/master/chain/
>> neutrino.go#L180
>>
>> _______________________________________________ bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.
>> org/mailman/listinfo/bitcoin-dev
>>
>>

[-- Attachment #2: Type: text/html, Size: 7952 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-22  9:23                   ` Johan Torås Halseth
@ 2018-05-23  0:42                     ` Jim Posen
  2018-05-23  7:38                       ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Posen @ 2018-05-23  0:42 UTC (permalink / raw)
  To: Johan Torås Halseth, Matt Corallo; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1322 bytes --]

>
> My suggestion was to advertise a bitfield for each filter type the node
> serves,
> where the bitfield indicates what elements are part of the filters. This
> essentially
> removes the notion of decided filter types and instead leaves the decision
> to
> full-nodes.
>

I think it makes more sense to construct entirely separate filters for the
different types of elements and allow clients to download only the ones
they care about. If there are enough elements per filter, the compression
ratio shouldn't be much worse by splitting them up. This prevents the
exponential blowup in the number of filters that you mention, Johan, and it
works nicely with service bits for advertising different filter types
independently.

So if we created three separate filter types, one for output scripts, one
for input outpoints, and one for TXIDs, each signaled with a separate
service bit, are people good with that? Or do you think there shouldn't be
a TXID filter at all, Matt? I didn't include the option of a prev output
script filter or rolling that into the block output script filter because
it changes the security model (cannot be proven to be correct/incorrect
succinctly).

Then there's the question of whether to separate or combine the headers.
I'd lean towards keeping them separate because it's simpler that way.

[-- Attachment #2: Type: text/html, Size: 1671 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-23  0:42                     ` Jim Posen
@ 2018-05-23  7:38                       ` Jim Posen
  2018-05-23  8:16                         ` Johan Torås Halseth
  2018-05-23 17:28                         ` Gregory Maxwell
  0 siblings, 2 replies; 60+ messages in thread
From: Jim Posen @ 2018-05-23  7:38 UTC (permalink / raw)
  To: Johan Torås Halseth, Matt Corallo; +Cc: Bitcoin Protocol Discussion


[-- Attachment #1.1: Type: text/plain, Size: 2351 bytes --]

So I checked filter sizes (as a proportion of block size) for each of the
sub-filters. The graph is attached.

As interpretation, the first ~120,000 blocks are so small that the
Golomb-Rice coding can't compress the filters that well, which is why the
filter sizes are so high proportional to the block size. Except for the
input filter, because the coinbase input is skipped, so many of them have 0
elements. But after block 120,000 or so, the filter compression converges
pretty quickly to near the optimal value. The encouraging thing here is
that if you look at the ratio of the combined size of the separated filters
vs the size of a filter containing all of them (currently known as the
basic filter), they are pretty much the same size. The mean of the ratio
between them after block 150,000 is 99.4%. So basically, not much
compression efficiently is lost by separating the basic filter into
sub-filters.

On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:

> My suggestion was to advertise a bitfield for each filter type the node
>> serves,
>> where the bitfield indicates what elements are part of the filters. This
>> essentially
>> removes the notion of decided filter types and instead leaves the
>> decision to
>> full-nodes.
>>
>
> I think it makes more sense to construct entirely separate filters for the
> different types of elements and allow clients to download only the ones
> they care about. If there are enough elements per filter, the compression
> ratio shouldn't be much worse by splitting them up. This prevents the
> exponential blowup in the number of filters that you mention, Johan, and it
> works nicely with service bits for advertising different filter types
> independently.
>
> So if we created three separate filter types, one for output scripts, one
> for input outpoints, and one for TXIDs, each signaled with a separate
> service bit, are people good with that? Or do you think there shouldn't be
> a TXID filter at all, Matt? I didn't include the option of a prev output
> script filter or rolling that into the block output script filter because
> it changes the security model (cannot be proven to be correct/incorrect
> succinctly).
>
> Then there's the question of whether to separate or combine the headers.
> I'd lean towards keeping them separate because it's simpler that way.
>

[-- Attachment #1.2: Type: text/html, Size: 3002 bytes --]

[-- Attachment #2: filter_sizes.svg --]
[-- Type: image/svg+xml, Size: 2066102 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-23  7:38                       ` Jim Posen
@ 2018-05-23  8:16                         ` Johan Torås Halseth
  2018-05-23 17:28                         ` Gregory Maxwell
  1 sibling, 0 replies; 60+ messages in thread
From: Johan Torås Halseth @ 2018-05-23  8:16 UTC (permalink / raw)
  To: Jim Posen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2991 bytes --]

Thanks, Jimpo!

This is very encouraging, I think. I sorta assumed that separating the
elements into their own sub-filters would hurt the compression a lot more.
Can the compression ratio/false positive rate be tweaked with the
sub-filters in mind?

With the total size of the separated filters being no larger than the
combined filters, I see no benefit of combined filters? Committing to them
all in the headers would also save space, and we could ensure nodes are
serving all sub-filters.

- Johan

On Wed, May 23, 2018 at 9:38 AM, Jim Posen <jim.posen@gmail.com> wrote:

> So I checked filter sizes (as a proportion of block size) for each of the
> sub-filters. The graph is attached.
>
> As interpretation, the first ~120,000 blocks are so small that the
> Golomb-Rice coding can't compress the filters that well, which is why the
> filter sizes are so high proportional to the block size. Except for the
> input filter, because the coinbase input is skipped, so many of them have 0
> elements. But after block 120,000 or so, the filter compression converges
> pretty quickly to near the optimal value. The encouraging thing here is
> that if you look at the ratio of the combined size of the separated filters
> vs the size of a filter containing all of them (currently known as the
> basic filter), they are pretty much the same size. The mean of the ratio
> between them after block 150,000 is 99.4%. So basically, not much
> compression efficiently is lost by separating the basic filter into
> sub-filters.
>
> On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
>
>> My suggestion was to advertise a bitfield for each filter type the node
>>> serves,
>>> where the bitfield indicates what elements are part of the filters. This
>>> essentially
>>> removes the notion of decided filter types and instead leaves the
>>> decision to
>>> full-nodes.
>>>
>>
>> I think it makes more sense to construct entirely separate filters for
>> the different types of elements and allow clients to download only the ones
>> they care about. If there are enough elements per filter, the compression
>> ratio shouldn't be much worse by splitting them up. This prevents the
>> exponential blowup in the number of filters that you mention, Johan, and it
>> works nicely with service bits for advertising different filter types
>> independently.
>>
>> So if we created three separate filter types, one for output scripts, one
>> for input outpoints, and one for TXIDs, each signaled with a separate
>> service bit, are people good with that? Or do you think there shouldn't be
>> a TXID filter at all, Matt? I didn't include the option of a prev output
>> script filter or rolling that into the block output script filter because
>> it changes the security model (cannot be proven to be correct/incorrect
>> succinctly).
>>
>> Then there's the question of whether to separate or combine the headers.
>> I'd lean towards keeping them separate because it's simpler that way.
>>
>
>

[-- Attachment #2: Type: text/html, Size: 3982 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-23  7:38                       ` Jim Posen
  2018-05-23  8:16                         ` Johan Torås Halseth
@ 2018-05-23 17:28                         ` Gregory Maxwell
  2018-05-24  1:04                           ` Conner Fromknecht
  1 sibling, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-23 17:28 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

Any chance you could add a graph of input-scripts  (instead of input outpoints)?

On Wed, May 23, 2018 at 7:38 AM, Jim Posen via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> So I checked filter sizes (as a proportion of block size) for each of the
> sub-filters. The graph is attached.
>
> As interpretation, the first ~120,000 blocks are so small that the
> Golomb-Rice coding can't compress the filters that well, which is why the
> filter sizes are so high proportional to the block size. Except for the
> input filter, because the coinbase input is skipped, so many of them have 0
> elements. But after block 120,000 or so, the filter compression converges
> pretty quickly to near the optimal value. The encouraging thing here is that
> if you look at the ratio of the combined size of the separated filters vs
> the size of a filter containing all of them (currently known as the basic
> filter), they are pretty much the same size. The mean of the ratio between
> them after block 150,000 is 99.4%. So basically, not much compression
> efficiently is lost by separating the basic filter into sub-filters.
>
> On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
>>>
>>> My suggestion was to advertise a bitfield for each filter type the node
>>> serves,
>>> where the bitfield indicates what elements are part of the filters. This
>>> essentially
>>> removes the notion of decided filter types and instead leaves the
>>> decision to
>>> full-nodes.
>>
>>
>> I think it makes more sense to construct entirely separate filters for the
>> different types of elements and allow clients to download only the ones they
>> care about. If there are enough elements per filter, the compression ratio
>> shouldn't be much worse by splitting them up. This prevents the exponential
>> blowup in the number of filters that you mention, Johan, and it works nicely
>> with service bits for advertising different filter types independently.
>>
>> So if we created three separate filter types, one for output scripts, one
>> for input outpoints, and one for TXIDs, each signaled with a separate
>> service bit, are people good with that? Or do you think there shouldn't be a
>> TXID filter at all, Matt? I didn't include the option of a prev output
>> script filter or rolling that into the block output script filter because it
>> changes the security model (cannot be proven to be correct/incorrect
>> succinctly).
>>
>> Then there's the question of whether to separate or combine the headers.
>> I'd lean towards keeping them separate because it's simpler that way.
>
>
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-23 17:28                         ` Gregory Maxwell
@ 2018-05-24  1:04                           ` Conner Fromknecht
  2018-05-24  3:48                             ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Conner Fromknecht @ 2018-05-24  1:04 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4896 bytes --]

Hi all,

Jimpo, thanks for looking into those stats! I had always imagined that there
would be a more significant savings in having all filters in one bundle, as
opposed to separate. These results are interesting, to say the least, and
definitely offer us some flexibility in options for filter sharding.

So far, the bulk of this discussion has centered around bandwidth. I am
concerned, however, that splitting up the filters is at odds with the other
goal of the proposal in offering improved privacy.

Allowing clients to choose individual filter sets trivially exposes the
type of
data that client is interested in. This alone might be enough to
fingerprint the
function of a peer and reduce anonymity set justifying their potential
behavior.

Furthermore, if a match is encountered, and block requested, full nodes have
more targeted insight into what caused a particular match. They could infer
that
the client received funds in a particular block, e.g., if they are only
requesting
output scripts.

This is above and beyond the additional complexity of now syncing,
validating,
and managing five or six distinct header/filter-header/filter/block chains.

I agree that saving on bandwidth is an important goal, but bandwidth and
privacy
are always seemingly at odds. Strictly comparing the bandwidth requirements
of
a system that heavily weighs privacy to existing ones, e.g. BIP39, that
don't is a
losing battle IMO.

I'm not fundamentally opposed to splitting the filters, I certainly see the
arguments for flexibility. However, I also want to ensure we are
considering the
second order effects that fall out of optimizing for one metric when others
exist.

Cheers,
Conner
On Wed, May 23, 2018 at 10:29 Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Any chance you could add a graph of input-scripts  (instead of input
> outpoints)?
>
> On Wed, May 23, 2018 at 7:38 AM, Jim Posen via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > So I checked filter sizes (as a proportion of block size) for each of the
> > sub-filters. The graph is attached.
> >
> > As interpretation, the first ~120,000 blocks are so small that the
> > Golomb-Rice coding can't compress the filters that well, which is why the
> > filter sizes are so high proportional to the block size. Except for the
> > input filter, because the coinbase input is skipped, so many of them
> have 0
> > elements. But after block 120,000 or so, the filter compression converges
> > pretty quickly to near the optimal value. The encouraging thing here is
> that
> > if you look at the ratio of the combined size of the separated filters vs
> > the size of a filter containing all of them (currently known as the basic
> > filter), they are pretty much the same size. The mean of the ratio
> between
> > them after block 150,000 is 99.4%. So basically, not much compression
> > efficiently is lost by separating the basic filter into sub-filters.
> >
> > On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
> >>>
> >>> My suggestion was to advertise a bitfield for each filter type the node
> >>> serves,
> >>> where the bitfield indicates what elements are part of the filters.
> This
> >>> essentially
> >>> removes the notion of decided filter types and instead leaves the
> >>> decision to
> >>> full-nodes.
> >>
> >>
> >> I think it makes more sense to construct entirely separate filters for
> the
> >> different types of elements and allow clients to download only the ones
> they
> >> care about. If there are enough elements per filter, the compression
> ratio
> >> shouldn't be much worse by splitting them up. This prevents the
> exponential
> >> blowup in the number of filters that you mention, Johan, and it works
> nicely
> >> with service bits for advertising different filter types independently.
> >>
> >> So if we created three separate filter types, one for output scripts,
> one
> >> for input outpoints, and one for TXIDs, each signaled with a separate
> >> service bit, are people good with that? Or do you think there shouldn't
> be a
> >> TXID filter at all, Matt? I didn't include the option of a prev output
> >> script filter or rolling that into the block output script filter
> because it
> >> changes the security model (cannot be proven to be correct/incorrect
> >> succinctly).
> >>
> >> Then there's the question of whether to separate or combine the headers.
> >> I'd lean towards keeping them separate because it's simpler that way.
> >
> >
> >
> > _______________________________________________
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> >
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 6724 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-24  1:04                           ` Conner Fromknecht
@ 2018-05-24  3:48                             ` Jim Posen
  2018-05-28 18:18                               ` Tamas Blummer
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Posen @ 2018-05-24  3:48 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion


[-- Attachment #1.1: Type: text/plain, Size: 5583 bytes --]

Greg, I've attached a graph including the input scripts.

In the top graph, we can see how the input script filter compares to the
input outpoint filter. It is definitely smaller as a result of address
reuse. The bottom graph shows the ratio over time of combining the input
prev script and output script filters vs keeping them separate. In more
recent blocks, it appears that there are decreasing savings.

On Wed, May 23, 2018 at 6:04 PM Conner Fromknecht
<conner@lightning.engineering> wrote:

> Hi all,
>
> Jimpo, thanks for looking into those stats! I had always imagined that
> there
> would be a more significant savings in having all filters in one bundle, as
> opposed to separate. These results are interesting, to say the least, and
> definitely offer us some flexibility in options for filter sharding.
>
> So far, the bulk of this discussion has centered around bandwidth. I am
> concerned, however, that splitting up the filters is at odds with the
> other
> goal of the proposal in offering improved privacy.
>
> Allowing clients to choose individual filter sets trivially exposes the
> type of
> data that client is interested in. This alone might be enough to
> fingerprint the
> function of a peer and reduce anonymity set justifying their potential
> behavior.
>
> Furthermore, if a match is encountered, and block requested, full nodes
> have
> more targeted insight into what caused a particular match. They could
> infer that
> the client received funds in a particular block, e.g., if they are only
> requesting
> output scripts.
>
> This is above and beyond the additional complexity of now syncing,
> validating,
> and managing five or six distinct header/filter-header/filter/block chains.
>
> I agree that saving on bandwidth is an important goal, but bandwidth and
> privacy
> are always seemingly at odds. Strictly comparing the bandwidth
> requirements of
> a system that heavily weighs privacy to existing ones, e.g. BIP39, that
> don't is a
> losing battle IMO.
>
> I'm not fundamentally opposed to splitting the filters, I certainly see the
> arguments for flexibility. However, I also want to ensure we are
> considering the
> second order effects that fall out of optimizing for one metric when
> others exist.
>
> Cheers,
> Conner
> On Wed, May 23, 2018 at 10:29 Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> Any chance you could add a graph of input-scripts  (instead of input
>> outpoints)?
>>
>> On Wed, May 23, 2018 at 7:38 AM, Jim Posen via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > So I checked filter sizes (as a proportion of block size) for each of
>> the
>> > sub-filters. The graph is attached.
>> >
>> > As interpretation, the first ~120,000 blocks are so small that the
>> > Golomb-Rice coding can't compress the filters that well, which is why
>> the
>> > filter sizes are so high proportional to the block size. Except for the
>> > input filter, because the coinbase input is skipped, so many of them
>> have 0
>> > elements. But after block 120,000 or so, the filter compression
>> converges
>> > pretty quickly to near the optimal value. The encouraging thing here is
>> that
>> > if you look at the ratio of the combined size of the separated filters
>> vs
>> > the size of a filter containing all of them (currently known as the
>> basic
>> > filter), they are pretty much the same size. The mean of the ratio
>> between
>> > them after block 150,000 is 99.4%. So basically, not much compression
>> > efficiently is lost by separating the basic filter into sub-filters.
>> >
>> > On Tue, May 22, 2018 at 5:42 PM, Jim Posen <jim.posen@gmail.com> wrote:
>> >>>
>> >>> My suggestion was to advertise a bitfield for each filter type the
>> node
>> >>> serves,
>> >>> where the bitfield indicates what elements are part of the filters.
>> This
>> >>> essentially
>> >>> removes the notion of decided filter types and instead leaves the
>> >>> decision to
>> >>> full-nodes.
>> >>
>> >>
>> >> I think it makes more sense to construct entirely separate filters for
>> the
>> >> different types of elements and allow clients to download only the
>> ones they
>> >> care about. If there are enough elements per filter, the compression
>> ratio
>> >> shouldn't be much worse by splitting them up. This prevents the
>> exponential
>> >> blowup in the number of filters that you mention, Johan, and it works
>> nicely
>> >> with service bits for advertising different filter types independently.
>> >>
>> >> So if we created three separate filter types, one for output scripts,
>> one
>> >> for input outpoints, and one for TXIDs, each signaled with a separate
>> >> service bit, are people good with that? Or do you think there
>> shouldn't be a
>> >> TXID filter at all, Matt? I didn't include the option of a prev output
>> >> script filter or rolling that into the block output script filter
>> because it
>> >> changes the security model (cannot be proven to be correct/incorrect
>> >> succinctly).
>> >>
>> >> Then there's the question of whether to separate or combine the
>> headers.
>> >> I'd lean towards keeping them separate because it's simpler that way.
>> >
>> >
>> >
>> > _______________________________________________
>> > bitcoin-dev mailing list
>> > bitcoin-dev@lists.linuxfoundation.org
>> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>> >
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>

[-- Attachment #1.2: Type: text/html, Size: 7723 bytes --]

[-- Attachment #2: filter_sizes.svg --]
[-- Type: image/svg+xml, Size: 2833874 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-24  3:48                             ` Jim Posen
@ 2018-05-28 18:18                               ` Tamas Blummer
  2018-05-28 18:28                                 ` Tamas Blummer
  0 siblings, 1 reply; 60+ messages in thread
From: Tamas Blummer @ 2018-05-28 18:18 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

Hi Jim,

A “basic” combined filter would mean up to 0.5 GB filter data per month (with 100% segwith use). Considering that 1 GB is the usual data quota for an entry level mobile phone contract, this could be a too high barrier for adoption.

I repeated your calculations and produced a slightly different graph that shows the fraction of cummulative filter size to cummulative blockchain size. This is less noisy but otherwise confirms your measurement.

I think that the data supports separation of filters as a combined filter does not seem to come with significant savings. (basic  size ~= txid + input points + output scripts sizes)
 
My calculations are repeatable with:

https://github.com/tamasblummer/rust-bitcoin-spv/blob/blockfilterstats/src/bin/blockfilterstats.rs

that is using a Rust implementation of an SPV client built on top of other libraries we work on in the rust-bitcoin GitHub community (https://github.com/rust-bitcoin). Yes, this is a shameles plug for the project hoping to attract more developer.

Tamas Blummer




> On May 24, 2018, at 05:48, Jim Posen via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> Greg, I've attached a graph including the input scripts.
> 
> In the top graph, we can see how the input script filter compares to the input outpoint filter. It is definitely smaller as a result of address reuse. The bottom graph shows the ratio over time of combining the input prev script and output script filters vs keeping them separate. In more recent blocks, it appears that there are decreasing savings.
> 


[-- Attachment #2.1: Type: text/html, Size: 2653 bytes --]

[-- Attachment #2.2: filters.png --]
[-- Type: image/png, Size: 69944 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-28 18:18                               ` Tamas Blummer
@ 2018-05-28 18:28                                 ` Tamas Blummer
  2018-05-28 19:24                                   ` Gregory Maxwell
  0 siblings, 1 reply; 60+ messages in thread
From: Tamas Blummer @ 2018-05-28 18:28 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

Forgot to mention: The link I sent is to a branch that is patched to produce the filter stats. 
This is the main project and the BIP158 implementation: https://github.com/rust-bitcoin/rust-bitcoin-spv/blob/master/src/blockfilter.rs

Tamas Blummer

> On May 28, 2018, at 20:18, Tamas Blummer <tamas.blummer@gmail.com> wrote:
> 
> Hi Jim,
> 
> A “basic” combined filter would mean up to 0.5 GB filter data per month (with 100% segwith use). Considering that 1 GB is the usual data quota for an entry level mobile phone contract, this could be a too high barrier for adoption.
> 
> I repeated your calculations and produced a slightly different graph that shows the fraction of cummulative filter size to cummulative blockchain size. This is less noisy but otherwise confirms your measurement.
> 
> I think that the data supports separation of filters as a combined filter does not seem to come with significant savings. (basic  size ~= txid + input points + output scripts sizes)
>  
> My calculations are repeatable with:
> 
> https://github.com/tamasblummer/rust-bitcoin-spv/blob/blockfilterstats/src/bin/blockfilterstats.rs
> 
> that is using a Rust implementation of an SPV client built on top of other libraries we work on in the rust-bitcoin GitHub community (https://github.com/rust-bitcoin). Yes, this is a shameles plug for the project hoping to attract more developer.
> 
> Tamas Blummer
> 
> 
> <filters.png>
> 
>> On May 24, 2018, at 05:48, Jim Posen via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> 
>> Greg, I've attached a graph including the input scripts.
>> 
>> In the top graph, we can see how the input script filter compares to the input outpoint filter. It is definitely smaller as a result of address reuse. The bottom graph shows the ratio over time of combining the input prev script and output script filters vs keeping them separate. In more recent blocks, it appears that there are decreasing savings.
>> 
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-28 18:28                                 ` Tamas Blummer
@ 2018-05-28 19:24                                   ` Gregory Maxwell
  2018-05-29  2:42                                     ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-28 19:24 UTC (permalink / raw)
  To: Tamas Blummer; +Cc: Bitcoin Protocol Discussion

Is there an application that requires watching for output scripts that
doesn't also require watching for input scrips (or, less efficiently,
input outpoints)?

Any of the wallet application that I'm aware of need to see coins
being spent as well as created, else they may try to spend already
spent coins. If we're not aware of any applications that wouldnt use
both, there isn't much reason to separate them and if input scripts
are used instead of input outpoints there is additional savings from
combining them (due to the same scripts being spent as were created in
the block-- due to reuse and chaining).

I still am of the belief, based on Matt's argument, that there is no
use for txid what-so-ever (instead just watch for an outpoint).





On Mon, May 28, 2018 at 6:28 PM, Tamas Blummer <tamas.blummer@gmail.com> wrote:
> Forgot to mention: The link I sent is to a branch that is patched to produce the filter stats.
> This is the main project and the BIP158 implementation: https://github.com/rust-bitcoin/rust-bitcoin-spv/blob/master/src/blockfilter.rs
>
> Tamas Blummer
>
>> On May 28, 2018, at 20:18, Tamas Blummer <tamas.blummer@gmail.com> wrote:
>>
>> Hi Jim,
>>
>> A “basic” combined filter would mean up to 0.5 GB filter data per month (with 100% segwith use). Considering that 1 GB is the usual data quota for an entry level mobile phone contract, this could be a too high barrier for adoption.
>>
>> I repeated your calculations and produced a slightly different graph that shows the fraction of cummulative filter size to cummulative blockchain size. This is less noisy but otherwise confirms your measurement.
>>
>> I think that the data supports separation of filters as a combined filter does not seem to come with significant savings. (basic  size ~= txid + input points + output scripts sizes)
>>
>> My calculations are repeatable with:
>>
>> https://github.com/tamasblummer/rust-bitcoin-spv/blob/blockfilterstats/src/bin/blockfilterstats.rs
>>
>> that is using a Rust implementation of an SPV client built on top of other libraries we work on in the rust-bitcoin GitHub community (https://github.com/rust-bitcoin). Yes, this is a shameles plug for the project hoping to attract more developer.
>>
>> Tamas Blummer
>>
>>
>> <filters.png>
>>
>>> On May 24, 2018, at 05:48, Jim Posen via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>>>
>>> Greg, I've attached a graph including the input scripts.
>>>
>>> In the top graph, we can see how the input script filter compares to the input outpoint filter. It is definitely smaller as a result of address reuse. The bottom graph shows the ratio over time of combining the input prev script and output script filters vs keeping them separate. In more recent blocks, it appears that there are decreasing savings.
>>>
>>
>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-28 19:24                                   ` Gregory Maxwell
@ 2018-05-29  2:42                                     ` Jim Posen
  2018-05-29  3:24                                       ` Gregory Maxwell
  2018-05-29  4:01                                       ` Olaoluwa Osuntokun
  0 siblings, 2 replies; 60+ messages in thread
From: Jim Posen @ 2018-05-29  2:42 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

> Is there an application that requires watching for output scripts that
> doesn't also require watching for input scrips (or, less efficiently,
> input outpoints)?
>

Certain wallets may be able to use only the output script filter by using
output scripts to watch for confirmations on sent transactions, assuming
that application is the only one with access to the private keys. The
additional benefit of the input script/outpoint filter is to watch for
unexpected spends (coins getting stolen or spent from another wallet) or
transactions without a unique change or output address. I think this is a
reasonable implementation, and it would be nice to be able to download that
filter without any input elements.

[-- Attachment #2: Type: text/html, Size: 934 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-29  2:42                                     ` Jim Posen
@ 2018-05-29  3:24                                       ` Gregory Maxwell
  2018-05-29  4:01                                       ` Olaoluwa Osuntokun
  1 sibling, 0 replies; 60+ messages in thread
From: Gregory Maxwell @ 2018-05-29  3:24 UTC (permalink / raw)
  To: Jim Posen; +Cc: Bitcoin Protocol Discussion

On Tue, May 29, 2018 at 2:42 AM, Jim Posen <jim.posen@gmail.com> wrote:
> Certain wallets may be able to use only the output script filter by using
> output scripts to watch for confirmations on sent transactions, assuming
> that application is the only one with access to the private keys. The
> additional benefit of the input script/outpoint filter is to watch for
> unexpected spends (coins getting stolen or spent from another wallet) or
> transactions without a unique change or output address. I think this is a
> reasonable implementation, and it would be nice to be able to download that
> filter without any input elements.

In this configuration there is little need to access historical blocks
though, since you're assuming that you'll never recover from a backup.
No?


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-29  2:42                                     ` Jim Posen
  2018-05-29  3:24                                       ` Gregory Maxwell
@ 2018-05-29  4:01                                       ` Olaoluwa Osuntokun
  2018-05-31 14:27                                         ` Tamas Blummer
  2018-06-01  2:52                                         ` Olaoluwa Osuntokun
  1 sibling, 2 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-05-29  4:01 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2420 bytes --]

> The additional benefit of the input script/outpoint filter is to watch for
> unexpected spends (coins getting stolen or spent from another wallet) or
> transactions without a unique change or output address. I think this is a
> reasonable implementation, and it would be nice to be able to download
that
> filter without any input elements.

As someone who's implemented a complete integration of the filtering
technique into an existing wallet, and a higher application I disagree.
There's not much gain to be had in splitting up the filters: it'll result in
additional round trips (to fetch these distinct filter) during normal
operation, complicate routine seed rescanning logic, and also is detrimental
to privacy if one is fetching blocks from the same peer as they've
downloaded the filters from.

However, I'm now convinced that the savings had by including the prev output
script (addr re-use and outputs spent in the same block as they're created)
outweigh the additional booking keeping required in an implementation (when
extracting the precise tx that matched) compared to using regular outpoint
as we do currently. Combined with the recently proposed re-parametrization
of the gcs parameters[1], the filter size should shrink by quite a bit!

I'm very happy with the review the BIPs has been receiving as of late. It
would've been nice to have this 1+ year ago when the draft was initially
proposed, but better late that never!

Based on this thread, [1], and discussions on various IRC channels, I plan
to make the following modifications to the BIP:

  1. use P=2^19 and M=784931 as gcs parameters, and also bind these to the
     filter instance, so future filter types may use distinct parameters
  2. use the prev output script rather than the prev input script in the
     regular filter
  3. remove the txid from the regular filter(as with some extra book-keeping
     the output script is enough)
  4. do away with the extended filter all together, as our original use case
     for it has been nerfed as the filter size grew too large when doing
     recursive parsing. instead we watch for the outpoint being spent and
     extract the pre-image from it if it matches now

The resulting changes should slash the size of the filters, yet still ensure
that they're useful enough for our target use case.

[1]:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-May/016029.html

-- Laolu

[-- Attachment #2: Type: text/html, Size: 3108 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-29  4:01                                       ` Olaoluwa Osuntokun
@ 2018-05-31 14:27                                         ` Tamas Blummer
  2018-06-01  2:52                                         ` Olaoluwa Osuntokun
  1 sibling, 0 replies; 60+ messages in thread
From: Tamas Blummer @ 2018-05-31 14:27 UTC (permalink / raw)
  To: Olaoluwa Osuntokun, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5046 bytes --]

I processed the historic blockchain to create a single filter populated with spent input scripts and output scripts. The Golomb parameter was P=2^20

The resulting chart shows a volatile history of same-block address re-use with a notable drops in relative filter size during the early history and in the time window where SatoshiDICE was popular, since then trending higher.
The history of only the last half year suggests a current filter size of between 2.0% - 2.5% of block sizes.

Since most outputs are spent within a short time period, but apparently not that often in same blocks, I think it was worth considering filter series that match over a windows of 2^n blocks (n=(0…10)). Applications could then bracket the 
range of interest and then narrow down requesting finer filters or blocks.

Then I created 1600 random (P2SH) scripts and totaled the false positive block download data size if observing 100, 200, 400, 800, 1600 of them. 
The result suggests that even for 1600 the false positive overhead is less than 0.1% of blockchain data size. 

I agree with Greg that we should optimize the parameters for a small observed set as those will be running on mobile devices.
As of Pieter’s findings the simulation parameters were optimal for ca. 1000 observed scripts which is maybe to many for a “small” application.
On the other hand we do not know the needs of future popular mobile applications.  
 
With parameters of the simulation the current minimal data burden on a mobile wallet would be ca. 0.1 GB / Month.

Simulations with other parameters could be executed using this patch branch: https://github.com/tamasblummer/rust-bitcoin-spv/tree/blockfilterstats A run takes a few hours on a fast machine with release build and local bitcoind.
The calculation can not be reduced to the recent history as the process builds in-memory utxo from genesis.

The result of executing the binary is a CSV file containing:
blocknumber, blocksize, utxo size, filter size, false positive data size for 100, false positive data size for 100, … false positive data size for 100
e.g:
524994,1112181,57166825,21556,0,0,0,0,1112181

Tamas Blummer


> On May 29, 2018, at 06:01, Olaoluwa Osuntokun via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> > The additional benefit of the input script/outpoint filter is to watch for
> > unexpected spends (coins getting stolen or spent from another wallet) or
> > transactions without a unique change or output address. I think this is a
> > reasonable implementation, and it would be nice to be able to download that
> > filter without any input elements. 
> 
> As someone who's implemented a complete integration of the filtering
> technique into an existing wallet, and a higher application I disagree.
> There's not much gain to be had in splitting up the filters: it'll result in
> additional round trips (to fetch these distinct filter) during normal
> operation, complicate routine seed rescanning logic, and also is detrimental
> to privacy if one is fetching blocks from the same peer as they've
> downloaded the filters from.
> 
> However, I'm now convinced that the savings had by including the prev output
> script (addr re-use and outputs spent in the same block as they're created)
> outweigh the additional booking keeping required in an implementation (when
> extracting the precise tx that matched) compared to using regular outpoint
> as we do currently. Combined with the recently proposed re-parametrization
> of the gcs parameters[1], the filter size should shrink by quite a bit!
> 
> I'm very happy with the review the BIPs has been receiving as of late. It
> would've been nice to have this 1+ year ago when the draft was initially
> proposed, but better late that never!
> 
> Based on this thread, [1], and discussions on various IRC channels, I plan
> to make the following modifications to the BIP:
> 
>   1. use P=2^19 and M=784931 as gcs parameters, and also bind these to the
>      filter instance, so future filter types may use distinct parameters
>   2. use the prev output script rather than the prev input script in the
>      regular filter
>   3. remove the txid from the regular filter(as with some extra book-keeping
>      the output script is enough) 
>   4. do away with the extended filter all together, as our original use case
>      for it has been nerfed as the filter size grew too large when doing
>      recursive parsing. instead we watch for the outpoint being spent and
>      extract the pre-image from it if it matches now
> 
> The resulting changes should slash the size of the filters, yet still ensure
> that they're useful enough for our target use case.
> 
> [1]: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-May/016029.html
> 
> -- Laolu
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


[-- Attachment #2.1: Type: text/html, Size: 7130 bytes --]

[-- Attachment #2.2: filtersize.png --]
[-- Type: image/png, Size: 58848 bytes --]

[-- Attachment #2.3: lasthalfyear.png --]
[-- Type: image/png, Size: 50431 bytes --]

[-- Attachment #2.4: falsepositive.png --]
[-- Type: image/png, Size: 82663 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-29  4:01                                       ` Olaoluwa Osuntokun
  2018-05-31 14:27                                         ` Tamas Blummer
@ 2018-06-01  2:52                                         ` Olaoluwa Osuntokun
  2018-06-01  4:15                                           ` Gregory Maxwell
       [not found]                                           ` <CAAS2fgSyVi0d_ixp-auRPPzPfFeffN=hsWhWT5=EzDO3O+Ue1g@mail.gmail.com>
  1 sibling, 2 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-01  2:52 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4314 bytes --]

Hi y'all,

I've made a PR to the BIP repo to modify BIP 158 based on this thread, and
other recent threads giving feedback on the current version of the BIP:

  * https://github.com/bitcoin/bips/pull/687

I've also updated the test vectors based on the current parameters (and
filter format), and also the code used to generate the test vectors. Due to
the change in parametrization, the test vectors now target (P=19 M=784931),
and there're no longer any cases related to extended filters.

One notable thing that I left off is the proposed change to use the previous
output script rather than the outpoint. Modifying the filters in this
fashion would be a downgrade in the security model for light clients, as it
would allow full nodes to lie by omission, just as they can with BIP 37. As
is now, if nodes present conflicting information, then the light client can
download the target block, fully reconstruct the filter itself, then ban any
nodes which advertised the incorrect filter. The inclusion of the filter
header checkpoints make it rather straight forward for light clients to
bisect the state to find the conflicting advertisement, and it's strongly
recommended that they do so.

To get a feel for the level of impact these changes would have on existing
applications that depend on the txid being included in the filter, I've
implemented these changes across btcutil, btcd, btcwallet, and lnd (which
previously relied on the txid for confirmation notifications). For lnd at
least, the code impact was rather minimal, as we use the pkScript for
matching a block, but then still scan the block manually to find the precise
transaction (by txid) that we were interested in (if it's there).

-- Laolu


On Mon, May 28, 2018 at 9:01 PM Olaoluwa Osuntokun <laolu32@gmail.com>
wrote:

> > The additional benefit of the input script/outpoint filter is to watch
> for
> > unexpected spends (coins getting stolen or spent from another wallet) or
> > transactions without a unique change or output address. I think this is a
> > reasonable implementation, and it would be nice to be able to download
> that
> > filter without any input elements.
>
> As someone who's implemented a complete integration of the filtering
> technique into an existing wallet, and a higher application I disagree.
> There's not much gain to be had in splitting up the filters: it'll result
> in
> additional round trips (to fetch these distinct filter) during normal
> operation, complicate routine seed rescanning logic, and also is
> detrimental
> to privacy if one is fetching blocks from the same peer as they've
> downloaded the filters from.
>
> However, I'm now convinced that the savings had by including the prev
> output
> script (addr re-use and outputs spent in the same block as they're created)
> outweigh the additional booking keeping required in an implementation (when
> extracting the precise tx that matched) compared to using regular outpoint
> as we do currently. Combined with the recently proposed re-parametrization
> of the gcs parameters[1], the filter size should shrink by quite a bit!
>
> I'm very happy with the review the BIPs has been receiving as of late. It
> would've been nice to have this 1+ year ago when the draft was initially
> proposed, but better late that never!
>
> Based on this thread, [1], and discussions on various IRC channels, I plan
> to make the following modifications to the BIP:
>
>   1. use P=2^19 and M=784931 as gcs parameters, and also bind these to the
>      filter instance, so future filter types may use distinct parameters
>   2. use the prev output script rather than the prev input script in the
>      regular filter
>   3. remove the txid from the regular filter(as with some extra
> book-keeping
>      the output script is enough)
>   4. do away with the extended filter all together, as our original use
> case
>      for it has been nerfed as the filter size grew too large when doing
>      recursive parsing. instead we watch for the outpoint being spent and
>      extract the pre-image from it if it matches now
>
> The resulting changes should slash the size of the filters, yet still
> ensure
> that they're useful enough for our target use case.
>
> [1]:
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2018-May/016029.html
>
> -- Laolu
>

[-- Attachment #2: Type: text/html, Size: 5651 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-01  2:52                                         ` Olaoluwa Osuntokun
@ 2018-06-01  4:15                                           ` Gregory Maxwell
       [not found]                                           ` <CAAS2fgSyVi0d_ixp-auRPPzPfFeffN=hsWhWT5=EzDO3O+Ue1g@mail.gmail.com>
  1 sibling, 0 replies; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-01  4:15 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion

On Fri, Jun 1, 2018 at 2:52 AM, Olaoluwa Osuntokun via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> One notable thing that I left off is the proposed change to use the previous
> output script rather than the outpoint. Modifying the filters in this
> fashion would be a downgrade in the security model for light clients, as it

Only if you make a very strong assumption about the integrity of the
nodes the client is talkign to. A typical network attacker (e.g.
someone on your lan or wifi segmet, or someone who has compromised or
operates an upstream router) can be all of your peers.

The original propsal for using these kinds of maps was that their
digests could eventually be commited and then checked against the
commitment, matching the same general security model used otherwise in
SPV.

Unfortunately, using the scripts instead of the outpoints takes us
further away from a design that is optimized for committing (or, for
that matter, use purely locally by a wallet)...


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
       [not found]                                           ` <CAAS2fgSyVi0d_ixp-auRPPzPfFeffN=hsWhWT5=EzDO3O+Ue1g@mail.gmail.com>
@ 2018-06-02  0:01                                             ` Olaoluwa Osuntokun
  2018-06-02  0:22                                               ` Gregory Maxwell
  0 siblings, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-02  0:01 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2705 bytes --]

> A typical network attacker (e.g.  someone on your lan or wifi segmet, or
> someone who has compromised or operates an upstream router) can be all of
> your peers.

This is true, but it cannot make us accept any invalid filters unless the
attacker is also creating invalid blocks w/ valid PoW.

> The original propsal for using these kinds of maps was that their digests
> could eventually be commited and then checked against the commitment,
> matching the same general security model used otherwise in SPV.

Indeed, but no such proposal for committing the filters has emerged yet.
Slinging filters with new p2p messages requires much less coordination that
adding a new committed structure to Bitcoin. One could imagine that if
consensus exists to add new committed structures, then there may also be
initiatives to start to commit sig-ops, block weight, utxo's etc. As a
result one could imagine a much longer deployment cycle compared to a pure
p2p roll out in the near term, and many applications are looking for a
viable alternative to BIP 37.

> Unfortunately, using the scripts instead of the outpoints takes us further
> away from a design that is optimized for committing (or, for that matter,
> use purely locally by a wallet)...

I agree that using the prev input scripts would indeed be optimal from a
size perspective when the filters are to be committed. The current proposal
makes way for future filter types and it's likely the case that only the
most optimal filters should be committed (while other more niche filters
perhaps, remain only on the p2p level).

-- Laolu


On Thu, May 31, 2018 at 9:14 PM Gregory Maxwell <gmaxwell@gmail.com> wrote:

> On Fri, Jun 1, 2018 at 2:52 AM, Olaoluwa Osuntokun via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > One notable thing that I left off is the proposed change to use the
> previous
> > output script rather than the outpoint. Modifying the filters in this
> > fashion would be a downgrade in the security model for light clients, as
> it
>
> Only if you make a very strong assumption about the integrity of the
> nodes the client is talkign to. A typical network attacker (e.g.
> someone on your lan or wifi segmet, or someone who has compromised or
> operates an upstream router) can be all of your peers.
>
> The original propsal for using these kinds of maps was that their
> digests could eventually be commited and then checked against the
> commitment, matching the same general security model used otherwise in
> SPV.
>
> Unfortunately, using the scripts instead of the outpoints takes us
> further away from a design that is optimized for committing (or, for
> that matter, use purely locally by a wallet)...
>

[-- Attachment #2: Type: text/html, Size: 3474 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-02  0:01                                             ` Olaoluwa Osuntokun
@ 2018-06-02  0:22                                               ` Gregory Maxwell
  2018-06-02  2:02                                                 ` Jim Posen
  0 siblings, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-02  0:22 UTC (permalink / raw)
  To: Olaoluwa Osuntokun; +Cc: Bitcoin Protocol Discussion

On Sat, Jun 2, 2018 at 12:01 AM, Olaoluwa Osuntokun <laolu32@gmail.com> wrote:
>> A typical network attacker (e.g.  someone on your lan or wifi segmet, or
>> someone who has compromised or operates an upstream router) can be all of
>> your peers.
>
> This is true, but it cannot make us accept any invalid filters unless the
> attacker is also creating invalid blocks w/ valid PoW.

I wish that were the true, but absent commitments that wouldn't be the
case unless you were always downloading all the blocks-- since you
wouldn't have any sign that there was something wrong with the
filter-- and downloading all the blocks would moot using the filters
in the first place. :)

Or have I misunderstood you massively here?

For segwit originally I had proposed adding additional commitments
that would make it possible to efficiently prove invalidity of a
block; but that got stripped because many people were of the view that
the "assume you have at least one honest peer who saw that block and
rejected it to tell you that the block was invalid" security
assumption was of dubious value. Maybe it's more justifiable to make
use of a dubious assumption for a P2P feature than for a consensus
feature?  Perhaps,  I'd rather have both filter types from day one so
that things not implementing the comparison techniques don't get the
efficiency loss or the extra work to change filter types for a
consensus one.

[I think now that we're much closer to a design that would be worth
making a consensus committed version of than we were a few months ago
now, since we are effectively already on a second generation of the
design with the various improvements lately]


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-02  0:22                                               ` Gregory Maxwell
@ 2018-06-02  2:02                                                 ` Jim Posen
  2018-06-02 12:41                                                   ` David A. Harding
  0 siblings, 1 reply; 60+ messages in thread
From: Jim Posen @ 2018-06-02  2:02 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4109 bytes --]

To address the at-least-one-honest peer security assumption for light
clients, I think this is a rather good security model for light clients.
First it significantly reduces the chances that an attacker can eclipse a
client just by chance, and clients can implement measures like ensuring
connectivity to peers from different subnets. But even if, as you suggest,
a network attacker controls the target's local network, peers still can
have good security guarantees by requiring authenticated connections to
semi-trusted peers. A client can select a set of N servers that it believes
will not collude to attack it, and only sync filters if connected to a
threshold of them. So even if the network is malicious, the attacker cannot
forge the authenticated responses. The level of trust in these designated
parties again is quite low because only one has to be honest. This would
require something like BIP 150.

Even if clients are uncomfortable with whitelisting required peers, it
could have a policy of requiring a certain number of connections to peers
that have honestly served it filters in the past. This is sort of like
trust-on-first-use. This type of scheme, however, would require nodes to
advertise a pubkey per address, which BIP 150/151 does not support at
present.

All in all, I think this is an acceptable security model for light clients.
Without the ability to verify filter validity, a client would have to stop
syncing altogether in the presence of just one malicious peer, which is
unacceptable.

The other concern you raise, Greg, is using a filter for P2P communications
that we expect may be replaced in the future. You also raise the point that
full node wallets can use the smaller filters for rescans because the
filter validity is not in question. I'd perfectly fine with the idea of
defining two filter types in the BIP, one that is output script + outpoint
and the other output script + prev script. But I imagine some people would
object to the idea of full nodes storing two different filters that overlap
in contents. If we had to pick just one though, I'm strongly in support of
output script + outpoint so that BIP 157 can be deployed ASAP without a
consensus change. It's entirely possible we will learn even more about
optimal filter design through deployment and adoption.

On Fri, Jun 1, 2018 at 5:22 PM Gregory Maxwell <greg@xiph.org> wrote:

> On Sat, Jun 2, 2018 at 12:01 AM, Olaoluwa Osuntokun <laolu32@gmail.com>
> wrote:
> >> A typical network attacker (e.g.  someone on your lan or wifi segmet, or
> >> someone who has compromised or operates an upstream router) can be all
> of
> >> your peers.
> >
> > This is true, but it cannot make us accept any invalid filters unless the
> > attacker is also creating invalid blocks w/ valid PoW.
>
> I wish that were the true, but absent commitments that wouldn't be the
> case unless you were always downloading all the blocks-- since you
> wouldn't have any sign that there was something wrong with the
> filter-- and downloading all the blocks would moot using the filters
> in the first place. :)
>
> Or have I misunderstood you massively here?
>
> For segwit originally I had proposed adding additional commitments
> that would make it possible to efficiently prove invalidity of a
> block; but that got stripped because many people were of the view that
> the "assume you have at least one honest peer who saw that block and
> rejected it to tell you that the block was invalid" security
> assumption was of dubious value. Maybe it's more justifiable to make
> use of a dubious assumption for a P2P feature than for a consensus
> feature?  Perhaps,  I'd rather have both filter types from day one so
> that things not implementing the comparison techniques don't get the
> efficiency loss or the extra work to change filter types for a
> consensus one.
>
> [I think now that we're much closer to a design that would be worth
> making a consensus committed version of than we were a few months ago
> now, since we are effectively already on a second generation of the
> design with the various improvements lately]
>

[-- Attachment #2: Type: text/html, Size: 4654 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-02  2:02                                                 ` Jim Posen
@ 2018-06-02 12:41                                                   ` David A. Harding
  2018-06-02 22:02                                                     ` Tamas Blummer
  0 siblings, 1 reply; 60+ messages in thread
From: David A. Harding @ 2018-06-02 12:41 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1780 bytes --]

On Fri, Jun 01, 2018 at 07:02:38PM -0700, Jim Posen via bitcoin-dev wrote:
> Without the ability to verify filter validity, a client would have to stop
> syncing altogether in the presence of just one malicious peer, which is
> unacceptable.

I'm confused about why this would be the case.  If Alice's node
generates filters accurately and Mallory's node generates filters
inaccurately, and they both send their filters to Bob, won't Bob be able
to download any blocks either filter indicates are relevant to his
wallet?

If Bob downloads a block that contains one of his transactions based on
Alice's filter indicating a possible match at a time when Mallory's
filter said there was no match, then this false negative is perfect
evidence of deceit on Mallory's part[1] and Bob can ban her.

If Bob downloads a block that doesn't contain any of his transactions
based on Mallory's filter indicating a match at a time when Alice's
filter said there was no match, then this false positive can be recorded
and Bob can eventually ban Mallory should the false positive rate
exceeds some threshold.

Until Mallory is eventually banned, it seems to me that the worst she
can do is waste Bob's bandwidth and that of any nodes serving him
accurate information, such as Alice's filters and the blocks Bob
is misled into downloading to check for matches.  The amount of
attacker:defender asymetry in the bandwidth wasted increases if
Mallory's filters become less accurate, but this also increases her
false positive rate and reduces the number of filters that need to be
seen before Bob bans her, so it seems to me (possibly naively) that this
is not a significant DoS vector.

-Dave

[1] Per BIP158 saying, "a Golomb-coded set (GCS), which matches all
items in the set with probability 1"

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-02 12:41                                                   ` David A. Harding
@ 2018-06-02 22:02                                                     ` Tamas Blummer
  2018-06-03  0:28                                                       ` Gregory Maxwell
  0 siblings, 1 reply; 60+ messages in thread
From: Tamas Blummer @ 2018-06-02 22:02 UTC (permalink / raw)
  To: David A. Harding, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

Without block commitment mobiles would have to use trusted filter provider or implement a complex data hungry algorithm and still remain as insecure as with BIP 37.

Years of experience implementing wallets with BIP 37 taught us that an outpoint + output script filter is useful. Committing such a filter to the block can not be an error.

We could roll this out on P2P prior to a soft fork adding the commitment, but I would not expect its use to pick up before that.
Therafter BIP 37 could be rightfully decommissioned, herby offering both security and privacy enhancement at modest data cost.

Tamas Blummer

> On Jun 2, 2018, at 14:41, David A. Harding via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> On Fri, Jun 01, 2018 at 07:02:38PM -0700, Jim Posen via bitcoin-dev wrote:
>> Without the ability to verify filter validity, a client would have to stop
>> syncing altogether in the presence of just one malicious peer, which is
>> unacceptable.
> 
> I'm confused about why this would be the case.  If Alice's node
> generates filters accurately and Mallory's node generates filters
> inaccurately, and they both send their filters to Bob, won't Bob be able
> to download any blocks either filter indicates are relevant to his
> wallet?


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 529 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-02 22:02                                                     ` Tamas Blummer
@ 2018-06-03  0:28                                                       ` Gregory Maxwell
  2018-06-03  5:14                                                         ` Tamas Blummer
  0 siblings, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-03  0:28 UTC (permalink / raw)
  To: Tamas Blummer, Bitcoin Protocol Discussion

On Sat, Jun 2, 2018 at 10:02 PM, Tamas Blummer via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> Years of experience implementing wallets with BIP 37

pretty much us that all these filter things are a total waste of time.
BIP37 use is nearly dead on the network-- monitor your own nodes to
see the actual use of the filters: it's very low.  I see under average
of 1 peer per day using it.

Moreover the primary complaint from users about BIP37 vs the
alternatives they're choosing over it (electrum and web wallets) is
that the sync time is too long-- something BIP158 doesn't improve.

So if we were going to go based on history we wouldn't bother with on
P2P at all.   But I think the history's lesson here may mostly be an
accident, and that the the non-use of BIP37 is  due more to the low
quality and/or abandoned status of most BIP37 implementing software,
rather than a fundamental lack of utility.   Though maybe we do find
out that once someone bothers implementing a PIR based scanning
mechanism (as electrum has talked about on and off for a while now)
we'll lose another advantage.

BIP37 also got a number of things wrong-- what went into the filters
was a big element in that (causing massive pollution of matches due to
useless data), along with privacy etc.  This kind of approach will
have the best chances if it doesn't repeat the mistakes... but also
it'll have the best chances if it has good security, and getting SPV-
equivalent security will require committing the filters, but
committing them is a big step because then the behaviour becomes
consensus normative-- it's worth spending a few months of extra
iteration getting the design as good as possible before doing that
(which is what we've been seeing lately).


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-03  0:28                                                       ` Gregory Maxwell
@ 2018-06-03  5:14                                                         ` Tamas Blummer
  2018-06-03  6:11                                                           ` Pieter Wuille
  0 siblings, 1 reply; 60+ messages in thread
From: Tamas Blummer @ 2018-06-03  5:14 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 513 bytes --]

Lighter but SPV secure nodes (filter committed) would help the network (esp. Layer 2) to grow mesh like, but add more user that blindly follow POW.

On longer term most users' security will be determined by either trusted hubs or POW.
I do not know which is worse, but we should at least offer the choice to the user, therefore commit filters.

Tamas Blummer

> On Jun 3, 2018, at 02:28, Gregory Maxwell <greg@xiph.org> wrote:
> 
> pretty much us that all these filter things are a total waste of time.

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 529 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-03  5:14                                                         ` Tamas Blummer
@ 2018-06-03  6:11                                                           ` Pieter Wuille
  2018-06-03 16:44                                                             ` Tamas Blummer
  2018-06-08  5:03                                                             ` Olaoluwa Osuntokun
  0 siblings, 2 replies; 60+ messages in thread
From: Pieter Wuille @ 2018-06-03  6:11 UTC (permalink / raw)
  To: Tamas Blummer, Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

On Sat, Jun 2, 2018, 22:56 Tamas Blummer via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Lighter but SPV secure nodes (filter committed) would help the network
> (esp. Layer 2) to grow mesh like, but add more user that blindly follow POW.
>
> On longer term most users' security will be determined by either trusted
> hubs or POW.
> I do not know which is worse, but we should at least offer the choice to
> the user, therefore commit filters.
>

I don't think that's the point of discussion here. Of course, in order to
have filters that verifiably don't lie by omission, the filters need to be
committed to by blocks.

The question is what data that filter should contain.

There are two suggestions:
(a) The scriptPubKeys of the block's outputs, and prevouts of the block's
inputs.
(b) The scriptPubKeys of the block's outputs, and scriptPubKeys of outputs
being spent by the block's inputs.

The advantage of (a) is that it can be verified against a full block
without access to the outputs being spent by it. This allows light clients
to ban nodes that give them incorrect filters, but they do need to actually
see the blocks (partially defeating the purpose of having filters in the
first place).

The advantage of (b) is that it is more compact (scriot reuse, and outputs
spent within the same block as they are created). It also had the advantage
of being more easily usable for scanning of a wallet's transactions. Using
(a) for that in some cases may need to restart and refetch when an output
is discovered, to go test for its spending (whose outpoint is not known
ahead of time). Especially when fetching multiple filters at a time this
may be an issue.

I think both of these potentially good arguments. However, once a committed
filter exists, the advantage of (a) goes away completely - validation of
committed filters is trivial and can be done without needing the full
blocks in the first place.

So I think the question is do we aim for an uncommitted (a) first and a
committed (b) later, or go for (b) immediately?

Cheers,

-- 
Pieter

[-- Attachment #2: Type: text/html, Size: 3055 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-03  6:11                                                           ` Pieter Wuille
@ 2018-06-03 16:44                                                             ` Tamas Blummer
  2018-06-03 16:50                                                               ` Tamas Blummer
  2018-06-08  5:03                                                             ` Olaoluwa Osuntokun
  1 sibling, 1 reply; 60+ messages in thread
From: Tamas Blummer @ 2018-06-03 16:44 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev


[-- Attachment #1.1: Type: text/plain, Size: 402 bytes --]

I processed bitcoin history assuming filters using with P=19 M=784931.

Findings:
- Output script + spent script filters (Wuille’s (b)) have sizes of ca. 0.2% of block size.
- Output script + spent script filters (Wuille’s (b)) are ca. 10% smaller than output script + spent outpoint filters (Wuille's (a)). Savings here however trend lower since years.

Graphs attached.

Tamas Blummer


[-- Attachment #1.2.1: Type: text/html, Size: 1140 bytes --]

[-- Attachment #1.2.2: scriptfilter.png --]
[-- Type: image/png, Size: 55464 bytes --]

[-- Attachment #1.2.3: scriptssaving.png --]
[-- Type: image/png, Size: 59097 bytes --]

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 529 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-03 16:44                                                             ` Tamas Blummer
@ 2018-06-03 16:50                                                               ` Tamas Blummer
  0 siblings, 0 replies; 60+ messages in thread
From: Tamas Blummer @ 2018-06-03 16:50 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev


[-- Attachment #1.1: Type: text/plain, Size: 670 bytes --]

Correction:
- Output script + spent script filters (Wuille’s (b)) have sizes of ca. 2% of block size.

Tamas Blummer

> On Jun 3, 2018, at 18:44, Tamas Blummer <tamas.blummer@gmail.com> wrote:
> 
> I processed bitcoin history assuming filters using with P=19 M=784931.
> 
> Findings:
> - Output script + spent script filters (Wuille’s (b)) have sizes of ca. 0.2% of block size.
> - Output script + spent script filters (Wuille’s (b)) are ca. 10% smaller than output script + spent outpoint filters (Wuille's (a)). Savings here however trend lower since years.
> 
> Graphs attached.
> 
> Tamas Blummer
> 
> <scriptfilter.png><scriptssaving.png>


[-- Attachment #1.2: Type: text/html, Size: 1913 bytes --]

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 529 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-05-18  6:28 ` Karl-Johan Alm
@ 2018-06-04  8:42   ` Riccardo Casatta
  2018-06-05  1:08     ` Jim Posen
  2018-06-06  1:12     ` Olaoluwa Osuntokun
  0 siblings, 2 replies; 60+ messages in thread
From: Riccardo Casatta @ 2018-06-04  8:42 UTC (permalink / raw)
  To: karljohan-alm, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2047 bytes --]

I was wondering why this multi-layer multi-block filter proposal isn't
getting any comment,
is it because not asking all filters is leaking information?

Thanks

Il giorno ven 18 mag 2018 alle ore 08:29 Karl-Johan Alm via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> ha scritto:

> On Fri, May 18, 2018 at 12:25 AM, Matt Corallo via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > In general, I'm concerned about the size of the filters making existing
> > SPV clients less willing to adopt BIP 158 instead of the existing bloom
> > filter garbage and would like to see a further exploration of ways to
> > split out filters to make them less bandwidth intensive. Some further
> > ideas we should probably play with before finalizing moving forward is
> > providing filters for certain script templates, eg being able to only
> > get outputs that are segwit version X or other similar ideas.
>
> There is also the idea of multi-block filters. The idea is that light
> clients would download a pair of filters for blocks X..X+255 and
> X+256..X+511, check if they have any matches and then grab pairs for
> any that matched, e.g. X..X+127 & X+128..X+255 if left matched, and
> iterate down until it ran out of hits-in-a-row or it got down to
> single-block level.
>
> This has an added benefit where you can accept a slightly higher false
> positive rate for bigger ranges, because the probability of a specific
> entry having a false positive in each filter is (empirically speaking)
> independent. I.e. with a FP probability of 1% in the 256 range block
> and a FP probability of 0.1% in the 128 range block would mean the
> probability is actually 0.001%.
>
> Wrote about this here: https://bc-2.jp/bfd-profile.pdf (but the filter
> type is different in my experiments)
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>


-- 
Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>

[-- Attachment #2: Type: text/html, Size: 3161 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-04  8:42   ` Riccardo Casatta
@ 2018-06-05  1:08     ` Jim Posen
  2018-06-05  4:33       ` Karl-Johan Alm
  2018-06-05 17:52       ` Gregory Maxwell
  2018-06-06  1:12     ` Olaoluwa Osuntokun
  1 sibling, 2 replies; 60+ messages in thread
From: Jim Posen @ 2018-06-05  1:08 UTC (permalink / raw)
  To: Riccardo Casatta, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

>
> I was wondering why this multi-layer multi-block filter proposal isn't
> getting any comment,
> is it because not asking all filters is leaking information?
>

It's an interesting idea, but it adds more complexity to the client and
could be added later on if clients adopt BIP 157 and complain about
bandwidth. It also derives all bandwidth gains from address reuse. So I'm
hesitant to make the complexity tradeoff for bandwidth savings due to a
behavior that is actively discouraged.

On another note, I've been thinking that block TXO commitments could
resolve the issue we are facing now with deciding between the prev script
approach and outpoint. The whole argument for outpoints is that there are
compact-ish (<1 MiB) proofs of filter validity, which is not currently
possible if the filters included prev output data. Such proofs would be
feasible if blocks headers (well, actually coinbase txs) had a commitment
to the Merkle root of all newly created outputs in the block.

This idea has been tossed around before in the context of fraud proofs and
TXO bitfields, and seems to unlock a whole bunch of other P2P commitments.
For example, if we wanted to do P2P commitments (BIP 157-style) to the
distribution of tx fees in a block, one could use block TXO commitments to
prove correctness of fees for non-segwit txs. It also enables block
validity proofs (assuming parent blocks are valid), which are not as
powerful as invalidity/fraud proofs, but interesting nonetheless.

This would require a new getdata type BLOCK_WITH_PREVOUTS or something. I
assume for most coinbase-tx-committed proposals, we'll also need a new
getcoinbases/coinbases that requests the coinbase tx and Merkle branch for
a range of headers as well. But with these additions, we could start
serving more block-derived data to light clients under the BIP 157
at-least-one-honest-peer assumption.

[-- Attachment #2: Type: text/html, Size: 5040 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-05  1:08     ` Jim Posen
@ 2018-06-05  4:33       ` Karl-Johan Alm
  2018-06-05 17:22         ` Jim Posen
  2018-06-05 17:52       ` Gregory Maxwell
  1 sibling, 1 reply; 60+ messages in thread
From: Karl-Johan Alm @ 2018-06-05  4:33 UTC (permalink / raw)
  To: Jim Posen; +Cc: Bitcoin Protocol Discussion

On Tue, Jun 5, 2018 at 10:08 AM, Jim Posen <jim.posen@gmail.com> wrote:
> It also derives all bandwidth gains from address reuse. So I'm
> hesitant to make the complexity tradeoff for bandwidth savings due to a
> behavior that is actively discouraged.

I don't understand this comment. The bandwidth gains are not from
address reuse, they are from the observed property that false
positives are independent between two filters. I.e. clients that
connect once a day will probably download 2-3 filters at most, if they
had nothing relevant in the last ~144 blocks.

-Kalle.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-05  4:33       ` Karl-Johan Alm
@ 2018-06-05 17:22         ` Jim Posen
  0 siblings, 0 replies; 60+ messages in thread
From: Jim Posen @ 2018-06-05 17:22 UTC (permalink / raw)
  To: Karl Johan Alm; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

>
> I don't understand this comment. The bandwidth gains are not from
> address reuse, they are from the observed property that false
> positives are independent between two filters. I.e. clients that
> connect once a day will probably download 2-3 filters at most, if they
> had nothing relevant in the last ~144 blocks.
>

Your multi-layer digest proposal (https://bc-2.jp/bfd-profile.pdf) uses a
different type of filter which seems more like a compressed Bloom filter if
I understand it correctly. Appendix A shows how the FP rate increases with
the number of elements.

With the Golomb-Coded Sets, the filter size increases linearly in the
number of elements for a fixed FP rate. So currently we are targeting an
~1/2^20 rate (actually 1/784931 now), and filter sizes are ~20 bits * N for
N elements. With a 1-layer digest covering let's say 16 blocks, you could
drop the FP rate on the digest filters and the block filters each to ~10
bits per element, I think, to get the same FP rate for a given block by
your argument of independence. But the digest is only half the size of the
16 combined filters and there's a high probability of downloading the other
half anyway. So unless there is greater duplication of elements in the
digest filters, it's not clear to me that there are great bandwidth
savings. But maybe there are. Even so, I think we should just ship the
block filters and consider multi-layer digests later.

[-- Attachment #2: Type: text/html, Size: 2134 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-05  1:08     ` Jim Posen
  2018-06-05  4:33       ` Karl-Johan Alm
@ 2018-06-05 17:52       ` Gregory Maxwell
  1 sibling, 0 replies; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-05 17:52 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

On Tue, Jun 5, 2018 at 1:08 AM, Jim Posen via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> hesitant to make the complexity tradeoff for bandwidth savings due to a
> behavior that is actively discouraged.

As an important point of clarification here. If scripts are used to
identify inputs and outputs, then no use is required for that savings.
Each coin spent was created once, so in an absurd hypothetical you can
get a 2:1 change in bits set without any reuse at all.   I don't know
what portion of coins created are spent in the same 144 block
window...


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-04  8:42   ` Riccardo Casatta
  2018-06-05  1:08     ` Jim Posen
@ 2018-06-06  1:12     ` Olaoluwa Osuntokun
  2018-06-06 15:14       ` Riccardo Casatta
  1 sibling, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-06  1:12 UTC (permalink / raw)
  To: Riccardo Casatta, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]

It isn't being discussed atm (but was discussed 1 year ago when the BIP
draft was originally published), as we're in the process of removing items
or filters that aren't absolutely necessary. We're now at the point where
there're no longer any items we can remove w/o making the filters less
generally useful which signals a stopping point so we can begin widespread
deployment.

In terms of a future extension, BIP 158 already defines custom filter types,
and BIP 157 allows filters to be fetched in batch based on the block height
and numerical range. The latter feature can later be modified to return a
single composite filter rather than several individual filters.

-- Laolu


On Mon, Jun 4, 2018 at 7:28 AM Riccardo Casatta via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> I was wondering why this multi-layer multi-block filter proposal isn't
> getting any comment,
> is it because not asking all filters is leaking information?
>
> Thanks
>
> Il giorno ven 18 mag 2018 alle ore 08:29 Karl-Johan Alm via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> ha scritto:
>
>> On Fri, May 18, 2018 at 12:25 AM, Matt Corallo via bitcoin-dev
>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>> > In general, I'm concerned about the size of the filters making existing
>> > SPV clients less willing to adopt BIP 158 instead of the existing bloom
>> > filter garbage and would like to see a further exploration of ways to
>> > split out filters to make them less bandwidth intensive. Some further
>> > ideas we should probably play with before finalizing moving forward is
>> > providing filters for certain script templates, eg being able to only
>> > get outputs that are segwit version X or other similar ideas.
>>
>> There is also the idea of multi-block filters. The idea is that light
>> clients would download a pair of filters for blocks X..X+255 and
>> X+256..X+511, check if they have any matches and then grab pairs for
>> any that matched, e.g. X..X+127 & X+128..X+255 if left matched, and
>> iterate down until it ran out of hits-in-a-row or it got down to
>> single-block level.
>>
>> This has an added benefit where you can accept a slightly higher false
>> positive rate for bigger ranges, because the probability of a specific
>> entry having a false positive in each filter is (empirically speaking)
>> independent. I.e. with a FP probability of 1% in the 256 range block
>> and a FP probability of 0.1% in the 128 range block would mean the
>> probability is actually 0.001%.
>>
>> Wrote about this here: https://bc-2.jp/bfd-profile.pdf (but the filter
>> type is different in my experiments)
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
>
> --
> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 4809 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-06  1:12     ` Olaoluwa Osuntokun
@ 2018-06-06 15:14       ` Riccardo Casatta
  0 siblings, 0 replies; 60+ messages in thread
From: Riccardo Casatta @ 2018-06-06 15:14 UTC (permalink / raw)
  To: laolu32; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 6184 bytes --]

Sorry if I continue on the subject even if
​custom filter types are considered in BIP 157/158
.
I am doing it
 because
​:
1)​
with a fixed target FP=2^-20  (or 1/784931)
​ and the multi layer filtering maybe it's reasonable to consider less than
~20 bits for the golomb encoding of the per-block filter (one day committed
in the blockchain)
2) based on the answer received, privacy leak if downloading a subset of
filters doesn't look a concern
3)
As far as I know, anyone is considering to use a map instead of a filter
for the upper layers of the filter​.

Simplistic example:
Suppose to have a 2 blocks blockchain, every block contains N items for the
filter:
1) In the current discussed filter we have 2 filters of 20N bits
2) In a two layer solution, we have 1 map of (10+1)2N bits and 2 filters of
10N bits
The additional bit in the map discriminate if the match is in the first or
in the second block.
Supposing to have 1 match in the two blocks, the filter size downloaded in
the first case is always 40N bits, while the expected downloaded size in
the second case is 22N+2^-10*10N+10N ~= 32N with the same FP because
independence.
This obviously isn't a full analysis of the methodology, the expected
downloaded size in the second case could go from the best case 22N bits to
the worst case of 42N bits...

@Gregory
> I don't know what portion of coins created are spent in the same 144 block
window...

About 50%
source code <https://github.com/RCasatta/coincount>

From block 393216 to 458752  (still waiting for results on all the
blockchain)
Total outputs 264185587
size: 2 spent: 11791058 ratio:0.04463172322871649
size: 4 spent: 29846090 ratio:0.11297395266305728
size: 16 spent: 72543182 ratio:0.2745917475051355
size: 64 spent: 113168726 ratio:0.4283682818775424
size: 144 spent: 134294070 ratio:0.508332311103709
size: 256 spent: 148824781 ratio:0.5633342177747191
size: 1024 spent: 179345566 ratio:0.6788620379960395
size: 4096 spent: 205755628 ratio:0.7788298761355213
size: 16384 spent: 224448158 ratio:0.849585174379706

Another point to consider is that if we don't want the full transaction
history of our wallet but only the UTXO, the upper layer map could contain
only the item which are not already spent in the considered window. As we
can see from the previous result if the window is 16384 ~85% of the
elements are already spent suggesting a very high time locality. (apart
144, I choose power of 2 windows so there are an integer number of bits in
the map)

It's possible we need ~20 bits anyway for the per-block filters because
there are always connected wallets which one synced, always download the
last filter, anyway the upper layer map looks very promising for longer
sync.

Il giorno mer 6 giu 2018 alle ore 03:13 Olaoluwa Osuntokun <
laolu32@gmail.com> ha scritto:

> It isn't being discussed atm (but was discussed 1 year ago when the BIP
> draft was originally published), as we're in the process of removing items
> or filters that aren't absolutely necessary. We're now at the point where
> there're no longer any items we can remove w/o making the filters less
> generally useful which signals a stopping point so we can begin widespread
> deployment.
>
> In terms of a future extension, BIP 158 already defines custom filter
> types,
> and BIP 157 allows filters to be fetched in batch based on the block height
> and numerical range. The latter feature can later be modified to return a
> single composite filter rather than several individual filters.
>
> -- Laolu
>
>
> On Mon, Jun 4, 2018 at 7:28 AM Riccardo Casatta via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> I was wondering why this multi-layer multi-block filter proposal isn't
>> getting any comment,
>> is it because not asking all filters is leaking information?
>>
>> Thanks
>>
>> Il giorno ven 18 mag 2018 alle ore 08:29 Karl-Johan Alm via bitcoin-dev <
>> bitcoin-dev@lists.linuxfoundation.org> ha scritto:
>>
>>> On Fri, May 18, 2018 at 12:25 AM, Matt Corallo via bitcoin-dev
>>> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>>> > In general, I'm concerned about the size of the filters making existing
>>> > SPV clients less willing to adopt BIP 158 instead of the existing bloom
>>> > filter garbage and would like to see a further exploration of ways to
>>> > split out filters to make them less bandwidth intensive. Some further
>>> > ideas we should probably play with before finalizing moving forward is
>>> > providing filters for certain script templates, eg being able to only
>>> > get outputs that are segwit version X or other similar ideas.
>>>
>>> There is also the idea of multi-block filters. The idea is that light
>>> clients would download a pair of filters for blocks X..X+255 and
>>> X+256..X+511, check if they have any matches and then grab pairs for
>>> any that matched, e.g. X..X+127 & X+128..X+255 if left matched, and
>>> iterate down until it ran out of hits-in-a-row or it got down to
>>> single-block level.
>>>
>>> This has an added benefit where you can accept a slightly higher false
>>> positive rate for bigger ranges, because the probability of a specific
>>> entry having a false positive in each filter is (empirically speaking)
>>> independent. I.e. with a FP probability of 1% in the 256 range block
>>> and a FP probability of 0.1% in the 128 range block would mean the
>>> probability is actually 0.001%.
>>>
>>> Wrote about this here: https://bc-2.jp/bfd-profile.pdf (but the filter
>>> type is different in my experiments)
>>> _______________________________________________
>>> bitcoin-dev mailing list
>>> bitcoin-dev@lists.linuxfoundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>>
>>
>>
>> --
>> Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>

-- 
Riccardo Casatta - @RCasatta <https://twitter.com/RCasatta>

[-- Attachment #2: Type: text/html, Size: 16334 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-03  6:11                                                           ` Pieter Wuille
  2018-06-03 16:44                                                             ` Tamas Blummer
@ 2018-06-08  5:03                                                             ` Olaoluwa Osuntokun
  2018-06-08 16:14                                                               ` Gregory Maxwell
  1 sibling, 1 reply; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-08  5:03 UTC (permalink / raw)
  To: Pieter Wuille, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4951 bytes --]

Hi sipa,

> The advantage of (a) is that it can be verified against a full block
without
> access to the outputs being spent by it
>
> The advantage of (b) is that it is more compact (scriot reuse, and outputs
> spent within the same block as they are created).

Thanks for this breakdown. I think you've accurately summarized the sole
remaining discussing point in this thread.

As someone who's written and reviews code integrating the proposal all the
way up the stack (from node to wallet, to application), IMO, there's no
immediate cost to deferring the inclusion/creation of a filter that includes
prev scripts (b) instead of the outpoint as the "regular" filter does now.
Switching to prev script in the _short term_ would be costly for the set of
applications already deployed (or deployed in a minimal or flag flip gated
fashion) as the move from prev script to outpoint is a cascading one that
impacts wallet operation, rescans, HD seed imports, etc.

Maintaining the outpoint also allows us to rely on a "single honest peer"
security model in the short term. In the long term the main barrier to
committing the filters isn't choosing what to place in the filters (as once
you have the gcs code, adding/removing elements is a minor change), but the
actual proposal to add new consensus enforced commitments to Bitcoin in the
first place. Such a proposal would need to be generalized enough to allow
several components to be committed, likely have versioning, and also provide
the necessary extensibility to allow additional items to be committed in the
future. To my knowledge no such soft-fork has yet been proposed in a serious
manner, although we have years of brainstorming on the topic. The timeline
of the drafting, design, review, and deployment of such a change would
likely be measures in years, compared to the immediate deployment of the
current p2p filter model proposed in the BIP.

As a result, I see no reason to delay the p2p filter deployment (with the
outpoint) in the short term, as the long lead time a soft-fork to add
extensible commitments to Bitcoin would give application+wallet authors
ample time to switch to the new model. Also there's no reason that full-node
wallets which wish to primarily use the filters for rescan purposes can't
just construct them locally for this particular use case independent of
what's currently deployed on the p2p network.

Finally, I've addressed the remaining comments on my PR modifying the BIP
from my last message.

-- Laolu

On Sat, Jun 2, 2018 at 11:12 PM Pieter Wuille via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Sat, Jun 2, 2018, 22:56 Tamas Blummer via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> Lighter but SPV secure nodes (filter committed) would help the network
>> (esp. Layer 2) to grow mesh like, but add more user that blindly follow POW.
>>
>> On longer term most users' security will be determined by either trusted
>> hubs or POW.
>> I do not know which is worse, but we should at least offer the choice to
>> the user, therefore commit filters.
>>
>
> I don't think that's the point of discussion here. Of course, in order to
> have filters that verifiably don't lie by omission, the filters need to be
> committed to by blocks.
>
> The question is what data that filter should contain.
>
> There are two suggestions:
> (a) The scriptPubKeys of the block's outputs, and prevouts of the block's
> inputs.
> (b) The scriptPubKeys of the block's outputs, and scriptPubKeys of outputs
> being spent by the block's inputs.
>
> The advantage of (a) is that it can be verified against a full block
> without access to the outputs being spent by it. This allows light clients
> to ban nodes that give them incorrect filters, but they do need to actually
> see the blocks (partially defeating the purpose of having filters in the
> first place).
>
> The advantage of (b) is that it is more compact (scriot reuse, and outputs
> spent within the same block as they are created). It also had the advantage
> of being more easily usable for scanning of a wallet's transactions. Using
> (a) for that in some cases may need to restart and refetch when an output
> is discovered, to go test for its spending (whose outpoint is not known
> ahead of time). Especially when fetching multiple filters at a time this
> may be an issue.
>
> I think both of these potentially good arguments. However, once a
> committed filter exists, the advantage of (a) goes away completely -
> validation of committed filters is trivial and can be done without needing
> the full blocks in the first place.
>
> So I think the question is do we aim for an uncommitted (a) first and a
> committed (b) later, or go for (b) immediately?
>
> Cheers,
>
> --
> Pieter
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>

[-- Attachment #2: Type: text/html, Size: 6939 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-08  5:03                                                             ` Olaoluwa Osuntokun
@ 2018-06-08 16:14                                                               ` Gregory Maxwell
  2018-06-08 23:35                                                                 ` Olaoluwa Osuntokun
  0 siblings, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-08 16:14 UTC (permalink / raw)
  To: Olaoluwa Osuntokun, Bitcoin Protocol Discussion

On Fri, Jun 8, 2018 at 5:03 AM, Olaoluwa Osuntokun via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> As someone who's written and reviews code integrating the proposal all the
> way up the stack (from node to wallet, to application), IMO, there's no
> immediate cost to deferring the inclusion/creation of a filter that includes
> prev scripts (b) instead of the outpoint as the "regular" filter does now.
> Switching to prev script in the _short term_ would be costly for the set of
> applications already deployed (or deployed in a minimal or flag flip gated
> fashion) as the move from prev script to outpoint is a cascading one that
> impacts wallet operation, rescans, HD seed imports, etc.

It seems to me that you're making the argument against your own case
here: I'm reading this as a "it's hard to switch so it should be done
the inferior way".  That in argument against adopting the inferior
version, as that will contribute more momentum to doing it in a way
that doesn't make sense long term.

> Such a proposal would need to be generalized enough to allow several components to be committed,

I don't agree at all, and I can't see why you say so.

> likely have versioning,

This is inherent in how e.g. the segwit commitment is encoded, the
initial bytes are an identifying cookies. Different commitments would
have different cookies.

> and also provide the necessary extensibility to allow additional items to be committed in the future

What was previously proposed is that the commitment be required to be
consistent if present but not be required to be present.  This would
allow changing whats used by simply abandoning the old one.  Sparsity
in an optional commitment can be addressed when there is less than
100% participation by having each block that includes a commitment
commit to the missing filters ones from their immediate ancestors.

Additional optionality can be provided by the other well known
mechanisms,  e.g. have the soft fork expire at a block 5 years out
past deployment, and continue to soft-fork it in for a longer term so
long as its in use (or eventually without expiration if its clear that
it's not going away).

> wallets which wish to primarily use the filters for rescan purposes can't
> just construct them locally for this particular use case independent of
> what's currently deployed on the p2p network.

Absolutely, but given the failure of BIP37 on the network-- and the
apparent strong preference of end users for alternatives that don't
scan (e.g. electrum and web wallets)-- supporting making this
available via P2P was already only interesting to many as a nearly
free side effect of having filters for local scanning.  If it's a
different filter, it's no longer attractive.

It seems to me that some people have forgotten that this whole idea
was originally proposed to be a committed data-- but with an added
advantage of permitting expirementation ahead of the commitment.

> Maintaining the outpoint also allows us to rely on a "single honest peer"security model in the short term.

You can still scan blocks directly when peers disagree on the filter
content, regardless of how the filter is constructed-- yes, it uses
more bandwidth if you're attacked, but it makes the attack ineffective
and using outpoints considerably increases bandwidth for everyone
without an attack.  These ineffective (except for increasing
bandwidth) attacks would have to be common to offset the savings. It
seems to me this point is being overplayed, especially considering the
current state of non-existing validation in SPV software (if SPV
software doesn't validate anything else they could be validating, why
would they implement a considerable amount of logic for this?).


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-08 16:14                                                               ` Gregory Maxwell
@ 2018-06-08 23:35                                                                 ` Olaoluwa Osuntokun
  2018-06-09 10:34                                                                   ` David A. Harding
  2018-06-09 15:45                                                                   ` Gregory Maxwell
  0 siblings, 2 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-08 23:35 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 10030 bytes --]

> That in argument against adopting the inferior version, as that will
> contribute more momentum to doing it in a way that doesn't make sense long
> term.

That was moreso an attempt at a disclosure, rather than may argument. But
also as noted further up in the thread, both approaches have a trade off:
one is better for light clients in a p2p "one honest peer mode", while the
other is more compact, but is less verifiable for the light clients. They're
"inferior" in different ways.

My argument goes more like: moving to prev scripts means clients cannot
verify in full unless a block message is added to include the prev outs.
This is a downgrade assuming a "one honest peer" model for the p2p
interactions. A commitment removes this drawback, but ofc requires a soft
fork. Soft forks take a "long" time to deploy. So what's the cost in using
the current filter (as it lets the client verify the filter if they want to,
or in an attempted "bamboozlement" scenario) in the short term (as we don't
yet have a proposal for committing the filters) which would allow us to
experiment more with the technique on mainnet before making the step up to
committing the filter. Also, depending on the way the commitment is done,
the filters themselves would need to be modified.

> I don't agree at all, and I can't see why you say so.

Sure it doesn't _have_ to, but from my PoV as "adding more commitments" is
on the top of every developers wish list for additions to Bitcoin, it would
make sense to coordinate on an "ultimate" extensible commitment once, rather
than special case a bunch of distinct commitments. I can see arguments for
either really.

> This is inherent in how e.g. the segwit commitment is encoded, the initial
> bytes are an identifying cookies. Different commitments would have
different
> cookies.

Indeed, if the filter were to be committed, using an output on the coinbase
would be a likely candidate. However, I see two issues with this:

  1. The current filter format (even moving to prevouts) cannot be committed
     in this fashion as it indexes each of the coinbase output scripts. This
     creates a circular dependency: the commitment is modified by the
     filter, which is modified by the commitment (the filter atm indexes the
     commitment). So we'd need to add a special case to skip outputs with a
     particular witness magic. However, we don't know what that witness
     magic looks like (as there's no proposal). As a result, the type
     filters that can be served over the p2p network may be distinct from
     the type of filters that are to be committed, as the commitment may
     have an impact on the filter itself.

  2. Since the coinbase transaction is the first in a block, it has the
     longest merkle proof path. As a result, it may be several hundred bytes
     (and grows with future capacity increases) to present a proof to the
     client. Depending on the composition of blocks, this may outweigh the
     gains had from taking advantage of the additional compression the prev
     outs allow.

In regards to the second item above, what do you think of the old Tier Nolan
proposal [1] to create a "constant" sized proof for future commitments by
constraining the size of the block and placing the commitments within the
last few transactions in the block?

> but with an added advantage of permitting expirementation ahead of the
> commitment.

Indeed! To my knowledge, lnd is the only software deployed that even has
code to experiment with the filtering proposal in general. Also, as I
pointed out above, we may require an additional modification in order to be
able to commit the filter. The nature of that modification may depend on how
the filter is to be committed. As a result, why hinder experimentation today
(since it might need to be changed anyway, and as you point out the filter
being committed can even be swapped) by delaying until we know what the
commitment will look like?

> You can still scan blocks directly when peers disagree on the filter
> content, regardless of how the filter is constructed

But the difference is that one options lets you fully construct the filter
from a block, while the other requires additional data.

> but it makes the attack ineffective and using outpoints considerably
increases
> bandwidth for everyone without an attack

So should we optimize for the ability to validate in a particular model
(better
security), or lower bandwidth in this case? It may also be the case that the
overhead of receiving proofs of the commitment outweigh the savings
depending
on block composition (ofc entire block that re-uses the same address is
super
small).

> It seems to me this point is being overplayed, especially considering the
> current state of non-existing validation in SPV software (if SPV software
> doesn't validate anything else they could be validating, why would they
> implement a considerable amount of logic for this?).

I don't think its fair to compare those that wish to implement this proposal
(and actually do the validation) to the legacy SPV software that to my
knowledge is all but abandoned. The project I work on that seeks to deploy
this proposal (already has, but mainnet support is behind a flag as I
anticipated further modifications) indeed has implemented the "considerable"
amount of logic to check for discrepancies and ban peers trying to bamboozle
the light clients. I'm confident that the other projects seeking to
implement
this (rust-bitcoin-spv, NBitcoin, bcoin, maybe missing a few too) won't
find it
too difficult to implement "full" validation, as they're bitcoin developers
with quite a bit of experience.

I think we've all learned from the past defects of past light clients, and
don't seek to repeat history by purposefully implementing as little
validation
as possible. With these new projects by new authors, I think we have an
opprotunity to implement light clients "correctly" this time around.

[1]:
https://github.com/TierNolan/bips/blob/00a8d3e1ac066ce3728658c6c40240e1c2ab859e/bip-aux-header.mediawiki

-- Laolu


On Fri, Jun 8, 2018 at 9:14 AM Gregory Maxwell <greg@xiph.org> wrote:

> On Fri, Jun 8, 2018 at 5:03 AM, Olaoluwa Osuntokun via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > As someone who's written and reviews code integrating the proposal all
> the
> > way up the stack (from node to wallet, to application), IMO, there's no
> > immediate cost to deferring the inclusion/creation of a filter that
> includes
> > prev scripts (b) instead of the outpoint as the "regular" filter does
> now.
> > Switching to prev script in the _short term_ would be costly for the set
> of
> > applications already deployed (or deployed in a minimal or flag flip
> gated
> > fashion) as the move from prev script to outpoint is a cascading one that
> > impacts wallet operation, rescans, HD seed imports, etc.
>
> It seems to me that you're making the argument against your own case
> here: I'm reading this as a "it's hard to switch so it should be done
> the inferior way".  That in argument against adopting the inferior
> version, as that will contribute more momentum to doing it in a way
> that doesn't make sense long term.
>
> > Such a proposal would need to be generalized enough to allow several
> components to be committed,
>
> I don't agree at all, and I can't see why you say so.
>
> > likely have versioning,
>
> This is inherent in how e.g. the segwit commitment is encoded, the
> initial bytes are an identifying cookies. Different commitments would
> have different cookies.
>
> > and also provide the necessary extensibility to allow additional items
> to be committed in the future
>
> What was previously proposed is that the commitment be required to be
> consistent if present but not be required to be present.  This would
> allow changing whats used by simply abandoning the old one.  Sparsity
> in an optional commitment can be addressed when there is less than
> 100% participation by having each block that includes a commitment
> commit to the missing filters ones from their immediate ancestors.
>
> Additional optionality can be provided by the other well known
> mechanisms,  e.g. have the soft fork expire at a block 5 years out
> past deployment, and continue to soft-fork it in for a longer term so
> long as its in use (or eventually without expiration if its clear that
> it's not going away).
>
> > wallets which wish to primarily use the filters for rescan purposes can't
> > just construct them locally for this particular use case independent of
> > what's currently deployed on the p2p network.
>
> Absolutely, but given the failure of BIP37 on the network-- and the
> apparent strong preference of end users for alternatives that don't
> scan (e.g. electrum and web wallets)-- supporting making this
> available via P2P was already only interesting to many as a nearly
> free side effect of having filters for local scanning.  If it's a
> different filter, it's no longer attractive.
>
> It seems to me that some people have forgotten that this whole idea
> was originally proposed to be a committed data-- but with an added
> advantage of permitting expirementation ahead of the commitment.
>
> > Maintaining the outpoint also allows us to rely on a "single honest
> peer"security model in the short term.
>
> You can still scan blocks directly when peers disagree on the filter
> content, regardless of how the filter is constructed-- yes, it uses
> more bandwidth if you're attacked, but it makes the attack ineffective
> and using outpoints considerably increases bandwidth for everyone
> without an attack.  These ineffective (except for increasing
> bandwidth) attacks would have to be common to offset the savings. It
> seems to me this point is being overplayed, especially considering the
> current state of non-existing validation in SPV software (if SPV
> software doesn't validate anything else they could be validating, why
> would they implement a considerable amount of logic for this?).
>

[-- Attachment #2: Type: text/html, Size: 12298 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-08 23:35                                                                 ` Olaoluwa Osuntokun
@ 2018-06-09 10:34                                                                   ` David A. Harding
  2018-06-12 23:51                                                                     ` Olaoluwa Osuntokun
  2018-06-09 15:45                                                                   ` Gregory Maxwell
  1 sibling, 1 reply; 60+ messages in thread
From: David A. Harding @ 2018-06-09 10:34 UTC (permalink / raw)
  To: Olaoluwa Osuntokun, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3476 bytes --]

On Fri, Jun 08, 2018 at 04:35:29PM -0700, Olaoluwa Osuntokun via bitcoin-dev wrote:
>   2. Since the coinbase transaction is the first in a block, it has the
>      longest merkle proof path. As a result, it may be several hundred bytes
>      (and grows with future capacity increases) to present a proof to the
>      client.

I'm not sure why commitment proof size is a significant issue.  Doesn't
the current BIP157 protocol have each filter commit to the filter for
the previous block?  If that's the case, shouldn't validating the
commitment at the tip of the chain (or buried back whatever number of
blocks that the SPV client trusts) obliviate the need to validate the
commitments for any preceeding blocks in the SPV trust model?

> Depending on the composition of blocks, this may outweigh the gains
> had from taking advantage of the additional compression the prev outs
> allow.

I think those are unrelated points.  The gain from using a more
efficient filter is saved bytes.  The gain from using block commitments
is SPV-level security---that attacks have a definite cost in terms of
generating proof of work instead of the variable cost of network
compromise (which is effectively free in many situations).

Comparing the extra bytes used by block commitments to the reduced bytes
saved by prevout+output filters is like comparing the extra bytes used
to download all blocks for full validation to the reduced bytes saved by
only checking headers and merkle inclusion proofs in simplified
validation.  Yes, one uses more bytes than the other, but they're
completely different security models and so there's no normative way for
one to "outweigh the gains" from the other.

> So should we optimize for the ability to validate in a particular
> model (better security), or lower bandwidth in this case?

It seems like you're claiming better security here without providing any
evidence for it.  The security model is "at least one of my peers is
honest."  In the case of outpoint+output filters, when a client receives
advertisements for different filters from different peers, it:

    1. Downloads the corresponding block
    2. Locally generates the filter for that block
    3. Kicks any peers that advertised a different filter than what it
       generated locally

This ensures that as long as the client has at least one honest peer, it
will see every transaction affecting its wallet.  In the case of
prevout+output filters, when a client receives advertisements for
different filters from different peers, it:

    1. Downloads the corresponding block and checks it for wallet
       transactions as if there had been a filter match

This also ensures that as long as the client has at least one honest
peer, it will see every transaction affecting its wallet.  This is
equivilant security.

In the second case, it's possible for the client to eventually
probabalistically determine which peer(s) are dishonest and kick them.
The most space efficient of these protocols may disclose some bits of
evidence for what output scripts the client is looking for, but a
slightly less space-efficient protocol simply uses randomly-selected
outputs saved from previous blocks to make the probabalistic
determination (rather than the client's own outputs) and so I think
should be quite private.  Neither protocol seems significantly more
complicated than keeping an associative array recording the number of
false positive matches for each peer's filters.

-Dave

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-08 23:35                                                                 ` Olaoluwa Osuntokun
  2018-06-09 10:34                                                                   ` David A. Harding
@ 2018-06-09 15:45                                                                   ` Gregory Maxwell
  2018-06-12 23:58                                                                     ` Olaoluwa Osuntokun
  1 sibling, 1 reply; 60+ messages in thread
From: Gregory Maxwell @ 2018-06-09 15:45 UTC (permalink / raw)
  To: Olaoluwa Osuntokun; +Cc: Bitcoin Protocol Discussion

> So what's the cost in using
> the current filter (as it lets the client verify the filter if they want to,

An example of that cost is you arguing against specifying and
supporting the design that is closer to one that would be softforked,
which increases the time until we can make these filters secure
because it slows convergence on the design of what would get
committed.

>> I don't agree at all, and I can't see why you say so.
>
> Sure it doesn't _have_ to, but from my PoV as "adding more commitments" is
> on the top of every developers wish list for additions to Bitcoin, it would
> make sense to coordinate on an "ultimate" extensible commitment once, rather
> than special case a bunch of distinct commitments. I can see arguments for
> either really.

We have an extensible commitment style via BIP141 already. I don't see
why this in particular demands a new one.

>   1. The current filter format (even moving to prevouts) cannot be committed
>      in this fashion as it indexes each of the coinbase output scripts. This
>      creates a circular dependency: the commitment is modified by the
>      filter,

Great point, but it should probably exclude coinbase OP_RETURN output.
This would exclude the current BIP141 style commitment and likely any
other.

Should I start a new thread on excluding all OP_RETURN outputs from
BIP-158 filters for all transactions? -- they can't be spent, so
including them just pollutes the filters.

>   2. Since the coinbase transaction is the first in a block, it has the
>      longest merkle proof path. As a result, it may be several hundred bytes
>      (and grows with future capacity increases) to present a proof to the

If 384 bytes is a concern, isn't 3840 bytes (the filter size
difference is in this ballpark) _much_ more of a concern?  Path to the
coinbase transaction increases only logarithmically so further
capacity increases are unlikely to matter much, but the filter size
increases linearly and so it should be much more of a concern.

> In regards to the second item above, what do you think of the old Tier Nolan
> proposal [1] to create a "constant" sized proof for future commitments by
> constraining the size of the block and placing the commitments within the
> last few transactions in the block?

I think it's a fairly ugly hack. esp since it requires that mining
template code be able to stuff the block if they just don't know
enough actual transactions-- which means having a pool of spendable
outputs in order to mine, managing private keys, etc... it also
requires downstream software not tinker with the transaction count
(which I wish it didn't but as of today it does). A factor of two
difference in capacity-- if you constrain to get the smallest possible
proof-- is pretty stark, optimal txn selection with this cardinality
constraint would be pretty weird. etc.

If the community considers tree depth for proofs like that to be such
a concern to take on technical debt for that structure, we should
probably be thinking about more drastic (incompatible) changes... but
I don't think it's actually that interesting.

> I don't think its fair to compare those that wish to implement this proposal
> (and actually do the validation) to the legacy SPV software that to my
> knowledge is all but abandoned. The project I work on that seeks to deploy

Yes, maybe it isn't.  But then that just means we don't have good information.

When a lot of people were choosing electrum over SPV wallets when
those SPV wallets weren't abandoned, sync time was frequently cited as
an actual reason. BIP158 makes that worse, not better.   So while I'm
hopeful, I'm also somewhat sceptical.  Certainly things that reduce
the size of the 158 filters make them seem more likely to be a success
to me.

> too difficult to implement "full" validation, as they're bitcoin developers
> with quite a bit of experience.

::shrugs:: Above you're also arguing against fetching down to the
coinbase transaction to save a couple hundred bytes a block, which
makes it impossible to validate a half dozen other things (including
as mentioned in the other threads depth fidelity of returned proofs).
There are a lot of reasons why things don't get implemented other than
experience! :)


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-09 10:34                                                                   ` David A. Harding
@ 2018-06-12 23:51                                                                     ` Olaoluwa Osuntokun
  0 siblings, 0 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-12 23:51 UTC (permalink / raw)
  To: David A. Harding; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4891 bytes --]

> Doesn't the current BIP157 protocol have each filter commit to the filter
> for the previous block?

Yep!

> If that's the case, shouldn't validating the commitment at the tip of the
> chain (or buried back whatever number of blocks that the SPV client
trusts)
> obliviate the need to validate the commitments for any preceeding blocks
in
> the SPV trust model?

Yeah, just that there'll be a gap between the p2p version, and when it's
ultimately committed.

> It seems like you're claiming better security here without providing any
> evidence for it.

What I mean is that one allows you to fully verify the filter, while the
other allows you to only validate a portion of the filter and requires other
added heuristics.

> In the case of prevout+output filters, when a client receives
advertisements
> for different filters from different peers, it:

Alternatively, they can decompress the filter and at least verify that
proper _output scripts_ have been included. Maybe this is "good enough"
until its committed. If a command is added to fetch all the prev outs along
w/ a block (which would let you do another things like verify fees), then
they'd be able to fully validate the filter as well.

-- Laolu


On Sat, Jun 9, 2018 at 3:35 AM David A. Harding <dave@dtrt.org> wrote:

> On Fri, Jun 08, 2018 at 04:35:29PM -0700, Olaoluwa Osuntokun via
> bitcoin-dev wrote:
> >   2. Since the coinbase transaction is the first in a block, it has the
> >      longest merkle proof path. As a result, it may be several hundred
> bytes
> >      (and grows with future capacity increases) to present a proof to the
> >      client.
>
> I'm not sure why commitment proof size is a significant issue.  Doesn't
> the current BIP157 protocol have each filter commit to the filter for
> the previous block?  If that's the case, shouldn't validating the
> commitment at the tip of the chain (or buried back whatever number of
> blocks that the SPV client trusts) obliviate the need to validate the
> commitments for any preceeding blocks in the SPV trust model?
>
> > Depending on the composition of blocks, this may outweigh the gains
> > had from taking advantage of the additional compression the prev outs
> > allow.
>
> I think those are unrelated points.  The gain from using a more
> efficient filter is saved bytes.  The gain from using block commitments
> is SPV-level security---that attacks have a definite cost in terms of
> generating proof of work instead of the variable cost of network
> compromise (which is effectively free in many situations).
>
> Comparing the extra bytes used by block commitments to the reduced bytes
> saved by prevout+output filters is like comparing the extra bytes used
> to download all blocks for full validation to the reduced bytes saved by
> only checking headers and merkle inclusion proofs in simplified
> validation.  Yes, one uses more bytes than the other, but they're
> completely different security models and so there's no normative way for
> one to "outweigh the gains" from the other.
>
> > So should we optimize for the ability to validate in a particular
> > model (better security), or lower bandwidth in this case?
>
> It seems like you're claiming better security here without providing any
> evidence for it.  The security model is "at least one of my peers is
> honest."  In the case of outpoint+output filters, when a client receives
> advertisements for different filters from different peers, it:
>
>     1. Downloads the corresponding block
>     2. Locally generates the filter for that block
>     3. Kicks any peers that advertised a different filter than what it
>        generated locally
>
> This ensures that as long as the client has at least one honest peer, it
> will see every transaction affecting its wallet.  In the case of
> prevout+output filters, when a client receives advertisements for
> different filters from different peers, it:
>
>     1. Downloads the corresponding block and checks it for wallet
>        transactions as if there had been a filter match
>
> This also ensures that as long as the client has at least one honest
> peer, it will see every transaction affecting its wallet.  This is
> equivilant security.
>
> In the second case, it's possible for the client to eventually
> probabalistically determine which peer(s) are dishonest and kick them.
> The most space efficient of these protocols may disclose some bits of
> evidence for what output scripts the client is looking for, but a
> slightly less space-efficient protocol simply uses randomly-selected
> outputs saved from previous blocks to make the probabalistic
> determination (rather than the client's own outputs) and so I think
> should be quite private.  Neither protocol seems significantly more
> complicated than keeping an associative array recording the number of
> false positive matches for each peer's filters.
>
> -Dave
>

[-- Attachment #2: Type: text/html, Size: 5877 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
  2018-06-09 15:45                                                                   ` Gregory Maxwell
@ 2018-06-12 23:58                                                                     ` Olaoluwa Osuntokun
  0 siblings, 0 replies; 60+ messages in thread
From: Olaoluwa Osuntokun @ 2018-06-12 23:58 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5952 bytes --]

> An example of that cost is you arguing against specifying and supporting
the
> design that is closer to one that would be softforked, which increases the
> time until we can make these filters secure because it
> slows convergence on the design of what would get committed

Agreed, since the commitment is just flat out better, and also also less
code to validate compared to the cross p2p validation, the filter should be
as close to the committed version. This way, wallet and other apps don't
need to modify their logic in X months when the commitment is rolled out.

> Great point, but it should probably exclude coinbase OP_RETURN output.
> This would exclude the current BIP141 style commitment and likely any
> other.

Definitely. I chatted offline with sipa recently, and he suggested this as
well. Upside is that the filters will get even smaller, and also the first
filter type becomes even more of a "barebones" wallet filter. If folks
reaally want to also search OP_RETURN in the filter (as no widely deployed
applications I know of really use it), then an additional filter type can be
added in the future. It would need to be special cased to filter out the
commitment itself.

Alright, color me convinced! I'll further edit my open BIP 158 PR to:

  * exclude all OP_RETURN
  * switch to prev scripts instead of outpoints
  * update the test vectors to include the prev scripts from blocks in
    addition to the block itself

-- Laolu


On Sat, Jun 9, 2018 at 8:45 AM Gregory Maxwell <greg@xiph.org> wrote:

> > So what's the cost in using
> > the current filter (as it lets the client verify the filter if they want
> to,
>
> An example of that cost is you arguing against specifying and
> supporting the design that is closer to one that would be softforked,
> which increases the time until we can make these filters secure
> because it slows convergence on the design of what would get
> committed.
>
> >> I don't agree at all, and I can't see why you say so.
> >
> > Sure it doesn't _have_ to, but from my PoV as "adding more commitments"
> is
> > on the top of every developers wish list for additions to Bitcoin, it
> would
> > make sense to coordinate on an "ultimate" extensible commitment once,
> rather
> > than special case a bunch of distinct commitments. I can see arguments
> for
> > either really.
>
> We have an extensible commitment style via BIP141 already. I don't see
> why this in particular demands a new one.
>
> >   1. The current filter format (even moving to prevouts) cannot be
> committed
> >      in this fashion as it indexes each of the coinbase output scripts.
> This
> >      creates a circular dependency: the commitment is modified by the
> >      filter,
>
> Great point, but it should probably exclude coinbase OP_RETURN output.
> This would exclude the current BIP141 style commitment and likely any
> other.
>
> Should I start a new thread on excluding all OP_RETURN outputs from
> BIP-158 filters for all transactions? -- they can't be spent, so
> including them just pollutes the filters.
>
> >   2. Since the coinbase transaction is the first in a block, it has the
> >      longest merkle proof path. As a result, it may be several hundred
> bytes
> >      (and grows with future capacity increases) to present a proof to the
>
> If 384 bytes is a concern, isn't 3840 bytes (the filter size
> difference is in this ballpark) _much_ more of a concern?  Path to the
> coinbase transaction increases only logarithmically so further
> capacity increases are unlikely to matter much, but the filter size
> increases linearly and so it should be much more of a concern.
>
> > In regards to the second item above, what do you think of the old Tier
> Nolan
> > proposal [1] to create a "constant" sized proof for future commitments by
> > constraining the size of the block and placing the commitments within the
> > last few transactions in the block?
>
> I think it's a fairly ugly hack. esp since it requires that mining
> template code be able to stuff the block if they just don't know
> enough actual transactions-- which means having a pool of spendable
> outputs in order to mine, managing private keys, etc... it also
> requires downstream software not tinker with the transaction count
> (which I wish it didn't but as of today it does). A factor of two
> difference in capacity-- if you constrain to get the smallest possible
> proof-- is pretty stark, optimal txn selection with this cardinality
> constraint would be pretty weird. etc.
>
> If the community considers tree depth for proofs like that to be such
> a concern to take on technical debt for that structure, we should
> probably be thinking about more drastic (incompatible) changes... but
> I don't think it's actually that interesting.
>
> > I don't think its fair to compare those that wish to implement this
> proposal
> > (and actually do the validation) to the legacy SPV software that to my
> > knowledge is all but abandoned. The project I work on that seeks to
> deploy
>
> Yes, maybe it isn't.  But then that just means we don't have good
> information.
>
> When a lot of people were choosing electrum over SPV wallets when
> those SPV wallets weren't abandoned, sync time was frequently cited as
> an actual reason. BIP158 makes that worse, not better.   So while I'm
> hopeful, I'm also somewhat sceptical.  Certainly things that reduce
> the size of the 158 filters make them seem more likely to be a success
> to me.
>
> > too difficult to implement "full" validation, as they're bitcoin
> developers
> > with quite a bit of experience.
>
> ::shrugs:: Above you're also arguing against fetching down to the
> coinbase transaction to save a couple hundred bytes a block, which
> makes it impossible to validate a half dozen other things (including
> as mentioned in the other threads depth fidelity of returned proofs).
> There are a lot of reasons why things don't get implemented other than
> experience! :)
>

[-- Attachment #2: Type: text/html, Size: 7041 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2018-06-12 23:59 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-17 15:25 [bitcoin-dev] BIP 158 Flexibility and Filter Size Matt Corallo
2018-05-17 15:43 ` Peter Todd
2018-05-17 15:46   ` Matt Corallo
2018-05-17 16:36 ` Gregory Maxwell
2018-05-17 16:59   ` Matt Corallo
2018-05-17 18:34     ` Gregory Maxwell
2018-05-17 18:34     ` Gregory Maxwell
2018-05-17 20:19       ` Jim Posen
2018-05-17 20:45         ` Gregory Maxwell
2018-05-17 21:27           ` Jim Posen
2018-05-19  3:12             ` Olaoluwa Osuntokun
2018-05-21  8:35               ` Johan Torås Halseth
2018-05-22  1:16                 ` Olaoluwa Osuntokun
2018-05-22  9:23                   ` Johan Torås Halseth
2018-05-23  0:42                     ` Jim Posen
2018-05-23  7:38                       ` Jim Posen
2018-05-23  8:16                         ` Johan Torås Halseth
2018-05-23 17:28                         ` Gregory Maxwell
2018-05-24  1:04                           ` Conner Fromknecht
2018-05-24  3:48                             ` Jim Posen
2018-05-28 18:18                               ` Tamas Blummer
2018-05-28 18:28                                 ` Tamas Blummer
2018-05-28 19:24                                   ` Gregory Maxwell
2018-05-29  2:42                                     ` Jim Posen
2018-05-29  3:24                                       ` Gregory Maxwell
2018-05-29  4:01                                       ` Olaoluwa Osuntokun
2018-05-31 14:27                                         ` Tamas Blummer
2018-06-01  2:52                                         ` Olaoluwa Osuntokun
2018-06-01  4:15                                           ` Gregory Maxwell
     [not found]                                           ` <CAAS2fgSyVi0d_ixp-auRPPzPfFeffN=hsWhWT5=EzDO3O+Ue1g@mail.gmail.com>
2018-06-02  0:01                                             ` Olaoluwa Osuntokun
2018-06-02  0:22                                               ` Gregory Maxwell
2018-06-02  2:02                                                 ` Jim Posen
2018-06-02 12:41                                                   ` David A. Harding
2018-06-02 22:02                                                     ` Tamas Blummer
2018-06-03  0:28                                                       ` Gregory Maxwell
2018-06-03  5:14                                                         ` Tamas Blummer
2018-06-03  6:11                                                           ` Pieter Wuille
2018-06-03 16:44                                                             ` Tamas Blummer
2018-06-03 16:50                                                               ` Tamas Blummer
2018-06-08  5:03                                                             ` Olaoluwa Osuntokun
2018-06-08 16:14                                                               ` Gregory Maxwell
2018-06-08 23:35                                                                 ` Olaoluwa Osuntokun
2018-06-09 10:34                                                                   ` David A. Harding
2018-06-12 23:51                                                                     ` Olaoluwa Osuntokun
2018-06-09 15:45                                                                   ` Gregory Maxwell
2018-06-12 23:58                                                                     ` Olaoluwa Osuntokun
2018-05-18  8:46   ` Riccardo Casatta
2018-05-19  3:08     ` Olaoluwa Osuntokun
2018-05-19  2:57   ` Olaoluwa Osuntokun
2018-05-19  3:06     ` Pieter Wuille
2018-05-22  1:15       ` Olaoluwa Osuntokun
2018-05-18  6:28 ` Karl-Johan Alm
2018-06-04  8:42   ` Riccardo Casatta
2018-06-05  1:08     ` Jim Posen
2018-06-05  4:33       ` Karl-Johan Alm
2018-06-05 17:22         ` Jim Posen
2018-06-05 17:52       ` Gregory Maxwell
2018-06-06  1:12     ` Olaoluwa Osuntokun
2018-06-06 15:14       ` Riccardo Casatta
2018-05-19  2:51 ` Olaoluwa Osuntokun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox