Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

From: Alex Akselrod <alex@akselrod.org>
To: Bitcoin Dev <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients
Date: Fri, 2 Jun 2017 13:55:31 -0400	[thread overview]
Message-ID: <CAE0pnxK5r2XfVks=emkK=v66XRN5c-Sz-Lm_dKY+6nO=kPk6Vw@mail.gmail.com> (raw)
In-Reply-To: <CAE0pnxJxHYQ4+2pt3tt=1WZ0-K0vDxGB4KBXY+R=WfktMmATwA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5037 bytes --]

On Jun 2, 2017 8:09 AM, "Karl Johan Alm via bitcoin-dev" <
bitcoin-dev@lists.linuxfoundation.org> wrote:

Hello,

Really wish I'd known you were working on this a few weeks ago, but
such is life. Hopefully I can provide some useful feedback.

Your feedback is greatly appreciated!

On Fri, Jun 2, 2017 at 4:01 AM, Olaoluwa Osuntokun via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> Full-nodes
> maintain an additional index of the chain, and serve this compact filter
> (the index) to light clients which request them. Light clients then fetch
> these filters, query the locally and _maybe_ fetch the block if a relevant
> item matches.

Is it necessary to maintain the index all the way to the beginning of
the chain? When would clients request "really old digests" and why?

Without a soft fork, this is the only way for light clients to verify that
peers aren't lying to them. Clients can request headers (just hashes of the
filters and the previous headers, creating a chain) and look for conflicts
between peers. If a conflict is found at a certain block, the client can
download the block, generate a filter, calculate the header by hashing
together the previous header and the generated filter, and banning any
peers that don't match. A full node could prune old filters if you wanted
and recalculate them as necessary if you just keep the filter header chain
info as really old filters are unlikely to be requested by correctly
written software but you can't guarantee every client will follow best
practices either.

> The calculator can be found here:
> https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
> script he's been running on actual data, and the results seem to match up
> rather nicely.

I haven't tried the tool yet, and maybe it will answer some of my questions.

On what data were the simulated wallets on actual data based? How did
false positive rates for wallets with lots of items (pubkeys etc) play
out? Is there a maximum number of items for a wallet before it becomes
too bandwidth costly to use digests?

The simulations are based on completely random data within given
parameters. For example, it will generate a wallet of a specified size and
generate blocks of specified size with specified number of transactions of
specified format, all guaranteed to not match the wallet. It then tries to
match the wallet and tracks the filter size and the bandwidth used by block
downloads which are all due to false positives. The maximum wallet size can
be millions or more of addresses and outpoints before the filter isn't
worth it.

I published the simulation code at
https://gist.github.com/aakselrod/0ee665205f7c9538c2339876b0424b26 but the
calculation code gives you the same results (on average but very close with
a big enough sample size) much faster.

I will definitely try to reproduce my experiments with Golomb-Coded
sets and see what I come up with. It seems like you've got a little
less than half the size of my digests for 1-block digests but I
haven't tried making digests for all blocks (and lots of early blocks
are empty).

Filters for empty blocks only take a few bytes and sometimes zero when the
coinbase output is a burn that doesn't push any data (example will be in
the test vectors that I'll have ready shortly).

On the BIP proposal itself:

In Compact Filter Header Chain, you mention that clients should
download filters from nodes if filter_headers is not identical, and
ban offending nodes. What about temporary forks in the chain? What
about longer forks? In general, I am curious how you will deal with
reorgs and temporary non-consensus related chain splits.

The cfheaders messages give you the hash of the final block for which
there's a header in the message. This means you can ignore the message as
necessary rather than ban the peer, or track cfheaders for multiple forks
if desired.

I am also curious if you have considered digests containing multiple
blocks. Retaining a permanent binsearchable record of the entire chain
is obviously too space costly, but keeping the last X blocks as
binsearchable could speed up syncing for clients tremendously, I feel.

We hadn't (or I hadn't) until we read your recent post/paper and are
considering it now.

It may also be space efficient to ONLY store older digests in chunks
of e.g. 8 blocks. A client syncing up finding a match in an 8-block
chunk would have to grab those 8 blocks, but if it's not recent, that
may be acceptable. It may even be possible to make 4-, 2-, 1-block
digests on demand.

This is also something we (or at least I) hadn't considered before your
recent post. We have been working on this for a few months now so didn't
have time to work on trying out and possibly incorporating the idea before
release.

How fast are these to create? Would it make sense to provide digests
on demand in some cases, rather than keeping them around indefinitely?

They're pretty fast and can be pruned if desired, as mentioned above, as
long as the header chain is kept.

[-- Attachment #2: Type: text/html, Size: 8980 bytes --]

next prev parent reply	other threads:[~2017-06-02 17:55 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-01 19:01 [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients Olaoluwa Osuntokun
2017-06-01 21:00 ` Eric Lombrozo
2017-06-01 21:33 ` Matt Corallo
2017-06-01 22:10   ` Olaoluwa Osuntokun
2017-06-02  2:15     ` Chris
2017-06-02  2:28       ` Gregory Maxwell
2017-06-02  3:35         ` Alex Akselrod
2017-06-02 16:07         ` Chris Pacia
2017-06-02  4:49 ` Olaoluwa Osuntokun
2017-06-09  3:59   ` Olaoluwa Osuntokun
2017-11-09 23:44     ` Olaoluwa Osuntokun
2017-06-02  6:00 ` Karl Johan Alm
     [not found]   ` <CAE0pnx+RRAP269VeWAcxKbrcS9qX4LS8_6nY_js8X5NtQ22t_A@mail.gmail.com>
     [not found]     ` <CAE0pnxLKYnwHnktTqW949s1AA9uK=6WnVYWmRoau8B1SszzYEg@mail.gmail.com>
     [not found]       ` <CAE0pnxJxHYQ4+2pt3tt=1WZ0-K0vDxGB4KBXY+R=WfktMmATwA@mail.gmail.com>
2017-06-02 17:55         ` Alex Akselrod [this message]
2017-06-05  2:06           ` Karl Johan Alm
2017-06-09  3:03             ` Olaoluwa Osuntokun
2017-06-07 21:41 ` Gregory Maxwell
2017-06-09  3:42   ` Olaoluwa Osuntokun
2017-06-09  4:47     ` Olaoluwa Osuntokun
2017-06-08  9:50 ` Tomas
2017-06-09  3:50   ` Olaoluwa Osuntokun
2017-06-09  8:26     ` Tomas
2017-06-19 11:58 ` Andreas Schildbach
2017-06-19 12:26   ` bfd
2017-06-19 15:15     ` Tom Zander
2017-06-19 15:49       ` Jonas Schnelli
2017-06-19 15:59         ` Andreas Schildbach
2017-06-19 16:22           ` Jonas Schnelli
2017-06-19 16:36             ` adiabat
2017-06-19 20:49               ` Andreas Schildbach
2017-06-20  7:03             ` Eric Voskuil
2017-06-19 16:07         ` Tom Zander
2017-06-19 16:30           ` Jonas Schnelli
2017-06-19 16:38             ` Tom Zander
2017-06-19 15:43     ` Andreas Schildbach
2017-06-19 16:10       ` Jonas Schnelli
2017-06-19 22:41     ` Gregory Maxwell
2017-06-20  9:52       ` Tom Zander
2017-06-20 13:08         ` bfd
2017-06-20 17:20           ` Adam Back

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAE0pnxK5r2XfVks=emkK=v66XRN5c-Sz-Lm_dKY+6nO=kPk6Vw@mail.gmail.com' \
    --to=alex@akselrod.org \
    --cc=bitcoin-dev@lists.linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox