From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 85DB4ACB for ; Thu, 9 Nov 2017 23:44:22 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-wm0-f51.google.com (mail-wm0-f51.google.com [74.125.82.51]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id D95E0E3 for ; Thu, 9 Nov 2017 23:44:20 +0000 (UTC) Received: by mail-wm0-f51.google.com with SMTP id r68so20966335wmr.3 for ; Thu, 09 Nov 2017 15:44:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=z5LHpgu4hwG5bP5of4RhyYWDyj+LzZNQl2P8v0ezgQ8=; b=M03WqPhTjkrFkWmKc/KxKT6BiJUmfDtDwR1MgpDX01pRCPtsRJD6yX+PKh6xnqvugC 2DoajgJt3egKuIfTgeuH2MP83sk2Ycfybyk0kK6R5dSzc2gfG+GvGX+cquXgDWWxh5nj lmccE7WzW4Zaa+9DAsVUqpIMtI71zNgywA44bRzDs5yZQ6UyA8VRq/8x4hC2SIr3kbNS /0KBs4oTKG54gDB39/iEjWaD7huYWPL+cvGy8sTlJtYKTr6s/RX+31rdGYIo/hEOo7Jg hmYmY9U6gEfubLpOGC9mrlFXdLFdyIogFDJt7qk8I/o+VAAPCg2NWU67RLGturIReqY8 TE2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=z5LHpgu4hwG5bP5of4RhyYWDyj+LzZNQl2P8v0ezgQ8=; b=TCYUeNOctU20i+GMw9TLKir/jciVxm26kQHm4JqVsL0iVDW5rSPR2rAUuVgp2OI3Y0 2qROHfmp+QrQpJVvPQaW12Cz3Rpha81rhLTMMSrpdOO79dDbxavbc3BYENF77RP8CZGR 0294XCWq5ijwi9CEQMihq6B+7fP+xrAJctFsG8fLgB3sr9jaNZPNRoTdNuajRzN8WVzN VieD2gcHhEmVRFIyGwVKmmpPUQeKzRCPwLozuoCgSJG8W5ffeqUlOWhaXVakQZJaNgE3 TugLGKYhzvYTOxmZVA1BZovdm07p5wEzHK8G3nFr17gziVnxYsPwbWY9XhOIPZsBk2P7 f7fw== X-Gm-Message-State: AJaThX6KTqPBQUu2TOyIAMaxfsNL257hG0pgmcACDAgzWgNYSB4gjpbf x/zMzFy5YfVlZyeQh/OKVJXdntY60W1sEz8o03m0Ig== X-Google-Smtp-Source: ABhQp+TbbPqstfSaKw5XMY9sP8nS+37bUeQmUeWbiVvYC0M/1qPHVbRl7fdCM4Zw03sB1j+56rLFlm/Ee8f/EuifuOU= X-Received: by 10.80.147.93 with SMTP id 
 n29mr168531eda.237.1510271059023; Thu, 09 Nov 2017 15:44:19 -0800 (PST)
MIME-Version: 1.0
References:
In-Reply-To:
From: Olaoluwa Osuntokun
Date: Thu, 09 Nov 2017 23:44:07 +0000
Message-ID:
To: Arnoud Kouwenhoven - Pukaki Corp via bitcoin-dev
Content-Type: multipart/alternative; boundary="94eb2c1a8c0efcdd59055d9561a5"
X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
 DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,
 RCVD_IN_DNSWL_NONE autolearn=disabled version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org
Subject: Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients
X-BeenThere: bitcoin-dev@lists.linuxfoundation.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Bitcoin Protocol Discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 09 Nov 2017 23:44:22 -0000

--94eb2c1a8c0efcdd59055d9561a5
Content-Type: text/plain; charset="UTF-8"

Hi y'all,

Since my last email we've made a number of changes to the BIP. The changes made
were driven by the feedback we've received so far in this thread, and also as a
result of real-world testing using this new proposal as the basis for our
lightweight LN node which powers the demo Lightning desktop application we
recently released.

A highlight of the changes made between this version and the last follows:

  * We've removed the modulus operation in the inner loop when constructing
    filters. This has been replaced with an alternative, more efficient
    mapping[1] as suggested by gmaxwell and sipa. In our implementation, we
    perform the operation in a piece-wise fashion by hand. Alternative
    implementations can take advantage of 128-bit arithmetic extensions on
    supporting CPUs.

  * The txid has been moved from the extended filter to the regular filter.
    During our testing of the new light client with our LN node implementation,
    we found that we were able to reduce network traffic as we only need the
    extended filter for rare on-chain events.

  * We now use the 6th service bit. We realized that the bit we had chosen
    prior was already being used to signal support of x-thin block syncing. To
    select this bit number, we ran a scanner on the addrman of our nodes and
    also the network to find a bit that wasn't used widely.

  * An error in the BIP that didn't include the public key script of coinbase
    transactions in the filter has been fixed.

  * An error in the BIP when constructing the initial "genesis" filter has been
    fixed.

  * We no longer use the ProtocolVersion field in the getcfheaders message or
    its response.

  * The specifications of several newly defined messages were incorrect and
    have been fixed.

  * A number of typos spotted by several reviewers have been fixed.

The full commit history of the BIP draft can be found here:
https://github.com/Roasbeef/bips/commits/gcs-bip-draft

At this point, we're ready to make a PR against the official BIP repo and to
request a number to be assigned to our proposal. Thanks to all those that have
reviewed, and contributed to the proposal!

[1]: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

-- Laolu

On Thu, Jun 8, 2017 at 8:59 PM Olaoluwa Osuntokun wrote:

> Hi y'all,
>
> Thanks for all the comments so far!
>
> I've pushed a series of updates to the text of the BIP repo linked in the OP.
> The fixes include: typos, components of the specification which were
> incorrect (N is the total number of items, NOT the number of txns in the
> block), and a few sections have been clarified.
>
> The latest version also includes a set of test vectors (as CSV files), which
> for a series of fp rates (1/2 to 1/2^32) includes (for 6 testnet blocks, one
> of which generates a "null" filter):
>
>   * The block height
>   * The block hash
>   * The raw block itself
>   * The previous basic+extended filter header
>   * The basic+extended filter header for the block
>   * The basic+extended filter for the block
>
> The size of the test vectors was too large to include in-line within the
> document, so we put them temporarily in a distinct folder [1]. The code used
> to generate the test vectors has also been included.
>
> -- Laolu
>
> [1]: https://github.com/Roasbeef/bips/tree/master/gcs_light_client
>
>
> On Thu, Jun 1, 2017 at 9:49 PM Olaoluwa Osuntokun wrote:
>
>> > In order to consider the average+median filter sizes in a world with
>> > larger blocks, I also ran the index for testnet:
>> >
>> >     * total size: 2753238530
>> >     * total avg: 5918.95736054141
>> >     * total median: 60202
>> >     * total max: 74983
>> >     * regular size: 1165148878
>> >     * regular avg: 2504.856172982827
>> >     * regular median: 24812
>> >     * regular max: 64554
>> >     * extended size: 1588089652
>> >     * extended avg: 3414.1011875585823
>> >     * extended median: 35260
>> >     * extended max: 41731
>> >
>>
>> Oops, realized I made a mistake. These are the stats for Feb 2016 until
>> about a month ago (since height 400k iirc).
>>
>> -- Laolu
>>
>>
>> On Thu, Jun 1, 2017 at 12:01 PM Olaoluwa Osuntokun wrote:
>>
>>> Hi y'all,
>>>
>>> Alex Akselrod and I would like to propose a new light client BIP for
>>> consideration:
>>>   * https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki
>>>
>>> This BIP proposal describes a concrete specification (along with
>>> reference implementations[1][2][3]) for the much discussed client-side
>>> filtering reversal of BIP-37.
>>> The precise details are described in the
>>> BIP, but as a summary: we've implemented a new light-client mode that uses
>>> client-side filtering based off of Golomb-Rice coded sets. Full-nodes
>>> maintain an additional index of the chain, and serve these compact filters
>>> (the index) to light clients which request them. Light clients then fetch
>>> these filters, query them locally and _maybe_ fetch the block if a relevant
>>> item matches. The cool part is that blocks can be fetched from _any_
>>> source, once the light client deems it necessary. Our primary motivation
>>> for this work was enabling a light client mode for lnd[4] in order to
>>> support a more light-weight back end paving the way for the usage of
>>> Lightning on mobile phones and other devices. We've integrated neutrino
>>> as a back end for lnd, and will be making the updated code public very
>>> soon.
>>>
>>> One specific area we'd like feedback on is the parameter selection. Unlike
>>> BIP-37 which allows clients to dynamically tune their false positive rate,
>>> our proposal uses a _fixed_ false-positive rate. Within the document, it's
>>> currently specified as P = 1/2^20. We've done a bit of analysis and
>>> optimization attempting to minimize the following sum:
>>> filter_download_bandwidth + expected_block_false_positive_bandwidth. Alex
>>> has made a JS calculator that allows y'all to explore the effect of
>>> tweaking the false positive rate in addition to the following variables:
>>> the number of items the wallet is scanning for, the size of the blocks,
>>> number of blocks fetched, and the size of the filters themselves. The
>>> calculator calculates the expected bandwidth utilization using the CDF of
>>> the Geometric Distribution. The calculator can be found here:
>>> https://aakselrod.github.io/gcs_calc.html. Alex also has an empirical
>>> script he's been running on actual data, and the results seem to match up
>>> rather nicely.
>>>
>>> We were excited to see that Karl Johan Alm (kallewoof) has done some
>>> (rather extensive!) analysis of his own, focusing on a distinct encoding
>>> type [5]. I haven't had the time to dig into his report yet, but I
>>> think I've read enough to extract the key difference in our encodings: his
>>> filters use a binomial encoding _directly_ on the filter contents, while we
>>> instead create a Golomb-Coded set with the contents being _hashes_ (we use
>>> siphash) of the filter items.
>>>
>>> Using a fixed fp=20 (P = 1/2^20), I have some stats detailing the total
>>> index size, as well as averages for both mainnet and testnet. For mainnet,
>>> using the filter contents as currently described in the BIP (basic +
>>> extended), the total size of the index comes out to 6.9GB. The breakdown
>>> is as follows:
>>>
>>>     * total size: 6976047156
>>>     * total avg: 14997.220622758816
>>>     * total median: 3801
>>>     * total max: 79155
>>>     * regular size: 3117183743
>>>     * regular avg: 6701.372750217131
>>>     * regular median: 1734
>>>     * regular max: 67533
>>>     * extended size: 3858863413
>>>     * extended avg: 8295.847872541684
>>>     * extended median: 2041
>>>     * extended max: 52508
>>>
>>> In order to consider the average+median filter sizes in a world with
>>> larger blocks, I also ran the index for testnet:
>>>
>>>     * total size: 2753238530
>>>     * total avg: 5918.95736054141
>>>     * total median: 60202
>>>     * total max: 74983
>>>     * regular size: 1165148878
>>>     * regular avg: 2504.856172982827
>>>     * regular median: 24812
>>>     * regular max: 64554
>>>     * extended size: 1588089652
>>>     * extended avg: 3414.1011875585823
>>>     * extended median: 35260
>>>     * extended max: 41731
>>>
>>> Finally, here are the testnet stats which take into account the increase
>>> in the maximum filter size due to segwit's block-size increase.
>>> The max
>>> filter sizes are a bit larger due to some of the habitual blocks I
>>> created last year when testing segwit (transactions with 30k inputs, 30k
>>> outputs, etc).
>>>
>>>     * total size: 585087597
>>>     * total avg: 520.8839608674402
>>>     * total median: 20
>>>     * total max: 164598
>>>     * regular size: 299325029
>>>     * regular avg: 266.4790836307566
>>>     * regular median: 13
>>>     * regular max: 164583
>>>     * extended size: 285762568
>>>     * extended avg: 254.4048772366836
>>>     * extended median: 7
>>>     * extended max: 127631
>>>
>>> For those that are interested in the raw data, I've uploaded a CSV file
>>> of raw data for each block (mainnet + testnet), which can be found here:
>>>   * mainnet: (14MB):
>>>     https://www.dropbox.com/s/4yk2u8dj06njbuv/mainnet-gcs-stats.csv?dl=0
>>>   * testnet: (25MB):
>>>     https://www.dropbox.com/s/w7dmmcbocnmjfbo/gcs-stats-testnet.csv?dl=0
>>>
>>>
>>> We look forward to getting feedback from all of y'all!
>>>
>>> -- Laolu
>>>
>>>
>>> [1]: https://github.com/lightninglabs/neutrino
>>> [2]: https://github.com/Roasbeef/btcd/tree/segwit-cbf
>>> [3]: https://github.com/Roasbeef/btcutil/tree/gcs/gcs
>>> [4]: https://github.com/lightningnetwork/lnd/
>>>
>>> -- Laolu
>>>
>>>

--94eb2c1a8c0efcdd59055d9561a5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--94eb2c1a8c0efcdd59055d9561a5--
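
Illustrative note on the first changelog item in the message above: the mapping
from [1] replaces `x mod n` with a multiply followed by a shift, which sends a
uniformly distributed fixed-width value x into the range [0, n). The sketch
below is only an illustration under stated assumptions. The function names are
hypothetical, and the 64-bit operand width is an assumption suggested by the
mention of 128-bit arithmetic; neither is taken from the BIP text or from the
reference implementation. It shows the mapping once with a wide product and
once "piece-wise by hand" using only 32-bit halves.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def fast_reduce_wide(x: int, n: int) -> int:
    """Map a 64-bit value x into [0, n) as floor(x * n / 2**64).

    Python integers are arbitrary precision, so the full 128-bit product
    is formed directly; this stands in for a native 128-bit multiply.
    """
    return ((x & MASK64) * (n & MASK64)) >> 64


def fast_reduce_piecewise(x: int, n: int) -> int:
    """Same mapping, built only from products of 32-bit halves.

    Every intermediate value fits in 64 bits, which is what a language
    without a 128-bit integer type would have to work with.
    """
    x_hi, x_lo = (x >> 32) & MASK32, x & MASK32
    n_hi, n_lo = (n >> 32) & MASK32, n & MASK32

    # Four partial products of the schoolbook multiplication.
    hi_hi = x_hi * n_hi
    hi_lo = x_hi * n_lo
    lo_hi = x_lo * n_hi
    lo_lo = x_lo * n_lo

    # Carry out of the low 64 bits of the full product.
    carry = ((hi_lo & MASK32) + (lo_hi & MASK32) + (lo_lo >> 32)) >> 32

    # High 64 bits of the 128-bit product.
    return (hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + carry) & MASK64


if __name__ == "__main__":
    # Hypothetical inputs: a 64-bit hash value and a target range size.
    sample_hash = 0x9E3779B97F4A7C15
    range_size = 1_000_000
    print(fast_reduce_wide(sample_hash, range_size))
    print(fast_reduce_piecewise(sample_hash, range_size))
```

Both functions return the same value for any pair of 64-bit inputs. Unlike
`x % n`, the result is determined by the high bits of x rather than the low
bits, so the two mappings are not interchangeable once filters have been built
with one of them.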