From mboxrd@z Thu Jan 1 00:00:00 1970
From: Olaoluwa Osuntokun
Date: Mon, 21 May 2018 18:15:22 -0700
To: Pieter Wuille
Cc: Bitcoin Dev
Subject: Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size
List-Id: Bitcoin Protocol Discussion
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000b4674c056cc12700"

--000000000000b4674c056cc12700
Content-Type: text/plain; charset="UTF-8"

Hi Y'all,

The script finished a few days ago with the following results:

reg-filter-prev-script total size:  161236078  bytes
reg-filter-prev-script avg:         16123.6078 bytes
reg-filter-prev-script median:      16584      bytes
reg-filter-prev-script max:         59480      bytes

Compared to the original median size of the same block range, but with the
current filter (which has txids, prev outpoints, and output scripts), we see
a roughly 25% reduction in filter size (current median is 22258 bytes, i.e.
about 34% larger). Compared to the suggested modified filter (no txid; prev
outpoint and output scripts only), we see a roughly 14% reduction in size
(median of that was 19198 bytes, about 16% larger). This shows that script
re-use is still pretty prevalent in the chain as of late.

One thing that occurred to me is that on the application level, switching
to the input prev output script can make things a bit awkward. Observe that
when looking for matches in the filter, upon a match, one would need access
to an additional (outpoint -> script) map in order to locate _which_
particular transaction matched w/o access to an up-to-date UTXO set. In
contrast, as is atm, one can locate the matching transaction with no
additional information (as we're matching on the outpoint).

At this point, if we feel filter sizes need to drop further, then we may
need to consider raising the false positive rate.

Does anyone have any estimates or direct measures w.r.t. how much bandwidth
current BIP 37 light clients consume? It would be nice to have a direct
comparison. We'd need to consider the size of their base bloom filter, the
accumulated bandwidth as a result of repeated filterload commands (to adjust
the fp rate), and also the overhead of receiving the merkle branch and
transactions in distinct messages (both due to matches and false positives).

Finally, I'd be open to removing the current "extended" filter from the BIP
altogether for now. If a compelling use case for being able to filter the
sigScript/witness arises, then we can examine re-adding it with a distinct
service bit. After all, it would be harder to phase out the filter once
wider deployment has already been reached. Similarly, if the roughly 14%
savings achieved by removing the txid (median 19198 vs. 22258 bytes) is
attractive, then we can create an additional filter just for the txids to
allow those applications which need the information to seek out that extra
filter.

-- Laolu


On Fri, May 18, 2018 at 8:06 PM Pieter Wuille <pieter.wuille@gmail.com> wrote:

> On Fri, May 18, 2018, 19:57 Olaoluwa Osuntokun via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> Greg wrote:
>> > What about also making input prevouts filter based on the
>> > scriptpubkey being _spent_? Layering wise in the processing it's a
>> > bit ugly, but if you validated the block you have the data needed.
>>
>> AFAICT, this would mean that in order for a new node to catch up the
>> filter index (index all historical blocks), they'd either need to:
>> build up a utxo-set in memory during indexing, or would require a
>> txindex in order to look up the prev out's script. The first option
>> increases the memory load during indexing, and the second requires
>> nodes to have a transaction index (and would also add considerable
>> I/O load). When proceeding from tip, this doesn't add any additional
>> load assuming that you synchronously index the block as you validate
>> it, otherwise the utxo set will already have been updated (the spent
>> scripts removed).
>
> I was wondering about that too, but it turns out that isn't necessary.
> At least in Bitcoin Core, all the data needed for such a filter is in
> the block + undo files (the latter contain the scriptPubKeys of the
> outputs being spent).
>
>> I have a script running to compare the filter sizes assuming the
>> regular filter switches to include the prev out's script rather than
>> the prev outpoint itself. The script hasn't yet finished (due to the
>> increased I/O load to look up the scripts when indexing), but I'll
>> report back once it's finished.
>
> That's very helpful, thank you.
>
> Cheers,
>
> --
> Pieter
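The median sizes quoted in the message can be compared in two ways: as a reduction relative to the larger (old) filter, or as how much larger the old filter is relative to the new one. The two give noticeably different percentages, so it helps to say which is meant. The following is a minimal sketch for checking the figures; the constant and function names are illustrative and not taken from any BIP 158 tooling.

```python
# Median filter sizes in bytes, as quoted in the message.
MEDIAN_PREV_SCRIPT = 16584  # regular filter using the prev output's script
MEDIAN_CURRENT = 22258      # current filter (txid, prev outpoint, output scripts)
MEDIAN_NO_TXID = 19198      # suggested modified filter (no txid)


def reduction(old: int, new: int) -> float:
    """Fraction by which `new` is smaller than `old` (savings relative to old)."""
    return (old - new) / old


def excess(old: int, new: int) -> float:
    """Fraction by which `old` is larger than `new` (overhead relative to new)."""
    return (old - new) / new


if __name__ == "__main__":
    for label, old in (("current", MEDIAN_CURRENT), ("no-txid", MEDIAN_NO_TXID)):
        print(
            f"{label} -> prev-script: "
            f"{reduction(old, MEDIAN_PREV_SCRIPT):.1%} smaller; "
            f"old median is {excess(old, MEDIAN_PREV_SCRIPT):.1%} larger"
        )
```

As a side note, the total and average sizes quoted (161236078 and 16123.6078 bytes) imply that the measured range covers 10,000 blocks.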

--000000000000b4674c056cc12700--
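To illustrate the application-level awkwardness described in the message: once a filter matches on a spent script, a client that has fetched the block still has to work out which transaction matched, and for inputs that requires an (outpoint -> script) map. The sketch below assumes simplified, hypothetical data structures (`Tx`, `Outpoint`, `find_matching_txs`); it is not how any particular wallet or node implements this.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical representation: (txid, output index).
Outpoint = Tuple[str, int]


class Tx(NamedTuple):
    txid: str
    inputs: List[Outpoint]       # outpoints this transaction spends
    output_scripts: List[bytes]  # scriptPubKeys this transaction creates


def find_matching_txs(
    block_txs: List[Tx],
    watched_script: bytes,
    prev_scripts: Dict[Outpoint, bytes],
) -> List[str]:
    """Return txids in the block that pay to or spend from `watched_script`.

    `prev_scripts` is the extra (outpoint -> script) map the message refers
    to; without it, inputs cannot be tied back to the watched script.
    """
    matches = []
    for tx in block_txs:
        pays = watched_script in tx.output_scripts
        spends = any(prev_scripts.get(op) == watched_script for op in tx.inputs)
        if pays or spends:
            matches.append(tx.txid)
    return matches


if __name__ == "__main__":
    watched = b"\x00\x14" + b"\x11" * 20  # made-up script bytes
    other = b"\x00\x14" + b"\x22" * 20
    block = [
        Tx("aa", inputs=[("f0", 0)], output_scripts=[other]),
        Tx("bb", inputs=[("f1", 1)], output_scripts=[watched]),
        Tx("cc", inputs=[("f2", 0)], output_scripts=[other]),
    ]
    # Only the client's own previously seen outputs need to be tracked.
    prev = {("f0", 0): watched}
    print(find_matching_txs(block, watched, prev))
```

With outpoint-based matching, the `prev_scripts` lookup is unnecessary, since the watched outpoint can be compared against `tx.inputs` directly.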