* [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets @ 2018-05-25 17:54 Pieter Wuille 2018-05-25 18:42 ` Gregory Maxwell 0 siblings, 1 reply; 5+ messages in thread From: Pieter Wuille @ 2018-05-25 17:54 UTC (permalink / raw) To: Bitcoin Dev Hi all, I spent some time working out the optimal parameter selection for the Golomb Coded Sets that are proposed in BIP158: https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845 TL;DR: if we really want an FP rate of exactly 1 in 2^20, the Rice parameter should be 19, not 20. If we don't, we should pick an FP rate of 1 in a 1.4971*2^B. So for example M=784931 B=19 or M=1569861 B=20. Cheers, -- Pieter ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets 2018-05-25 17:54 [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets Pieter Wuille @ 2018-05-25 18:42 ` Gregory Maxwell 2018-05-25 21:13 ` Gregory Maxwell 0 siblings, 1 reply; 5+ messages in thread From: Gregory Maxwell @ 2018-05-25 18:42 UTC (permalink / raw) To: Pieter Wuille, Bitcoin Protocol Discussion On Fri, May 25, 2018 at 5:54 PM, Pieter Wuille via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote: > Hi all, > > I spent some time working out the optimal parameter selection for the > Golomb Coded Sets that are proposed in BIP158: > https://gist.github.com/sipa/576d5f09c3b86c3b1b75598d799fc845 > > TL;DR: if we really want an FP rate of exactly 1 in 2^20, the Rice > parameter should be 19, not 20. If we don't, we should pick an FP rate > of 1 in a 1.4971*2^B. So for example M=784931 B=19 or M=1569861 B=20. I did a rough analysis using Pieter's approximations on what parameters minimizes the total communications for a lite wallet scanning the chain and fetching a witnessless block whenever they get a filter hit. For a wallet with 1000 keys and blocks of 1MB if the number of entries in the is at least 5096 then M=784931 results in a lower total data rate rate (FP blocks + filters) than M=1569861. M=392465 (the optimal value for the rice parameter 18) is communications is better if at least 10192 entries are set, and M=196233 (optimal FP for rice 17) is better if at least 20384 entries are set. The prior filter set proposal is setting roughly 13300 entries per full block, and I guestimate that the in+out scripts only ones are setting about 7500 entries (if that actual number was in any of the recent posts I missed it, I'm guessing based on jimpo's sizes graph). The breakpoints are obviously different if the client is monitoring for, say, 10,000 keys instead of 1000 but I think it generally makes more sense to optimize for lower key counts since bigger users are more likely to tolerate the additional bandwidth usage. So I think that assuming that all-scripts inputs and outputs (but no txids) are used and that my guess of 7500 bits set for that configuration is roughly right, then M=1569861 and rice parameter 19 should be used. The actual optimal FP rate for total data transferred won't be one that gets the optimal rice coding efficiency, but since different clients will be monitoring for different numbers of keys, it probably makes sense to pick a parameter with optimal compression rather than optimal-data-transfer-for-a-specific-key-count-- at least then we're spending the least amount of filter bits per false positive rate, whatever that rate is... if we can't be optimal at least we can be efficient. :) ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets 2018-05-25 18:42 ` Gregory Maxwell @ 2018-05-25 21:13 ` Gregory Maxwell 2018-05-29 22:38 ` Jim Posen 0 siblings, 1 reply; 5+ messages in thread From: Gregory Maxwell @ 2018-05-25 21:13 UTC (permalink / raw) To: Pieter Wuille, Bitcoin Protocol Discussion On Fri, May 25, 2018 at 6:42 PM, Gregory Maxwell <greg@xiph.org> wrote: > configuration is roughly right, then M=1569861 and rice parameter 19 > should be used. That should have been M=784931 B=19 ... paste error. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets 2018-05-25 21:13 ` Gregory Maxwell @ 2018-05-29 22:38 ` Jim Posen 2018-05-30 3:10 ` Lucas Ontivero 0 siblings, 1 reply; 5+ messages in thread From: Jim Posen @ 2018-05-29 22:38 UTC (permalink / raw) To: pieter.wuille, Bitcoin Protocol Discussion [-- Attachment #1.1: Type: text/plain, Size: 1497 bytes --] This is a really cool finding, thanks Pieter! I did some more analysis on selecting a good P value to reduce total data downloaded considering both filters themselves and blocks in the case of false positive matches, using data from mainnet. The quantity it minimizes is: filter_size(N, B) + block_size * false_positive_probability(C, N, B) N is the number of filter elements per block B is the Golomb-Rice coding parameter C is the number of filter elements watched by the client The main result is that: For C = 10, B = 13 is optimal For C = 100, B = 16 is optimal For C = 1,000, B = 20 is optimal For C = 10,000, B = 23 is optimal So any value of B in the range 16 to 20 seems reasonable, with M = 1.4971 * 2^B for optimal compression, as Pieter derived. The selection of the parameter depends on the target number of elements that a client may watch. I attached some of the results, and would be happy to share the CSV and raw notebook if people are interested. On Fri, May 25, 2018 at 2:14 PM Gregory Maxwell via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > On Fri, May 25, 2018 at 6:42 PM, Gregory Maxwell <greg@xiph.org> wrote: > > configuration is roughly right, then M=1569861 and rice parameter 19 > > should be used. > > That should have been M=784931 B=19 ... paste error. > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > [-- Attachment #1.2: Type: text/html, Size: 3477 bytes --] [-- Attachment #2: Filter FP Rate.html --] [-- Type: text/html, Size: 297617 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets 2018-05-29 22:38 ` Jim Posen @ 2018-05-30 3:10 ` Lucas Ontivero 0 siblings, 0 replies; 5+ messages in thread From: Lucas Ontivero @ 2018-05-30 3:10 UTC (permalink / raw) To: Jim Posen, Bitcoin Protocol Discussion [-- Attachment #1: Type: text/plain, Size: 2067 bytes --] Hi Jim, Yes please, could you share CSV? We are developing a Wallet that uses Golomb-Rice filters it would help a lot for determine the best P value depending on the estimated number of elements the client needs to watch. 2018-05-29 19:38 GMT-03:00 Jim Posen via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org>: > This is a really cool finding, thanks Pieter! > > I did some more analysis on selecting a good P value to reduce total data > downloaded considering both filters themselves and blocks in the case of > false positive matches, using data from mainnet. The quantity it minimizes > is: > > filter_size(N, B) + block_size * false_positive_probability(C, N, B) > > N is the number of filter elements per block > B is the Golomb-Rice coding parameter > C is the number of filter elements watched by the client > > The main result is that: > > For C = 10, B = 13 is optimal > For C = 100, B = 16 is optimal > For C = 1,000, B = 20 is optimal > For C = 10,000, B = 23 is optimal > > So any value of B in the range 16 to 20 seems reasonable, with M = 1.4971 > * 2^B for optimal compression, as Pieter derived. The selection of the > parameter depends on the target number of elements that a client may watch. > > I attached some of the results, and would be happy to share the CSV and > raw notebook if people are interested. > > > On Fri, May 25, 2018 at 2:14 PM Gregory Maxwell via bitcoin-dev < > bitcoin-dev@lists.linuxfoundation.org> wrote: > >> On Fri, May 25, 2018 at 6:42 PM, Gregory Maxwell <greg@xiph.org> wrote: >> > configuration is roughly right, then M=1569861 and rice parameter 19 >> > should be used. >> >> That should have been M=784931 B=19 ... paste error. >> _______________________________________________ >> bitcoin-dev mailing list >> bitcoin-dev@lists.linuxfoundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev >> > > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > > [-- Attachment #2: Type: text/html, Size: 4612 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-05-30 3:10 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-05-25 17:54 [bitcoin-dev] Minimizing the redundancy in Golomb Coded Sets Pieter Wuille 2018-05-25 18:42 ` Gregory Maxwell 2018-05-25 21:13 ` Gregory Maxwell 2018-05-29 22:38 ` Jim Posen 2018-05-30 3:10 ` Lucas Ontivero
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox