* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-02-26 18:40 [bitcoin-dev] A design for Probabilistic Partial Pruning Keagan McClelland
@ 2021-02-27 7:10 ` Igor Cota
2021-02-28 3:41 ` Leo Wandersleb
2021-02-27 19:19 ` David A. Harding
2021-02-27 22:09 ` Yuval Kogman
2 siblings, 1 reply; 10+ messages in thread
From: Igor Cota @ 2021-02-27 7:10 UTC (permalink / raw)
To: Keagan McClelland, Bitcoin Protocol Discussion, chike, Artur Brugeman
Hi Keagan,
I had a very similar idea, the only difference being that the node would
decide on a range of blocks to keep beforehand, rather than making the
decision block-by-block as you suggest.
I felt the other nodes would be better served by ranges due to the
sequential nature of IBD. Perhaps this would be computationally lighter as
well.
I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
scheme to solve this same problem.
Cheers,
Igor
[1] https://arxiv.org/abs/1902.02174
On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> Hi all,
>
> I've been thinking for quite some time about the problem of pruned nodes
> and ongoing storage costs for full nodes. One of the things that strikes me
> as odd is that we only really have two settings.
>
> A. Prune everything except the most recent blocks, down to the cache size
> B. Keep everything since genesis
>
> From my observations and conversations with various folks in the
> community, they would like to be able to run a "partially" pruned node to
> help bear the load of bootstrapping other nodes and helping with data
> redundancy in the network, but would prefer to not dedicate hundreds of
> Gigabytes of storage space to the cause.
>
> This led me to the idea that a node could randomly prune some of the
> blocks from history if it passed some predicate. A rough sketch of this
> would look as follows.
>
> 1. At node startup, it would generate a random seed; this would be unique
> to the node, but it need not be cryptographically secure.
> 2. In the node configuration it would also carry a "threshold" expressed
> as some percentage of blocks it wanted to keep.
> 3. As IBD occurs, based on the threshold, the block hash, and the
> node's unique seed, the node would either decide to prune the data or keep
> it. The uniqueness of the node's seed should ensure that no block is
> systematically overrepresented in the set of nodes choosing this storage
> scheme.
> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.
>
> The goals are to increase data redundancy in a way that more uniformly
> shares the load across nodes, alleviating some of the pressure of full
> archive nodes on the IBD problem. I am working on a draft BIP for this
> proposal but figured I would submit it as a high level idea in case anyone
> had any feedback on the initial design before I go into specification
> levels of detail.
>
> If you have thoughts on
>
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
>
> I would love to hear from you,
>
> Cheers,
> Keagan
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
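To make step 3 of the quoted sketch concrete, the keep/prune predicate could look like the following. This is purely illustrative and not part of the proposal; the choice of SHA-256 and the exact encoding are assumptions.

```python
import hashlib

def keep_block(node_seed: bytes, block_hash: bytes, threshold: float) -> bool:
    """Deterministically decide whether this node keeps a given block.

    Hashing the node's seed together with the block hash gives each
    (node, block) pair an independent, roughly uniform value, so no
    block is systematically over- or under-represented across nodes.
    """
    digest = hashlib.sha256(node_seed + block_hash).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    x = int.from_bytes(digest[:8], "big") / 2**64
    return x < threshold
```

Because the function is deterministic, a peer that learns the node's (seed, threshold) pair can evaluate the same predicate to deduce which blocks the node stores, which is what step 4 relies on.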
--
*Igor Cota*
Codex Apertus d.o.o.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-02-27 7:10 ` Igor Cota
@ 2021-02-28 3:41 ` Leo Wandersleb
2021-03-01 6:22 ` Craig Raw
0 siblings, 1 reply; 10+ messages in thread
From: Leo Wandersleb @ 2021-02-28 3:41 UTC (permalink / raw)
To: bitcoin-dev
Only headers need to be downloaded sequentially, so downloading relevant
blocks from one node, with gaps in between, is entirely possible.
On 2/27/21 4:10 AM, Igor Cota via bitcoin-dev wrote:
> Hi Keagan,
>
> I had a very similar idea. The only difference being for the node to decide on
> a range of blocks to keep beforehand, rather than making the decision
> block-by-block like you suggest.
>
> I felt the other nodes would be better served by ranges due to the sequential
> nature of IBD. Perhaps this would be computationally lighter as well.
>
> I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
> scheme to solve this same problem.
>
> Cheers,
> Igor
>
> [1] https://arxiv.org/abs/1902.02174
>
> On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org
> <mailto:bitcoin-dev@lists.linuxfoundation.org>> wrote:
>
> Hi all,
>
> I've been thinking for quite some time about the problem of pruned nodes
> and ongoing storage costs for full nodes. One of the things that strikes
> me as odd is that we only really have two settings.
>
> A. Prune everything except the most recent blocks, down to the cache size
> B. Keep everything since genesis
>
> From my observations and conversations with various folks in the
> community, they would like to be able to run a "partially" pruned node to
> help bear the load of bootstrapping other nodes and helping with data
> redundancy in the network, but would prefer to not dedicate hundreds of
> Gigabytes of storage space to the cause.
>
> This led me to the idea that a node could randomly prune some of the
> blocks from history if it passed some predicate. A rough sketch of this
> would look as follows.
>
> 1. At node startup, it would generate a random seed; this would be unique
> to the node, but it need not be cryptographically secure.
> 2. In the node configuration it would also carry a "threshold" expressed
> as some percentage of blocks it wanted to keep.
> 3. As IBD occurs, based on the threshold, the block hash, and the
> node's unique seed, the node would either decide to prune the data or keep
> it. The uniqueness of the node's seed should ensure that no block is
> systematically overrepresented in the set of nodes choosing this storage
> scheme.
> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.
>
> The goals are to increase data redundancy in a way that more uniformly
> shares the load across nodes, alleviating some of the pressure of full
> archive nodes on the IBD problem. I am working on a draft BIP for this
> proposal but figured I would submit it as a high level idea in case anyone
> had any feedback on the initial design before I go into specification
> levels of detail.
>
> If you have thoughts on
>
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
>
> I would love to hear from you,
>
> Cheers,
> Keagan
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> <mailto:bitcoin-dev@lists.linuxfoundation.org>
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>
>
> --
> *Igor Cota*
> Codex Apertus d.o.o.
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-02-28 3:41 ` Leo Wandersleb
@ 2021-03-01 6:22 ` Craig Raw
2021-03-01 9:37 ` eric
0 siblings, 1 reply; 10+ messages in thread
From: Craig Raw @ 2021-03-01 6:22 UTC (permalink / raw)
To: Bitcoin Protocol Discussion
Something to consider adding to this proposal is to keep the idea behind
pruning - i.e. always retain a sequentially uninterrupted run of the most
recent blocks.
Many users do not run a node for entirely altruistic reasons - they do so,
at least in part, because it allows them to use their wallets privately.
Without this ability, I think the number of users willing to run their node
in this configuration might be reduced.
Another related thought is to have the density of retained blocks decrease
as you go back towards genesis, so that the data density of the storage
matches the actual usage of the network, in which (I would imagine) more
recent blocks are more heavily requested than early ones.
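One way to realize both suggestions - an uninterrupted recent window plus a density that falls off with depth - is sketched below. The window size and the exponential decay are assumptions chosen for illustration, not part of any proposal.

```python
def keep_probability(height: int, tip_height: int,
                     recent_window: int = 288,
                     half_life: int = 52_560) -> float:
    """Retention probability for the block at a given height.

    Blocks inside the recent window are always kept (preserving the
    wallet-privacy benefit of a pruned-style recent run); beyond that,
    the probability halves for every `half_life` blocks of extra depth
    (52,560 blocks is roughly one year).
    """
    depth = tip_height - height
    if depth <= recent_window:
        return 1.0
    return 0.5 ** ((depth - recent_window) / half_life)
```

A node would then keep block h whenever its deterministic per-block value (derived from its seed and the block hash, say) falls below keep_probability(h, tip).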
Craig
On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> Only headers need to be downloaded sequentially so downloading relevant
> blocks
> from one node is totally possible with gaps in between.
>
> On 2/27/21 4:10 AM, Igor Cota via bitcoin-dev wrote:
> > Hi Keagan,
> >
> > I had a very similar idea. The only difference being for the node to
> decide on
> > a range of blocks to keep beforehand, rather than making the decision
> > block-by-block like you suggest.
> >
> > I felt the other nodes would be better served by ranges due to the
> sequential
> > nature of IBD. Perhaps this would be computationally lighter as well.
> >
> > I also encourage you to read Ryosuke Abe's paper [1] that proposes a DHT
> > scheme to solve this same problem.
> >
> > Cheers,
> > Igor
> >
> > [1] https://arxiv.org/abs/1902.02174
> >
> > On Fri, 26 Feb 2021 at 21:57, Keagan McClelland via bitcoin-dev
> > <bitcoin-dev@lists.linuxfoundation.org
> > <mailto:bitcoin-dev@lists.linuxfoundation.org>> wrote:
> >
> > Hi all,
> >
> > I've been thinking for quite some time about the problem of pruned
> nodes
> > and ongoing storage costs for full nodes. One of the things that
> strikes
> > me as odd is that we only really have two settings.
> >
> > A. Prune everything except the most recent blocks, down to the cache
> size
> > B. Keep everything since genesis
> >
> > From my observations and conversations with various folks in the
> > community, they would like to be able to run a "partially" pruned
> node to
> > help bear the load of bootstrapping other nodes and helping with data
> > redundancy in the network, but would prefer to not dedicate hundreds
> of
> > Gigabytes of storage space to the cause.
> >
> > This led me to the idea that a node could randomly prune some of the
> > blocks from history if it passed some predicate. A rough sketch of
> this
> > would look as follows.
> >
> > 1. At node startup, it would generate a random seed; this would be
> > unique to the node, but it need not be cryptographically secure.
> > 2. In the node configuration it would also carry a "threshold"
> expressed
> > as some percentage of blocks it wanted to keep.
> > 3. As IBD occurs, based on the threshold, the block hash, and the
> > node's unique seed, the node would either decide to prune the data
> > or keep it. The uniqueness of the node's seed should ensure that no
> > block is systematically overrepresented in the set of nodes choosing
> > this storage scheme.
> > 4. Once the node's IBD is complete it would advertise this as a peer
> > service, advertising its seed and threshold, so that nodes could
> > deterministically deduce which of its peers had which blocks.
> >
> > The goals are to increase data redundancy in a way that more
> uniformly
> > shares the load across nodes, alleviating some of the pressure of
> full
> > archive nodes on the IBD problem. I am working on a draft BIP for
> this
> > proposal but figured I would submit it as a high level idea in case
> anyone
> > had any feedback on the initial design before I go into specification
> > levels of detail.
> >
> > If you have thoughts on
> >
> > A. The protocol design itself
> > B. The barriers to put this kind of functionality into Core
> >
> > I would love to hear from you,
> >
> > Cheers,
> > Keagan
> > _______________________________________________
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > <mailto:bitcoin-dev@lists.linuxfoundation.org>
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> >
> >
> >
> > --
> > *Igor Cota*
> > Codex Apertus d.o.o.
> >
> > _______________________________________________
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-03-01 6:22 ` Craig Raw
@ 2021-03-01 9:37 ` eric
2021-03-01 20:55 ` Keagan McClelland
0 siblings, 1 reply; 10+ messages in thread
From: eric @ 2021-03-01 9:37 UTC (permalink / raw)
To: 'Bitcoin Protocol Discussion'
On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org <mailto:bitcoin-dev@lists.linuxfoundation.org> > wrote:
> Only headers need to be downloaded sequentially so downloading relevant blocks from one node is totally possible with gaps in between.
In fact this is exactly how libbitcoin v4 works. We download and store blocks in parallel. In the case of a restart, block gaps are repopulated. Given that headers are validated first, we go after the most responsive nodes. Based on standard deviation, we drop the slowest peers and rebalance load to new/empty channels. We make ordered, but not necessarily sequential, requests. There is no distinction between “initial” block download, a restart, or a single or few blocks at the top, so it’s referred to as continuous parallel block download.
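The peer-management policy described here might look roughly like the following sketch (names and structure are hypothetical; libbitcoin's actual implementation differs):

```python
from statistics import mean, stdev

def peers_to_drop(latencies: dict[str, float], z: float = 1.0) -> list[str]:
    """Return peers whose block-response latency sits more than `z`
    standard deviations above the mean; their outstanding requests
    would be rebalanced onto new or empty channels."""
    if len(latencies) < 2:
        return []  # not enough samples for a standard deviation
    mu = mean(latencies.values())
    sigma = stdev(latencies.values())
    return [peer for peer, lat in latencies.items() if lat > mu + z * sigma]
```

Run periodically over a sliding latency window, this keeps download bandwidth concentrated on the most responsive peers without ever stalling on a single slow one.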
But we don’t prune. Personally I consider this counterproductive. Apart from the complexity, it’s not healthy. And the chain grows linearly with storage cost falling exponentially, leading to a straightforward conclusion.
e
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-03-01 9:37 ` eric
@ 2021-03-01 20:55 ` Keagan McClelland
0 siblings, 0 replies; 10+ messages in thread
From: Keagan McClelland @ 2021-03-01 20:55 UTC (permalink / raw)
To: eric, Bitcoin Protocol Discussion
> Personally I consider this counterproductive. Apart from the complexity,
> it’s not healthy. And the chain grows linearly with storage cost falling
> exponentially, leading to a straightforward conclusion.
The motivation for this change is not to encourage full archival nodes to
prune, but to make it possible for pruned nodes to beef up the kind of
archive they retain. Personally, I think using falling storage costs to
provide access to more users is more important than using them to justify
higher node requirements.
> Something to consider adding to this proposal is to keep the idea of
> pruning - i.e. retain a sequentially uninterrupted number of the most
> recent blocks.
>
> Many users do not run a node for entirely altruistic reasons - they do
> so, at least in part, because it allows them to use their wallets
> privately. Without this ability, I think the number of users willing to run
> their node in this configuration might be reduced.
>
> Another related thought is to have a decreasing density over blocks over
> time as you go backwards towards genesis, in order for the data density of
> the storage to match the actual usage of the network, in which (I would
> imagine) more recent blocks are more heavily requested than early ones.
Per my comments above, this change actually capitalizes primarily on those
who wish to do it for more altruistic reasons. Furthermore, doing linear
block scans when you need to query blocks that you don't keep does not leak
privacy details in the way that bloom filters do. You are not signaling to
the peer that there is something specific in that block that you care
about, because you don't actually know. You are signaling only that you do
not have that block right now, which the other parts of the design already
leak anyway. In light of this, I don't think it is necessary for the blocks
to be in sequential sets at all. If there is no requirement that they be
sequential, uniform randomness will take care of the density problem
automatically.
Keagan
On Mon, Mar 1, 2021 at 4:20 AM Eric Voskuil via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> On Sun, Feb 28, 2021 at 10:18 AM Leo Wandersleb via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>
>
> > Only headers need to be downloaded sequentially so downloading relevant
> blocks from one node is totally possible with gaps in between.
>
>
>
> In fact this is exactly how libbitcoin v4 works. We download and store
> blocks in parallel. In the case of a restart block gaps are repopulated.
> Given that headers are validated, we go after the most responsive nodes.
> Based on standard deviation, we drop the slowest peers and rebalance load
> to new/empty channels. We make ordered but not necessarily sequential
> requests. There is no distinction between “initial” block download, a
> restart, or a single or few blocks at the top. So it’s referred to as
> continuous parallel block download.
>
>
>
> But we don’t prune. Personally I consider this counterproductive. Apart
> from the complexity, it’s not healthy. And the chain grows linearly with
> storage cost falling exponentially, leading to a straightforward conclusion.
>
>
>
> e
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-02-26 18:40 [bitcoin-dev] A design for Probabilistic Partial Pruning Keagan McClelland
2021-02-27 7:10 ` Igor Cota
@ 2021-02-27 19:19 ` David A. Harding
2021-02-27 23:37 ` David A. Harding
2021-02-27 22:09 ` Yuval Kogman
2 siblings, 1 reply; 10+ messages in thread
From: David A. Harding @ 2021-02-27 19:19 UTC (permalink / raw)
To: Keagan McClelland, Bitcoin Protocol Discussion
On Fri, Feb 26, 2021 at 11:40:35AM -0700, Keagan McClelland via bitcoin-dev wrote:
> Hi all,
Hi Keagan,
> 4. Once the node's IBD is complete it would advertise this as a peer
> service, advertising its seed and threshold, so that nodes could
> deterministically deduce which of its peers had which blocks.
Although some of the details differed, I believe this general idea of
sharded block storage was previously discussed in the context of BIP159,
which warns:
"Peers may have different prune depths (depending on the peers
configuration, disk space, etc.) which can result in a
fingerprinting weakness (finding the prune depth through getdata
requests). NODE_NETWORK_LIMITED supporting peers SHOULD avoid
leaking the prune depth and therefore not serve blocks deeper than
the signaled NODE_NETWORK_LIMITED threshold (288 blocks)."
- BIP: https://github.com/bitcoin/bips/blob/master/bip-0159.mediawiki#counter-measures-for-peer-fingerprinting
- Discussion thread 1: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014186.html
- Discussion thread 2: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-May/014314.html
- Discussion thread 2, continued: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-April/014186.html
- BIP159-related PR, review comments: https://github.com/bitcoin/bitcoin/pull/10387
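The counter-measure quoted above boils down to a simple depth check. A minimal sketch follows; the names are hypothetical, not taken from Core.

```python
NODE_NETWORK_LIMITED_DEPTH = 288  # serving window signaled by NODE_NETWORK_LIMITED

def may_serve_block(tip_height: int, block_height: int) -> bool:
    """Serve a block only if it is within the signaled window, even when a
    deeper block happens to still be on disk - otherwise getdata responses
    would reveal the node's actual prune depth and fingerprint it."""
    return tip_height - block_height < NODE_NETWORK_LIMITED_DEPTH
```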
> If you have thoughts on
>
> A. The protocol design itself
> B. The barriers to put this kind of functionality into Core
>
> I would love to hear from you,
I think it would be unlikely for any popular node software to adopt a
technique that could make specific nodes easily fingerprintable on an
ongoing basis unless it solved some other urgent problem. Luke Dashjr's
rough data collection currently shows 5,629 archival listening nodes,[1]
which is a substantial fraction of the roughly 10,000 listening nodes
reported by Addy Yeow,[2] so I don't think we're near the point of
needing to worry about the unavailability of historic blocks.
[1] https://luke.dashjr.org/programs/bitcoin/files/charts/services.html
[2] https://bitnodes.io/dashboard/
However, if there's a reasonable solution to the fingerprinting problem,
I do think node developers would find that very interesting.
-Dave
* Re: [bitcoin-dev] A design for Probabilistic Partial Pruning
2021-02-26 18:40 [bitcoin-dev] A design for Probabilistic Partial Pruning Keagan McClelland
2021-02-27 7:10 ` Igor Cota
2021-02-27 19:19 ` David A. Harding
@ 2021-02-27 22:09 ` Yuval Kogman
2021-02-27 22:13 ` Yuval Kogman
2 siblings, 1 reply; 10+ messages in thread
From: Yuval Kogman @ 2021-02-27 22:09 UTC (permalink / raw)
To: Keagan McClelland, Bitcoin Protocol Discussion
On Fri, 26 Feb 2021 at 20:57, Keagan McClelland via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> The goals are to increase data redundancy in a way that more uniformly shares the load across nodes, alleviating some of the pressure of full archive nodes on the IBD problem. I am working on a draft BIP for this proposal but figured I would submit it as a high level idea in case anyone had any feedback on the initial design before I go into specification levels of detail.
You might be interested in an approach (henceforth "SeF") by Swanand
Kadhe, Jichan Chung and Kannan Ramchandran which employs fountain
codes, presented at Scaling Bitcoin 2019:
https://arxiv.org/abs/1906.12140
From a cursory search it appears there is quite a bit of
related/followup work as well.
In the simplest fountain code, the Luby transform (applied in this work),
the encoder divides a message into smaller chunks and then constructs an
infinite stream of codewords, each the XOR of d randomly selected chunks,
where d is sampled from the robust soliton distribution. The simplest
decoder peels: it recovers chunks from degree-1 codewords and XORs them
out of the higher-degree codewords that reference them, so the full
message can be recovered with overwhelming probability, and in linear
time, from a sufficiently large random sample of the encoded stream. Note
that the decoder must know which chunks went into a codeword, but this is
usually addressed using pseudorandomness, which has other motivations in
an adversarial setting.
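A toy version of that encode/peel cycle, for intuition only - the degree distribution below is a crude stand-in for the robust soliton, and chunks are small integers rather than block data:

```python
import random

def lt_encode(chunks, num_codewords, rng):
    """Each codeword pairs a set of chunk indices with the XOR of those
    chunks; the degree d is drawn from a toy distribution standing in
    for the robust soliton."""
    n = len(chunks)
    codewords = []
    for _ in range(num_codewords):
        d = rng.choice([1, 2, 2, 3, 4])
        idxs = set(rng.sample(range(n), d))
        value = 0
        for i in idxs:
            value ^= chunks[i]
        codewords.append((idxs, value))
    return codewords

def lt_decode(codewords, n):
    """Peeling decoder: resolve degree-1 codewords, XOR each recovered
    chunk out of the codewords that still reference it, and repeat
    until no further progress is made."""
    cws = [[set(idxs), value] for idxs, value in codewords]
    recovered = {}
    progress = True
    while progress and len(recovered) < n:
        progress = False
        for cw in cws:
            for i in [j for j in cw[0] if j in recovered and len(cw[0]) > 1]:
                cw[0].discard(i)
                cw[1] ^= recovered[i]
            if len(cw[0]) == 1:
                (i,) = cw[0]
                if i not in recovered:
                    recovered[i] = cw[1]
                    progress = True
    return recovered  # chunk index -> value; may be partial
```

With enough codewords relative to n, the peeling process recovers every chunk with overwhelming probability, and any chunk it does recover is exact by XOR algebra.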
In SeF, the general idea is that "droplet nodes" are pruning nodes which
retain some number (analogous to your threshold parameter) of codewords
from the encoding of concatenated blocks (concatenation yields a fixed
message size), and serve these to compatible clients. This is more robust
than retaining a random sample of blocks, and it also performs well in
their simulations.
Even if this paper is not directly applicable to your efforts, the theory
of fountain codes should be of interest, and many variants exist. There is
also some work on fountain codes in an adversarial setting (Falcon codes),
but I'm only vaguely familiar with it; if I'm not mistaken, most of the
considerations are either already implicitly addressed by the Bitcoin
protocol or explicitly addressed in the SeF paper, whose results also take
into account malicious droplet nodes.