Hi Dave

A node that stores the full blockchain (I will use the term archival node) requires over 100GB of disk space, which I believe is one of the most significant barriers to more people running full nodes. And I believe the ecosystem would benefit substantially if more users were running full nodes.


Thanks for your proposal.

I agree that 100GB of data may be cumbersome for some systems, especially if you target end user systems (Laptops/Desktops). Though, in my opinion, for those systems, CPU consumption is the biggest UX blocker.
Bootstrapping a full node on a decent consumer system with default parameters takes days, and, during this period, you probably run at full CPU capacity and you will be disturbed by constant fan noise. Standard tasks may be impossible because your system will be slowed down to a point where even word processing may get difficult.
This is because Core (with its default settings) is made to sync as fast as possible.

Once you have verified the chain and you reach the chain tip, indeed, it will be much better (until you shutdown for a couple of days/hours and have to re-sync/catch-up).

1. I agree that we need to have a way for pruned nodes to partially serve historical blocks.
My personal measurements told me that around ~80% of historical block serving are between tip and -1’000 blocks.
Currently, Core nodes have only two modes of operations, „server all historical blocks“ or „none“.
This makes little sense especially if you prune to a target size of, lets say, 80GB (~80% of the chain).
Ideally, there would be a mode where your full node can signal a third mode „I keep the last 1000 blocks“ (or make this more dynamic).

2. Bootstrapping new peers
I’m not sure if full nodes must be the single point of historical data storage. Full nodes provide a valuable service (verification, relay, filtering, etc.). I’m not sure if serving historical blocks is one of them. Historical blocks could be made available on CDN’s or other file storage networks. You are going to verify them anyways,... the serving part is pure data storage.
I’m also pretty sure that some users have stopping running full nodes because their upstream bandwidth consumption (because of serving historical blocks) was getting intolerable.
Especially „consumer“ peers must have been hit by this (little experience in how to reduce traffic, upstream in general is bad for consumers-connections, little resources in general).

Having a second option built into full nodes (or as an external bootstrap service/app) how to download historical blocks during bootstrapping could probably be a relieve for "small nodes“.
It could be a little daemon that downloads historical blocks from CDN’s, etc. and feeds them into your full node over p2p/8333 and kickstarts your bootstrapping without bothering valuable peers.
Or, the alternative download, could be built into the full nodes main logic.
And, if it wasn’t obvious, this must not bypass the verification!

I’m also aware of the downsides of this. This can eventually reduce decentralisation of the storage of historical bitcoin blockchain data and – eventually – increase the upstream bandwidth of peers willing to serve historical blocks (especially in a transition phase to a second „download“-option).
Maybe it’s a tradeoff between reducing decentralisation by killing low resource nodes because serving historical blocks is getting too resource-intense _or_ reducing decentralisation by moving some percentage of the historical data storage away from the bitcoin p2p network.
The later seems more promising to me.


To your proposal:
- Isn’t there a tiny finger-printing element if peers have to pick an segmentation index?
- SPV bloom filter clients can’t use fragmented blocks to filter txns? Right? How could they avoid connecting to those peers?

</jonas>