Thank you for your feedback, AJ and Riccardo.
Nice observation about using nBits from every 2016th block as a short specifier of chain work. You can get some savings from the 4-byte nBits encoding compared with the VLQ-encoded total chain work in my spec.
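To illustrate where the per-entry difference comes from, here is a minimal Python sketch. It assumes a generic base-128 VLQ (most significant group first, continuation bit on every byte but the last), which may differ in detail from the variant in the spec. The chain work value is a made-up placeholder used only to show the encoded length, not a real measurement.

```python
HASH_SIZE = 32   # bytes in a block hash
NBITS_SIZE = 4   # bytes in the compact nBits field


def vlq_encode(n: int) -> bytes:
    """Encode a non-negative integer as a generic base-128 VLQ.

    Assumption: big-endian groups of 7 bits, with the high bit set on
    every byte except the last. The spec's exact variant may differ.
    """
    if n < 0:
        raise ValueError("VLQ is only defined here for non-negative integers")
    groups = [n & 0x7F]          # final byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        n >>= 7
    return bytes(reversed(groups))


# Hypothetical chain work value, chosen only to illustrate the length.
example_chain_work = 2 ** 88

nbits_entry = HASH_SIZE + NBITS_SIZE
vlq_entry = HASH_SIZE + len(vlq_encode(example_chain_work))
print(f"hash + nBits entry: {nbits_entry} bytes")
print(f"hash + VLQ chain work entry: {vlq_entry} bytes")
```

The nBits entry is a fixed size, while the VLQ entry grows with the magnitude of the accumulated work.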
I tried it out on the current chain. At block height 516,387, there are 258 total checkpoints in the response payload with an interval of 2016. The size of the checkpts message is:
- 9,304 bytes using hash + nBits
- 10,934 bytes using hash + chain work delta encoded as VLQ
- 11,030 bytes using hash + chain work total encoded as VLQ
The saving from using deltas instead of the total seems negligible to me (96 bytes, under 1%), especially considering the additional computation it requires. Going from total chain work as VLQ to nBits is a 16% saving in the size of a checkpts message. According to some rather rough benchmarks, it takes ~3us to generate each message with nBits versus ~105us with VLQ chain work (including block index lookups and serialization time).
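For anyone who wants to check the percentages, the arithmetic on the three message sizes listed above can be reproduced with a few lines of Python. The sizes are copied from this message; the helper name is just for illustration.

```python
# Message sizes in bytes at height 516,387, copied from the figures above.
SIZES = {
    "hash + nBits": 9_304,
    "hash + chain work delta (VLQ)": 10_934,
    "hash + chain work total (VLQ)": 11_030,
}


def saving(baseline: int, alternative: int) -> float:
    """Fractional size reduction of `alternative` relative to `baseline`."""
    return (baseline - alternative) / baseline


total = SIZES["hash + chain work total (VLQ)"]
for name, size in SIZES.items():
    print(f"{name}: {size} bytes, {saving(total, size):.1%} smaller than total VLQ")
```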
The downside, however, is that the new P2P message would be tightly coupled to a specific parameter in Bitcoin's consensus protocol, and one that is changed in many alt chains. It would also mean that checkpoints can only be fetched at intervals of 2016, rather than at intervals chosen by the client. Being able to specify the interval is a very nice property for longer chains, where a client may select a really large interval first, then bisect a range further to request a smaller PoW sample (e.g. start by fetching every 10,000th, then every 100th).
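The coarse-then-fine pattern described above could look roughly like the following Python sketch. The function name and the convention that checkpoints fall at start + interval, start + 2 * interval, and so on are assumptions made for illustration, not something taken from the spec.

```python
def checkpoint_heights(start: int, stop: int, interval: int) -> list[int]:
    """Heights a client might request between `start` (exclusive) and
    `stop` (inclusive) at the given interval.

    Hypothetical helper: the actual request format and height convention
    are defined by the spec, not by this sketch.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(start + interval, stop + 1, interval))


tip = 516_387

# First pass: a sparse sample across the whole chain.
coarse = checkpoint_heights(0, tip, 10_000)

# Second pass: refine one coarse range with a much smaller interval.
fine = checkpoint_heights(500_000, 510_000, 100)

print(len(coarse), "coarse checkpoints, last at height", coarse[-1])
print(len(fine), "fine checkpoints between", fine[0], "and", fine[-1])
```

With a fixed interval of 2016, the second pass would not be expressible in the same message.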
Personally, I feel strongly that using total chain work instead of nBits is the right tradeoff and is worth the extra ~1.7KB. I'm curious to hear others' opinions. Note that the checkpoints message is only fetched once per peer per download from genesis. Subsequent catch-ups only fetch checkpoints from the locator fork point. I also don't find the caching argument compelling -- generating checkpts response messages is fast enough anyway.
I also finally got around to pulling numbers on the space savings from the nVersion omission. As a reminder of how this works, three bits in the encoding indicator hold a value from 1 to 7: the distance in block height back to the most recent block with the same version. Looking at the current Bitcoin main chain, here is a table of the occurrences of these values:
Height distance | # of blocks
----------------|------------
1               |      469537
2               |       22301
3               |        8833
4               |        4368
5               |        2633
6               |        1630
7               |        1114
8+              |        5967
You can read this as "469,537 blocks have the same version as their parent", "22,301 have the same version as their parent's parent", etc. Given the information in this table, we may consider only allocating 2 bits in the encoding header rather than 3.
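As a rough check on the 2-bit versus 3-bit question, here is a Python sketch that computes the distance for a block and the share of blocks each field width would cover. The counts are taken from the table above. The helper names, and the use of 0 to mean "no match within range, so the version is sent explicitly", are assumptions for illustration.

```python
# Occurrence counts from the table above (distance -> number of blocks).
COUNTS = {1: 469537, 2: 22301, 3: 8833, 4: 4368, 5: 2633, 6: 1630, 7: 1114}
EIGHT_PLUS = 5967
TOTAL = sum(COUNTS.values()) + EIGHT_PLUS


def version_distance(versions: list[int], i: int, max_distance: int = 7) -> int:
    """Smallest d in 1..max_distance with versions[i - d] == versions[i].

    Returns 0 when no such ancestor exists within range. Treating 0 as
    "version included explicitly" is an assumption of this sketch.
    """
    for d in range(1, max_distance + 1):
        if i - d >= 0 and versions[i - d] == versions[i]:
            return d
    return 0


def coverage(bits: int) -> float:
    """Fraction of blocks whose distance fits in `bits` bits (values 1..2^bits - 1)."""
    max_distance = 2 ** bits - 1
    covered = sum(COUNTS.get(d, 0) for d in range(1, max_distance + 1))
    return covered / TOTAL


# Toy version sequence, not real chain data.
toy_versions = [1, 1, 2, 1, 2, 3]
print([version_distance(toy_versions, i) for i in range(len(toy_versions))])

for bits in (2, 3):
    print(f"{bits} bits cover {coverage(bits):.2%} of blocks")
```

On these counts, dropping from 3 bits to 2 bits loses coverage only for the blocks at distances 4 through 7.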