Thank you for your feedback AJ and Riccardo.

Nice observation about using nBits from every 2016th block as a short specifier of chain work. You can get some savings from the 4 byte nBits encoding over VLQ for total chain work as in my spec.

I tried it out on the current chain. At block height 516,387, there are 258 total checkpoints in the response payload with an interval of 2016. The size of the checkpts message is:

- 9,304 bytes using hash + nBits

- 10,934 bytes using hash + chain work delta encoded as VLQ

- 11,030 bytes using hash + chain work total encoded as VLQ

The saving from using deltas instead of the total seems negligible to me especially considering the additional computation it requires. Going from total chain work as VLQ to nBits is a 16% savings in the size of a checkpts message. According to some rather rough benchmarks, it takes ~3us to generate the message with nBits versus ~105us to generate each message with VLQ chain work (including block index lookups and serialization time).

The downside, however, is that the new P2P message would be tightly coupled to a specific parameter in Bitcoin's consensus protocol, and one that is changed in many alt chains. Also, it would require that checkpoints can only be fetched at intervals of 2016, instead of intervals chosen by the clients. Being able to specify the interval is a very nice property for longer chains, where a client may select really large intervals, then bisect that range even further to request a smaller PoW sample (eg. start by fetching every 10,000th, then every 100th).

Personally, I strongly think using total chain work instead of nBits is the right tradeoff and is worth the extra 1KB. I'm curious to hear others' opinions. Note that the checkpoints message is only fetched once per peer per download from genesis. Subsequent catchups only fetch checkpoints from the locator fork point. I also don't find the caching argument compelling -- the time to generate checkpts response messages is fast enough anyway.

I also finally got around to pulling numbers on the space savings from the nVersion omission. As a reminder of how this works, three bits in the encoding indicator represent a value 1-7 of the distance in block height since another block with the same version. Looking at the current Bitcoin main chain, this is a table of the occurrences of these values:

Height distance	# of Blocks
1	469537
2	22301
3	8833
4	4368
5	2633
6	1630
7	1114
8+	5967

You can read this as "469,537 blocks have the same version as their parent", "22,301 have the same version as their parent's parent", etc. Given the information in this table, we may consider only allocating 2 bits in the encoding header rather than 3.

On Fri, Mar 30, 2018 at 1:06 AM, Riccardo Casatta <riccardo.casatta@gmail.com> wrote:

Yes, I think the checkpoints and the compressed headers streams should be handled in chunks of 2016 headers and queried by chunk number instead of height, falling back to current method if the chunk is not full yet.

This is cache friendly and allows to avoid bit 0 and bit 1 in the bitfield (because they are always 1 after the first header in the chunk of 2016).

2018-03-30 8:14 GMT+02:00 Anthony Towns <aj@erisian.com.au>:
On Thu, Mar 29, 2018 at 05:50:30PM -0700, Jim Posen via bitcoin-dev wrote:
> Taken a step further though, I'm really interested in treating the checkpoints
> as commitments to chain work [...]

In that case, shouldn't the checkpoints just be every 2016 blocks and
include the corresponding bits value for that set of blocks?

That way every node commits to (approximately) how much work their entire
chain has by sending something like 10kB of data (currently), and you
could verify the deltas in each node's chain's target by downloading the
2016 headers between those checkpoints (~80kB with the proposed compact
encoding?) and checking the timestamps and proof of work match both the
old target and the new target from adjacent checkpoints.

(That probably still works fine even if there's a hardfork that allows
difficulty to adjust more frequently: a bits value at block n*2016 will
still enforce *some* lower limit on how much work blocks n*2016+{1..2016}
will have to contribute; so will still allow you to estimate how much work
will have been done, it may just be less precise than the estimate you could
generate now)

Cheers,
aj

--
Riccardo Casatta - @RCasatta