On Sun, Apr 28, 2013 at 6:29 PM, Mike Hearn <mike@plan99.net> wrote:
I'd imagined that nodes would be able to pick their own ranges to keep rather than have fixed chosen intervals. "Everything or two weeks" is rather restrictive - presumably node operators are constrained by physical disk space, which means the quantity of blocks they would want to keep can vary with sizes of blocks, cost of storage, etc.

Sure, that's why eventually several levels may be useful.  

Adding new fields to the addr message and relaying those fields to newer nodes means every node could advertise the height at which it pruned. I know it means a longer time before the data is available everywhere vs service bits, but it seems like most nodes won't be pruning right away anyway. There's plenty of time for upgrades.

That's a more flexible model, indeed. I'm not sure how important speed of propagation will be though - it may be very slow, given that there are 100000s of IPs circulating, and only a few are relayed in one go between nodes. Even then, I'd like to see the "relay/validation" responsibility split off from the "serve historic data" one, and have separate service bits for those.
 
If an old node connected to a new node and getdata-d blocks that had been pruned, immediate disconnection should make the old node go find a different one. It means the combination of old node+not run for a long time might take a while before it can find a node that has what it wants, but that doesn't seem like a big deal.

Disconnecting in case something is requested that isn't served seems like an acceptable behaviour, yes. A specific message indicating data is pruned may be more flexible, but more complex to handle too. 

What is the use case for NODE_VALIDATE? Nodes that throw away blocks almost immediately? Why would a node do that?

NODE_VALIDATE doesn't say anything about which blocks are available, it just means it relays and validates (and thus is not an SPV node). It can be combined with NODE_BLOCKS_2016 if those blocks are also served.

The reason for splitting them is that I think over time these may be handled by different implementations. You could have stupid storage/bandwidth nodes that just keep the blockchain around, and others that validate it. Even if that doesn't happen implementation-wise, I think these are sufficiently independent functions to start thinking about them as such.

-- 
Pieter