Take care, here-- if a scheme is used where e.g. the full solution had
to be exactly identical to a prior weak block then the result would be
making mining not progress free because bigger miners would have
disproportionately more access to the weak/strong one/two punch. I
think what you're thinking here is okay, but it wasn't clear to me if
you'd caught that particular potential issue.
I'm assuming the optimized protocol would be forward-error-coded (e.g. using IBLTs) and NOT require the full solution (or follow-on weak blocks) to be exactly the same.
One possible improvement on this is to cache Merkle nodes/subtrees. When a weak block is created, nodes could cache the hashes for the Merkle nodes along with each node's children. A miner could then describe their block in terms of Merkle nodes (describing groups of 2^n transactions), which would give them the ability to copy e.g. 87.5% or 96.875% or whatever of their new block from someone else's weak block but with a few modifications (e.g. new transactions) in the remaining non-prespecified portion. This gives small miners the ability to trade off versatility (do I specify all of the transactions with my own Merkle structure?) versus propagation speed (do I copy all of my Merkle tree from another miner's weak block?) with all steps in between.
I've got a proposal for a block propagation protocol inspired by bittorrent applied to the Merkle tree instead of chunks of a file. Weak blocks would fit in with this blocktorrent protocol quite nicely by caching and pre-transmitting Merkle nodes. I don't want to hijack this thread, so I'll post it under a separate subject in an hour or so.