I created a draft BIP detailing a way to add auxiliary headers to Bitcoin in a bandwidth-efficient way. The overhead is only around 104 bytes per auxiliary header. This is much smaller than what would be required if the hash of the header were embedded in the coinbase of the block.
The proposal is a soft fork, and it stores the hash of the auxiliary header in the last transaction in the block.
It makes use of the fact that the last transaction in the block has a much simpler Merkle branch than the other transactions.
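
To illustrate why the last transaction's branch can be simpler, here is a minimal Python sketch. It is not taken from the draft BIP; the function names (`dsha256`, `last_leaf_branch`, `root_from_last_leaf`) are hypothetical and the leaves are placeholder hashes rather than real txids. It relies only on the standard Bitcoin Merkle rule that a level with an odd number of nodes pairs its last node with itself. Under that rule the last leaf is always on the right-hand edge of the tree, so its sibling at each level is either a left-hand node or a copy of itself, and the self-paired levels need no sibling hash in the proof.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def last_leaf_branch(leaves):
    """Return (root, branch) for the LAST leaf of a Bitcoin-style Merkle tree.

    Each branch entry is the left-hand sibling hash, or None when the
    level had an odd node count and the last node was paired with itself.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    branch = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd count: the last node is duplicated, so no sibling is needed.
            level.append(level[-1])
            branch.append(None)
        else:
            # Even count: the last node's sibling is the node to its left.
            branch.append(level[-2])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0], branch


def root_from_last_leaf(leaf, branch):
    """Recompute the root from the last leaf and its branch."""
    node = leaf
    for sibling in branch:
        # The last leaf's ancestors are always the right-hand input
        # (or both inputs, when the node was self-paired).
        left = node if sibling is None else sibling
        node = dsha256(left + node)
    return node


# Placeholder leaves standing in for five transaction hashes (assumption).
leaves = [dsha256(bytes([i])) for i in range(5)]
root, branch = last_leaf_branch(leaves)
hashes_needed = sum(1 for s in branch if s is not None)
```

With five leaves the tree has three levels below the root, but the last leaf is self-paired at two of them, so only one sibling hash is needed to verify it. How much is saved depends on the number of transactions in the block, and the exact encoding in the draft BIP may differ from this sketch.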