I can't find a link, but I've discussed this somewhere before, a while ago... perhaps in one of the IRC meetings? I'll see if I can turn something up.
The main reason not to use a merkle tree was validation performance -- we usually compute the flat hash already, so the merkle tree would be extra work just for CTV.
However, from an API perspective, I agree that a merkle tree could be superior for CTV. It does depend on the use case. If you have just, say, 3 outputs, a merkle tree probably just 'gets in the way' compared to concatenation. It only adds value when you have many outputs and you need to do a random-index insertion. In many applications, you might be biased toward editing the last output (e.g., change outputs?), and then SHASTREAM would allow you to edit the tail in O(1).
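To make the trade-off concrete, here is a rough Python sketch. It is an illustration only: it is not the BIP-119 template hash and not the actual SHASTREAM semantics. The outputs are stand-in byte strings, hashlib's copy() stands in for saving a SHA256 midstate, the zero-hash padding for odd levels is my own simplification, and every function name is made up for the example. It treats a "random-index edit" as replacing one output. The flat hash can swap the last output by resuming from a saved state, while the merkle tree can swap any output with about log2(n) hashes, given its branch.

```python
import hashlib

PAD = b"\x00" * 32  # assumed padding node for odd-sized levels (illustrative)

def _h(data):
    return hashlib.sha256(data).digest()

# --- Flat (streamed) hash over the concatenated outputs ---

def flat_hash(outputs):
    h = hashlib.sha256()
    for out in outputs:
        h.update(out)
    return h.digest()

def prefix_state(outputs):
    # Hash state after absorbing every output except the last one.
    h = hashlib.sha256()
    for out in outputs[:-1]:
        h.update(out)
    return h

def replace_tail(state, new_last):
    # Resume from the saved state; cost does not depend on the number of outputs.
    h = state.copy()
    h.update(new_last)
    return h.digest()

# --- Merkle tree over the outputs ---

def _next_level(level):
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [_h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(PAD)
        level = _next_level(level)
    return level[0]

def merkle_branch(leaves, index):
    # Sibling hashes from the leaf at `index` up to the root.
    level = [_h(x) for x in leaves]
    branch = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(PAD)
        branch.append(level[index ^ 1])
        level = _next_level(level)
        index //= 2
    return branch

def root_from_branch(leaf, index, branch):
    # Recompute the root for a replaced leaf using len(branch) hashes.
    node = _h(leaf)
    for sibling in branch:
        node = _h(sibling + node) if index & 1 else _h(node + sibling)
        index //= 2
    return node

outputs = [b"out0", b"out1", b"out2", b"out3", b"out4"]
new_tail_hash = replace_tail(prefix_state(outputs), b"change")
new_root = root_from_branch(b"edited", 2, merkle_branch(outputs, 2))
```

With only a handful of outputs, the branch bookkeeping costs more than simply rehashing the concatenation, which is the 'gets in the way' case above.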
Best,
Jeremy