I think it’s just a tradition to use double SHA256. One reason we might want to keep dSHA256 is a blind signature might be done by giving only the single SHA256 hash to the signer. At the same time, a non-Bitcoin signature scheme might use SHA512-SHA256. So a blind signer could distinguish the message type without learning the message.
I make it a 0x01000000 at the end of the message because the last 4 bytes has been the nHashType in the legacy/BIP143 protocol. Since the maximum legacy nHashType is 0xff, no collision could ever occur.
Putting a 64-byte constant at the beginning should also work, since a collision means SHA256 is no longer preimage resistance. I don’t know much about SHA256 optimisation. How good it is as we put a 64-byte constant at the beginning, while we also make the message 64-byte longer?
For CHECKSIGFROMSTACK (CSFS), I think the question is whether we want to make it as a separate opcode, or combine that with CHECKSIG. If it is a separate opcode, I think it should be a separate BIP. If it is combined with CHECKSIG, we could do something like this: If the bit 10 of SIGHASH2 is set, CHECKSIG will pop one more item from stack, and serialize its content with the transaction digest. Any thought?