Here is a Merkle set data structure, whose format may be useful for utxo commitments in Bitcoin blocks. It may also be useful for any other distributed computation which wants an audit trail:
This is a fairly straightforward Patricia Trie, with a simple format and a simple reference implementation plus a performance optimized non-reference implementation which is much more cache coherent. It will need to be ported to C and be properly turned before the potential performance gains can be realized though.
The clever things which affect the format spec are:
It uses blake2s as the internal hash function. This is the fastest hash function to use on 512 bit inputs because blake2b uses a 1024 bit block size. It might make sense to use a hypothetical variant of blake which is optimized for 64 bits with a 512 bit block size, but that hasn't been specified. Sha256 would take advantage of hardware acceleration, but that isn't available everywhere.
Two bits of security are sacrificed to include metadata inline which halves the CPU cost of hashing.
When one side of a node is empty and the other contains exactly two things the secure hash of the child is adopted verbatim rather than rehashing it. This roughly halves the amount of hashing done, and makes it more resistant to malicious data, and cleans up some implementation details, at the cost of some extra complexity.