As you point out, depending on the mempool, sometimes a miner makes more fee by including A and B, while other times a miner makes more fee by including C (the replacement for A and B) and D (a hypothetical transaction that cannot be fit into a block that contains A and B but can be fit into a block with C.
So what are we to make of this? Is it better to relay C or better to not relay C?
Clearly it is better for the miner if they know about C, because knowing about C costs them nothing, but not knowing about C will sometimes result in them earning less fees.
Clearly it is better for the people who are creating C for those transactions to be mined instead of the more expensive A and B transactions.
Everyone else is better off in that more transactions would get included in blocks.
A concern about burdening full nodes with extra transactions to relay that may not be more profitable to mine than the transactions they replace is still rational -- though intuitively it seems like there would be a limit on how many times an attacker could cheaply reorganize transactions into something with a higher fee rate.
Perhaps there are also concerns with reconstruction of blocks from compact blocks, given that miners would have more decisions to make about which tx to include?