I believe that with correctly configured incentives, you can make it more profitable for a miner to delay some lower-fee transactions, while still including them in the next block, than to include them all at once. This would smooth out the inclusion of transactions across blocks.
This may require changing the difficulty scaling from a simple logarithm to a function that behaves like a logarithm up to some multiple of the standard block size. Beyond that point, difficulty would rise faster, so that the expected hashing work grows more than linearly with block size (the required hash per mined bit increases). The tipping point could perhaps be around 3x the standard block size. Since the standard block size increases under continuous load after retargeting, the block size at which this happens also increases.
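As a rough illustration of the curve described above, here is a minimal sketch. The 1 + ln(r) form for the logarithmic region, the knee at 3x and the quadratic exponent beyond it are all assumptions chosen for illustration, and `difficulty_multiplier` is a hypothetical name, not part of any existing proposal or client.

```python
import math


def difficulty_multiplier(block_size: float, standard_size: float,
                          knee: float = 3.0, exponent: float = 2.0) -> float:
    """Hypothetical difficulty multiplier for a block of `block_size` bytes.

    Assumptions (illustrative only): below the standard size the multiplier
    is 1; up to `knee` times the standard size it grows like 1 + ln(ratio);
    beyond the knee it grows as a power law with `exponent` > 1, scaled so
    the curve is continuous at the knee.
    """
    ratio = block_size / standard_size
    if ratio <= 1.0:
        return 1.0  # standard-size (or smaller) blocks pay base difficulty
    if ratio <= knee:
        return 1.0 + math.log(ratio)  # gentle, logarithmic region
    at_knee = 1.0 + math.log(knee)
    return at_knee * (ratio / knee) ** exponent  # super-linear region


# Work per unit of block size: falls up to the knee, then rises.
for r in (1, 2, 3, 4, 6):
    print(r, round(difficulty_multiplier(r, 1.0) / r, 3))
```

Because the function only depends on the ratio to the standard size, the knee automatically moves up as retargeting raises the standard block size. The per-unit work drops until 3x and then climbs, which is what would make splitting a burst of low-fee transactions over two blocks cheaper than packing them into one oversized block.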
Also, combined with the above, the fee pool does slightly counteract what you say: miners still get a share via the pool when fewer transactions are available the next time they create a block (like insurance).
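To make the insurance effect concrete, here is a toy simulation. The payout rule is an assumption for illustration, not the pool rule under discussion: each block's fees go into the pool first, and that block's miner is then paid a fixed fraction of the pool balance. `pool_payouts` and `payout_fraction` are hypothetical names.

```python
from typing import List


def pool_payouts(block_fees: List[float], payout_fraction: float = 0.25,
                 initial_pool: float = 0.0) -> List[float]:
    """Return the miner payout for each block under a hypothetical fee pool.

    Assumption: a block's fees are added to the pool, then the block's miner
    receives `payout_fraction` of the pool balance; the rest carries over.
    """
    pool = initial_pool
    payouts = []
    for fees in block_fees:
        pool += fees                   # fees from this block enter the pool
        payout = pool * payout_fraction
        pool -= payout                 # remainder carries over to later blocks
        payouts.append(payout)
    return payouts


# Three blocks with fees, then two blocks with none.
print(pool_payouts([10, 10, 10, 0, 0]))
```

Under this rule the last two miners are still paid even though their blocks carry no fees, while the earlier miners receive less than the fees they collected. That difference is the insurance-like smoothing.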
For a user, what's the incentive to pay an individual miner the fee directly?