So I checked filter sizes (as a proportion of block size) for each of the sub-filters. The graph is attached.
My interpretation: the first ~120,000 blocks are so small that Golomb-Rice coding can't compress the filters very well, which is why the filter sizes are so high relative to block size. The exception is the input filter: since the coinbase input is skipped, many of them have 0 elements. After block 120,000 or so, though, the filter compression converges pretty quickly to near the optimal value.

The encouraging thing here is that if you compare the combined size of the separated filters against the size of a single filter containing all of them (currently known as the basic filter), they are pretty much the same. The mean ratio between them after block 150,000 is 99.4%. So basically, not much compression efficiency is lost by separating the basic filter into sub-filters.
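For concreteness, here's a minimal sketch (Python, not the actual measurement script) of the ratio being reported. It assumes we already have per-block byte sizes for each sub-filter and for the combined basic filter; the function and variable names are just for illustration.

    # Rough sketch of the per-block comparison, assuming filter sizes in bytes.

    def size_ratio(sub_filter_sizes, basic_filter_size):
        """Combined size of the separated sub-filters relative to the basic filter."""
        return sum(sub_filter_sizes) / basic_filter_size

    def mean_ratio_after(ratios_by_height, start_height=150_000):
        """Mean of the per-block ratios for all blocks at or above start_height."""
        selected = [r for h, r in ratios_by_height.items() if h >= start_height]
        return sum(selected) / len(selected)

    # Example: sub-filters totalling 994 bytes vs a 1000-byte basic filter
    # gives a ratio of 0.994, i.e. the ~99.4% figure quoted above.
    print(size_ratio([400, 350, 244], 1000))  # 0.994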