Hi all,
I'm still doing a little more investigation before opening a formal
BIP PR, but I'm getting close. Here are some more findings.
After moving the compression from main.cpp to streams.h
(CDataStream), it was a simple matter to add compression to
transactions as well. Results are as follows (a simplified sketch of
how the ratios were measured follows the table):
range = transaction size range
ubytes = average size of uncompressed transactions (bytes)
cbytes = average size of compressed transactions (bytes)
cmp_ratio% = compression ratio (percent size reduction; negative means the compressed form was larger)
datapoints = number of datapoints taken
range      |  ubytes |  cbytes | cmp_ratio% | datapoints
-----------+---------+---------+------------+-----------
0-250b     |     220 |     227 |      -3.16 |      23780
250-500b   |     356 |     354 |       0.68 |      20882
500-600b   |     534 |     505 |       5.29 |       2772
600-700b   |     653 |     608 |       6.95 |       1853
700-800b   |     757 |     649 |      14.22 |        578
800-900b   |     822 |     758 |       7.77 |        661
900b-1KB   |     954 |     862 |       9.69 |        906
1KB-10KB   |    2698 |    2222 |      17.64 |       3370
10KB-100KB |   15463 |   12092 |      21.80 |      15429
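
For anyone who wants to reproduce the measurement, the cmp_ratio%
column is just ZLib's one-shot compress() applied to the serialized
transaction bytes. Here is a simplified stand-alone sketch of that
measurement; the helper name and the fall-back-to-uncompressed
behaviour are illustrative only, not the actual streams.h change:

    // Illustrative only -- measures the cmp_ratio% column above for one
    // serialized transaction using zlib's one-shot API (link with -lz).
    #include <zlib.h>
    #include <cstdint>
    #include <vector>

    static double CompressionRatioPercent(const std::vector<uint8_t>& vUncompressed)
    {
        uLongf nCompressedLen = compressBound(vUncompressed.size());
        std::vector<uint8_t> vCompressed(nCompressedLen);

        // If compression fails the sender would simply fall back to the
        // uncompressed payload (ratio of 0).
        if (compress(vCompressed.data(), &nCompressedLen,
                     vUncompressed.data(), vUncompressed.size()) != Z_OK)
            return 0.0;

        // Positive means the payload shrank; negative (as in the 0-250b row)
        // means the zlib overhead made the small transaction slightly larger.
        return 100.0 * (1.0 - (double)nCompressedLen / (double)vUncompressed.size());
    }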
A couple of obvious observations. Transactions don't compress well
below 500 bytes but do very well beyond 1KB, where there are a great
many of those large spam-type transactions. However, most
transactions happen to be in the < 500 byte range. So the next step
was to apply bundling, or the creation of a "blob", for those smaller
transactions, if and only if there are multiple tx's in the getdata
receive queue for a peer (a rough sketch of this step follows the
examples below). Doing that yields some very good compression
ratios. Some examples are as follows:
The best one I've seen so far was the following, where 175
transactions were bundled into one blob before being compressed.
That yielded a 20% compression ratio, but that doesn't take into
account the savings from the 174 message headers (24 bytes each) and
174 TCP ACKs (52 bytes each) that are no longer needed, which adds
another 76*174 = 13224 bytes, making the overall bandwidth savings
32% in this particular case.
2015-11-18 01:09:09.002061 compressed blob from 79890 to 67426
txcount:175
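
To make the arithmetic explicit, the 32% is just the compression
savings plus the avoided per-message overhead, measured against the
original blob size. A quick sanity check using the numbers from the
log line above:

    // Quick back-of-the-envelope check of the 32% figure above.
    #include <cstdio>

    int main()
    {
        const double nUncompressed = 79890;   // blob size before compression
        const double nCompressed   = 67426;   // blob size after compression
        const int    nTxCount      = 175;     // transactions bundled in the blob
        const double nOverhead     = 24 + 52; // message header + TCP ACK per tx

        double nSaved = (nUncompressed - nCompressed)   // 12464 bytes from compression
                      + nOverhead * (nTxCount - 1);     // 13224 bytes of avoided overhead

        printf("savings: %.1f%%\n", 100.0 * nSaved / nUncompressed);  // ~32.2%
        return 0;
    }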
To be sure, this was an extreme example. Most transaction blobs
were in the 2 to 10 transaction range, such as the following:
2015-11-17 21:08:28.469313 compressed blob from 3199 to 2876
txcount:10
But even here the savings are 10%, far better than the nothing we
would get without bundling. Add to that the 76 bytes * 9 transactions
of saved overhead and we have a total 20% savings in bandwidth for
transactions that otherwise would not be compressible.
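
For those wondering what the bundling step amounts to: conceptually
it is just concatenating the serialized transactions already waiting
in the peer's getdata queue into one buffer and compressing that
buffer once. A rough sketch of the idea (the function name and the
simple length-prefix framing are illustrative only; the real change
goes through CDataStream, and a real message would also carry a flag
telling the receiver whether the payload is compressed):

    // Illustrative bundling sketch: concatenate the serialized transactions
    // waiting in a peer's getdata queue into one "blob" and compress it once.
    #include <zlib.h>
    #include <cstdint>
    #include <vector>

    // Each element of vQueuedTxs is one already-serialized transaction.
    std::vector<uint8_t> MakeCompressedBlob(const std::vector<std::vector<uint8_t>>& vQueuedTxs)
    {
        // Build the blob: a simple 4-byte length prefix per tx so the receiver
        // can split the blob back into individual transactions after inflating.
        std::vector<uint8_t> vBlob;
        for (const auto& tx : vQueuedTxs) {
            uint32_t nSize = static_cast<uint32_t>(tx.size());
            for (int i = 0; i < 4; i++)
                vBlob.push_back(static_cast<uint8_t>((nSize >> (8 * i)) & 0xff));
            vBlob.insert(vBlob.end(), tx.begin(), tx.end());
        }

        // Compress the whole blob in one shot; on failure, or if compression
        // doesn't help, fall back to sending the transactions uncompressed.
        uLongf nCompressedLen = compressBound(vBlob.size());
        std::vector<uint8_t> vCompressed(nCompressedLen);
        if (compress(vCompressed.data(), &nCompressedLen,
                     vBlob.data(), vBlob.size()) != Z_OK || nCompressedLen >= vBlob.size())
            return vBlob;

        vCompressed.resize(nCompressedLen);
        return vCompressed;
    }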
The same bundling was applied to blocks, and very good compression
ratios are seen when syncing the blockchain.

Overall, the bundling or blobbing of tx's and blocks seems to be a
good idea for improving bandwidth use, but there is also a
scalability benefit: when the system is busy, transactions are
bundled more often, compressed, and sent faster, keeping the message
queue and network chatter to a minimum.
I think I have enough information to put together a formal BIP, with
the exception of deciding which compression library to implement.
These tests were done using ZLib, but I'll also be running tests in
the coming days with LZO (Jeff Garzik's suggestion) and perhaps
Snappy. If there are any other libraries that people would like me
to get results for, please let me know and I'll pick maybe the top 2
or 3 and get results back to the group.