public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Matt Corallo <lf-lists@mattcorallo.com>
To: Pieter Wuille <pieter.wuille@gmail.com>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Compact Block Relay BIP
Date: Wed, 18 May 2016 01:49:10 +0000	[thread overview]
Message-ID: <573BCA16.2050704@mattcorallo.com> (raw)
In-Reply-To: <5730C37E.2000004@gmail.com>

Implemented a few of your suggestions.

Also opened a formal pull request for the BIP at
https://github.com/bitcoin/bips/pull/389 and the code at
https://github.com/bitcoin/bitcoin/pull/8068.

On 05/09/16 17:06, Pieter Wuille via bitcoin-dev wrote:
> On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
>> Hi all,
>>
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
> 
> Hi Matt,
> 
> thank you for working on this!
> 
>> ===New data structures===
>> Several new data structures are added to the P2P network to relay
>> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
>> BlockTransactionsRequest, and BlockTransactions. Additionally, we
>> introduce a new variable-length integer encoding for use in these data
>> structures.
>>
>> For the purposes of this section, CompactSize refers to the
>> variable-length integer encoding used across the existing P2P protocol
>> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.
> 
> This is a not, but I think it's a bit strange to have two separate
> variable length integers in the same specification. I understand is one
> is already the default for variable-length integers currently, and there
> are reasons to use the other one for efficiency reasons in some places,
> but perhaps we should aim to get everything using the latter?

Fixed, the whole thing now uses New Varints.

>> ====New VarInt====
>> Variable-length integers: bytes are a MSB base-128 encoding of the number.
>> The high bit in each byte signifies whether another digit follows. To make
>> sure the encoding is one-to-one, one is subtracted from all but the last
>> digit.
> 
> Maybe it's worth mentioning that it is based on ASN.1 BER's compressed
> integer format (see
> https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
> section 8.1.3.5), though with a small modification to make every integer
> have a single unique encoding.
> 
>> ====HeaderAndShortIDs====
>> A HeaderAndShortIDs structure is used to relay a block header, the short
>> transactions IDs used for matching already-available transactions, and a
>> select few transactions which we expect a peer may be missing.
>>
>> |shortids||List of uint64_ts||8*shortids_length bytes||Little
>> Endian||The short transaction IDs calculated from the transactions which
>> were not provided explicitly in prefilledtxn
> 
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
> 
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
> 
>   log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
> 
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
> 
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
> 
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

I switched to 6-byte short txids.

>> ====Short transaction IDs====
>> Short transaction IDs are used to represent a transaction without
>> sending a full 256-bit hash. They are calculated by:
>> # single-SHA256 hashing the block header with the nonce appended (in
>> little-endian)
>> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
>> each corresponding 8-byte chunk of the hash from the previous step
>> # Adding each of the XORed 8-byte chunks together (in little-endian)
>> iteratively to find the short transaction ID
> 
> An alternative would be using SipHash-1-3 (a form of SipHash with
> reduced iteration counts; the default is SipHash-2-4). SipHash was
> designed as a Message Authentication Code, where the security
> requirements are much stronger than in our case (in particular, we don't
> care about observers being able to finding the key, as the key is just
> public knowledge here). One of the designers of SipHash has commented
> that SipHash-1-3 for collision resistance in hash tables may be enough:
> https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946
> 
> Using SipHash-1-3 on modern hardware would take ~32 CPU cycles per txid.

Switched to SipHash2-4.

>> ===Implementation Notes===
> 
> There are a few more heuristics that MAY be used to improve performance:
> 
> * Receivers should treat short txids in blocks that match multiple
> mempool transactions as non-matches, and request the transactions. This
> significantly reduces the failure to reconstruct.

Done.

> * When constructing a compact block to send, the sender can verify it
> against its own mempool to check for collisions, and if so, choose to
> either try another nonce, or increase the short txid length.

Additionally we should compare to the orphan pool (which apparently
helps a lot).


  parent reply	other threads:[~2016-05-18  1:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02 22:13 [bitcoin-dev] Compact Block Relay BIP Matt Corallo
2016-05-03  5:02 ` Gregory Maxwell
2016-05-06  3:09   ` Matt Corallo
2016-05-08  0:40 ` Johnathan Corgan
2016-05-08  3:24   ` Matt Corallo
2016-05-09  9:35     ` Tom Zander
2016-05-09 10:43       ` Gregory Maxwell
2016-05-09 11:32         ` Tom
     [not found]           ` <CAAS2fgR01=SfpAdHhFd_DFa9VNiL=e1g4FiguVRywVVSqFe9rA@mail.gmail.com>
2016-05-09 12:12             ` [bitcoin-dev] Fwd: " Gregory Maxwell
2016-05-09 23:37               ` [bitcoin-dev] " Peter R
2016-05-10  1:42                 ` Peter R
2016-05-10  2:12                 ` Gregory Maxwell
2016-05-09 13:40           ` Peter Todd
2016-05-09 13:57             ` Tom
2016-05-09 14:04               ` Bryan Bishop
2016-05-09 17:06 ` Pieter Wuille
2016-05-09 18:34   ` Peter R
2016-05-10  5:28   ` Rusty Russell
2016-05-10 10:07     ` Gregory Maxwell
2016-05-10 21:23       ` Rusty Russell
2016-05-11  1:12         ` Matt Corallo
2016-05-18  1:49   ` Matt Corallo [this message]
2016-05-08 10:25 Nicolas Dorier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=573BCA16.2050704@mattcorallo.com \
    --to=lf-lists@mattcorallo.com \
    --cc=bitcoin-dev@lists.linuxfoundation.org \
    --cc=pieter.wuille@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox