public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Eric Voskuil <eric@voskuil.org>
To: Tomas <tomas@tomasvdw.nl>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists.linuxfoundation.org>,
	Libbitcoin Development <libbitcoin@lists.dyne.org>
Subject: Re: [bitcoin-dev] Using a storage engine without UTXO-index
Date: Thu, 6 Apr 2017 16:38:23 -0700	[thread overview]
Message-ID: <eebc06a4-5ab8-46a8-2f50-a472cb57a775@voskuil.org> (raw)
In-Reply-To: <1491516747.3791700.936828232.69F82904@webmail.messagingengine.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 04/06/2017 03:12 PM, Tomas via bitcoin-dev wrote:

Hi Tomas,

> I have been working on a bitcoin implementation that uses a
> different approach to indexing for verifying the order of
> transactions. Instead of using an index of unspent outputs, double
> spends are verified by using a spend-tree where spends are scanned
> against spent outputs instead of unspent outputs.

This is the approach that genjix used in libbitcoin version2. With the
exception of de-linking (not deleted) in the case of reorgs, the
entire store is append only, implemented in a small set of memory
mapped files. The downsides to the approach are:

(1) higher than necessary storage space requirement due to storing the
indexing data required for correlate the spends, and

(2) higher than necessary validation complexity and cost in terms of
computing the spent-ness (including spender height) of an output.

His implementation used a hash table, so performance-wise it did quite
well and would theoretically outperform a tree, O(1) vs. O(log2(N)).

> This allows for much better concurrency, as not only blocks, but
> also individual inputs can be verified fully in parallel.

I was successful in parallelizing input validation (across the inputs
of an unconfirmed tx and across the set of all inputs in a block)
using the v2 store. However, it is not the case that the spends
approach is necessary for concurrency.

To resolve the above two problems the version3 store does not use a
spends table/index. Nor does it store any table of UTXOs. Yet
validation is highly parallelized. Instead of additional indexes it
uses the tx hash table, augmented with 32 bits per output for spender
height. So there is a O(1) cost of finding the tx and a O(N) cost of
finding the spender height where N is the number of outputs in the tx.
But because the number of outputs in a tx is bounded (by block size)
this is constant time in the number of transactions.

This works out much faster than the spends table, and without the
storage cost or complexity disadvantages. It also scales with
available hardware, as the memory mapped files become in-memory hash
tables. For low memory machines we found it was important to implement
an opaque UTXO cache to limit paging, but for higher end systems zero
cache is optimal.

> I am sharing this not only to ask for your feedback, but also to
> call for a clear separation of protocol and implementations: As
> this solution, reversing the costs of outputs and inputs, seems to
> have excellent performance characteristics (as shown in the test
> results), updates to the protocol addressing the UTXO growth, might
> not be worth considering *protocol improvements* and it might be
> best to address these concerns as implementation details.

I don't follow this part, maybe you could clarify. A spends index
grows with the size of the spend set (forever) as it cannot be pruned,
which certainly exceeds the size of the UTXO set (unless nothing is
spent). The advantage is that you don't have to keep rewriting the
store when you use a spends set (because the store can be append only).

Feel free to message me if you'd like to discuss in more detail, or to
continue on the libbitcoin mailing list (copied).

e
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY5tFpAAoJEDzYwH8LXOFOcMgH/2mw5iOvUYNwvZ2z0KKTSUOA
Pd8d5mKoWvd94QxhQ+RyTbkEkMhHl75+zcBgRsfUTtZlBIe/Z0+OgVIN6ibEw+WD
w7k3HqgQi9gLgydEelxTAX+z3dJ24n4kCCdKAmZbBuK+Yr/7AViugbEqYemKepku
pRWZZS74MUvrYesc0xPn4Ao3DTzMjjY0K2mkuqV8jlwdfZjlAQX9pTx+iSCuMhkd
HJ8w7s8QnjVnUeOlLe29mZwaFJPyOTLJMqgDE6s2sXacAy5QQbVCatygvDQ8A/wC
ktBnKPFb2lGX3bGKu/KwABegBy/hyec+NP0wFR+0MVivCwTK1+SjeHu5MNOSVlM=
=tfVj
-----END PGP SIGNATURE-----


  reply	other threads:[~2017-04-06 23:37 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06 22:12 [bitcoin-dev] Using a storage engine without UTXO-index Tomas
2017-04-06 23:38 ` Eric Voskuil [this message]
2017-04-07  0:17   ` Tomas
2017-04-08 22:37     ` Eric Voskuil
2017-04-08 23:58       ` Tomas
2017-04-11  1:44         ` Eric Voskuil
2017-04-11  8:43           ` Tomas
2017-04-11  9:41             ` Eric Voskuil
2017-04-11 10:04               ` Tomas
     [not found] ` <CAAS2fgTEMCkDWdhCWt1EsUrnt3+Z_8m+Y1PTsff5Rc0CBnCKWQ@mail.gmail.com>
2017-04-07  0:48   ` Tomas
2017-04-07  1:09     ` Gregory Maxwell
2017-04-07  1:29       ` Tomas
2017-04-07 18:52         ` Tom Harding
2017-04-07 19:42           ` Gregory Maxwell
2017-04-08 18:27             ` Tom Harding
2017-04-08 19:23               ` Tomas
2017-04-07  7:55 ` Marcos mayorga
2017-04-07  8:47   ` Tomas
2017-04-07 14:14     ` Greg Sanders
2017-04-07 16:02       ` Tomas
2017-04-07 18:18 ` Gregory Maxwell
2017-04-07 18:39   ` Bram Cohen
2017-04-07 19:55     ` Eric Voskuil
2017-04-07 21:44       ` Tomas
2017-04-07 23:51         ` Eric Voskuil
2017-04-07 21:14     ` Tomas
2017-04-08  0:44       ` Gregory Maxwell
2017-04-08  7:28         ` Tomas
2017-04-08 19:23           ` Johnson Lau
2017-04-08 19:56             ` Tomas
2017-04-08 20:21               ` Johnson Lau
2017-04-08 20:42                 ` Tomas
2017-04-08 22:12                 ` Gregory Maxwell
2017-04-08 22:34                   ` Tomas
2017-04-08 21:22     ` Troy Benjegerdes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eebc06a4-5ab8-46a8-2f50-a472cb57a775@voskuil.org \
    --to=eric@voskuil.org \
    --cc=bitcoin-dev@lists.linuxfoundation.org \
    --cc=libbitcoin@lists.dyne.org \
    --cc=tomas@tomasvdw.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox