public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [bitcoin-dev] A Better MMR Definition
@ 2017-02-23  1:15 Peter Todd
  2017-02-23  3:07 ` Bram Cohen
  2017-02-23 17:53 ` Chris Priest
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Todd @ 2017-02-23  1:15 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 1966 bytes --]

Reposting something that came up recently in a private discussion with some
academics:

Concretely, let's define a prunable MMR with the following grammar. This
definition is an improvement on whats in the python-proofmarshal by committing
to the number of items in the tree implicitly; an obvious max-log2(n)-sized
proof-of-tree-size can be obtained by following the right-most nodes:

    Maybe(T) := UNPRUNED <T> | PRUNED <Commitment(T)>

    FullNode(0) := <Value>
    FullNode(n) := <Maybe(FullNode(n-1)> <Maybe(FullNode(n-1))>

    PartialNode(0) := SOME <FullNode(0)> | NONE
    PartialNode(n) := <Maybe(FullNode(n-1))> <Maybe(PartialNode(n-1))>

    MMR := FULL <N> <FullNode(n)> | PARTIAL <N> <PartialNode(n)>

Basically we define it in four parts. First we define Maybe(T) to represent
pruned and unpruned (hash only) data. Secondly we define full nodes within 2^n
sized trees. Third we define partial nodes. And finally we define the MMR
itself as being either a full or partial node.

First of all, with pruning we can define a rule that if any operation (other
than checking commitment hashes) attempts to access pruned data, it should
immediately fail. In particular, no operation should be able to determine if
data is or isn't pruned. Equally, note how an implementation can keep track of
what data was accessed during any given operation, and prune the rest, which
means a proof is just the parts of the data structure accessed during one or
more operations.

With that, notice how proving the soundness of the proofs becomes trivial: if
validation is deterministic, it is obviously impossible to construct two
different proofs that prove contradictory statements, because a proof is simply
part of the data structure itself. Contradiction would imply that the two
proofs are different, but that's easily rejected by simply checking the hash of
the data.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23  1:15 [bitcoin-dev] A Better MMR Definition Peter Todd
@ 2017-02-23  3:07 ` Bram Cohen
  2017-02-23  7:41   ` Peter Todd
  2017-02-23 17:53 ` Chris Priest
  1 sibling, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-23  3:07 UTC (permalink / raw)
  To: Peter Todd, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]

On Wed, Feb 22, 2017 at 5:15 PM, Peter Todd via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

>
> With that, notice how proving the soundness of the proofs becomes trivial:
> if
> validation is deterministic, it is obviously impossible to construct two
> different proofs that prove contradictory statements, because a proof is
> simply
> part of the data structure itself. Contradiction would imply that the two
> proofs are different, but that's easily rejected by simply checking the
> hash of
> the data.
>

My code works this way. Proofs are serialization of a subset of the tree,
and to validate a proof you ask a single function whether a particular
value is included in that tree subset, and it answers yes or no, so
obviously it's impossible for a single value to both validate and not
validate. The proof code was quite terrifying before I made this change
(which I did on your suggestion), and it's much cleaner and simpler now. It
also in principle supports compact proofs of multiple inclusions and
exclusions in the same serialization of a subset of the tree because the
upper branches won't have to be repeated. I haven't written code for
generating those, but the validation code will happily accept them.

I'm not sure what you mean by MMRs though. Are you talking about MMRs where
each mountain is a set of diffs to the old things and are periodically
consolidated? Or do later mountains refer to internals of earlier ones? Or
do they have 'maybe' values which mean that the earlier mountain should be
referred to? Are these patricia tries or something flatter and more fixed
depth?

My code doesn't keep track of tree size, by the way. It would be trivial to
add that functionality to the library, and including it in the hashing
creates complexity and doesn't seem to have any benefit over sending that
data in a side channel.

[-- Attachment #2: Type: text/html, Size: 2291 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23  3:07 ` Bram Cohen
@ 2017-02-23  7:41   ` Peter Todd
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Todd @ 2017-02-23  7:41 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3547 bytes --]

On Wed, Feb 22, 2017 at 07:07:08PM -0800, Bram Cohen wrote:
> On Wed, Feb 22, 2017 at 5:15 PM, Peter Todd via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> >
> > With that, notice how proving the soundness of the proofs becomes trivial:
> > if
> > validation is deterministic, it is obviously impossible to construct two
> > different proofs that prove contradictory statements, because a proof is
> > simply
> > part of the data structure itself. Contradiction would imply that the two
> > proofs are different, but that's easily rejected by simply checking the
> > hash of
> > the data.
> >
> 
> My code works this way. Proofs are serialization of a subset of the tree,
> and to validate a proof you ask a single function whether a particular
> value is included in that tree subset, and it answers yes or no, so
> obviously it's impossible for a single value to both validate and not
> validate. The proof code was quite terrifying before I made this change
> (which I did on your suggestion), and it's much cleaner and simpler now. It
> also in principle supports compact proofs of multiple inclusions and
> exclusions in the same serialization of a subset of the tree because the
> upper branches won't have to be repeated. I haven't written code for
> generating those, but the validation code will happily accept them.

That's an improvement, but I think we can do even better if we think of missing
pruned data as analogous to virtual memory: pruned data is the same as a page
that has been swapped to disk, with the magical property that hashing allows us
to swap it back in from an untrusted source.

Thus a proof should actually be whatever data we expect our counterparty to
have flushed, ranging from none at all, to 100% (modulo a root hash). An
implementation should then do operations as normal, using parts of the proof on
an as-needed basis where pruned data is encountered.

Thus if you have a key-value map and do a get() operation, you'd expect the
proof to *not* be what the get operates on, but rather be a *context* argument
to the get() operation. The other way around is actually an example of doing
computations on untrusted data, and bad API design!

> I'm not sure what you mean by MMRs though. Are you talking about MMRs where
> each mountain is a set of diffs to the old things and are periodically
> consolidated? Or do later mountains refer to internals of earlier ones? Or
> do they have 'maybe' values which mean that the earlier mountain should be
> referred to? Are these patricia tries or something flatter and more fixed
> depth?

I'm talking about these MMR's: https://github.com/proofchains/python-proofmarshal/blob/master/proofmarshal/mmr.py

Notably I'm talking about an insertion ordered list, indexed by position, that
supports append and update operations, but *not* insertions; this is different
than what you've recently published re: UTXO commitments. That's a full
key-value map, something MMR's are delibrately are not doing.

Draw out a MMR based on the formal definition you're replying too and you'll
see the new structure.

> My code doesn't keep track of tree size, by the way. It would be trivial to
> add that functionality to the library, and including it in the hashing
> creates complexity and doesn't seem to have any benefit over sending that
> data in a side channel.

Like I say above, you're solving a different problem than MMR's solve.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23  1:15 [bitcoin-dev] A Better MMR Definition Peter Todd
  2017-02-23  3:07 ` Bram Cohen
@ 2017-02-23 17:53 ` Chris Priest
  2017-02-23 18:19   ` Peter Todd
  2017-02-23 23:13   ` Bram Cohen
  1 sibling, 2 replies; 29+ messages in thread
From: Chris Priest @ 2017-02-23 17:53 UTC (permalink / raw)
  To: Peter Todd, Bitcoin Protocol Discussion

On 2/22/17, Peter Todd via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> Reposting something that came up recently in a private discussion with some
> academics:
>
> Concretely, let's define a prunable MMR with the following grammar. This
> definition is an improvement on whats in the python-proofmarshal by
> committing
> to the number of items in the tree implicitly; an obvious max-log2(n)-sized
> proof-of-tree-size can be obtained by following the right-most nodes:
>
>     Maybe(T) := UNPRUNED <T> | PRUNED <Commitment(T)>
>
>     FullNode(0) := <Value>
>     FullNode(n) := <Maybe(FullNode(n-1)> <Maybe(FullNode(n-1))>
>
>     PartialNode(0) := SOME <FullNode(0)> | NONE
>     PartialNode(n) := <Maybe(FullNode(n-1))> <Maybe(PartialNode(n-1))>
>
>     MMR := FULL <N> <FullNode(n)> | PARTIAL <N> <PartialNode(n)>
>
> Basically we define it in four parts. First we define Maybe(T) to represent
> pruned and unpruned (hash only) data. Secondly we define full nodes within
> 2^n
> sized trees. Third we define partial nodes. And finally we define the MMR
> itself as being either a full or partial node.
>
> First of all, with pruning we can define a rule that if any operation
> (other
> than checking commitment hashes) attempts to access pruned data, it should
> immediately fail. In particular, no operation should be able to determine
> if
> data is or isn't pruned. Equally, note how an implementation can keep track
> of
> what data was accessed during any given operation, and prune the rest,
> which
> means a proof is just the parts of the data structure accessed during one
> or
> more operations.
>
> With that, notice how proving the soundness of the proofs becomes trivial:
> if
> validation is deterministic, it is obviously impossible to construct two
> different proofs that prove contradictory statements, because a proof is
> simply
> part of the data structure itself. Contradiction would imply that the two
> proofs are different, but that's easily rejected by simply checking the hash
> of
> the data.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
>


What problem does this try to solve, and what does it have to do with bitcoin?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 17:53 ` Chris Priest
@ 2017-02-23 18:19   ` Peter Todd
  2017-02-23 18:28     ` G. Andrew Stone
  2017-02-23 23:13   ` Bram Cohen
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-23 18:19 UTC (permalink / raw)
  To: Chris Priest; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On Thu, Feb 23, 2017 at 09:53:58AM -0800, Chris Priest wrote:
> On 2/22/17, Peter Todd via bitcoin-dev
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > Reposting something that came up recently in a private discussion with some
> > academics:
> >
> > Concretely, let's define a prunable MMR with the following grammar. This
> > definition is an improvement on whats in the python-proofmarshal by
> > committing
> > to the number of items in the tree implicitly; an obvious max-log2(n)-sized
> > proof-of-tree-size can be obtained by following the right-most nodes:
> 
> What problem does this try to solve, and what does it have to do with bitcoin?

See the discussion on TXO commitments for how MMR's could be used; a better MMR
makes for a better TXO commitment.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 18:19   ` Peter Todd
@ 2017-02-23 18:28     ` G. Andrew Stone
  2017-02-23 18:31       ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: G. Andrew Stone @ 2017-02-23 18:28 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion, Peter Todd

[-- Attachment #1: Type: text/plain, Size: 1226 bytes --]

Can an insertion ordered MMR allow an efficient nonexistence proof?
On Feb 23, 2017 1:20 PM, "Peter Todd via bitcoin-dev" <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Thu, Feb 23, 2017 at 09:53:58AM -0800, Chris Priest wrote:
> > On 2/22/17, Peter Todd via bitcoin-dev
> > <bitcoin-dev@lists.linuxfoundation.org> wrote:
> > > Reposting something that came up recently in a private discussion with
> some
> > > academics:
> > >
> > > Concretely, let's define a prunable MMR with the following grammar.
> This
> > > definition is an improvement on whats in the python-proofmarshal by
> > > committing
> > > to the number of items in the tree implicitly; an obvious
> max-log2(n)-sized
> > > proof-of-tree-size can be obtained by following the right-most nodes:
> >
> > What problem does this try to solve, and what does it have to do with
> bitcoin?
>
> See the discussion on TXO commitments for how MMR's could be used; a
> better MMR
> makes for a better TXO commitment.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>

[-- Attachment #2: Type: text/html, Size: 2022 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 18:28     ` G. Andrew Stone
@ 2017-02-23 18:31       ` Peter Todd
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Todd @ 2017-02-23 18:31 UTC (permalink / raw)
  To: G. Andrew Stone; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 357 bytes --]

On Thu, Feb 23, 2017 at 01:28:18PM -0500, G. Andrew Stone wrote:
> Can an insertion ordered MMR allow an efficient nonexistence proof?

Why do you want a non-existance proof?

It supports an efficient *spentness* proof, which is sufficient for what we
need in Bitcoin, and much more scalable.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 17:53 ` Chris Priest
  2017-02-23 18:19   ` Peter Todd
@ 2017-02-23 23:13   ` Bram Cohen
  2017-02-23 23:51     ` Peter Todd
  1 sibling, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-23 23:13 UTC (permalink / raw)
  To: Chris Priest, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1196 bytes --]

On Thu, Feb 23, 2017 at 9:53 AM, Chris Priest via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

>
> What problem does this try to solve, and what does it have to do with
> bitcoin?
>

I can't speak to MMRs (they look a bit redundant with the actual blockchain
history to my eye) but circling back to utxo commitments, the benefits are
that it enables actual proofs of non-fraud: You can prove the validity of a
block based on just the previous block (and maybe some previous headers
because of mining rewards) and can prove to a light node that a utxo hasn't
been spent yet.

A major factor in the way of getting utxo commitments in blocks is
performance. The txo set is of course vastly larger and more unwieldy. If
you make the utxo commitments trail by a small fixed number of blocks
(between 2 and 5) their latency problems shouldn't be a big deal as long as
the overall performance is good enough. My thesis is that with appropriate
format and implementation tricks it's possible to get performance good
enough to no longer be a gating factor to deployment.

Disappointingly there hasn't been any feedback about my implementation,
just discussion about merkle sets generally.

[-- Attachment #2: Type: text/html, Size: 1633 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 23:13   ` Bram Cohen
@ 2017-02-23 23:51     ` Peter Todd
  2017-02-24  0:49       ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-23 23:51 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2947 bytes --]

On Thu, Feb 23, 2017 at 03:13:43PM -0800, Bram Cohen wrote:
> On Thu, Feb 23, 2017 at 9:53 AM, Chris Priest via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> >
> > What problem does this try to solve, and what does it have to do with
> > bitcoin?
> >
> 
> I can't speak to MMRs (they look a bit redundant with the actual blockchain
> history to my eye) but circling back to utxo commitments, the benefits are

In what way do you see MMRs as redundant?

Remember that with UTXO commitments because access patterns are uniform, you'll
over time have a lot more "redundancy" in the form of lost-coins evenly spread
out across the whole keyspace.

> that it enables actual proofs of non-fraud: You can prove the validity of a
> block based on just the previous block (and maybe some previous headers
> because of mining rewards) and can prove to a light node that a utxo hasn't
> been spent yet.
>
> A major factor in the way of getting utxo commitments in blocks is
> performance. The txo set is of course vastly larger and more unwieldy. If

That statement is incorrect with pruning: you can maintain a commitment to the
TXO set, without actually storing the entire TXO set, because you don't need to
store anything for nodes that have already been spent.

Concretely, this can be done with nothing more than adding a FullySpent node
type to the MMR definition I published earlier, with the rule being that only a
left or right child of an inner node be a FullySpent node, not both; if both
sides are spent, the inner node itself becomes FullySpent. Equally, I think you
can re-use the Empty node for this, but I need to think a little about the
implications re: partial inner nodes.

Regardless, with a generalized commitment scheme, the serialization/commitment
to an Empty node is simply '0', the encoding of an unspent txout surrounded by
spent txouts will be similar in size to a position integer followed by the
txout...


A subtlety of this construction is that you can only prove that a specific
txout # is unspent, but that's actually sufficient, as you can also prove what
# a txout txid corresponds too with a previous version of the MMR.

> you make the utxo commitments trail by a small fixed number of blocks
> (between 2 and 5) their latency problems shouldn't be a big deal as long as
> the overall performance is good enough. My thesis is that with appropriate
> format and implementation tricks it's possible to get performance good
> enough to no longer be a gating factor to deployment.
> 
> Disappointingly there hasn't been any feedback about my implementation,
> just discussion about merkle sets generally.

Well, I think at this point there's still discussion over whether or not a UTXO
set commitment is the right approach to begin with; if it's not your
implementation isn't relevant.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-23 23:51     ` Peter Todd
@ 2017-02-24  0:49       ` Bram Cohen
  2017-02-24  1:09         ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-24  0:49 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On Thu, Feb 23, 2017 at 3:51 PM, Peter Todd <pete@petertodd.org> wrote:

> On Thu, Feb 23, 2017 at 03:13:43PM -0800, Bram Cohen wrote:
> >
> > I can't speak to MMRs (they look a bit redundant with the actual
> blockchain
> > history to my eye) but circling back to utxo commitments, the benefits
> are
>
> In what way do you see MMRs as redundant?
>

You can readily prove something is in the TXO or STXO set using the actual
blockchain, and the proofs will be nice and compact because even light
nodes are expected to already have all the historical headers.

What you can't do with MMRs or the blockchain is make a compact proof that
something is still in the utxo set, which is the whole point of utxo
commitments.

It's totally reasonable for full nodes to independently update and
recalculate the utxo set as part of their validation process. The same
can't be done for a balanced version of the txo set because it's too big.
Relying on proofs as a crutch for using the full txo set would badly
exacerbate the already extant problem of miners doing spv mining, and
increase the bandwidth a full validating node had to use by a multiple.

This whole conversation is badly sidetracked. If people have comments on my
merkle set I'd like to engage further with them, but mmrs need to be argued
independently on their own merits before being used as a counterpoint to
utxo commitments.

[-- Attachment #2: Type: text/html, Size: 1848 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  0:49       ` Bram Cohen
@ 2017-02-24  1:09         ` Peter Todd
  2017-02-24  2:50           ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-24  1:09 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]

On Thu, Feb 23, 2017 at 04:49:01PM -0800, Bram Cohen wrote:
> On Thu, Feb 23, 2017 at 3:51 PM, Peter Todd <pete@petertodd.org> wrote:
> 
> > On Thu, Feb 23, 2017 at 03:13:43PM -0800, Bram Cohen wrote:
> > >
> > > I can't speak to MMRs (they look a bit redundant with the actual
> > blockchain
> > > history to my eye) but circling back to utxo commitments, the benefits
> > are
> >
> > In what way do you see MMRs as redundant?
> >
> 
> You can readily prove something is in the TXO or STXO set using the actual
> blockchain, and the proofs will be nice and compact because even light
> nodes are expected to already have all the historical headers.
> 
> What you can't do with MMRs or the blockchain is make a compact proof that
> something is still in the utxo set, which is the whole point of utxo
> commitments.

I think you've misunderstood what TXO commitments are. From my article:

"A merkle tree committing to the state of all transaction outputs, both spent
and unspent, can provide a method of compactly proving the current state of an
output."
-https://petertodd.org/2016/delayed-txo-commitments#txo-commitments:

I'm proposing that we commit to not just the set of transaction outputs, but
also the current *state* of those outputs, with the same commitment structure.

Concretely, each leaf node in the TXO commitment tree needs to commit to - at
minimum - the outpoint (txid:n) and spent/unspent status (possibly structurally
as mentioned elsewhere in this thread). It's probably also valuable to commit
to the scriptPubKey, nValue, as well, though technically that's redundant as
the txid already commits to that (there's some implementation options here).

> It's totally reasonable for full nodes to independently update and
> recalculate the utxo set as part of their validation process. The same
> can't be done for a balanced version of the txo set because it's too big.

Why would you commit to a balanced version of the TXO set? I'm proposing
committing to an insertion-ordered list, indexed by txout #.

> Relying on proofs as a crutch for using the full txo set would badly
> exacerbate the already extant problem of miners doing spv mining, and
> increase the bandwidth a full validating node had to use by a multiple.
> 
> This whole conversation is badly sidetracked. If people have comments on my
> merkle set I'd like to engage further with them, but mmrs need to be argued
> independently on their own merits before being used as a counterpoint to
> utxo commitments.

Hmm? That's exactly what I'm doing. Also, as per the above, I think you've
misunderstood what my TXO commitment proposal is.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  1:09         ` Peter Todd
@ 2017-02-24  2:50           ` Bram Cohen
  2017-02-24  2:58             ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-24  2:50 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 810 bytes --]

On Thu, Feb 23, 2017 at 5:09 PM, Peter Todd <pete@petertodd.org> wrote:

> I think you've misunderstood what TXO commitments are. From my article:
>
> "A merkle tree committing to the state of all transaction outputs, both
> spent
> and unspent, can provide a method of compactly proving the current state
> of an
> output."
> -https://petertodd.org/2016/delayed-txo-commitments#txo-commitments:
>

The proposal on that page is of a tree which does require random access
updates, it just positions entries in the order they happened to be added
instead of sorting by their hash. Once you start updating it to indicate
spent status all the exact same issues of TXO size and cache coherence on
updates show up again, but now you're using a more complex bespoke data
structure instead of a basic fundamental one.

[-- Attachment #2: Type: text/html, Size: 1276 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  2:50           ` Bram Cohen
@ 2017-02-24  2:58             ` Peter Todd
  2017-02-24  3:02               ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-24  2:58 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1437 bytes --]

On Thu, Feb 23, 2017 at 06:50:10PM -0800, Bram Cohen wrote:
> On Thu, Feb 23, 2017 at 5:09 PM, Peter Todd <pete@petertodd.org> wrote:
> 
> > I think you've misunderstood what TXO commitments are. From my article:
> >
> > "A merkle tree committing to the state of all transaction outputs, both
> > spent
> > and unspent, can provide a method of compactly proving the current state
> > of an
> > output."
> > -https://petertodd.org/2016/delayed-txo-commitments#txo-commitments:
> >
> 
> The proposal on that page is of a tree which does require random access
> updates, it just positions entries in the order they happened to be added
> instead of sorting by their hash. Once you start updating it to indicate
> spent status all the exact same issues of TXO size and cache coherence on
> updates show up again, but now you're using a more complex bespoke data
> structure instead of a basic fundamental one.

Sorry, but I was replying to your statement:

> What you can't do with MMRs or the blockchain is make a compact proof that
> something is still in the utxo set, which is the whole point of utxo
> commitments.

So to be clear, do you agree or disagree with me that you *can* extract a
compact proof from a MMR that a given output is unspent?

I just want to make sure we're on the same page here before we discuss
performance characteristics.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  2:58             ` Peter Todd
@ 2017-02-24  3:02               ` Bram Cohen
  2017-02-24  3:15                 ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-24  3:02 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 519 bytes --]

On Thu, Feb 23, 2017 at 6:58 PM, Peter Todd <pete@petertodd.org> wrote:

>
> So to be clear, do you agree or disagree with me that you *can* extract a
> compact proof from a MMR that a given output is unspent?
>

After wading through your logic on how updates are done, I agree that that
can be done, but apples to apples compact proofs can also be done in a utxo
commitment, and proofs of the validity of updates can be done in a utxo
commitment, so there isn't any performance advantage to all that extra
complexity.

[-- Attachment #2: Type: text/html, Size: 886 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  3:02               ` Bram Cohen
@ 2017-02-24  3:15                 ` Peter Todd
  2017-02-24  3:32                   ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-24  3:15 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

On Thu, Feb 23, 2017 at 07:02:36PM -0800, Bram Cohen wrote:
> On Thu, Feb 23, 2017 at 6:58 PM, Peter Todd <pete@petertodd.org> wrote:
> 
> >
> > So to be clear, do you agree or disagree with me that you *can* extract a
> > compact proof from a MMR that a given output is unspent?
> >
> 
> After wading through your logic on how updates are done, I agree that that
> can be done, but apples to apples compact proofs can also be done in a utxo
> commitment, and proofs of the validity of updates can be done in a utxo
> commitment, so there isn't any performance advantage to all that extra
> complexity.

Glad we're on the same page with regard to what's possible in TXO commitments.

Secondly, am I correct in saying your UTXO commitments scheme requires random
access? While you describe it as a "merkle set", obviously to be merkelized
it'll have to have an ordering of some kind. What do you propose that ordering
to be?

Maybe more specifically, what exact values do you propose to be in the set?

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  3:15                 ` Peter Todd
@ 2017-02-24  3:32                   ` Bram Cohen
  2017-02-24  4:36                     ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-24  3:32 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 958 bytes --]

On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete@petertodd.org> wrote:

>
> Glad we're on the same page with regard to what's possible in TXO
> commitments.
>
> Secondly, am I correct in saying your UTXO commitments scheme requires
> random
> access? While you describe it as a "merkle set", obviously to be merkelized
> it'll have to have an ordering of some kind. What do you propose that
> ordering
> to be?
>

The ordering is by the bits in the hash. Technically it's a Patricia Trie.
I'm using 'merkle tree' to refer to basically anything with a hash root.


> Maybe more specifically, what exact values do you propose to be in the set?
>
>
That is unspecified in the implementation, it just takes a 256 bit value
which is presumably a hash of something. The intention is to nail down a
simple format and demonstrate good performance and leave those semantics to
a higher layer. The simplest thing would be to hash together the txid and
output number.

[-- Attachment #2: Type: text/html, Size: 1570 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  3:32                   ` Bram Cohen
@ 2017-02-24  4:36                     ` Peter Todd
  2017-02-24 22:20                       ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-24  4:36 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5151 bytes --]

On Thu, Feb 23, 2017 at 07:32:43PM -0800, Bram Cohen wrote:
> On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete@petertodd.org> wrote:
> 
> >
> > Glad we're on the same page with regard to what's possible in TXO
> > commitments.
> >
> > Secondly, am I correct in saying your UTXO commitments scheme requires
> > random
> > access? While you describe it as a "merkle set", obviously to be merkelized
> > it'll have to have an ordering of some kind. What do you propose that
> > ordering
> > to be?
> >
> 
> The ordering is by the bits in the hash. Technically it's a Patricia Trie.
> I'm using 'merkle tree' to refer to basically anything with a hash root.

The hash of what? The values in the set?

> > Maybe more specifically, what exact values do you propose to be in the set?
> >
> >
> That is unspecified in the implementation, it just takes a 256 bit value
> which is presumably a hash of something. The intention is to nail down a
> simple format and demonstrate good performance and leave those semantics to
> a higher layer. The simplest thing would be to hash together the txid and
> output number.

Ok, so let's assume the values in the set are the unspent outpoints.

Since we're ordering by the hash of the values in the set, outpoints will be
distributed uniformly in the set, and thus the access pattern of data in the
set is uniform.

Now let's fast-forward 10 years. For the sake of argument, assume that for
every 1 UTXO in the set that corresponds to funds in someone's wallet that are
likely to be spent, there are 2^12 = 4096 UTXO's that have been permanently
lost (and/or created in spam attacks) and thus will never be spent.

Since lost UTXO's are *also* uniformly distributed, if I'm processing a new
block that spends 2^12 = 4096 UTXO's, on average for each UTXO spent, I'll
have to update log2(4096) = 12 more digests than I would have had those "dead"
UTXO's not existed.

Concretely, imagine our UTXO set had just 8 values in it, and we were updating
two of them:

               #
              / \
             /   \
            /     \
           /       \
          /         \
         #           #
        / \         / \
       /   \       /   \
      #     .     .     #
     / \   / \   / \   / \
    .   X .   . .   . X   .

To mark two coins as spent, we've had to update 5 inner nodes.


Now let's look at what happens in an insertion-ordered TXO commitment scheme.
For sake of argument, let's assume the best possible case, where every UTXO
spent in that same block was recently created. Since the UTXO's are recently
created, chances are almost every single one of those "dead" UTXO's will have
been created in the past. Thus, since this is an insertion-ordered data
structure, those UTXO's exist in an older part of the data structure that our
new block doesn't need to modify at all.

Concretely, again let's imagine a TXO commitment with 8 values in it, and two
of them being spent:

               #
              / \
             /   \
            /     \
           /       \
          /         \
         .           #
        / \         / \
       /   \       /   \
      .     .     .     #
     / \   / \   / \   / \
    .   . .   . .   . X   X

To mark two coins as spent, we've only had to update 3 inner nodes; while our
tree is higher with those lost coins, those extra inner nodes are amortised
across all the coins we have to update.


The situation gets even better when we look at the *new* UTXO's that our block
creates. Suppose our UTXO set has size n. To mark a single coin as spent, we
have to update log2(n) inner nodes. We do get to amortise this a bit at the top
levels in the tree, but even if we assume the amortisation is totally free,
we're updating at least log2(n) - log2(m) inner nodes "under" the amortised
nodes at the top of the tree for *each* new node.

Meanwhile with an insertion-ordered TXO commitment, each new UTXO added to the
data set goes in the same place - the end. So almost none of the existing data
needs to be touched to add the new UTXOs. Equally, the hashing required for the
new UTXO's can be done in an incremental fashion that's very L1/L2 cache
friendly.


tl;dr: Precisely because access patterns in TXO commitments are *not* uniform,
I think we'll find that from a L1/L2/etc cache perspective alone, TXO
commitments will result in better performance than UTXO commitments.


Now it is true that Bitcoin's current design means we'll need a map of
confirmed outpoints to TXO insertion order indexes. But it's not particularly
hard to add that "metadata" to transactions on the P2P layer in the same way
that segwit added witnesses to transactions without modifying how txids were
calculated; if you only connect to peers who provide you with TXO index
information in blocks and transactions, you don't need to keep that map
yourself.

Finally, note how this makes transactions *smaller* in many circumstances: it's
just a 8-byte max index rather than a 40 byte outpoint.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24  4:36                     ` Peter Todd
@ 2017-02-24 22:20                       ` Bram Cohen
  2017-02-25  4:12                         ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-24 22:20 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 7156 bytes --]

So your idea is to cluster entries by entry time because newer things are
more likely to leave and updating multiple things near each other is
cheaper?

That can be done with my tool. Instead of using hashes for the values being
stored, you use position entries. The first entry gets a value of all
zeros, the next one a one followed by all zeros, then the next two
correspond to the first two with the second bit flipped to one, then the
next four the first four with the third bit flipped to one, etc. It
probably performs a little bit better to do it two bits at a time instead
of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
0110, 0111, 1001, etc. If you were to really use this you'd probably want
to to add some optimizations to use the fact that the terminals fit in 64
bits instead of 256, but it mostly works unchanged, and gets whatever
benefits there are to this clustering plus the high performance
implementation tricks I've built which I keep complaining that nobody's
giving feedback on.

I'm not sold on this being a win: The empirical access patterns are
unknown, it requires an extra cache miss per lookup to find the entry
number, it may be that everything is optimized well enough without it for
there to be no meaningful gains, and it's a bunch of extra complexity. What
should be done is that a plain vanilla UTXO set solution is optimized as
well as it can be first, and then the insertion ordering trick is tried as
an optimization to see if it's an improvement. Without that baseline
there's no meaningful basis for comparison, and I'm quite confident that a
naive implementation which just allocates individual nodes will
underperform the thing I've come up with, even without adding optimizations
related to fitting in 64 bits.

On Thu, Feb 23, 2017 at 8:36 PM, Peter Todd <pete@petertodd.org> wrote:

> On Thu, Feb 23, 2017 at 07:32:43PM -0800, Bram Cohen wrote:
> > On Thu, Feb 23, 2017 at 7:15 PM, Peter Todd <pete@petertodd.org> wrote:
> >
> > >
> > > Glad we're on the same page with regard to what's possible in TXO
> > > commitments.
> > >
> > > Secondly, am I correct in saying your UTXO commitments scheme requires
> > > random
> > > access? While you describe it as a "merkle set", obviously to be
> merkelized
> > > it'll have to have an ordering of some kind. What do you propose that
> > > ordering
> > > to be?
> > >
> >
> > The ordering is by the bits in the hash. Technically it's a Patricia
> Trie.
> > I'm using 'merkle tree' to refer to basically anything with a hash root.
>
> The hash of what? The values in the set?
>
> > > Maybe more specifically, what exact values do you propose to be in the
> set?
> > >
> > >
> > That is unspecified in the implementation, it just takes a 256 bit value
> > which is presumably a hash of something. The intention is to nail down a
> > simple format and demonstrate good performance and leave those semantics
> to
> > a higher layer. The simplest thing would be to hash together the txid and
> > output number.
>
> Ok, so let's assume the values in the set are the unspent outpoints.
>
> Since we're ordering by the hash of the values in the set, outpoints will
> be
> distributed uniformly in the set, and thus the access pattern of data in
> the
> set is uniform.
>
> Now let's fast-forward 10 years. For the sake of argument, assume that for
> every 1 UTXO in the set that corresponds to funds in someone's wallet that
> are
> likely to be spent, there are 2^12 = 4096 UTXO's that have been permanently
> lost (and/or created in spam attacks) and thus will never be spent.
>
> Since lost UTXO's are *also* uniformly distributed, if I'm processing a new
> block that spends 2^12 = 4096 UTXO's, on average for each UTXO spent, I'll
> have to update log2(4096) = 12 more digests than I would have had those
> "dead"
> UTXO's not existed.
>
> Concretely, imagine our UTXO set had just 8 values in it, and we were
> updating
> two of them:
>
>                #
>               / \
>              /   \
>             /     \
>            /       \
>           /         \
>          #           #
>         / \         / \
>        /   \       /   \
>       #     .     .     #
>      / \   / \   / \   / \
>     .   X .   . .   . X   .
>
> To mark two coins as spent, we've had to update 5 inner nodes.
>
>
> Now let's look at what happens in an insertion-ordered TXO commitment
> scheme.
> For sake of argument, let's assume the best possible case, where every UTXO
> spent in that same block was recently created. Since the UTXO's are
> recently
> created, chances are almost every single one of those "dead" UTXO's will
> have
> been created in the past. Thus, since this is an insertion-ordered data
> structure, those UTXO's exist in an older part of the data structure that
> our
> new block doesn't need to modify at all.
>
> Concretely, again let's imagine a TXO commitment with 8 values in it, and
> two
> of them being spent:
>
>                #
>               / \
>              /   \
>             /     \
>            /       \
>           /         \
>          .           #
>         / \         / \
>        /   \       /   \
>       .     .     .     #
>      / \   / \   / \   / \
>     .   . .   . .   . X   X
>
> To mark two coins as spent, we've only had to update 3 inner nodes; while
> our
> tree is higher with those lost coins, those extra inner nodes are amortised
> across all the coins we have to update.
>
>
> The situation gets even better when we look at the *new* UTXO's that our
> block
> creates. Suppose our UTXO set has size n. To mark a single coin as spent,
> we
> have to update log2(n) inner nodes. We do get to amortise this a bit at
> the top
> levels in the tree, but even if we assume the amortisation is totally free,
> we're updating at least log2(n) - log2(m) inner nodes "under" the amortised
> nodes at the top of the tree for *each* new node.
>
> Meanwhile with an insertion-ordered TXO commitment, each new UTXO added to
> the
> data set goes in the same place - the end. So almost none of the existing
> data
> needs to be touched to add the new UTXOs. Equally, the hashing required
> for the
> new UTXO's can be done in an incremental fashion that's very L1/L2 cache
> friendly.
>
>
> tl;dr: Precisely because access patterns in TXO commitments are *not*
> uniform,
> I think we'll find that from a L1/L2/etc cache perspective alone, TXO
> commitments will result in better performance than UTXO commitments.
>
>
> Now it is true that Bitcoin's current design means we'll need a map of
> confirmed outpoints to TXO insertion order indexes. But it's not
> particularly
> hard to add that "metadata" to transactions on the P2P layer in the same
> way
> that segwit added witnesses to transactions without modifying how txids
> were
> calculated; if you only connect to peers who provide you with TXO index
> information in blocks and transactions, you don't need to keep that map
> yourself.
>
> Finally, note how this makes transactions *smaller* in many circumstances:
> it's
> just a 8-byte max index rather than a 40 byte outpoint.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
>

[-- Attachment #2: Type: text/html, Size: 8700 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-24 22:20                       ` Bram Cohen
@ 2017-02-25  4:12                         ` Peter Todd
  2017-02-25  6:23                           ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-02-25  4:12 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5105 bytes --]

On Fri, Feb 24, 2017 at 02:20:19PM -0800, Bram Cohen wrote:
> So your idea is to cluster entries by entry time because newer things are
> more likely to leave and updating multiple things near each other is
> cheaper?

Yes, exactly.

> That can be done with my tool. Instead of using hashes for the values being
> stored, you use position entries. The first entry gets a value of all
> zeros, the next one a one followed by all zeros, then the next two
> correspond to the first two with the second bit flipped to one, then the
> next four the first four with the third bit flipped to one, etc. It
> probably performs a little bit better to do it two bits at a time instead
> of one so that the entries are 00, 01, 10, 11, 0001, 0010, 0011, 0101,
> 0110, 0111, 1001, etc. If you were to really use this you'd probably want
> to to add some optimizations to use the fact that the terminals fit in 64
> bits instead of 256, but it mostly works unchanged, and gets whatever

So to be clear, what you're proposing there is to use the insertion order as
the index - once you go that far you've almost entirely re-invented my
proposal!

In fact, when I was working my proofchains/proofmarshal libraries I put some
thought into whether or not I could leave out the MMR merkelized list
implementation and use only the key-value map I also wrote. I decided to
include both as they aren't quite the same datastructure - using a list for a
list has advantages.

> benefits there are to this clustering plus the high performance
> implementation tricks I've built which I keep complaining that nobody's
> giving feedback on.

Your merkle-set implementation is 1500 lines of densely written Python with
almost no comments, and less than a 100 lines of (also uncommented) tests. By
comparison, my Python MMR implementation is 300 lines of very readable Python
with lots of comments, a 200 line explanation at the top, and 200 lines of
(commented) tests. Yet no-one is taking the (still considerable) effort to
understand and comment on my implementation. :)

Fact is, what you've written is really daunting to review, and given it's not
in the final language anyway, it's unclear what basis to review it on anyway. I
suspect you'd get more feedback if the codebase was better commented, in a
production language, and you have actual real-world benchmarks and performance
figures.

In particular, while at the top of merkle_set.py you have a list of advantages,
and a bunch of TODO's, you don't explain *why* the code has any of these
advantages. To figure that out, I'd have to read and understand 1500 lines of
densely written Python. Without a human-readable pitch, not many people are
going to do that, myself included.

> I'm not sold on this being a win: The empirical access patterns are
> unknown,

Lost coins alone guarantees that access patterns will be biased towards new
coins being more likely to be spent. That basis alone is sufficient to justify
an insertion-ordered data structure. Additionally, people have done graphs of
the average age of UTXO's when spent, and that data clearly shows that newer
coins are more likely to be spent than older coins.

> unknown, it requires an extra cache miss per lookup to find the entry
> number,

Like I mentioned in the email you're replying to, that extra lookup can be
easily avoided with a change to how transactions/blocks are serialized; if all
your peers support TXO commitments you can even discard the lookup database
entirely, as it's only a backwards compatibility measure.

> it may be that everything is optimized well enough without it for
> there to be no meaningful gains, and it's a bunch of extra complexity. What

Optimization is itself extra complexity. If you're data structure has worse
inherent performance, and you have to make up the different with a highly
optimized implementation, that's likely to lead to more overall complexity than
using a data structure with better inherent performance.

Your current merkle-set implementation definitely _is_ very complex. An
apples-to-apples comparison is with my merkelized key:value tree(1), also a
patricia tree, which like the MMR is only about 300 lines of well-commented and
straight-forward code.

1) https://github.com/proofchains/python-proofmarshal/blob/master/proofmarshal/merbinnertree.py

> should be done is that a plain vanilla UTXO set solution is optimized as
> well as it can be first, and then the insertion ordering trick is tried as
> an optimization to see if it's an improvement. Without that baseline
> there's no meaningful basis for comparison, and I'm quite confident that a
> naive implementation which just allocates individual nodes will
> underperform the thing I've come up with, even without adding optimizations
> related to fitting in 64 bits.

To be clear, "insertion ordering" isn't a simple trick, it's a fundamental
change to what the data structure is. Once you do that, you're talking about my
proposal.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-25  4:12                         ` Peter Todd
@ 2017-02-25  6:23                           ` Bram Cohen
  2017-02-28 16:43                             ` G. Andrew Stone
  2017-03-01 22:31                             ` Peter Todd
  0 siblings, 2 replies; 29+ messages in thread
From: Bram Cohen @ 2017-02-25  6:23 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3731 bytes --]

On Fri, Feb 24, 2017 at 8:12 PM, Peter Todd <pete@petertodd.org> wrote:

>
> So to be clear, what you're proposing there is to use the insertion order
> as
> the index - once you go that far you've almost entirely re-invented my
> proposal!
>

I'm not 'proposing' this, I'm saying it could be done simply but I'm
skeptical of the utility. Probably the most compelling argument for it is
that the insertion indexed values are much smaller so they can be compacted
down a lot resulting in using less memory and more locality and fewer
hashes, but your implementation doesn't take advantage of that.


> Your merkle-set implementation is 1500 lines of densely written Python


The reference implementation which is included in those 1500 lines is less
than 300 lines and fairly straightforward. The non-reference implementation
always behaves semantically identically to the reference implementation, it
just does so faster and using less memory.


> with
> almost no comments,


The comments at the top explain both the proof format and the in-memory
data structures very precisely. The whole codebase was reviewed by a
coworker of mine and comments were added explaining the subtleties which
tripped him up.


> and less than a 100 lines of (also uncommented) tests.


Those tests get 98% code coverage and extensively hit not only the lines of
code but the semantic edge cases as well. The lines which aren't hit are
convenience functions and error conditions of the parsing code for when
it's passed bad data.


> By
> comparison, my Python MMR implementation is 300 lines of very readable
> Python
> with lots of comments, a 200 line explanation at the top, and 200 lines of
> (commented) tests. Yet no-one is taking the (still considerable) effort to
> understand and comment on my implementation. :)
>

Given that maaku's Merkle prefix trees were shelved because of performance
problems despite being written in C and operating in basically the same way
as your code and my reference code, it's clear that non-optimized Python
won't be touching the bitcoin codebase any time soon.


>
> Fact is, what you've written is really daunting to review, and given it's
> not
> in the final language anyway, it's unclear what basis to review it on
> anyway.


It should reviewed based on semantic correctness and performance.
Performance can only be accurately and convincingly determined by porting
to C and optimizing it, which mostly involves experimenting with different
values for the two passed in magic numbers.


> I
> suspect you'd get more feedback if the codebase was better commented, in a
> production language, and you have actual real-world benchmarks and
> performance
> figures.
>

Porting to C should be straightforward. Several people have already
expressed interest in doing so, and it's written in intentionally C-ish
Python, resulting in some rather odd idioms which is a bit part of why you
think it looks 'dense'. A lot of that weird offset math should be much more
readable in C because it's all structs and x.y notation can be used instead
of adding offsets.


> In particular, while at the top of merkle_set.py you have a list of
> advantages,
> and a bunch of TODO's, you don't explain *why* the code has any of these
> advantages. To figure that out, I'd have to read and understand 1500 lines
> of
> densely written Python. Without a human-readable pitch, not many people are
> going to do that, myself included.
>

It's all about cache coherence. When doing operations it pulls in a bunch
of things which are near each other in memory instead of jumping all over
the place. The improvements it gets should be much greater than the ones
gained from insertion ordering, although the two could be accretive.

[-- Attachment #2: Type: text/html, Size: 5256 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-25  6:23                           ` Bram Cohen
@ 2017-02-28 16:43                             ` G. Andrew Stone
  2017-02-28 23:10                               ` Bram Cohen
  2017-03-01 22:31                             ` Peter Todd
  1 sibling, 1 reply; 29+ messages in thread
From: G. Andrew Stone @ 2017-02-28 16:43 UTC (permalink / raw)
  To: Bram Cohen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 5391 bytes --]

I can understand how Bram's transaction double sha256 hashed UTXO set
patricia trie allows a client to quickly validate inputs because the inputs
of a transaction are specified in the same manner.  So to verify that an
input is unspent the client simply traverses the patricia trie.

It also makes sense that if transaction inputs were specified by a [block
height, tx index, output index] triple we'd have a much more size-efficient
transaction format.  This format would make look up pretty simple in
Peter's pruned time-ordered TXO merkle mountain range, although you'd have
translate the triple to an index, which means we'd have to at a minimum
keep track of the number of TXOs in each block, and then probably do a
linear search starting from the location where the block's TXOs begin in
the MMR.  (The ultimate option I guess is to specify transaction inputs by
a single number which is essentially the index of the TXO in a (never
actually created) insertion-ordered TXO array...)

But since transactions' prevouts are not specified by [block height, tx
index, output index] or by TXO index, I don't understand how an insertion
ordered TXO tree can result in efficient lookups.  Can you help me
understand this?



On Sat, Feb 25, 2017 at 1:23 AM, Bram Cohen via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Fri, Feb 24, 2017 at 8:12 PM, Peter Todd <pete@petertodd.org> wrote:
>
>>
>> So to be clear, what you're proposing there is to use the insertion order
>> as
>> the index - once you go that far you've almost entirely re-invented my
>> proposal!
>>
>
> I'm not 'proposing' this, I'm saying it could be done simply but I'm
> skeptical of the utility. Probably the most compelling argument for it is
> that the insertion indexed values are much smaller so they can be compacted
> down a lot resulting in using less memory and more locality and fewer
> hashes, but your implementation doesn't take advantage of that.
>
>
>> Your merkle-set implementation is 1500 lines of densely written Python
>
>
> The reference implementation which is included in those 1500 lines is less
> than 300 lines and fairly straightforward. The non-reference implementation
> always behaves semantically identically to the reference implementation, it
> just does so faster and using less memory.
>
>
>> with
>> almost no comments,
>
>
> The comments at the top explain both the proof format and the in-memory
> data structures very precisely. The whole codebase was reviewed by a
> coworker of mine and comments were added explaining the subtleties which
> tripped him up.
>
>
>> and less than a 100 lines of (also uncommented) tests.
>
>
> Those tests get 98% code coverage and extensively hit not only the lines
> of code but the semantic edge cases as well. The lines which aren't hit are
> convenience functions and error conditions of the parsing code for when
> it's passed bad data.
>
>
>> By
>> comparison, my Python MMR implementation is 300 lines of very readable
>> Python
>> with lots of comments, a 200 line explanation at the top, and 200 lines of
>> (commented) tests. Yet no-one is taking the (still considerable) effort to
>> understand and comment on my implementation. :)
>>
>
> Given that maaku's Merkle prefix trees were shelved because of performance
> problems despite being written in C and operating in basically the same way
> as your code and my reference code, it's clear that non-optimized Python
> won't be touching the bitcoin codebase any time soon.
>
>
>>
>> Fact is, what you've written is really daunting to review, and given it's
>> not
>> in the final language anyway, it's unclear what basis to review it on
>> anyway.
>
>
> It should reviewed based on semantic correctness and performance.
> Performance can only be accurately and convincingly determined by porting
> to C and optimizing it, which mostly involves experimenting with different
> values for the two passed in magic numbers.
>
>
>> I
>> suspect you'd get more feedback if the codebase was better commented, in a
>> production language, and you have actual real-world benchmarks and
>> performance
>> figures.
>>
>
> Porting to C should be straightforward. Several people have already
> expressed interest in doing so, and it's written in intentionally C-ish
> Python, resulting in some rather odd idioms which is a bit part of why you
> think it looks 'dense'. A lot of that weird offset math should be much more
> readable in C because it's all structs and x.y notation can be used instead
> of adding offsets.
>
>
>> In particular, while at the top of merkle_set.py you have a list of
>> advantages,
>> and a bunch of TODO's, you don't explain *why* the code has any of these
>> advantages. To figure that out, I'd have to read and understand 1500
>> lines of
>> densely written Python. Without a human-readable pitch, not many people
>> are
>> going to do that, myself included.
>>
>
> It's all about cache coherence. When doing operations it pulls in a bunch
> of things which are near each other in memory instead of jumping all over
> the place. The improvements it gets should be much greater than the ones
> gained from insertion ordering, although the two could be accretive.
>
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>

[-- Attachment #2: Type: text/html, Size: 7570 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-28 16:43                             ` G. Andrew Stone
@ 2017-02-28 23:10                               ` Bram Cohen
  2017-02-28 23:24                                 ` Pieter Wuille
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-02-28 23:10 UTC (permalink / raw)
  To: G. Andrew Stone; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

On Tue, Feb 28, 2017 at 8:43 AM, G. Andrew Stone <g.andrew.stone@gmail.com>
wrote:

>
> But since transactions' prevouts are not specified by [block height, tx
> index, output index] or by TXO index, I don't understand how an insertion
> ordered TXO tree can result in efficient lookups.  Can you help me
> understand this?
>

You have to have a lookup table going from prevouts to txo index. Lookups
on that are relatively fast because looking up things in a hashtable is a
single cache miss, while looking up things in a tree is logarithmic cache
misses.

The purported benefit of using txout is that because recent things are
spent much more than old things, there's a lot of clustering of updates. If
you update two things near each other they share the top branches of
updates in the tree, resulting in less hashing and cache misses. But since
everything is log scale I suspect such benefits are small. My guess is
transaction ordering has much larger potential from compression because you
cram information about lots of things into a single leaf node because they
have very small diffs from each other. That said, those benefits are also
smaller than and accretive to the simple implementation tricks I already
implemented which cause things near each other in the tree to be near each
other in memory.

[-- Attachment #2: Type: text/html, Size: 1730 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-28 23:10                               ` Bram Cohen
@ 2017-02-28 23:24                                 ` Pieter Wuille
  2017-03-01  1:47                                   ` Bram Cohen
  0 siblings, 1 reply; 29+ messages in thread
From: Pieter Wuille @ 2017-02-28 23:24 UTC (permalink / raw)
  To: Bitcoin Dev, Bram Cohen

[-- Attachment #1: Type: text/plain, Size: 1290 bytes --]

On Feb 28, 2017 15:10, "Bram Cohen via bitcoin-dev" <
bitcoin-dev@lists.linuxfoundation.org> wrote:

On Tue, Feb 28, 2017 at 8:43 AM, G. Andrew Stone <g.andrew.stone@gmail.com>
wrote:

>
> But since transactions' prevouts are not specified by [block height, tx
> index, output index] or by TXO index, I don't understand how an insertion
> ordered TXO tree can result in efficient lookups.  Can you help me
> understand this?
>

You have to have a lookup table going from prevouts to txo index. Lookups
on that are relatively fast because looking up things in a hashtable is a
single cache miss, while looking up things in a tree is logarithmic cache
misses.


I'm wondering if there is some confusion here.

Yes, someone needs to have a lookup table from prevouts to TXO tree
positions. But because an insertion-ordered TXO tree does not rebalance,
that table can be maintained by wallets or service providers for just their
own coins, instead of by every full node and miner individually for
everyone's coins.

In the simplest committed TXO model, full nodes simply maintain the TXO
root hash, and every transaction/block comes with a proof that its inputs
are in the TXO tree, and the necessary information to update the root after
spending the inputs and adding the outputs.

-- 
Pieter

[-- Attachment #2: Type: text/html, Size: 2276 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-28 23:24                                 ` Pieter Wuille
@ 2017-03-01  1:47                                   ` Bram Cohen
  2017-03-01  1:56                                     ` Peter Todd
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-03-01  1:47 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

On Tue, Feb 28, 2017 at 3:24 PM, Pieter Wuille <pieter.wuille@gmail.com>
wrote:

>
> Yes, someone needs to have a lookup table from prevouts to TXO tree
> positions. But because an insertion-ordered TXO tree does not rebalance,
> that table can be maintained by wallets or service providers for just their
> own coins, instead of by every full node and miner individually for
> everyone's coins.
>

That falls apart if you want to support proofs of non-spend, which is the
point of the whole exercise

[-- Attachment #2: Type: text/html, Size: 951 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-03-01  1:47                                   ` Bram Cohen
@ 2017-03-01  1:56                                     ` Peter Todd
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Todd @ 2017-03-01  1:56 UTC (permalink / raw)
  To: Bram Cohen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On Tue, Feb 28, 2017 at 05:47:30PM -0800, Bram Cohen via bitcoin-dev wrote:
> On Tue, Feb 28, 2017 at 3:24 PM, Pieter Wuille <pieter.wuille@gmail.com>
> wrote:
> 
> >
> > Yes, someone needs to have a lookup table from prevouts to TXO tree
> > positions. But because an insertion-ordered TXO tree does not rebalance,
> > that table can be maintained by wallets or service providers for just their
> > own coins, instead of by every full node and miner individually for
> > everyone's coins.
> >
> 
> That falls apart if you want to support proofs of non-spend, which is the
> point of the whole exercise

Can you explain in more detail what you mean there?

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-02-25  6:23                           ` Bram Cohen
  2017-02-28 16:43                             ` G. Andrew Stone
@ 2017-03-01 22:31                             ` Peter Todd
  2017-03-31 20:38                               ` Bram Cohen
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Todd @ 2017-03-01 22:31 UTC (permalink / raw)
  To: Bram Cohen; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4056 bytes --]

On Fri, Feb 24, 2017 at 10:23:20PM -0800, Bram Cohen wrote:
> > Your merkle-set implementation is 1500 lines of densely written Python
> 
> 
> The reference implementation which is included in those 1500 lines is less
> than 300 lines and fairly straightforward. The non-reference implementation
> always behaves semantically identically to the reference implementation, it
> just does so faster and using less memory.

Great!

But do you see my point here? Even though I spent some time reading through
that code, I didn't realise you had a 300 line reference implementation
embedded within those 1500 lines. This makes it less likely for you to get any
review on either.

A better way to present your work would have been to at least explain that at
the top of the file, and perhaps even better, split the reference
implementation and optimized implementation into two separate files. If you did
this, you'd be more likely to get others to review your work.

> > with
> > almost no comments,
> 
> 
> The comments at the top explain both the proof format and the in-memory
> data structures very precisely. The whole codebase was reviewed by a
> coworker of mine and comments were added explaining the subtleties which
> tripped him up.

Yes, and it's good that you have those comments. But the codebase itself could
definitely use more, and adding those comments would help get more people
reviewing your work.

> > and less than a 100 lines of (also uncommented) tests.
> 
> 
> Those tests get 98% code coverage and extensively hit not only the lines of
> code but the semantic edge cases as well. The lines which aren't hit are
> convenience functions and error conditions of the parsing code for when
> it's passed bad data

Great! But you see how without comments, it'll take a tremendous amount of work
for an external reviewer like myself to determine what is being tested, and
what edge cases you're targeting.

In fact, I'd suggest that for things like edge cases, you test edge cases in
separate unit tests that explain what edge cases you're trying to catch.

> > By
> > comparison, my Python MMR implementation is 300 lines of very readable
> > Python
> > with lots of comments, a 200 line explanation at the top, and 200 lines of
> > (commented) tests. Yet no-one is taking the (still considerable) effort to
> > understand and comment on my implementation. :)
> >
> 
> Given that maaku's Merkle prefix trees were shelved because of performance
> problems despite being written in C and operating in basically the same way
> as your code and my reference code, it's clear that non-optimized Python
> won't be touching the bitcoin codebase any time soon.

To be clear, I gave my implementation as an example of how hard it is to get
external review, not to suggest it's going to be a part of Bitcoin; I've
pointed a lot of people to it when they asked for a MMR implementation, and I'm
sure if some of those people had reviewed it carefully they would have
suggested changes. Yet they haven't, because doing good review is a lot of
work!

> > In particular, while at the top of merkle_set.py you have a list of
> > advantages,
> > and a bunch of TODO's, you don't explain *why* the code has any of these
> > advantages. To figure that out, I'd have to read and understand 1500 lines
> > of
> > densely written Python. Without a human-readable pitch, not many people are
> > going to do that, myself included.
> >
> 
> It's all about cache coherence. When doing operations it pulls in a bunch
> of things which are near each other in memory instead of jumping all over
> the place. The improvements it gets should be much greater than the ones
> gained from insertion ordering, although the two could be accretive.

That's good, but that paragraph should be part of your MerkleSet git repo,
preferably in the README, where reviewers will immediately find it and get
excited about reviewing your code.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-03-01 22:31                             ` Peter Todd
@ 2017-03-31 20:38                               ` Bram Cohen
  2017-04-01 10:18                                 ` praxeology_guy
  0 siblings, 1 reply; 29+ messages in thread
From: Bram Cohen @ 2017-03-31 20:38 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]

On Wed, Mar 1, 2017 at 2:31 PM, Peter Todd <pete@petertodd.org> wrote:

>
> A better way to present your work would have been to at least explain that
> at
> the top of the file, and perhaps even better, split the reference
> implementation and optimized implementation into two separate files. If
> you did
> this, you'd be more likely to get others to review your work.
>

I've now added explanation to the README, reorganized the files, and added
some comments:

https://github.com/bramcohen/MerkleSet

In fact, I'd suggest that for things like edge cases, you test edge cases in
> separate unit tests that explain what edge cases you're trying to catch.
>

The tests work by doing a lot of exercising on pseudorandom data, an
approach which does a good job of hitting all the lines of code and edge
cases and requiring very little updating as the implementation changes, at
the expense of it taking a while for tests to run. The advantage of very
custom unit tests is that they run almost instantly, at the cost of
requiring painstaking maintenance and missing more stuff. I've come to
favor this approach in my old age.

The proportion of code devoted to tests is more than it looks like at first
blush, because all the audit methods are just for testing.

[-- Attachment #2: Type: text/html, Size: 1913 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-03-31 20:38                               ` Bram Cohen
@ 2017-04-01 10:18                                 ` praxeology_guy
  2017-04-01 19:46                                   ` praxeology_guy
  0 siblings, 1 reply; 29+ messages in thread
From: praxeology_guy @ 2017-04-01 10:18 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2906 bytes --]

Peter Todd,

This MMR structure looks good to me. I really like how wallets keep their MMR proof and txo index instead of requiring the entire network to maintain an index on txids w/ plain old utxo snapshots.

Re: "only left or right child of inner node be a fully spent node"... that sounds fine to me, but the software should virtually consider that the previous dissapearing leaf nodes still exist. It would instead say be a special case handled by the meta hashing function. Would save a good amount of time from unneccesary hashing. Might also do the rule: if a parent node has a single fully spent child node, its hash is equal to its other child's hash.

Below is questions about txo/utxo MMR commitments after reading: "https://petertodd.org/2016/delayed-txo-commitments".

I'm mainly concerned about the performance of recalculating all of the node hashes on old spends. But probably with a long enough delay policy, it shouldn't be an issue.

Then the issues with people keeping their MMR proofs up to date and discovering received txos before they get pruned. Sure would be nice if a wallet didn't have to keep on updating their MMR proof. Hopefully spends would refer to old txos by their MMR index.

How are you ordering MMR additions? Are you only adding old utxos to the MMR? Or every old txo? I think you are doing all old txos (mostly would be spent nodes), but why not just old utxos? Are you doing it to make MMR index = blockchain txo index, so such an index can be used in all TX inputs except for non-confirmed transactions? Potentially a tx could use a MMR.ix, allblock'stxo.ix (if we want to maintian that index), or tx.id & vout.ix depending on how old the tx is.

What is the process for removing old utxos from the utxo set, and placing them into the MMR? Are you going to keep height in the utxo db, and then iterate through the whole thing?

Are you still proposing a 1 year txo commitment delay? Do you have any data/preformance studies judging the cost of a larger utxo and longer delay vs smaller utxo and shorder delay? I would figure a longer delay would be better as long as the utxo doesn't get too big. Longer delay means less MMR maintainance.

Have you considered not making a new commitment on every block, instead maybe every 6*24 blocks or so? This could reduce the number of times the same nodes are re-hashed, while not having much impact on security.

What about re-orgs? You'd have to restore the previous leaf & inner nodes. You'd need both the txos and MMR proofs, right? It looks like you clear this info from the TXO journal when the delay duration threshold is met. I guess this info might also be stored with the block data, and could be recovered from there.

What are your current thoughts on block size weighting for MMR proofs? Just count the extra byte length that full nodes need to relay as simliar to SegWit witness data?

Cheers,
Praxeology Guy

[-- Attachment #2: Type: text/html, Size: 3547 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [bitcoin-dev] A Better MMR Definition
  2017-04-01 10:18                                 ` praxeology_guy
@ 2017-04-01 19:46                                   ` praxeology_guy
  0 siblings, 0 replies; 29+ messages in thread
From: praxeology_guy @ 2017-04-01 19:46 UTC (permalink / raw)
  Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2043 bytes --]

gmaxwell told me that most nodes would keep a full copy of the top of the MMR tree.

Here I am exploring how this could be policy-ized to solve two problems:
- MMR proofs change over time
- How to coordinate nodes to get them to keep different portions of the MMR, so that everyone can prune most of the structure, but the entire network still retains multiple copies of the full MMR.

Define deltaLeafHeight as the number of tree layers between a node and the leaves in the MMR data structure. We make it a policy that nodes are expected to have all nodes above deltaLeafHeight = DLH_REQUIRED, but that nodes are free to prune any nodes with a deltaLeafHeight < DLH_REQUIRED. Of course a node could prune at DLH_REQUIRED or higher, but what I am proposing is that messages and proofs by default would only include nodes at deltaLeafHeight < DLH_REQUIRED.

Given the above, If a wallet didn't want to be continuously concerned about updating their MMR proof for its coins, then for each coin:
- store the set of utxo digests that are children of the "root nearest" node that is at deltaLeafHeight = DLH_REQUIRED. Call such a set of utxo digests the "pruned relatives".
- Pruned relative count = 2^DLH_REQUIRED -1
- Guessing the spentness status of the pruned relatives would worst case take 2^(pruned relative count) guesses.
- in the case where the MMR holds all txos (not just utxos at addition time)... the wallet should also keep record of which of the pruned relatives were utxos.
- Any future information discovered about whether a pruned relative is spent would reduce the worst case guess count by a factor of 2.

As an example, in the case where DLH_REQUIRED = 3:
- pruned relative count = 7
- worst case spentness guess count = 128

Wallets storing the digests of pruned relatives could also help the entire network be able to discover otherwise lost portions of the MMR. If wallets stored not just the pruned relatives digests, but also their corresponding utxos, they could help other nodes find lost coins.

Cheers,
Praxeology Guy

[-- Attachment #2: Type: text/html, Size: 2399 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-04-01 19:46 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-23  1:15 [bitcoin-dev] A Better MMR Definition Peter Todd
2017-02-23  3:07 ` Bram Cohen
2017-02-23  7:41   ` Peter Todd
2017-02-23 17:53 ` Chris Priest
2017-02-23 18:19   ` Peter Todd
2017-02-23 18:28     ` G. Andrew Stone
2017-02-23 18:31       ` Peter Todd
2017-02-23 23:13   ` Bram Cohen
2017-02-23 23:51     ` Peter Todd
2017-02-24  0:49       ` Bram Cohen
2017-02-24  1:09         ` Peter Todd
2017-02-24  2:50           ` Bram Cohen
2017-02-24  2:58             ` Peter Todd
2017-02-24  3:02               ` Bram Cohen
2017-02-24  3:15                 ` Peter Todd
2017-02-24  3:32                   ` Bram Cohen
2017-02-24  4:36                     ` Peter Todd
2017-02-24 22:20                       ` Bram Cohen
2017-02-25  4:12                         ` Peter Todd
2017-02-25  6:23                           ` Bram Cohen
2017-02-28 16:43                             ` G. Andrew Stone
2017-02-28 23:10                               ` Bram Cohen
2017-02-28 23:24                                 ` Pieter Wuille
2017-03-01  1:47                                   ` Bram Cohen
2017-03-01  1:56                                     ` Peter Todd
2017-03-01 22:31                             ` Peter Todd
2017-03-31 20:38                               ` Bram Cohen
2017-04-01 10:18                                 ` praxeology_guy
2017-04-01 19:46                                   ` praxeology_guy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox