Re: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX, Turing Completeness, and other considerations

From: Jeremy Rubin <jeremy.l.rubin@gmail.com>
To: Anthony Towns <aj@erisian.com.au>
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists.linuxfoundation.org>
Subject: Re: [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX, Turing Completeness, and other considerations
Date: Sat, 5 Mar 2022 12:20:02 +0000
Message-ID: <CAD5xwhjG7kN=LatZRpQxqmaqtoRP31BcyeN2zHtOUsGt=6oJ3w@mail.gmail.com>
In-Reply-To: <20220305055924.GB5308@erisian.com.au>

On Sat, Mar 5, 2022 at 5:59 AM Anthony Towns <aj@erisian.com.au> wrote:

> On Fri, Mar 04, 2022 at 11:21:41PM +0000, Jeremy Rubin via bitcoin-dev
> wrote:
> > I've seen some discussion of what the Annex can be used for in Bitcoin.
>
>
> https://www.erisian.com.au/meetbot/taproot-bip-review/2019/taproot-bip-review.2019-11-12-19.00.log.html
>
> includes some discussion on that topic from the taproot review meetings.
>
> The difference between information in the annex and information in
> either a script (or the input data for the script that is the rest of
> the witness) is (in theory) that the annex can be analysed immediately
> and unconditionally, without necessarily even knowing anything about
> the utxo being spent.
>

I agree that should happen, but there are cases where this would not work.
E.g., imagine OP_LISP_EVAL + OP_ANNEX... and then you do delegation via the
thing in the annex.

Now the annex can be executed as a script.



>
> The idea is that we would define some simple way of encoding (multiple)
> entries into the annex -- perhaps a tag/length/value scheme like
> lightning uses; maybe if we add a lisp scripting language to consensus,
> we just reuse the list encoding from that? -- at which point we might
> use one tag to specify that a transaction uses advanced computation, and
> needs to be treated as having a heavier weight than its serialized size
> implies; but we could use another tag for per-input absolute locktimes;
> or another tag to commit to a past block height having a particular hash.
>

Yes, this seems tough to do without redefining checksig to allow partial
annexes. Hence my thinking that our current checksig behavior should
require the annex to be 0, and that future operations should be engineered
with a specific structured annex in mind.



>
> It seems like a good place for optimising SIGHASH_GROUP (allowing a group
> of inputs to claim a group of outputs for signing, but not allowing inputs
> from different groups to ever claim the same output; so that each output
> is hashed at most once for this purpose) -- since each input's validity
> depends on the other inputs' state, it's better to be able to get at
> that state as easily as possible rather than having to actually execute
> other scripts before you can tell if your script is going to be valid.
>

I think SIGHASH_GROUP could be some sort of mutable stack value, not the
annex. You want to be able to compute what range you should sign, and then
the signature should cover the actual range, not the argument itself.

Why sign the annex literally?

Why require that all signatures in one output sign the exact same digest?
What if one wants to sign for value and another for value + change?



>
> > The BIP is tight-lipped about its purpose
>
> BIP341 only reserves an area to put the annex; it doesn't define how
> it's used or why it should be used.
>
>
It does define how it's used: checksig must commit to it. Were there no
opcodes dependent on it I would agree, and that would be preferable.




> > Essentially, I read this as saying: The annex is the ability to pad a
> > transaction with an additional string of 0's
>
> If you wanted to pad it directly, you can do that in script already
> with a PUSH/DROP combo.
>

You cannot, because the push/drop would not be signed and would be
malleable.

The annex is not malleable, so it can be used for this as authenticated
padding.
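
A minimal Python sketch of why the annex is authenticated while padding
supplied as a witness stack element is not. It assumes a heavily simplified
signature message: `toy_sighash` and `tx_commitments` are hypothetical names,
and the real BIP341 message has many more fields. The only parts taken from
BIP341 are the tagged-hash construction and the annex commitment,
SHA256(compact_size(len(annex)) || annex).

```python
import hashlib
from typing import Optional


def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer encoding."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def toy_sighash(tx_commitments: bytes, annex: Optional[bytes]) -> bytes:
    """Hypothetical, simplified digest; only the annex handling mirrors BIP341."""
    annex_present = 0 if annex is None else 1
    msg = tx_commitments + bytes([annex_present])
    if annex is not None:
        msg += hashlib.sha256(compact_size(len(annex)) + annex).digest()
    return tagged_hash("TapSighash", msg)


core = b"placeholder for the other committed transaction fields"
annex_a = b"\x50" + b"\x00" * 10  # an annex always starts with 0x50
annex_b = b"\x50" + b"\x00" * 11  # one extra byte of padding

# Changing the annex changes the digest, so an existing signature
# would no longer verify against the modified transaction.
assert toy_sighash(core, annex_a) != toy_sighash(core, annex_b)
```

Witness stack elements are not an input to the digest at all in this sketch,
which is the sense in which a pushed-then-dropped witness item could be
changed by a third party without invalidating the signature.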



>
> The point of doing it in the annex is you could have a short byte
> string, perhaps something like "0x010201a4" saying "tag 1, data length 2
> bytes, value 420" and have the consensus interpretation of that be "this
> transaction should be treated as if it's 420 weight units more expensive
> than its serialized size", while only increasing its witness size by
> 6 bytes (annex length, annex flag, and the four bytes above). Adding 6
> bytes for a 426 weight unit increase seems much better than adding 426
> witness bytes.
>
>
Yes, that's what I say in the next sentence,

*> Or, we might somehow make the witness a small language (e.g., run length
encoded zeros) such that we can very quickly compute an equivalent number
of zeros to 'charge' without actually consuming the space but still
consuming a linearizable resource... or something like that.*

so I think we concur on that.
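
The arithmetic in the quoted "0x010201a4" example can be checked with a short
Python sketch. The one-byte tag and one-byte length layout, the tag number,
and the function names are assumptions for illustration; no annex encoding
has been specified.

```python
TAG_EXTRA_WEIGHT = 1  # hypothetical tag number, from the "tag 1" example


def parse_annex_entries(payload: bytes) -> list:
    """Split a payload into (tag, value) pairs: 1-byte tag, 1-byte length, value."""
    entries = []
    pos = 0
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated entry header")
        tag, length = payload[pos], payload[pos + 1]
        value = payload[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated entry value")
        entries.append((tag, value))
        pos += 2 + length
    return entries


def extra_weight(payload: bytes) -> int:
    """Sum the big-endian values of all extra-weight entries."""
    return sum(
        int.from_bytes(value, "big")
        for tag, value in parse_annex_entries(payload)
        if tag == TAG_EXTRA_WEIGHT
    )


payload = bytes.fromhex("010201a4")
# Witness cost of the annex: length byte + 0x50 annex flag + the four payload bytes.
serialized_annex_bytes = 1 + 1 + len(payload)
charged = serialized_annex_bytes + extra_weight(payload)
print(extra_weight(payload), serialized_annex_bytes, charged)  # 420 6 426
```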



> > Introducing OP_ANNEX: Suppose there were some sort of annex pushing opcode,
> > OP_ANNEX which puts the annex on the stack
>
> I think you'd want to have a way of accessing individual entries from
> the annex, rather than the annex as a single unit.
>

Or OP_ANNEX + OP_SUBSTR + OP_POVARINTSTR? Then you can just do 2 pops for
the length and the tag and then get the data.


>
> > Now suppose that I have a computation that I am running in a script as
> > follows:
> >
> > OP_ANNEX
> > OP_IF
> >     `some operation that requires annex to be <1>`
> > OP_ELSE
> >     OP_SIZE
> >     `some operation that requires annex to be len(annex) + 1 or does a checksig`
> > OP_ENDIF
> >
> > Now every time you run this,
>
> You only run a script from a transaction once at which point its
> annex is known (a different annex gives a different wtxid and breaks
> any signatures), and can't reference previous or future transactions'
> annexes...
>
>
In a transaction validator, yes. But in a satisfier, no.

Nor does it break the signatures if we add the ability to sign over only
part of an annex, or over multiple annexes, since the annex could then be
partially mutable.


Not true about accessing previous transactions' annexes. All coins
ultimately descend from coinbase transactions. If you can get the COutpoint
you're spending, you can get the parent of that COutpoint, and iterate
backwards, and so on. Then you have the coinbase txn, which commits to the
tree of wtxids. So you get previous transactions' annexes committed there.
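
The backwards iteration can be pictured with a toy Python sketch. It assumes
a hypothetical in-memory map of transactions (`ToyTx`, `walk_to_coinbase`);
no existing opcode lets a script do this, and the sketch only shows the shape
of the walk, following the first input each time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyTx:
    txid: str
    parent_txids: List[str]  # txids of the outputs being spent; empty for a coinbase
    annex: Optional[bytes] = None


def walk_to_coinbase(txs: Dict[str, ToyTx], start: str) -> List[ToyTx]:
    """Follow the first input of each transaction back until a coinbase is reached."""
    path = []
    current = txs[start]
    while True:
        path.append(current)
        if not current.parent_txids:
            return path
        current = txs[current.parent_txids[0]]


txs = {
    "cb": ToyTx("cb", []),
    "a": ToyTx("a", ["cb"], annex=b"\x50\x01"),
    "b": ToyTx("b", ["a"]),
}
path = walk_to_coinbase(txs, "b")
print([tx.txid for tx in path])  # ['b', 'a', 'cb']
```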


For future transactions you can, too: a miner with decent hashrate could
promise what their coinbase transaction, and its outputs, would be for a
future block, and then you can pop that open as well. You can't show valid
PoW for that one, though, so I'm not sure it's any different from
authenticated data. Where it does have a use is that you could, if you had
OP_COUTPOINTVERIFY, say that this coin is only spendable if a miner mines
the specific block that you want at a certain height (e.g., with only your
txn in it?), and then they can claim the outpoint in the future. So maybe
there is something bizarre that can happen with that capability.



> > Because the Annex is signed, and must be the same, this can also be
> > inconvenient:
>
> The annex is committed to by signatures in the same way nVersion,
> nLockTime and nSequence are committed to by signatures; I think it helps
> to think about it in a similar way.
>

nSequence, yes; nLockTime is per-tx.

BTW I think we now consider nSequence/nLockTime to be misdesigned, given
the desire to vary these per-input/per-tx.

So if the annex is like these, perhaps it's also misdesigned.

>
> > Suppose that you have a Miniscript that is something like: and(or(PK(A),
> > PK(A')), X, or(PK(B), PK(B'))).
> >
> > A or A' should sign with B or B'. X is some sort of fragment that might
> > require a value that is unknown (and maybe recursively defined?) so
> > therefore if we send the PSBT to A first, which commits to the annex, and
> > then X reads the annex and says it must be something else, A must sign
> > again. So you might say, run X first, and then sign with A and C or B.
> > However, what if the script somehow detects the bitstring WHICH_A WHICH_B
> > and has a different Annex per selection (e.g., interpret the bitstring as
> > an int and annex must == that int). Now, given and(or(K1, K1'),... or(Kn,
> > Kn')) we end up with needing to pre-sign 2**n annex values somehow... this
> > seems problematic theoretically.
>
> Note that you need to know what the annex will contain before you sign,
> since the annex is committed to via the signature. If "X" will need
> entries in the annex that aren't able to be calculated by the other
> parties, then they need to be the first to contribute to the PSBT, not A.
>
> I think the analogy to locktimes would be "I need the locktime to be at
> least block 900k, should I just sign that now, or check that nobody else
> is going to want it to be block 950k or something? Or should I just sign
> with nLockTime at 900k, 910k, 920k, 930k, etc and let someone else pick
> the right one?" The obvious solution is just to work out what the
> nLockTime should be first, then run signing rounds. Likewise, work out
> what the annex should be first, then run the signing rounds.
>


Yes, my point is this is computationally hard to do sometimes.
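
A small Python sketch of the blow-up for and(or(K1, K1'), ..., or(Kn, Kn')).
It assumes, as in the hypothetical above, that the script requires the annex
to equal the key-selection bitstring read as an integer; the function name is
illustrative only.

```python
from itertools import product


def annex_values_to_presign(n: int) -> list:
    """Enumerate the annex value required by each of the 2**n key selections."""
    if n < 1:
        raise ValueError("need at least one or(Ki, Ki') clause")
    selections = product((0, 1), repeat=n)  # 0 picks Ki, 1 picks Ki'
    return [int("".join(map(str, bits)), 2) for bits in selections]


values = annex_values_to_presign(3)
print(len(values), values)  # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

Every signer who must commit to the annex before the selection is known would
need one signature per value, so the count doubles with each added clause.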

>
> CLTV also has the problem that if you have one script fragment with
> CLTV by time, and another with CLTV by height, you can't come up with
> an nLockTime that will ever satisfy both. If you somehow have script
> fragments that require incompatible interpretations of the annex, you're
> likewise going to be out of luck.
>
>
Yes, see above. We don't yet know how the annex will be structured or used,
which is the point of this thread.

We need to drill down on how to avoid introducing these problems.
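
The mixed-timelock problem can be sketched in a few lines of Python. The
500,000,000 threshold separating block heights from Unix times is the
existing nLockTime rule; the function name is hypothetical, and the sketch
ignores the nSequence finality check that CLTV also performs.

```python
from typing import List, Optional

LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: Unix time


def satisfying_locktime(cltv_args: List[int]) -> Optional[int]:
    """Smallest nLockTime meeting every CLTV argument, or None if kinds are mixed."""
    if not cltv_args:
        return 0
    kinds = {arg < LOCKTIME_THRESHOLD for arg in cltv_args}
    if len(kinds) > 1:
        return None  # one clause wants a height, another a time: unsatisfiable
    return max(cltv_args)


print(satisfying_locktime([900_000, 950_000]))      # 950000
print(satisfying_locktime([900_000, 500_000_000]))  # None
```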



> Having a way of specifying locktimes in the annex can solve that
> particular problem with CLTV (different inputs can sign different
> locktimes, and you could have different tags for by-time/by-height so
> that even the same input can have different clauses requiring both),
> but the general problem still exists.
>
> (eg, you might have per-input by-height absolute locktimes as annex
> entry 3, and per-input by-time absolute locktimes as annex entry 4,
> so you might convert:
>
>  "900e3 CLTV DROP" -> "900e3 3 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"
>
>  "500e6 CLTV DROP" -> "500e6 4 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"
>
> for height/time locktime checks respectively)
>
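
A sketch of the intent behind the quoted conversion, in Python. It assumes
the annex has already been decoded into a tag-to-integer map, that tags 3 and
4 carry per-input by-height and by-time locktimes as in the example, and that
the check mirrors CLTV (the entry must be at least the clause's threshold).
All names are hypothetical, and the consensus comparison of each entry
against the actual chain height or time is left out.

```python
from typing import Dict, List, Tuple

TAG_LOCK_BY_HEIGHT = 3  # hypothetical tag numbers from the quoted example
TAG_LOCK_BY_TIME = 4


def check_annex_locks(
    annex_entries: Dict[int, int], requirements: List[Tuple[int, int]]
) -> bool:
    """Each requirement is (tag, minimum); every one needs a large enough entry."""
    for tag, minimum in requirements:
        entry = annex_entries.get(tag)
        if entry is None or entry < minimum:
            return False
    return True


clauses = [(TAG_LOCK_BY_HEIGHT, 900_000), (TAG_LOCK_BY_TIME, 500_000_000)]
both = {TAG_LOCK_BY_HEIGHT: 900_500, TAG_LOCK_BY_TIME: 500_000_123}
height_only = {TAG_LOCK_BY_HEIGHT: 900_500}

print(check_annex_locks(both, clauses))         # True
print(check_annex_locks(height_only, clauses))  # False
```

Because the two kinds of locktime live under separate tags, a single input
can satisfy a by-height clause and a by-time clause together, which a single
nLockTime field cannot do.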
> > Of course this wouldn't be miniscript then. Because miniscript is just for
> > the well behaved subset of script, and this seems ill behaved. So maybe
> > we're OK?
>
> The CLTV issue hit miniscript:
>
> https://medium.com/blockstream/dont-mix-your-timelocks-d9939b665094


Maybe the humour didn't hit -- we can only define well behaved as best we
know, and the solution was to re-define miniscript to only be the well
defined subset of miniscript once the bug in the spec was found.



>
> > It seems like one good option is if we just go on and banish the OP_ANNEX.
> > Maybe that solves some of this? I sort of think so. It definitely seems
> > like we're not supposed to access it via script, given the quote from above:
>
> How the annex works isn't defined, so it doesn't make any sense to
> access it from script. When how it works is defined, I expect it might
> well make sense to access it from script -- in a similar way that the
> CLTV and CSV opcodes allow accessing nLockTime and nSequence from script.
>

That's false: CLTV and CSV expressly do not allow accessing them from
script, only lower bounding them (and transitively proving that the value
was not of the other flavour).

So you can't actually get the exact nLockTime / nSequence on the stack
(exception: if you use the maximum allowable value, then there are no other
values...)


Given that it's not defined at all, I'm skeptical about signing it at all
presently.

If there's a future upgrade, it would be compatible, as we can add new
sighash flags to cover that.


> > One solution would be to... just soft-fork it out. Always must be 0. When
> > we come up with a use case for something like an annex, we can find a way
> > to add it back.
>
> The point of reserving the annex the way it has been is exactly this --
> it should not be used now, but when we agree on how it should be used,
> we have an area that's immediately ready to be used.


> (For the cases where you don't need script to enforce reasonable values,
> reserving it now means those new consensus rules can be used immediately
> with utxos that predate the new consensus rules -- so you could update
> offchain contracts from per-tx to per-input locktimes immediately without
> having to update the utxo on-chain first)
>

I highly doubt we can avoid new sighash flags once the annex is ready,
e.g. to allow partial coverage of the annex, like the structured schemes
described above.

We're already doing a soft fork for the new annex rules, so this isn't a
big deal...

Legacy outputs can use these new sighash flags as well, in theory (maybe
I'll do a post on why we shouldn't...)



Cheers,

Jeremy


Thread overview: 9+ messages
2022-03-04 23:21 [bitcoin-dev] Annex Purpose Discussion: OP_ANNEX, Turing Completeness, and other considerations Jeremy Rubin
2022-03-04 23:33 ` ZmnSCPxj
2022-03-06 12:55   ` Christian Decker
2022-03-05  5:59 ` Anthony Towns
2022-03-05 12:20   ` Jeremy Rubin [this message]
2022-03-07  8:08     ` Anthony Towns
2022-03-06 13:12   ` Christian Decker
2022-03-06 13:21     ` Jeremy Rubin
2022-03-07  0:59 ` Antoine Riard
