[bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

* [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
@ 2022-03-07 23:35 ZmnSCPxj
  2022-03-10  5:05 ` Billy Tetrud
  0 siblings, 1 reply; 6+ messages in thread
From: ZmnSCPxj @ 2022-03-07 23:35 UTC (permalink / raw)
  To: Billy Tetrud; +Cc: Bitcoin Protocol Discussion

Good morning Billy,

Changed subject since this is only tangentially related to `OP_FOLD`.

> Let me organize my thoughts on this a little more clearly. There's a couple possibilities I can think of for a jet-like system:
>
> A. We could implement jets now without a consensus change, and without requiring all nodes to upgrade to new relay rules. Probably. This would give upgraded nodes improved validation performance and many upgraded nodes relay savings (transmitting/receiving fewer bytes). Transactions would be weighted the same as without the use of jets tho.
> B. We could implement the above + lighter weighting by using a soft fork to put the jets in a part of the blockchain hidden from unupgraded nodes, as you mentioned. 
> C. We could implement the above + the jet registration idea in a soft fork. 
>
> For A:
>
> * Upgraded nodes query each connection for support of jets in general, and which specific jets they support.
> * For a connection to another upgraded node that supports the jet(s) that a transaction contains, the transaction is sent verbatim with the jet included in the script (eg as some fake opcode line like 23 OP_JET, indicating to insert standard jet 23 in its place). When validation happens, or when a miner includes it in a block, the jet opcode call is replaced with the script it represents so hashing happens in a way that is recognizable to unupgraded nodes.
> * For a connection to a non-upgraded node that doesn't support jets, or an upgraded node that doesn't support the particular jet included in the script, the jet opcode call is replaced as above before sending to that node. In addition, some data is added to the transaction that unupgraded nodes propagate along but otherwise ignore. Maybe this is extra witness data, maybe this is some kind of "annex", or something else. But that data would contain the original jet opcode (in this example "23 OP_JET") so that when that transaction data reaches an upgraded node that recognizes that jet again, it can swap that back in, in place of the script fragment it represents. 
>
> I'm not 100% sure the required mechanism I mentioned of "extra ignored data" exists, and if it doesn't, then all nodes would at least need to be upgraded to support that before this mechanism could fully work.

I am not sure that can even be *made* to exist.
It seems to me a trivial way to launch a DDoS: Just ask a bunch of fullnodes to add this 1Mb of extra ignored data in this tiny 1-input-1-output transaction so I pay only a small fee if it confirms but the bandwidth of all fullnodes is wasted transmitting and then ignoring this block of data.

> But even if such a mechanism doesn't exist, a jet script could still be used, but it would be clobbered by the first nonupgraded node it is relayed to, and can't then be converted back (without using a potentially expensive lookup table as you mentioned). 

Yes, and people still run Bitcoin Core 0.8.x.....

> > If the script does not weigh less if it uses a jet, then there is no incentive for end-users to use a jet
>
> That's a good point. However, I'd point out that nodes do lots of things that there's no individual incentive for, and this might be one where people either altruistically use jets to be lighter on the network, or use them in the hopes that the jet is accepted as a standard, reducing the cost of their scripts. But certainly a direct incentive to use them is better. Honest nodes can favor connecting to those that support jets.

Since you do not want a dynamic lookup table (because of the cost of lookup), how do new jets get introduced?
If a new jet requires coordinated deployment over the network, then you might as well just softfork and be done with it.
If a new jet can just be entered into some configuration file, how do you coordinate those between multiple users so that there *is* some benefit for relay?

> >if a jet would allow SCRIPT weights to decrease, upgraded nodes need to hide them from unupgraded nodes
> > we have to do that by telling unupgraded nodes "this script will always succeed and has weight 0"
>
> Right. It doesn't have to be weight zero, but that would work fine enough. 
>
> > if everybody else has not upgraded, a user of a new jet has no security.
>
> For case A, no security is lost. For case B you're right. For case C, once nodes upgrade to the initial soft fork, new registered jets can take advantage of relay-cost weight savings (defined by the soft fork) without requiring any nodes to do any upgrading, and nodes could be further upgraded to optimize the validation of various of those registered jets, but those processing savings couldn't change the weighting of transactions without an additional soft fork.
>
> > Consider an attack where I feed you a SCRIPT that validates trivially but is filled with almost-but-not-quite-jettable code
>
> I agree a pattern-matching lookup table is probably not a great design. But a lookup table like that is not needed for the jet registration idea. After the necessary soft fork, there would be standard rules for which registered jets nodes are required to keep an index of, and so the lookup table would be a straightforward jet hash lookup rather than a pattern-matching lookup, which wouldn't have the same DOS problems. A node would simply find a jet opcode call like "ab38cd39e OP_JET" and just lookup ab38cd39e in its index. 

How does the unupgraded-to-upgraded boundary work?
Having a static lookup table is better since you can pattern-match on strings of specific, static length, and we can take a page from `rsync` and use its "rolling checksum" idea which works with identifying strings of a certain specific length at arbitrary offsets.

Say you have jetted sequences where the original code is 42 bytes, and another jetted sequence where the original code is 54 bytes, you would keep a 42-byte rolling checksum and a separate 54-byte rolling checksum, and then when it matches, you check if the last 42 or 54 bytes matched the jetted sequences.

It does imply having a bunch of rolling checksums around, though.
Sigh.

---

To make jets more useful, we should redesign the language so that `OP_PUSH` is not in the opcode stream, but instead, we have a separate table of constants that is attached / concatenated to the actual SCRIPT.

So for example instead of an HTLC having embedded `OP_PUSH`es like this:

   OP_IF
       OP_HASH160 <hash> OP_EQUALVERIFY OP_DUP OP_HASH160 <acceptor pkh>
   OP_ELSE
       <timeout> OP_CHECKLOCKTIMEVERIFY OP_DROP OP_DUP OP_HASH160 <offerrer pkh>
   OP_ENDIF
   OP_EQUALVERIFY
   OP_CHECKSIG

We would have:

   constants:
       h = <hash>
       a = <acceptor pkh>
       t = <timeout>
       o = <offerer pkh>
   script:
       OP_IF
           OP_HASH160 h OP_EQUALVERIFY OP_DUP OP_HASH160 a
       OP_ELSE
           t OP_CHECKLOCKTIMEVERIFY OP_DROP OP_DUP OP_HASH160 o
       OP_ENDIF
       OP_EQUALVERIFY
       OP_CHECKSIG

The above allows for more compressibility, as the entire `script` portion can be recognized as a jet outright.
Move the incompressible hashes out of the main SCRIPT body.

We should note as well that this makes it *easier* to create recursive covenants (for good or ill) out of `OP_CAT` and whatever opcode you want that allows recursive covenants in combination with `OP_CAT`.
Generally, recursive covenants are *much* more interesting if they can change some variables at each iteration, and having a separate table-of-constants greatly facilitates that.

Indeed, the exercise of `OP_TLUV` in [drivechains-over-recursive-convenants][] puts the loop variables into the front of the SCRIPT to make it easier to work with the SCRIPT manipulation.

[drivechains-over-recursive-covenants]: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-February/019976.html

---

Perhaps we can consider the general vs specific tension in information-theoretic terms.

A language which supports more computational power --- i.e. more general --- must, by necessity, have longer symbols, as a basic law of information theory.
After all, a general language can express more things.

However, we do recognize that certain sequences of things-to-say are much more likely than others.
That is, we expect that certain sequences "make sense" to do.
That is why "jets" are even proposed, they are shortcuts towards those.

Assuming a general language is already deployed for Bitcoin, then a new opcode is a jet as it simply makes the SCRIPT shorter.

Instead of starting with a verbose (by necessity) general language, we could instead start with a terse but restricted language, and slowly loosen up its restrictions by adding new capabilities in softforks.

Regards,
ZmnSCPxj

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
  2022-03-07 23:35 [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT) ZmnSCPxj
@ 2022-03-10  5:05 ` Billy Tetrud
  2022-03-10  6:43   ` ZmnSCPxj
  0 siblings, 1 reply; 6+ messages in thread
From: Billy Tetrud @ 2022-03-10  5:05 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 13747 bytes --]

Hi ZmnSCPxj,

>  Just ask a bunch of fullnodes to add this 1Mb of extra ignored data in
this tiny 1-input-1-output transaction so I pay only a small fee

I'm not suggesting that you wouldn't have to pay a fee for it. You'd pay a
fee for it as normal, so there's no DOS vector. Doesn't adding
extra witness data do what would be needed here? Eg simply adding extra
data onto the witness script that will remain unconsumed after successful
execution of the script?

> how do new jets get introduced?

In scenario A, new jets get introduced by being added to bitcoin software
as basically relay rules.

> If a new jet requires coordinated deployment over the network, then you
might as well just softfork and be done with it.

It would not need a coordinated deployment. However, the more nodes that
supported that jet, the more efficient using it would be for the network.

> If a new jet can just be entered into some configuration file, how do you
coordinate those between multiple users so that there *is* some benefit for
relay?

When a new version of bitcoin comes out, people generally upgrade to it
eventually. No coordination is needed. 100% of the network need not support
a jet. Just some critical mass to get some benefit.

> Having a static lookup table is better since you can pattern-match on
strings of specific, static length

Sorry, better than what exactly?

> How does the unupgraded-to-upgraded boundary work?

This is what I'm thinking. Imagine a simple script:

OP_DUP
OP_ADD

with witness

1

This would execute as 1+1 = 2 -> success. Let's say the script is
jettified so we can instead write it as:

OP_JET
1b5f03cf # adler32 hash of the replaced script

with a witness:

OP_JET   # Some number that represents OP_JET
1b5f03cf
0
1

A jet-aware node transmitting to another jet-aware node can transmit it as
is (tho it would need to do a swap out to validate). For a jet-aware node
to transmit this to a non-jet aware node, it would need to swap the OP_JET
call with the script it represents. So the transaction sent to the non-jet
aware node would have:

Script:

OP_DUP
OP_ADD

Witness:

OP_JET
1b5f03cf
0
1

And you can see that this would execute and succeed by adding 1+1 and
ending up with the stack:

2
0
1b5f03cf
OP_JET

Which would succeed because of the non-zero top of stack.

When the non-jet aware node sends this to a jet-aware node, that node would
see the extra items on the stack after script execution, and would
interpret them as an OP_JET call specifying that OP_JET should replace the
witness items starting at index 0 with `1b5f03cf  OP_JET`. It does this and
then sends that along to the next hop.

In order to support this without a soft fork, this extra otherwise
unnecessary data would be needed, but for jets that represent long scripts,
the extra witness data could be well worth it (for the network).

However, this extra data would be a disincentive to do transactions this
way, even when its better for the network. So it might not be worth doing
it this way without a soft fork. But with a soft fork to upgrade nodes to
support an OP_JET opcode, the extra witness data can be removed (replaced
with out-of-band script fragment transmission for nodes that don't support
a particular jet).

One interesting additional thing that could be done with this mechanism is
to add higher-order function ability to jets, which could allow nodes to
add OP_FOLD or similar functions as a jet without requiring additional soft
forks.  Hypothetically, you could imagine a jet script that uses an OP_LOOP
jet be written as follows:

5             # Loop 5 times
1             # Loop the next 1 operation
3c1g14ad
OP_JET
OP_ADD  # The 1 operation to loop

The above would sum up 5 numbers from the stack. And while this summation
jet can't be represented in bitcoin script on its own (since bitcoin script
can't manipulate opcode calls), the jet *call* can still be represented as:

OP_ADD
OP_ADD
OP_ADD
OP_ADD
OP_ADD

which means all of the above replacement functionality would work just as
well.

So my point here is that jets implemented in a way similar to this would
give a much wider range of "code as compression" possibilities than
implementing a single opcode like op_fold.

> To make jets more useful, we should redesign the language so that
`OP_PUSH` is not in the opcode stream, but instead, we have a separate
table of constants that is attached / concatenated to the actual SCRIPT.

This can already be done, right? You just have to redesign the script to
consume and swap/rot around the data in the right way to separate them out
from the main script body.


On Mon, Mar 7, 2022 at 5:35 PM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote:

> Good morning Billy,
>
> Changed subject since this is only tangentially related to `OP_FOLD`.
>
> > Let me organize my thoughts on this a little more clearly. There's a
> couple possibilities I can think of for a jet-like system:
> >
> > A. We could implement jets now without a consensus change, and
> without requiring all nodes to upgrade to new relay rules. Probably. This
> would give upgraded nodes improved validation performance and many upgraded
> nodes relay savings (transmitting/receiving fewer bytes). Transactions
> would be weighted the same as without the use of jets tho.
> > B. We could implement the above + lighter weighting by using a soft fork
> to put the jets in a part of the blockchain hidden from unupgraded nodes,
> as you mentioned.
> > C. We could implement the above + the jet registration idea in a soft
> fork.
> >
> > For A:
> >
> > * Upgraded nodes query each connection for support of jets in general,
> and which specific jets they support.
> > * For a connection to another upgraded node that supports the jet(s)
> that a transaction contains, the transaction is sent verbatim with the jet
> included in the script (eg as some fake opcode line like 23 OP_JET,
> indicating to insert standard jet 23 in its place). When validation
> happens, or when a miner includes it in a block, the jet opcode call is
> replaced with the script it represents so hashing happens in a way that is
> recognizable to unupgraded nodes.
> > * For a connection to a non-upgraded node that doesn't support jets, or
> an upgraded node that doesn't support the particular jet included in the
> script, the jet opcode call is replaced as above before sending to that
> node. In addition, some data is added to the transaction that unupgraded
> nodes propagate along but otherwise ignore. Maybe this is extra witness
> data, maybe this is some kind of "annex", or something else. But that data
> would contain the original jet opcode (in this example "23 OP_JET") so that
> when that transaction data reaches an upgraded node that recognizes that
> jet again, it can swap that back in, in place of the script fragment it
> represents.
> >
> > I'm not 100% sure the required mechanism I mentioned of "extra ignored
> data" exists, and if it doesn't, then all nodes would at least need to be
> upgraded to support that before this mechanism could fully work.
>
> I am not sure that can even be *made* to exist.
> It seems to me a trivial way to launch a DDoS: Just ask a bunch of
> fullnodes to add this 1Mb of extra ignored data in this tiny
> 1-input-1-output transaction so I pay only a small fee if it confirms but
> the bandwidth of all fullnodes is wasted transmitting and then ignoring
> this block of data.
>
> > But even if such a mechanism doesn't exist, a jet script could still be
> used, but it would be clobbered by the first nonupgraded node it is relayed
> to, and can't then be converted back (without using a potentially expensive
> lookup table as you mentioned).
>
> Yes, and people still run Bitcoin Core 0.8.x.....
>
> > > If the script does not weigh less if it uses a jet, then there is no
> incentive for end-users to use a jet
> >
> > That's a good point. However, I'd point out that nodes do lots of things
> that there's no individual incentive for, and this might be one where
> people either altruistically use jets to be lighter on the network, or use
> them in the hopes that the jet is accepted as a standard, reducing the cost
> of their scripts. But certainly a direct incentive to use them is better.
> Honest nodes can favor connecting to those that support jets.
>
> Since you do not want a dynamic lookup table (because of the cost of
> lookup), how do new jets get introduced?
> If a new jet requires coordinated deployment over the network, then you
> might as well just softfork and be done with it.
> If a new jet can just be entered into some configuration file, how do you
> coordinate those between multiple users so that there *is* some benefit for
> relay?
>
> > >if a jet would allow SCRIPT weights to decrease, upgraded nodes need to
> hide them from unupgraded nodes
> > > we have to do that by telling unupgraded nodes "this script will
> always succeed and has weight 0"
> >
> > Right. It doesn't have to be weight zero, but that would work fine
> enough.
> >
> > > if everybody else has not upgraded, a user of a new jet has no
> security.
> >
> > For case A, no security is lost. For case B you're right. For case C,
> once nodes upgrade to the initial soft fork, new registered jets can take
> advantage of relay-cost weight savings (defined by the soft fork) without
> requiring any nodes to do any upgrading, and nodes could be further
> upgraded to optimize the validation of various of those registered jets,
> but those processing savings couldn't change the weighting of transactions
> without an additional soft fork.
> >
> > > Consider an attack where I feed you a SCRIPT that validates trivially
> but is filled with almost-but-not-quite-jettable code
> >
> > I agree a pattern-matching lookup table is probably not a great design.
> But a lookup table like that is not needed for the jet registration idea.
> After the necessary soft fork, there would be standard rules for which
> registered jets nodes are required to keep an index of, and so the lookup
> table would be a straightforward jet hash lookup rather than a
> pattern-matching lookup, which wouldn't have the same DOS problems. A node
> would simply find a jet opcode call like "ab38cd39e OP_JET" and just lookup
> ab38cd39e in its index.
>
> How does the unupgraded-to-upgraded boundary work?
> Having a static lookup table is better since you can pattern-match on
> strings of specific, static length, and we can take a page from `rsync` and
> use its "rolling checksum" idea which works with identifying strings of a
> certain specific length at arbitrary offsets.
>
> Say you have jetted sequences where the original code is 42 bytes, and
> another jetted sequence where the original code is 54 bytes, you would keep
> a 42-byte rolling checksum and a separate 54-byte rolling checksum, and
> then when it matches, you check if the last 42 or 54 bytes matched the
> jetted sequences.
>
> It does imply having a bunch of rolling checksums around, though.
> Sigh.
>
> ---
>
> To make jets more useful, we should redesign the language so that
> `OP_PUSH` is not in the opcode stream, but instead, we have a separate
> table of constants that is attached / concatenated to the actual SCRIPT.
>
> So for example instead of an HTLC having embedded `OP_PUSH`es like this:
>
>    OP_IF
>        OP_HASH160 <hash> OP_EQUALVERIFY OP_DUP OP_HASH160 <acceptor pkh>
>    OP_ELSE
>        <timeout> OP_CHECKLOCKTIMEVERIFY OP_DROP OP_DUP OP_HASH160
> <offerrer pkh>
>    OP_ENDIF
>    OP_EQUALVERIFY
>    OP_CHECKSIG
>
> We would have:
>
>    constants:
>        h = <hash>
>        a = <acceptor pkh>
>        t = <timeout>
>        o = <offerer pkh>
>    script:
>        OP_IF
>            OP_HASH160 h OP_EQUALVERIFY OP_DUP OP_HASH160 a
>        OP_ELSE
>            t OP_CHECKLOCKTIMEVERIFY OP_DROP OP_DUP OP_HASH160 o
>        OP_ENDIF
>        OP_EQUALVERIFY
>        OP_CHECKSIG
>
> The above allows for more compressibility, as the entire `script` portion
> can be recognized as a jet outright.
> Move the incompressible hashes out of the main SCRIPT body.
>
> We should note as well that this makes it *easier* to create recursive
> covenants (for good or ill) out of `OP_CAT` and whatever opcode you want
> that allows recursive covenants in combination with `OP_CAT`.
> Generally, recursive covenants are *much* more interesting if they can
> change some variables at each iteration, and having a separate
> table-of-constants greatly facilitates that.
>
> Indeed, the exercise of `OP_TLUV` in
> [drivechains-over-recursive-convenants][] puts the loop variables into the
> front of the SCRIPT to make it easier to work with the SCRIPT manipulation.
>
> [drivechains-over-recursive-covenants]:
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-February/019976.html
>
> ---
>
> Perhaps we can consider the general vs specific tension in
> information-theoretic terms.
>
> A language which supports more computational power --- i.e. more general
> --- must, by necessity, have longer symbols, as a basic law of information
> theory.
> After all, a general language can express more things.
>
> However, we do recognize that certain sequences of things-to-say are much
> more likely than others.
> That is, we expect that certain sequences "make sense" to do.
> That is why "jets" are even proposed, they are shortcuts towards those.
>
> Assuming a general language is already deployed for Bitcoin, then a new
> opcode is a jet as it simply makes the SCRIPT shorter.
>
> Instead of starting with a verbose (by necessity) general language, we
> could instead start with a terse but restricted language, and slowly loosen
> up its restrictions by adding new capabilities in softforks.
>
> Regards,
> ZmnSCPxj
>
>

[-- Attachment #2: Type: text/html, Size: 16284 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
  2022-03-10  5:05 ` Billy Tetrud
@ 2022-03-10  6:43   ` ZmnSCPxj
  2022-03-11 14:11     ` Billy Tetrud
  0 siblings, 1 reply; 6+ messages in thread
From: ZmnSCPxj @ 2022-03-10  6:43 UTC (permalink / raw)
  To: Billy Tetrud; +Cc: Bitcoin Protocol Discussion

Good morning Billy,

> Hi ZmnSCPxj,
>
> >  Just ask a bunch of fullnodes to add this 1Mb of extra ignored data in this tiny 1-input-1-output transaction so I pay only a small fee
>
> I'm not suggesting that you wouldn't have to pay a fee for it. You'd pay a fee for it as normal, so there's no DOS vector. Doesn't adding extra witness data do what would be needed here? Eg simply adding extra data onto the witness script that will remain unconsumed after successful execution of the script?

I think we would want to have a cleanstack rule at some point (do not remember out-of-hand if Taproot already enforces one).

So now being nice to the network is *more* costly?
That just *dis*incentivizes jet usage.

> > how do new jets get introduced?
>
> In scenario A, new jets get introduced by being added to bitcoin software as basically relay rules. 
>
> > If a new jet requires coordinated deployment over the network, then you might as well just softfork and be done with it.
>
> It would not need a coordinated deployment. However, the more nodes that supported that jet, the more efficient using it would be for the network. 
>
> > If a new jet can just be entered into some configuration file, how do you coordinate those between multiple users so that there *is* some benefit for relay?
>
> When a new version of bitcoin comes out, people generally upgrade to it eventually. No coordination is needed. 100% of the network need not support a jet. Just some critical mass to get some benefit. 

How large is the critical mass needed?

If you use witness to transport jet information across non-upgraded nodes, then that disincentivizes use of jets and you can only incentivize jets by softfork, so you might as well just get a softfork.

If you have no way to transport jet information from an upgraded through a non-upgraded back to an upgraded node, then I think you need a fairly large buy-in from users before non-upgraded nodes are rare enough that relay is not much affected, and if the required buy-in is large enough, you might as well softfork.

> > Having a static lookup table is better since you can pattern-match on strings of specific, static length
>
> Sorry, better than what exactly? 

Than using a dynamic lookup table, which is how I understood your previous email about "scripts in the 1000 past blocks".

> > How does the unupgraded-to-upgraded boundary work?
> <snip>
> When the non-jet aware node sends this to a jet-aware node, that node would see the extra items on the stack after script execution, and would interpret them as an OP_JET call specifying that OP_JET should replace the witness items starting at index 0 with `1b5f03cf  OP_JET`. It does this and then sends that along to the next hop.

It would have to validate as well that the SCRIPT sub-section matches the jet, else I could pretend to be a non-jet-aware node and give you a SCRIPT sub-section that does not match the jet and would cause your validation to diverge from other nodes.

Adler32 seems a bit short though, it seems to me that it may lead to two different SCRIPT subsections hashing to the same hash.

Suppose I have two different node softwares.
One uses a particular interpretation for a particular Adler32 hash.
The other uses a different interpretation.
If we are not careful, if these two jet-aware software talk to each other, they will ban each other from the network and cause a chainsplit.
Since the Bitcoin software is open source, nothing prevents anyone from using a different SCRIPT subsection for a particular Adler32 hash if they find a collision and can somehow convince people to run their modified software.

> In order to support this without a soft fork, this extra otherwise unnecessary data would be needed, but for jets that represent long scripts, the extra witness data could be well worth it (for the network). 
>
> However, this extra data would be a disincentive to do transactions this way, even when its better for the network. So it might not be worth doing it this way without a soft fork. But with a soft fork to upgrade nodes to support an OP_JET opcode, the extra witness data can be removed (replaced with out-of-band script fragment transmission for nodes that don't support a particular jet). 

Which is why I pointed out that each individual jet may very well require a softfork, or enough buy-in that you might as well just softfork.

> One interesting additional thing that could be done with this mechanism is to add higher-order function ability to jets, which could allow nodes to add OP_FOLD or similar functions as a jet without requiring additional soft forks.  Hypothetically, you could imagine a jet script that uses an OP_LOOP jet be written as follows:
>
> 5             # Loop 5 times
> 1             # Loop the next 1 operation
> 3c1g14ad 
> OP_JET
> OP_ADD  # The 1 operation to loop
>
> The above would sum up 5 numbers from the stack. And while this summation jet can't be represented in bitcoin script on its own (since bitcoin script can't manipulate opcode calls), the jet *call* can still be represented as:
>
> OP_ADD  
> OP_ADD  
> OP_ADD  
> OP_ADD  
> OP_ADD  
>
> which means all of the above replacement functionality would work just as well. 
>
> So my point here is that jets implemented in a way similar to this would give a much wider range of "code as compression" possibilities than implementing a single opcode like op_fold. 

Yes, that is certainly the case, and nothing really prevents us bringing "programming as compression" to its logical conclusion.

> > To make jets more useful, we should redesign the language so that `OP_PUSH` is not in the opcode stream, but instead, we have a separate table of constants that is attached / concatenated to the actual SCRIPT.
>
> This can already be done, right? You just have to redesign the script to consume and swap/rot around the data in the right way to separate them out from the main script body. 

Yes, but that implies additional operations (and execution overhead), increasing the costs to use jets, which makes it even less palatable to use jets, *in addition to* the witness hack disincentivizing jets.

So I would suggest that, if we were to seriously pursue jets, we should really replace most of the `OP_PUSH` opcodes with variants that look up in a static table at the start, before the executable script body.
I.e. opcodes 0x01 to 0x4e instead mean "push contents of `c1` to `c78` from the constants table", and have aliases `a` through `z` for `c1` to `c26`, etc.
That way, replacing the `OP_PUSH` is shorter in the actual SCRIPT (instead of a bunch of stack manipulations) and hopefully the overhead of the constants table can be kept low.

In particular, this helps jets compose more easily; if we want a SCRIPT that incorporates an existing jet, we do not have to manipulate the stack in a way that the existing jet expects, we just load the proper data into the constants table.

Or something, anyway.
This seems a fair amount of complexity here.

Regards,
ZmnSCPxj

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
  2022-03-10  6:43   ` ZmnSCPxj
@ 2022-03-11 14:11     ` Billy Tetrud
  2022-03-16 15:38       ` ZmnSCPxj
  0 siblings, 1 reply; 6+ messages in thread
From: Billy Tetrud @ 2022-03-11 14:11 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 12596 bytes --]

> I think we would want to have a cleanstack rule at some point

Ah is this a rule where a script shouldn't validate if more than just a
true is left on the stack? I can see how that would prevent the
non-soft-fork version of what I'm proposing.

> How large is the critical mass needed?

Well it seems we've agreed that were we going to do this, we would want to
at least do a soft-fork to make known jet scripts lighter weight (and
unknown jet scripts not-heavier) than their non-jet counterparts. So given
a situation where this soft fork happens, and someone wants to implement a
new jet, how much critical mass would be needed for the network to get some
benefit from the jet? Well, the absolute minimum for some benefit to happen
is that two nodes that support that jet are connected. In such a case, one
node can send that jet scripted transaction along without sending the data
of what the jet stands for. The jet itself is pretty small, like 2 or so
bytes. So that does impose a small additional cost on nodes that don't
support a jet. For 100,000 nodes, that means 200,000 bytes of transmission
would need to be saved for a jet to break even. So if the jet stands for a
22 byte script, it would break even when 10% of the network supported it.
If the jet stood for a 102 byte script, it would break even when 2% of the
network supported it. So how much critical mass is necessary for it to be
worth it depends on what the script is.

>  Than using a dynamic lookup table, which is how I understood your
previous email about "scripts in the 1000 past blocks".

Ah, I didn't mean using a dynamic lookup. This was about the idea of jet
registration, where registered jets would be kept a count of for some
number of blocks (eg 1000) and dropped if they don't reach a threshold rate
of usage. A static lookup table would work for this, I agree.

> It would have to validate as well that the SCRIPT sub-section matches the
jet

Seems like a good idea.

> Adler32 seems a bit short though

You might be right. Certainly some more care would need to be taken in an
actual implementation than I've taken writing my back of the napkin idea
out ; )

> nothing prevents anyone from using a different SCRIPT subsection for a
particular Adler32 hash if they find a collision and can somehow convince
people to run their modified software.

Someone that can convince people to run their modified software can
*always* cause those people to chainsplit from the main chain, so I don't
think the above ideas are special in this regard.

>> it might not be worth doing it this way without a soft fork
> Which is why I pointed out that each individual jet may very well require
a softfork, or enough buy-in that you might as well just softfork.

I'm saying something rather different actually. I don't think each
individual jet requires a softfork to be quite useful. What I meant by "it
might not be worth doing it this way without a soft fork" is that we
probably want to implement a soft fork to allow all jet scripts to have
reduced blockweight. However, once most nodes support that soft fork, new
individual jets do not need a softfork for the network to take advantage of
them. As I mused about above, even 10% of the network supporting a jet
standin for a medium length script could result in significant network
bandwidth savings. Different sections of the network could decide
individually what jets they want to support without needing the usual chaos
of a soft fork for each one, but of course the more the better for a
popular jet. There would be benefits for eventually soft forking such jets
in (to make them weigh even less based on implementation of optimized
validation functions), and real life usage of those jets could inform the
decisions around them. They could already be well tested in the wild before
being upgraded in a softfork.

> Yes, but that implies additional operations (and execution overhead),
increasing the costs to use jets, which makes it even less palatable to use
jets, *in addition to* the witness hack disincentivizing jets.

For the use of a single jet, this can be completely solved by that jet. All
the additional operations you're talking about only need to happen in a
general bitcoin script evaluator. But a jet evaluator can be hand optimized
for that jet, which could operate in exactly the the function-like way you
suggested, because it would actually be a function, under the hood.

> this helps jets compose more easily; if we want a SCRIPT that
incorporates an existing jet, we do not have to manipulate the stack in a
way that the existing jet expects, we just load the proper data into the
constants table.

I think I see what you're saying about multiple jets composing together
easily. I think your idea about loading constants from their initial
positions has merit - basically function arguments that you can reference
by position rather than needing to arrange them in the right order on the
stack. Such is the existence of a stack-based language. I like the idea of
eg `c1` to `c26` paralleling the pushdata opcodes. The question I have is:
where would the constants table come from? Would it reference the original
positions of items on the witness stack?

On Thu, Mar 10, 2022 at 12:44 AM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote:

> Good morning Billy,
>
> > Hi ZmnSCPxj,
> >
> > >  Just ask a bunch of fullnodes to add this 1Mb of extra ignored data
> in this tiny 1-input-1-output transaction so I pay only a small fee
> >
> > I'm not suggesting that you wouldn't have to pay a fee for it. You'd pay
> a fee for it as normal, so there's no DOS vector. Doesn't adding
> extra witness data do what would be needed here? Eg simply adding extra
> data onto the witness script that will remain unconsumed after successful
> execution of the script?
>
> I think we would want to have a cleanstack rule at some point (do not
> remember out-of-hand if Taproot already enforces one).
>
> So now being nice to the network is *more* costly?
> That just *dis*incentivizes jet usage.
>
> > > how do new jets get introduced?
> >
> > In scenario A, new jets get introduced by being added to bitcoin
> software as basically relay rules.
> >
> > > If a new jet requires coordinated deployment over the network, then
> you might as well just softfork and be done with it.
> >
> > It would not need a coordinated deployment. However, the more nodes that
> supported that jet, the more efficient using it would be for the network.
> >
> > > If a new jet can just be entered into some configuration file, how do
> you coordinate those between multiple users so that there *is* some benefit
> for relay?
> >
> > When a new version of bitcoin comes out, people generally upgrade to it
> eventually. No coordination is needed. 100% of the network need not support
> a jet. Just some critical mass to get some benefit.
>
> How large is the critical mass needed?
>
> If you use witness to transport jet information across non-upgraded nodes,
> then that disincentivizes use of jets and you can only incentivize jets by
> softfork, so you might as well just get a softfork.
>
> If you have no way to transport jet information from an upgraded through a
> non-upgraded back to an upgraded node, then I think you need a fairly large
> buy-in from users before non-upgraded nodes are rare enough that relay is
> not much affected, and if the required buy-in is large enough, you might as
> well softfork.
>
> > > Having a static lookup table is better since you can pattern-match on
> strings of specific, static length
> >
> > Sorry, better than what exactly?
>
> Than using a dynamic lookup table, which is how I understood your previous
> email about "scripts in the 1000 past blocks".
>
> > > How does the unupgraded-to-upgraded boundary work?
> > <snip>
> > When the non-jet aware node sends this to a jet-aware node, that node
> would see the extra items on the stack after script execution, and would
> interpret them as an OP_JET call specifying that OP_JET should replace the
> witness items starting at index 0 with `1b5f03cf  OP_JET`. It does this and
> then sends that along to the next hop.
>
> It would have to validate as well that the SCRIPT sub-section matches the
> jet, else I could pretend to be a non-jet-aware node and give you a SCRIPT
> sub-section that does not match the jet and would cause your validation to
> diverge from other nodes.
>
> Adler32 seems a bit short though, it seems to me that it may lead to two
> different SCRIPT subsections hashing to the same hash.
>
> Suppose I have two different node softwares.
> One uses a particular interpretation for a particular Adler32 hash.
> The other uses a different interpretation.
> If we are not careful, if these two jet-aware software talk to each other,
> they will ban each other from the network and cause a chainsplit.
> Since the Bitcoin software is open source, nothing prevents anyone from
> using a different SCRIPT subsection for a particular Adler32 hash if they
> find a collision and can somehow convince people to run their modified
> software.
>
> > In order to support this without a soft fork, this extra otherwise
> unnecessary data would be needed, but for jets that represent long scripts,
> the extra witness data could be well worth it (for the network).
> >
> > However, this extra data would be a disincentive to do transactions this
> way, even when its better for the network. So it might not be worth doing
> it this way without a soft fork. But with a soft fork to upgrade nodes to
> support an OP_JET opcode, the extra witness data can be removed (replaced
> with out-of-band script fragment transmission for nodes that don't support
> a particular jet).
>
> Which is why I pointed out that each individual jet may very well require
> a softfork, or enough buy-in that you might as well just softfork.
>
> > One interesting additional thing that could be done with this mechanism
> is to add higher-order function ability to jets, which could allow nodes to
> add OP_FOLD or similar functions as a jet without requiring additional soft
> forks.  Hypothetically, you could imagine a jet script that uses an OP_LOOP
> jet be written as follows:
> >
> > 5             # Loop 5 times
> > 1             # Loop the next 1 operation
> > 3c1g14ad
> > OP_JET
> > OP_ADD  # The 1 operation to loop
> >
> > The above would sum up 5 numbers from the stack. And while this
> summation jet can't be represented in bitcoin script on its own (since
> bitcoin script can't manipulate opcode calls), the jet *call* can still be
> represented as:
> >
> > OP_ADD
> > OP_ADD
> > OP_ADD
> > OP_ADD
> > OP_ADD
> >
> > which means all of the above replacement functionality would work just
> as well.
> >
> > So my point here is that jets implemented in a way similar to this would
> give a much wider range of "code as compression" possibilities than
> implementing a single opcode like op_fold.
>
> Yes, that is certainly the case, and nothing really prevents us bringing
> "programming as compression" to its logical conclusion.
>
> > > To make jets more useful, we should redesign the language so that
> `OP_PUSH` is not in the opcode stream, but instead, we have a separate
> table of constants that is attached / concatenated to the actual SCRIPT.
> >
> > This can already be done, right? You just have to redesign the script to
> consume and swap/rot around the data in the right way to separate them out
> from the main script body.
>
> Yes, but that implies additional operations (and execution overhead),
> increasing the costs to use jets, which makes it even less palatable to use
> jets, *in addition to* the witness hack disincentivizing jets.
>
> So I would suggest that, if we were to seriously pursue jets, we should
> really replace most of the `OP_PUSH` opcodes with variants that look up in
> a static table at the start, before the executable script body.
> I.e. opcodes 0x01 to 0x4e instead mean "push contents of `c1` to `c78`
> from the constants table", and have aliases `a` through `z` for `c1` to
> `c26`, etc.
> That way, replacing the `OP_PUSH` is shorter in the actual SCRIPT (instead
> of a bunch of stack manipulations) and hopefully the overhead of the
> constants table can be kept low.
>
> In particular, this helps jets compose more easily; if we want a SCRIPT
> that incorporates an existing jet, we do not have to manipulate the stack
> in a way that the existing jet expects, we just load the proper data into
> the constants table.
>
> Or something, anyway.
> This seems a fair amount of complexity here.
>
> Regards,
> ZmnSCPxj
>

[-- Attachment #2: Type: text/html, Size: 14081 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
  2022-03-11 14:11     ` Billy Tetrud
@ 2022-03-16 15:38       ` ZmnSCPxj
  2022-03-16 15:59         ` Billy Tetrud
  0 siblings, 1 reply; 6+ messages in thread
From: ZmnSCPxj @ 2022-03-16 15:38 UTC (permalink / raw)
  To: Billy Tetrud; +Cc: Bitcoin Protocol Discussion

Good morning Billy,

> > I think we would want to have a cleanstack rule at some point
>
> Ah is this a rule where a script shouldn't validate if more than just a true is left on the stack? I can see how that would prevent the non-soft-fork version of what I'm proposing. 

Yes.
There was also an even stronger cleanstack rule where the stack and alt stack are totally empty.
This is because a SCRIPT really just returns "valid" or "invalid", and `OP_VERIFY` can be trivially appended to a SCRIPT that leaves a single stack item to convert to a SCRIPT that leaves no stack items and retains the same behavior.

>
> > How large is the critical mass needed?
>
> Well it seems we've agreed that were we going to do this, we would want to at least do a soft-fork to make known jet scripts lighter weight (and unknown jet scripts not-heavier) than their non-jet counterparts. So given a situation where this soft fork happens, and someone wants to implement a new jet, how much critical mass would be needed for the network to get some benefit from the jet? Well, the absolute minimum for some benefit to happen is that two nodes that support that jet are connected. In such a case, one node can send that jet scripted transaction along without sending the data of what the jet stands for. The jet itself is pretty small, like 2 or so bytes. So that does impose a small additional cost on nodes that don't support a jet. For 100,000 nodes, that means 200,000 bytes of transmission would need to be saved for a jet to break even. So if the jet stands for a 22 byte script, it would break even when 10% of the network supported it. If the jet stood for a 102 byte script, it would break even when 2% of the network supported it. So how much critical mass is necessary for it to be worth it depends on what the script is. 

The math seems reasonable.


> The question I have is: where would the constants table come from? Would it reference the original positions of items on the witness stack? 

The constants table would be part of the SCRIPT puzzle, and thus not in the witness solution.
I imagine the SCRIPT would be divided into two parts: (1) a table of constants and (2) the actual opcodes to execute.


Regards,
ZmnSCPxj


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT)
  2022-03-16 15:38       ` ZmnSCPxj
@ 2022-03-16 15:59         ` Billy Tetrud
  0 siblings, 0 replies; 6+ messages in thread
From: Billy Tetrud @ 2022-03-16 15:59 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 2813 bytes --]

>  The constants table would be part of the SCRIPT puzzle

Ah I see what you're saying now. You're not talking about referencing
inputs from the spender, but rather constants for the script writer to
parameterize a jet with. TBH I think both would be useful, and both could
potentially be done in the same way (ie reference their position in the
script before any evaluation starts). I think your idea is a good one.

Cheers

On Wed, Mar 16, 2022 at 10:39 AM ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote:

> Good morning Billy,
>
> > > I think we would want to have a cleanstack rule at some point
> >
> > Ah is this a rule where a script shouldn't validate if more than just a
> true is left on the stack? I can see how that would prevent the
> non-soft-fork version of what I'm proposing.
>
> Yes.
> There was also an even stronger cleanstack rule where the stack and alt
> stack are totally empty.
> This is because a SCRIPT really just returns "valid" or "invalid", and
> `OP_VERIFY` can be trivially appended to a SCRIPT that leaves a single
> stack item to convert to a SCRIPT that leaves no stack items and retains
> the same behavior.
>
> >
> > > How large is the critical mass needed?
> >
> > Well it seems we've agreed that were we going to do this, we would want
> to at least do a soft-fork to make known jet scripts lighter weight (and
> unknown jet scripts not-heavier) than their non-jet counterparts. So given
> a situation where this soft fork happens, and someone wants to implement a
> new jet, how much critical mass would be needed for the network to get some
> benefit from the jet? Well, the absolute minimum for some benefit to happen
> is that two nodes that support that jet are connected. In such a case, one
> node can send that jet scripted transaction along without sending the data
> of what the jet stands for. The jet itself is pretty small, like 2 or so
> bytes. So that does impose a small additional cost on nodes that don't
> support a jet. For 100,000 nodes, that means 200,000 bytes of transmission
> would need to be saved for a jet to break even. So if the jet stands for a
> 22 byte script, it would break even when 10% of the network supported it.
> If the jet stood for a 102 byte script, it would break even when 2% of the
> network supported it. So how much critical mass is necessary for it to be
> worth it depends on what the script is.
>
> The math seems reasonable.
>
>
> > The question I have is: where would the constants table come from? Would
> it reference the original positions of items on the witness stack?
>
> The constants table would be part of the SCRIPT puzzle, and thus not in
> the witness solution.
> I imagine the SCRIPT would be divided into two parts: (1) a table of
> constants and (2) the actual opcodes to execute.
>
>
> Regards,
> ZmnSCPxj
>

[-- Attachment #2: Type: text/html, Size: 3272 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-03-16 15:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 23:35 [bitcoin-dev] Jets (Was: `OP_FOLD`: A Looping Construct For Bitcoin SCRIPT) ZmnSCPxj
2022-03-10  5:05 ` Billy Tetrud
2022-03-10  6:43   ` ZmnSCPxj
2022-03-11 14:11     ` Billy Tetrud
2022-03-16 15:38       ` ZmnSCPxj
2022-03-16 15:59         ` Billy Tetrud

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox