From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id F4137C000B for ; Fri, 4 Mar 2022 23:11:07 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id D4B8481D5C for ; Fri, 4 Mar 2022 23:11:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org X-Spam-Flag: NO X-Spam-Score: -1.602 X-Spam-Level: X-Spam-Status: No, score=-1.602 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, FROM_LOCAL_NOVOWEL=0.5, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=protonmail.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7TW0MZ91gTUh for ; Fri, 4 Mar 2022 23:11:02 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from mail-40130.protonmail.ch (mail-40130.protonmail.ch [185.70.40.130]) by smtp1.osuosl.org (Postfix) with ESMTPS id 1F279818D0 for ; Fri, 4 Mar 2022 23:11:02 +0000 (UTC) Date: Fri, 04 Mar 2022 23:10:48 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1646435458; bh=LDZ/IaXPrOQBbKTzraFDE+g98KrqgYQdX1T4sUc4EiI=; h=Date:To:From:Reply-To:Subject:Message-ID:In-Reply-To:References: From:To:Cc:Date:Subject:Reply-To:Feedback-ID:Message-ID; b=YeLizwtsqiwXb7JLxK862qpGx4Hk7lilhPrlw3Yl/CLJnYHdYnkLxd4eZ1dBy94RR yHgubXh5VFR0PcjJWWdEg15b2TfJhYTD2gqp1jq8o39oHlnmuuyUEyL5C3mM4emBw5 eRM5fthb1ssN6Q9Yn0V8bdR8HUfONLbIlJi0qOIus0vE646+fUn4qY0a2JJSYyj6/N odph60hJ1MLrAWKr9mZ+hP93vx7a8aBDNt9S7XKxUGWJPUxK3YUjE10MwMBA9CzJlU bZcsuFLTYuq78/4NlM1TwXz9duZtV7tMF7zAd+1GdVcfI7gwe85vp9i/PF6MUcNECS 7N/KGQLTSNMfg== To: Anthony Towns , Bitcoin Protocol Discussion From: ZmnSCPxj Reply-To: ZmnSCPxj Message-ID: <0yCTRKhBa9IPPg5J4HfKxraWJ4w6gUS5LRAoCPk01NpbYk-9R5zxAOmJO1Z8voUiatUJugYB6Oa9t1wFLbhQSgDie8hBzr0Z1EJVm6XGvMI=@protonmail.com> In-Reply-To: <20220304010442.GC3869@erisian.com.au> References: <20220304010442.GC3869@erisian.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [bitcoin-dev] bitcoin scripting and lisp X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 04 Mar 2022 23:11:08 -0000 Good morning aj, > On Sun, Feb 27, 2022 at 04:34:31PM +0000, ZmnSCPxj via bitcoin-dev wrote: > > > In reaction to this, AJ Towns mailed me privately about some of his > > thoughts on this insane `OP_EVICT` proposal. > > He observed that we could generalize the `OP_EVICT` opcode by > > decomposing it into smaller parts, including an operation congruent > > to the Scheme/Haskell/Scala `map` operation. > > At much the same time Zman was thinking about OP_FOLD and in exactly the > same context, I was wondering what the simplest possible language that > had some sort of map construction was -- I mean simplest in a "practical > engineering" sense; I think Simplicity already has the Euclidean/Peano > "least axioms" sense covered. > > The thing that's most appealing to me about bitcoin script as it stands > (beyond "it works") is that it's really pretty simple in an engineering > sense: it's just a "forth" like system, where you put byte strings on a > stack and have a few operators to manipulate them. The alt-stack, and > supporting "IF" and "CODESEPARATOR" add a little additional complexity, > but really not very much. > > To level-up from that, instead of putting byte strings on a stack, you > could have some other data structure than a stack -- eg one that allows > nesting. Simple ones that come to mind are lists of (lists of) byte > strings, or a binary tree of byte strings [0]. Both those essentially > give you a lisp-like language -- lisp is obviously all about lists, > and a binary tree is just made of things or pairs of things, and pairs > of things are just another way of saying "car" and "cdr". > > A particular advantage of lisp-like approaches is that they treat code > and data exactly the same -- so if we're trying to leave the option open > for a transaction to supply some unexpected code on the witness stack, > then lisp handles that really naturally: you were going to include data > on the stack anyway, and code and data are the same, so you don't have > to do anything special at all. And while I've never really coded in > lisp at all, my understanding is that its biggest problems are all about > doing things efficiently at large scales -- but script's problem space > is for very small scale things, so there's at least reason to hope that > any problems lisp might have won't actually show up for this use case. I heartily endorse LISP --- it has a trivial implementation of `eval` that = is easily implementable once you have defined a proper data type in preferr= ed-language-here to represent LISP datums. Combine it with your idea of committing to a max-number-of-operations (whic= h increases the weight of the transaction) and you may very well have somet= hing viable. (In particular, even though `eval` is traditionally (re-)implemented in LIS= P itself, the limit on max-number-of-operations means any `eval` implementa= tion within the same language is also forcibly made total.) Of note is that the supposed "problem at scale" of LISP is, as I understand= it, due precisely to its code and data being homoiconic to each other. This homoiconicity greatly tempts LISP programmers to use macros, i.e. prog= rams that generate other programs from some input syntax. Homoiconicity means that one can manipulate code just as easily as the data= , and thus LISP macros are a trivial extension on the language. This allows each LISP programmer to just code up a macro to expand common p= atterns. However, each LISP programmer then ends up implementing *different*, but *s= imilar* macros from each other. Unfortunately, programming at scale requires multiple programmers speaking = the same language. Then programming at scale is hampered because each LISP programmer has thei= r own private dialect of LISP (formed from the common LISP language and fro= m their own extensive set of private macros) and intercommunication between= them is hindered by the fact that each one speaks their own private dialec= t. Some LISP-like languages (e.g. Scheme) have classically targeted a "small" = subset of absolutely-necessary operations, and each implementation of the l= anguage immediately becomes a new dialect due to having slightly different = forms for roughly the same convenience function or macro, and *then* indivi= dual programmers build their own private dialect on top. For Scheme specifically, R7RS has targeted providing a "large" standard as = well, as did R6RS (which only *had* a "large" standard), but individual Sch= eme implementations have not always liked to implement *all* the "large" st= andard. Otherwise, every big C program contains a half-assed implementation of half= of Common LISP, so ---- > - I don't think execution costing takes into account how much memory > is used at any one time, just how much was allocated in total; so > the equivalent of (OP_DUP OP_DROP OP_DUP OP_DROP ..) only has the > allocations accounted for, with no discount given for the immediate > freeing, so it gets treated as having the same cost as (OP_DUP > OP_DUP .. OP_DROP OP_DROP ..). Doing it that way would be a worse > than how bitcoin script is currently costed, but doing better might > mean locking in an evaluation method at the consensus level. Seems > worth looking into, at least. This may depend on the language that the interpreter is written in. For example, on a typical GC language, both the N * `OP_DUP OP_DROP` and th= e N * `OP_DUP` + N * `OP_DROP` will have similar behavior when allocated at= the nursery. Since the GC nursery acts as a large buffer of potential allocations, the a= mount of work done in both cases would be the same, at least until the numb= er of allocs exceeds the nursery size. Alternately, the implementation may use immutable byte vectors, in which ca= se `OP_DUP` is just a pointer copy. Or alternately the implementation may use copy-on-write byte vectors, in wh= ich case `OP_DUP` is just a pointer copy plus refcount increment, and `OP_D= ROP` is just a refcount decrement, and the amount of memory used remains sm= all. > There's two ways to think about upgradability here; if someday we wan= t > to add new opcodes to the language -- perhaps something to validate z= ero > knowledge proofs or calculate sha3 or use a different ECC curve, or s= ome > way to support cross-input signature aggregation, or perhaps it's jus= t > that some snippets are very widely used and we'd like to code them in > C++ directly so they validate quicker and don't use up as much block > weight. One approach is to just define a new version of the language > via the tapleaf version, defining new opcodes however we like. > > The other is to use the "softfork" opcode -- chia defines it as: > > (softfork cost code) > > though I think it would probably be better if it were > > (softfork cost version code) > > where the idea is that "code" will use the "x" opcode if there's a > problem, and anyone supporting the "version" softfork can verify that > there aren't any problems at a cost of "cost". However, whether you > do or don't support that softfork, as far as the rest of the script i= s > concerned, the expression will either fail entirely or evaluate as ze= ro; > so anyone who doesn't support the softfork can just replace it with z= ero > and continue on, treating it as if it had costed "cost" units. > > One thing worth noting: "softfork" behaves more like OP_NOP than > tapscript's OP_SUCCESS -- I think it's just not possible in general t= o > have OP_SUCCESS-like behaviour if you're trying to allow accepting co= de > from the witness data -- otherwise as soon as you reveal that your sc= ript > does accept arbitrary code supplied by the spender, someone could sti= ck > in an OP_SUCCESS code, and remove all the restrictions on spending an= d > steal your funds. Oh no `1 RETURN`! Well, the advantage of chialisp here is that it enables the opcode for a *b= lock* of code, so the opcode *itself* could return arbitrary data, it is ju= st the entire block of code that is restricted to returning `0`. So it would be something like an `OP_BEGINSOFTFORK .... OP_ENDSOFTFORK` whe= re any reserved opcodes in the middle have arbitrary behavior, the entire b= lock gets a *copy* of the current stack and alt stack, with any changes to = the stack / alt stack disappear after the `OP_ENDSOFTFORK`. Thus, the entire block as a whole would act as an `OP_NOP`. (OG Bitcoin SCRIPT FTW!) I think the `softfork` form would have to be a syntax though and not a proc= edure, as I think you want `cost` to be statically determined, and very lik= ely also `version`. Regards, ZmnSCPxj