From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5CE92C013A for ; Wed, 3 Feb 2021 13:24:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id 56C3B8710A for ; Wed, 3 Feb 2021 13:24:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EmdT4pluxN5h for ; Wed, 3 Feb 2021 13:24:38 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from lkcl.net (lkcl.net [217.147.94.29]) by hemlock.osuosl.org (Postfix) with ESMTPS id 75C8D86FAD for ; Wed, 3 Feb 2021 13:24:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lkcl.net; s=201607131; h=Content-Type:Cc:To:Subject:Message-ID:Date:From:References:In-Reply-To:MIME-Version; bh=L3Docrkp8d32XEDGlVFtrKbkuYL/aC2gDKYaa1pep20=; b=FBKppL1oitsFTHTq7I4FuSuD1iDkcOK/IwTSsUq8Fx+OHjbczmGhZcCTi0bFxqIFU0+7Ssc6cWTk0VMkUBWE1mpy1d1ETsoTLBV/WlGRsLZBO5wysHvs61HAkuM5mkgKHtFb71Q19Q2sxLenfHK1quJse+JbYXcKGWj9AtKncm4=; Received: from mail-lf1-f49.google.com ([209.85.167.49]) by lkcl.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1l7I95-0003Yt-G2 for bitcoin-dev@lists.linuxfoundation.org; Wed, 03 Feb 2021 13:24:35 +0000 Received: by mail-lf1-f49.google.com with SMTP id e15so13113352lft.13 for ; Wed, 03 Feb 2021 05:24:20 -0800 (PST) X-Gm-Message-State: AOAM530EJvrii88N5X+QMNIrq7VRp3g5X2atxxXegflftB2bYrPcG2QH NWKBp8nK09zQuVKNIJ1lYd5BlnzCLDSPcm9XjTk= X-Google-Smtp-Source: ABdhPJzBT7lnAR7ONWQgH8zXdhnCMEkGHOeU8qWqBQulyPMw/qpNHKw6mdC0vsEmJuBvTHUTeuJZX1dEGTnmlggAP0M= X-Received: by 2002:a19:4191:: with SMTP id o139mr1773310lfa.224.1612358654527; Wed, 03 Feb 2021 05:24:14 -0800 (PST) MIME-Version: 1.0 Received: by 2002:a05:6520:2f95:b029:bc:bc2b:60bb with HTTP; Wed, 3 Feb 2021 05:24:13 -0800 (PST) In-Reply-To: References: From: Luke Kenneth Casson Leighton Date: Wed, 3 Feb 2021 13:24:13 +0000 X-Gmail-Original-Message-ID: Message-ID: To: ZmnSCPxj Content-Type: multipart/alternative; boundary="000000000000da8d8405ba6e8095" X-Mailman-Approved-At: Wed, 03 Feb 2021 14:14:10 +0000 Cc: Bitcoin Protocol Discussion Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs X-BeenThere: bitcoin-dev@lists.linuxfoundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Bitcoin Protocol Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 03 Feb 2021 13:24:40 -0000 --000000000000da8d8405ba6e8095 Content-Type: text/plain; charset="UTF-8" (hi folks do cc me, i am subscribed digest, thank you for doing that, ZmnSCPxj) On Wednesday, February 3, 2021, ZmnSCPxj wrote: > Good morning Luke, > > I happen to have experience designing digital ASICs, mostly pipelined data processing. > However my experience is limited to larger geometries and in SystemVerilog. larger geometries for a hardware wallet ASIC is ok (as long as it is not retail based and trying to run e.g. RSA, taking so long to complete that the retail customer walks out) > On the technical side, as I understand it (I have been out of that industry for 4 years now, so my knowledge may be obsolete) not at all! still very valuable > as you approach lower geometries, you also start approaching analog design. yyeah i could intuitively tell/guess there might be something like this which would throw a spanner in the works, it is why the grant request i put in specifically excluded data-dependent constant time analysis and also power analysis. > In our case we were already manually laying out gates and flip-flops (or replacing flip-flops with level-triggered latches and being extra careful with clocks) to squeeze performance (and area) ... ya-howw :) > Many of the formal correctness proofs were really about the formal equivalence of the netlist to the RTL; the correctness of the RTL was "proved" by simulation testing. thanks to Symbiyosys we are using formal proofs much more extensively, as effectively a 100% coverage replacement for unit tests. an example is popcount. we did two versions. one is a recursive tree algorithm, almost impossible to read and understand what the hell it does. the other is a total braindead 1-liner "x = x + input[i]", rubbish performance though. running a formal proof on these gave us 100% confidence that the complex optimised version does the damn job. yes we still do unit tests, these are more "demo code". now, the caveat is that you have to have a model of the "dut" (device under test) against which to compare, and if the dut is ridiculously complex then the formal model variant, which has to do the same job, ends up equally as complex (or effectively a duplicate of the dut) and the exercise is a bit of a waste of time... ...*unless*... there happens to be other implementations out there. then the proof can be run against those and everybody wins through collaboration. now, here's why i put in the NLnet Grant request to explore going back to the mathematics of crypto-primitives. many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because that does S-Boxes for Rijndael. they have gone mad by analysing algorithms trying to fit them to standard ISAs. nobody does Rijndael S-Boxes any way other than 256-entry lookup tables because no standard ISA has general-purpose Galois Field Multiply. consequently implementations in assembler get completely divorced from the original mathematics on which the cryptographic algorithm was based. the approach i would like to take is, "hang on a minute: how far would you get if you actually added *general-purpose* instructions that *directly* provided the underlying mathematical principles, and then wrapped a Vector-Matrix Engine around them?". would this drastically simplify algorithms to the point where *READABLE* c code compiles directly to opcodes that run screamingly fast, outperforming hand-optimised SIMD code using standard ISAs? then, given the Formal Correctness approach above, can we verify that the mathematically-related opcodes do the job? > (to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.) nice! > Things might have changed. nah. this is such a complex area, run by few incumbent players, that innovation is rare. not least, innovation is different and cannot be trusted by the Foundries! > A good RTL would embed SystemVerilog Assertions or PSL Assertions as well. > Some formal verification tools can understand a subset of SystemVerilog Assertions / PSL assertions and validate that your RTL conformed to the assertions, which would probably help cut down on the need for RTL simulation. interesting. > Overall, my understanding is that smaller geometries are needed only if you want to target a really high performance / unit cost and performance / energy consumption ratios. > That is, you would target smaller geometries for mining. yes. > If you need a secure tr\*sted computing module that does not need to be fast or cheap, just very accurate to the required specification, the larger geometries should be fine and you would be able to live almost entirely in RTL-land without diving into netlist and layout specifications. hardware wallet ASICs. i concur. > A wrinkle here is that licenses for tools from tr\*sted vendors like Synopsys or Cadence are ***expensive***. yes they are :) we are currently working with Sorbonne University LIP6.fr and Staf Verhaegen from Chips4Makers, trying a different approach: coriolis2. this will do fine up to 130nm (skywater). beyond that, mmm, we need a few more years. > What is more, you should really buy two sets of licenses, e.g. do logic synthesis with Synopsys and then formal verification with Cadence, because you do not want to fully tr\*st just one vendor. interesting, good advice. > Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret. sigh. i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist. i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers! plus, they make an awful lot of money, it is good business. > Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain. ah i didn't know it could do synthesis at all! i thought it was simulation only. > Note as well that I heard (at the time when I was in the industry) that some foundries will not even accept a netlist unless it was created by a synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor Graphics, maybe more I have forgotten since). yes i heard this too, they don't want their time wasted: after all they only make money by selling wafers, and if they can't sell any they have to run empty wafers to keep the equipment at operating temperature. if you book a slot 18 months in advance and the RTL doesn't work during testing 3 months before the deadline they may not be able to find someone else in time. anything to reduce the risk there is good, so i totally get why. thank you for the insights and the discussion, really appreciated. l. -- --- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68 --000000000000da8d8405ba6e8095 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable (hi folks do cc me, i am subscribed digest, thank you for doing that, ZmnSC= Pxj)

On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote:
> Goo= d morning Luke,
>
> I happen to have experience designing digit= al ASICs, mostly pipelined data processing.
> However my experience i= s limited to larger geometries and in SystemVerilog.

larger geometri= es for a hardware wallet ASIC is ok (as long as it is not retail based and = trying to run e.g. RSA, taking so long to complete that the retail customer= walks out)

> On the technical side, as I understand it (I have b= een out of that industry for 4 years now, so my knowledge may be obsolete)<= br>
not at all! still very valuable

> as you approach lower ge= ometries, you also start approaching analog design.

yyeah i could in= tuitively tell/guess there might be something like this which would throw a= spanner in the works, it is why the grant request i put in specifically ex= cluded data-dependent constant time analysis and also power analysis.

> In our case we were already manually laying out gates and flip-f= lops (or replacing flip-flops with level-triggered latches and being extra = careful with clocks) to squeeze performance (and area) ...

ya-howw := )


> Many of the formal correctness proofs were really about t= he formal equivalence of the netlist to the RTL; the correctness of the RTL= was "proved" by simulation testing.

thanks to Symbiyosys = we are using formal proofs much more extensively, as effectively a 100% cov= erage replacement for unit tests.

an example is popcount. =C2=A0we d= id two versions. =C2=A0one is a recursive tree algorithm, almost impossible= to read and understand what the hell it does.

the other is a total = braindead 1-liner "x =3D x + input[i]", rubbish performance thoug= h.

running a formal proof on these gave us 100% confidence that the = complex optimised version does the damn job.


yes we still do uni= t tests, these are more "demo code".

now, the caveat is th= at you have to have a model of the "dut" (device under test) agai= nst which to compare, and if the dut is ridiculously complex then the forma= l model variant, which has to do the same job, ends up equally as complex (= or effectively a duplicate of the dut) and the exercise is a bit of a waste= of time...

...*unless*... there happens to be other implementations= out there. =C2=A0then the proof can be run against those and everybody win= s through collaboration.



now, here's why i put in the NL= net Grant request to explore going back to the mathematics of crypto-primit= ives.

many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because t= hat does S-Boxes for Rijndael. =C2=A0they have gone mad by analysing algori= thms trying to fit them to standard ISAs.

nobody does Rijndael S-Box= es any way other than 256-entry lookup tables because no standard ISA has g= eneral-purpose Galois Field Multiply.

consequently implementations i= n assembler get completely divorced from the original mathematics on which = the cryptographic algorithm was based.

the approach i would like to = take is, "hang on a minute: how far would you get if you actually adde= d *general-purpose* instructions that *directly* provided the underlying ma= thematical principles, and then wrapped a Vector-Matrix Engine around them?= ".

would this drastically simplify algorithms to the point wher= e *READABLE* c code compiles directly to opcodes that run screamingly fast,= outperforming hand-optimised SIMD code using standard ISAs?

then, = given the Formal Correctness approach above, can we verify that the mathema= tically-related opcodes do the job?


> (to be fair, there were= tools to force you to improve coverage by injecting faults to your RTL, e.= g. it would virtually flip an `&&` to an `||` and if none of your t= ests signaled an error it would complain that your test coverage sucked.)
nice!

> Things might have changed.

nah. =C2=A0this i= s such a complex area, run by few incumbent players, that innovation is rar= e. =C2=A0not least, innovation is different and cannot be trusted by the Fo= undries!


> A good RTL would embed SystemVerilog Assertions or= PSL Assertions as well.
> Some formal verification tools can underst= and a subset of SystemVerilog Assertions / PSL assertions and validate that= your RTL conformed to the assertions, which would probably help cut down o= n the need for RTL simulation.

interesting.

> Overall, my = understanding is that smaller geometries are needed only if you want to tar= get a really high performance / unit cost and performance / energy consumpt= ion ratios.
> That is, you would target smaller geometries for mining= .

yes.

> If you need a secure tr\*sted computing module th= at does not need to be fast or cheap, just very accurate to the required sp= ecification, the larger geometries should be fine and you would be able to = live almost entirely in RTL-land without diving into netlist and layout spe= cifications.

hardware wallet ASICs.

i concur.

> A w= rinkle here is that licenses for tools from tr\*sted vendors like Synopsys = or Cadence are ***expensive***.

yes they are :) =C2=A0we are current= ly working with Sorbonne University LIP6.fr and Staf Verhaegen from Chips4M= akers, trying a different approach: coriolis2.

this will do fine up = to 130nm (skywater). =C2=A0beyond that, mmm, we need a few more years.
<= br>> What is more, you should really buy two sets of licenses, e.g. do l= ogic synthesis with Synopsys and then formal verification with Cadence, bec= ause you do not want to fully tr\*st just one vendor.

interesting, g= ood advice.

> Synthesis in particular is a black box and each ven= dor keeps their particular implementations and tricks secret.

sigh. = =C2=A0i think that's partly because they have to insert diodes, and buf= fers, and generally mess with the netlist.

i was stunned to learn th= at in a 28nm ASIC, 50% of it is repeater-buffers!

plus, they make an= awful lot of money, it is good business.

> Pointing some funding= at the open-source Icarus Verilog might also fit, as it lost its ability t= o do synthesis more than a decade ago due to inability to maintain.

= ah i didn't know it could do synthesis at all! i thought it was simulat= ion only.

> Note as well that I heard (at the time when I was in = the industry) that some foundries will not even accept a netlist unless it = was created by a synthesis tool from one of the major vendors (Synopsys, Ca= dence, Mentor Graphics, maybe more I have forgotten since).

yes i he= ard this too, they don't want their time wasted: after all they only ma= ke money by selling wafers, and if they can't sell any they have to run= empty wafers to keep the equipment at operating temperature.

if you= book a slot 18 months in advance and the RTL doesn't work during testi= ng 3 months before the deadline they may not be able to find someone else i= n time.

anything to reduce the risk there is good, so i totally get = why.

thank you for the insights and the discussion, really appreciat= ed.

l.


--
---
crowd-funded eco-conscious hardware:= https://w= ww.crowdsupply.com/eoma68

--000000000000da8d8405ba6e8095--