(hi folks do cc me, i am subscribed digest, thank you for doing that, ZmnSCPxj)
On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj@protonmail.com> wrote:
> Good morning Luke,
>
> I happen to have experience designing digital ASICs, mostly pipelined data processing.
> However my experience is limited to larger geometries and in SystemVerilog.
larger geometries for a hardware wallet ASIC is ok (as long as it is not retail based and trying to run e.g. RSA, taking so long to complete that the retail customer walks out)
> On the technical side, as I understand it (I have been out of that industry for 4 years now, so my knowledge may be obsolete)
not at all! still very valuable
> as you approach lower geometries, you also start approaching analog design.
yyeah i could intuitively tell/guess there might be something like this which would throw a spanner in the works, it is why the grant request i put in specifically excluded data-dependent constant time analysis and also power analysis.
> In our case we were already manually laying out gates and flip-flops (or replacing flip-flops with level-triggered latches and being extra careful with clocks) to squeeze performance (and area) ...
ya-howw :)
> Many of the formal correctness proofs were really about the formal equivalence of the netlist to the RTL; the correctness of the RTL was "proved" by simulation testing.
thanks to Symbiyosys we are using formal proofs much more extensively, as effectively a 100% coverage replacement for unit tests.
an example is popcount. we did two versions. one is a recursive tree algorithm, almost impossible to read and understand what the hell it does.
the other is a total braindead 1-liner "x = x + input[i]", rubbish performance though.
running a formal proof on these gave us 100% confidence that the complex optimised version does the damn job.
yes we still do unit tests, these are more "demo code".
now, the caveat is that you have to have a model of the "dut" (device under test) against which to compare, and if the dut is ridiculously complex then the formal model variant, which has to do the same job, ends up equally as complex (or effectively a duplicate of the dut) and the exercise is a bit of a waste of time...
...*unless*... there happens to be other implementations out there. then the proof can be run against those and everybody wins through collaboration.
now, here's why i put in the NLnet Grant request to explore going back to the mathematics of crypto-primitives.
many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because that does S-Boxes for Rijndael. they have gone mad by analysing algorithms trying to fit them to standard ISAs.
nobody does Rijndael S-Boxes any way other than 256-entry lookup tables because no standard ISA has general-purpose Galois Field Multiply.
consequently implementations in assembler get completely divorced from the original mathematics on which the cryptographic algorithm was based.
the approach i would like to take is, "hang on a minute: how far would you get if you actually added *general-purpose* instructions that *directly* provided the underlying mathematical principles, and then wrapped a Vector-Matrix Engine around them?".
would this drastically simplify algorithms to the point where *READABLE* c code compiles directly to opcodes that run screamingly fast, outperforming hand-optimised SIMD code using standard ISAs?
then, given the Formal Correctness approach above, can we verify that the mathematically-related opcodes do the job?
> (to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
nice!
> Things might have changed.
nah. this is such a complex area, run by few incumbent players, that innovation is rare. not least, innovation is different and cannot be trusted by the Foundries!
> A good RTL would embed SystemVerilog Assertions or PSL Assertions as well.
> Some formal verification tools can understand a subset of SystemVerilog Assertions / PSL assertions and validate that your RTL conformed to the assertions, which would probably help cut down on the need for RTL simulation.
interesting.
> Overall, my understanding is that smaller geometries are needed only if you want to target a really high performance / unit cost and performance / energy consumption ratios.
> That is, you would target smaller geometries for mining.
yes.
> If you need a secure tr\*sted computing module that does not need to be fast or cheap, just very accurate to the required specification, the larger geometries should be fine and you would be able to live almost entirely in RTL-land without diving into netlist and layout specifications.
hardware wallet ASICs.
i concur.
> A wrinkle here is that licenses for tools from tr\*sted vendors like Synopsys or Cadence are ***expensive***.
yes they are :) we are currently working with Sorbonne University LIP6.fr and Staf Verhaegen from Chips4Makers, trying a different approach: coriolis2.
this will do fine up to 130nm (skywater). beyond that, mmm, we need a few more years.
> What is more, you should really buy two sets of licenses, e.g. do logic synthesis with Synopsys and then formal verification with Cadence, because you do not want to fully tr\*st just one vendor.
interesting, good advice.
> Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.
sigh. i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist.
i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!
plus, they make an awful lot of money, it is good business.
> Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
ah i didn't know it could do synthesis at all! i thought it was simulation only.
> Note as well that I heard (at the time when I was in the industry) that some foundries will not even accept a netlist unless it was created by a synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor Graphics, maybe more I have forgotten since).
yes i heard this too, they don't want their time wasted: after all they only make money by selling wafers, and if they can't sell any they have to run empty wafers to keep the equipment at operating temperature.
if you book a slot 18 months in advance and the RTL doesn't work during testing 3 months before the deadline they may not be able to find someone else in time.
anything to reduce the risk there is good, so i totally get why.
thank you for the insights and the discussion, really appreciated.
l.
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68