Date: Thu, 11 Feb 2021 08:20:54 +0000
From: ZmnSCPxj
To: Luke Kenneth Casson Leighton
Cc: Bitcoin Protocol Discussion
Subject: Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs

Good morning Luke,

> > (to be fair, there were
> > tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
>
> nice!

It should be possible for a tool to be developed to parse a Verilog RTL design, then generate a new version of it with one change.
Then you could add some automation to run a set of testcases around mutated variants of the design.

For example, it could create a "wrapper" module that connects to an unmutated differently-named version of the design, and various mutated versions, wire all their inputs together, then compare outputs.
If the testcase could trigger an output of a mutated version to be different from the reference version, then we would consider that mutation covered by that testcase.
Possibly that could be done with Verilog-2001 file writing code in the wrapper module to dump out which mutations were covered, then a summary program could just read in the generated file.
Or Verilog plugins could be used as well (Icarus supports this, that is how it implements all `$` functions).

A drawback is that just because an output is different does not mean the testcase actually ***checks*** that output.
If the testcase does not detect the diverging output, it is still not properly covering that mutation.
The point of this is to check coverage of the tests.

Not sure how well this works with formal validation.

> > Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.
>
> sigh. i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist.
>
> i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

Well, that surprises me as well.
On the other hand, smaller technologies consistently have lower raw output current driving capability due to the smaller size, and as trace width goes down and frequency goes up they stop acting like ideal 0-impedance traces and start acting more like transmission lines.
So I suppose at some point something like that would occur and I should not actually be surprised.
(Maybe I am more surprised that it reached that level at that technology size, I would have thought 33% at 7nm.)

In the modules where we were doing manual netlist+layout, we used inverting buffers instead (slightly smaller than non-inverting buffers; in most technologies a non-inverting buffer is just an inverter followed by an inverting buffer).
This was an advantage of manual design, since it looks like synthesis tools are not willing to invert the contents of intermediate flip-flops even if it could give a theoretical speed+size advantage to use an inverting buffer rather than a non-inverting one (it looks like synthesis optimization starts at the output of flip-flops and ends at their input, so a manual designer could achieve slightly better performance if they were willing to invert an intermediate flip-flop).
Another was that inverting latches were smaller in the technology we were using than non-inverting latches, so it was perfectly natural for us to use an inverting latch and an inverting buffer on those parts where we needed higher fan-out (it was equivalent to a "custom" latch that had higher-than-normal driving capability).
Scan chain test generation was impossible though, as those require flip-flops, not latches.
Fortunately this was "just" deserialization of high-frequency low-width data with no transformation of the data (that was done after the deserialization, at lower clock speeds but higher data width, in pure RTL so flip-flops), so it was judged acceptable that it would not be covered by scan chain, since scan chain is primarily for testing combinational logic between flip-flops.
So we just had flip-flops at the input, and flip-flops at the output, and forced all latches to pass-through mode, during scan mode.
We just needed to have enough coverage to uncover stuck-at faults (which was still a pain, since additional test vectors slow down manufacturing so we had to reduce the test vectors to the minimum possible) in non-scan-mode testing.

Man, making ASICs was tough.

> plus, they make an awful lot of money, it is good business.
>
> > Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
>
> ah i didn't know it could do synthesis at all! i thought it was simulation only.

Icarus was the only open-source synthesis tool I could find back then, and it dropped synthesis capability fairly early due to maintenance burden (I never managed to get the old version with synthesis compiled and never managed actual synthesis on it, so my knowledge of it is theoretical).

There is an argument that open-source software is not truly open-source unless it can be compiled by open-source compilers or executed by open-source interpreters.
Similarly, I think open-source hardware RTL designs are not truly open-source if there are no open-source synthesis tools that can synthesize it to netlist and then lay it out.

Icarus can interpret most Verilog RTL designs, though.
However, at the time I left, I had already mandated that new code should use `always_comb` and `always_ff` (previously I had mandated that new code should use `always @*` for combinational logic) and was encouraging my subordinates to use loops and `generate`.
Icarus did not support `always_comb` and `always_ff` at the time (though it worked perfectly fine with loops and `generate`).
In addition, at the time, we (actually just me in that company haha) were dabbling in object-oriented testing methodologies (which Icarus has no plans on ever implementing, which is understandable since it is a massive increase in complexity, it is much much harder than the scheduling shenanigans of `always_comb` and the "just treat it as `always`" of `always_ff`).

(Particularly, you need object-oriented testbenches since SystemVerilog includes a fuzz-testing framework to randomize fields of objects according to certain engineer-provided constraints, and then you would use those object fields to derive the test vectors your test framework would feed into the DUT. This was a massive increase in code coverage for a largish up-front cost, but once you built the test framework you could just dump various constraints on your test specification objects; I actually caught a few bugs that we would not have otherwise found with our previous checklist-based testing methodology.)

(Unfortunately it turned out that it required a more expensive license and I ended up hogging the only one we had of that more expensive license (which, if I remember correctly, was the same license needed for formal verification of netlist<->RTL equivalence) for this, which killed enthusiasm for this technique, sigh. This is another argument for getting open-source hardware design tools developed; not much sense in having open-source RTL for a crypto device if you have to pay through the nose for a license just to synthesize it, never mind the manufacturing cost.)
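
As a rough illustration of the constrained-random style described above, here is a minimal SystemVerilog sketch. The class `deser_test_spec`, its fields, and the constraint ranges are hypothetical and only meant to show the shape of a "test specification object". It needs a simulator that supports SystemVerilog classes and `randomize()`, which, as noted above, Icarus does not.

```systemverilog
// Hypothetical test specification object; every name and range here is an
// illustrative assumption, not taken from any real project.
class deser_test_spec;
  rand bit [7:0]    payload;        // data word to feed into the DUT
  rand int unsigned idle_cycles;    // gap before the word is sent
  rand bit          inject_glitch;  // whether to disturb the input

  // Engineer-provided constraints that shape the random stimulus.
  constraint c_idle   { idle_cycles inside {[0:15]}; }
  constraint c_glitch { inject_glitch dist {1'b0 := 9, 1'b1 := 1}; }
endclass

module tb_random;
  deser_test_spec spec;

  initial begin
    spec = new();
    repeat (5) begin
      // randomize() returns 0 if the constraints cannot be satisfied.
      if (!spec.randomize()) $fatal(1, "randomize() failed");
      // A real framework would derive DUT test vectors from these fields.
      $display("payload=%h idle=%0d glitch=%b",
               spec.payload, spec.idle_cycles, spec.inject_glitch);
    end
    // Inline constraints narrow the same object down to one corner case.
    if (!spec.randomize() with { idle_cycles == 0; inject_glitch == 1; })
      $fatal(1, "randomize() with inline constraints failed");
    $display("corner: payload=%h idle=%0d glitch=%b",
             spec.payload, spec.idle_cycles, spec.inject_glitch);
    $finish;
  end
endmodule
```

The up-front cost mentioned above is everything this sketch leaves out: the driver that turns each randomized object into pin-level stimulus and the checker that decides whether the DUT response is correct.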
-----------------------

Another point to ponder is test modes.

In mass production you **need** test modes.
There will always be some number of manufacturing defects because even the cleanest of cleanrooms *will* have a tiny amount of contaminants (what can go wrong will go wrong).
Test modes are run in manufacturing to filter out chips with failing circuitry due to contamination.

Now, a typical way of implementing test modes is to have a special command sent over, say, the "normal" serial port interface of a chip, which then enters various test modes to allow, say, scan chain testing.

Of course, scan chain testing is done by pushing test vectors into all flip-flops, and then the test is validated by pulsing global clock once (often the test mode forces all flip-flops on the same clock), then pulling data from all flip-flops to verify that all the circuitry works as designed.
The "pulling data from all flip-flops" is of course just another way of saying that all mass-produced chips have a way of letting ***anyone*** exfiltrate data from their flip-flops via test modes.

Thus, for a secure environment, you need to ensure that test modes cannot be entered after the device enters normal operation.
For example, you might have a dedicated pad which is normally pulled-down, but if at reset it is pulled up, the device enters test mode.
If at reset the pad is pulled down, the device is in normal mode and even if the pad is pulled up afterwards the device will not enter test mode.
This ensures that only reset data can be read from the device, without possibility of exfiltration of sensitive (key material or midstate) data.
The pad should also not be exposed as a package pinout except perhaps on DS and ES packages, and the pulldown resistor has to be on-chip.

As an additional precaution, we can also create a small secure memory (maybe 256 octet addressable would be more than enough).
It is possible to exempt flip-flops from scan chain generation (usually by explicitly instantiating flip-flops in a separate module and telling post-synthesis tools to exempt the module from scan chain synthesis).
This gives an extra layer of protection against test mode accessing sensitive data; even if we manage to screw up test mode and it is possible to force reset on the test mode circuit without resetting the rest of the design, sensitive data is still out of the scan chain.

Of course, since they are not on scan, it is possible they have undetectable manufacturing defects, so you would need to use some kind of ECC, or better triple-redundancy best-of-three, to protect against manufacturing defects on the non-scan flip-flops.
Fortunately non-scan flip-flops are often a good bit smaller than scan flip-flops, so the redundancy is not so onerous.
Since the ECC / best-of-three circuit itself would need to be tested, you would multiplex their inputs: in normal mode they get inputs from the non-scan-chain flip-flops, in test mode they get inputs from separate scan-chain flip-flops, so that the ECC / best-of-three circuit is testable at scan mode.
You would also need a separate test of the secure memory, this time running in normal mode with a special test program in the CPU, just in case.

Finally, you would explicitly lay them out "distributed" around the chip, since manufacturing defects tend to correlate in space (they are usually from dust, and dust particles can be large relative to cell size); you do not want all three of the best-of-three to have manufacturing defects.
For example, you could have a 256 x 8 non-scan-chain flip-flop module, instantiate three of those, and explicitly place them in corners of the digital area, then use a best-of-three circuit to resolve the "correct" value.
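
To make the best-of-three idea concrete, here is a minimal sketch of a bitwise majority voter with a tiny self-checking testbench. The module name `vote3`, the port names, and the test values are assumptions for illustration; the three memory copies, the scan-mode input multiplexing, and the physical placement are not shown.

```systemverilog
// Hypothetical best-of-three voter: each output bit takes the value that at
// least two of the three redundant copies agree on.
module vote3 #(parameter WIDTH = 8) (
  input  wire [WIDTH-1:0] a, b, c,   // read data from the three copies
  output wire [WIDTH-1:0] q          // resolved "correct" value
);
  assign q = (a & b) | (b & c) | (a & c);
endmodule

module tb_vote3;
  reg  [7:0] a, b, c;
  wire [7:0] q;

  vote3 #(.WIDTH(8)) dut (.a(a), .b(b), .c(c), .q(q));

  initial begin
    // All three copies agree.
    a = 8'h3C; b = 8'h3C; c = 8'h3C;
    #1 if (q !== 8'h3C) $display("FAIL: clean copies, q=%h", q);

    // One copy is completely corrupted; the other two outvote it.
    b = 8'hFF;
    #1 if (q !== 8'h3C) $display("FAIL: one bad copy, q=%h", q);

    // Two copies have faults in different bit positions; voting is per bit,
    // so each faulty bit is still outvoted by the two good copies of that bit.
    a = 8'h3D; b = 8'h3C; c = 8'h7C;
    #1 if (q !== 8'h3C) $display("FAIL: faults in different bits, q=%h", q);

    $display("vote3 checks done");
    $finish;
  end
endmodule
```

The voter cannot help when two copies are wrong in the same bit position, which is why spreading the copies apart physically matters.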
The test mode circuit itself could ensure that the device enters test mode if and only if the secure memory contains all 0 data after the test mode circuit is reset.
For example, the 256 x 8 non-scan-chain flip-flop module could have a large OR circuit that ORs all the flip-flops, then outputs a single bit, `in_use`, that is the OR of all the flip-flop contents.
Then the test mode circuit gets the `in_use` outputs of the three secure flip-flop modules, and if at reset any of them are `1` then it will refuse to enter test mode even if the test mode pad is pulled high.
This ensures that even if an attacker is somehow able to reset *only* the test mode circuit (this is basic engineering, always assume something will go wrong), if the secure memory has any non-0 data (we presume it resets to 0), the device will still not enter test mode.

Of course, if the secure memory itself is accessible from the CPU, then it remains possible that a CPU program is reading from the secure area, keeping raw data in CPU registers, from which a test mode might be able to extract it if the device is somehow forced into test mode even after normal mode.
You could redesign your implementations of field multiplication and SHA midstate computation so that they directly read from the secure memory and write to the secure memory without using any flip-flops along the way, and have only the cryptographic circuit have access to the secure memory.
That way there is reduced possibility that intermediate flip-flops (that are part of the scan chain) outside the secure memory hold sensitive key material or midstate data.
You would need to use a custom bus with separate read and write addresses, and non-pipelined unbuffered access, and since you want to distribute your secure memory physically distant, that translates to wide and long buses (it might be better to use 64 x 32 or 32 x 64 addressable memories, to increase what the cryptographic circuit has access to per clock cycle) screwing with your layout, and probably having to run the secure memory + crypto circuit at a ***much*** slower clock domain (but more secure is a good tradeoff for slowness).
Of course, that is a major design headache (the crypto circuit has to act mostly as a reduced-functionality processor), so you might just want to have the CPU directly access the secure memory and in early boot poke a `0x01` in some part of the memory, in the hope that the `in_use` flag in the previous paragraph is enough to suppress test modes from exfiltrating CPU registers.

Do note that with enough power-cycles and ESD noise you can put digital circuitry into really weird and unexpected states (seen it happen, though fairly hard to replicate, we had an ESD gun you could point at a chip to make it go into weird states), so being extra paranoid about test modes is important.
What can go wrong will go wrong!

In particular with "`TESTMODE_PAD` is only checked at reset" you would have to store `TESTMODE` in a non-scan flip-flop, and with enough targeted ESD that flip-flop can be jostled, setting `TESTMODE` even after normal operation.

You might instead want to use, say, a byte pattern instead of a single bit to represent `TESTMODE`, so the `TESTMODE` register has to have a specific value such as `0xA5`, so that targeted ESD has to be very lucky in order to force your device into test mode.
For example, since you need to check the `TESTMODE` pad at reset anyway, you could do something like this:

    input CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2;
    output reg TESTMODE;

    wire in_use = IN_USE0 || IN_USE1 || IN_USE2;

    /* 8'h00 = just reset, 8'hA5 = test mode, anything else = locked out */
    reg [7:0] testmode_ff;
    wire [7:0] next_testmode_ff =
        (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
            ((TESTMODE_PAD && !in_use) ? 8'hA5 :
             /*otherwise*/               8'h5A) :
        /*otherwise*/ testmode_ff ;
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) testmode_ff <= 8'h00;
        else testmode_ff <= next_testmode_ff;
    end

    wire next_TESTMODE = (testmode_ff == 8'hA5);
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) TESTMODE <= 1'b0;
        else TESTMODE <= next_TESTMODE;
    end

Do note that the `TESTMODE` is a flip-flop, since you do ***not*** want glitches on the `TESTMODE` signal line; it would be horribly unsafe to output it from combinational circuitry directly, please do not do that.
Of course that flip-flop can instead be the target of ESD gunnery, but since you need many clock pulses to read the scan chain, it should with good probability also get set to `0` on the next clock pulse and leave test mode (and probably crash the device as well until full reset, but this "fails safe" since at least sensitive data cannot be extracted).

`TESTMODE` has no feedback, thus cannot be stuck in a state loop.
`testmode_ff` *can* be stuck in a state loop, but that is deliberate, as it would "fail safe": if it gets a value other than `0xA5`, it would not enter test mode (and if it enters `0xA5` it can easily leave test mode by either `TESTMODE_PAD` or `in_use`).

(Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)
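
To see the lock-out behaviour in simulation, here is a small sketch that wraps the logic above in a module and drives it from a testbench. The module name `testmode_ctrl`, the testbench, and its timing are assumptions added for illustration, and the reset literal is written as `8'h00`. It models only clean reset and pad sequences, not ESD upsets.

```systemverilog
// The test-mode logic from above, wrapped in a module (name is hypothetical).
module testmode_ctrl (
  input  wire CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2,
  output reg  TESTMODE
);
  wire in_use = IN_USE0 || IN_USE1 || IN_USE2;

  reg  [7:0] testmode_ff;
  wire [7:0] next_testmode_ff =
    (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
      ((TESTMODE_PAD && !in_use) ? 8'hA5 : 8'h5A) :
      testmode_ff;

  always_ff @(posedge CLK, negedge RESET_N) begin
    if (!RESET_N) testmode_ff <= 8'h00;
    else          testmode_ff <= next_testmode_ff;
  end

  always_ff @(posedge CLK, negedge RESET_N) begin
    if (!RESET_N) TESTMODE <= 1'b0;
    else          TESTMODE <= (testmode_ff == 8'hA5);
  end
endmodule

module tb_testmode;
  reg  CLK = 1'b0, RESET_N = 1'b0, PAD = 1'b0, USE0 = 1'b0;
  wire TESTMODE;

  testmode_ctrl dut (
    .CLK(CLK), .RESET_N(RESET_N), .TESTMODE_PAD(PAD),
    .IN_USE0(USE0), .IN_USE1(1'b0), .IN_USE2(1'b0), .TESTMODE(TESTMODE));

  always #5 CLK = ~CLK;

  // Wait for both flip-flop stages to update, then compare against `want`.
  task check_testmode(input want);
    begin
      repeat (3) @(negedge CLK);
      if (TESTMODE !== want) $display("FAIL at %0t: TESTMODE=%b", $time, TESTMODE);
      else                   $display("ok   at %0t: TESTMODE=%b", $time, TESTMODE);
    end
  endtask

  initial begin
    PAD = 1'b1;                      // pad held high while in reset
    #12 RESET_N = 1'b1;
    check_testmode(1'b1);            // device enters test mode
    USE0 = 1'b1;                     // secure memory reports non-zero contents
    check_testmode(1'b0);            // device leaves test mode
    USE0 = 1'b0;
    check_testmode(1'b0);            // and cannot re-enter without a reset
    RESET_N = 1'b0; PAD = 1'b0;      // second reset, this time with pad low
    @(negedge CLK) RESET_N = 1'b1;
    @(negedge CLK) PAD = 1'b1;       // raising the pad after reset is ignored
    check_testmode(1'b0);
    $finish;
  end
endmodule
```

A simulator in SystemVerilog mode is needed for `always_ff`. An upset experiment could be approximated by forcing `testmode_ff` to arbitrary values from the testbench and confirming that only `8'hA5` ever raises `TESTMODE`.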
If you are really going to open-source the hardware design then the layout is also open and attackers can probably target specific chip area for ESD pulse to try a flip-flop upset, so you need to be extra careful.
Note as well that even closed-source "secure" elements can be reverse-engineered (I used to do this in the IC design job as a junior engineer, it was the sort of shitty brain-numbing work forced on new hires), so security-by-obscurity does have a limit as well; it should be possible to try to figure out the testmode circuitry on "secure" elements and try to get targeted ESD upsets at flip-flops on the testmode circuit.

Test mode design is something of an arcane art, especially if you are trying to build a security device: on the one hand you need to ensure you deliver devices without manufacturing defects, on the other hand you need to ensure that the test mode is not entered inadvertently by strange conditions.

In general, because test modes are such a pain to deal with securely, and are an absolute necessity for mass production, you should assume that any "secure" chip can be broken by physical access and shooting short-range ESD pulses at it to try to get it into some test mode, unless it is openly designed to prevent test mode from persisting after entering normal mode, as above.
(No idea how that ESD gun thing worked or what it was formally called, we just called it the ESD gun. It was an amusing toy: you point it at the DUT and pull the trigger and suddenly it would switch modes. This of course was a bad thing, since you want to make sure that as much as possible such upsets do not cause the chip to enter an irrecoverable mode, but it was an amusing thing to do still. We even had small amounts of flash memory containing register settings that we would load into the settings registers periodically at the end of each display frame to protect against this kind of ESD gun thing, since the flip-flops backing the settings registers were vulnerable to it and we needed a way to preserve the settings of the customer for the IC; the expected effect would be to cause the display to flicker.)

Regards,
ZmnSCPxj