From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <0720C223-E9DD-4E76-AD6F-0308CA5B5289@gmail.com> <7E50E1D6-3A9F-419B-B01E-50C6DE044E0F@gmail.com>
Date: Sat, 8 Mar 2014 20:29:15 +0100
From: Gustav Simonsson
To: Mike Hearn
Cc: Bitcoin Development
Subject: Re: [Bitcoin-development] New side channel attack that can recover Bitcoin keys
Content-Type: multipart/alternative; boundary=047d7bdc0cf4cbf37904f41d628e

--047d7bdc0cf4cbf37904f41d628e
Content-Type: text/plain; charset=ISO-8859-1

While there is no mention of virtualization in the side-channel article, the FLUSH+RELOAD paper [1] does discuss it and claims that the clflush instruction works not only against processes on the same OS, but also against processes in a separate guest OS if executed on the host OS (type 2 hypervisor) [2]. It also works if executed from within another guest OS (though that reduces the efficiency of the attack) [3].

Both the authors [4] and Vulnerability Note VU#976534 [5] claim that disabling hypervisor memory page de-duplication prevents the attack. This could perhaps be a first step for bitcoin companies running their software on shared hosts: demand that their allocated instances be placed on hosts with this disabled. The question is how common it is for virtualization providers to offer that as an option.
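
On a Linux/KVM host, page de-duplication is usually provided by Kernel Samepage Merging (KSM), whose state is exposed through sysfs. The following is a minimal sketch for an operator to check it, assuming the standard /sys/kernel/mm/ksm/run control file; the function name and the wording of the state descriptions are illustrative. Other hypervisors expose de-duplication differently, and a guest normally cannot see the host's setting, so this does not help a tenant verify a provider's claim.

```python
from pathlib import Path

# Standard Linux sysfs control file for Kernel Samepage Merging (KSM).
KSM_RUN = Path("/sys/kernel/mm/ksm/run")

# Meaning of the values accepted by the KSM "run" file.
KSM_STATES = {
    "0": "stopped (pages that were already merged stay merged)",
    "1": "running (pages are being de-duplicated)",
    "2": "stopped, all merged pages unmerged",
}


def ksm_state(path: Path = KSM_RUN) -> str:
    """Return a human-readable description of the KSM state on this host."""
    try:
        value = path.read_text().strip()
    except OSError:
        # File missing or unreadable: no KSM support, or not a Linux host.
        return "unavailable (no KSM support, or not a Linux host)"
    return KSM_STATES.get(value, f"unknown value {value!r}")


if __name__ == "__main__":
    print("KSM:", ksm_state())
```

Note that a value of 0 only stops the merging daemon; pages that were merged earlier remain shared until 2 is written, so 2 is the state in which no de-duplicated pages remain.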

TRESOR is only applicable if running in a non-virtualized OS [6].
While TRESOR only implements AES, it seems the approach could work for ECDSA as well: they keep the key in the four x86 debug registers for the entire machine uptime (enough room for a 256-bit privkey [7]) and use other registers when doing the actual AES ops. They use the Intel AES-NI instruction set though, and since there is no corresponding instruction set for EC, extra work would be required to manually implement the EC math in assembler.

They actually do what Mike Hearn mentioned and disable preemption in Linux (their code runs in kernel space; they patched the kernel) to ensure atomicity. Not only do they manage to protect against memory attacks (and RAM/cache timing attacks) from other processes running on the same host, but even from root on the same host (from userland, the debug registers are only accessible through ptrace, which they patched, and they also disabled LKM & KMEM).

One could imagine different levels of TRESOR-like ECDSA with different tradeoffs of complexity vs security. For example, if one is fine with keeping the privkey(s) in RAM but wants to avoid cache timing attacks, the signing could be implemented as a userspace program holding the key(s) in RAM together with a kernel module providing a syscall for signing. Signing is then run with preemption disabled, using only x86 registers for intermediate data and then using e.g. movntps [8] to write to RAM without the data being cached. The benefit of this compared with the full TRESOR approach is that it would not require a patched kernel, only a kernel module. It would also be simpler to implement than keeping the privkey in the debug registers for the entire machine uptime, especially if multiple privkeys are used. It would not protect against root though, since an adversary getting root could load their own kernel module and read the registers.

To handle multiple keys (maybe as one-time-use) and get full TRESOR benefits, one could perhaps (with the original TRESOR approach, i.e. with a patched kernel) store a BIP 0032 starting string / seed + counter in the debug registers and have the atomic kernel code generate new keys and do the signing.
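
To make the "seed + counter" idea concrete, here is a minimal sketch of the arithmetic involved, assuming the counter is used as a BIP 32 hardened child index directly below the master key. That mapping is my assumption, not something specified above, and the function names are hypothetical. In a TRESOR-like design this would have to run inside the atomic kernel code with intermediates kept in registers; the Python below only shows which values are derived from which.

```python
import hashlib
import hmac
from typing import Tuple

# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000  # BIP 32 offset for hardened child indexes


def master_from_seed(seed: bytes) -> Tuple[int, bytes]:
    """BIP 32 master key generation: returns (private key, chain code)."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key = int.from_bytes(digest[:32], "big")
    if key == 0 or key >= N:
        raise ValueError("unusable seed, pick another one")
    return key, digest[32:]


def hardened_child(parent_key: int, chain_code: bytes, counter: int) -> Tuple[int, bytes]:
    """BIP 32 hardened private child derivation at index counter + 2^31."""
    if not 0 <= counter < HARDENED:
        raise ValueError("counter out of range")
    index = HARDENED + counter
    # Hardened derivation hashes 0x00 || parent private key || index.
    data = b"\x00" + parent_key.to_bytes(32, "big") + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + parent_key) % N
    if tweak >= N or child == 0:
        raise ValueError("invalid child, skip this counter value")
    return child, digest[32:]


if __name__ == "__main__":
    # Demo seed only; a real seed must come from a secure random source.
    key, chain = master_from_seed(bytes(range(16)))
    for counter in range(3):
        child, _ = hardened_child(key, chain, counter)
        print(counter, hex(child))
```

Hardened derivation needs only HMAC-SHA512 and one addition modulo the group order, so generating the per-signature private key adds no EC math beyond what signing already requires.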

Cheers,
Gustav Simonsson

1. http://eprint.iacr.org/2013/448.pdf
2. page 1 of [1]
3. page 5 of [1]
4. page 8 (end of conclusions section) of [1]
5. http://www.kb.cert.org/vuls/id/976534
6. page 8, "3.2 Hardware compatibility", https://www.usenix.org/legacy/event/sec11/tech/full_papers/Muller.pdf
7. page 3, "2.2 Key Management" of [6]
8. page 1041 of http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf



On Thu, Mar 6, 2014 at 9:38 AM, Mike Hearn <mike@plan99.net> wrote:
> I'm wondering about whether (don't laugh) moving signing into the kernel and then using the MTRRs to disable caching entirely for a small scratch region of memory would also work. You could then disable pre-emption and prevent anything on the same core from interrupting or timing the signing operation.
>
> However I suspect just making a hardened secp256k1 signer implementation in userspace would be of similar difficulty, in which case it would naturally be preferable.


> On Wed, Mar 5, 2014 at 11:25 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
>> On Wed, Mar 5, 2014 at 2:14 PM, Eric Lombrozo <elombrozo@gmail.com> wrote:
>> > Everything you say is true.
>> >
>> > However, branchless does reduce the attack surface considerably - if nothing else, it significantly ups the difficulty of an attack for a relatively low cost in program complexity, and that might still make it worth doing.
>>
>> Absolutely. I believe these things are worth doing.
>>
>> My comment on it being insufficient was only that "my signer is
>> branchless" doesn't make other defense measures (avoiding reuse,
>> multisig with multiple devices, not sharing hardware, etc.)
>> unimportant.
>>
>> > As for uniform memory access, if we avoided any kind of heap allocation, wouldn't we avoid such issues?
>>
>> No. At a minimum, to hide a memory timing side-channel you must perform
>> no data dependent loads (e.g. no operation where an offset into memory
>> is calculated from secret data). A strategy for this is to always load
>> the same values, but then mask out the ones you didn't intend to
>> read... even that I'd worry about on sufficiently advanced hardware,
>> since I would very much not be surprised if the processor was able to
>> determine that the load had no effect and eliminate it! :)
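
The "always load the same values, then mask out the rest" strategy can be sketched as below. This only illustrates the access pattern: every table entry is read in the same order regardless of the secret index. Python gives no timing guarantees at all, so treat it as pseudocode for what a carefully checked C or assembler version would do; the function names are hypothetical.

```python
def ct_is_equal(a: int, b: int, bits: int = 32) -> int:
    """Return 1 if a == b, else 0, using arithmetic only (no branch on a or b)."""
    full = (1 << bits) - 1
    diff = (a ^ b) & full
    # In bits-wide two's complement, (diff | -diff) has its top bit set
    # exactly when diff is nonzero.
    nonzero = ((diff | (-diff & full)) >> (bits - 1)) & 1
    return nonzero ^ 1


def masked_lookup(table, secret_index: int) -> int:
    """Read every entry of table and keep only table[secret_index] via a mask."""
    result = 0
    for i, entry in enumerate(table):
        mask = -ct_is_equal(i, secret_index)  # 0, or -1 (all ones)
        result |= entry & mask
    return result


if __name__ == "__main__":
    precomputed = [11, 22, 33, 44]  # stand-in for a table of precomputed values
    print(masked_lookup(precomputed, 2))
```

The cost is one pass over the whole table per lookup, which is why implementations that use this trick tend to keep such tables small.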

>> Maybe in practice if your data dependencies end up only picking around
>> in the same cache-line it doesn't actually matter... but it's hard to
>> be sure, and unclear when a future optimization in the rest of the
>> system might leave it exposed again.
>>
>> (In particular, you can't generally write timing side-channel immune
>> code in C (or other high level language) because the compiler is
>> freely permitted to optimize things in a way that breaks the property.
>> ... It may be _unlikely_ for it to do this, but it's permitted, and
>> will actually do so in some cases, so you cannot be completely sure
>> unless you check and freeze the toolchain)
>>
>> > Anyhow, without having gone into the full details of this particular attack, it seems the main attack point is differences in how squaring and multiplication (in the case of field exponentiation) or doubling and point addition (in the case of ECDSA) are performed. I believe using a branchless implementation where each phase of the operation executes the exact same code and accesses the exact same stack frames would not be vulnerable to FLUSH+RELOAD.
>>
>> I wouldn't be surprised.
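
As an illustration of "each phase executes the exact same code", here is a sketch of a Montgomery ladder for modular exponentiation: every exponent bit costs one multiplication and one squaring, and the bit only selects operands through a masked swap. The same structure carries over to doubling and point addition on a curve. This is a sketch of the control flow only; Python's big-integer arithmetic is not constant-time, the names are hypothetical, and the caveats above about compilers and hardware still apply to any real implementation.

```python
def cswap(swap: int, a: int, b: int):
    """Return (b, a) if swap == 1 and (a, b) if swap == 0, without branching."""
    mask = -swap  # 0, or -1 (all ones)
    t = (a ^ b) & mask
    return a ^ t, b ^ t


def ladder_pow(base: int, exponent: int, modulus: int, bits: int = 256) -> int:
    """Compute base**exponent % modulus with a fixed operation sequence per bit."""
    r0, r1 = 1 % modulus, base % modulus
    # Always iterate over a fixed number of bits, including leading zeros.
    for i in reversed(range(bits)):
        bit = (exponent >> i) & 1
        r0, r1 = cswap(bit, r0, r1)
        r1 = (r0 * r1) % modulus  # one multiplication per bit
        r0 = (r0 * r0) % modulus  # one squaring per bit
        r0, r1 = cswap(bit, r0, r1)
    return r0


if __name__ == "__main__":
    p = 2**255 - 19  # an arbitrary well-known prime, used only as a demo modulus
    # Fermat inversion: x^(p-2) * x should be 1 modulo p.
    print(ladder_pow(7, p - 2, p) * 7 % p)
```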

>> _______________________________________________
>> Bitcoin-development mailing list
>> Bitcoin-development@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bitcoin-development




--047d7bdc0cf4cbf37904f41d628e--