From: Mike Hearn
To: Bitcoin Dev, Aaron Voisine
Cc: Andreas Schildbach
Date: Tue, 15 Jul 2014 14:03:36 +0200
Subject: [Bitcoin-development] BIP 38 NFC normalisation issue

[+cc aaron]

We recently added an implementation of BIP 38 (password-protected private keys) to bitcoinj. It came to my attention that the third test vector may be broken. It gives a hex version of what the NFC-normalised form of the input string should be, but this does not match the output of Java's Unicode normaliser, and in fact I can't even get Python to print the names of the characters past the embedded null. I'm curious where this normalised version came from.

Given that "pile of poo" is not a character I think any sane user would put into a passphrase, I question the value of this test vector. NFC form is intended to collapse things like combining umlaut marks onto the preceding base code point, but here we're feeding the algorithm what is basically garbage, so I'm not totally surprised that different implementations appear to disagree on the outcome.

Proposed action: we remove this test vector, as it does not represent any real-world usage of the spec. Or, if we desperately need to verify NFC normalisation, I suggest using a different, more realistic test string, like Zürich, or something written in Thai.

Test 3:

- Passphrase: ϓ␀𐐀💩 (\u03D2\u0301\u0000\U00010400\U0001F4A9; GREEK UPSILON WITH HOOK, COMBINING ACUTE ACCENT, NULL, DESERET CAPITAL LETTER LONG I, PILE OF POO)
- Encrypted key: 6PRW5o9FLp4gJDDVqJQKJFTpMvdsSGJxMYHtHaQBF3ooa8mwD69bapcDQn
- Bitcoin Address: 16ktGzmfrurhbhi6JGqsMWf7TyqK9HNAeF
- Unencrypted private key (WIF): 5Jajm8eQ22H3pGWLEVCXyvND8dQZhiQhoLJNKjYXk9roUFTMSZ4
- *Note:* The non-standard UTF-8 characters in this passphrase should be NFC normalized to result in a passphrase of 0xcf9300f0909080f09f92a9 before further processing
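
One way to check the hex value in the Test 3 note independently is to normalise the passphrase with Python's standard `unicodedata` module and compare the UTF-8 bytes. This is only a sketch: the helper names `nfc_utf8_hex` and `describe` are made up for illustration, and it assumes the passphrase is the escaped code-point sequence from the test vector, with a real U+0000. The glyphs as they appear in this mail encode U+2400 SYMBOL FOR NULL (bytes E2 90 80) rather than U+0000, so that pasted form is normalised separately for comparison.

```python
import unicodedata


def nfc_utf8_hex(passphrase: str) -> str:
    """Return the UTF-8 bytes of the NFC-normalised passphrase as lowercase hex."""
    return unicodedata.normalize("NFC", passphrase).encode("utf-8").hex()


def describe(passphrase: str) -> list:
    """List code points with their Unicode names, tolerating unnamed ones."""
    # unicodedata.name() raises ValueError for control characters such as
    # U+0000 unless a default is supplied.
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in passphrase]


# Code points exactly as escaped in Test 3 (assumption: a real NULL is intended).
escaped = "\u03D2\u0301\u0000\U00010400\U0001F4A9"

# The glyph sequence as it appears in the mail body, where the third
# character is U+2400 SYMBOL FOR NULL instead of U+0000.
pasted = "\u03D2\u0301\u2400\U00010400\U0001F4A9"

# Hex value quoted in the note of the test vector.
expected = "cf9300f0909080f09f92a9"

if __name__ == "__main__":
    for line in describe(escaped):
        print(line)
    print("escaped:", nfc_utf8_hex(escaped), nfc_utf8_hex(escaped) == expected)
    print("pasted: ", nfc_utf8_hex(pasted), nfc_utf8_hex(pasted) == expected)
    # A more everyday case: "Zu" + COMBINING DIAERESIS + "rich" composes to "Zürich".
    print("Zürich: ", nfc_utf8_hex("Zu\u0308rich"))
```

Passing a default to `unicodedata.name` matters here because control characters such as U+0000 have no name and the call raises `ValueError` without one, which may be why printing the character names stopped at the embedded null.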