From: Sjors Provoost <sjors@sprovoost.nl>
To: nullius <nullius@nym.zone>,
Bitcoin Protocol Discussion
<bitcoin-dev@lists.linuxfoundation.org>
Cc: arachnid@notdot.net
Subject: Re: [bitcoin-dev] BIP 39: Add language identifier strings for wordlists
Date: Fri, 5 Jan 2018 17:04:10 +0100
Message-ID: <BB3FA46E-AA09-4A60-9D0F-8E350015E107@sprovoost.nl>
In-Reply-To: <57f5fcd8644c6f6472cd6a91144a6152@nym.zone>
I’m not a fan of language-specific word lists within the current BIP-39 standard. Very few wallets support anything other than English, which can lead to vendor lock-in and long-term loss of funds if a rare non-English wallet disappears.
However, because people can memorize things better in their native tongue, supporting multiple languages seems quite useful.
I would prefer a new standard where words are mapped to integers rather than to a literal string. For each language a mapping from words to integers would be published. In addition to that, there would be a mapping from original-language words to matching (in terms of integer value, not meaning) English words that people can print on a sheet of A4 paper. This would allow them to enter a mnemonic into e.g. a hardware wallet that only supports English. Such lists are more likely to be around 100 years from now than some ancient piece of software.
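A minimal sketch of that index mapping in C, to make the idea concrete. Everything in it is an assumption for illustration: the four-word lists, their contents, and the function names are hypothetical, and a real list would have 2048 entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lists, for illustration only. A word's position in the
 * array is the integer it stands for. */
#define TOY_LIST_LEN 4
static const char *const toy_english[TOY_LIST_LEN] = {
    "apple", "bridge", "candle", "desert"
};
static const char *const toy_dutch[TOY_LIST_LEN] = {
    "fiets", "gracht", "klomp", "molen"
};

/* Return the integer a word maps to, or -1 if it is not in the list. */
static int word_to_index(const char *const *list, size_t len, const char *word)
{
    for (size_t i = 0; i < len; i++) {
        if (strcmp(list[i], word) == 0)
            return (int)i;
    }
    return -1;
}

/* Map a word to the word with the same integer value in another list of the
 * same length (same index, unrelated meaning). Returns NULL if unknown. */
static const char *translate_word(const char *const *from,
                                  const char *const *to,
                                  size_t len, const char *word)
{
    int idx = word_to_index(from, len, word);
    return idx < 0 ? NULL : to[idx];
}
```

With these toy lists, translate_word(toy_dutch, toy_english, TOY_LIST_LEN, "klomp") yields "candle", which is what the printed sheet would let a user look up by hand.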
This would not work with the current BIP-39 (duress) password, but this feature could be replaced by appending words (with or without a checksum for that addition).
A replacement for BIP-39 would be a good opportunity to produce a better English dictionary, as Nick Johnson suggested a while ago:
• all words are 4-8 characters
• all 4-character prefixes are unique (very useful for hardware wallets)
• no two words have edit distance < 2
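The three constraints above can be checked mechanically. The following C sketch is one way to do it, assuming plain ASCII words and taking "edit distance" to mean Levenshtein distance; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MIN_WORD_LEN 4
#define MAX_WORD_LEN 8
#define PREFIX_LEN   4

/* Levenshtein distance. Both inputs must be at most MAX_WORD_LEN bytes. */
static size_t edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t prev[MAX_WORD_LEN + 1], cur[MAX_WORD_LEN + 1];

    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            /* substitution (free when the characters match) */
            size_t best = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1u : 0u);
            if (prev[j] + 1 < best)    /* deletion */
                best = prev[j] + 1;
            if (cur[j - 1] + 1 < best) /* insertion */
                best = cur[j - 1] + 1;
            cur[j] = best;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* True if every word is 4-8 bytes long, all 4-byte prefixes are unique,
 * and every pair of words has edit distance of at least 2. */
static bool wordlist_ok(const char *const *list, size_t len)
{
    /* Check lengths first so edit_distance never overruns its buffers. */
    for (size_t i = 0; i < len; i++) {
        size_t n = strlen(list[i]);
        if (n < MIN_WORD_LEN || n > MAX_WORD_LEN)
            return false;
    }
    for (size_t i = 0; i < len; i++) {
        for (size_t j = i + 1; j < len; j++) {
            if (strncmp(list[i], list[j], PREFIX_LEN) == 0)
                return false;
            if (edit_distance(list[i], list[j]) < 2)
                return false;
        }
    }
    return true;
}
```

The pairwise loop is quadratic, which is acceptable for a one-off check of a 2048-word list. Note that the prefix rule does not imply the distance rule: "card" and "cart" have different 4-character prefixes but an edit distance of 1.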
Wallets need to be able to distinguish between the old and new standard, so un-upgraded BIP-39 wallets should consider all new mnemonics invalid. At the same time, some new wallets may not wish to support BIP-39; they shouldn't be burdened with storing the old word list.
A solution is to sort the new word list such that reused words appear first. When generating a mnemonic, at least one word unique to the new list must be present. A wallet then only needs to know the index of the last word that overlaps with BIP-39. It rejects a proposed mnemonic if none of its words has a higher index.
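The rejection rule can be sketched in a few lines of C. This assumes the mnemonic has already been converted to word indices in the new list; the function name and the threshold used in the example are hypothetical, since the real threshold depends on how many words the final list shares with BIP-39.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if at least one word index lies beyond the last word shared with the
 * old list, i.e. the mnemonic contains a word unique to the new list. */
static bool has_new_only_word(const unsigned *indices, size_t count,
                              unsigned last_shared_index)
{
    for (size_t i = 0; i < count; i++) {
        if (indices[i] > last_shared_index)
            return true;
    }
    return false;
}
```

For example, with a hypothetical threshold of 1199, the indices {3, 1199, 42} would be rejected and {3, 1200, 42} accepted. A generator could apply the same check and draw fresh entropy whenever it fails.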
For my above point and some related ideas, see: https://github.com/satoshilabs/slips/issues/103
Sjors
> On 5 Jan 2018, at 14:58, nullius via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
>
> I propose and request as an enhancement that the BIP 39 wordlist set should specify canonical native language strings to identify each wordlist, as well as short ASCII language codes. At present, the languages are identified only by their names in English.
>
> Strings properly vetted and recommended by native speakers should facilitate language identification in user interface options or menus. Specification of language identifier strings would also promote interface consistency between implementations; this may be important if a user creates a mnemonic in Implementation A, then restores a wallet using that mnemonic in Implementation B.
>
> As an independent implementer who does not know *all* these different languages, I monkey-pasted language-native strings from a popular wiki site. I cannot guarantee that they be all accurate, sensible, or even non-embarrassing.
>
> https://github.com/nym-zone/easyseed/blob/1a6e48bbdac9366d9d5d1912dc062dfc3f0db2c6/easyseed.c#L99
> ```
> LANG(english, u8"English", "en", ascii_space ),
> LANG(chinese_simplified, u8"汉语", "zh-CN", ascii_space ),
> LANG(chinese_traditional, u8"漢語", "zh-TW", ascii_space ),
> LANG(french, u8"Français", "fr", ascii_space ),
> LANG(italian, u8"Italiano", "it", ascii_space ),
> LANG(japanese, u8"日本語", "ja", u8"\u3000" ),
> LANG(korean, u8"한국어", "ko", ascii_space ),
> LANG(spanish, u8"Español", "es", ascii_space )
> ```
>
> Per the comment at #L85 of the quoted file, I also know that my short identifiers for Chinese, “zh-CN” and “zh-TW”, are imprecise at best, insofar as Hong Kong uses Traditional and overseas Chinese may use either. For differentiating the two Chinese writing variants, are there any appropriate standardized or customary short ASCII language IDs similar to ISO 3166-1 alpha-2 which are purely linguistic, and not fit to present-day political boundaries?
>
> My general suggestion is that the specification of appropriate strings in bitcoin:bips/bip-0039/bip-0039-wordlists.md be made part of the process for accepting new wordlists. My specific request is that such strings be ascertained for the wordlists already existing, preferably from the persons involved in the original pull requests therefor.
>
> Should this proposal be “concept ACKed” by appropriate parties, then I may open a pull request suggesting an appropriate format for specifying this information in the repository. However, I must leave the vetting of appropriate strings to native speakers or experts in the respective languages.
>
> Prior references: The wordlist additions at PRs #92, #130 (Japanese); #100 (Spanish); #114 (Chinese, both variants); #152 (French); #306 (Italian); #570 (Korean); #621 (Indonesian, *proposed*, open).