From: Brooks Boyd <boydb@midnightdesign.ws>
To: bitcoin-development@lists.sourceforge.net
Subject: Re: [Bitcoin-development] BIP39 word list
Date: Fri, 1 Nov 2013 15:14:44 -0500
Message-ID: <CANg-TZC2NHfGR3mfm4VuuZMbwxkJzP69OmWhLvOD2Zq8GWejnw@mail.gmail.com>
I was inspired to join the mailing list to comment on some of these
discussions about BIP39, which I think will have great use in the Bitcoin
community and outside it as a way to transcribe binary data.
One thought I had, as the discussions about similar characters lead to
words being culled from the list: culling only helps to validate input; it
does not help the user recover when the input is incorrect.
For example, if both "cat" and "eat" were in the word list, and someone
wrote down "eat" but later misread it and typed "cat" into the
translator, the result would be a checksum error: "cat" maps to a different
number, so the checksum would fail.
As it currently stands, "cat" would not be a valid word ("eat" is the real
word, and no number maps to "cat"), so the translator can throw a
different, more helpful error (e.g. "'cat' isn't a valid word choice"),
but that still doesn't get the user to the proper translation.
What if the wordlist included those "words that are so similar to
each other that we only kept one of them" and had them all refer to the
same number? I propose that the wordlist allow multiple words
on a single line, with the first word on the line being the "primary" or
"real" word to be used, and the other similar words included so that a
translation program, if it wanted to assist the user, could fix their input
for them (verbosely or not), along the lines of "'cat' isn't a valid word
choice; assuming you meant 'eat', which is valid". You might still hit a
checksum error if that similar word is still the wrong word. As it
stands now, I know you culled a bunch of words from the wordlist as "too
similar", but if I want to help the user fix a bad input, I need to
write a translation program with a full English dictionary alongside the
BIP39 dictionary.
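A rough sketch of how a translator might read such a list, assuming one
row per number with whitespace-separated words. The row layout, the
function names parse_wordlist and resolve, and the sample rows are all
hypothetical and not part of BIP39:

```python
def parse_wordlist(lines):
    """Parse rows of 'primary alt1 alt2 ...' (hypothetical layout).

    Returns (primaries, lookup): primaries[i] is the primary word for
    number i, and lookup maps every word on row i (primary or alternate)
    to i.
    """
    primaries = []
    lookup = {}
    for index, line in enumerate(lines):
        words = line.split()
        if not words:
            raise ValueError(f"row {index} is empty")
        primaries.append(words[0])
        for word in words:
            if word in lookup:
                # A word listed twice would make the correction ambiguous.
                raise ValueError(f"{word!r} appears on more than one row")
            lookup[word] = index
    return primaries, lookup


def resolve(word, primaries, lookup):
    """Return (number, primary_word, note) for a word the user typed.

    note is None for a primary word, or a message describing the
    assumed correction when an alternate was typed.
    """
    if word not in lookup:
        raise KeyError(f"{word!r} isn't a valid word choice")
    index = lookup[word]
    primary = primaries[index]
    note = None
    if word != primary:
        note = (f"{word!r} isn't a valid word choice; "
                f"assuming you meant {primary!r}, which is valid")
    return index, primary, note


# Made-up rows for illustration only.
primaries, lookup = parse_wordlist(["eat cat", "dog", "lamp"])
number, primary, note = resolve("cat", primaries, lookup)
print(number, primary, note)
```

The checksum would still be verified afterwards, since the corrected
word can be the wrong one.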
I'd be willing to create a pull request for such an update, but before I
delve into that, does this sound like a good idea? I could see it turning
into a slippery slope if every number in the 2048 set had a dozen word
variations (misspellings, similar words, slang terms for the real word,
etc.), which would make it hard to decide how similar is similar enough to
be added as an alternate. The standard would also need to be clear that
when translating binary to words, you only use the "main" word for that
row, not any of the variations.
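To illustrate that last rule, here is a minimal sketch in which
alternates are accepted on input but only the first word of a row is
ever produced on output. ROWS, words_to_indices and indices_to_words
are hypothetical names, and the rows are made-up samples:

```python
# Hypothetical rows: the first word is the primary, the rest are alternates.
ROWS = ["eat cat", "dog bog", "lamp"]


def words_to_indices(words, rows=ROWS):
    """Words -> numbers; any word on a row maps to that row's number."""
    lookup = {}
    for index, row in enumerate(rows):
        for word in row.split():
            lookup[word] = index
    return [lookup[w] for w in words]  # raises KeyError on an unknown word


def indices_to_words(indices, rows=ROWS):
    """Numbers -> words; only the primary (first) word is ever emitted."""
    return [rows[i].split()[0] for i in indices]


typed = ["cat", "lamp", "bog"]
indices = words_to_indices(typed)
print(indices)                    # [0, 2, 1]
print(indices_to_words(indices))  # ['eat', 'lamp', 'dog']
```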
MidnightLightning
> I've just pushed updated wordlist which is filtered to similar characters
> taken from this matrix.
> BIP39 now consider following character pairs as similar:
> similar = (
> ('a', 'c'), ('a', 'e'), ('a', 'o'),
> ('b', 'd'), ('b', 'h'), ('b', 'p'), ('b', 'q'), ('b', 'r'),
> ('c', 'e'), ('c', 'g'), ('c', 'n'), ('c', 'o'), ('c', 'q'),
> ('c', 'u'),
> ('d', 'g'), ('d', 'h'), ('d', 'o'), ('d', 'p'), ('d', 'q'),
> ('e', 'f'), ('e', 'o'),
> ('f', 'i'), ('f', 'j'), ('f', 'l'), ('f', 'p'), ('f', 't'),
> ('g', 'j'), ('g', 'o'), ('g', 'p'), ('g', 'q'), ('g', 'y'),
> ('h', 'k'), ('h', 'l'), ('h', 'm'), ('h', 'n'), ('h', 'r'),
> ('i', 'j'), ('i', 'l'), ('i', 't'), ('i', 'y'),
> ('j', 'l'), ('j', 'p'), ('j', 'q'), ('j', 'y'),
> ('k', 'x'),
> ('l', 't'),
> ('m', 'n'), ('m', 'w'),
> ('n', 'u'), ('n', 'z'),
> ('o', 'p'), ('o', 'q'), ('o', 'u'), ('o', 'v'),
> ('p', 'q'), ('p', 'r'),
> ('q', 'y'),
> ('s', 'z'),
> ('u', 'v'), ('u', 'w'), ('u', 'y'),
> ('v', 'w'), ('v', 'y')
> )
> Feel free to review and comment current wordlist, but I think we're
> slowly moving forward final list.
> slush