From: Aneesh Karve <aneesh.karve@gmail.com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: [bitcoindev] BIP-? : Free seed mnemonics for steganography and attack-resistance
Date: Fri, 7 Jun 2024 19:40:42 -0700 (PDT) [thread overview]
Message-ID: <de66201a-b281-4a0c-a483-dc2ffd6978b2n@googlegroups.com> (raw)
[-- Attachment #1.1: Type: text/plain, Size: 13494 bytes --]
By adding a BIP-39 language called "Free" we can have nice things from
PBKDF2(), including offline and steganographic seeds, in a way that's fully
compatible with BIP-39. For instance users can store strong seeds in an
easy-to-conceal deck of cards.
Proposal inline below, with link to a reference implementation in Python.
Online spec
here https://github.com/akarve/bip-keychain/blob/main/pre-bips/steganographic-seeds.md.
I could use your help on a proper seed complexity measure.
Let me know your thoughts.
# Free seed mnemonics for steganography and attack-resistance
> "Anyone who considers arithmetical methods of producing random digits
is,
> of course, in a state of sin." —John Von Neumann
# Abstract
Bitcoin seed mnemonics face attackers ranging from casual thieves to state
actors.
We propose backward-compatible, non-breaking changes to BIP-39 master seed
derivation
to unlock broader mnemonic options
from PBKDF2().
Users of this BIP can generate and store seeds offline with common physical
objects
that are plausibly deniable: playing cards, chess boards, and paper napkins
to name
a few. As a result seed mnemonics gain greater portability, memorability,
entropy,
and steganography as compared to BIP-39 mnemonics.
# Motivation
BIP-39 mnemonic seed phrases have the following shortcomings:
1. BIP-39 seed words, if found by an attacker, are instantly recognizable
as a Bitcoin passphrase to be scrutinized or seized outright.
1. BIP-39 "mnemonics" are not particularly easy to remember.
1. Computing BIP-39's checksum bits necessitates a computer, making offline
seeds impossible.
1. Some hardware vendors support independent sources of entropy such as die
rolls
but, unfortunately for the security, convenience, and trust of the users,
vary
in how they convert user entropy to the proper binary seed.
1. Users are required to run their own vendor-specific code to verify
that the
vendor has actually used their provided entropy. Said verification
often _still_
requires blind trust (how does the user know that entropy produced the
right
binary seed?) and is prohibitively technical for many users.
1. It is cumbersome to manually enter the results of 100 six-sided die
rolls,
the minimum number of rolls to surpass 256 bits of entropy.
1. Die rolls are poor for storing secrets since there are usually fewer
dice
than rolls and since dice are easily mixed up.
> Compare the effort and portability of these 100 rolls
> to the far easier and more portable shuffled deck of cards.
The above weaknesses cause all Bitcoin wallets to lose,
**due to no fundamental limitation whatsoever**,
the following benefits:
1. Portability in physical media.
1. Portability and memorability in the human brain.
1. Repudiation in high-risk situations where individuals or the Bitcoin
protocol
are under attack.
1. The ability to generate, with no electronics, a cryptographically
strong seed that will later work with many hardware wallets.
1. Moore's-law-resistant mnemonics that encode far more than 256 bits of
entropy.
> Although more than 256 bits of entropy exceeds today's ECDSA private
keys,
> it rightfully leaves the door open for stronger keys in the future
> and further permits today's mnemonics to be repurposed, without
change,
> for tomorrow's larger keys.
**As result, Bitcoin users seeking to evade oppressive governments and
other attackers
have fewer defenses than the protocol naturally affords.**
Importantly, the above weaknesses can be remedied with
_little to no change to the BIP-39 seed phrase implementations_ already
ubiquitous
in hardware and software wallets.
# Risks, remedies, and alternatives
## Risks
1. Giving users more sources and options for mnemonics increases the
risk that these users provide weak inputs that contain too little entropy
or weak
entropy (e.g. common phrases).
> Implementers must mitigate this risk with an easy-to-implement entropy
> measure and message to the user.
1. BIP-39 includes checksum bits in the final word, offering some protection
against erroneous entry. The present proposal eliminates both the
advantages (integrity)
and disadvantages (cannot be computed by hand) of the BIP-39 checksum bits
in favor of a much broader set of steganographic mnemonics that can be
stored,
generated, and carried in situations of urgency and scarcity.
> BIP-39 checksum validation _shall remain in place_ (unchanged) for
BIP-39
> mnemonics.
> Advanced users might choose to implement their own checksums or
error-correcting
> codes.
# Specification
**The present spec is fully backwards-compatible with BIP-39 mnemonics**.
We introduce a new input "language" `Free` that admits a superset of BIP-39
mnemonics.
Wallets can and should continue to validate BIP-39 mnemonics as in the past.
`Free` should be treated as a new input language, similar to English,
French, or
any of the BIP-39 languages.
`Free` should allow at a minimum the ASCII printable characters, minus
capital
letters.
BIP-39 derives binary seeds by applying `PBKDF2()` to a mnemonic and
passhprase
as follows:
```
PKBDF2(
password=bin(nfkd(mnemonic)),
salt=bin(nfkd("mnemonic" || passphrase)),
hash_name="HMAC-SHA512",
iterations=2048,
)
```
`nfkd()` converts the input string to Unicode normalization form KD.
Fortunately `PBKDF2()` does not limit the domain either the `password` or
`salt`
argument. Existing implementations are therefore easy to update.
Applications must _not change_ passphrase entry, validation, normaliztion,
or application.
Applications should regard the _mnemonic_ as a raw string to permit the
widest
possible array of input characters. (See the following section for details.)
In the interest of backward compatibility we propose that existing BIP-39
implementations
treat `Free` as the tenth input "language" with the following differences
from the 9
BIP-39 languages:
1. If they do not already, lower the case of all input letters.
Strangely BIP-39 is silent on the subject of case.
1. If they do not already, `strip` the `Free` mnemonic of surrounding
whitespace
and split it on the regular expression `\s+`.
1. Apply `nfkd()` to the `Free` mnemonic.
```
if language == "Free":
norm_mnemonic = lower(nfkd(split("\s+", mnemonic)))
validate(norm_mnemonic)
PKBDF2(
password=bin(norm_mnemonic),
salt=bin(nfkd("mnemonic" || passphrase))),
hash_name="HMAC-SHA512",
iterations=2048,
)
```
The output of PKBDF2 is converted to a master seed as in BIP-39.
## `validate()`
`validate()` must at a minimum estimate the complexity `complexity()` of
the user proposed mnemonic and must refuse the mnemonic if the entropy is
less
than a threshold (we recommend a threshold of 0.5).
Implementations must know the cardinality `C` of the mnemonic character set.
Applications must support at a bare minium an input cardinality of **74**
(the number of printable ASCII characters) but higher values for `C` are
both
permissible and recommended. As suggested below, the higher the
cardinality of
the input set, the greater the steganographic potential.
0123456789abcdefghijklmnopqrstuvwxyz!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~
\t\n\r\x0b\x0c
The Shannon Entropy `SE` of a string is as follows:
$$ se(X) := - \sum_{i} p(x_i) \log_2 p(x_i) $$
As an optimization for fixed `X` with all unique entries and cardinality
`C`:
$$ SE(X) := log_2(C)$$
> Intuitively the above is the (fractional) number of bits needed to
represent all
> characters in the `universe`.
The relative entropy is simply the following:
$$ re(mnemonic\_tokens, universe) := SE(mnemonic) / SE(universe)$$
`universe` is the list of all possible input tokens. `mnemonic_tokens` is a
tokenized
list of the inputs. Tokens may vary in length per the universe and
application,
though applications can start withe one token per ASCII printable character.
`re()` ranges from 0 to 1 when `mnemonic_tokens` are all unique. An `re()`
of 0.5
reflects that the user has provided enough information as providing one
instance
of half of all input tokens.
`re()` alone is not a complete measure of password complexity since it does
not take order into account. For instance the string `"abc...xyz"` and its
reverse both have hight relative entropy but are highly predictable.
To correct for this we can use the Hamming Distance, `hd()`, which counts
the
number of characters that are not in sorted order:
$$
\text{Hamming Distance} = \sum_{i=1}^{n} (s_i \neq t_i)
$$
Since undesirable order might be forwards of backwards we take the Relative
Absolute Hamming Distance `rahd()`:
```
rahd() := min(hd(norm_mnemonic), hd(norm_mnemonic.reverse())) /
len(norm_mnemonic)
```
As with `re()`, `rahd()` ranges from 0 to 1.
```
validate := F(re(norm_mnemonic) + rahd(norm_mnemonic)) / 2
```
If `validate()` returns a complexity less than a given threshold (TBD) the
wallet
should warn the user.
### TODO: examples of `validate()` for representative inputs
* [ ] Dice, cards, chess, bad + good text passwords
* [ ] Show how this leads to standard dice verification and fingerprinting
across all hardware vendors (phew)
# Reference implementation
* https://github.com/akarve/bipsea
## Example
```sh
bipsea validate -f free -m "$(cat input.txt)" | bipsea xprv
```
# Example steganographic mnemonics in `Free`
## Playing cards
A common deck of 52 cards encodes approximately 225 bits of entropy,
more entropy than 21 BIP-39 words. Such decks can be carried on one's person
without raising an eyebrow.
Users might enter cards in deck order as follows:
```
2S 10S kC 10H 5S ...
```
## Chess, three different ways
### Fictional move order
A chess board contains 64 totally ordered squares each of which can be
addressed
in algebraic notation of the form `{a-h}{1-8}`. A move specifies one of
five piece
types (`R, N, B, Q, K`) followed by a square.
`Nf1` is an example of a single knight move.
A series of 42 chess moves written on an easy-to-repudiate,
easy-to-obfuscate,
piece of paper encodes at least as much entropy as 24 BIP-39 seed words.
$$ \log_2(69^{42}) \approx 256 $$
Ensuring that such moves comprise a valid chess game (and thus greater
steganography)
is a hard problem and is neither required nor recommended in the context of
this BIP.
It is not recommended since it constrains the potential entropy in unclear
and
hard-to-reckon ways.
### PGN files
Nevertheless, high steganography in a chess game can be achieved with file
formats
that support comments, such as the common PGN format. Observe the following
snippet
from a PGN file and note the opportunities for arbitrary comments.
```
[Event "Third Rosenwald Trophy"]
[Site "New York, NY USA"]
[Date "1956.10.17"]
[EventDate "1956.10.07"]
[Round "8"]
[Result "0-1"]
[White "Donald Byrne"]
[Black "Robert James Fischer"]
[ECO "D92"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "82"]
1. Nf3 Nf6 2. c4 g6 3. Nc3 Bg7 4. d4 O-O 5. Bf4 d5 6. Qb3 dxc4
7. Qxc4 c6 8. e4 Nbd7 9. Rd1 Nb6 10. Qc5 Bg4 11. Bg5 {11. Be2
followed by 12. O-O would have been more prudent. The bishop
move played allows a sudden crescendo of tactical points to be
uncovered by Fischer. -- Wade} Na4 {!}
```
### Marked game boards
Alternatively a user might choose to subtly mark the 64 squares of two
chess boards
to represent a 1 or 0 in each of 128 unique positions, storing 128 bits of
entropy
(equivalent to 12 BIP-39 seed words). Random bits can be generated with a
coin.
## Any board game
One can imagine steganographic secrets similar to chess for Monopoly, Go,
or any
board game.
## Dice, but different
We noted above that if the user were to roll and then store 100 dice that
it would
be impractical to retain the original order.
We observe that there are 21 small writing surfaces (the solid dots on each
face)
on a six-sided die. If the user were to inscribe a single random digit into
each dot
he would obtain approximately 70 bits of ordered entropy. Three such dice
would be
easy to retain and order and provide greater entropy than 18 BIP-39 seed
words.
## A paper napkin
In addition to a literal napkin sketch (with phone numbers, measurements,
or harmless notes) users without access to coins, dice, game boards or
electronics
could generate "poor man's entropy" by dropping a stone onto a napkin
divided into
equal-sized quadrants to generate entropy.
Said "poor man's entropy" is not recommended but nevertheless illustrates
the vast expansion in capability and steganography that obtains from this
BIP.
--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/de66201a-b281-4a0c-a483-dc2ffd6978b2n%40googlegroups.com.
[-- Attachment #1.2: Type: text/html, Size: 15235 bytes --]
reply other threads:[~2024-06-08 17:55 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de66201a-b281-4a0c-a483-dc2ffd6978b2n@googlegroups.com \
--to=aneesh.karve@gmail.com \
--cc=bitcoindev@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox