* [bitcoin-dev] Codex32
@ 2023-02-16 2:16 Russell O'Connor
2023-02-16 11:50 ` Pavol Rusnak
2023-02-20 18:44 ` Andrew Poelstra
0 siblings, 2 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-16 2:16 UTC (permalink / raw)
To: Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 29073 bytes --]
I've been asked by Dr. Curr and Professor Snead to forward this message to
this mailing list, as it may be of general interest to Bitcoin users.
Dear Colleague:
In 1967, during excavation for the construction of a new shopping center in
Monroeville, Pennsylvania, workers uncovered a vault containing a cache of
ancient scrolls[1]. Most were severely damaged, but those that could be
recovered confirmed the existence of a secret society long suspected to
have
been active in the region around the year 200 BC.
Based on a translation of these documents, we now know that the society,
the
Cult of the Bound Variable, was devoted to the careful study of
computation,
over two millennia before the invention of the digital computer.
While the Monroeville scrolls make reference to computing machines made of
sandstone, most researchers believed this to be a poetic metaphor and that
the
"computers" were in fact the initiates themselves, carrying out the
unimaginably tedious steps of their computations with reed pens on
parchment.
Within the vault, a collection of sandstone wheels marked in a language
consisting of 32 glyphs was found. After 15 years of study, we have
successfully
completed the translation of what is known as "Codex32," a document that
describes the functions of the wheels. It was discovered that the wheels
operate
a system of cryptographic computations that was used by cult members to
safeguard their most valuable secrets.
The Codex32 system allows secrets to be carved into multiple tablets and
scattered to the far corners of the earth. When a sufficient number of
tablets are
brought together the stone wheels are manipulated in a manner to recover the
secrets. This finding may be of particular interest to the Bitcoin
community.
Below we provide a summary of the cult's secret sharing system, which is
graciously hosted at
<
https://github.com/apoelstra/bips/blob/2023-02--volvelles/bip-0000.mediawiki
>.
We are requesting a record assignment in the Bibliography of Immemorial
Philosophy (BIP) repository.
Thank you for your consideration.
Dr. Leon O. Curr and Professor Pearlwort Snead
Department of Archaeocryptography
Harry Q. Bovik Institute for the Advancement
[1] http://www.boundvariable.org/task.shtml
-----BEGIN BIP-----
<pre>
BIP: ????
Layer: Applications
Title: codex32
Author: Leon Olsson Curr and Pearlwort Sneed <pearlwort@wpsoftware.net>
Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-????
Status: Draft
Type: ????
Created: 2023-02-13
License: BSD-3-Clause
Post-History: FIXME
</pre>
==Introduction==
===Abstract===
This document describes a standard for backing up and restoring the master
seed of a
[https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032]
hierarchical deterministic wallet, using Shamir's secret sharing.
It includes an encoding format, a BCH error-correcting checksum, and
algorithms for share generation and secret recovery.
Secret data can be split into up to 31 shares.
A minimum threshold of shares, which can be between 1 and 9, is needed to
recover the secret, whereas without sufficient shares, no information about
the secret is recoverable.
===Copyright===
This document is licensed under the 3-clause BSD license.
===Motivation===
BIP-0032 master seed data is the source entropy used to derive all private
keys in an HD wallet.
Safely storing this secret data is the hardest and most important part of
self-custody.
However, there is a tension between security, which demands limiting the
number of backups, and resilience, which demands widely replicated backups.
Encrypting the seed does not change this fundamental tradeoff, since it
leaves essentially the same problem of how to back up the encryption key(s).
To allow users freedom to make this tradeoff, we use Shamir's secret
sharing, which guarantees that any number of shares less than the threshold
leaks no information about the secret.
This approach allows increasing safety by widely distributing the generated
shares, while also providing security against the compromise of one or more
shares (as long as fewer than the threshold have been compromised).
[https://github.com/satoshilabs/slips/blob/master/slip-0039.md SLIP-0039]
has essentially the same motivations as this standard.
However, unlike SLIP-0039, this standard also aims to be simple enough for
hand computation.
Users who demand a higher level of security for particular secrets, or have
a general distrust in digital electronic devices, have the option of using
hand computation to backup and restore secret data in an interoperable
manner.
Note that hand computation is optional, the particular details of hand
computation are outside the scope of this standard, and implementers do not
need to be concerned with this possibility.
[https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki BIP-0039]
serves the same purpose as this standard: encoding master seeds for storage
by users.
However, BIP-0039 has no error-correcting ability, cannot sensibly be
extended to support secret sharing, has no support for versioning or other
metadata, and has many technical design decisions that make implementation
and interoperability difficult (for example, the use of SHA-512 to derive
seeds, or the use of 11-bit words).
==Specification==
===codex32===
A codex32 string is similar to a Bech32 string defined in [
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
It reuses the base32 character set from BIP-0173, and consists of:
* A human-readable part, which is the string "ms" (or "MS").
* A separator, which is always "1".
* A data part which is in turn subdivided into:
** A threshold parameter, which MUST be a single digit between "2" and "9",
or the digit "0".
*** If the threshold parameter is "0" then the share index, defined below,
MUST have a value of "s" (or "S").
** An identifier consisting of 4 Bech32 characters.
** A share index, which is any Bech32 character. Note that a share index
value of "s" (or "S") is special and denotes the unshared secret (see
section "Unshared Secret").
** A payload which is a sequence of up to 74 Bech32 characters. (However,
see '''Long codex32 Strings''' below for an exception to this limit.)
** A checksum which consists of 13 Bech32 characters as described below.
As with Bech32 strings, a codex32 string MUST be entirely uppercase or
entirely lowercase.
The lowercase form is used when determining a character's value for
checksum purposes.
For presentation, lowercase is usually preferable, but uppercase SHOULD be
used for handwritten codex32 strings.
===Checksum===
The last thirteen characters of the data part form a checksum and contain
no information.
Valid strings MUST pass the criteria for validity specified by the Python3
code snippet below.
The function <code>ms32_verify_checksum</code> must return true when its
argument is the data part as a list of integers representing the characters
converted using the bech32 character table from BIP-0173.
To construct a valid checksum given the data-part characters (excluding the
checksum), the <code>ms32_create_checksum</code> function can be used.
<source lang="python">
MS32_CONST = 0x10ce0795c2fd1e62a
def ms32_polymod(values):
GEN = [
0x19dc500ce73fde210,
0x1bfae00def77fe529,
0x1fbd920fffe7bee52,
0x1739640bdeee3fdad,
0x07729a039cfc75f5a,
]
residue = 0x23181b3
for v in values:
b = (residue >> 60)
residue = (residue & 0x0fffffffffffffff) << 5 ^ v
for i in range(5):
residue ^= GEN[i] if ((b >> i) & 1) else 0
return residue
def ms32_verify_checksum(data):
if len(data) >= 96: # See Long codex32 Strings
return ms32_verify_long_checksum(data)
if len(data) <= 93:
return ms32_polymod(data) == MS32_CONST
return False
def ms32_create_checksum(data):
if len(data) > 80: # See Long codex32 Strings
return ms32_create_long_checksum(data)
values = data
polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
</source>
===Error Correction===
A codex32 string without a valid checksum MUST NOT be used.
The checksum is designed to be an error correcting code that can correct up
to 4 character substitutions, up to 8 unreadable characters (called
erasures), or up to 13 consecutive erasures.
Implementations SHOULD provide the user with a corrected valid codex32
string if possible.
However, implementations SHOULD NOT automatically proceed with a corrected
codex32 string without user confirmation of the corrected string, either by
prompting the user, or returning a corrected string in an error message and
allowing the user to repeat their action.
We do not specify how an implementation should implement error correction.
However, we recommend that:
* Implementations make suggestions to substitute non-bech32 characters with
bech32 characters in some situations, such as replacing "B" with "8", "O"
with "0", "I" with "l", etc.
* Implementations interpret "?" as an erasure.
* Implementations optionally interpret other non-bech32 characters, or
characters with incorrect case, as erasures.
* If a string with 8 or fewer erasures can have those erasures filled in to
make a valid codex32 string, then the implementation suggests such a string
as a correction.
* If a string consisting of valid Bech32 characters in the proper case can
be made valid by substituting 4 or fewer characters, then the
implementation suggests such a string as a correction.
===Unshared Secret===
When the share index of a valid codex32 string (converted to lowercase) is
the letter "s", we call the string a codex32 secret.
The subsequent data characters in a codex32 secret, excluding the final
checksum of 13 characters, is a direct encoding of a BIP-0032 HD master
seed.
The master seed is decoded by converting the data to bytes:
* Translate the characters to 5 bits values using the bech32 character
table from BIP-0173, most significant bit first.
* Re-arrange those bits into groups of 8 bits. Any incomplete group at the
end MUST be 4 bits or less, and is discarded.
Note that unlike the decoding process in BIP-0173, we do NOT require that
the incomplete group be all zeros.
For an unshared secret, the threshold parameter (the first character of the
data part) is ignored (beyond the fact it must be a digit for the codex32
string to be valid).
We recommend using the digit "0" for the threshold parameter in this case.
The 4 character identifier also has no effect beyond aiding users in
distinguishing between multiple different master seeds in cases where they
have more than one.
===Recovering Master Seed===
When the share index of a valid codex32 string (converted to lowercase) is
not the letter "s", we call the string an codex32 share.
The first character of the data part indicates the threshold of the share,
and it is required to be a non-"0" digit.
In order to recover a master seed, one needs a set of valid codex32 shares
such that:
* All shares have the same threshold value, the same identifier, and the
same length.
* All of the share index values are distinct.
* The number of codex32 shares is exactly equal to the (common) threshold
value.
If all the above conditions are satisfied, the <code>ms32_recover</code>
function will return a codex32 secret when its argument is the list of
codex32 shares with each share represented as a list of integers
representing the characters converted using the bech32 character table from
BIP-0173.
<source lang="python">
bech32_inv = [
0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,
]
def bech32_mul(a, b):
res = 0
for i in range(5):
res ^= a if ((b >> i) & 1) else 0
a *= 2
a ^= 41 if (32 <= a) else 0
return res
def bech32_lagrange(l, x):
n = 1
c = []
for i in l:
n = bech32_mul(n, i ^ x)
m = 1
for j in l:
m = bech32_mul(m, (x if i == j else i) ^ j)
c.append(m)
return [bech32_mul(n, bech32_inv[i]) for i in c]
def ms32_interpolate(l, x):
w = bech32_lagrange([s[5] for s in l], x)
res = []
for i in range(len(l[0])):
n = 0
for j in range(len(l)):
n ^= bech32_mul(w[j], l[j][i])
res.append(n)
return res
def ms32_recover(l):
return ms32_interpolate(l, 16)
</source>
===Generating Shares===
If we already have ''t'' valid codex32 strings such that:
* All strings have the same threshold value ''t'', the same identifier, and
the same length
* All of the share index values are distinct
Then we can derive additional shares with the <code>ms32_interpolate</code>
function by passing it a list of exactly ''t'' of these codex32 strings,
together with a fresh share index distinct from all of the existing share
indexes.
The newly derived share will have the provided share index.
Once a user has generated ''n'' codex32 shares, they may discard the
codex32 secret (if it exists).
The ''n'' shares form a ''t'' of ''n'' Shamir's secret sharing scheme of a
codex32 secret.
There are two ways to create an initial set of ''t'' valid codex32 strings,
depending on whether the user already has an existing master seed to split.
====For an existing master seed====
Before generating shares for an existing master seed, it first must be
converted into a codex32 secret, as described above.
The conversion process consists of:
* Choosing a threshold value ''t'' between 2 and 9, inclusive
* Choosing a 4 bech32 character identifier
** We do not define how to choose the identifier, beyond noting that it
SHOULD be distinct for every master seed the user may need to disambiguate.
* Setting the share index to "s"
* Setting the payload to a Bech32 encoding of the master seed, padded with
arbitrary bits
* Generating a valid checksum in accordance with the Checksum section
Along with the codex32 secret, the user must generate ''t''-1 other codex32
shares, each with the same threshold value, the same identifier, and a
distinct share index.
The set of share indexes may be chosen arbitrarily.
The payload of each of these codex32 shares is chosen uniformly at random
such that it has the same length as the payload of the codex32 secret.
For each share, a valid checksum must be generated in accordance with the
Checksum section.
The codex32 secret and the ''t''-1 codex32 shares form a set of ''t'' valid
codex32 strings from which additional shares can be derived as described
above.
====For a fresh master seed====
In the case that the user wishes to generate a fresh master seed, the user
chooses a threshold value ''t'' and an identifier, then generates ''t''
random codex32 shares, using the generation procedure from the previous
section.
As before, each share must have the same threshold value ''t'', the same
identifier, and a distinct share index.
With this set of ''t'' codex32 shares, new shares can be derived as
discussed above. This process generates a fresh master seed, whose value
can be retrieved by running the recovery process on any ''t'' of these
shares.
===Long codex32 Strings===
The 13 character checksum design only supports up to 80 data characters.
Excluding the threshold, identifier and index characters, this limits the
payload to 74 characters or 46 bytes.
While this is enough to support the 32-byte advised size of BIP-0032 master
seeds, BIP-0032 allows seeds to be up to 64 bytes in size.
We define a long codex32 string format to support these longer seeds by
defining an alternative checksum.
<source lang="python">
MS32_LONG_CONST = 0x43381e570bf4798ab26
def ms32_long_polymod(values):
GEN = [
0x3d59d273535ea62d897,
0x7a9becb6361c6c51507,
0x543f9b7e6c38d8a2a0e,
0x0c577eaeccf1990d13c,
0x1887f74f8dc71b10651,
]
residue = 0x23181b3
for v in values:
b = (residue >> 70)
residue = (residue & 0x3fffffffffffffffff) << 5 ^ v
for i in range(5):
residue ^= GEN[i] if ((b >> i) & 1) else 0
return residue
def ms32_verify_long_checksum(data):
return ms32_long_polymod(data) == MS32_LONG_CONST
def ms32_create_long_checksum(data):
values = data
polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST
return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
</source>
A long codex32 string follows the same specification as a regular codex32
string with the following changes.
* The payload is a sequence of between 75 and 103 Bech32 characters.
* The checksum consists of 15 Bech32 characters as defined above.
A codex32 string with a data part of 94 or 95 characters is never legal as
a regular codex32 string is limited to 93 data characters and a long
codex32 string is at least 96 characters.
Generation of long shares and recovery of the master seed from long shares
proceeds in exactly the same way as for regular shares with the
<code>ms32_interpolate</code> function.
The long checksum is designed to be an error correcting code that can
correct up to 4 character substitutions, up to 8 unreadable characters
(called erasures), or up to 15 consecutive erasures.
As with regular checksums we do not specify how an implementation should
implement error correction, and all our recommendations for error
correction of regular codex32 strings also apply to long codex32 strings.
==Rationale==
This scheme is based on the observation that the Lagrange interpolation of
valid codewords in a BCH code will always be a valid codeword.
This means that derived shares will always have valid checksum, and a
sufficient threshold of shares with valid checksums will derive a secret
with a valid checksum.
The header system is also compatible with Lagrange interpolation, meaning
all derived shares will have the same identifier and will have the
appropriate share index.
This fact allows the header data to be covered by the checksum.
The checksum size and identifier size have been chosen so that the encoding
of 128-bit seeds and shares fit within 48 characters.
This is a standard size for many common seed storage formats, which has
been popularized by the 12 four-letter word format of the BIP-0039 mnemonic.
The 13 character checksum is adequate to correct 4 errors in up to 93
characters (80 characters of data and 13 characters of the checksum). This
is somewhat better quality than the checksum used in SLIP-0039.
For 256-bit seeds and shares our strings are 74 characters, which fits into
the 96 character format of the 24 four-letter word format of the BIP-0039
mnemonic, with plenty of room to spare.
A longer checksum is needed to support up to 512-bit seeds, the longest
seed length specified in BIP-0032, as the 13 character checksum isn't
adequate for more than 80 data characters.
While we could use the 15 character checksum for both cases, we prefer to
keep the strings as short as possible for the more common cases of 128-bit
and 256-bit master seeds.
We only guarantee to correct 4 characters no matter how long the string is.
Longer strings mean more chances for transcription errors, so shorter
strings are better.
The longest data part using the regular 13 character checksum is 93
characters and corresponds to a 400-bit secret.
At this length, the prefix <code>MS1</code> is not covered by the checksum.
This is acceptable because the checksum scheme itself requires you to know
that the <code>MS1</code> prefix is being used in the first place.
If the prefix is damaged and a user is guessing that the data might be
using this scheme, then the user can enter the available data explicitly
using the suspected <code>MS1</code> prefix.
==Backwards Compatibility==
codex32 is an alternative to BIP-0039 and SLIP-0039.
It is technically possible to derive the BIP32 master seed from seed words
encoded in one of these schemes, and then to encode this seed in codex32.
For BIP-0039 this process is irreversible, since it involves hashing the
original words.
Furthermore, the resulting seed will be 512 bits long, which may be too
large to be safely and conveniently handled.
SLIP-0039 seed words can be reversibly converted to master seeds, so it is
possible to interconvert between SLIP-0039 and codex32.
However, SLIP-0039 '''shares''' cannot be converted to codex32 shares
because the two schemes use a different underlying field.
The authors of this BIP do not recommend interconversion.
Instead, users who wish to switch to codex32 should generate a fresh seed
and sweep their coins.
==Reference Implementation==
* [https://secretcodex32.com/docs/2023-02-14--bw.ps Reference PostScript
Implementation]
* FIXME add Python implementation
* FIXME add Rust implementation
==Test Vectors==
===Test vector 1===
This example shows the codex32 format, when used without splitting the
secret into any shares.
The data part contains 26 Bech32 characters, which corresponds to 130 bits.
We truncate the last two bits in order to obtain a 128-bit master seed.
codex32 secret (Bech32):
<code>ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw</code>
Master secret (hex): <code>318c6318c6318c6318c6318c6318c631</code>
* human-readable part: <code>ms</code>
* separator: <code>1</code>
* k value: <code>0</code> (no secret splitting)
* identifier: <code>test</code>
* share index: <code>s</code> (the secret)
* data: <code>xxxxxxxxxxxxxxxxxxxxxxxxxx</code>
* checksum: <code>4nzvca9cmczlw</code>
===Test vector 2===
This example shows generating a new master seed using "random" codex32
shares, as well as deriving an additional codex32 share, using ''k''=2 and
an identifier of <code>NAME</code>.
Although codex32 strings are canonically all lowercase, it's also valid to
use all uppercase.
Share with index <code>A</code>:
<code>MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM</code>
Share with index <code>C</code>:
<code>MS12NAMECACDEFGHJKLMNPQRSTUVWXYZ023FTR2GDZMPY6PN</code>
* Derived share with index <code>D</code>:
<code>MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG</code>
* Secret share with index <code>S</code>:
<code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
* Master secret (hex): <code>d1808e096b35b209ca12132b264662a5</code>
Note that per BIP-0173, the lowercase form is used when determining a
character's value for checksum purposes.
In particular, given an all uppercase codex32 string, we still use
lowercase <code>ms</code> as the human-readable part during checksum
construction.
===Test vector 3===
This example shows splitting an existing 128-bit master seed into "random"
codex32 shares, using ''k''=3 and an identifier of <code>cash</code>.
We appended two zero bits in order to obtain 26 Bech32 characters (130 bits
of data) from the 128-bit master seed.
Master secret (hex): <code>ffeeddccbbaa99887766554433221100</code>
Secret share with index <code>s</code>:
<code>ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln</code>
Share with index <code>a</code>:
<code>ms13casha320zyxwvutsrqpnmlkjhgfedca2a8d0zehn8a0t</code>
Share with index <code>c</code>:
<code>ms13cashcacdefghjklmnpqrstuvwxyz023949xq35my48dr</code>
* Derived share with index <code>d</code>:
<code>ms13cashd0wsedstcdcts64cd7wvy4m90lm28w4ffupqs7rm</code>
* Derived share with index <code>e</code>:
<code>ms13casheekgpemxzshcrmqhaydlp6yhms3ws7320xyxsar9</code>
* Derived share with index <code>f</code>:
<code>ms13cashf8jh6sdrkpyrsp5ut94pj8ktehhw2hfvyrj48704</code>
Any three of the five shares among <code>acdef</code> can be used to
recover the secret.
Note that the choice to append two zero bits was arbitrary, and any of the
following four secret shares would have been valid choices.
However, each choice would have resulted in a different set of derived
shares.
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qpte35dvzkjpt0r</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qzfatvdwq5692k6</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qrsx6ydhed97jx2</code>
===Test vector 4===
This example shows converting a 256-bit secret into a codex32 secret,
without splitting the secret into any shares.
We appended four zero bits in order to obtain 52 Bech32 characters (260
bits of data) from the 256-bit secret.
256-bit secret (hex):
<code>ffeeddccbbaa99887766554433221100ffeeddccbbaa99887766554433221100</code>
* codex32 secret:
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqqtum9pgv99ycma</code>
Note that the choice to append four zero bits was arbitrary, and any of the
following sixteen codex32 secrets would have been valid:
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqqtum9pgv99ycma</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqpj82dp34u6lqtd</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqzsrs4pnh7jmpj5</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqrfcpap2w8dqezy</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqy5tdvphn6znrf0</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq9dsuypw2ragmel</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqx05xupvgp4v6qx</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq8k0h5p43c2hzsk</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqgum7hplmjtr8ks</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqf9q0lpxzt5clxq</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq28y48pyqfuu7le</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqt7ly0paesr8x0f</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqvrvg7pqydv5uyz</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqd6hekpea5n0y5j</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqwcnrwpmlkmt9dt</code>
*
<code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq0pgjxpzx0ysaam</code>
===Test vector 5===
This example shows generating a new 512-bit master seed using "random"
codex32 characters and appending a checksum.
The payload contains 103 Bech32 characters, which corresponds to 515 bits.
The last three bits are discarded when converting to a 512-bit master seed.
This is an example of a '''Long codex32 String'''.
* Secret share with index <code>S</code>:
<code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
* Master secret (hex):
<code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
==Appendix==
===Mathematical Companion===
Below we use the Bech32 character set to denote values in GF[32].
In Bech32, the letter <code>Q</code> denotes zero and the letter
<code>P</code> denotes one.
The digits <code>0</code> and <code>2</code> through <code>9</code> do
''not'' denote their numeric values.
They are simply elements of GF[32].
The generating polynomial for our BCH code is as follows.
We extend GF[32] to GF[1024] by adjoining a primitive cube root of unity,
<code>ζ</code>, satisfying <code>ζ^2 = ζ + P</code>.
We select <code>β := G ζ</code> which has order 93, and construct the
product <code>(x - β^i)</code> for <code>i</code> in <code>{17, 20, 46, 49,
52, 77, 78, 79, 80, 81, 82, 83, 84}</code>.
The resulting polynomial is our generating polynomial for our 13 character
checksum:
x^13 + E x^12 + M x^11 + 3 x^10 + G x^9 + Q x^8 + E x^7 + E x^6 + E x^5
+ L x^4 + M x^3 + C x^2 + S x + S
For our long checksum, we select <code>γ := E + X ζ</code>, which has order
1023, and construct the product <code>(x - γ^i)</code> for <code>i</code>
in <code>{32, 64, 96, 895, 927, 959, 991, 1019, 1020, 1021, 1022, 1023,
1024, 1025, 1026}</code>.
The resulting polynomial is our generating polynomial for our 15 character
checksum for long strings:
x^15 + 0 x^14 + 2 x^13 + E x^12 + 6 x^11 + F x^10 + E x^9 + 4 x^8 + X
x^7 + H x^6 + 4 x^5 + X x^4 + 9 x^3 + K x^2 + Y x^1 + H
(Reminder: the character <code>0</code> does ''not'' denote the zero of the
field.)
-----END BIP-----
[-- Attachment #2: Type: text/html, Size: 91685 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-16 2:16 [bitcoin-dev] Codex32 Russell O'Connor
@ 2023-02-16 11:50 ` Pavol Rusnak
2023-02-16 13:49 ` Andrew Poelstra
2023-02-20 18:44 ` Andrew Poelstra
1 sibling, 1 reply; 15+ messages in thread
From: Pavol Rusnak @ 2023-02-16 11:50 UTC (permalink / raw)
To: Russell O'Connor, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 31670 bytes --]
Hi!
The BIP states that its only advantage over SLIP-0039, which has been used
in production for nearly three years (in at at least 3 SW/HW wallet
implementations), is that it aims to be simple enough for hand computation.
However, the BIP also indicates that "details of hand computation are
outside the scope of this standard, and implementers do not need to be
concerned with this possibility." Therefore, I am curious about how
significant this advantage over SLIP-0039 really is. If hand computation is
not straightforward and there are no other substantial advantages over
SLIP-0039, I cannot help but feel that this BIP is simply a result of
not-invented-here syndrome, but please correct me if I am wrong.
Keep in mind that the encoded shares in SLIP-0039 consist of exactly 200 or
330 bits, both of which are divisible by 5. This makes it straightforward
to encode them as Bech32 strings.
On Thu, 16 Feb 2023 at 09:30, Russell O'Connor via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> I've been asked by Dr. Curr and Professor Snead to forward this message to
> this mailing list, as it may be of general interest to Bitcoin users.
>
> Dear Colleague:
>
> In 1967, during excavation for the construction of a new shopping center
> in
> Monroeville, Pennsylvania, workers uncovered a vault containing a cache of
> ancient scrolls[1]. Most were severely damaged, but those that could be
> recovered confirmed the existence of a secret society long suspected to
> have
> been active in the region around the year 200 BC.
>
> Based on a translation of these documents, we now know that the society,
> the
> Cult of the Bound Variable, was devoted to the careful study of
> computation,
> over two millennia before the invention of the digital computer.
>
> While the Monroeville scrolls make reference to computing machines made of
> sandstone, most researchers believed this to be a poetic metaphor and that
> the
> "computers" were in fact the initiates themselves, carrying out the
> unimaginably tedious steps of their computations with reed pens on
> parchment.
>
> Within the vault, a collection of sandstone wheels marked in a language
> consisting of 32 glyphs was found. After 15 years of study, we have
> successfully
> completed the translation of what is known as "Codex32," a document that
> describes the functions of the wheels. It was discovered that the wheels
> operate
> a system of cryptographic computations that was used by cult members to
> safeguard their most valuable secrets.
>
> The Codex32 system allows secrets to be carved into multiple tablets and
> scattered to the far corners of the earth. When a sufficient number of
> tablets are
> brought together the stone wheels are manipulated in a manner to recover
> the
> secrets. This finding may be of particular interest to the Bitcoin
> community.
>
> Below we provide a summary of the cult's secret sharing system, which is
> graciously hosted at
> <
> https://github.com/apoelstra/bips/blob/2023-02--volvelles/bip-0000.mediawiki
> >.
> We are requesting a record assignment in the Bibliography of Immemorial
> Philosophy (BIP) repository.
>
> Thank you for your consideration.
>
> Dr. Leon O. Curr and Professor Pearlwort Snead
> Department of Archaeocryptography
> Harry Q. Bovik Institute for the Advancement
>
> [1] http://www.boundvariable.org/task.shtml
>
> -----BEGIN BIP-----
>
> <pre>
> BIP: ????
> Layer: Applications
> Title: codex32
> Author: Leon Olsson Curr and Pearlwort Sneed <pearlwort@wpsoftware.net>
> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-????
> Status: Draft
> Type: ????
> Created: 2023-02-13
> License: BSD-3-Clause
> Post-History: FIXME
> </pre>
>
> ==Introduction==
>
> ===Abstract===
>
> This document describes a standard for backing up and restoring the master
> seed of a
> [https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032]
> hierarchical deterministic wallet, using Shamir's secret sharing.
> It includes an encoding format, a BCH error-correcting checksum, and
> algorithms for share generation and secret recovery.
> Secret data can be split into up to 31 shares.
> A minimum threshold of shares, which can be between 1 and 9, is needed to
> recover the secret, whereas without sufficient shares, no information about
> the secret is recoverable.
>
> ===Copyright===
>
> This document is licensed under the 3-clause BSD license.
>
> ===Motivation===
>
> BIP-0032 master seed data is the source entropy used to derive all private
> keys in an HD wallet.
> Safely storing this secret data is the hardest and most important part of
> self-custody.
> However, there is a tension between security, which demands limiting the
> number of backups, and resilience, which demands widely replicated backups.
> Encrypting the seed does not change this fundamental tradeoff, since it
> leaves essentially the same problem of how to back up the encryption key(s).
>
> To allow users freedom to make this tradeoff, we use Shamir's secret
> sharing, which guarantees that any number of shares less than the threshold
> leaks no information about the secret.
> This approach allows increasing safety by widely distributing the
> generated shares, while also providing security against the compromise of
> one or more shares (as long as fewer than the threshold have been
> compromised).
>
> [https://github.com/satoshilabs/slips/blob/master/slip-0039.md SLIP-0039]
> has essentially the same motivations as this standard.
> However, unlike SLIP-0039, this standard also aims to be simple enough for
> hand computation.
> Users who demand a higher level of security for particular secrets, or
> have a general distrust in digital electronic devices, have the option of
> using hand computation to backup and restore secret data in an
> interoperable manner.
> Note that hand computation is optional, the particular details of hand
> computation are outside the scope of this standard, and implementers do not
> need to be concerned with this possibility.
>
> [https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki BIP-0039]
> serves the same purpose as this standard: encoding master seeds for storage
> by users.
> However, BIP-0039 has no error-correcting ability, cannot sensibly be
> extended to support secret sharing, has no support for versioning or other
> metadata, and has many technical design decisions that make implementation
> and interoperability difficult (for example, the use of SHA-512 to derive
> seeds, or the use of 11-bit words).
>
> ==Specification==
>
> ===codex32===
>
> A codex32 string is similar to a Bech32 string defined in [
> https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
> It reuses the base32 character set from BIP-0173, and consists of:
>
> * A human-readable part, which is the string "ms" (or "MS").
> * A separator, which is always "1".
> * A data part which is in turn subdivided into:
> ** A threshold parameter, which MUST be a single digit between "2" and
> "9", or the digit "0".
> *** If the threshold parameter is "0" then the share index, defined below,
> MUST have a value of "s" (or "S").
> ** An identifier consisting of 4 Bech32 characters.
> ** A share index, which is any Bech32 character. Note that a share index
> value of "s" (or "S") is special and denotes the unshared secret (see
> section "Unshared Secret").
> ** A payload which is a sequence of up to 74 Bech32 characters. (However,
> see '''Long codex32 Strings''' below for an exception to this limit.)
> ** A checksum which consists of 13 Bech32 characters as described below.
>
> As with Bech32 strings, a codex32 string MUST be entirely uppercase or
> entirely lowercase.
> The lowercase form is used when determining a character's value for
> checksum purposes.
> For presentation, lowercase is usually preferable, but uppercase SHOULD be
> used for handwritten codex32 strings.
>
> ===Checksum===
>
> The last thirteen characters of the data part form a checksum and contain
> no information.
> Valid strings MUST pass the criteria for validity specified by the Python3
> code snippet below.
> The function <code>ms32_verify_checksum</code> must return true when its
> argument is the data part as a list of integers representing the characters
> converted using the bech32 character table from BIP-0173.
>
> To construct a valid checksum given the data-part characters (excluding
> the checksum), the <code>ms32_create_checksum</code> function can be used.
>
> <source lang="python">
> MS32_CONST = 0x10ce0795c2fd1e62a
>
> def ms32_polymod(values):
> GEN = [
> 0x19dc500ce73fde210,
> 0x1bfae00def77fe529,
> 0x1fbd920fffe7bee52,
> 0x1739640bdeee3fdad,
> 0x07729a039cfc75f5a,
> ]
> residue = 0x23181b3
> for v in values:
> b = (residue >> 60)
> residue = (residue & 0x0fffffffffffffff) << 5 ^ v
> for i in range(5):
> residue ^= GEN[i] if ((b >> i) & 1) else 0
> return residue
>
> def ms32_verify_checksum(data):
> if len(data) >= 96: # See Long codex32 Strings
> return ms32_verify_long_checksum(data)
> if len(data) <= 93:
> return ms32_polymod(data) == MS32_CONST
> return False
>
> def ms32_create_checksum(data):
> if len(data) > 80: # See Long codex32 Strings
> return ms32_create_long_checksum(data)
> values = data
> polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
> return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
> </source>
>
> ===Error Correction===
>
> A codex32 string without a valid checksum MUST NOT be used.
> The checksum is designed to be an error correcting code that can correct
> up to 4 character substitutions, up to 8 unreadable characters (called
> erasures), or up to 13 consecutive erasures.
> Implementations SHOULD provide the user with a corrected valid codex32
> string if possible.
> However, implementations SHOULD NOT automatically proceed with a corrected
> codex32 string without user confirmation of the corrected string, either by
> prompting the user, or returning a corrected string in an error message and
> allowing the user to repeat their action.
> We do not specify how an implementation should implement error correction.
> However, we recommend that:
>
> * Implementations make suggestions to substitute non-bech32 characters
> with bech32 characters in some situations, such as replacing "B" with "8",
> "O" with "0", "I" with "l", etc.
> * Implementations interpret "?" as an erasure.
> * Implementations optionally interpret other non-bech32 characters, or
> characters with incorrect case, as erasures.
> * If a string with 8 or fewer erasures can have those erasures filled in
> to make a valid codex32 string, then the implementation suggests such a
> string as a correction.
> * If a string consisting of valid Bech32 characters in the proper case can
> be made valid by substituting 4 or fewer characters, then the
> implementation suggests such a string as a correction.
>
> ===Unshared Secret===
>
> When the share index of a valid codex32 string (converted to lowercase) is
> the letter "s", we call the string a codex32 secret.
> The subsequent data characters in a codex32 secret, excluding the final
> checksum of 13 characters, is a direct encoding of a BIP-0032 HD master
> seed.
>
> The master seed is decoded by converting the data to bytes:
>
> * Translate the characters to 5 bits values using the bech32 character
> table from BIP-0173, most significant bit first.
> * Re-arrange those bits into groups of 8 bits. Any incomplete group at the
> end MUST be 4 bits or less, and is discarded.
>
> Note that unlike the decoding process in BIP-0173, we do NOT require that
> the incomplete group be all zeros.
>
> For an unshared secret, the threshold parameter (the first character of
> the data part) is ignored (beyond the fact it must be a digit for the
> codex32 string to be valid).
> We recommend using the digit "0" for the threshold parameter in this case.
> The 4 character identifier also has no effect beyond aiding users in
> distinguishing between multiple different master seeds in cases where they
> have more than one.
>
> ===Recovering Master Seed===
>
> When the share index of a valid codex32 string (converted to lowercase) is
> not the letter "s", we call the string an codex32 share.
> The first character of the data part indicates the threshold of the share,
> and it is required to be a non-"0" digit.
>
> In order to recover a master seed, one needs a set of valid codex32 shares
> such that:
>
> * All shares have the same threshold value, the same identifier, and the
> same length.
> * All of the share index values are distinct.
> * The number of codex32 shares is exactly equal to the (common) threshold
> value.
>
> If all the above conditions are satisfied, the <code>ms32_recover</code>
> function will return a codex32 secret when its argument is the list of
> codex32 shares with each share represented as a list of integers
> representing the characters converted using the bech32 character table from
> BIP-0173.
>
> <source lang="python">
> bech32_inv = [
> 0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
> 22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,
> ]
>
> def bech32_mul(a, b):
> res = 0
> for i in range(5):
> res ^= a if ((b >> i) & 1) else 0
> a *= 2
> a ^= 41 if (32 <= a) else 0
> return res
>
> def bech32_lagrange(l, x):
> n = 1
> c = []
> for i in l:
> n = bech32_mul(n, i ^ x)
> m = 1
> for j in l:
> m = bech32_mul(m, (x if i == j else i) ^ j)
> c.append(m)
> return [bech32_mul(n, bech32_inv[i]) for i in c]
>
> def ms32_interpolate(l, x):
> w = bech32_lagrange([s[5] for s in l], x)
> res = []
> for i in range(len(l[0])):
> n = 0
> for j in range(len(l)):
> n ^= bech32_mul(w[j], l[j][i])
> res.append(n)
> return res
>
> def ms32_recover(l):
> return ms32_interpolate(l, 16)
> </source>
>
> ===Generating Shares===
>
> If we already have ''t'' valid codex32 strings such that:
>
> * All strings have the same threshold value ''t'', the same identifier,
> and the same length
> * All of the share index values are distinct
>
> Then we can derive additional shares with the
> <code>ms32_interpolate</code> function by passing it a list of exactly
> ''t'' of these codex32 strings, together with a fresh share index distinct
> from all of the existing share indexes.
> The newly derived share will have the provided share index.
>
> Once a user has generated ''n'' codex32 shares, they may discard the
> codex32 secret (if it exists).
> The ''n'' shares form a ''t'' of ''n'' Shamir's secret sharing scheme of a
> codex32 secret.
>
> There are two ways to create an initial set of ''t'' valid codex32
> strings, depending on whether the user already has an existing master seed
> to split.
>
> ====For an existing master seed====
>
> Before generating shares for an existing master seed, it first must be
> converted into a codex32 secret, as described above.
> The conversion process consists of:
>
> * Choosing a threshold value ''t'' between 2 and 9, inclusive
> * Choosing a 4 bech32 character identifier
> ** We do not define how to choose the identifier, beyond noting that it
> SHOULD be distinct for every master seed the user may need to disambiguate.
> * Setting the share index to "s"
> * Setting the payload to a Bech32 encoding of the master seed, padded with
> arbitrary bits
> * Generating a valid checksum in accordance with the Checksum section
>
> Along with the codex32 secret, the user must generate ''t''-1 other
> codex32 shares, each with the same threshold value, the same identifier,
> and a distinct share index.
> The set of share indexes may be chosen arbitrarily.
> The payload of each of these codex32 shares is chosen uniformly at random
> such that it has the same length as the payload of the codex32 secret.
> For each share, a valid checksum must be generated in accordance with the
> Checksum section.
>
> The codex32 secret and the ''t''-1 codex32 shares form a set of ''t''
> valid codex32 strings from which additional shares can be derived as
> described above.
>
> ====For a fresh master seed====
>
> In the case that the user wishes to generate a fresh master seed, the user
> chooses a threshold value ''t'' and an identifier, then generates ''t''
> random codex32 shares, using the generation procedure from the previous
> section.
> As before, each share must have the same threshold value ''t'', the same
> identifier, and a distinct share index.
>
> With this set of ''t'' codex32 shares, new shares can be derived as
> discussed above. This process generates a fresh master seed, whose value
> can be retrieved by running the recovery process on any ''t'' of these
> shares.
>
> ===Long codex32 Strings===
>
> The 13 character checksum design only supports up to 80 data characters.
> Excluding the threshold, identifier and index characters, this limits the
> payload to 74 characters or 46 bytes.
> While this is enough to support the 32-byte advised size of BIP-0032
> master seeds, BIP-0032 allows seeds to be up to 64 bytes in size.
> We define a long codex32 string format to support these longer seeds by
> defining an alternative checksum.
>
> <source lang="python">
> MS32_LONG_CONST = 0x43381e570bf4798ab26
>
> def ms32_long_polymod(values):
> GEN = [
> 0x3d59d273535ea62d897,
> 0x7a9becb6361c6c51507,
> 0x543f9b7e6c38d8a2a0e,
> 0x0c577eaeccf1990d13c,
> 0x1887f74f8dc71b10651,
> ]
> residue = 0x23181b3
> for v in values:
> b = (residue >> 70)
> residue = (residue & 0x3fffffffffffffffff) << 5 ^ v
> for i in range(5):
> residue ^= GEN[i] if ((b >> i) & 1) else 0
> return residue
>
> def ms32_verify_long_checksum(data):
> return ms32_long_polymod(data) == MS32_LONG_CONST
>
> def ms32_create_long_checksum(data):
> values = data
> polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST
> return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
> </source>
>
> A long codex32 string follows the same specification as a regular codex32
> string with the following changes.
>
> * The payload is a sequence of between 75 and 103 Bech32 characters.
> * The checksum consists of 15 Bech32 characters as defined above.
>
> A codex32 string with a data part of 94 or 95 characters is never legal as
> a regular codex32 string is limited to 93 data characters and a long
> codex32 string is at least 96 characters.
>
> Generation of long shares and recovery of the master seed from long shares
> proceeds in exactly the same way as for regular shares with the
> <code>ms32_interpolate</code> function.
>
> The long checksum is designed to be an error correcting code that can
> correct up to 4 character substitutions, up to 8 unreadable characters
> (called erasures), or up to 15 consecutive erasures.
> As with regular checksums we do not specify how an implementation should
> implement error correction, and all our recommendations for error
> correction of regular codex32 strings also apply to long codex32 strings.
>
> ==Rationale==
>
> This scheme is based on the observation that the Lagrange interpolation of
> valid codewords in a BCH code will always be a valid codeword.
> This means that derived shares will always have valid checksum, and a
> sufficient threshold of shares with valid checksums will derive a secret
> with a valid checksum.
>
> The header system is also compatible with Lagrange interpolation, meaning
> all derived shares will have the same identifier and will have the
> appropriate share index.
> This fact allows the header data to be covered by the checksum.
>
> The checksum size and identifier size have been chosen so that the
> encoding of 128-bit seeds and shares fit within 48 characters.
> This is a standard size for many common seed storage formats, which has
> been popularized by the 12 four-letter word format of the BIP-0039 mnemonic.
>
> The 13 character checksum is adequate to correct 4 errors in up to 93
> characters (80 characters of data and 13 characters of the checksum). This
> is somewhat better quality than the checksum used in SLIP-0039.
>
> For 256-bit seeds and shares our strings are 74 characters, which fits
> into the 96 character format of the 24 four-letter word format of the
> BIP-0039 mnemonic, with plenty of room to spare.
>
> A longer checksum is needed to support up to 512-bit seeds, the longest
> seed length specified in BIP-0032, as the 13 character checksum isn't
> adequate for more than 80 data characters.
> While we could use the 15 character checksum for both cases, we prefer to
> keep the strings as short as possible for the more common cases of 128-bit
> and 256-bit master seeds.
> We only guarantee to correct 4 characters no matter how long the string is.
> Longer strings mean more chances for transcription errors, so shorter
> strings are better.
>
> The longest data part using the regular 13 character checksum is 93
> characters and corresponds to a 400-bit secret.
> At this length, the prefix <code>MS1</code> is not covered by the checksum.
> This is acceptable because the checksum scheme itself requires you to know
> that the <code>MS1</code> prefix is being used in the first place.
> If the prefix is damaged and a user is guessing that the data might be
> using this scheme, then the user can enter the available data explicitly
> using the suspected <code>MS1</code> prefix.
>
> ==Backwards Compatibility==
>
> codex32 is an alternative to BIP-0039 and SLIP-0039.
> It is technically possible to derive the BIP32 master seed from seed words
> encoded in one of these schemes, and then to encode this seed in codex32.
> For BIP-0039 this process is irreversible, since it involves hashing the
> original words.
> Furthermore, the resulting seed will be 512 bits long, which may be too
> large to be safely and conveniently handled.
>
> SLIP-0039 seed words can be reversibly converted to master seeds, so it is
> possible to interconvert between SLIP-0039 and codex32.
> However, SLIP-0039 '''shares''' cannot be converted to codex32 shares
> because the two schemes use a different underlying field.
>
> The authors of this BIP do not recommend interconversion.
> Instead, users who wish to switch to codex32 should generate a fresh seed
> and sweep their coins.
>
> ==Reference Implementation==
>
> * [https://secretcodex32.com/docs/2023-02-14--bw.ps Reference PostScript
> Implementation]
> * FIXME add Python implementation
> * FIXME add Rust implementation
>
> ==Test Vectors==
>
> ===Test vector 1===
>
> This example shows the codex32 format, when used without splitting the
> secret into any shares.
> The data part contains 26 Bech32 characters, which corresponds to 130
> bits. We truncate the last two bits in order to obtain a 128-bit master
> seed.
>
> codex32 secret (Bech32):
> <code>ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw</code>
>
> Master secret (hex): <code>318c6318c6318c6318c6318c6318c631</code>
>
> * human-readable part: <code>ms</code>
> * separator: <code>1</code>
> * k value: <code>0</code> (no secret splitting)
> * identifier: <code>test</code>
> * share index: <code>s</code> (the secret)
> * data: <code>xxxxxxxxxxxxxxxxxxxxxxxxxx</code>
> * checksum: <code>4nzvca9cmczlw</code>
>
> ===Test vector 2===
>
> This example shows generating a new master seed using "random" codex32
> shares, as well as deriving an additional codex32 share, using ''k''=2 and
> an identifier of <code>NAME</code>.
> Although codex32 strings are canonically all lowercase, it's also valid to
> use all uppercase.
>
> Share with index <code>A</code>:
> <code>MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM</code>
>
> Share with index <code>C</code>:
> <code>MS12NAMECACDEFGHJKLMNPQRSTUVWXYZ023FTR2GDZMPY6PN</code>
>
> * Derived share with index <code>D</code>:
> <code>MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG</code>
> * Secret share with index <code>S</code>:
> <code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
> * Master secret (hex): <code>d1808e096b35b209ca12132b264662a5</code>
>
> Note that per BIP-0173, the lowercase form is used when determining a
> character's value for checksum purposes.
> In particular, given an all uppercase codex32 string, we still use
> lowercase <code>ms</code> as the human-readable part during checksum
> construction.
>
> ===Test vector 3===
>
> This example shows splitting an existing 128-bit master seed into "random"
> codex32 shares, using ''k''=3 and an identifier of <code>cash</code>.
> We appended two zero bits in order to obtain 26 Bech32 characters (130
> bits of data) from the 128-bit master seed.
>
> Master secret (hex): <code>ffeeddccbbaa99887766554433221100</code>
>
> Secret share with index <code>s</code>:
> <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln</code>
>
> Share with index <code>a</code>:
> <code>ms13casha320zyxwvutsrqpnmlkjhgfedca2a8d0zehn8a0t</code>
>
> Share with index <code>c</code>:
> <code>ms13cashcacdefghjklmnpqrstuvwxyz023949xq35my48dr</code>
>
> * Derived share with index <code>d</code>:
> <code>ms13cashd0wsedstcdcts64cd7wvy4m90lm28w4ffupqs7rm</code>
> * Derived share with index <code>e</code>:
> <code>ms13casheekgpemxzshcrmqhaydlp6yhms3ws7320xyxsar9</code>
> * Derived share with index <code>f</code>:
> <code>ms13cashf8jh6sdrkpyrsp5ut94pj8ktehhw2hfvyrj48704</code>
>
> Any three of the five shares among <code>acdef</code> can be used to
> recover the secret.
>
> Note that the choice to append two zero bits was arbitrary, and any of the
> following four secret shares would have been valid choices.
> However, each choice would have resulted in a different set of derived
> shares.
>
> * <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln</code>
> * <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qpte35dvzkjpt0r</code>
> * <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qzfatvdwq5692k6</code>
> * <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qrsx6ydhed97jx2</code>
>
> ===Test vector 4===
>
> This example shows converting a 256-bit secret into a codex32 secret,
> without splitting the secret into any shares.
> We appended four zero bits in order to obtain 52 Bech32 characters (260
> bits of data) from the 256-bit secret.
>
> 256-bit secret (hex):
> <code>ffeeddccbbaa99887766554433221100ffeeddccbbaa99887766554433221100</code>
>
> * codex32 secret:
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqqtum9pgv99ycma</code>
>
> Note that the choice to append four zero bits was arbitrary, and any of
> the following sixteen codex32 secrets would have been valid:
>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqqtum9pgv99ycma</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqpj82dp34u6lqtd</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqzsrs4pnh7jmpj5</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqrfcpap2w8dqezy</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqy5tdvphn6znrf0</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq9dsuypw2ragmel</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqx05xupvgp4v6qx</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq8k0h5p43c2hzsk</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqgum7hplmjtr8ks</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqf9q0lpxzt5clxq</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq28y48pyqfuu7le</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqt7ly0paesr8x0f</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqvrvg7pqydv5uyz</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqd6hekpea5n0y5j</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqwcnrwpmlkmt9dt</code>
> *
> <code>ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyq0pgjxpzx0ysaam</code>
>
> ===Test vector 5===
>
> This example shows generating a new 512-bit master seed using "random"
> codex32 characters and appending a checksum.
> The payload contains 103 Bech32 characters, which corresponds to 515 bits.
> The last three bits are discarded when converting to a 512-bit master seed.
>
> This is an example of a '''Long codex32 String'''.
>
> * Secret share with index <code>S</code>:
> <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
> * Master secret (hex):
> <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
>
> ==Appendix==
>
> ===Mathematical Companion===
>
> Below we use the Bech32 character set to denote values in GF[32].
> In Bech32, the letter <code>Q</code> denotes zero and the letter
> <code>P</code> denotes one.
> The digits <code>0</code> and <code>2</code> through <code>9</code> do
> ''not'' denote their numeric values.
> They are simply elements of GF[32].
>
> The generating polynomial for our BCH code is as follows.
>
> We extend GF[32] to GF[1024] by adjoining a primitive cube root of unity,
> <code>ζ</code>, satisfying <code>ζ^2 = ζ + P</code>.
>
> We select <code>β := G ζ</code> which has order 93, and construct the
> product <code>(x - β^i)</code> for <code>i</code> in <code>{17, 20, 46, 49,
> 52, 77, 78, 79, 80, 81, 82, 83, 84}</code>.
> The resulting polynomial is our generating polynomial for our 13 character
> checksum:
>
> x^13 + E x^12 + M x^11 + 3 x^10 + G x^9 + Q x^8 + E x^7 + E x^6 + E
> x^5 + L x^4 + M x^3 + C x^2 + S x + S
>
> For our long checksum, we select <code>γ := E + X ζ</code>, which has
> order 1023, and construct the product <code>(x - γ^i)</code> for
> <code>i</code> in <code>{32, 64, 96, 895, 927, 959, 991, 1019, 1020, 1021,
> 1022, 1023, 1024, 1025, 1026}</code>.
> The resulting polynomial is our generating polynomial for our 15 character
> checksum for long strings:
>
> x^15 + 0 x^14 + 2 x^13 + E x^12 + 6 x^11 + F x^10 + E x^9 + 4 x^8 + X
> x^7 + H x^6 + 4 x^5 + X x^4 + 9 x^3 + K x^2 + Y x^1 + H
>
> (Reminder: the character <code>0</code> does ''not'' denote the zero of
> the field.)
>
> -----END BIP-----
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
--
Best Regards / S pozdravom,
Pavol "Stick" Rusnak
Co-Founder, SatoshiLabs
[-- Attachment #2: Type: text/html, Size: 66959 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-16 11:50 ` Pavol Rusnak
@ 2023-02-16 13:49 ` Andrew Poelstra
2023-02-19 20:13 ` David A. Harding
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Poelstra @ 2023-02-16 13:49 UTC (permalink / raw)
To: Pavol Rusnak, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 3397 bytes --]
On Thu, Feb 16, 2023 at 12:50:12PM +0100, Pavol Rusnak via bitcoin-dev wrote:
> Hi!
>
> The BIP states that its only advantage over SLIP-0039, which has been used
> in production for nearly three years (in at at least 3 SW/HW wallet
> implementations), is that it aims to be simple enough for hand computation.
> However, the BIP also indicates that "details of hand computation are
> outside the scope of this standard, and implementers do not need to be
> concerned with this possibility." Therefore, I am curious about how
> significant this advantage over SLIP-0039 really is. If hand computation is
> not straightforward and there are no other substantial advantages over
> SLIP-0039, I cannot help but feel that this BIP is simply a result of
> not-invented-here syndrome, but please correct me if I am wrong.
>
In my view, the hand computation is actually the main benefit of this
scheme. The process *is* straightforward, but tedious enough (and the
security benefits obscure enough, though they really shouldn't be...
"computers are opaque and untrustworthy" should be a common sentiment)
that it's hard to expect more than a small absolute number of users to
actually do it.
But for the purpose of the *standard*, what is important is that it is
possible to implement and use this within a normal hww workflow. This is
important for hand-computing users who know that their coins will not
die with them (since the 'standard' has fallen into obscurity), and
important for "normal" users who have the option to seamlessly switch
over to hand computation as the BTC price goes up or the world becomes
scarier.
For what it's worth, the draft lists several benefits over SLIP-0039.
I agree that none of them are particularly strong [1], and even together
they probably wouldn't meet the threshold to take the time to write a
standard, but I assure you the motivation was not NIH :).
> Keep in mind that the encoded shares in SLIP-0039 consist of exactly 200 or
> 330 bits, both of which are divisible by 5. This makes it straightforward
> to encode them as Bech32 strings.
>
This is true! And very convenient for people who may want to simply
"layer on" the codex32 checksum/splitting logic onto their SLIP39 words.
They can use a lookup table to do the conversion, spend years or
whataever doing hand-computation on them, and then use a lookup table
to go back.
[1] One listed reason is that "a SLIP is not a BIP". I have heard people
speculate that this is one reason SLIP-0039 is not nearly as
widespread as BIP-0039, even though it is objectively a far better
standard. I'm unsure whether I believe this, but "there is no other
BIP" does seem like a good reason for BIP-0039's continued
dominance.
At the very least, it means that on BIP-0039 itself we have nothing
that we could say "supercedes" or "is recommended instead of" the
BIP. See https://github.com/bitcoin/bips/pull/1413
So it's something of an aside, but I think it would probably be good
for the ecosystem (though maybe bad for this BIP's prospects :)) if
you would request a BIP number for SLIP-0039.
--
Andrew Poelstra
Director of Research, Blockstream
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew
The sun is always shining in space
-Justin Lewis-Webster
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-16 13:49 ` Andrew Poelstra
@ 2023-02-19 20:13 ` David A. Harding
2023-02-19 22:12 ` Andrew Poelstra
2023-02-22 17:24 ` Russell O'Connor
0 siblings, 2 replies; 15+ messages in thread
From: David A. Harding @ 2023-02-19 20:13 UTC (permalink / raw)
To: Andrew Poelstra, Bitcoin Protocol Discussion
On 2023-02-16 03:49, Andrew Poelstra via bitcoin-dev wrote:
> the draft lists several benefits over SLIP-0039.
The only benefit over SLIP39 that I see explicitly mentioned in the
draft BIP is "simple enough for hand computation". In the FAQ[1] on the
project's website, I see some additional reasons:
| This scheme is essentially the same as SLIP39, with the following
differences:
|
| - The checksum is longer, slightly stronger, and designed to be
| computable by hand.
|
| - Our encoding is more compact, giving us room for a bit of more
| metadata, which is also designed to be readable by hand.
|
| - Unlike SLIP39, we do not support passphrases or hardening of any
| form.
|
| - Unlike SLIP39, we have no hardware wallet support. But we hope that
| will change!
From having perused the extended documentation myself, I think I would
personally note the following differences.
- Alphabet: Codex32 uses the bech32 alphabet rather than SLIP39's
alphabet consisting of English words. The benefit to human-language
words is easier memorization for those proficient in the particular
language (in this case, SLIP39 only allows the use of English). A
disadvantage, IMO, is that it encourages the practice of memorization
(which does have a few advantages but also a lot of drawbacks).
Interestingly, Codex32 addresses what I think is the main problems of
memorization: difficult-to-prove successful recollection. Someone who
wants to reliably keep seed-related material only in their head
needs to practice recalling it on a regular basis, but for BIP39,
SLIP39, Aezeed, etc... there's no way for them to confirm they
successfully recalled it short of going through the entire recovery
process; they probably just judge how confident they feel about the
recollection and assume that feeling like they recalled it correctly
is the same thing as recalling it correctly.
Codex32 allows the individual to periodically perform their
recollection on paper in a private room without electronics and use
nothing but a pen and some loookup tables (or a paper device) to
verify that they recalled the string correctly (and its checksum can
help with correcting up to several errors, although you might need a
computer for error location and correction assistance).
- Hierarchy: Codex32 does not natively provide support for nested SSSS
whereas SLIP39 does. E.g., in SLIP39, you can require 2-of-3 for
{me, family, friends} where me is 2-of-3 {fire_safe, bank_safe,
buried_in_woods}, family is 1-of-3 {alice, bob, carol}, and friends
are 2-of-5 {d, e, f, g, h}. I assume you can do the same with Codex32
by using the share for one level as the secret for the next level,
although this is not described in the protocol.
- Versioning: Codex32's metadata can store version information for
wallets that use implicit BIP32 paths (e.g. BIP44/49/84/86), although
this would cut into the space available for users to set their own
metadata and it is not specified in the draft BIP. SLIP39 also
doesn't specify anything about implicit path versioning and, AFAICT,
doesn't have any room to store such metadata without reducing seed
entropy.
- Plausible deniability dummy wallets: Codex32 doesn't support this;
SLIP39 does. Much has been written by other people about whether
dummy wallets are a good idea or not, with strong opinions on both
sides, so maybe we can just leave it at that.
---
When I first saw the post about this, it was unclear to me that it was a
serious project, but I've become increasingly interested as I researched
it. I'm not personally that interested in generating entropy from dice
or encoding shares by hand---it's already imperative that I acquire a
trustworthy computer and load it with trustworthy software in order to
use my seed securely, so I might as well have it generate my seeds and
my
recovery codes for me.
What really did catch my attention, but which was kind of buried in the
project documentation, is the ability to verify the integrity of each
share independently without using a computer. For example, if I store a
share with some relative who lives thousands of kilometers away, I'll be
able to take that share out of its tamper-evident bag on my annual
holiday visit, verify that I can still read it accurately by validating
its checksum, and put it into a new bag for another year. For this
procedure, I don't need to bring copies of any of my other shares,
allowing them (and my seed) to stay safe.
---
I do have one question after watching an excellent video[2] about the
motivation for this system. In the video, one of the threat models
described is a disarrangement of the words in a metal backup system.
The implication seems to be that this would be an accidental
disarrangement, which obviously the Codex32 checksum would catch during
periodic offline verification. But what about deliberate modification
of a recovery code? For example, Bob doesn't keep his seed loaded on
any computer; it only exists in Codex32 shares which Bob plans to
combine together in 20 years when he retires, although he makes regular
deposits to the pubkeys derived from the seed's master xpub. Mallory is
able to obtain access to Bob's shares, allowing her to immediately steal
his current funds---but also allowing her to replace them with
similar-looking
shares with the same metadata and valid checksums so that Bob
continues making deposits to the wallet.
I'm curious about whether there's a way to prevent this attack without
otherwise compromising the properties of the code? For example, some
extra data that Bob can carry around (or memorize) for verifying the
shares haven't changed, but which is not otherwise needed for recovery
(so there's no problem if it's lost).
Thanks,
-Dave
[1] https://secretcodex32.com/faq/index.html
[2]
https://www.youtube.com/watch?v=kf48oPoiHX0&list=PLyOGyBytgcuQLi9DC5g88DOEGnqBDPmq1&index=2
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-19 20:13 ` David A. Harding
@ 2023-02-19 22:12 ` Andrew Poelstra
2023-02-19 23:05 ` Christopher Allen
` (2 more replies)
2023-02-22 17:24 ` Russell O'Connor
1 sibling, 3 replies; 15+ messages in thread
From: Andrew Poelstra @ 2023-02-19 22:12 UTC (permalink / raw)
To: David A. Harding; +Cc: Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 5425 bytes --]
On Sun, Feb 19, 2023 at 10:13:33AM -1000, David A. Harding wrote:
> On 2023-02-16 03:49, Andrew Poelstra via bitcoin-dev wrote:
> > the draft lists several benefits over SLIP-0039.
>
> The only benefit over SLIP39 that I see explicitly mentioned in the
> draft BIP is "simple enough for hand computation". In the FAQ[1] on the
> project's website, I see some additional reasons:
>
Oh, you're right! I think we removed this text in one of our revisions.
I'll see if it makes sense to put it back.
> | This scheme is essentially the same as SLIP39, with the following
> differences:
> |
> | - The checksum is longer, slightly stronger, and designed to be
> | computable by hand.
> |
> | - Our encoding is more compact, giving us room for a bit of more
> | metadata, which is also designed to be readable by hand.
> |
> | - Unlike SLIP39, we do not support passphrases or hardening of any
> | form.
> |
> | - Unlike SLIP39, we have no hardware wallet support. But we hope that
> | will change!
>
These are roughly the benefits -- a more compact encoding which is
always a fixed width. We also list "not supporting features such as
passphrases" as a benefit, for users who don't need/want this.
> <snip>
>
> When I first saw the post about this, it was unclear to me that it was a
> serious project, but I've become increasingly interested as I researched
> it. I'm not personally that interested in generating entropy from dice
> or encoding shares by hand---it's already imperative that I acquire a
> trustworthy computer and load it with trustworthy software in order to
> use my seed securely, so I might as well have it generate my seeds and my
> recovery codes for me.
>
Yes, we've been a bit coy about how serious this project is, because on
its face it's such a silly thing. But for my part, I've been using it
for real coins for over a year and I consider it to be serious.
> What really did catch my attention, but which was kind of buried in the
> project documentation, is the ability to verify the integrity of each
> share independently without using a computer. For example, if I store a
> share with some relative who lives thousands of kilometers away, I'll be
> able to take that share out of its tamper-evident bag on my annual
> holiday visit, verify that I can still read it accurately by validating
> its checksum, and put it into a new bag for another year. For this
> procedure, I don't need to bring copies of any of my other shares,
> allowing them (and my seed) to stay safe.
>
This is good feedback. I strongly agree that this is the big selling
point for this -- that you can vet shares/seeds which *aren't* being
actively used, without exposing them to the sorts of threats associated
with active use.
We should make this more prominent in the BIP motivation.
>
> I do have one question after watching an excellent video[2] about the
> motivation for this system. In the video, one of the threat models
> described is a disarrangement of the words in a metal backup system.
> The implication seems to be that this would be an accidental
> disarrangement, which obviously the Codex32 checksum would catch during
> periodic offline verification. But what about deliberate modification
> of a recovery code? For example, Bob doesn't keep his seed loaded on
> any computer; it only exists in Codex32 shares which Bob plans to
> combine together in 20 years when he retires, although he makes regular
> deposits to the pubkeys derived from the seed's master xpub. Mallory is
> able to obtain access to Bob's shares, allowing her to immediately steal
> his current funds---but also allowing her to replace them with
> similar-looking
> shares with the same metadata and valid checksums so that Bob
> continues making deposits to the wallet.
>
> I'm curious about whether there's a way to prevent this attack without
> otherwise compromising the properties of the code? For example, some
> extra data that Bob can carry around (or memorize) for verifying the
> shares haven't changed, but which is not otherwise needed for recovery
> (so there's no problem if it's lost).
>
Unfortunately not, as near as I can tell ... one way to think of this is
that Alice can flip a lot of random tiles then "error correct" it to get
a new valid, but incorrect, seed. So as long as we support error
correction it'll be possible to wreck seeds in this way.
It's actually even worse than this ... as long as there's a clearly
defined "checksum" at the end of a share, Alice will be able to mangele
tiles and then just re-compute the checksum at the end.
So what we really need to prevent this is something like a MAC: where
Bob has a secret value which gets input into the checksum somehow, which
Alice can't create valid checksums without knowing. Unfortunately I
don't see any way to do this with linear codes. With a hash-based
"checksum" a la BIP39 it would definitely be possible, but of course,
not hand-computable.
BTW, to execute this attack Alice doesn't need to compromise *all* the
shares. Just enough that Bob no longer has threshold-many un-tampered
ones left.
--
Andrew Poelstra
Director of Research, Blockstream
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew
The sun is always shining in space
-Justin Lewis-Webster
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-19 22:12 ` Andrew Poelstra
@ 2023-02-19 23:05 ` Christopher Allen
2023-02-20 0:52 ` Russell O'Connor
2023-02-22 16:29 ` Peter Todd
2 siblings, 0 replies; 15+ messages in thread
From: Christopher Allen @ 2023-02-19 23:05 UTC (permalink / raw)
To: Andrew Poelstra, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]
An easy but possibly very important tip:
If you use only UPPER CASE alpha and numbers in codex32, and avoid most
punctuation, it makes QR rendering of it significantly smaller. This is
because the QR code to the ISO SPEC, when seeing lowercase, assumes the
value is binary, then converts it to a two byte value. If instead, the
codex32 is all upper, and it uses only the 45 allowed characters (see
https://www.thonky.com/qr-code-tutorial/alphanumeric-table) , it will leave
it single byte and try to compress it. Of course it doesn’t compress well,
but that is OK because it at least didn’t double the size first.
See our research on this topic at
https://github.com/BlockchainCommons/Research/blob/master/papers/bcr-2020-003-uri-binary-compatibility.md
A superior QR codec can do better (see our
https://github.com/BlockchainCommons/QRCodeGenerator or
https://www.nayuki.io/page/qr-code-generator-library) but many platforms
and more basic QR codecs will double the size of the QR if you have any
lower case.
— Christopher Allen
[-- Attachment #2: Type: text/html, Size: 1662 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-19 22:12 ` Andrew Poelstra
2023-02-19 23:05 ` Christopher Allen
@ 2023-02-20 0:52 ` Russell O'Connor
2023-02-22 16:29 ` Peter Todd
2 siblings, 0 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-20 0:52 UTC (permalink / raw)
To: Andrew Poelstra, David A. Harding, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 2268 bytes --]
On Sun, Feb 19, 2023 at 5:13 PM Andrew Poelstra via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> On Sun, Feb 19, 2023 at 10:13:33AM -1000, David A. Harding wrote:
>
> I'm curious about whether there's a way to prevent this attack without
> > otherwise compromising the properties of the code? For example, some
> > extra data that Bob can carry around (or memorize) for verifying the
> > shares haven't changed, but which is not otherwise needed for recovery
> > (so there's no problem if it's lost).
> >
>
> Unfortunately not, as near as I can tell ... one way to think of this is
> that Alice can flip a lot of random tiles then "error correct" it to get
> a new valid, but incorrect, seed. So as long as we support error
> correction it'll be possible to wreck seeds in this way.
>
> It's actually even worse than this ... as long as there's a clearly
> defined "checksum" at the end of a share, Alice will be able to mangele
> tiles and then just re-compute the checksum at the end.
>
> So what we really need to prevent this is something like a MAC: where
> Bob has a secret value which gets input into the checksum somehow, which
> Alice can't create valid checksums without knowing. Unfortunately I
> don't see any way to do this with linear codes. With a hash-based
> "checksum" a la BIP39 it would definitely be possible, but of course,
> not hand-computable.
>
Speaking off the cuff and as a non-cryptographer (i.e do NOT rush off and
do this without any vetting) but Christopher Allen once referred me to an
cypher tile set called LS47 <https://gitea.blesmrt.net/exa/ls47>. If we
set aside the cypertext, I suspect we can form a MAC by recording some
random initial tile configuration, running the LS47 algorithm, and
recording the final tile configuration. These records are not sensitive as
(presumably!) the share data is not recoverable from just knowing these two
configurations. So one can keep these records with you, digitally sign
them or whatever, and then take them to your share on a regular basis to
rerun the LS47 algorithm to see if you still get the same final state from
the initial state.
Perhaps something more specific to Bech32 could be designed, but otherwise
this (alleged) MAC process isn't Codex32 specific.
[-- Attachment #2: Type: text/html, Size: 2972 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-16 2:16 [bitcoin-dev] Codex32 Russell O'Connor
2023-02-16 11:50 ` Pavol Rusnak
@ 2023-02-20 18:44 ` Andrew Poelstra
2023-02-20 18:48 ` Pavol Rusnak
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Poelstra @ 2023-02-20 18:44 UTC (permalink / raw)
To: Russell O'Connor, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 1340 bytes --]
On Wed, Feb 15, 2023 at 09:16:02PM -0500, Russell O'Connor via bitcoin-dev wrote:
> I've been asked by Dr. Curr and Professor Snead to forward this message to
> this mailing list, as it may be of general interest to Bitcoin users.
>
> <snip>
>
I have opened a PR to the BIPs repo for this scheme: https://github.com/bitcoin/bips/pull/1425
Thanks very much to Pavol Rusnak, David Harding, and Christopher Allen
for their comments on this list. We've updated the draft text to try to
address your concerns. In particular:
* Added more text to "Motivation" distinguishing the scheme from SLIP-39
* Added more details to "Rationale" about error correction capabilities
of the code, and to explain that the code does not defend against
malicious errors
* Added a note to use uppercase for QR codes
* Rearranged and clarified the "creating shares" instructions
* Added text about hand-computation, in particular hand-computation
of share verification, to "Motivation".
If any of you would like to be listed under Acknowledgements, please let
me know here or on the PR.
Cheers
Andrew
--
Andrew Poelstra
Director of Research, Blockstream
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew
The sun is always shining in space
-Justin Lewis-Webster
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-20 18:44 ` Andrew Poelstra
@ 2023-02-20 18:48 ` Pavol Rusnak
0 siblings, 0 replies; 15+ messages in thread
From: Pavol Rusnak @ 2023-02-20 18:48 UTC (permalink / raw)
To: Andrew Poelstra, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 194 bytes --]
Thanks Andrew!
New draft makes it much more obvious what are the differences between
Codex32 and SLIP-0039 scheme.
--
Best Regards / S pozdravom,
Pavol "Stick" Rusnak
Co-Founder, SatoshiLabs
[-- Attachment #2: Type: text/html, Size: 462 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-19 22:12 ` Andrew Poelstra
2023-02-19 23:05 ` Christopher Allen
2023-02-20 0:52 ` Russell O'Connor
@ 2023-02-22 16:29 ` Peter Todd
2023-02-22 19:01 ` Russell O'Connor
2 siblings, 1 reply; 15+ messages in thread
From: Peter Todd @ 2023-02-22 16:29 UTC (permalink / raw)
To: Andrew Poelstra, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 2820 bytes --]
On Sun, Feb 19, 2023 at 10:12:51PM +0000, Andrew Poelstra via bitcoin-dev wrote:
> > What really did catch my attention, but which was kind of buried in the
> > project documentation, is the ability to verify the integrity of each
> > share independently without using a computer. For example, if I store a
> > share with some relative who lives thousands of kilometers away, I'll be
> > able to take that share out of its tamper-evident bag on my annual
> > holiday visit, verify that I can still read it accurately by validating
> > its checksum, and put it into a new bag for another year. For this
> > procedure, I don't need to bring copies of any of my other shares,
> > allowing them (and my seed) to stay safe.
> >
>
> This is good feedback. I strongly agree that this is the big selling
> point for this -- that you can vet shares/seeds which *aren't* being
> actively used, without exposing them to the sorts of threats associated
> with active use.
>
> We should make this more prominent in the BIP motivation.
I don't think that use-case is a good selling point. The checksum that Codex32
uses is much more complex than necessary if you are simply verifying a share by
itself.
A *much* simpler approach would be to use a simple mod N = 0 checksum, either
by creating the seed such that each share passes, or by just storing an
additional word/symbol with the seed in such a way that sum(words) mod N = 0
passes. This approach is not only possible to compute by hand with a
word/symbol->number lookup table, and pen and paper or a calculator. It's so
simple they could probably *remember* how to do it themselves.
Secondly, if all shares have mod N checksums, it may be sufficient for everyone
to write down the checksums of the *other* shares, to verify they are the
correct ones and a different (otherwise correct) share hasn't accidentally been
substituted.
Indeed, with some brute forcing and small checksums, I'd expect it to be
mathematically possible to generate Shamir's secret sharing shards such that
every shard can share the *same* checksum. In which case the share verification
procedure would be to simply ask every share holder to verify the checksum
manually using the mod N procedure, and then verify that each share holder has
the same checksum. This would be less error prone in terms of leaking
information accidentally if the checksum was obviously *not* part of the share:
eg by encoding the share with words, and the checksum with a number.
Obviously, small checksums aren't fool proof. But we're probably better off
creating a relatively easy procedure with a 1-in-1000 chance of an error going
undetected than a complex procedure that people don't actually do at all.
--
https://petertodd.org 'peter'[:-1]@petertodd.org
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-19 20:13 ` David A. Harding
2023-02-19 22:12 ` Andrew Poelstra
@ 2023-02-22 17:24 ` Russell O'Connor
1 sibling, 0 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-22 17:24 UTC (permalink / raw)
To: David A. Harding, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 6926 bytes --]
On Sun, Feb 19, 2023 at 3:13 PM David A. Harding via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> Codex32 allows the individual to periodically perform their
> recollection on paper in a private room without electronics and use
> nothing but a pen and some loookup tables (or a paper device) to
> verify that they recalled the string correctly (and its checksum can
> help with correcting up to several errors, although you might need a
> computer for error location and correction assistance).
>
While perhaps not entirely impossible, doing error correction by hand is
far more difficult than just the error detection process.
Fortunately, error correction can be aided by a computer needing very
little trust.
When doing hand computation of the checksum, there is an error if the final
"residue" is different from the value given in the codex32 spec.
After double checking your hand computation by redoing it, if you still get
the same incorrect residue, there is an error in your share data somewhere.
What you can do is plug in the residue that you did compute into a
computer, and have it produce an error correction string.
You can then, by hand, add this error correction string to your share to
get a corrected share.
If it were the case that all types of character errors were equally likely
(as in during an error any character is equally likely to be transformed
into any other character), then the computer would gain zero information
about your actual share data. Of course, it is not in fact the case that
all transcription errors are equally likely, and so the computer can learn
a little bit about your share. The fewer errors that are made, the less
data it can recover. If you only have one character in error, then 5 bits
is the most data it can recover, and that is assuming that it can somehow
infer your error perfectly from the delta of the error correction, which
isn't actually going to be the case.
Of course, a single share from a split secret has no information about your
master seed (the master seed is hidden in the correlation between different
shares). So learning partial information about one share isn't going to be
enough by itself to even begin compromising your master key.
This all still needs to be written up in more detail, but I figured I would
give a preview here.
- Hierarchy: Codex32 does not natively provide support for nested SSSS
> whereas SLIP39 does. E.g., in SLIP39, you can require 2-of-3 for
> {me, family, friends} where me is 2-of-3 {fire_safe, bank_safe,
> buried_in_woods}, family is 1-of-3 {alice, bob, carol}, and friends
> are 2-of-5 {d, e, f, g, h}. I assume you can do the same with Codex32
> by using the share for one level as the secret for the next level,
> although this is not described in the protocol.
>
There was a design for a second level share scheme floating around
somewhere. I'll have to dig it up. As I recall this is made slightly more
complicated by needing to incorporate share metadata (i.e. the share index)
when doing a second split, but it seemed doable at the time.
> - Versioning: Codex32's metadata can store version information for
> wallets that use implicit BIP32 paths (e.g. BIP44/49/84/86), although
> this would cut into the space available for users to set their own
> metadata and it is not specified in the draft BIP. SLIP39 also
> doesn't specify anything about implicit path versioning and, AFAICT,
> doesn't have any room to store such metadata without reducing seed
> entropy.
>
Personally, I don't consider the derivation path / descriptor data as that
sensitive, and I would recommend making wide backups of that data.
It certainly would make sense to store descriptor data alongside wherever
you keep your shares, and more places beyond that.
On the other hand, if you are trying to keep your shares innocuous somehow,
perhaps you won't be able to keep the descriptor data alongside your shares.
When I first saw the post about this, it was unclear to me that it was a
> serious project, but I've become increasingly interested as I researched
> it. I'm not personally that interested in generating entropy from dice
> or encoding shares by hand---it's already imperative that I acquire a
> trustworthy computer and load it with trustworthy software in order to
> use my seed securely, so I might as well have it generate my seeds and
> my
> recovery codes for me.
>
I do think hardware wallets are great, and overall provide a lot of
practical protection. What is notable is that once the secrets are loaded
onto a hardware wallet, as long as that wallet remains isolated, it cannot
leak any secrets.
Of course, a wallet needs to interact with the Bitcoin protocol and P2P
network, at least indirectly, in order to function, so we must break that
isolation. However, if we can limit the communication capabilities of a
hardware wallet, even a malicious wallet shouldn't be able to leak the
secret data.
There are three main vectors a hardware wallet can try to communicate that
I am aware of:
1. Compromise at seed/master secret (or in this case shares of the master
secret) during generation time.
2. Lie about public keys at public key generation time.
3. Exfiltrate data though signature data or otherwise during signature
generation time.
#3 is largely mitigated through using anti-exfil signing and air gapping
the hardware wallet (e.g. using QR codes, or using RS-232
<https://en.wikipedia.org/wiki/RS-232> if you consider that air gaping).
Using multiple wallets from different vendors doing deterministic signing
is another possibility, but I consider the method deprecated in favour of
anti-exfil.
Addressing #1 is where codex32 lies, by taking master secret handling
functions out of the hardware wallet.
My understanding is that it is difficult for digital hardware, which tries
very hard to be deterministic, to generate randomness, especially for
isolated hardware devices like hardware wallets.
Also it is hard for hardware to tell if their hardware random number
generator is broken. Generally I don't trust small pieces of isolated
digital hardware to generate master seeds, even when they are not
malicious. This may just be due to my ignorance of how they operate.
#2 seems to be the hardest issue to address. My current thinking is to
generate addresses on a diverse set of hardware and/or using public
derivations.
Perhaps an alternative to BIP-32 could be designed to make it easy to
generate zero-knowledge proofs of pubkey derivations.
Of course there are other ways for a hardware wallets to exfiltrate data:
The wallet can blink lights, make noise, or it can try to use radio
broadcasts, etc. These other attack vectors seem to require physically
local support, which is a fairly big hurdle to overcome. I suppose even
these vectors could be mitigated through various levels of tinfoil.
[-- Attachment #2: Type: text/html, Size: 8412 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-22 16:29 ` Peter Todd
@ 2023-02-22 19:01 ` Russell O'Connor
2023-02-23 3:30 ` Russell O'Connor
0 siblings, 1 reply; 15+ messages in thread
From: Russell O'Connor @ 2023-02-22 19:01 UTC (permalink / raw)
To: Peter Todd, Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 4140 bytes --]
After some poking around at the math, I do see that the 13 character
generator (for regular sized shares) is reasonably "smooth", having roots
at T{11}, S{16}, and C{24}.
This means we could build a "quick check" worksheet to evaluate the string
modulo (x - T) to verify a 5 bit checksum, whose operation would be similar
to the existing checksum worksheet in structure but significantly less work.
Perhaps 5 bits is too short, and it is more reasonable working modulo (x -
T)*(x - S) to get a 10 bit checksum. A worksheet for a 15 bit checksum is
also an option, and possibly others well depending on the size of the other
factors. I think this process is would be about as simple as any other
comparable hand operated checksum over the bech32 alphabet would be.
I haven't looked into the smoothness of the long generator for seeds that
are greater than 400 bits. If it isn't smooth and smoother options are
available, perhaps it should be changed.
On Wed, Feb 22, 2023 at 11:29 AM Peter Todd via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:
> On Sun, Feb 19, 2023 at 10:12:51PM +0000, Andrew Poelstra via bitcoin-dev
> wrote:
> > > What really did catch my attention, but which was kind of buried in the
> > > project documentation, is the ability to verify the integrity of each
> > > share independently without using a computer. For example, if I store
> a
> > > share with some relative who lives thousands of kilometers away, I'll
> be
> > > able to take that share out of its tamper-evident bag on my annual
> > > holiday visit, verify that I can still read it accurately by validating
> > > its checksum, and put it into a new bag for another year. For this
> > > procedure, I don't need to bring copies of any of my other shares,
> > > allowing them (and my seed) to stay safe.
> > >
> >
> > This is good feedback. I strongly agree that this is the big selling
> > point for this -- that you can vet shares/seeds which *aren't* being
> > actively used, without exposing them to the sorts of threats associated
> > with active use.
> >
> > We should make this more prominent in the BIP motivation.
>
> I don't think that use-case is a good selling point. The checksum that
> Codex32
> uses is much more complex than necessary if you are simply verifying a
> share by
> itself.
>
> A *much* simpler approach would be to use a simple mod N = 0 checksum,
> either
> by creating the seed such that each share passes, or by just storing an
> additional word/symbol with the seed in such a way that sum(words) mod N =
> 0
> passes. This approach is not only possible to compute by hand with a
> word/symbol->number lookup table, and pen and paper or a calculator. It's
> so
> simple they could probably *remember* how to do it themselves.
>
>
> Secondly, if all shares have mod N checksums, it may be sufficient for
> everyone
> to write down the checksums of the *other* shares, to verify they are the
> correct ones and a different (otherwise correct) share hasn't accidentally
> been
> substituted.
>
> Indeed, with some brute forcing and small checksums, I'd expect it to be
> mathematically possible to generate Shamir's secret sharing shards such
> that
> every shard can share the *same* checksum. In which case the share
> verification
> procedure would be to simply ask every share holder to verify the checksum
> manually using the mod N procedure, and then verify that each share holder
> has
> the same checksum. This would be less error prone in terms of leaking
> information accidentally if the checksum was obviously *not* part of the
> share:
> eg by encoding the share with words, and the checksum with a number.
>
> Obviously, small checksums aren't fool proof. But we're probably better off
> creating a relatively easy procedure with a 1-in-1000 chance of an error
> going
> undetected than a complex procedure that people don't actually do at all.
>
> --
> https://petertodd.org 'peter'[:-1]@petertodd.org
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
[-- Attachment #2: Type: text/html, Size: 5183 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-22 19:01 ` Russell O'Connor
@ 2023-02-23 3:30 ` Russell O'Connor
2023-02-23 16:36 ` Russell O'Connor
2023-02-23 18:26 ` Russell O'Connor
0 siblings, 2 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-23 3:30 UTC (permalink / raw)
To: Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 7282 bytes --]
After some consultation, I now see that generators for all degree 2 BCH
codes, such as ours, are smooth and factor into quadratic and linear
components.
Anyhow the upshot of all this is that you can perform a "quickcheck"
verification of the codex32 strings for whatever size of verification you
want to do, 1 character, 2 characters, 3 characters, upto the full 13
characters. Each of these partial verifications will have BCH properties.
A 1 character quickchecksum will guarantee to detect any 1 character
error. A 3 character quickchecksum will guarantee to detect any 2
character error, etc. There remains a 1 in 32^n chance of failing to
detect larger numbers of errors where n is the size of your quickcheck.
To illustrate, let's consider a quickcheck of size 2. This can detect any
1 character error and will only have a 1/1024 chance of failing to detect
other random errors. Let's take the second test vector as our example: "
MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM"
You start in a specified initial state with a pair of bech32 characters.
For MS1 strings and a size 2 quickcheck it would be the pair of Bech32
characters 'AS'.
Next we "add" the first character after the prefix, which is '2' by using
the addition volvelle or lookup table. "Adding" '2' to 'S' yields '6' and
our state becomes 'A6'.
Next we have to look up 'A6' in a lookup table. This table is too big to
fit in the margin of this email, so I will have to omit it. But it would
have an entry mapping 'A6' -> 'QM'. Our state becomes 'QM'
From this point we have an even number of remaining characters in the input
string and we can work 2 characters at a time. We "add" the next two data
characters from our string, which are 'NA'. Again, using the volvelle or
lookup table we get that adding 'N' to 'Q' yields 'N', and adding 'A' to
'M' yields 'X'. So our state is now 'NX'
Next we look up 'NX' in this table I haven't given you and we will find an
entry mapping 'NX' -> 'DX', making 'DX' our new state.
We keep repeating this process alternating between adding pairs of
characters and using this unstated lookup table all the way until the end
where we will reach a final state which will be 'H9'.
If you follow this procedure with any string (upto 400 bit master seeds)
you will always end up in the state 'H9'.
A specialized worksheet would help guide the process making the process
easier to follow.
This process is somewhat close to Peter Todd's suggestion of using a lookup
table on "words", which in this case would be pairs of bech32 characters,
and adding values together. The catch is that the addition is done with
Bech32 addition rather than calculator addition, which I accept is a
moderately large catch.
Anyhow, the point is that you can do this sort of partial verification by
hand to whatever degree you like, if you are in a rush and are willing to
accept larger chances of failing to catch random errors.
On Wed, Feb 22, 2023 at 2:01 PM Russell O'Connor <roconnor@blockstream.com>
wrote:
> After some poking around at the math, I do see that the 13 character
> generator (for regular sized shares) is reasonably "smooth", having roots
> at T{11}, S{16}, and C{24}.
>
> This means we could build a "quick check" worksheet to evaluate the string
> modulo (x - T) to verify a 5 bit checksum, whose operation would be similar
> to the existing checksum worksheet in structure but significantly less work.
>
> Perhaps 5 bits is too short, and it is more reasonable working modulo (x -
> T)*(x - S) to get a 10 bit checksum. A worksheet for a 15 bit checksum is
> also an option, and possibly others well depending on the size of the other
> factors. I think this process is would be about as simple as any other
> comparable hand operated checksum over the bech32 alphabet would be.
>
> I haven't looked into the smoothness of the long generator for seeds that
> are greater than 400 bits. If it isn't smooth and smoother options are
> available, perhaps it should be changed.
>
> On Wed, Feb 22, 2023 at 11:29 AM Peter Todd via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
>
>> On Sun, Feb 19, 2023 at 10:12:51PM +0000, Andrew Poelstra via bitcoin-dev
>> wrote:
>> > > What really did catch my attention, but which was kind of buried in
>> the
>> > > project documentation, is the ability to verify the integrity of each
>> > > share independently without using a computer. For example, if I
>> store a
>> > > share with some relative who lives thousands of kilometers away, I'll
>> be
>> > > able to take that share out of its tamper-evident bag on my annual
>> > > holiday visit, verify that I can still read it accurately by
>> validating
>> > > its checksum, and put it into a new bag for another year. For this
>> > > procedure, I don't need to bring copies of any of my other shares,
>> > > allowing them (and my seed) to stay safe.
>> > >
>> >
>> > This is good feedback. I strongly agree that this is the big selling
>> > point for this -- that you can vet shares/seeds which *aren't* being
>> > actively used, without exposing them to the sorts of threats associated
>> > with active use.
>> >
>> > We should make this more prominent in the BIP motivation.
>>
>> I don't think that use-case is a good selling point. The checksum that
>> Codex32
>> uses is much more complex than necessary if you are simply verifying a
>> share by
>> itself.
>>
>> A *much* simpler approach would be to use a simple mod N = 0 checksum,
>> either
>> by creating the seed such that each share passes, or by just storing an
>> additional word/symbol with the seed in such a way that sum(words) mod N
>> = 0
>> passes. This approach is not only possible to compute by hand with a
>> word/symbol->number lookup table, and pen and paper or a calculator. It's
>> so
>> simple they could probably *remember* how to do it themselves.
>>
>>
>> Secondly, if all shares have mod N checksums, it may be sufficient for
>> everyone
>> to write down the checksums of the *other* shares, to verify they are the
>> correct ones and a different (otherwise correct) share hasn't
>> accidentally been
>> substituted.
>>
>> Indeed, with some brute forcing and small checksums, I'd expect it to be
>> mathematically possible to generate Shamir's secret sharing shards such
>> that
>> every shard can share the *same* checksum. In which case the share
>> verification
>> procedure would be to simply ask every share holder to verify the checksum
>> manually using the mod N procedure, and then verify that each share
>> holder has
>> the same checksum. This would be less error prone in terms of leaking
>> information accidentally if the checksum was obviously *not* part of the
>> share:
>> eg by encoding the share with words, and the checksum with a number.
>>
>> Obviously, small checksums aren't fool proof. But we're probably better
>> off
>> creating a relatively easy procedure with a 1-in-1000 chance of an error
>> going
>> undetected than a complex procedure that people don't actually do at all.
>>
>> --
>> https://petertodd.org 'peter'[:-1]@petertodd.org
>> _______________________________________________
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
[-- Attachment #2: Type: text/html, Size: 9401 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-23 3:30 ` Russell O'Connor
@ 2023-02-23 16:36 ` Russell O'Connor
2023-02-23 18:26 ` Russell O'Connor
1 sibling, 0 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-23 16:36 UTC (permalink / raw)
To: Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 10192 bytes --]
Sorry for the repeated replies, but I would like to make one more remark
regarding the 1 character "quick check".
Because the 1 character "quick check" state is so small, the procedure
becomes simplified to just using a single table. You start with the
specified initial state, which would be the bech32 character '9', and then
you have a lookup mapping (<current state>, <next input character>) ->
<next state>. You go through the table for each character after the
prefix, updating the state as you go along. ('9','2') -> '0', then
('0','N') -> '4', and so on until you reach the final state which should be
'5'. If you like volvelles, one could be designed to implement this lookup
table.
However, I do want to note that an adjustment could be made to the codex32
generator so that this 1 character "quick check" table would become
identical to the Bech32 addition table. In other words the 1 character
quick check would become the same as adding up all the characters and
checking that you get the required final constant.
If this change were free to make, I would probably make it. However such
an adjustment would come at a cost. The current generator was chosen to
have three identical coefficients in a row (you can find the generator in
the appendix of the draft BIP). This special property slightly eases the
computation needed when creating checksums by hand (no compromise to the
quality of the checksum itself). If we made the above adjustment to the
codex32 generator, we would lose this property of having three identical
coefficients in a row.
Therefore, I am pretty hesitant to make this adjustment. Firstly the 1
character quick check is simply too small to be safely used. While it does
guarantee to detect single character errors, it has a 1 in 32 chance of
failing to detect more errors. I think a 3% failure rate is pretty bad,
and would definitely recommend people performing quick checks use a minimum
size of 2 (which has a 0.1% failure rate). Secondly the difference between
using the addition table/volvelle versus a specific table/volvelle for the
purpose of performing 1 character quick checks (which you ought not to be
doing anyways) is pretty minimal. The addition table is possibly slightly
less error prone to use because it is symmetric, but other than that the
amount of work to do is pretty much the same either way.
My conclusion is that it isn't worth compromising the ease of hand
computation for the sake of possibly making a
too-low-quality-checksum-that-no-one-should-be-using slightly more
convenient, but I thought I should mention it at least.
On Wed, Feb 22, 2023 at 10:30 PM Russell O'Connor <roconnor@blockstream.com>
wrote:
> After some consultation, I now see that generators for all degree 2 BCH
> codes, such as ours, are smooth and factor into quadratic and linear
> components.
>
> Anyhow the upshot of all this is that you can perform a "quickcheck"
> verification of the codex32 strings for whatever size of verification you
> want to do, 1 character, 2 characters, 3 characters, upto the full 13
> characters. Each of these partial verifications will have BCH properties.
> A 1 character quickchecksum will guarantee to detect any 1 character
> error. A 3 character quickchecksum will guarantee to detect any 2
> character error, etc. There remains a 1 in 32^n chance of failing to
> detect larger numbers of errors where n is the size of your quickcheck.
>
> To illustrate, let's consider a quickcheck of size 2. This can detect any
> 1 character error and will only have a 1/1024 chance of failing to detect
> other random errors. Let's take the second test vector as our example: "
> MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM"
>
> You start in a specified initial state with a pair of bech32 characters.
> For MS1 strings and a size 2 quickcheck it would be the pair of Bech32
> characters 'AS'.
>
> Next we "add" the first character after the prefix, which is '2' by using
> the addition volvelle or lookup table. "Adding" '2' to 'S' yields '6' and
> our state becomes 'A6'.
>
> Next we have to look up 'A6' in a lookup table. This table is too big to
> fit in the margin of this email, so I will have to omit it. But it would
> have an entry mapping 'A6' -> 'QM'. Our state becomes 'QM'
>
> From this point we have an even number of remaining characters in the
> input string and we can work 2 characters at a time. We "add" the next two
> data characters from our string, which are 'NA'. Again, using the volvelle
> or lookup table we get that adding 'N' to 'Q' yields 'N', and adding 'A' to
> 'M' yields 'X'. So our state is now 'NX'
>
> Next we look up 'NX' in this table I haven't given you and we will find an
> entry mapping 'NX' -> 'DX', making 'DX' our new state.
>
> We keep repeating this process alternating between adding pairs of
> characters and using this unstated lookup table all the way until the end
> where we will reach a final state which will be 'H9'.
>
> If you follow this procedure with any string (upto 400 bit master seeds)
> you will always end up in the state 'H9'.
>
> A specialized worksheet would help guide the process making the process
> easier to follow.
>
>
> This process is somewhat close to Peter Todd's suggestion of using a
> lookup table on "words", which in this case would be pairs of bech32
> characters, and adding values together. The catch is that the addition is
> done with Bech32 addition rather than calculator addition, which I accept
> is a moderately large catch.
>
> Anyhow, the point is that you can do this sort of partial verification by
> hand to whatever degree you like, if you are in a rush and are willing to
> accept larger chances of failing to catch random errors.
>
>
> On Wed, Feb 22, 2023 at 2:01 PM Russell O'Connor <roconnor@blockstream.com>
> wrote:
>
>> After some poking around at the math, I do see that the 13 character
>> generator (for regular sized shares) is reasonably "smooth", having roots
>> at T{11}, S{16}, and C{24}.
>>
>> This means we could build a "quick check" worksheet to evaluate the
>> string modulo (x - T) to verify a 5 bit checksum, whose operation would be
>> similar to the existing checksum worksheet in structure but significantly
>> less work.
>>
>> Perhaps 5 bits is too short, and it is more reasonable working modulo (x
>> - T)*(x - S) to get a 10 bit checksum. A worksheet for a 15 bit checksum
>> is also an option, and possibly others well depending on the size of the
>> other factors. I think this process is would be about as simple as any
>> other comparable hand operated checksum over the bech32 alphabet would be.
>>
>> I haven't looked into the smoothness of the long generator for seeds that
>> are greater than 400 bits. If it isn't smooth and smoother options are
>> available, perhaps it should be changed.
>>
>> On Wed, Feb 22, 2023 at 11:29 AM Peter Todd via bitcoin-dev <
>> bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>>> On Sun, Feb 19, 2023 at 10:12:51PM +0000, Andrew Poelstra via
>>> bitcoin-dev wrote:
>>> > > What really did catch my attention, but which was kind of buried in
>>> the
>>> > > project documentation, is the ability to verify the integrity of each
>>> > > share independently without using a computer. For example, if I
>>> store a
>>> > > share with some relative who lives thousands of kilometers away,
>>> I'll be
>>> > > able to take that share out of its tamper-evident bag on my annual
>>> > > holiday visit, verify that I can still read it accurately by
>>> validating
>>> > > its checksum, and put it into a new bag for another year. For this
>>> > > procedure, I don't need to bring copies of any of my other shares,
>>> > > allowing them (and my seed) to stay safe.
>>> > >
>>> >
>>> > This is good feedback. I strongly agree that this is the big selling
>>> > point for this -- that you can vet shares/seeds which *aren't* being
>>> > actively used, without exposing them to the sorts of threats associated
>>> > with active use.
>>> >
>>> > We should make this more prominent in the BIP motivation.
>>>
>>> I don't think that use-case is a good selling point. The checksum that
>>> Codex32
>>> uses is much more complex than necessary if you are simply verifying a
>>> share by
>>> itself.
>>>
>>> A *much* simpler approach would be to use a simple mod N = 0 checksum,
>>> either
>>> by creating the seed such that each share passes, or by just storing an
>>> additional word/symbol with the seed in such a way that sum(words) mod N
>>> = 0
>>> passes. This approach is not only possible to compute by hand with a
>>> word/symbol->number lookup table, and pen and paper or a calculator.
>>> It's so
>>> simple they could probably *remember* how to do it themselves.
>>>
>>>
>>> Secondly, if all shares have mod N checksums, it may be sufficient for
>>> everyone
>>> to write down the checksums of the *other* shares, to verify they are the
>>> correct ones and a different (otherwise correct) share hasn't
>>> accidentally been
>>> substituted.
>>>
>>> Indeed, with some brute forcing and small checksums, I'd expect it to be
>>> mathematically possible to generate Shamir's secret sharing shards such
>>> that
>>> every shard can share the *same* checksum. In which case the share
>>> verification
>>> procedure would be to simply ask every share holder to verify the
>>> checksum
>>> manually using the mod N procedure, and then verify that each share
>>> holder has
>>> the same checksum. This would be less error prone in terms of leaking
>>> information accidentally if the checksum was obviously *not* part of the
>>> share:
>>> eg by encoding the share with words, and the checksum with a number.
>>>
>>> Obviously, small checksums aren't fool proof. But we're probably better
>>> off
>>> creating a relatively easy procedure with a 1-in-1000 chance of an error
>>> going
>>> undetected than a complex procedure that people don't actually do at all.
>>>
>>> --
>>> https://petertodd.org 'peter'[:-1]@petertodd.org
>>> _______________________________________________
>>> bitcoin-dev mailing list
>>> bitcoin-dev@lists.linuxfoundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>>
>>
[-- Attachment #2: Type: text/html, Size: 12511 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [bitcoin-dev] Codex32
2023-02-23 3:30 ` Russell O'Connor
2023-02-23 16:36 ` Russell O'Connor
@ 2023-02-23 18:26 ` Russell O'Connor
1 sibling, 0 replies; 15+ messages in thread
From: Russell O'Connor @ 2023-02-23 18:26 UTC (permalink / raw)
To: Bitcoin Protocol Discussion
[-- Attachment #1: Type: text/plain, Size: 9867 bytes --]
One more thing (Again apologies. This idea of doing partial verification is
novel to me, and I see now that I should have just waited to give a
consolidated reply).
Focusing in on the example of performing 2 character quick checks. There
are 7 different ways of building the table used in this quick check
verification process (actually there are 8, but we only need 7 of them for
our purposes here). Fun fact: If you perform the 2 character quick check
in all 7 different ways, this is equivalent to doing a full checksum
verification.
This suggests a strategy of visiting your shares on a regular basis and
performing a different 2 character quick check each time, rotating through
the 7 different ways of performing it.
Moreover, these 7 different 2 character quick checks come with a prescribed
order that will accumulate BCH guarantees as you go along. Assuming the
string isn't changing between visits then
* After the 1st table you are guaranteed to detect any 1 character error.
* After the 2nd table you are guaranteed to detect any 2 character error.
* After the 3rd table you are guaranteed to detect any 4 character error.
* After the 4th table you are guaranteed to detect any 5 character error.
* After the 5th table you are guaranteed to detect any 6 character error.
* After the 6th table you are guaranteed to detect any 7 character error.
* After the 7th table you are guaranteed to detect any 8 character error,
which is the guarantee of the full 13 character checksum. You are also
guaranteed that the full 13 character checksum is now correct.
You could perform the checks out of order, and that is okay. You will
eventually reach these BCH levels of guarantees, just not as quickly as if
you follow the prescribed order.
Of course, doing a series of 7 different 2 character quick checks is
overall more work than doing the full 13 character checksum validation.
But there is certainly an advantage in spreading the work out over time.
Each time you visit you still have the guarantee of catching any new 1
character error introduced since the last time you visited and a 99.9%
chance of catching any other random errors introduced since your last
visit. Personally I am likely to follow such a validation strategy myself,
now that I am aware of it.
On Wed, Feb 22, 2023 at 10:30 PM Russell O'Connor <roconnor@blockstream.com>
wrote:
> After some consultation, I now see that generators for all degree 2 BCH
> codes, such as ours, are smooth and factor into quadratic and linear
> components.
>
> Anyhow the upshot of all this is that you can perform a "quickcheck"
> verification of the codex32 strings for whatever size of verification you
> want to do, 1 character, 2 characters, 3 characters, upto the full 13
> characters. Each of these partial verifications will have BCH properties.
> A 1 character quickchecksum will guarantee to detect any 1 character
> error. A 3 character quickchecksum will guarantee to detect any 2
> character error, etc. There remains a 1 in 32^n chance of failing to
> detect larger numbers of errors where n is the size of your quickcheck.
>
> To illustrate, let's consider a quickcheck of size 2. This can detect any
> 1 character error and will only have a 1/1024 chance of failing to detect
> other random errors. Let's take the second test vector as our example: "
> MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM"
>
> You start in a specified initial state with a pair of bech32 characters.
> For MS1 strings and a size 2 quickcheck it would be the pair of Bech32
> characters 'AS'.
>
> Next we "add" the first character after the prefix, which is '2' by using
> the addition volvelle or lookup table. "Adding" '2' to 'S' yields '6' and
> our state becomes 'A6'.
>
> Next we have to look up 'A6' in a lookup table. This table is too big to
> fit in the margin of this email, so I will have to omit it. But it would
> have an entry mapping 'A6' -> 'QM'. Our state becomes 'QM'
>
> From this point we have an even number of remaining characters in the
> input string and we can work 2 characters at a time. We "add" the next two
> data characters from our string, which are 'NA'. Again, using the volvelle
> or lookup table we get that adding 'N' to 'Q' yields 'N', and adding 'A' to
> 'M' yields 'X'. So our state is now 'NX'
>
> Next we look up 'NX' in this table I haven't given you and we will find an
> entry mapping 'NX' -> 'DX', making 'DX' our new state.
>
> We keep repeating this process alternating between adding pairs of
> characters and using this unstated lookup table all the way until the end
> where we will reach a final state which will be 'H9'.
>
> If you follow this procedure with any string (upto 400 bit master seeds)
> you will always end up in the state 'H9'.
>
> A specialized worksheet would help guide the process making the process
> easier to follow.
>
>
> This process is somewhat close to Peter Todd's suggestion of using a
> lookup table on "words", which in this case would be pairs of bech32
> characters, and adding values together. The catch is that the addition is
> done with Bech32 addition rather than calculator addition, which I accept
> is a moderately large catch.
>
> Anyhow, the point is that you can do this sort of partial verification by
> hand to whatever degree you like, if you are in a rush and are willing to
> accept larger chances of failing to catch random errors.
>
>
> On Wed, Feb 22, 2023 at 2:01 PM Russell O'Connor <roconnor@blockstream.com>
> wrote:
>
>> After some poking around at the math, I do see that the 13 character
>> generator (for regular sized shares) is reasonably "smooth", having roots
>> at T{11}, S{16}, and C{24}.
>>
>> This means we could build a "quick check" worksheet to evaluate the
>> string modulo (x - T) to verify a 5 bit checksum, whose operation would be
>> similar to the existing checksum worksheet in structure but significantly
>> less work.
>>
>> Perhaps 5 bits is too short, and it is more reasonable working modulo (x
>> - T)*(x - S) to get a 10 bit checksum. A worksheet for a 15 bit checksum
>> is also an option, and possibly others well depending on the size of the
>> other factors. I think this process is would be about as simple as any
>> other comparable hand operated checksum over the bech32 alphabet would be.
>>
>> I haven't looked into the smoothness of the long generator for seeds that
>> are greater than 400 bits. If it isn't smooth and smoother options are
>> available, perhaps it should be changed.
>>
>> On Wed, Feb 22, 2023 at 11:29 AM Peter Todd via bitcoin-dev <
>> bitcoin-dev@lists.linuxfoundation.org> wrote:
>>
>>> On Sun, Feb 19, 2023 at 10:12:51PM +0000, Andrew Poelstra via
>>> bitcoin-dev wrote:
>>> > > What really did catch my attention, but which was kind of buried in
>>> the
>>> > > project documentation, is the ability to verify the integrity of each
>>> > > share independently without using a computer. For example, if I
>>> store a
>>> > > share with some relative who lives thousands of kilometers away,
>>> I'll be
>>> > > able to take that share out of its tamper-evident bag on my annual
>>> > > holiday visit, verify that I can still read it accurately by
>>> validating
>>> > > its checksum, and put it into a new bag for another year. For this
>>> > > procedure, I don't need to bring copies of any of my other shares,
>>> > > allowing them (and my seed) to stay safe.
>>> > >
>>> >
>>> > This is good feedback. I strongly agree that this is the big selling
>>> > point for this -- that you can vet shares/seeds which *aren't* being
>>> > actively used, without exposing them to the sorts of threats associated
>>> > with active use.
>>> >
>>> > We should make this more prominent in the BIP motivation.
>>>
>>> I don't think that use-case is a good selling point. The checksum that
>>> Codex32
>>> uses is much more complex than necessary if you are simply verifying a
>>> share by
>>> itself.
>>>
>>> A *much* simpler approach would be to use a simple mod N = 0 checksum,
>>> either
>>> by creating the seed such that each share passes, or by just storing an
>>> additional word/symbol with the seed in such a way that sum(words) mod N
>>> = 0
>>> passes. This approach is not only possible to compute by hand with a
>>> word/symbol->number lookup table, and pen and paper or a calculator.
>>> It's so
>>> simple they could probably *remember* how to do it themselves.
>>>
>>>
>>> Secondly, if all shares have mod N checksums, it may be sufficient for
>>> everyone
>>> to write down the checksums of the *other* shares, to verify they are the
>>> correct ones and a different (otherwise correct) share hasn't
>>> accidentally been
>>> substituted.
>>>
>>> Indeed, with some brute forcing and small checksums, I'd expect it to be
>>> mathematically possible to generate Shamir's secret sharing shards such
>>> that
>>> every shard can share the *same* checksum. In which case the share
>>> verification
>>> procedure would be to simply ask every share holder to verify the
>>> checksum
>>> manually using the mod N procedure, and then verify that each share
>>> holder has
>>> the same checksum. This would be less error prone in terms of leaking
>>> information accidentally if the checksum was obviously *not* part of the
>>> share:
>>> eg by encoding the share with words, and the checksum with a number.
>>>
>>> Obviously, small checksums aren't fool proof. But we're probably better
>>> off
>>> creating a relatively easy procedure with a 1-in-1000 chance of an error
>>> going
>>> undetected than a complex procedure that people don't actually do at all.
>>>
>>> --
>>> https://petertodd.org 'peter'[:-1]@petertodd.org
>>> _______________________________________________
>>> bitcoin-dev mailing list
>>> bitcoin-dev@lists.linuxfoundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>>
>>
[-- Attachment #2: Type: text/html, Size: 12197 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-02-23 18:26 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-16 2:16 [bitcoin-dev] Codex32 Russell O'Connor
2023-02-16 11:50 ` Pavol Rusnak
2023-02-16 13:49 ` Andrew Poelstra
2023-02-19 20:13 ` David A. Harding
2023-02-19 22:12 ` Andrew Poelstra
2023-02-19 23:05 ` Christopher Allen
2023-02-20 0:52 ` Russell O'Connor
2023-02-22 16:29 ` Peter Todd
2023-02-22 19:01 ` Russell O'Connor
2023-02-23 3:30 ` Russell O'Connor
2023-02-23 16:36 ` Russell O'Connor
2023-02-23 18:26 ` Russell O'Connor
2023-02-22 17:24 ` Russell O'Connor
2023-02-20 18:44 ` Andrew Poelstra
2023-02-20 18:48 ` Pavol Rusnak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox