[bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists

public inbox for bitcoindev@googlegroups.com

* [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
@ 2026-06-08 14:24 Daniel Osemberg
  2026-06-13 16:34 ` 'conduition' via Bitcoin Development Mailing List
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Osemberg @ 2026-06-08 14:24 UTC (permalink / raw)
  To: Bitcoin Development Mailing List


Hi list,

I would like to ask for early feedback on an idea before attempting any 
formal BIP submission.

The proposal is a display/input layer for BIP39 recovery phrases in 
additional native languages.

The important constraint is that this does not change the BIP39 
cryptographic flow. The canonical mnemonic remains the existing BIP39 
English wordlist, and PBKDF2 is still performed on the canonical English 
form.

The native-language lists are index-paired to the English BIP39 wordlist. 
In other words, each native word maps to the same 0-2047 index as the 
corresponding English BIP39 word. Wallets could display or accept the 
native-language form for UX purposes, but internally normalize back to the 
canonical English mnemonic before seed generation.
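
A minimal sketch of the index-paired translation described above, offered as an illustration rather than the draft implementation. The function names are hypothetical, the wordlists are passed in by the caller, and the duplicate check after NFKD normalization is an assumption about what a wallet would want to verify before trusting a display list.

```python
import unicodedata

WORDLIST_SIZE = 2048  # BIP39 wordlists have 2048 entries (11 bits per word)


def normalize(word: str) -> str:
    # BIP39 specifies NFKD normalization for mnemonic text.
    return unicodedata.normalize("NFKD", word.strip())


def check_display_list(display: list[str]) -> None:
    """Reject lists that cannot be paired one-to-one with the English list."""
    if len(display) != WORDLIST_SIZE:
        raise ValueError("display list must have exactly 2048 words")
    if len({normalize(w) for w in display}) != WORDLIST_SIZE:
        raise ValueError("display list has duplicates after normalization")


def translate(words: list[str], source: list[str], target: list[str]) -> list[str]:
    """Map each word to the entry at the same index in the target list."""
    index = {normalize(w): i for i, w in enumerate(source)}
    return [target[index[normalize(w)]] for w in words]
```

Calling `translate(phrase, display_list, english_list)` yields the canonical English form, and swapping the last two arguments goes the other way. An unknown word raises `KeyError`, so a typo is not silently accepted.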

Motivation:

Many users around the world are asked to back up and restore Bitcoin 
wallets using English recovery words, even when English is not their native 
language. This creates UX risk, spelling mistakes, misunderstanding, and 
lower confidence during backup and recovery.

This proposal tries to improve multilingual recovery UX while keeping 
compatibility with existing BIP39 behavior.

This is not:

A new seed scheme
A replacement for BIP39
A new cryptographic standard
A change to PBKDF2 input
A wallet-specific format

It is intended as a display/input convention for wallets that want to 
support native-language recovery UX while preserving canonical BIP39 
compatibility.

A draft implementation and wordlists are here:

https://github.com/osem23/bip39-wordlists-tzur

I would appreciate feedback on:

Whether this idea is appropriate for the BIP process at all
Whether it should be considered informational rather than standards-track
Whether the index-paired approach creates hidden risks
Whether wallet developers see practical value in this
How native-speaker review and normalization rules should be handled
Whether there is prior work I should study before continuing

Thank you,
Daniel

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/e4ee1a70-aa70-4dbe-9f6c-27c26c5d17e0n%40googlegroups.com.


* Re: [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
  2026-06-08 14:24 [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists Daniel Osemberg
@ 2026-06-13 16:34 ` 'conduition' via Bitcoin Development Mailing List
  2026-06-13 16:39   ` Daniel Osemberg
  0 siblings, 1 reply; 6+ messages in thread
From: 'conduition' via Bitcoin Development Mailing List @ 2026-06-13 16:34 UTC (permalink / raw)
  To: Daniel Osemberg; +Cc: Bitcoin Development Mailing List



Hey Daniel,

So basically this would allow a user to translate an English-language BIP39 seed phrase to/from other languages, assured by the knowledge that it can be converted back if needed for compatibility with English-only BIP39 wallets.

Neat idea. This is how BIP39 should've been done originally. Today, a BIP39 seed derives a different master key depending on what language is used to encode the source entropy, which was arguably a mistake because it breaks compatibility between implementations written for different locales, and incentivizes everyone to use seeds of the same language so that they're maximally compatible (which is exactly what happened). 

I worry your proposal here might cause confusion if introduced today though. Say you are given a French-language 12-word seed phrase. Do you map the word indices to English and then run PBKDF2 with your algorithm? Or do you run PBKDF2 on the French version as specified in the original BIP39?

There would need to be a clear way for humans and software to distinguish between a "locale-mapped seed phrase" using your spec, and a legacy French BIP39 seed phrase, so they know how to derive the correct master key.

I guess one could argue that non-English BIP39 seeds are so uncommon that you could safely assume the former, but it still leaves open an unfortunate ambiguity which could lead to lost funds in some cases.

What do you think of this?

regards,
conduition


* Re: [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
  2026-06-13 16:34 ` 'conduition' via Bitcoin Development Mailing List
@ 2026-06-13 16:39   ` Daniel Osemberg
  2026-06-16 21:18     ` 'conduition' via Bitcoin Development Mailing List
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Osemberg @ 2026-06-13 16:39 UTC (permalink / raw)
  To: Bitcoin Development Mailing List





Hi conduition,

Thank you for the thoughtful feedback. I agree that ambiguity is the main 
issue any proposal like this has to handle carefully.

To clarify the design: TZUR display wordlists are not meant to replace or 
reinterpret existing BIP39 wordlists. They are separate, index-parallel 
display wordlists whose purpose is to render and accept a user-facing 
mnemonic in another language while keeping the canonical English BIP39 
mnemonic as the seed of record.

So the derivation path for a TZUR display mnemonic is always:

localized TZUR display words → word indices → canonical English BIP39 words 
→ standard BIP39 checksum validation → standard PBKDF2 → standard 
BIP32/BIP84 derivation

The localized words themselves are never passed directly into PBKDF2. Only 
the canonical English mnemonic is.
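
The pipeline above could be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the wordlists are supplied by the caller, and the BIP32/BIP84 step is left out. The checksum and PBKDF2 parameters (HMAC-SHA512, 2048 iterations, salt "mnemonic" plus passphrase) are the ones BIP39 specifies.

```python
import hashlib
import unicodedata


def nfkd(text: str) -> str:
    return unicodedata.normalize("NFKD", text)


def entropy_to_indices(entropy: bytes) -> list[int]:
    """BIP39 encoding: entropy plus checksum bits, split into 11-bit indices."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    count = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (count - 1 - i))) & 0x7FF for i in range(count)]


def validate_checksum(indices: list[int]) -> None:
    """Raise ValueError unless the indices form a valid BIP39 mnemonic."""
    if len(indices) not in (12, 15, 18, 21, 24):
        raise ValueError("unsupported mnemonic length")
    cs_bits = len(indices) * 11 // 33
    value = 0
    for i in indices:
        value = (value << 11) | i
    ent_bytes = (len(indices) * 11 - cs_bits) // 8
    entropy = (value >> cs_bits).to_bytes(ent_bytes, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    if value & ((1 << cs_bits) - 1) != expected:
        raise ValueError("BIP39 checksum mismatch")


def seed_from_display(display_words, display_list, english_list, passphrase=""):
    """Display words -> indices -> English words -> checksum -> PBKDF2."""
    lookup = {nfkd(w): i for i, w in enumerate(display_list)}
    indices = [lookup[nfkd(w)] for w in display_words]
    validate_checksum(indices)
    # Only the canonical English mnemonic is ever fed to PBKDF2.
    mnemonic = " ".join(english_list[i] for i in indices)
    return hashlib.pbkdf2_hmac(
        "sha512",
        nfkd(mnemonic).encode("utf-8"),
        nfkd("mnemonic" + passphrase).encode("utf-8"),
        2048,
        64,
    )
```

Because the localized words are only used for the index lookup, two wallets with different display lists but the same indices derive the same 64-byte seed.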

Your French example is exactly the edge case that needs to be explicit. A 
legacy French BIP39 mnemonic and a TZUR French display mnemonic are not the 
same thing. They are two different encodings that may use the same human 
language, and they must not be silently treated as interchangeable.

For that reason, a wallet implementing this convention should not just see 
“French words” and guess. It should know, or ask, which wordlist mode is 
being used:

   1. Legacy BIP39 French wordlist
   2. TZUR French display wordlist
   3. Canonical English BIP39
   
The reference design also includes stable wordlist identifiers: language 
code, version, and SHA256 of the exact wordlist file. A wallet can persist 
that metadata alongside the wallet, and use it during restore to avoid 
ambiguity. But for maximum portability, the wallet should always show the 
canonical English mnemonic as the universal recovery form.
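
As a rough sketch of such an identifier, assuming a simple record of language code, version string, and the SHA256 of the file bytes. The `WordlistId` type and the function names are hypothetical and the exact format in the draft repository may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WordlistId:
    language: str  # language code, e.g. "fr" (format assumed here)
    version: str   # version label of the display list (format assumed here)
    sha256: str    # hex SHA256 digest of the exact wordlist file bytes


def identify(language: str, version: str, file_bytes: bytes) -> WordlistId:
    """Build the identifier a wallet could persist next to its metadata."""
    return WordlistId(language, version, hashlib.sha256(file_bytes).hexdigest())


def matches(expected: WordlistId, file_bytes: bytes) -> bool:
    """Check at restore time that the bundled list is byte-for-byte the same."""
    return hashlib.sha256(file_bytes).hexdigest() == expected.sha256
```

Hashing the raw bytes means even a changed line ending or a trailing newline yields a different identifier, which is the point: the list is either the exact one used at backup time or it is not.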

So I think your concern is valid. The safe rule is: never auto-detect 
between legacy non-English BIP39 and TZUR display lists when both exist. 
Make the mode explicit, keep the English mnemonic available, and treat the 
display mnemonic as a UX layer rather than a new seed derivation scheme.

Regards,
Daniel



* Re: [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
  2026-06-13 16:39   ` Daniel Osemberg
@ 2026-06-16 21:18     ` 'conduition' via Bitcoin Development Mailing List
  2026-06-16 21:25       ` Daniel Osemberg
  0 siblings, 1 reply; 6+ messages in thread
From: 'conduition' via Bitcoin Development Mailing List @ 2026-06-16 21:18 UTC (permalink / raw)
  To: Daniel Osemberg; +Cc: Bitcoin Development Mailing List



Hey Daniel,


> For that reason, a wallet implementing this convention should not just see “French words” and guess. It should know, or ask, which wordlist mode is being used:



This is the exact problem I was referring to. Imagine your French uncle dies and leaves you a 12-word French seed phrase. You install a BIP39 wallet and import the seed phrase. The wallet asks you: is this a BIP39 seed phrase or a TZUR-translated seed phrase? You don't know what that means; all you have is a list of 12 French words. The only thing you can do as a user is try both. And if the wallet doesn't ask at all (maybe because it only supports one format or the other), it happily imports the seed and you have no idea the distinction even exists.

A seed phrase is supposed to contain all the information necessary to recover a wallet, either explicitly in its encoded payload or implicitly in its format. That way a user doesn't have to record or remember any extraneous meta-information, just the seed phrase.

To make your translated wordlists work like this and fix the ambiguity, I would recommend changing the encoding or the format so that a TZUR-translated display phrase cannot be interpreted as a non-English BIP39 seed phrase. For instance, use a different number of words. Or use wordlists disjoint from BIP39, so that the probability of a random TZUR phrase containing at least one non-BIP39 word is overwhelming.

One clean way to do both would be to increase the size of the TZUR wordlists. For instance, instead of encoding 12 words with 11 bits per word, you might encode 11 words with 12 bits per word: the exact same number of bits (132) is encoded in both cases. Mapping the phrase back to BIP39 is as simple as changing how those bits are interpreted. Since a 4096-word list would be double the size of a BIP39 wordlist, it must include at least 2048 words not present in the corresponding BIP39 wordlist, and so a randomly sampled word has at most a 1 in 2 chance of being in the BIP39 wordlist. Thus the chance of all 11 random words just happening to be BIP39 words would be at most 1 in 2048 (2^-11). You can (and probably should) reduce that probability further by shrinking the intersection between the TZUR and BIP39 wordlists.
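
To make the reinterpretation concrete, here is a small sketch that regroups the same 132 bits from twelve 11-bit indices into eleven 12-bit indices and back. The function name `regroup` is hypothetical, and a 4096-word list is only one of the sizes suggested here.

```python
def regroup(values: list[int], from_bits: int, to_bits: int) -> list[int]:
    """Reinterpret a sequence of fixed-width indices at a different width."""
    total_bits = len(values) * from_bits
    if total_bits % to_bits:
        raise ValueError("bit count does not divide evenly into the new width")
    packed = 0
    for v in values:
        if not 0 <= v < (1 << from_bits):
            raise ValueError("index out of range for the source width")
        packed = (packed << from_bits) | v
    count = total_bits // to_bits
    mask = (1 << to_bits) - 1
    return [(packed >> (to_bits * (count - 1 - i))) & mask for i in range(count)]


# Twelve 11-bit BIP39 indices become eleven 12-bit indices, and back again.
bip39_indices = [0] * 11 + [3]
wide_indices = regroup(bip39_indices, 11, 12)
assert len(wide_indices) == 11
assert regroup(wide_indices, 12, 11) == bip39_indices
```

The BIP39 checksum bits travel along unchanged, so the phrase recovered by regrouping back to 11 bits still validates as ordinary BIP39.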

You don't have to use those exact numbers either. As long as your encoding can carry all the entropy bits of the equivalent BIP39 seed phrase, it should work (e.g. 13 words with 10 bits per word, encoding 130 bits, which is enough for the 128 bits of entropy in a 12-word phrase).

You could use wordlists whose size isn't a power of two, and use some base-x math to reconstruct the entropy as a big unsigned integer.
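
One way the base-x idea might look, as a sketch only: treat the whole payload as one unsigned integer and write it as a fixed number of digits in base N, where N is the wordlist size. The function names and the example size of 3000 words are assumptions for illustration.

```python
def words_needed(list_size: int, payload_bits: int) -> int:
    """Smallest word count whose combinations cover every payload value."""
    count = 1
    while list_size ** count < (1 << payload_bits):
        count += 1
    return count


def encode(value: int, list_size: int, count: int) -> list[int]:
    """Write value as `count` base-`list_size` digits, most significant first."""
    if not 0 <= value < list_size ** count:
        raise ValueError("value does not fit in the requested number of words")
    digits = []
    for _ in range(count):
        value, digit = divmod(value, list_size)
        digits.append(digit)
    return digits[::-1]


def decode(digits: list[int], list_size: int) -> int:
    """Rebuild the unsigned integer from its word indices."""
    value = 0
    for digit in digits:
        if not 0 <= digit < list_size:
            raise ValueError("word index out of range")
        value = value * list_size + digit
    return value
```

With a 132-bit payload, `words_needed(4096, 132)` gives 11 and `words_needed(3000, 132)` gives 12, so a list that is larger than 2048 but not large enough does not shorten the phrase. The fixed digit count keeps leading zero digits, which is what makes the phrase length constant.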

You could make your wordlists shorter instead of longer, though you'd need to be careful to minimize the overlap with BIP39 wordlists.

You could use a different wordlist for specific word positions, e.g. the first word must be a non-BIP39 word, so it encodes less information but guarantees the phrase can't be misinterpreted as BIP39.

Anyways just some ideas. Hope it comes in handy.

regards,
conduition


* Re: [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
  2026-06-16 21:18     ` 'conduition' via Bitcoin Development Mailing List
@ 2026-06-16 21:25       ` Daniel Osemberg
       [not found]         ` <CAH9Jg5mxa4SWwT=_o9W_eL087SOHzX8s9CnPSsKNMarhq3KRHw@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Osemberg @ 2026-06-16 21:25 UTC (permalink / raw)
  To: conduition; +Cc: Bitcoin Development Mailing List


Hey Conduition,

Thank you, this is a very useful way to frame the problem.

I agree that the ambiguity must be handled seriously, but I am not sure
that changing the word count, wordlist size, or encoding is the right
direction for this proposal.

The goal of this proposal is not to create a new seed phrase format. The
goal is to define fixed, independent display wordlists that map
deterministically to the canonical English BIP39 word indexes.

In this model, English BIP39 remains the canonical recovery form. The
localized phrase is a wallet-level display/input convention, not a new
entropy encoding scheme and not a replacement for BIP39.

The existing non-English BIP39 wordlists are still relevant for anyone who
created a wallet using those lists. I do not have exact data on how widely
they are used, but it is reasonable to assume they have been used by some
wallets and users. Therefore, any wallet that wants to support both legacy
non-English BIP39 mnemonics and this display-wordlist convention must be
able to distinguish between them explicitly.

That distinction is required regardless of this proposal. Today, a wallet
may already need to distinguish between English BIP39, legacy non-English
BIP39, Electrum seeds, SLIP39, BIP39 with or without passphrase, and other
wallet-specific recovery formats. A seed phrase without context can already
be ambiguous in practice.

For that reason, I think the safer requirement is not to change the
encoding, but to make the restoration mode explicit and prevent silent
interpretation.

A compatible wallet should:

   1. Treat English BIP39 as the canonical form.
   2. Treat TZUR-style localized wordlists as separate fixed display lists,
   not as existing BIP39 language lists.
   3. Never automatically reinterpret a legacy non-English BIP39 mnemonic
   as a mapped display mnemonic.
   4. Clearly label import and export modes.
   5. Always allow the user to view and export the canonical English BIP39
   phrase.
   6. Include stable wordlist identifiers, such as language code, version,
   and hash of the exact wordlist file.
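
A rough sketch of how rules 3 and 4 might look in wallet code, assuming the wallet holds one set of words per supported mode. The `RestoreMode` enum and both function names are hypothetical and not part of the draft.

```python
from enum import Enum
from typing import Optional


class RestoreMode(Enum):
    ENGLISH_BIP39 = "english-bip39"
    LEGACY_BIP39 = "legacy-bip39"
    TZUR_DISPLAY = "tzur-display"


def candidate_modes(words: list[str], wordlists: dict) -> list:
    """Every mode whose wordlist contains all of the entered words."""
    return [mode for mode, known in wordlists.items()
            if all(w in known for w in words)]


def resolve_mode(words: list[str], wordlists: dict,
                 chosen: Optional[RestoreMode] = None) -> RestoreMode:
    """Return the restore mode, refusing to guess when more than one fits."""
    candidates = candidate_modes(words, wordlists)
    if chosen is not None:
        if chosen not in candidates:
            raise ValueError("phrase contains words outside the chosen wordlist")
        return chosen
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several matches: the wallet must ask instead of picking one.
    raise ValueError("restore mode is ambiguous or unknown; ask the user")
```

The important branch is the last one: when a phrase fits both a legacy list and a display list, the function raises instead of picking a default, so the choice has to come from the user or from stored metadata.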

I agree that using disjoint wordlists, or at least avoiding overlap with
existing BIP39 wordlists where possible, can reduce the chance of
confusion. That is a good design constraint for the display lists.

But changing the number of words or changing the encoding would move the
proposal away from being a BIP39-compatible display/input layer and toward
being a new recovery phrase format. That seems like a different proposal
with different compatibility tradeoffs.

So I think the right rule is:

Legacy BIP39 wordlists remain valid for wallets that created them.

The new display lists must be independent, fixed, clearly identified, and
mapped only to canonical English BIP39 indexes.

Wallets that support both must expose the distinction explicitly and must
never guess silently.

Thanks again. This feedback is very helpful, and I will make sure the draft
explains this distinction more clearly.

Best,
Daniel


On Wed, 17 Jun 2026 at 0:18 conduition <conduition@proton.me> wrote:

> Hey Daniel,
>
> For that reason, a wallet implementing this convention should not just see
> “French words” and guess. It should know, or ask, which wordlist mode is
> being used:
>
>
> This is the exact problem I was referring to. Imagine your french uncle
> dies and leaves you a 12 word french seed phrase. You install a BIP39
> wallet and import the seed phrase. The wallet asks you: Is this a BIP39
> seed phrase or a TZUR-translated seed phrase? You don't know what that
> means - all you have is a list of 12 french words. The only thing you can
> do as a user is try both. And if the wallet doesn't ask you at all (because
> maybe it only supports one format or the other), then the wallet happily
> imports the seed without asking and you have no idea the distinction even
> exists.
>
> A seed phrase is supposed to contain all the necessary information to
> recover a wallet, either explicitly in its encoded payload, or implicitly
> in its format. That way a user doesn't have to record or remember any other
> extraneous meta-information - Just the seed phrase.
>
> To make your translated wordlists work like this and fix the ambiguity, I
> would recommend you consider changing the encoding or the format to ensure
> a TZUR-translated display phrase cannot be interpreted as a BIP39
> non-english seed phrase. So for instance, using a different number of
> words. Or use wordlists disjoint from BIP39, so that the probability of a
> random TZUR phrase containing at least one non-BIP39 word is overwhelming.
>
> One clean way to do both would be to increase the size of the TZUR
> wordlists. For instance, instead of encoding 12 words with 11 bits per
> word, you might encode *11 words* with 12 bits per word - the exact same
> number of bits are encoded in both cases. To map the phrase back to BIP39
> is as simple as changing how those bits are interpreted. Since a
> 4096-length word list would be double the size of a BIP39 wordlist, it
> *must* include at least 2048 words not present in the BIP39 wordlists,
> and so a randomly-sampled word has at most a 1 in 2 chance of being in the
> BIP39 wordlist. Thus the chance of all 11 random words just happening to be
> BIP39 words would be at most 1 in 2048. You can (and probably should)
> reduce that probability by reducing the size of the intersection between
> the TZUR and BIP39 wordlists.
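
The regrouping described above (12 words of 11 bits read as 11 words of 12 bits) can be sketched as follows. This is only an illustration of the bit arithmetic; the function name regroup is hypothetical, and no 4096-word list exists in the draft.

```python
from typing import List


def regroup(indices: List[int], bits_in: int, bits_out: int) -> List[int]:
    """Re-read a sequence of fixed-width indices with a different width.

    The indices are concatenated big-endian into one integer and then cut
    into `bits_out`-wide pieces. The total bit count must divide evenly.
    """
    total_bits = len(indices) * bits_in
    if total_bits % bits_out != 0:
        raise ValueError("total bit length is not a multiple of bits_out")

    value = 0
    for index in indices:
        if not 0 <= index < (1 << bits_in):
            raise ValueError("index out of range for bits_in")
        value = (value << bits_in) | index

    count = total_bits // bits_out
    mask = (1 << bits_out) - 1
    # Extract pieces from the most significant end first.
    return [(value >> (bits_out * (count - 1 - k))) & mask for k in range(count)]


# 12 BIP39 indices (11 bits each) carry 132 bits, the same as 11 indices
# of 12 bits each, so the mapping is reversible.
bip39_indices = list(range(12))
wide_indices = regroup(bip39_indices, 11, 12)
restored = regroup(wide_indices, 12, 11)
```

Because both directions go through the same 132-bit integer, mapping a regrouped phrase back to BIP39 indices is just a second call with the widths swapped.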
>
> You don't have to use those exact numbers either - as long as your
> encoding is able to package all the entropy bits from the equivalent
> BIP39 seed phrase, it should work. (e.g. 13 words with 10 bits per word,
> encoding 130 bits).
>
> You could use wordlists which have size that isn't a power of two, and use
> some base-x math to reconstruct the entropy as a big unsigned integer.
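
A rough sketch of that base-x idea, assuming the phrase is treated as the digits of one big unsigned integer. The helper names and the list size of 3000 are made up for illustration.

```python
from typing import List


def words_needed(list_size: int, bits: int) -> int:
    """Smallest word count whose combinations cover every `bits`-bit value."""
    count, capacity = 0, 1
    while capacity < (1 << bits):
        capacity *= list_size
        count += 1
    return count


def to_digits(value: int, base: int, count: int) -> List[int]:
    """Write `value` as exactly `count` base-`base` digits, big-endian."""
    if not 0 <= value < base ** count:
        raise ValueError("value does not fit in the requested word count")
    digits = []
    for _ in range(count):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1]


def from_digits(digits: List[int], base: int) -> int:
    """Rebuild the integer from big-endian base-`base` digits."""
    value = 0
    for digit in digits:
        if not 0 <= digit < base:
            raise ValueError("digit out of range for this wordlist size")
        value = value * base + digit
    return value


# Example: carry all 132 bits of a 12-word BIP39 phrase with a
# hypothetical 3000-word list.
LIST_SIZE = 3000
payload = (1 << 132) - 1
count = words_needed(LIST_SIZE, 132)
digits = to_digits(payload, LIST_SIZE, count)
```

Each digit is then a word index into the non-power-of-two list, and from_digits recovers the original integer before it is split back into 11-bit BIP39 indices.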
>
> You could make your wordlists shorter instead of longer, though you'd need
> to be careful to minimize the overlap with BIP39 wordlists.
>
> You could use a different word list for specific word indexes, e.g. the
> first word *must* be a non-bip39 word, and so it encodes less information
> but guarantees the phrase can't be misinterpreted as BIP39.
>
> Anyways just some ideas. Hope it comes in handy.
>
> regards,
> conduition
> On Saturday, June 13th, 2026 at 9:46 AM, Daniel Osemberg <
> daniosemberg@gmail.com> wrote:
>
> Hi conduition,
>
> Thank you for the thoughtful feedback. I agree that ambiguity is the main
> issue any proposal like this has to handle carefully.
>
> To clarify the design: TZUR display wordlists are not meant to replace or
> reinterpret existing BIP39 wordlists. They are separate, index-parallel
> display wordlists whose purpose is to render and accept a user-facing
> mnemonic in another language while keeping the canonical English BIP39
> mnemonic as the seed of record.
>
> So the derivation path for a TZUR display mnemonic is always:
>
> localized TZUR display words → word indices → canonical English BIP39
> words → standard BIP39 checksum validation → standard PBKDF2 → standard
> BIP32/BIP84 derivation
>
> The localized words themselves are never passed directly into PBKDF2. Only
> the canonical English mnemonic is.
>
> Your French example is exactly the edge case that needs to be explicit. A
> legacy French BIP39 mnemonic and a TZUR French display mnemonic are not the
> same thing. They are two different encodings that may use the same human
> language, and they must not be silently treated as interchangeable.
>
> For that reason, a wallet implementing this convention should not just see
> “French words” and guess. It should know, or ask, which wordlist mode is
> being used:
>
>    1. Legacy BIP39 French wordlist
>    2. TZUR French display wordlist
>    3. Canonical English BIP39
>
> The reference design also includes stable wordlist identifiers: language
> code, version, and SHA256 of the exact wordlist file. A wallet can persist
> that metadata alongside the wallet, and use it during restore to avoid
> ambiguity. But for maximum portability, the wallet should always show the
> canonical English mnemonic as the universal recovery form.
>
> So I think your concern is valid. The safe rule is: never auto-detect
> between legacy non-English BIP39 and TZUR display lists when both exist.
> Make the mode explicit, keep the English mnemonic available, and treat the
> display mnemonic as a UX layer rather than a new seed derivation scheme.
>
> Regards,
> Daniel
>
> On Saturday, June 13, 2026 at 7:36:05 PM UTC+3 conduition wrote:
>
>> Hey Daniel,
>>
>> So basically this would allow a user to translate an english language
>> BIP39 seed phrase to/from other languages, assured by the knowledge that it
>> can be converted back if needed for compatibility with english-only BIP39
>> wallets.
>>
>> Neat idea. This is how BIP39 should've been done originally. Today, a
>> BIP39 seed derives a different master key depending on what language is
>> used to encode the source entropy, which was arguably a mistake because it
>> breaks compatibility between implementations written for different locales,
>> and incentivizes everyone to use seeds of the same language so that they're
>> maximally compatible (which is exactly what happened).
>>
>> I worry your proposal here might cause confusion if introduced today
>> though. Say you are given a french language 12-word seed phrase. Do you map
>> the word indices to english and then run PBKDF2 with your algorithm? Or do
>> you run PBKDF2 on the french version as specified in the original BIP39?
>>
>> There would need to be a clear way for humans and software to distinguish
>> between a "locale-mapped seed phrase" using your spec, and a legacy french
>> BIP39 seed phrase, so they know how to derive the correct master key.
>>
>> I guess one could argue that non-english BIP39 seeds are so uncommon that
>> you could safely assume the former, but still it leaves open an unfortunate
>> ambiguity which could lead to lost funds in some cases.
>>
>> What do you think of this?
>>
>> regards,
>> conduition
>> On Saturday, June 13th, 2026 at 12:04 PM, Daniel Osemberg <
>> danios...@gmail.com> wrote:
>>
>> Hi list,
>>
>> I would like to ask for early feedback on an idea before attempting any
>> formal BIP submission.
>>
>> The proposal is a display/input layer for BIP39 recovery phrases in
>> additional native languages.
>>
>> The important constraint is that this does not change the BIP39
>> cryptographic flow. The canonical mnemonic remains the existing BIP39
>> English wordlist, and PBKDF2 is still performed on the canonical English
>> form.
>>
>> The native-language lists are index-paired to the English BIP39 wordlist.
>> In other words, each native word maps to the same 0-2047 index as the
>> corresponding English BIP39 word. Wallets could display or accept the
>> native-language form for UX purposes, but internally normalize back to the
>> canonical English mnemonic before seed generation.
>>
>> Motivation:
>>
>> Many users around the world are asked to back up and restore Bitcoin
>> wallets using English recovery words, even when English is not their native
>> language. This creates UX risk, spelling mistakes, misunderstanding, and
>> lower confidence during backup and recovery.
>>
>> This proposal tries to improve multilingual recovery UX while keeping
>> compatibility with existing BIP39 behavior.
>>
>> This is not:
>>
>> A new seed scheme
>> A replacement for BIP39
>> A new cryptographic standard
>> A change to PBKDF2 input
>> A wallet-specific format
>>
>> It is intended as a display/input convention for wallets that want to
>> support native-language recovery UX while preserving canonical BIP39
>> compatibility.
>>
>> A draft implementation and wordlists are here:
>>
>> https://github.com/osem23/bip39-wordlists-tzur
>>
>> I would appreciate feedback on:
>>
>> Whether this idea is appropriate for the BIP process at all
>> Whether it should be considered informational rather than standards-track
>> Whether the index-paired approach creates hidden risks
>> Whether wallet developers see practical value in this
>> How native-speaker review and normalization rules should be handled
>> Whether there is prior work I should study before continuing
>>
>> Thank you,
>> Daniel
>>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAFS8eiXHr3zbujpr3QndrCqAU%2BRtDxoOJca1EF5h2PvMGnCzqw%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 18722 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists
       [not found]         ` <CAH9Jg5mxa4SWwT=_o9W_eL087SOHzX8s9CnPSsKNMarhq3KRHw@mail.gmail.com>
@ 2026-06-17 14:06           ` Daniel Osemberg
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Osemberg @ 2026-06-17 14:06 UTC (permalink / raw)
  To: Randy McMillan; +Cc: Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 3939 bytes --]

Hi,

I understand the point, but I think this misses the practical issue the
proposal is trying to address.

Yes, BIP39 is an application-layer specification, and yes, wallet
developers can theoretically offer locale options. But in practice, most
wallets do not provide full localization of the recovery experience, and
they do not provide a standardized way to map native-language input back to
the canonical English BIP39 flow.

That is the gap this proposal is addressing.

My own work started with BlockSight, a free Bitcoin block explorer
available in 31 languages. TZUR Wallet came later as a Bitcoin-only
self-custody wallet that integrates BlockSight natively inside the app. It
does not offer trading, conversion, buying, selling, custody, brokerage, or
any financial service. It only uses Bitcoin standards for wallet creation,
backup, receiving, and sending.

The reason this issue became obvious to me is that localization of Bitcoin
tools is still very incomplete. I am from Israel, and there are no
mainstream Bitcoin wallets with a fully Hebrew-native recovery experience,
and there is no Hebrew seed phrase experience that remains compatible with
the standard English BIP39 flow.

So while the responsibility is indeed on wallet developers to implement
good UX, there is currently no common convention for doing this safely and
interoperably across wallets.

The proposal is not saying that BIP39 itself is broken. It is saying that
the current ecosystem around BIP39 leaves many non-English users dependent
on English recovery words, and that creates a real UX and recovery-risk
problem.

This proposal tries to define a simple, optional, wallet-level convention:

English BIP39 remains canonical.

The localized words are fixed independent display/input lists.

The localized phrase maps deterministically by index back to the English
BIP39 wordlist.

The localized words are not passed directly into PBKDF2.

Wallets must always allow export of the canonical English BIP39 phrase.
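
The points above can be sketched roughly as follows. This is not the TZUR reference code: the four-word lists are stand-ins for the real 2048-word lists, the function names are hypothetical, and BIP39 checksum validation is left out for brevity. Only the PBKDF2 parameters (HMAC-SHA512, 2048 iterations, salt "mnemonic" plus passphrase, 64-byte output) come from BIP39 itself.

```python
import hashlib
import unicodedata
from typing import List, Sequence

# Tiny stand-in lists for illustration only; real lists have 2048 entries.
ENGLISH = ["abandon", "ability", "able", "about"]
DISPLAY = ["mot0", "mot1", "mot2", "mot3"]  # hypothetical localized words


def nfkd(text: str) -> str:
    return unicodedata.normalize("NFKD", text)


def to_canonical_english(display_words: Sequence[str],
                         display_list: Sequence[str] = DISPLAY,
                         english_list: Sequence[str] = ENGLISH) -> List[str]:
    """Map localized display words to English words by shared index."""
    if len(display_list) != len(english_list):
        raise ValueError("display list must be index-paired with English")
    index_of = {nfkd(word): i for i, word in enumerate(display_list)}
    if len(index_of) != len(display_list):
        raise ValueError("display list contains duplicate words")
    try:
        return [english_list[index_of[nfkd(word)]] for word in display_words]
    except KeyError as exc:
        raise ValueError(f"unknown display word: {exc.args[0]}") from None


def bip39_seed(english_words: Sequence[str], passphrase: str = "") -> bytes:
    """Standard BIP39 seed derivation, run on the English mnemonic only."""
    mnemonic = nfkd(" ".join(english_words))
    salt = nfkd("mnemonic" + passphrase)
    return hashlib.pbkdf2_hmac(
        "sha512", mnemonic.encode("utf-8"), salt.encode("utf-8"), 2048, dklen=64
    )


# The localized words never reach PBKDF2; only their English form does.
english = to_canonical_english(["mot3", "mot0", "mot1"])
seed = bip39_seed(english)
```

A real implementation would validate the BIP39 checksum on the English form before deriving the seed, and would always be able to show that English form to the user.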

In that sense, TZUR Wallet is simply one implementation of the proposal.
The broader contribution is the methodology and the prepared wordlists,
which are already available for review, testing, and potential adoption by
other wallet developers.

No one is asking for compensation for this work. The intent is to
contribute something useful back to the Bitcoin ecosystem, especially for
users who do not think, read, or back up critical financial information
naturally in English.

So I agree that wallet developers are responsible for implementation. But
that is exactly why a shared convention can be useful: it gives wallet
developers a common way to support native-language recovery UX while
preserving compatibility with English BIP39.

Best,
Daniel

On Wed, Jun 17, 2026 at 4:58 PM Randy McMillan <randy.lee.mcmillan@gmail.com>
wrote:

> “Many users around the world are asked to back up and restore Bitcoin
> wallets using English recovery words, even when English is not their native
> language. This creates UX risk, spelling mistakes, misunderstanding, and
> lower confidence during backup and recovery.”
>
> Note: BIP39 is categorized as an Applications layer specification - the
> concern in your motivation falls on the wallet developer and IMO isn't a
> shortcoming of the specification itself. Many applications give the user
> the ability to select locale - a wallet UI should offer wordlist options
> based on user locale preference.
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAFS8eiVN2WmJi0gpRFR%2Bab_C6VOFi6oPBx-g_EULwMDvaF%3DEtQ%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 5173 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-17 14:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-06-08 14:24 [bitcoindev] [bitcoin-dev] Proposal discussion: BIP39 native-language display wordlists Daniel Osemberg
2026-06-13 16:34 ` 'conduition' via Bitcoin Development Mailing List
2026-06-13 16:39   ` Daniel Osemberg
2026-06-16 21:18     ` 'conduition' via Bitcoin Development Mailing List
2026-06-16 21:25       ` Daniel Osemberg
     [not found]         ` <CAH9Jg5mxa4SWwT=_o9W_eL087SOHzX8s9CnPSsKNMarhq3KRHw@mail.gmail.com>
2026-06-17 14:06           ` Daniel Osemberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox