BIP Draft: Formosa — Themed mnemonic sentences for generating deterministic keys #2108
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,224 @@ | ||
| <pre> | ||
| BIP: ? | ||
| Layer: Applications | ||
| Title: Encoding seed as themed mnemonic sentences | ||
| Authors: Yuri S Villas Boas <yuri@t3infosecurity.com> | ||
| André Fidencio Gonçalves <andre7c4@gmail.com> | ||
| Status: Draft | ||
| Type: Specification | ||
| Assigned: ? | ||
| License: BSD-2-Clause | ||
| Requires: 32, 39 | ||
| Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t | ||
| https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/ | ||
| https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management | ||
| </pre> | ||
|
|
||
| ==Abstract== | ||
|
|
||
| This BIP describes an expansion of BIP-0039 for the generation of deterministic | ||
| wallets. Where BIP-0039 uses a flat list of unrelated words, Formosa organizes | ||
| mnemonic words into themed sentences with syntactic structure and semantic | ||
| coherence, substantially improving memorability while retaining all properties | ||
| of the original scheme. | ||
|
|
||
| It consists of two parts: generating the mnemonic and converting it into a | ||
| binary seed. This seed can be later used to generate deterministic wallets using | ||
| BIP-0032 or similar methods. | ||
|
|
||
| Full forward and backward compatibility with BIP-0039 is maintained: seed | ||
| derivation internally converts any Formosa mnemonic back to its equivalent | ||
| BIP-0039 representation, so existing keys and addresses are preserved. | ||
|
|
||
| ==Copyright== | ||
|
|
||
| This BIP is licensed under the BSD 2-clause license. | ||
|
|
||
| ==Motivation== | ||
|
|
||
| A mnemonic code or sentence is superior for human interaction compared to the | ||
| handling of raw binary or hexadecimal representations of a wallet seed. The | ||
| sentence could be written on paper or spoken over the telephone. | ||
|
|
||
| However, human memory is an associative process: information is more readily | ||
| retained when it can be linked to existing knowledge through semantic | ||
| associations, visual imagery, and narrative context. A BIP-0039 mnemonic is a | ||
| sequence of unrelated words with no syntactic or semantic relationship, making | ||
| it difficult to form the mental associations that aid long-term retention. | ||
|
|
||
| Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences | ||
| with syntactic roles (e.g., subject, adjective, object, location). Each sentence | ||
| draws vocabulary from a coherent semantic domain --- medieval fantasy, science | ||
| fiction, nature, finance, or any custom theme --- enabling the user to form vivid | ||
| mental images that reduce memorization effort per bit of entropy. | ||
|
|
||
| This guide is meant to be a way to transport computer-generated randomness with | ||
| a human-readable transcription. It's not a way to process user-created | ||
| sentences (also known as brainwallets) into a wallet seed. | ||
|
|
||
| ==Generating the mnemonic== | ||
|
|
||
| The mnemonic must encode entropy in a multiple of 32 bits. With more entropy, | ||
| security is improved but the sentence length increases. We refer to the | ||
| initial entropy length as ENT. The allowed size of ENT is 128-256 bits. | ||
|
|
||
| First, an initial entropy of ENT bits is generated. A checksum is generated by | ||
| taking the first <code>ENT / 32</code> bits of its SHA256 hash. This checksum is | ||
| appended to the end of the initial entropy. Next, these concatenated bits | ||
| are split into groups of 33 bits, which we call '''sentences'''. Each sentence is | ||
| further subdivided into variable-length bit fields, one per syntactic category, | ||
| whose lengths are defined by the active theme. Each bit field encodes an index | ||
| into the corresponding category's word list. Finally, we convert these indices | ||
| into words and use the joined words as a mnemonic sentence. | ||
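As a non-normative illustration, the checksum and splitting steps above could be sketched in Python as follows. The function name `entropy_to_sentences` is hypothetical, and the sketch assumes that bits are read most-significant first, as in BIP-0039; the draft text does not spell out the bit order.

```python
import hashlib


def entropy_to_sentences(entropy: bytes) -> list[int]:
    """Append the checksum to the entropy and split it into 33-bit integers."""
    ent_bits = len(entropy) * 8
    if ent_bits % 32 != 0 or not 128 <= ent_bits <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs_bits = ent_bits // 32

    # Checksum: the first ENT / 32 bits of SHA-256(entropy).
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs_bits)

    # Concatenate entropy || checksum as one big integer.
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    count = (ent_bits + cs_bits) // 33

    # Read 33-bit groups starting from the most significant end (assumption).
    mask = (1 << 33) - 1
    return [(combined >> (33 * (count - 1 - i))) & mask for i in range(count)]
```

With 16 bytes of entropy this returns four 33-bit values; the 4-bit checksum occupies the low bits of the last one.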
|
|
||
| BIP-0039 is a special case where each sentence contains three 11-bit fields | ||
| indexing a single 2048-word list (3 x 11 = 33). | ||
|
|
||
| The following table describes the relation between the initial entropy | ||
| length (ENT), the checksum length (CS), the number of 33-bit sentences (S), | ||
| and the length of the generated mnemonic sentence (MS) in words. The word | ||
| count assumes a 6-word theme; for BIP-0039 (3 words per sentence), divide by 2. | ||
|
|
||
| <pre> | ||
| CS = ENT / 32 | ||
| S = (ENT + CS) / 33 | ||
|
|
||
| | ENT | CS | ENT+CS | S | MS (6-word) | MS (BIP-0039) | | ||
| +-------+----+--------+-----+-------------+---------------+ | ||
| | 128 | 4 | 132 | 4 | 24 | 12 | | ||
| | 160 | 5 | 165 | 5 | 30 | 15 | | ||
| | 192 | 6 | 198 | 6 | 36 | 18 | | ||
| | 224 | 7 | 231 | 7 | 42 | 21 | | ||
| | 256 | 8 | 264 | 8 | 48 | 24 | | ||
| </pre> | ||
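The rows of the table follow directly from the two formulas. A small Python sketch (the helper name `mnemonic_sizes` and its `words_per_sentence` parameter are illustrative, not part of the specification) reproduces them:

```python
def mnemonic_sizes(ent: int, words_per_sentence: int = 6) -> tuple[int, int, int]:
    """Return (CS, S, MS) for an entropy length of `ent` bits."""
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32            # checksum length in bits
    s = (ent + cs) // 33      # number of 33-bit sentences
    ms = s * words_per_sentence  # total words in the mnemonic
    return cs, s, ms


for ent in (128, 160, 192, 224, 256):
    cs, s, ms6 = mnemonic_sizes(ent)
    _, _, ms3 = mnemonic_sizes(ent, words_per_sentence=3)  # BIP-0039 layout
    print(ent, cs, ent + cs, s, ms6, ms3)
```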
|
Review comment (Member) on lines +77 to +93:
I’m not completely opposed to a text-only presentation, but wanted to point out that MediaWiki does include table formatting, and most readers of the BIP would probably see the rendered version. When using table formatting, I think it would be possible to skip over the abbreviations to label the table and it would still fit. |
||
|
|
||
| For each 33-bit sentence, the word selection algorithm proceeds as follows: | ||
|
|
||
| # Initialize an empty sentence array with one slot per category. | ||
| # For each category in the theme's ''filling order'': | ||
| ## Extract <code>BIT_LENGTH</code> bits from the current position in the bit stream. | ||
| ## Interpret them as an unsigned integer index. | ||
| ## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list. | ||
|
Review comment (Member):
I don’t understand what you mean with “if the category is led by”. |
||
| ## Select the word at the computed index from the resolved word list. | ||
| ## Place the word into the sentence array at the position given by the theme's ''natural order''. | ||
| # Output the words in natural order. | ||
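The steps above can be sketched in Python as follows. This is only an illustration: the dictionary layout (`natural_order`, `filling_order`, `categories`, `bit_length`, `led_by`, `mapping`, `total_list`) is a hypothetical in-memory structure and is not claimed to match the reference implementation's JSON schema. The toy theme uses 2 bits instead of 33 to keep the example short.

```python
def select_words(bits: str, theme: dict) -> list[str]:
    """Turn one sentence's bit string (e.g. "0110...") into words."""
    natural = theme["natural_order"]
    sentence = [None] * len(natural)   # one slot per category
    chosen = {}                        # category name -> selected word
    pos = 0
    for name in theme["filling_order"]:
        cat = theme["categories"][name]
        width = cat["bit_length"]
        index = int(bits[pos:pos + width], 2)   # unsigned integer index
        pos += width
        leader = cat.get("led_by")
        if leader is not None:
            # Restricted category: the sub-list depends on the word already
            # selected for the leading category.
            words = theme["categories"][leader]["mapping"][name][chosen[leader]]
        else:
            words = cat["total_list"]
        word = words[index]
        chosen[name] = word
        sentence[natural.index(name)] = word    # place in natural order
    if pos != len(bits):
        raise ValueError("bit string length does not match the theme")
    return sentence


# Toy theme (assumption): the adjective is led by the subject, so the subject
# is filled first even though it is spoken second.
TOY_THEME = {
    "natural_order": ["adjective", "subject"],
    "filling_order": ["subject", "adjective"],
    "categories": {
        "subject": {
            "bit_length": 1,
            "total_list": ["dragon", "knight"],
            "mapping": {"adjective": {"dragon": ["ancient", "fiery"],
                                      "knight": ["brave", "loyal"]}},
        },
        "adjective": {"bit_length": 1, "led_by": "subject"},
    },
}
```

In this sketch the first bit selects the subject and the second bit selects an adjective from the sub-list that the subject allows, which is one reading of a category being ''led by'' another.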
|
|
||
| ==Themes== | ||
|
|
||
| The Formosa equivalent to a BIP-0039 wordlist is a '''theme'''. A theme is a JSON | ||
| document that defines syntactic categories, their word lists, bit-widths, and | ||
| optional semantic restrictions between categories. The sum of all category | ||
| bit-widths in a theme MUST equal 33. | ||
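An implementation could enforce the 33-bit rule when loading a theme. A minimal sketch, assuming the bit-widths have already been extracted into a plain mapping from category name to width (the function name and category labels are hypothetical):

```python
def check_bit_widths(bit_widths: dict[str, int]) -> None:
    """Raise ValueError unless the category bit-widths sum to exactly 33."""
    total = sum(bit_widths.values())
    if total != 33:
        raise ValueError(f"category bit-widths sum to {total}, expected 33")


# BIP-0039 viewed as a theme: three 11-bit fields.
check_bit_widths({"word1": 11, "word2": 11, "word3": 11})
```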
|
|
||
| An ideal theme has the following characteristics: | ||
|
|
||
| a) specific semantic scope (memory block) | ||
| - the entire vocabulary should adhere to a single coherent topic, enabling | ||
| the user to form a unified mental scene | ||
|
|
||
| b) concrete imagery | ||
| - categories should consist of elements easily associated with mental images. | ||
| Prefer concrete nouns and tangible adjectives over abstract terms | ||
|
|
||
| c) sorted wordlists | ||
| - the wordlist is sorted, which allows for more efficient lookup of the code words | ||
| (i.e. implementations can use binary search instead of linear search) | ||
|
|
||
| d) first-letters uniqueness | ||
| - the wordlist is created in such a way that it's enough to type the first two | ||
| letters to unambiguously identify the word | ||
|
||
|
|
||
| The first-letters uniqueness property can yield higher information density than | ||
| BIP-0039. In BIP-0039, four characters are needed to identify each word, | ||
| encoding 11 bits per 4 characters = 2.75 bits/character. In Formosa, two | ||
| characters suffice per word, so the achievable density depends on the theme's | ||
| category bit-widths and exceeds BIP-0039 only for lists of 64 words or more: | ||
|
|
||
| <pre> | ||
| | List size | Bits | Chars to identify | Density (bits/char) | | ||
| +-----------+------+-------------------+---------------------+ | ||
| | 2048 | 11 | 4 | 2.75 (BIP-0039) | | ||
| | 32 | 5 | 2 | 2.50 | | ||
| | 64 | 6 | 2 | 3.00 | | ||
| | 128 | 7 | 2 | 3.50 | | ||
| </pre> | ||
|
|
||
| As an example, the ''nationalities'' theme uses four 7-bit nationality | ||
| categories (128 entries each) and one 5-bit profession category (32 entries), | ||
| yielding 33 bits per 5-word sentence. A user typing only the first two | ||
| characters of each word types 10 characters to encode 33 bits, achieving an | ||
| information density of 33 / 10 = 3.30 bits/character --- a 20% improvement | ||
| over BIP-0039's 2.75 bits/character. | ||
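The density figures can be checked with exact arithmetic. The sketch below uses a hypothetical helper `density` and takes the per-category bit-widths stated above as input:

```python
from fractions import Fraction


def density(bit_widths: list[int], chars_per_word: int) -> Fraction:
    """Bits encoded per typed character for one sentence."""
    return Fraction(sum(bit_widths), len(bit_widths) * chars_per_word)


bip39 = density([11, 11, 11], 4)              # 33 bits over 12 characters
nationalities = density([7, 7, 7, 7, 5], 2)   # 33 bits over 10 characters
improvement = nationalities / bip39 - 1       # relative gain over BIP-0039

print(float(bip39), float(nationalities), float(improvement))
```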
|
|
||
| e) semantic restrictions (optional) | ||
| - themes may define restrictions between categories so that the available word list | ||
| for one category changes depending on the word selected in a leading category, | ||
| producing more semantically coherent sentences. Restriction relationships MUST | ||
| be acyclic | ||
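The acyclicity requirement on restrictions can be verified mechanically. The sketch below assumes each category is led by at most one other category and that the relationships are given as a mapping from category name to its leader (or `None`); both the representation and the function name are illustrative.

```python
from typing import Optional


def assert_acyclic(led_by: dict[str, Optional[str]]) -> None:
    """Raise ValueError if following `led_by` links ever revisits a category."""
    for start in led_by:
        seen = {start}
        current = led_by.get(start)
        while current is not None:
            if current in seen:
                raise ValueError(f"restriction cycle involving {current!r}")
            seen.add(current)
            current = led_by.get(current)


# A chain is fine: the adjective is led by the subject, which leads nothing else.
assert_acyclic({"subject": None, "adjective": "subject"})
```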
|
|
||
| The wordlist can contain native characters, but they must be encoded in UTF-8 | ||
| using Normalization Form Compatibility Decomposition (NFKD). | ||
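For illustration, NFKD normalization followed by UTF-8 encoding is available in the Python standard library. The helper name `normalize_word` is hypothetical:

```python
import unicodedata


def normalize_word(word: str) -> bytes:
    """Return the UTF-8 encoding of the NFKD-normalized word."""
    return unicodedata.normalize("NFKD", word).encode("utf-8")


# A precomposed "é" and "e" followed by a combining acute accent
# normalize to the same byte sequence.
same = normalize_word("\u00e9") == normalize_word("e\u0301")
```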
|
|
||
| ==From mnemonic to seed== | ||
|
|
||
| A user may decide to protect their mnemonic with a passphrase. If a passphrase is not | ||
| present, an empty string "" is used instead. | ||
|
|
||
| To ensure forward and backward compatibility with BIP-0039, seed derivation first | ||
| converts any Formosa mnemonic back to its equivalent BIP-0039 mnemonic by extracting | ||
| the underlying entropy and re-encoding it using the BIP-0039 English word list. This | ||
| guarantees that the same entropy always produces the same seed, keys, and addresses | ||
| regardless of which theme was used. | ||
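The bit-repacking part of this conversion can be sketched as follows. The sketch assumes the category indices are packed in filling order, most-significant bits first, and it covers only the step from theme indices to BIP-0039 word indices; looking up words in the theme and in the BIP-0039 English list is omitted. The function name is hypothetical.

```python
def to_bip39_indices(indices: list[int], bit_widths: list[int]) -> list[int]:
    """Repack one sentence's category indices into three 11-bit indices."""
    if sum(bit_widths) != 33 or len(indices) != len(bit_widths):
        raise ValueError("expected one index per category and 33 bits in total")
    value = 0
    for index, width in zip(indices, bit_widths):
        if not 0 <= index < (1 << width):
            raise ValueError("index does not fit in its bit field")
        value = (value << width) | index     # append the field to the sentence
    # Split the 33-bit sentence into three 11-bit BIP-0039 word indices.
    return [(value >> shift) & 0x7FF for shift in (22, 11, 0)]
```

Because the underlying 33 bits are unchanged, the checksum computed over the original entropy remains valid after re-encoding.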
|
|
||
| To create a binary seed from the resulting BIP-0039 mnemonic, we use the PBKDF2 function | ||
| with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + | ||
| passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and | ||
| HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 | ||
| bits (= 64 bytes). | ||
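The derivation described above maps directly onto `hashlib.pbkdf2_hmac` from the Python standard library. This is a minimal sketch with a hypothetical function name; it expects the BIP-0039 mnemonic that results from the conversion step, not the themed sentence.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(bip39_mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed from a BIP-0039 mnemonic and passphrase."""
    password = unicodedata.normalize("NFKD", bip39_mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    # PBKDF2 with HMAC-SHA512, 2048 iterations, 512-bit (64-byte) output.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)
```

An empty passphrase reduces the salt to the plain string "mnemonic", matching the default described earlier in this section.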
|
|
||
| This seed can be later used to generate deterministic wallets using BIP-0032 or | ||
| similar methods. | ||
|
|
||
| The conversion of the mnemonic sentence to a binary seed is completely independent | ||
| from generating the sentence. This results in rather simple code; there are no | ||
| constraints on sentence structure and clients are free to implement their own | ||
| themes or even whole sentence generators, allowing for flexibility in wordlists | ||
| for typo detection or other purposes. | ||
|
|
||
| Although using a mnemonic not generated by the algorithm described in the "Generating | ||
| the mnemonic" section is possible, this is not advised and software must compute a | ||
| checksum for the mnemonic sentence using a wordlist and issue a warning if it is | ||
| invalid. | ||
|
|
||
| The described method also provides plausible deniability, because every passphrase | ||
| generates a valid seed (and thus a deterministic wallet) but only the correct one | ||
| will make the desired wallet available. | ||
|
|
||
| ==Standard themes== | ||
|
|
||
| The reference implementation ships with standard themes listed at the link below. | ||
| Since BIP-0039 is a valid Formosa theme, all existing BIP-0039 mnemonics work | ||
| without modification. | ||
|
|
||
| It is '''strongly discouraged''' to use non-standard custom themes for generating | ||
| mnemonic sentences, as the user assumes responsibility for ensuring the theme file | ||
| remains available and structurally valid. Users with proper training in security | ||
| protocols who understand these risks may benefit from custom themes through higher | ||
| memorization efficiency or an additional layer of obscurity. | ||
|
|
||
| * [[https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes|Standard Formosa Themes]] | ||
|
|
||
| ==Test vectors== | ||
|
|
||
| The test vectors include input entropy, mnemonic and seed. The | ||
| passphrase "TREZOR" is used for all vectors. Since Formosa converts back to | ||
| BIP-0039 before seed derivation, the same test vectors apply to all themes | ||
| given the same underlying entropy. | ||
|
|
||
| https://github.com/Yuri-SVB/formosa/blob/master/vectors.json | ||
|
|
||
| ==Reference Implementation== | ||
|
|
||
| The reference implementation, including themes, is available from: | ||
|
|
||
| https://github.com/Yuri-SVB/formosa | ||

Review comment:
It’s slightly confusing that you speak about multiple sentences that together compose a single mnemonic sentence. Perhaps it would be better to use distinct terms, i.e., to use a different term for sentences or for the mnemonic sentence. I’m not convinced it’s the right suggestion, but perhaps S sentences make one “mnemonic story” with MS words?