bitcoin · Yuri-SVB · Feb 28, 2026 · Mar 23, 2026 · Mar 23, 2026 · murchandamus
diff --git a/bip.mediawiki b/bip.mediawiki
@@ -0,0 +1,224 @@
+<pre>
+  BIP: ?
+  Layer: Applications
+  Title: Encoding seed as themed mnemonic sentences
+  Authors: Yuri S Villas Boas <yuri@t3infosecurity.com>
+           André Fidencio Gonçalves <andre7c4@gmail.com>
+  Status: Draft
+  Type: Specification
+  Assigned: ?
+  License: BSD-2-Clause
+  Requires: 32, 39
+  Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t
+              https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/
+              https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
+</pre>
+
+==Abstract==
+
+This BIP describes Formosa, an extension of BIP-0039 for the generation of
+deterministic wallets. Where BIP-0039 uses a flat list of unrelated words, Formosa
+organizes mnemonic words into themed sentences with syntactic structure and semantic
+coherence, which is intended to improve memorability while retaining the properties
+of the original scheme.
+
+The scheme consists of two parts: generating the mnemonic and converting it into a
+binary seed. This seed can later be used to generate deterministic wallets using
+BIP-0032 or similar methods.
+
+Full forward and backward compatibility with BIP-0039 is maintained: seed
+derivation internally converts any Formosa mnemonic back to its equivalent
+BIP-0039 representation, so existing keys and addresses are preserved.
+
+==Copyright==
+
+This BIP is licensed under the BSD 2-clause license.
+
+==Motivation==
+
+A mnemonic code or sentence is superior for human interaction compared to the
+handling of raw binary or hexadecimal representations of a wallet seed. The
+sentence could be written on paper or spoken over the telephone.
+
+However, human memory is an associative process: information is more readily
+retained when it can be linked to existing knowledge through semantic
+associations, visual imagery, and narrative context. A BIP-0039 mnemonic is a
+sequence of unrelated words with no syntactic or semantic relationship, making
+it difficult to form the mental associations that aid long-term retention.
+
+Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences
+with syntactic roles (e.g., subject, adjective, object, location). Each sentence
+draws vocabulary from a coherent semantic domain --- medieval fantasy, science
+fiction, nature, finance, or any custom theme --- enabling the user to form vivid
+mental images that reduce memorization effort per bit of entropy.
+
+This BIP is meant as a way to transport computer-generated randomness in a
+human-readable transcription. It is not a way to process user-created
+sentences (also known as brainwallets) into a wallet seed.
+
+==Generating the mnemonic==
+
+The mnemonic must encode entropy whose length is a multiple of 32 bits. With more
+entropy, security is improved but the sentence length increases. We refer to the
+initial entropy length as ENT. The allowed size of ENT is 128-256 bits.
+
+First, an initial entropy of ENT bits is generated. A checksum is generated by
+taking the first <code>ENT / 32</code> bits of its SHA256 hash. This checksum is
+appended to the end of the initial entropy. Next, these concatenated bits
+are split into groups of 33 bits, which we call '''sentences'''. Each sentence is
+further subdivided into variable-length bit fields, one per syntactic category,
+whose lengths are defined by the active theme. Each bit field encodes an index
+into the corresponding category's word list. Finally, we convert these indices
+into words and use the joined words as a mnemonic sentence.
+
+BIP-0039 is a special case where each sentence contains three 11-bit fields
+indexing a single 2048-word list (3 x 11 = 33).
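The entropy-to-sentences steps above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function name `split_into_sentences` is hypothetical, and most-significant-bit-first ordering is assumed because BIP-0039 uses it (the text above does not state the bit order).

```python
import hashlib


def split_into_sentences(entropy: bytes) -> list[int]:
    """Append the checksum to the entropy and cut the result into 33-bit groups."""
    ent = len(entropy) * 8
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")

    cs = ent // 32  # checksum length in bits
    digest = hashlib.sha256(entropy).digest()
    # First CS bits of the SHA256 hash (assumed MSB-first, as in BIP-0039).
    checksum = int.from_bytes(digest, "big") >> (256 - cs)

    # Concatenate entropy and checksum into one big integer.
    bits = (int.from_bytes(entropy, "big") << cs) | checksum

    count = (ent + cs) // 33
    mask = (1 << 33) - 1
    # Sentence 0 holds the most significant 33 bits.
    return [(bits >> (33 * (count - 1 - i))) & mask for i in range(count)]
```

For 16 zero bytes this returns `[0, 0, 0, 3]`: the entropy bits are all zero and the 4-bit checksum is 3. Under the BIP-0039 layout of three 11-bit fields, the last sentence therefore decodes to the indices 0, 0, 3.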
+
+The following table describes the relation between the initial entropy
+length (ENT), the checksum length (CS), the number of 33-bit sentences (S),
+and the length of the generated mnemonic sentence (MS) in words. The word
+count assumes a 6-word theme; for BIP-0039 (3 words per sentence), divide by 2.
+
+<pre>
+CS = ENT / 32
+S  = (ENT + CS) / 33
+
+|  ENT  | CS | ENT+CS |  S  | MS (6-word) | MS (BIP-0039) |
++-------+----+--------+-----+-------------+---------------+
+|  128  |  4 |   132  |  4  |     24      |      12       |
+|  160  |  5 |   165  |  5  |     30      |      15       |
+|  192  |  6 |   198  |  6  |     36      |      18       |
+|  224  |  7 |   231  |  7  |     42      |      21       |
+|  256  |  8 |   264  |  8  |     48      |      24       |
+</pre>
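The rows of the table follow directly from the two formulas. A short Python sketch that reproduces them, with the hypothetical helper name `mnemonic_lengths`:

```python
def mnemonic_lengths(ent: int, words_per_sentence: int = 6) -> tuple[int, int, int]:
    """Return (CS, S, MS) for an entropy length of `ent` bits."""
    if ent % 32 != 0 or not 128 <= ent <= 256:
        raise ValueError("ENT must be a multiple of 32 between 128 and 256 bits")
    cs = ent // 32            # CS = ENT / 32
    s = (ent + cs) // 33      # S  = (ENT + CS) / 33
    return cs, s, s * words_per_sentence


for ent in range(128, 257, 32):
    cs, s, ms6 = mnemonic_lengths(ent)
    _, _, ms3 = mnemonic_lengths(ent, words_per_sentence=3)
    print(ent, cs, ent + cs, s, ms6, ms3)
```

Because ENT + ENT / 32 = 33 * ENT / 32, the number of sentences S always equals ENT / 32, which is also the checksum length CS. Each sentence thus carries 32 bits of entropy plus one checksum bit's worth of length.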
+
+For each 33-bit sentence, the word selection algorithm proceeds as follows:
+
+# Initialize an empty sentence array with one slot per category.
+# For each category in the theme's ''filling order'':
+## Extract <code>BIT_LENGTH</code> bits from the current position in the bit stream.
+## Interpret them as an unsigned integer index.
+## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list.
+## Select the word at the computed index from the resolved word list.
+## Place the word into the sentence array at the position given by the theme's ''natural order''.
+# Output the words in natural order.
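The selection steps above can be sketched as follows. The theme layout used here (the keys `filling_order`, `natural_order`, `categories`, `bit_length`, `led_by`, `words`, `mapping`) and the synthetic vocabulary are assumptions made for illustration; this document does not specify the JSON schema, so the real theme files may be organized differently. Bits are assumed to be consumed most significant first.

```python
def toy_theme() -> dict:
    """Build a synthetic 5 + 7 + 7 + 7 + 7 = 33 bit theme (hypothetical layout)."""
    verbs = [f"verb{i}" for i in range(32)]

    def plain(prefix: str) -> dict:
        return {"bit_length": 7, "led_by": None,
                "words": [f"{prefix}{i}" for i in range(128)]}

    return {
        "filling_order": ["verb", "subject", "object", "place", "adjective"],
        "natural_order": ["adjective", "subject", "verb", "object", "place"],
        "categories": {
            "verb": {"bit_length": 5, "led_by": None, "words": verbs},
            "subject": plain("subject"),
            # "object" is led by "verb": each verb has its own 128-word sub-list.
            "object": {"bit_length": 7, "led_by": "verb",
                       "mapping": {v: [f"{v}-object{i}" for i in range(128)]
                                   for v in verbs}},
            "place": plain("place"),
            "adjective": plain("adjective"),
        },
    }


def select_words(sentence_bits: int, theme: dict) -> list[str]:
    """Turn one 33-bit sentence into words, output in natural order."""
    cats = theme["categories"]
    if sum(c["bit_length"] for c in cats.values()) != 33:
        raise ValueError("category bit-widths must sum to 33")

    chosen = {}       # category name -> selected word
    remaining = 33    # bits not yet consumed
    for name in theme["filling_order"]:
        cat = cats[name]
        width = cat["bit_length"]
        remaining -= width
        index = (sentence_bits >> remaining) & ((1 << width) - 1)
        leader = cat.get("led_by")
        if leader is not None:
            # The leader must appear earlier in the filling order.
            word_list = cat["mapping"][chosen[leader]]
        else:
            word_list = cat["words"]
        chosen[name] = word_list[index]
    return [chosen[name] for name in theme["natural_order"]]
```

With the toy theme, a sentence whose fields are 3, 10, 20, 30 and 40 in filling order selects `verb3`, `subject10`, `verb3-object20`, `place30` and `adjective40`, and the output places the adjective first because of the natural order.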
+
+==Themes==
+
+The Formosa equivalent to a BIP-0039 wordlist is a '''theme'''. A theme is a JSON
+document that defines syntactic categories, their word lists, bit-widths, and
+optional semantic restrictions between categories. The sum of all category
+bit-widths in a theme MUST equal 33.
+
+An ideal theme has the following characteristics:
+
+a) specific semantic scope (memory block)
+   - the entire vocabulary should adhere to a single coherent topic, enabling
+     the user to form a unified mental scene
+
+b) concrete imagery
+   - categories should consist of elements easily associated with mental images.
+     Prefer concrete nouns and tangible adjectives over abstract terms
+
+c) sorted wordlists
+   - the wordlist is sorted, which allows for more efficient lookup of the code words
+     (i.e. implementations can use binary search instead of linear search)
+
+d) first-letters uniqueness
+   - the wordlist is created in such a way that it's enough to type the first two
+     letters to unambiguously identify the word
+
+e) semantic restrictions (optional)
+   - themes may define restrictions between categories so that the available word list
+     for one category changes depending on the word selected in a leading category,
+     producing more semantically coherent sentences. Restriction relationships MUST
+     be acyclic
+
+The first-letters uniqueness property can yield higher information density than
+BIP-0039. In BIP-0039, four characters are needed to identify each word,
+encoding 11 bits per 4 characters = 2.75 bits/character. In Formosa, two
+characters suffice per word. The achievable density depends on the theme's
+category bit-widths:
+
+<pre>
+| List size | Bits | Chars to identify | Density (bits/char) |
++-----------+------+-------------------+---------------------+
+|   2048    |  11  |        4          |   2.75 (BIP-0039)   |
+|    32     |   5  |        2          |   2.50              |
+|    64     |   6  |        2          |   3.00              |
+|   128     |   7  |        2          |   3.50              |
+</pre>
+
+As an example, the ''nationalities'' theme uses four 7-bit nationality
+categories (128 entries each) and one 5-bit profession category (32 entries),
+yielding 33 bits per 5-word sentence. A user typing only the first two
+characters of each word types 10 characters to encode 33 bits, achieving an
+information density of 33 / 10 = 3.30 bits/character, a 20% improvement
+over BIP-0039's 2.75 bits/character.
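The density figures quoted for BIP-0039 and for the ''nationalities'' theme can be checked with a few lines of Python. The helper name `density` is hypothetical; it divides the bits encoded by a sentence by the characters a user must type.

```python
def density(bit_widths: list[int], chars_per_word: int = 2) -> float:
    """Bits encoded per typed character for one sentence."""
    return sum(bit_widths) / (chars_per_word * len(bit_widths))


bip39 = density([11, 11, 11], chars_per_word=4)   # three 11-bit words, 4 chars each
nationalities = density([7, 7, 7, 7, 5])          # four 7-bit and one 5-bit category

print(bip39, nationalities, nationalities / bip39 - 1)
```

The ratio 3.30 / 2.75 is 1.2, which is where the 20% figure comes from. A theme made only of 5-bit categories would come out at 2.50 bits/character, below BIP-0039, so the gain depends on the bit-widths chosen.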
+
+The wordlist can contain native characters, but they must be encoded in UTF-8
+using Normalization Form Compatibility Decomposition (NFKD).
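A theme author could check several of the listed properties mechanically. The sketch below is an illustration under assumptions: the function names are hypothetical, sorting is taken to mean Python's default code-point order, each list is assumed to hold exactly 2^bits entries (as the density table suggests), and each category is assumed to have at most one leading category.

```python
import unicodedata


def check_word_list(words: list[str], bit_length: int, prefix_len: int = 2) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if len(words) != 1 << bit_length:
        problems.append(f"expected {1 << bit_length} words, found {len(words)}")
    if words != sorted(words):
        problems.append("word list is not sorted")
    prefixes = [w[:prefix_len] for w in words]
    if len(set(prefixes)) != len(prefixes):
        problems.append(f"first {prefix_len} letters are not unique")
    if any(not unicodedata.is_normalized("NFKD", w) for w in words):
        problems.append("some words are not in NFKD form")
    return problems


def restrictions_are_acyclic(led_by: dict) -> bool:
    """`led_by` maps each category name to its leading category, or None."""
    for start in led_by:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return False  # walked back onto a category already visited
            seen.add(node)
            node = led_by.get(node)
    return True
```

For example, `check_word_list(["apple", "bear", "cat", "dog"], 2)` reports nothing, while replacing `"bear"` with `"apricot"` triggers the prefix check because `"ap"` then appears twice.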
+
+==From mnemonic to seed==
+
+A user may decide to protect their mnemonic with a passphrase. If a passphrase is not
+present, an empty string "" is used instead.
+
+To ensure forward and backward compatibility with BIP-0039, seed derivation first
+converts any Formosa mnemonic back to its equivalent BIP-0039 mnemonic by extracting
+the underlying entropy and re-encoding it using the BIP-0039 English word list. This
+guarantees that the same entropy always produces the same seed, keys, and addresses
+regardless of which theme was used.
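The conversion back to BIP-0039 amounts to inverting the word selection and then re-reading the same bits in three 11-bit fields. The sketch below reuses a hypothetical theme layout (keys `filling_order`, `natural_order`, `categories`, `bit_length`, `led_by`, `words`, `mapping`) with synthetic vocabulary, and it stops at BIP-0039 word indices because the 2048-word English list is not reproduced here.

```python
def toy_theme() -> dict:
    """Synthetic 5 + 7 + 7 + 7 + 7 = 33 bit theme (hypothetical layout)."""
    verbs = [f"verb{i}" for i in range(32)]

    def plain(prefix: str) -> dict:
        return {"bit_length": 7, "led_by": None,
                "words": [f"{prefix}{i}" for i in range(128)]}

    return {
        "filling_order": ["verb", "subject", "object", "place", "adjective"],
        "natural_order": ["adjective", "subject", "verb", "object", "place"],
        "categories": {
            "verb": {"bit_length": 5, "led_by": None, "words": verbs},
            "subject": plain("subject"),
            "object": {"bit_length": 7, "led_by": "verb",
                       "mapping": {v: [f"{v}-object{i}" for i in range(128)]
                                   for v in verbs}},
            "place": plain("place"),
            "adjective": plain("adjective"),
        },
    }


def sentence_to_bits(words: list[str], theme: dict) -> int:
    """Recover the 33-bit value of one sentence given in natural order."""
    cats = theme["categories"]
    by_name = dict(zip(theme["natural_order"], words))
    bits = 0
    for name in theme["filling_order"]:
        cat = cats[name]
        leader = cat.get("led_by")
        if leader is not None:
            word_list = cat["mapping"][by_name[leader]]
        else:
            word_list = cat["words"]
        # list.index raises ValueError for a word outside the theme.
        bits = (bits << cat["bit_length"]) | word_list.index(by_name[name])
    return bits


def to_bip39_indices(sentences: list[int]) -> list[int]:
    """Split each 33-bit sentence into three 11-bit BIP-0039 word indices."""
    indices = []
    for s in sentences:
        indices += [(s >> 22) & 0x7FF, (s >> 11) & 0x7FF, s & 0x7FF]
    return indices
```

Looking up each index in the BIP-0039 English word list and joining the words with single spaces would give the mnemonic that is passed to the key-stretching step.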
+
+To create a binary seed from the resulting BIP-0039 mnemonic, we use the PBKDF2 function
+with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" +
+passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and
+HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512
+bits (= 64 bytes).
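The key-stretching step maps directly onto Python's standard library. A minimal sketch, with the hypothetical function name `mnemonic_to_seed`, that expects the BIP-0039 mnemonic obtained after the conversion described above:

```python
import hashlib
import unicodedata


def mnemonic_to_seed(bip39_mnemonic: str, passphrase: str = "") -> bytes:
    """Derive the 64-byte seed with PBKDF2-HMAC-SHA512 and 2048 iterations."""
    password = unicodedata.normalize("NFKD", bip39_mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


seed = mnemonic_to_seed("abandon " * 11 + "about", "TREZOR")
print(seed.hex())
```

Any passphrase produces a 64-byte result, so the function never fails on a wrong passphrase; it simply yields a different seed.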
+
+This seed can later be used to generate deterministic wallets using BIP-0032 or
+similar methods.
+
+The PBKDF2 step is independent of how the sentence was generated: it operates only
+on the BIP-0039 mnemonic obtained above. This keeps the derivation code rather simple,
+and clients are free to implement their own themes or even whole sentence generators,
+allowing for flexibility in wordlists for typo detection or other purposes. Decoding a
+themed mnemonic does, however, require the theme that was used to generate it.
+
+Although it is possible to use a mnemonic not generated by the algorithm described in
+the "Generating the mnemonic" section, this is not advised. Software must compute the
+checksum for the mnemonic sentence using a wordlist and issue a warning if it is
+invalid.
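Once a mnemonic has been decoded into its 33-bit sentences, the checksum test is short. The sketch below assumes most-significant-bit-first ordering and uses the hypothetical name `checksum_is_valid`; it relies on the fact that the checksum length equals the number of sentences.

```python
import hashlib


def checksum_is_valid(sentences: list[int]) -> bool:
    """Check the trailing checksum bits of a decoded mnemonic."""
    count = len(sentences)
    if not 4 <= count <= 8:
        return False  # ENT must be between 128 and 256 bits
    if any(not 0 <= s < (1 << 33) for s in sentences):
        return False

    bits = 0
    for s in sentences:
        bits = (bits << 33) | s

    cs = count            # CS = ENT / 32 = number of sentences
    ent = 32 * count
    entropy = (bits >> cs).to_bytes(ent // 8, "big")
    # CS is at most 8, so the first digest byte holds all checksum bits.
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - cs)
    return (bits & ((1 << cs) - 1)) == expected
```

A wallet would call this after decoding and warn the user when it returns `False`, rather than refusing to derive a seed.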
+
+The described method also provides plausible deniability, because every passphrase
+generates a valid seed (and thus a deterministic wallet) but only the correct one
+will make the desired wallet available.
+
+==Standard themes==
+
+The reference implementation ships with standard themes listed at the link below.
+Since BIP-0039 is a valid Formosa theme, all existing BIP-0039 mnemonics work
+without modification.
+
+It is '''strongly discouraged''' to use non-standard custom themes for generating
+mnemonic sentences, as the user assumes responsibility for ensuring the theme file
+remains available and structurally valid. Users with proper training in security
+protocols who understand these risks may benefit from custom themes through higher
+memorization efficiency or an additional layer of obscurity.
+
+* [[https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes|Standard Formosa Themes]]
+
+==Test vectors==
+
+The test vectors include input entropy, mnemonic and seed. The
+passphrase "TREZOR" is used for all vectors. Since Formosa converts back to
+BIP-0039 before seed derivation, the same test vectors apply to all themes
+given the same underlying entropy.
+
+https://github.com/Yuri-SVB/formosa/blob/master/vectors.json
+
+==Reference Implementation==
+
+Reference implementation including themes is available from
+
+https://github.com/Yuri-SVB/formosa