Merged
developer-guide/core-features/fine-grained-control.mdx: 3 changes (1 addition, 2 deletions)
@@ -59,8 +59,7 @@
icon="language"
href="/developer-guide/core-features/fine-grained-control/japanese"
>
-OpenJTalk romaji phonemes with pitch accent digits or rising/falling edge
-markers.
+OpenJTalk romaji phonemes with pitch accent digits.
</Card>
</CardGroup>

@@ -84,9 +83,9 @@
<|phoneme_start|>ha0shi1ga0<|phoneme_end|>見えます。
```
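Building the tagged input is plain string substitution. A minimal sketch, using a hypothetical `replace_with_phonemes` helper that swaps the first occurrence of a word for its phoneme-tagged reading; only the `<|phoneme_start|>` and `<|phoneme_end|>` tags come from this guide.

```python
# Tag strings taken from this guide; the helper below is hypothetical.
PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def replace_with_phonemes(text: str, word: str, phonemes: str) -> str:
    """Replace the first occurrence of `word` with a phoneme-tagged span."""
    if word not in text:
        raise ValueError(f"{word!r} not found in {text!r}")
    return text.replace(word, f"{PHONEME_START}{phonemes}{PHONEME_END}", 1)


print(replace_with_phonemes("橋が見えます。", "橋が", "ha0shi1ga0"))
# <|phoneme_start|>ha0shi1ga0<|phoneme_end|>見えます。
```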

## Paralanguage

Paralanguage controls allow you to add natural speech elements and pauses to make the generated speech sound more human-like. There are two main types of controls:

### Pause Words

@@ -117,7 +116,7 @@
I am, um, an (break) engineer.
```

You can combine paralanguage and phoneme control in the same text:

```text
I am, um, an (break) <|phoneme_start|>EH1 N JH AH0 N IH1 R<|phoneme_end|>.

developer-guide/core-features/fine-grained-control/japanese.mdx: 39 changes (8 additions, 31 deletions)
@@ -1,12 +1,12 @@
---
title: "Japanese Phoneme Control"
description: "Control Japanese pronunciation with romaji phonemes and pitch accent markers"

icon: "language"
---

## Overview

Japanese phoneme control uses OpenJTalk-style romaji phonemes plus pitch accent information. This is useful for Japanese homographs that have the same plain phoneme sequence but different pitch accents, such as `端が`, `箸が`, and `橋が`.

```text
Standard: 橋が見えます。
@@ -27,7 +27,7 @@

The following examples all share the plain phoneme sequence `h a sh i g a`, but the pitch markers disambiguate the word:

-- `端が` (edge + subject marker): `<|phoneme_start|>ha0shi1ga1<|phoneme_end|>`
+- `端が` (end + subject marker): `<|phoneme_start|>ha0shi1ga1<|phoneme_end|>`
- `箸が` (chopsticks + subject marker): `<|phoneme_start|>ha1shi0ga0<|phoneme_end|>`
- `橋が` (bridge + subject marker): `<|phoneme_start|>ha0shi1ga0<|phoneme_end|>`
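Reading each digit as the pitch level of the mora it follows (`0` low, `1` high) is consistent with the `端が`, `箸が`, and `橋が` examples above and with the standard Tokyo accents of these words: low-high-high, high-low-low, and low-high-low. The sketch below splits a digit-notation string on that assumption and strips the digits to recover the shared plain sequence. The helper names are hypothetical, and the simple regex treats any letter run ending in a digit as one unit, so it does not separate symbols such as `cl` from the following mora.

```python
import re

# Hypothetical helpers: assume each digit is the pitch level (0 = low, 1 = high)
# of the letters immediately before it, as in "ha0shi1ga0".
MORA_WITH_PITCH = re.compile(r"([A-Za-z]+)([01])")


def split_pitch(phonemes: str) -> list[tuple[str, int]]:
    return [(mora, int(level)) for mora, level in MORA_WITH_PITCH.findall(phonemes)]


def pitch_pattern(phonemes: str) -> str:
    # Render the pitch levels as a compact H/L string, e.g. "LHL".
    return "".join("H" if level else "L" for _, level in split_pitch(phonemes))


def strip_pitch(phonemes: str) -> str:
    # Drop the digits to get the plain romaji sequence shared by the homographs.
    return re.sub(r"[01]", "", phonemes)


examples = {
    "端が": "ha0shi1ga1",
    "箸が": "ha1shi0ga0",
    "橋が": "ha0shi1ga0",
}

for word, phonemes in examples.items():
    print(word, strip_pitch(phonemes), pitch_pattern(phonemes))
# 端が hashiga LHH
# 箸が hashiga HLL
# 橋が hashiga LHL
```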

@@ -37,16 +37,11 @@
and adjust the digits when you need a specific accent.
</Note>

## Relation to ttslearn Prosody Symbols

The [ttslearn Japanese Tacotron recipe](https://r9y9.github.io/ttslearn/latest/notebooks/ch10_Recipe-Tacotron.html#%E3%83%95%E3%83%AB%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%83%A9%E3%83%99%E3%83%AB%E3%81%8B%E3%82%89%E3%81%AE%E9%9F%B3%E7%B4%A0%E5%88%97%E3%81%8A%E3%82%88%E3%81%B3%E9%9F%BB%E5%BE%8B%E8%A8%98%E5%8F%B7%E3%81%AE%E6%8A%BD%E5%87%BA) shows how to extract phonemes and prosody symbols from OpenJTalk full-context labels. That recipe prints symbols such as `[` for a pitch rise and `]` for a pitch fall.

-Fish Audio phoneme tags should not contain literal `[` or `]`. Convert that prosody into either:
-
-- Digit notation, such as `ha0shi1ga0`.
-- Edge notation, such as `haJshiLga`, where `J` marks a rising edge and `L` marks a falling edge.
-
-Use one notation style consistently inside each phoneme tag.
+Fish Audio phoneme tags should not contain literal `[` or `]`. Convert that prosody into digit notation, such as `ha0shi1ga0`.
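If you start from the ttslearn recipe's symbol list instead of full-context labels, one way to do the conversion is to replay the digit rules of `japanese_to_romaji_with_accent` on the symbols: a phoneme followed by `[` gets `0`, one followed by `]` gets `1`, one followed by `#` (accent phrase boundary) closes the phrase with the current level, and any other vowel gets the current level. This is a minimal sketch: the function name is hypothetical, the symbol set (`^`, `$`, `?`, `_`, `#`, `[`, `]`) is assumed to follow ttslearn's conventions, and the utterance and pause markers are simply dropped rather than mapped to anything.

```python
# Hypothetical converter from ttslearn-style prosody symbols to digit notation.
# Assumed symbol set: "^" start, "$" / "?" end, "_" pause,
# "#" accent phrase boundary, "[" pitch rise, "]" pitch fall.
JAPANESE_VOWELS = "aiueoAIUEON"
PROSODY_MARKERS = {"^", "$", "?", "_", "#", "[", "]"}


def ttslearn_symbols_to_digits(symbols: list[str]) -> str:
    text = ""
    level = -1  # -1 means no pitch level has been set in this accent phrase yet
    for i, symbol in enumerate(symbols):
        if symbol in PROSODY_MARKERS:
            continue  # markers are handled by looking ahead from the phoneme before them
        text += symbol
        following = symbols[i + 1] if i + 1 < len(symbols) else None
        if following == "#":
            # Accent phrase boundary: close the phrase with the current level.
            if level >= 0:
                text += str(level)
            level = -1
        elif following == "]":
            # Fall after this mora: this mora is high, later morae are low.
            level = 0
            text += "1"
        elif following == "[":
            # Rise after this mora: this mora is low, later morae are high.
            level = 1
            text += "0"
        elif symbol in JAPANESE_VOWELS:
            if level < 0:
                level = 0
            text += str(level)
    return text


# Assumed ttslearn-style symbols for 橋が (low-high-low).
print(ttslearn_symbols_to_digits(["^", "h", "a", "[", "sh", "i", "]", "g", "a", "$"]))
# ha0shi1ga0
```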

## Generate Japanese Phonemes

@@ -65,12 +60,7 @@
JAPANESE_VOWELS = "aiueoAIUEON"


-def japanese_to_romaji_with_accent(
-    sentence: str,
-    boundary: bool = False,
-    rise_edge: str = "J",
-    fall_edge: str = "L",
-) -> str:
+def japanese_to_romaji_with_accent(sentence: str) -> str:
    text = ""
    labels = pyopenjtalk.extract_fullcontext(sentence)
    level = -1
@@ -94,40 +84,27 @@

        # Accent phrase boundary
        if a3 == 1 and a2_next == 1:
-            if boundary:
-                if level >= 0:
-                    text += " "
-            else:
-                if level >= 0:
-                    text += str(level)
+            if level >= 0:
+                text += str(level)
            level = -1
        # Falling
        elif a1 == 0 and a2_next == a2 + 1:
            level = 0
-            if boundary:
-                text += fall_edge
-            else:
-                text += "1"
+            text += "1"
        # Rising
        elif a2 == 1 and a2_next == 2:
            level = 1
-            if boundary:
-                text += rise_edge
-            else:
-                text += "0"
+            text += "0"
        elif phoneme in JAPANESE_VOWELS:
            if level < 0:
                level = 0
-            if not boundary:
-                text += str(level)
+            text += str(level)

    return text


print(japanese_to_romaji_with_accent("橋が"))
# ha0shi1ga0
-print(japanese_to_romaji_with_accent("橋が", boundary=True))
-# haJshiLga
```

Then place the result inside the phoneme tags:
@@ -160,4 +137,4 @@
<|phoneme_start|>very long paragraph with multiple clauses...<|phoneme_end|>
```

If your text contains symbols that OpenJTalk should read as words, normalize them before conversion. For example, the training preprocessor converted `%` to `パーセント` before extracting phonemes.
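A minimal sketch of that normalization step, assuming a small hand-written replacement table applied to the text before it is passed to `japanese_to_romaji_with_accent`. Only the `%` to `パーセント` entry comes from this guide; the full-width `％` entry, the table, and the function name are hypothetical, and a real table would need entries for whatever symbols appear in your input.

```python
# Hypothetical normalization table: symbols OpenJTalk should read as words.
# "%" -> "パーセント" is from this guide; the full-width "％" entry is an assumption.
SYMBOL_READINGS = {
    "%": "パーセント",
    "％": "パーセント",
}


def normalize_symbols(text: str) -> str:
    # Replace every occurrence of each symbol with its spoken reading.
    for symbol, reading in SYMBOL_READINGS.items():
        text = text.replace(symbol, reading)
    return text


print(normalize_symbols("成功率は50%です。"))
# 成功率は50パーセントです。
```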
