Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
3b7fb76
Vectorized embedding quality calculation
SGD2718 Nov 3, 2025
80aa250
Added SpeakerInitializationMode enum to dictate behavior when initial…
SGD2718 Nov 3, 2025
8978171
Implemented a lot more features for finer control
SGD2718 Nov 3, 2025
aa0f95e
Added speaker permanence and merge detection
SGD2718 Nov 5, 2025
82c7239
Merge branch 'FluidInference:main' into main
SGD2718 Nov 5, 2025
ac05a63
Updated Documentation
SGD2718 Nov 6, 2025
a976c9c
Improved Concurrency Safety & fixed typos in documentation
SGD2718 Nov 6, 2025
c00d1e9
Fixed capitalization in docstrings
SGD2718 Nov 6, 2025
44d5b55
Fixed issue where merging did not respect permanent speakers in initi…
SGD2718 Nov 6, 2025
f9a48a0
fixed typos
SGD2718 Nov 6, 2025
bacd0b1
Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift
SGD2718 Nov 6, 2025
35abfb2
Merge branch 'main' into main
SGD2718 Nov 6, 2025
545d561
fixed a typo
SGD2718 Nov 6, 2025
4c822e6
Merge branch 'main' into main
SGD2718 Nov 6, 2025
afdb6d8
fixed formatting for a logger warning
SGD2718 Nov 6, 2025
fe1e3c2
Merge remote-tracking branch 'refs/remotes/origin/main'
SGD2718 Nov 6, 2025
189e38b
removed extra newline from end of file
SGD2718 Nov 6, 2025
f57b86d
I think i fixed the formatting issue
SGD2718 Nov 6, 2025
a492c82
Added Test Cases with Codex
SGD2718 Nov 8, 2025
ba68030
fixed the test that failed
SGD2718 Nov 8, 2025
432ee7c
Update Documentation/SpeakerManager.md
SGD2718 Nov 8, 2025
de77a38
Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift
SGD2718 Nov 8, 2025
7204b4d
Update Sources/FluidAudio/Diarizer/Clustering/SpeakerManager.swift
SGD2718 Nov 8, 2025
e9ea4d2
Update Documentation/SpeakerManager.md
SGD2718 Nov 8, 2025
4b60a44
Update Documentation/SpeakerManager.md
SGD2718 Nov 8, 2025
125c9af
Fixed Minor bugs
SGD2718 Nov 8, 2025
da57fc4
Merge remote-tracking branch 'refs/remotes/origin/main'
SGD2718 Nov 8, 2025
4a07829
Update Documentation/SpeakerManager.md
SGD2718 Nov 8, 2025
a6b99fd
Fixed Formatting Issues (I think)
SGD2718 Nov 8, 2025
e443a66
Merge remote-tracking branch 'refs/remotes/origin/main'
SGD2718 Nov 8, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
186 changes: 183 additions & 3 deletions Documentation/SpeakerManager.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

# SpeakerManager API

Tracks and manages speaker identities across audio chunks for streaming diarization.
Expand Down Expand Up @@ -73,6 +74,21 @@ let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)
speakerManager.initializeKnownSpeakers([alice, bob])
```

Sometimes, there are already speakers in the database that may have the same ID.
```swift
let alice = Speaker(id: "alice", name: "Alice", currentEmbedding: aliceEmbedding)
let bob = Speaker(id: "bob", name: "Bob", currentEmbedding: bobEmbedding)
speakerManager.initializeKnownSpeakers([alice, bob], mode: .overwrite, preserveIfPermanent: false) // replace any speakers with ID "alice" or "bob" with the new speakers, even if the old ones were marked as permanent.
```

> The `mode` argument dictates how to handle redundant speakers. It is of type `SpeakerInitializationMode`, and can take on one of four values:
> - `.reset`: reset the speaker database and add the new speakers
> - `.merge`: merge new speakers whose IDs match with existing ones
> - `.overwrite`: overwrite existing speakers with the same IDs as the new ones
> - `.skip`: skip adding speakers whose IDs match existing ones
>
> The `preserveIfPermanent` argument determines whether existing speakers marked as permanent should be preserved (i.e., not overwritten or merged). It is `true` by default.

**Use case:** When you have pre-recorded voice samples of known speakers and want to recognize them by name instead of numeric IDs.

#### upsertSpeaker
Expand All @@ -91,16 +107,145 @@ speakerManager.upsertSpeaker(
updateCount: 5, // optional
createdAt: Date(), // optional
updatedAt: Date() // optional
isPermanent: false // optional
)
```

**Behavior:**
- If speaker ID exists: updates the existing speaker's data
- If speaker ID is new: inserts as a new speaker
- Maintains ID uniqueness and tracks numeric IDs for auto-increment
- If `isPermanent` is true, then the new speaker or the existing speaker will become permanent. This means that the speaker will not be merged or removed without an override.

#### mergeSpeaker
```swift
// merge speaker 1 into "alice"
speakerManager.mergeSpeaker("1", into: "alice")

// merge speaker 2 into speaker 3 under the name "bob", regardless of whether speaker 2 is permanent.
speakerManager.mergeSpeaker("2", into: "3", mergedName: "Bob", stopIfPermanent: false)
```

**Behavior:**
- Unless `stopIfPermanent` is `false`, the merge will be stopped if the first speaker is permanent.
- Otherwise: Merges the first speaker into the destination speaker and removes the first speaker from the known speaker database.
- If `mergedName` is provided, the destination speaker will be renamed. Otherwise, its name will be preserved.

> Note: the `mergedName` argument is optional.
> Note: `stopIfPermanent` is `true` by default.

#### removeSpeaker
Remove a speaker from the database.

```swift
// remove speaker 1
speakerManager.removeSpeaker("1")

// remove "alice" from the known speaker database, even if they are marked as permanent
speakerManager.removeSpeaker("alice", keepIfPermanent: false)
```
> Note: `keepIfPermanent` is `true` by default.

#### removeSpeakersInactive
Remove speakers that have been inactive since a certain date or for a certain duration.

```swift
// remove speakers that have been inactive since `date`
speakerManager.removeSpeakersInactive(since: date)

// remove speakers that have been inactive for 10 seconds, even if they were marked as permanent
speakerManager.removeSpeakersInactive(for: 10.0, keepIfPermanent: false)
```

> Note: Both versions of the method have an optional `keepIfPermanent` argument that defaults to `true`.

#### removeAllSpeakers
Remove all speakers that match a given predicate.

```swift
// remove all speakers with less than 5 seconds of speaking time
speakerManager.removeSpeakers(
where: { $0.duration < 5.0 },
keepIfPermanent: false // also remove permanent speakers (optional)
)

// Alternate syntax (does NOT remove permanent speakers)
speakerManager.removeSpeakers {
$0.duration < 5.0
}
```

> Note: the predicate should take in a `Speaker` object and return a `Bool`.

#### makeSpeakerPermanent
Make the speaker permanent.

```swift
speakerManager.makeSpeakerPermanent("alice") // mark "alice" as permanent
```

#### revokePermanence
Make the speaker not permanent.

```swift
speakerManager.revokePermanence(from: "alice") // mark "alice" as not permanent
```

#### resetPermanentFlags
Mark all speakers as not permanent.

```swift
speakerManager.resetPermanentFlags()
```

### Speaker Retrieval

#### findSpeaker
Find the best matching speaker to an embedding vector and the cosine distance to them, unless no match is found.

```swift
let (id, distance) = speakerManager.findSpeaker(with: embedding)
```
> Note: there is an optional `speakerThreshold` argument to use a threshold other than the default.

#### findMatchingSpeakers
Find all speakers within the maximum `speakerThreshold` to an embedding vector.

```swift
for speaker in speakerManager.findMatchingSpeakers(with: embedding) {
print("ID: \(speaker.id), Distance: \(speaker.distance)")
}
```

> Note: there is an optional `speakerThreshold` argument to use a threshold other than the default.

#### findSpeakers
Find all speakers that meet a certain predicate.
```swift
// two ways to find all speakers with > 5.0s of speaking time.
speakerManager.findSpeakers(where: { $0.duration > 5.0 })
speakerManager.findSpeakers{ $0.duration > 5.0 }
// Returns an array of IDs corresponding to speakers that meet the predicate.
```

> Note: the predicate should take in a `Speaker` object and return a `Bool`.

#### findMergeablePairs
Find all pairs of speakers that might be the same person. Specifically, find the pairs of speakers such that the cosine distance between them is less than the `speakerThreshold`.
Comment on lines +233 to +234
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect Swift syntax: parameter labels should use colons (:) not equals signs (=). The correct syntax is:

let pairs = speakerManager.findMergeablePairs(
    speakerThreshold: 0.6, // optional
    excludeIfBothPermanent: true // optional
)
Suggested change
#### findMergeablePairs
Find all pairs of speakers that might be the same person. Specifically, find the pairs of speakers such that the cosine distance between them is less than the `speakerThreshold`.
speakerThreshold: 0.6, // optional
excludeIfBothPermanent: true // optional

Copilot uses AI. Check for mistakes.

Returns a list of pairs of speaker IDs.

```swift
let pairs = speakerManager.findMergeablePairs(
speakerThreshold: 0.6, // optional
excludeIfBothPermanent: true // optional
)

for pair in pairs {
print("Merge Speaker \(pair.speakerToMerge) into Speaker \(pair.destination)")
}
```

#### getSpeaker
Get a specific speaker by ID.

Expand All @@ -118,6 +263,22 @@ let allSpeakers = speakerManager.getAllSpeakers()
// Returns: [String: Speaker] - dictionary keyed by speaker ID
```

#### getSpeakerList
Get all speakers in the database as an array of speakers (for testing/debugging)
```swift
let allSpeakers = speakerManager.getSpeakerList()
// Returns: [Speaker] - Array of speakers
```

#### hasSpeaker
Check if the speaker database has a speaker with a given ID.

```swift
if speakerManager.hasSpeaker("alice") {
print("Alice was found in the database")
}
```

#### speakerCount
Get the total number of tracked speakers.

Expand All @@ -140,13 +301,16 @@ Clear all speakers from the database.

```swift
speakerManager.reset()
speakerManager.reset(keepIfPermanent: true) // remove all non-permanent speakers from the database
```

Useful for:
- Starting a new session
- Freeing memory between recordings
- Resetting speaker tracking



## Speaker Enrollment

The `Speaker` class includes a `name` field for speaker enrollment workflows:
Expand Down Expand Up @@ -237,6 +401,7 @@ public final class Speaker: Identifiable, Codable {
public var updatedAt: Date // Last update timestamp
public var updateCount: Int // Number of updates
public var rawEmbeddings: [RawEmbedding] // Historical embeddings (max 50)
public var isPermanent: Bool // Permanence flag
}
```

Expand Down Expand Up @@ -547,13 +712,25 @@ class RealtimeDiarizer {
| Method | Returns | Description |
Copy link

Copilot AI Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing backticks and opening pipe character in markdown table row. For consistent table formatting, the method name should be wrapped in backticks and the row should start with |: | \findSpeakers(where:)` | [String] | ...`

Suggested change
| Method | Returns | Description |
| `findMergeablePairs(speakerThreshold:excludeIfBothPermanent:)` | [(speakerToMerge: String, destination: String)] | Find all pairs of very similar speakers |

Copilot uses AI. Check for mistakes.
|--------|---------|-------------|
| `assignSpeaker(_:speechDuration:confidence:)` | `Speaker?` | Assign/create speaker from embedding |
| `initializeKnownSpeakers(_:)` | `Void` | Pre-load known speaker profiles |
| `initializeKnownSpeakers(_:mode:preserveIfPermanent:)` | `Void` | Pre-load known speaker profiles |
| `findSpeaker(with:speakerThreshold:)` | `(id: String?, distance: Float)` | Find speaker that matches an embedding |
| `findMatchingSpeakers(with:speakerThreshold:)` | `[(id: String, distance: Float)]` | Find all speakers that match an embedding |
| `findSpeakers(where:)` | `[String]` | Find all speakers that meet a certain predicate
| `findMergeablePairs(speakerThreshold:excludeIfBothPermanent:)` | `[(speakerToMerge: String, destination: String)]` | Find all pairs of very similar speakers |
| `removeSpeaker(_:keepIfPermanent:)` | `Void` | Remove a speaker from the database |
| `removeSpeakersInactive(since:keepIfPermanent:)` | `Void` | Remove speakers inactive since a given date |
| `removeSpeakersInactive(for:keepIfPermanent:)` | `Void` | Remove speakers inactive for a given duration |
| `removeSpeakers(where:)` | `Void` | Remove speakers that satisfy a given predicate |
| `removeSpeakers(where:keepIfPermanent:)` | `Void` | Remove speakers that satisfy a given predicate |
| `mergeSpeaker(_:into:mergedName:stopIfPermanent:)` | `Void` | Merge a speaker into another one |
| `upsertSpeaker(_:)` | `Void` | Update or insert speaker (from object) |
| `upsertSpeaker(id:currentEmbedding:duration:...)` | `Void` | Update or insert speaker (from params) |
| `getSpeaker(for:)` | `Speaker?` | Get speaker by ID |
| `getAllSpeakers()` | `[String: Speaker]` | Get all speakers (debugging) |
| `reset()` | `Void` | Clear speaker database |
| `reassignSegment(segmentId:from:to:)` | `Bool` | Move segment between speakers |
| `getSpeakerList()` | `[Speaker]` | Get array of all speakers (debugging) |
| `hasSpeaker(_:)` | `Bool` | Check if database has a speaker with a given ID |
| `reset(keepIfPermanent:)` | `Void` | Clear speaker database |
| `resetPermanentFlags()` | `Void` | Mark all speakers as not permanent |
| `getCurrentSpeakerNames()` | `[String]` | Get sorted speaker IDs |
| `getGlobalSpeakerStats()` | `(Int, Float, Float, Int)` | Aggregate statistics |

Expand All @@ -567,6 +744,7 @@ class RealtimeDiarizer {
| `minEmbeddingUpdateDuration` | `Float` | Min duration to update embeddings (seconds) |
| `speakerCount` | `Int` | Number of tracked speakers |
| `speakerIds` | `[String]` | Sorted array of speaker IDs |
| `permanentSpeakerIds` | `[String]` | Sorted array of speaker IDs of permanent speakers |

### Speaker Properties

Expand All @@ -580,6 +758,7 @@ class RealtimeDiarizer {
| `updatedAt` | `Date` | Last update timestamp |
| `updateCount` | `Int` | Number of embedding updates |
| `rawEmbeddings` | `[RawEmbedding]` | Historical embeddings (max 50) |
| `isPermanent` | `Bool` | Permanence flag |

### Speaker Methods

Expand All @@ -602,6 +781,7 @@ class RealtimeDiarizer {
| `averageEmbeddings(_:)` | `[Float]?` | Average multiple embeddings |
| `createSpeaker(id:name:duration:embedding:config:)` | `Speaker?` | Create validated speaker |
| `updateEmbedding(current:new:alpha:)` | `[Float]?` | EMA update (pure function) |
| `reassignSegment(segmentId:from:to:)` | `Bool` | Move segment between speakers |

## See Also

Expand Down
Loading
Loading