Incorrect results from transcribing audio from file #740

@msluszniak

Description

While developing timestamping for STT, I noticed that the transcription results are incorrect for part of the audio. The first 35 seconds of a 58-second recording are recognised as music and audience chatter. After that, the transcription is perfectly fine. The audio quality is the same throughout the recording. The result is shown in the following screenshot:

Image

Reference audio: https://ai.swmansion.com/storage/moonshine/test_audio.mp3

Steps to reproduce

Run the STT example from the demo app and pass it the reference audio link from the description.

Snack or a link to a repository

No response

React Native Executorch version

Main

React Native version

0.81.5

Platforms

iOS

JavaScript runtime

Hermes

Workflow

None

Architecture

Fabric (New Architecture)

Build type

Debug mode

Device

iOS simulator

Device model

iPhone 16 Pro

AI model

Model from the STT demo app

Performance logs

No response

Acknowledgements

Yes

Metadata

Labels

model (Issues related to exporting, improving, fixing ML models)
