Overview
There are two ways to use the speech-to-text (STT) feature:
- Real-time STT: Transcribes incoming audio tracks in real time. Ideal when low-latency feedback is important.
- Non-real-time STT: Transcribes audio blobs. Suitable when you already have a user audio recording implementation.
Real-time STT
Real-time STT is the recommended approach. It connects to the user's audio track and streams transcription results in real time.
Configuration
You can configure STT options in the config passed to join(). You can either specify a language explicitly or enable automatic language detection. Specifying a language improves transcription accuracy. If no language is specified, STT runs in the same language as the AI Avatar.
joined event is triggered.
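The three language choices described above (explicit language, automatic detection, or falling back to the AI Avatar's language) can be sketched as a small helper that builds the STT part of the join() config. This is only an illustration: the field names stt, language, and autoDetect, the helper buildSttOptions, and the language code "en" are assumptions, not the SDK's documented option names, so check the SDK reference for the real shape.

```typescript
// Hypothetical option shape: the real field names in the join() config may differ.
type SttLanguageChoice =
  | { mode: "fixed"; language: string } // explicit language, e.g. "en"
  | { mode: "auto" }                    // automatic language detection
  | { mode: "avatar" };                 // no language set: follow the AI Avatar

interface SttOptions {
  language?: string;
  autoDetect?: boolean;
}

// Translate the caller's choice into the (assumed) STT options object.
function buildSttOptions(choice: SttLanguageChoice): SttOptions {
  if (choice.mode === "fixed") {
    return { language: choice.language };
  }
  if (choice.mode === "auto") {
    return { autoDetect: true };
  }
  // Nothing specified: per the text, STT runs in the AI Avatar's language.
  return {};
}

// Example config object that would be passed to join() (shape assumed).
const joinConfig = { stt: buildSttOptions({ mode: "fixed", language: "en" }) };
```

Keeping the choice in one discriminated union makes it hard to set an explicit language and automatic detection at the same time by accident.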
Start transcription
Stop transcription
muteUserAudio() keeps the track published but stops sending audio data.
To resume STT, use unmuteUserAudio().
If you stopped STT using muteUserAudio(), it can resume more quickly, so this is the recommended approach.
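The pause-and-resume pattern described above can be wrapped in a small toggle so that repeated calls do not mute or unmute twice. Only the method names muteUserAudio() and unmuteUserAudio() come from this page; the UserAudioControls interface, the SttPauseToggle class, and the assumption that both methods are synchronous are hypothetical. If the real methods return promises, await them instead.

```typescript
// Assumed minimal view of the SDK client: only the two method names are from the docs.
interface UserAudioControls {
  muteUserAudio(): void;
  unmuteUserAudio(): void;
}

// Hypothetical helper that pauses and resumes STT by muting the user's audio.
class SttPauseToggle {
  private paused = false;
  private client: UserAudioControls;

  constructor(client: UserAudioControls) {
    this.client = client;
  }

  // Stop sending audio data; the track stays published.
  pause(): void {
    if (this.paused) return;
    this.client.muteUserAudio();
    this.paused = true;
  }

  // Start sending audio data again so transcription resumes.
  resume(): void {
    if (!this.paused) return;
    this.client.unmuteUserAudio();
    this.paused = false;
  }

  isPaused(): boolean {
    return this.paused;
  }
}

// Stand-in client that records calls, used here instead of the real SDK object.
const calls: string[] = [];
const fakeClient: UserAudioControls = {
  muteUserAudio: () => { calls.push("mute"); },
  unmuteUserAudio: () => { calls.push("unmute"); },
};

const toggle = new SttPauseToggle(fakeClient);
toggle.pause();
toggle.pause();  // ignored: already paused
toggle.resume();
```

In an application, the real client object would be passed in place of fakeClient.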