
Flag to better support simultaneous text-to-speech and speech-to-text #1356

Open
okharedia opened this issue Jan 10, 2025 · 1 comment

@okharedia

hi,
I have a use case for real-time translation, but I noticed that while the agent is speaking, some of the STT transcripts are missing / not added to the chat context, so the agent will not consider them for the LLM and TTS. I had a look at the VoicePipelineAgent internals and noticed the code below.
(btw, I have allow_interruptions=False and preemptive_synthesis=True configured)

def _validate_reply_if_possible(self) -> None:
    """Check if the new agent speech should be played"""

    if self._playing_speech and not self._playing_speech.interrupted:
        should_ignore_input = False
        if not self._playing_speech.allow_interruptions:
            should_ignore_input = True
            logger.debug(
                "skipping validation, agent is speaking and does not allow interruptions",
                extra={"speech_id": self._playing_speech.id},
            )

and

if should_ignore_input:
    self._transcribed_text = ""
    return

I understand the transcribed text is cleared here to allow a more natural conversation flow and to keep a clean chat history of the agent replying to the correct speech input.
However, this does not quite fit my use case, which cannot tolerate missing speech input. Would it be possible to add a flag to turn this off (i.e., to not clear the transcript while the agent is speaking)?
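
Something like this is what I have in mind (clear_transcript_on_agent_speech is just an illustrative name, not an existing option):

# Hypothetical flag on the agent options (illustrative only, does not exist today)
agent = VoicePipelineAgent(
    vad=vad,
    stt=stt,
    llm=llm,
    tts=tts,
    allow_interruptions=False,
    preemptive_synthesis=True,
    clear_transcript_on_agent_speech=False,  # proposed flag
)

# Inside _validate_reply_if_possible, the clearing would then be guarded,
# assuming the option ends up on the agent's internal options object:
if should_ignore_input:
    if self._opts.clear_transcript_on_agent_speech:
        self._transcribed_text = ""
    return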

okharedia added the question label on Jan 10, 2025
@davidzhao
Member

User speech is always flowing in, and STT is always running. Since the LLM requires the full input to be ready in order to start inference, we wait until the user has completed their turn before starting inference.

Can you describe which transcripts you are missing?
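
One way to see what is being dropped is to log every committed user turn independently of the chat context, e.g. with something like the below (assuming the user_speech_committed event; the exact event name and payload type may vary by version):

from livekit.agents.llm import ChatMessage

# Log each user turn as it is committed to the chat context, so it can be
# compared against the raw STT output to spot anything that went missing.
@agent.on("user_speech_committed")
def _on_user_speech_committed(msg: ChatMessage):
    print("committed user speech:", msg.content)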
