Hi,
I have a use case for real-time translation, but I noticed that while the agent is speaking, some of the STT transcripts are missing / not added to the chat context, so the agent does not consider them for the LLM and TTS. I had a look at the VoicePipelineAgent internals and noticed the code below.
(For context, I have allow_interruptions=False and preemptive_synthesis=True set.)
def _validate_reply_if_possible(self) -> None:
    """Check if the new agent speech should be played"""
    if self._playing_speech and not self._playing_speech.interrupted:
        should_ignore_input = False
        if not self._playing_speech.allow_interruptions:
            should_ignore_input = True
            logger.debug(
                "skipping validation, agent is speaking and does not allow interruptions",
                extra={"speech_id": self._playing_speech.id},
            )
and
        if should_ignore_input:
            self._transcribed_text = ""
            return
I understand the transcribed text is cleared here to allow a more natural conversation flow and to keep a clean chat history where the agent replies to the correct speech input.
However, this does not fit my use case, which cannot tolerate missing speech input. Would it be possible to add a flag to turn this off (i.e. to disable clearing the transcript while the agent is speaking)?
User speech is always flowing in, and STT is always running. Since the LLM requires the full input to be ready before it can start inference, we wait until the user has completed their turn before starting inference.
Can you describe which transcripts you are missing?