Open
Description
Title:
Agent transcribes its own voice and gets interrupted during user conversation
Description:
During a conversation session, the agent seems to transcribe its own generated speech as if it were user input. This causes unexpected interruptions and confusion in the dialog flow.
Expected Behavior:
The agent should ignore its own audio during speech recognition, and only process genuine user input.
Actual Behavior:
Agent’s voice is picked up by the speech recognition system and mistakenly transcribed as user input.
Logs:
20:04:02.544 INFO: Server: {"type":"ConversationText","role":"assistant","content":"Hello! I'm Sarah from TechStyle customer service. How can I help you today?"}
20:04:03.476 INFO: Server: {"type":"AgentAudioDone"}
20:04:03.862 INFO: Server: {"type":"UserStartedSpeaking"}
20:04:05.132 INFO: Server: {"type":"ConversationText","role":"user","content":"Can you hear me? Hello. I'm Sarah from"}
20:04:05.174 INFO: Server: {"type":"EndOfThought"}
20:04:06.333 INFO: Server: {"type":"ConversationText","role":"assistant","content":"It seems like there's been a little mix-up with names!"}
20:04:06.342 INFO: Server: {"type":"AgentStartedSpeaking","tts_latency":0.0413104,"ttt_latency":1.161309916,"total_latency":1.712456407}
20:04:07.157 INFO: Server: {"type":"UserStartedSpeaking"}
20:04:07.159 INFO: Server: {"type":"AgentAudioDone"}
20:04:08.111 INFO: Server: {"type":"ConversationText","role":"user","content":"It seems l
Metadata
Metadata
Assignees
Labels
No labels