# Talk Mode
Talk mode is a continuous voice conversation loop:
- Listen for speech
- Send transcript to the model (main session, chat.send)
- Wait for the response
- Speak it via ElevenLabs (streaming playback)
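The loop above can be sketched as a small driver function. This is only an illustration: `listen`, `send`, and `speak` are hypothetical callables standing in for speech recognition, `chat.send`, and ElevenLabs playback, and are not part of any real API.

```python
from typing import Callable, Optional


def talk_loop(
    listen: Callable[[], Optional[str]],
    send: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> send -> speak until listen() returns None.

    All three callables are hypothetical placeholders. Returns the number
    of completed conversation turns.
    """
    turns = 0
    while True:
        transcript = listen()          # blocks until a transcript is ready
        if transcript is None:         # assumed signal that Talk mode was exited
            break
        transcript = transcript.strip()
        if not transcript:             # nothing was said; keep listening
            continue
        reply = send(transcript)       # waits for the model's response
        speak(reply)                   # hands the reply to TTS playback
        turns += 1
    return turns
```

Passing the three stages in as callables keeps the loop itself independent of any particular recognizer or TTS backend.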
# Behavior (macOS)
- Always-on overlay while Talk mode is enabled.
- Listening → Thinking → Speaking phase transitions.
- On a short pause (silence window), the current transcript is sent.
- Replies are written to WebChat (same as typing).
- Interrupt on speech (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
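One way to picture the phase transitions and the interrupt rule is a tiny state machine. This is a minimal sketch, not the app's implementation: the class, method names, and the form of the timestamp (a float supplied by the caller) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


@dataclass
class TalkState:
    """Hypothetical model of the Listening -> Thinking -> Speaking cycle."""

    phase: Phase = Phase.LISTENING
    interrupt_on_speech: bool = True
    interrupted_at: Optional[float] = None  # timestamp kept for the next prompt

    def on_silence(self) -> None:
        # A short pause ends the utterance and the transcript is sent.
        if self.phase is Phase.LISTENING:
            self.phase = Phase.THINKING

    def on_reply(self) -> None:
        # The model's reply arrived; playback starts.
        if self.phase is Phase.THINKING:
            self.phase = Phase.SPEAKING

    def on_playback_finished(self) -> None:
        if self.phase is Phase.SPEAKING:
            self.phase = Phase.LISTENING

    def on_user_speech(self, timestamp: float) -> bool:
        """Return True if playback should stop because the user spoke."""
        if self.phase is Phase.SPEAKING and self.interrupt_on_speech:
            self.interrupted_at = timestamp
            self.phase = Phase.LISTENING
            return True
        return False
```

With `interrupt_on_speech` set to `False`, `on_user_speech` leaves the state untouched and playback continues.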
# Voice directives in replies
The assistant may prefix its reply with a single JSON line to control voice:
```json
{ "voice": "<voice-id>", "once": true }
```

Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.
Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
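The directive rules can be illustrated with a short parser. This is a sketch under stated assumptions: the function names are hypothetical, aliases are folded into the canonical names `voice` and `model`, and a first line that parses as a JSON object is always stripped, even if none of its keys are recognized.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Alias -> canonical name (canonical names are an assumption of this sketch).
ALIASES = {
    "voice": "voice", "voice_id": "voice", "voiceId": "voice",
    "model": "model", "model_id": "model", "modelId": "model",
}
PLAIN_KEYS = {
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier", "once",
}


def split_directive(reply: str) -> Tuple[Dict[str, Any], str]:
    """Return (directive, text_for_tts) for an assistant reply."""
    lines = reply.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        # Only the first non-empty line may carry a directive.
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return {}, reply
        if not isinstance(parsed, dict):
            return {}, reply
        directive: Dict[str, Any] = {}
        for key, value in parsed.items():
            if key in ALIASES:
                directive[ALIASES[key]] = value
            elif key in PLAIN_KEYS:
                directive[key] = value
            # Unknown keys are ignored.
        # Strip the JSON line so it is never spoken.
        return directive, "\n".join(lines[i + 1:]).strip()
    return {}, reply


def resolve_voice(
    default_voice: Optional[str], directive: Dict[str, Any]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (voice for this reply, new default voice)."""
    voice = directive.get("voice")
    if voice is None:
        return default_voice, default_voice
    if directive.get("once") is True:
        return voice, default_voice   # applies to the current reply only
    return voice, voice               # becomes the new Talk mode default
```

For example, a reply starting with `{"voiceId": "abc", "once": true}` yields the directive `{"voice": "abc", "once": True}` and leaves the default voice unchanged.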
# Config (~/.opensoul/opensoul.json)
```json5
{
  talk: {
    voiceId: "elevenlabs_voice_id",
    modelId: "eleven_v3",
    outputFormat: "mp3_44100_128",
    apiKey: "elevenlabs_api_key",
    interruptOnSpeech: true,
  },
}
```

Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
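The fallback rules above could be resolved roughly as follows. This is a sketch only: the function name and the platform strings are hypothetical, checking `ELEVENLABS_VOICE_ID` before `SAG_VOICE_ID` is an assumed precedence, and the "first ElevenLabs voice" and "gateway shell profile" fallbacks are not modeled.

```python
from typing import Any, Dict, Mapping

# Platform defaults as listed above; the platform keys are assumed names.
PLATFORM_FORMATS = {"macos": "pcm_44100", "ios": "pcm_44100", "android": "pcm_24000"}


def resolve_talk_config(
    talk: Mapping[str, Any], env: Mapping[str, str], platform: str
) -> Dict[str, Any]:
    """Merge the `talk` config section with environment fallbacks."""
    voice_id = (
        talk.get("voiceId")
        or env.get("ELEVENLABS_VOICE_ID")   # assumed to win over SAG_VOICE_ID
        or env.get("SAG_VOICE_ID")
    )
    return {
        # None here means "pick the first ElevenLabs voice" (not modeled).
        "voiceId": voice_id,
        "modelId": talk.get("modelId") or "eleven_v3",
        # None here means "try the gateway shell profile" (not modeled).
        "apiKey": talk.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
        "outputFormat": talk.get("outputFormat") or PLATFORM_FORMATS.get(platform),
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
    }
```

In real use `env` would typically be `os.environ`; taking it as a parameter keeps the function easy to test.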
# macOS UI
- Menu bar toggle: Talk
- Config tab: Talk Mode group (voice id + interrupt toggle)
- Overlay:
  - Listening: cloud pulses with mic level
  - Thinking: sinking animation
  - Speaking: radiating rings
  - Click cloud: stop speaking
  - Click X: exit Talk mode
# Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
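The validation rules in these notes might look like the checks below. This is a hedged sketch: the function names are hypothetical, and it assumes out-of-range values are rejected, since the notes do not say whether they are rejected or snapped to the nearest allowed value.

```python
from typing import Optional

# PCM formats listed above for Android AudioTrack streaming.
ANDROID_PCM_FORMATS = {"pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100"}


def is_valid_stability(value: float, model_id: str) -> bool:
    """eleven_v3 allows only three discrete values; other models allow 0..1."""
    if model_id == "eleven_v3":
        return value in (0.0, 0.5, 1.0)
    return 0.0 <= value <= 1.0


def is_valid_latency_tier(value: Optional[int]) -> bool:
    """latency_tier is optional; when set it must be an integer in 0..4."""
    if value is None:
        return True
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 4


def is_android_pcm_format(output_format: str) -> bool:
    """True if the format is one of the listed low-latency PCM formats."""
    return output_format in ANDROID_PCM_FORMATS
```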