Voice Plugin

The Voice plugin gives you spoken audio feedback when Claude Code completes a task. The agent automatically speaks a 1-2 sentence summary before stopping, so you can multitask while Claude works and hear when it needs your attention.

  • FFmpeg (recommended) — Enables streaming audio for lower latency. Install via brew install ffmpeg (macOS) or sudo apt install ffmpeg (Linux).

The plugin uses pocket-tts, a lightweight text-to-speech library. On first use, it automatically:

  1. Starts a pocket-tts server (via uvx pocket-tts serve)
  2. Downloads the voice model (~100MB, one-time)

The server persists in the background (via nohup) so subsequent requests are instant. Server logs are written to /tmp/pocket-tts-server.log.
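As a rough illustration of the background launch described above, here is a minimal Python sketch. It is not the plugin's actual code: the function name start_detached is hypothetical, and it uses start_new_session=True as a stand-in that is similar in effect to nohup (the child does not belong to the terminal's session). Only the command and log path come from the text above.

```python
import subprocess
from pathlib import Path

LOG_PATH = Path("/tmp/pocket-tts-server.log")  # log location named in the text
SERVER_CMD = ["uvx", "pocket-tts", "serve"]    # server command named in the text


def start_detached(cmd=SERVER_CMD, log_path=LOG_PATH):
    """Launch cmd in the background, appending its output to log_path.

    Hypothetical helper: a sketch of a nohup-style launch, not the plugin's code.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # POSIX: run the child in its own session
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
    return proc
```

Because Popen returns as soon as the child is spawned, the caller is not blocked while the server starts up.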

To stop the server manually:

pkill -f "pocket-tts serve"
  1. Add the marketplace (if not already added):

    claude plugin marketplace add pchalasani/claude-code-tools
  2. Install the voice plugin:

    claude plugin install voice@cctools-plugins

Once installed, the plugin works automatically: when the agent finishes a task, it speaks a 1-2 sentence summary before stopping. No action is required on your part.

Use the /voice:speak command to control the plugin.

# Enable voice feedback with current voice
/voice:speak
# Disable voice feedback
/voice:speak stop

For a complete voice workflow, pair this TTS plugin with a speech-to-text app:

  • Hex with Parakeet V3 (macOS only, open-source) — stunningly fast transcription with no stuttering. Highly recommended.
  • Handy with Parakeet V3 (cross-platform, open-source) — very fast transcription, though may occasionally stutter.

The plugin uses a multi-hook strategy for fast, reliable voice summaries:

UserPromptSubmit Hook

Silently injects voice instructions each turn, telling Claude to end longer responses with a spoken summary marker.
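To make the injection step concrete, the sketch below builds the kind of JSON a UserPromptSubmit hook can print to stdout to add context silently. The instruction text and the function name build_hook_output are hypothetical, and the JSON shape reflects my understanding of the Claude Code hooks format, so verify it against the hooks reference before relying on it.

```python
import json

# Hypothetical wording; the plugin's real instruction text is not shown in this page.
VOICE_INSTRUCTIONS = (
    "End longer responses with a one-line spoken summary prefixed by 📢."
)


def build_hook_output(context: str) -> str:
    """Return hook output JSON that carries `context` as additionalContext."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": context,
        }
    }
    # ensure_ascii=False keeps the emoji readable in the emitted JSON
    return json.dumps(payload, ensure_ascii=False)
```

A hook script would print build_hook_output(VOICE_INSTRUCTIONS) and exit with status 0, so nothing extra appears in the user's transcript.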

PostToolUse Hook

Adds a brief reminder after each tool call to keep the voice instructions fresh during long tool chains.

Stop Hook

Extracts the summary marker instantly (no API call), or falls back to headless Claude summarization if the agent omitted the marker.
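The marker-extraction path can be sketched as a single regular-expression scan. The exact marker format is an assumption here (a line that starts with 📢, followed by the summary), and extract_summary is a hypothetical name, not the plugin's function.

```python
import re
from typing import Optional

# Assumption: the summary is the rest of a line that begins with the 📢 marker.
MARKER_RE = re.compile(r"^[ \t]*📢[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_summary(message: str) -> Optional[str]:
    """Return the text of the last 📢-marked line, or None if there is none."""
    matches = MARKER_RE.findall(message)
    return matches[-1] if matches else None
```

A None result is the signal to take the slower fallback path (headless summarization); any other result can be sent to the TTS server directly.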

This design ensures:

  • Fast feedback — Most summaries are instant (marker extraction, no API call needed)
  • Reliable — Headless Claude fallback catches cases where the agent forgets the marker
  • Silent operation — Hooks use additionalContext for noise-free injection
  • Tone matching — Summaries match the user’s conversational style (casual, formal, etc.)
  • Non-blocking — Audio plays in the background (the stop hook returns immediately via subprocess.Popen), so it never delays the next prompt
  • Streaming — When ffplay is available, audio is piped directly from the TTS server to the player with no temp file, reducing latency. Falls back to a temp WAV file when ffplay is not installed
  • Playback locking — A mkdir-based lock with stale-process detection prevents overlapping audio when multiple sessions finish at the same time
  • Infinite loop prevention — When you run /voice:speak stop, a just_disabled flag is set. The next prompt hook sees the flag, injects a “stop adding 📢 markers” message to override stale instructions still in context, and then clears the flag. Without this step, the agent would keep producing markers from old instructions and the stop hook would keep speaking them, creating an infinite loop
  • Session state tracking — Per-session state files (/tmp/voice-{id}-running, -done, -failed) let the stop hook know whether audio playback is still in progress, completed, or errored
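The playback lock in the list above relies on mkdir being atomic: only one process can create a given directory. The sketch below shows one way to build such a lock with stale-owner detection. It is POSIX-only (os.kill(pid, 0) is used as a liveness probe), and the helper names and the pid file inside the lock directory are assumptions, not the plugin's actual layout.

```python
import os
import shutil
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire(lock_dir: Path) -> bool:
    """Try to take the lock; reclaim it once if the recorded owner is dead."""
    for _ in range(2):
        try:
            lock_dir.mkdir()  # atomic: fails if another process holds the lock
        except FileExistsError:
            try:
                owner = int((lock_dir / "pid").read_text())
            except (FileNotFoundError, ValueError):
                return False  # owner is mid-acquire or the pid file is unreadable
            if pid_alive(owner):
                return False
            shutil.rmtree(lock_dir, ignore_errors=True)  # stale lock: remove, retry
            continue
        (lock_dir / "pid").write_text(str(os.getpid()))
        return True
    return False


def release(lock_dir: Path) -> None:
    """Drop the lock by removing its directory."""
    shutil.rmtree(lock_dir, ignore_errors=True)
```

A caller would wrap playback in try_acquire and release, and skip or queue the audio when try_acquire returns False. Note that the stale-lock cleanup in this sketch is not itself atomic, so a production version would need to close the small race between two processes reclaiming the same stale lock.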