I Built a Voice Plugin for Claude Code

Type less, talk more. A /voice command that records audio in the terminal, transcribes it, and feeds the text to Claude as if you typed it.

Tags: Claude Code, Python, Voice Input, Open Source

The problem

I spend a lot of time in Claude Code. Typing out detailed prompts gets tedious, especially when you're explaining a complex bug or describing an architecture change. Sometimes you just want to say what you need.

I looked around for existing solutions. There are a few community projects -- voice notification hooks, TTS feedback systems, MCP server wrappers. None of them did the one simple thing I wanted: speak into the terminal, get a transcript, have Claude respond to it.

What it does

Inside Claude Code, you type /voice. Your microphone starts recording. You talk. Recording stops automatically after 5 seconds of silence, the audio is transcribed via the Google Speech API, and Claude responds as if you had typed that text.

  Voice Input

  * Recording (auto-stops after 5s silence)
  * [6.2s]
  * Recorded 8.1s
  * create a REST API with FastAPI

create a REST API with FastAPI          <-- Claude sees this

That's it. No browser window. No GUI. No MCP server. Just audio in, text out, inline in your terminal.

How it works

The key design choice: status messages (recording indicator, timing, transcription progress) go to stderr. Only the final transcript goes to stdout. Claude Code reads stdout as user input, so this separation is what makes the whole thing work.
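
A minimal sketch of that separation is below. The helper names status and emit_transcript are hypothetical, chosen for illustration rather than taken from the repo:

```python
import sys


def status(message: str) -> None:
    """Print a progress message to stderr so it never mixes with the transcript."""
    print(message, file=sys.stderr, flush=True)


def emit_transcript(text: str) -> None:
    """Write the final transcript, and nothing else, to stdout."""
    print(text.strip(), file=sys.stdout, flush=True)
```

Calling status("* Recording (auto-stops after 5s silence)") during capture and emit_transcript("create a REST API with FastAPI") at the end would leave stdout holding only the transcript line, which is the part Claude Code picks up.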

Two modes

The script detects whether it's running in a real terminal or piped through Claude Code:

TTY mode (running cv in your terminal): Press Enter to stop recording, Esc to cancel. Interactive controls via raw terminal input.
Non-TTY mode (running via /voice in Claude Code): Auto-stops after 5 seconds of silence. No keyboard interaction needed since Claude owns the terminal.

The detection is a single os.isatty(sys.stdin.fileno()) check. Simple, but it makes the tool feel native in both contexts.
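
As a rough sketch, the check could be wrapped like this. The function name detect_mode and the mode labels "tty" and "auto-stop" are assumptions made for the example. The fallback for a stdin without a usable file descriptor is also an addition, not something the original script is known to do:

```python
import os
import sys


def detect_mode(stream=None) -> str:
    """Return "tty" for an interactive terminal, "auto-stop" otherwise."""
    stream = sys.stdin if stream is None else stream
    try:
        interactive = os.isatty(stream.fileno())
    except (AttributeError, OSError, ValueError):
        # stdin is missing, closed, or replaced by an object without a real
        # file descriptor: treat it as non-interactive.
        interactive = False
    return "tty" if interactive else "auto-stop"
```

In "tty" mode the script would listen for Enter and Esc. In "auto-stop" mode it would rely on silence detection alone.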

Installation

Three commands:

git clone https://github.com/iAmSurajT/claude-voice.git
cd claude-voice
./install.sh

The installer handles everything: portaudio via Homebrew, Python packages, creating the /voice slash command in ~/.claude/commands/, and adding a cv shell alias. Works on macOS and Linux.

After install, /voice works globally in any project -- it's a user-level Claude Code command, not project-scoped.

The silence detection problem

The hardest part wasn't the audio recording or the transcription. It was deciding when to stop.

Stop too early and you cut someone off mid-sentence while they're thinking. Stop too late and they're sitting in silence wondering if the tool froze.

The current approach: track the mean absolute amplitude of each audio chunk. If it stays below a threshold (800 by default) for 5 consecutive seconds, and at least one non-silent chunk has already been detected, stop. That "after speech" condition is important -- without it, the recording could stop before you even start talking.

import numpy as np
SILENCE_THRESHOLD = 800  # default cutoff on mean absolute amplitude
def is_silent(audio_chunk, threshold=SILENCE_THRESHOLD):
    return np.abs(audio_chunk).mean() < threshold

5 seconds is generous. Most natural pauses in speech are 1-2 seconds. But I'd rather wait a bit longer than clip someone's thought. The values are configurable if your environment needs different tuning.
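
To make the "after speech" rule concrete, here is a sketch of the stop logic as a small state machine. It is an illustration under assumptions, not the repo's code. It uses plain Python lists instead of NumPy so it runs without dependencies. The names chunks_until_stop and SILENCE_SECONDS are hypothetical, and it counts silent chunks rather than measuring wall-clock time:

```python
import math
from typing import Iterable, Sequence

SILENCE_THRESHOLD = 800   # mean absolute amplitude; default quoted in the text
SILENCE_SECONDS = 5.0     # trailing silence required before stopping


def is_silent(chunk: Sequence[int], threshold: float = SILENCE_THRESHOLD) -> bool:
    """Pure-Python stand-in for the NumPy check: mean absolute amplitude below threshold."""
    if not chunk:
        return True
    return sum(abs(sample) for sample in chunk) / len(chunk) < threshold


def chunks_until_stop(
    chunks: Iterable[Sequence[int]],
    chunk_seconds: float,
    silence_seconds: float = SILENCE_SECONDS,
) -> int:
    """Return how many chunks are consumed before the recorder would stop."""
    needed = math.ceil(silence_seconds / chunk_seconds)  # silent chunks in a row
    heard_speech = False
    silent_run = 0
    consumed = 0
    for chunk in chunks:
        consumed += 1
        if not is_silent(chunk):
            heard_speech = True
            silent_run = 0      # any speech resets the countdown
        elif heard_speech:
            silent_run += 1     # leading silence is never counted
            if silent_run >= needed:
                break
    return consumed
```

With one-second chunks, three silent chunks followed by two loud ones and then a long silence would stop after ten chunks: the leading silence is ignored and the countdown only starts once speech has been heard. A stream that never rises above the threshold is consumed to the end without triggering a stop.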

What I'd change

A few things I'm thinking about for future versions:

Local transcription. The current setup sends audio to Google's Speech API. It works and it's free, but it adds network latency and a dependency on an external service. Whisper running locally would solve both.
Streaming transcription. Right now the full audio is transcribed in one shot after recording stops. Streaming would show words appearing as you speak.
Better silence detection. The amplitude threshold is crude. A proper VAD (voice activity detection) model would handle background noise, typing sounds, and variable mic levels more gracefully.

Try it

The repo is public: github.com/iAmSurajT/claude-voice

Install takes about 30 seconds. If you use Claude Code regularly, it's worth the setup. Talking through a problem is faster than typing it out, and the transcript quality from Google Speech is solid for technical vocabulary.