The best speech-to-text setup on Windows depends on what you care about most: for live, low-latency dictation, use a streaming engine like Deepgram; for the highest accuracy, use OpenAI Whisper; and for a fully offline, free option, use Local Whisper. With PipeVoice, a free and open-source voice typing tool for Windows 10 and 11, you pick the engine and it types real keystrokes into whatever app is focused.
This guide explains how the engines differ, which wins for which job, and how to get a working free setup in a few minutes. If you just want to start, you can download PipeVoice and hold Ctrl+\ to talk.
Speech to text vs voice typing vs voice control: what's the difference
These terms overlap, but they are not the same thing.
- Speech to text (STT) is the underlying technology: an engine that turns audio of your voice into a string of text. Deepgram, Whisper, and the Windows built-in recognizer are all STT engines.
- Voice typing wraps STT in something useful: you speak, and the words land as text inside the app you are actually using, whether that is a terminal, an editor, or a chat box.
- Voice control is about commanding your computer hands-free (moving the cursor, clicking, running commands). Tools like Windows Voice Access lean this way.
PipeVoice is voice typing. You hold a hotkey, speak, release, and it types real keystrokes into the focused app. A second hotkey copies the result to your clipboard instead of typing. It also handles a few spoken commands like "new line", "new paragraph", "tab key", "scratch that", and "send it", but commanding your whole OS hands-free is not its goal.
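To make the spoken-command idea concrete, here is a minimal sketch of how a voice typing tool might fold commands into the text it types. It is not PipeVoice's actual implementation. The function name `apply_spoken_commands`, the `COMMANDS` table, and the exact behaviour of "scratch that" (drop the previous utterance) and "send it" (press Enter afterwards) are assumptions made for illustration; only the phrases themselves come from the list above.

```python
# Hypothetical mapping from spoken phrases to the characters they produce.
# The phrases are from the article; the mapped values are assumptions.
COMMANDS = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "tab key": "\t",
}


def apply_spoken_commands(utterances):
    """Return (text_to_type, press_enter) for a list of transcribed utterances.

    Assumed behaviour: "scratch that" discards the previous utterance and
    "send it" asks the caller to press Enter once the text has been typed.
    """
    kept = []
    press_enter = False
    for raw in utterances:
        phrase = raw.strip().lower().rstrip(".!?")
        if not phrase:
            continue  # ignore empty recognitions
        if phrase == "scratch that":
            if kept:
                kept.pop()
            continue
        if phrase == "send it":
            press_enter = True
            continue
        kept.append(COMMANDS.get(phrase, raw.strip()))

    out = ""
    for piece in kept:
        if piece in COMMANDS.values():
            # Whitespace commands attach directly, with no space before them.
            out = out.rstrip(" ") + piece
        elif out and not out[-1].isspace():
            out += " " + piece
        else:
            out += piece
    return out, press_enter


text, press_enter = apply_spoken_commands(
    ["hello world", "new line", "oops", "scratch that", "second line", "send it"]
)
```

In this sketch the misrecognised "oops" is dropped, the two remaining utterances end up on separate lines, and `press_enter` tells the caller to send the message.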
How modern speech-to-text engines work (streaming vs batch)
There are two ways an engine can hand you text, and the difference shapes how dictation feels.
Streaming (real-time) engines transcribe as you speak. Audio is sent in small chunks and partial words appear almost immediately, then firm up as more context arrives. This is what makes dictation feel live.
Batch (after-the-fact) engines wait until you finish speaking, then process the whole clip at once. You get the result a moment after you release the key. Batch models can use the full sentence as context, which often helps accuracy, at the cost of that short wait.
Rule of thumb: streaming optimizes for latency, while batch optimizes for accuracy. PipeVoice lets you choose, so you can match the engine to the task instead of compromising.
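The difference between the two delivery styles can be sketched without any real audio. The simulation below uses words as stand-ins for audio chunks; `stream_transcribe` and `batch_transcribe` are hypothetical names and do not correspond to any provider's API.

```python
from typing import Iterable, Iterator, List


def stream_transcribe(chunks: Iterable[str]) -> Iterator[str]:
    """Simulated streaming engine: emit a growing partial transcript per chunk."""
    heard: List[str] = []
    for chunk in chunks:
        heard.append(chunk)
        yield " ".join(heard)


def batch_transcribe(chunks: Iterable[str]) -> str:
    """Simulated batch engine: consume the whole clip, then return one result."""
    return " ".join(chunks)


clip = ["open", "the", "pull", "request"]  # stand-in for audio chunks

partials = list(stream_transcribe(clip))  # text is available after every chunk
final = batch_transcribe(clip)            # text is available only at the end
```

Both paths end with the same text here. What differs is when the caller sees it: the streaming version produces four intermediate results, while the batch version produces one. A real batch model can also revise earlier words using later context, which this toy version does not model.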
The three engines you can use with PipeVoice
PipeVoice does not lock you into one provider. You pick one of three engines, and for the cloud options you bring your own key, so any cost is billed to your own account and stays small.
- Deepgram (streaming): words appear live as you speak, and it is the fastest option. Needs your own free Deepgram API key. Running costs are roughly pennies per day for normal dictation.
- OpenAI Whisper (batch): the most accurate of the three. It processes your clip after you release the key. Needs your own OpenAI key.
- Local Whisper / faster-whisper (offline): runs fully on your PC, free, with no API key. The first use downloads a model of about 150 MB. You can raise the model size for more accuracy, at the cost of speed and CPU.
Accuracy compared: which engine wins for what use case
No single engine is best at everything. Here is how the three line up across the things that actually matter for daily use.
| Engine | Mode | Latency | Accuracy | Offline? | Cost | Key needed? |
|---|---|---|---|---|---|---|
| Deepgram | Streaming | Lowest (live) | High | No | ~pennies/day | Free Deepgram key |
| OpenAI Whisper | Batch | Short wait after release | Highest | No | Pay-as-you-go on your key | OpenAI key |
| Local Whisper | Batch (offline) | Depends on CPU and model size | Good, scales with model | Yes | Free | None |
For a deeper breakdown see Deepgram vs Whisper vs OpenAI for dictation. The short version: pick Deepgram when you want words to appear as you talk, pick OpenAI Whisper when the text has to be right the first time, and pick Local Whisper when nothing should leave your machine.
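The short version above amounts to a small decision rule. Here is one way to write it down, as a sketch: the function `pick_engine` and its parameters are hypothetical, and letting the offline requirement win over the wish for live text is an assumption about priorities rather than something PipeVoice enforces.

```python
def pick_engine(must_stay_offline: bool, wants_live_text: bool) -> str:
    """Pick an engine name from two priorities (illustrative rule only)."""
    if must_stay_offline:
        # Nothing may leave the machine, so the cloud engines are ruled out.
        return "Local Whisper"
    if wants_live_text:
        # Words should appear while speaking, so use the streaming engine.
        return "Deepgram"
    # Otherwise favour accuracy and accept a short wait after release.
    return "OpenAI Whisper"


chat_choice = pick_engine(must_stay_offline=False, wants_live_text=True)
draft_choice = pick_engine(must_stay_offline=False, wants_live_text=False)
private_choice = pick_engine(must_stay_offline=True, wants_live_text=True)
```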
Real-time vs after-the-fact transcription and why latency matters
Latency is the gap between finishing a thought and seeing it on screen. It sounds minor until you do it all day.
With a streaming engine, the words track your voice, so you can self-correct mid-sentence and stay in flow. This suits chat, prompts, and fast back-and-forth, for example dictating into Claude Code, Cursor, a terminal, or a browser chat box.
With batch transcription you get a clean result a beat after you release the key. That short pause is a fair trade when you are drafting prose or writing something that needs to be accurate, because the model considers the whole sentence at once.
Running speech to text fully offline on your PC
If you would rather not send audio anywhere, PipeVoice has a fully offline path: Local Whisper for transcription plus Ollama for the optional cleanup. That combination has zero cost, needs no key, and sends nothing off your PC.
PipeVoice has no account, no telemetry, and no servers of our own. The cloud engines send audio only to the provider you chose, on your key. The local path sends nothing at all. There is also an optional AI polish step (called Flow mode) that tidies filler words, punctuation, and casing. Polish sends text only, never audio, and you can run it through OpenAI, Google Gemini's free tier, OpenRouter's free community models, or local Ollama for a no-key offline option.
For the full walkthrough, see offline voice typing on Windows and the Windows voice typing overview.
Boosting accuracy with vocabulary, accents, and speech notes
The engine is only half the story. How you configure it matters just as much, especially for technical words and non-standard speech.
- Vocabulary boosting: feed in jargon, product names, and acronyms so the engine stops guessing wrong on the words you use most.
- Accent and language picker: choose British, US, Australian, Indian, New Zealand English, and more, so the model expects how you actually sound.
- Speech notes: a free-text field where you describe your speech, for example a non-native accent, a stutter, or heavy fillers. This helps the cleanup step interpret you correctly.
More tactics live in dictation accuracy tips.
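Engine-side vocabulary boosting works differently for each provider, so the sketch below shows a related but distinct technique: fixing near-miss spellings after transcription by fuzzy-matching each word against your term list. It uses Python's standard `difflib`. The function name `correct_terms`, the `VOCAB` list, and the 0.8 cutoff are assumptions for illustration, and this is not how PipeVoice or any engine implements boosting.

```python
import difflib

# Hypothetical list of terms the user cares about.
VOCAB = ["PipeVoice", "Deepgram", "Ollama", "OpenRouter"]


def correct_terms(text: str, vocab=VOCAB, cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a vocabulary term with that term.

    This is post-correction on the transcript, not engine-level boosting.
    """
    by_lower = {term.lower(): term for term in vocab}
    fixed = []
    for word in text.split():
        core = word.strip(".,!?;:")  # keep surrounding punctuation intact
        match = difflib.get_close_matches(
            core.lower(), list(by_lower), n=1, cutoff=cutoff
        )
        if core and match:
            fixed.append(word.replace(core, by_lower[match[0]], 1))
        else:
            fixed.append(word)
    return " ".join(fixed)


cleaned = correct_terms("send audio to deepgram and olama.")
```

A high cutoff keeps ordinary words from being rewritten. Lowering it catches more misspellings but raises the risk of false replacements, which is one reason engine-level boosting is usually preferable where the provider offers it.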
Picking the right setup for coding, writing, or accessibility
PipeVoice supports per-app profiles, so each app can use a different engine, cleanup level, output mode, and auto-Enter setting. A few sensible starting points:
- Coding and prompts: Deepgram for live feedback, vocabulary boosting for your stack's terms, auto-Enter on in chat tools so "send it" actually sends. Because it types into any app, it works inside the terminal, VS Code, Cursor, and Claude Code, not just one CLI.
- Writing: OpenAI Whisper for accuracy, with Flow mode cleaning up punctuation and filler so your draft reads cleanly.
- Accessibility and RSI: toggle mode (instead of push-to-talk) so you are not holding a key, plus local dictation history to recover anything you missed.
- Privacy-first: Local Whisper plus Ollama, fully offline.
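Per-app profiles are essentially a lookup from the focused app to a bundle of settings, with a fallback. The sketch below models that idea. It is not PipeVoice's configuration format: the `Profile` fields, the setting values, the choice to key profiles by executable name, and the example executables are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    engine: str       # e.g. "deepgram", "openai-whisper", "local-whisper"
    cleanup: str      # assumed cleanup levels: "off", "light", "flow"
    output: str       # "type" for keystrokes or "clipboard"
    auto_enter: bool  # press Enter after "send it"


# Fallback used when the focused app has no profile of its own.
DEFAULT = Profile(engine="deepgram", cleanup="light", output="type", auto_enter=False)

# Hypothetical profiles keyed by lowercase executable name.
PROFILES = {
    "cursor.exe": Profile("deepgram", "light", "type", auto_enter=True),
    "winword.exe": Profile("openai-whisper", "flow", "type", auto_enter=False),
    "keepass.exe": Profile("local-whisper", "off", "clipboard", auto_enter=False),
}


def profile_for(process_name: str) -> Profile:
    """Return the profile for the focused app, or the default."""
    return PROFILES.get(process_name.lower(), DEFAULT)


writing = profile_for("WINWORD.EXE")
unknown = profile_for("notepad.exe")
```

Keeping a single default plus a small table of overrides means a new app works immediately and only the apps that need different behaviour require an entry.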
How PipeVoice compares to other options
PipeVoice's distinguishing traits are that it is free, open source, Windows-native, and lets you pick from three transcription engines. Here is how those traits stack up against other common ways to dictate on Windows. Check each vendor's site for current pricing.
| Tool | Platform | Offline? | Open source? | Engine choice? | Price |
|---|---|---|---|---|---|
| PipeVoice | Windows | Yes | Yes | Yes (3 engines) | Free |
| Wispr Flow | Mac and Windows | No | No | No | Paid (subscription) |
| Dragon Professional | Windows | Yes | No | No | Paid (one-time licence) |
| Windows Voice Access | Windows | Yes | No | No | Free (built in) |
| Talon Voice | Cross-platform | Yes | No | Limited | Free tier plus paid beta |
See the head-to-head on the Wispr Flow comparison page, or browse free voice typing software for Windows.
Honest limitations
To be straight with you: PipeVoice is Windows only, not Mac or Linux. It is currently unsigned, so Windows SmartScreen shows an "unrecognised app" warning on first run. Click More info, then Run anyway (code signing is in progress). The cloud engines need your own API key, and Local Whisper is slower than the cloud engines and wants a decent CPU for the larger, more accurate models.
Get started
PipeVoice is free forever, open source, and installs in a couple of minutes. A managed-key Pro tier may arrive later, but the core stays free. Download PipeVoice for Windows, hold Ctrl+\ (or Right Ctrl), and talk faster than you type. The docs cover engine setup and profiles.