Question 1

How is this different from Chrome's built-in Live Caption?

Accepted Answer

Chrome's Live Caption is built into the browser: it doesn't float over other apps and can't show captions in a picture-in-picture window. LiveCaptionIt pops the captions into a floating window that stays visible no matter which app you're using, and it works on multiple operating systems (Chrome's Live Caption availability varies by OS).

Question 2

Does my audio get uploaded anywhere?

Accepted Answer

No. LiveCaptionIt runs the Whisper speech-recognition model directly in your browser via WebGPU and transformers.js. Audio is processed on your device and never sent to any server. The only network request is the one-time download of the model file from Hugging Face Hub, which is cached locally afterward.

Question 3

Does it store anything?

Accepted Answer

LiveCaptionIt stores your last 20 caption transcripts locally in your browser (IndexedDB) so you can revisit and re-download them from the "Recent sessions" panel on the home page. Only the text is stored — never the raw audio. Click "Clear all" anytime to wipe history. Nothing ever leaves your device. Your model preference, caption style, and PiP window size are also saved (localStorage) for convenience.
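
The "last 20 transcripts" rule can be sketched as a small pure function. This is an illustration only: the `StoredSession` shape, the `addSession` name, and the newest-first ordering are assumptions, not LiveCaptionIt's actual schema or code.

```typescript
// Hypothetical shape of a stored session: text only, never raw audio.
interface StoredSession {
  id: string;
  endedAt: number; // Unix epoch milliseconds
  transcript: string;
}

// The FAQ's limit of 20 retained transcripts.
const MAX_SESSIONS = 20;

// Returns a new list with `session` added, sorted newest first,
// and trimmed so that only the most recent MAX_SESSIONS remain.
function addSession(
  history: readonly StoredSession[],
  session: StoredSession
): StoredSession[] {
  return [session, ...history]
    .sort((a, b) => b.endedAt - a.endedAt)
    .slice(0, MAX_SESSIONS);
}
```

In a real app the returned list would then be written back to IndexedDB; that step is omitted here.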

Question 4

Can I try it without picking a tab or granting microphone access?

Accepted Answer

Yes. Click "Try with sample audio · no setup" on the home page. It plays a short bundled audio clip through the same pipeline that captions your real audio — no permission prompts. Useful to see how the rolling-window captions feel, judge whether your chosen model tier (tiny / base / small / large turbo) is fast enough on your device, or just confirm everything works before you commit to picking a tab.

Question 5

Which browsers are supported?

Accepted Answer

Tab/screen capture mode needs Chrome, Edge, or Brave 116+ on desktop. Microphone-only mode works in all modern browsers, including Firefox, Safari, and mobile browsers: point your phone's mic at any audio source. The floating picture-in-picture window is available only in desktop Chromium browsers for now. On mobile, captions appear inline on the page instead.

Question 6

How fast are the captions?

Accepted Answer

The first word usually appears within ~700ms of speech. LiveCaptionIt uses a rolling-window streaming transcriber: instead of waiting for fixed 3-second chunks, it re-transcribes the recent audio every ~700ms and shows confident words in bold and uncertain words muted. Words "solidify in place" as the model becomes confident. The result feels like Live Caption or YouTube CC rather than delayed subtitles.
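
One common way to decide which words have "solidified" in a rolling-window setup is to compare two consecutive passes over the audio and treat the leading words they agree on as stable. The sketch below shows that idea only. Whether LiveCaptionIt uses this rule or some other confidence signal is not stated here, and `CaptionWord` and `markStableWords` are hypothetical names.

```typescript
interface CaptionWord {
  text: string;
  stable: boolean; // true: render bold, false: render muted
}

// Marks as stable the leading words on which the previous and current
// transcription passes agree (case-insensitive). Everything after the
// first disagreement is still treated as uncertain.
function markStableWords(
  previous: readonly string[],
  current: readonly string[]
): CaptionWord[] {
  let agreed = 0;
  while (
    agreed < previous.length &&
    agreed < current.length &&
    previous[agreed].toLowerCase() === current[agreed].toLowerCase()
  ) {
    agreed++;
  }
  return current.map((text, i) => ({ text, stable: i < agreed }));
}
```

For example, if one pass yields "the quick brown" and the next yields "the quick browse fox", only "the" and "quick" are marked stable.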

Question 7

Can I choose between speed and accuracy?

Accepted Answer

Yes. The Whisper model picker on the home page lets you choose Tiny (39 MB, ~2x faster), Base (74 MB, the balanced default), Small (244 MB, ~10% more accurate but slower), or Large turbo (537 MB, top-tier accuracy; recommended only when smaller tiers can't keep up with your audio). Each model is cached in your browser after the first download. Pick whichever matches your machine and audio quality.

Question 8

Can I download the transcript?

Accepted Answer

Yes. After you click Stop, three download buttons appear: .txt (plain text), .vtt (WebVTT, for video players that support subtitles), and .srt (SubRip, the universal subtitle format). Timestamps are at segment level (~700-1200ms granularity) — good enough for most use cases, not for frame-perfect subtitle alignment.
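
For readers curious how segment-level timestamps map onto the two subtitle formats, here is a minimal sketch. The `Segment` shape and function names are assumptions for illustration, not LiveCaptionIt's exporter. The format details are standard: SRT numbers its cues from 1 and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header line and uses a dot.

```typescript
// Hypothetical segment shape: one timed chunk of transcript text.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Formats milliseconds as HH:MM:SS<sep>mmm ("," for SRT, "." for WebVTT).
function formatTimestamp(ms: number, fractionSeparator: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  return (
    pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) +
    fractionSeparator + pad(millis, 3)
  );
}

// SubRip: numbered cues separated by a blank line.
function toSrt(segments: readonly Segment[]): string {
  return segments
    .map(
      (s, i) =>
        `${i + 1}\n${formatTimestamp(s.startMs, ",")} --> ${formatTimestamp(s.endMs, ",")}\n${s.text}\n`
    )
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then cues separated by blank lines.
function toVtt(segments: readonly Segment[]): string {
  const cues = segments.map(
    (s) =>
      `${formatTimestamp(s.startMs, ".")} --> ${formatTimestamp(s.endMs, ".")}\n${s.text}\n`
  );
  return ["WEBVTT\n", ...cues].join("\n");
}
```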

Question 9

Can I use my microphone instead of a tab?

Accepted Answer

Yes. Toggle the "Microphone" source on the home page instead of "Tab / window". Useful for dictation, voice notes, recording your own speech for podcast prep, or captioning a meeting where you are the speaker. LiveCaptionIt disables the browser's echo cancellation and auto-gain control for microphone mode so Whisper sees the raw audio.
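
In browser terms, turning off echo cancellation and auto-gain control comes down to the audio constraints passed to `getUserMedia`. The sketch below builds such a constraints object. `echoCancellation` and `autoGainControl` are standard media track constraints, but the function name is hypothetical and LiveCaptionIt's real constraints may include more settings.

```typescript
// Local type so the sketch stands alone; it mirrors a subset of the
// standard MediaStreamConstraints shape.
interface RawMicConstraints {
  audio: { echoCancellation: boolean; autoGainControl: boolean };
  video: boolean;
}

// Builds constraints that ask the browser not to post-process microphone audio.
function rawMicrophoneConstraints(): RawMicConstraints {
  return {
    audio: { echoCancellation: false, autoGainControl: false },
    video: false,
  };
}
```

In a page, the object would be passed as `navigator.mediaDevices.getUserMedia(rawMicrophoneConstraints())`. Browsers treat these values as requests, so a given device may not honor every one.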

Question 10

Can I customize the caption look?

Accepted Answer

Yes. The "Caption style" panel on the home page lets you adjust font size (80-200%), base font weight (regular / medium / bold), caption position in the floating window (top / middle / bottom), and the text-shadow that boosts legibility when the window sits over bright video. All settings apply live and persist in your browser.

Question 11

Can it caption audio from a desktop app like Zoom or Spotify?

Accepted Answer

On Windows: yes, if you pick "Entire screen" in the source picker and your browser is allowed to capture system audio (Chrome and Edge support this). On macOS and Linux, the browser can only capture audio from another browser tab — so use Zoom Web or Spotify Web instead. macOS users can install BlackHole (free virtual audio device) to route desktop app audio into the browser if needed.

Question 12

Can it tell who is speaking? (speaker diarization)

Accepted Answer

Not yet — and honestly, it's harder than it sounds in a browser. When you capture a tab, every speaker arrives as one mono audio stream at similar volume, so distinguishing "Speaker A" from "Speaker B" needs a separate voice-fingerprinting model (~300MB) that we haven't shipped to keep LiveCaptionIt fast and lightweight. What we DO ship: turn detection — if there's silence for ≥1.5s, captions start a fresh paragraph, which reads like meeting notes (one paragraph ≈ one person's turn) even without naming the speakers. Real diarization is on the v0.5+ roadmap.
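
The turn-detection rule above (a silence of at least 1.5s starts a fresh paragraph) can be sketched as follows. The `TimedSegment` shape and `groupIntoParagraphs` name are assumptions for illustration; only the 1.5-second threshold comes from the answer.

```typescript
// Hypothetical segment shape: one timed chunk of transcript text.
interface TimedSegment {
  startMs: number;
  endMs: number;
  text: string;
}

// Silence threshold from the FAQ: 1.5 seconds or more starts a new paragraph.
const TURN_GAP_MS = 1500;

// Groups segments into paragraphs, breaking wherever the gap between the end
// of one segment and the start of the next is at least `gapMs`.
function groupIntoParagraphs(
  segments: readonly TimedSegment[],
  gapMs: number = TURN_GAP_MS
): string[] {
  const paragraphs: string[][] = [];
  let previousEnd: number | null = null;
  for (const segment of segments) {
    const startsNewTurn =
      previousEnd === null || segment.startMs - previousEnd >= gapMs;
    if (startsNewTurn) paragraphs.push([]);
    paragraphs[paragraphs.length - 1].push(segment.text);
    previousEnd = segment.endMs;
  }
  return paragraphs.map((parts) => parts.join(" "));
}
```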

Question 13

Can I teach it words it usually mishears?

Accepted Answer

Yes. The "Custom vocabulary" panel on the home page lets you list proper nouns, technical terms, or names you want preserved (e.g. kubectl, NeurIPS, Aishwarya, ₹). They're fed to Whisper as an initial prompt so the decoder is primed to recognize them with correct spelling and casing. The list can be up to 200 characters long. Works for any language Whisper supports.
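
As a rough illustration of turning a vocabulary list into a prompt that respects the 200-character limit, consider the sketch below. The function name, the comma-separated format, and the choice to drop terms that would overflow (rather than cut one in half) are all assumptions. How LiveCaptionIt actually builds and counts its prompt is not specified here.

```typescript
// Limit taken from the FAQ; counted here in UTF-16 code units (string.length).
const MAX_PROMPT_CHARS = 200;

// Joins trimmed, de-duplicated terms with ", " and stops before the first
// term that would push the prompt past `maxChars`.
function buildInitialPrompt(
  terms: readonly string[],
  maxChars: number = MAX_PROMPT_CHARS
): string {
  const kept: string[] = [];
  let length = 0;
  for (const raw of terms) {
    const term = raw.trim();
    if (term === "" || kept.indexOf(term) !== -1) continue;
    const added = (kept.length === 0 ? 0 : 2) + term.length; // 2 = ", "
    if (length + added > maxChars) break;
    kept.push(term);
    length += added;
  }
  return kept.join(", ");
}
```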

Question 14

Can I share a transcript with someone without uploading it?

Accepted Answer

Yes. After Stop, click the Share button next to the download options. LiveCaptionIt gzip-compresses the transcript and packs it into the URL itself (no upload, no server) — your friend opens the link and sees the full session viewer with all download options. Works for transcripts up to ~16 KB (a few minutes of speech); longer sessions get a "use Export instead" toast.

Question 15

Can I install it as an app?

Accepted Answer

Yes. LiveCaptionIt is a Progressive Web App — Chrome, Edge, and Brave show an install pill on the home page that adds it to your Start menu / Applications / Home Screen. On iOS Safari, use Share → Add to Home Screen. Installed mode opens in its own window (no browser chrome), which makes the floating PiP feel like a native overlay app.

Live captions for any tab.
Floats over anything.

How it works

Click Start

Pick the source

Captions float over anything
