Transcription

Achew performs all transcription locally using on-device models; nothing is uploaded to a third party. When you run a workflow that involves transcription (Smart Detect, Regenerate Titles) or transcribe individual chapters in the editor, Achew cuts a short segment from the start of each chapter and passes it to the selected transcription service.
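As a rough illustration of the segment extraction described above, the sketch below computes one clip window per chapter and builds an `ffmpeg` argument list for it. The function names, the use of `ffmpeg`, and the 16 kHz mono output are assumptions made for illustration, and the 8-second default mirrors the Transcription Length setting; this is not Achew's actual code.

```python
from typing import List, Tuple


def segment_windows(chapter_starts: List[float], book_duration: float,
                    length: float = 8.0) -> List[Tuple[float, float]]:
    """Return (start, duration) for the clip at the start of each chapter."""
    windows = []
    for start in chapter_starts:
        # Never read past the end of the audio file.
        duration = max(0.0, min(length, book_duration - start))
        windows.append((start, duration))
    return windows


def ffmpeg_args(source: str, start: float, duration: float, out: str) -> List[str]:
    """Build (but do not run) an ffmpeg command that cuts one mono 16 kHz clip."""
    return ["ffmpeg", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
            "-i", source, "-ac", "1", "-ar", "16000", out]
```

For example, `segment_windows([0.0, 100.0, 295.0], 300.0)` shortens the last window to 5 seconds because the file ends there.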

Services

Achew ships with two transcription services: Whisper and Parakeet. Each also has a hardware-accelerated MLX variant for Apple Silicon devices.

| Service | Languages | Hardware |
|---|---|---|
| Whisper | `.en` models: English only¹; other models: 99+ languages² | CPU |
| Parakeet | `v2`: English only¹; `v3`: 25 languages² | CPU³ |
| Whisper MLX | Same as standard Whisper | Apple Silicon⁴ |
| Parakeet MLX | Same as standard Parakeet | Apple Silicon⁴ |

What should I start with?

The Whisper tiny/tiny.en models are the fastest and lightest, and make an excellent starting point. If the results aren't good enough, try switching to a larger Whisper model or using one of the Parakeet models, which tend to strike the best balance between accuracy and speed.

Whisper models

tiny / tiny.en: Smallest, fastest, least accurate, but "good enough" for a lot of books.
base / base.en: Reasonably fast, should work fairly well for most books.
small / small.en: Fast-ish, a good balance between speed and accuracy.
medium / medium.en: Slower but more accurate.
large: Highest accuracy, slowest.
turbo: High accuracy, reasonable speed. Consider using this instead of medium or large.

All model sizes have multilingual² variants. All model sizes except large and turbo have English-only¹ variants (.en).

Parakeet models

0.6B v2: English only.¹
0.6B v3: Multilingual; auto-detects language. Supports 25 languages.²

Service recommendations

English audiobooks → Parakeet 0.6B v2 for accuracy, Whisper tiny.en for speed.
Non-English audiobooks → Parakeet 0.6B v3 if the language is supported, otherwise Whisper small/turbo for accuracy or Whisper tiny for speed.
Limited RAM / CPU → Whisper tiny/tiny.en.
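The recommendations above can be read as a small decision procedure. The sketch below encodes them as a hypothetical helper; `recommend` and its parameters are illustrative names, not part of Achew. The set of languages supported by Parakeet v3 is passed in by the caller (see Supported Languages), and `turbo` is picked arbitrarily from the "small/turbo" suggestion.

```python
from typing import Set, Tuple


def recommend(language: str, parakeet_v3_languages: Set[str],
              prefer_speed: bool = False,
              low_resources: bool = False) -> Tuple[str, str]:
    """Return (service, model) following the recommendations in this section."""
    lang = language.lower()
    if low_resources:
        # Limited RAM / CPU: always the smallest Whisper model.
        return ("Whisper", "tiny.en" if lang == "en" else "tiny")
    if lang == "en":
        return ("Whisper", "tiny.en") if prefer_speed else ("Parakeet", "0.6B v2")
    if lang in parakeet_v3_languages:
        return ("Parakeet", "0.6B v3")
    # Language not covered by Parakeet v3: fall back to multilingual Whisper.
    return ("Whisper", "tiny") if prefer_speed else ("Whisper", "turbo")
```

Language codes such as `"en"` are an assumed convention here; any consistent identifier works as long as the supplied set uses the same one.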

Configuring transcription

You can configure your transcription settings from Settings → Transcription Settings, or in the Transcribe Titles screen shown before transcription occurs in specific workflows.

Start by picking the transcription service, model variant, and language you wish to use. See the Services section above for recommendations. For English audiobooks, an English-only model is recommended. For multilingual Whisper variants, select your book's specific language, because the Auto option tends to be slower and less accurate.

Then, configure the other options:

Trim segments: Attempts to increase transcription speed by trimming unnecessary parts from the audio. It's typically fine to keep this enabled, but you'll want to disable it if it's giving you blank/nonsensical transcriptions or if important parts of the title are being missed.
Use Bias Words: (Whisper only) Enables a word list that can help guide the transcription model toward more consistent results. This list is editable so you can tune it for your specific book or language (you'll want to use words in the target language). Use the Reset button in the top right of the edit area to reset to the default word list.
Transcription Length: Length of the audio segment extracted for transcription, before trimming. The default of 8 seconds generally works well, but you may need to increase this for books that have unusually long chapter titles.
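To give a feel for how a bias word list might be used, here is a minimal sketch that turns a list into a single prompt string. One common way to bias Whisper is the `initial_prompt` argument of the open-source `whisper` package's `transcribe` function; whether Achew uses that mechanism is an assumption, and `build_bias_prompt`, its `max_chars` cap, and the sample words are hypothetical.

```python
from typing import Iterable


def build_bias_prompt(words: Iterable[str], max_chars: int = 200) -> str:
    """Join unique, non-empty words into a comma-separated prompt, capped in length."""
    seen = set()
    kept = []
    length = 0
    for word in words:
        word = word.strip()
        key = word.casefold()
        if not word or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        extra = len(word) + (2 if kept else 0)  # account for the ", " separator
        if length + extra > max_chars:
            break
        seen.add(key)
        kept.append(word)
        length += extra
    return ", ".join(kept)
```

Keeping the list short and in the target language matters more than the exact formatting; the cap exists only so a long list does not crowd out the model's context.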

Tips for improving transcription

Transcription models can vary wildly on capitalization, number formatting, and punctuation. For the most part that's just the nature of the beast, but there are a few things you can do:

Use an English-only model for English audio, or a multilingual model for other audio.
For non-English audio with Whisper, ensure you've selected a language (not Auto).
Step up to a larger Whisper variant (tiny → small → turbo), or switch to Parakeet.
Enable Bias Words and add book-specific names and terms.
Run AI Cleanup as a post-processing step and let a machine do the work for you.

Progress and cancellation

In a workflow, transcription occurs as a discrete step and displays full-screen progress. Canceling the transcription will take you back to the previous step of the workflow.

In the chapter editor, transcription instead runs in the background. The editor shows per-chapter status and a progress bar with a Cancel button. Canceling stops any in-progress or queued transcriptions, while already-transcribed chapters keep their results.
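The editor's cancel behavior can be pictured with a simplified sketch: a cancel flag is checked before each queued chapter, and whatever has already finished is kept. This is an illustrative model only, not Achew's implementation; `transcribe_queue` is a hypothetical name, and interrupting a transcription that is already running is not modeled.

```python
import threading
from typing import Callable, Dict, List


def transcribe_queue(chapters: List[str], transcribe: Callable[[str], str],
                     cancel: threading.Event) -> Dict[str, str]:
    """Transcribe chapters in order, stopping once the cancel flag is set."""
    results: Dict[str, str] = {}
    for chapter in chapters:
        if cancel.is_set():
            break  # queued chapters are skipped; finished results are kept
        results[chapter] = transcribe(chapter)
    return results
```

A UI thread would call `cancel.set()` when the Cancel button is pressed, and the worker would return the partial results collected so far.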

Model downloads

Transcription models are downloaded on first use and are cached for subsequent runs. First runs can take several minutes for large models. See Storage and Backup.

`legacy-cpu` Docker image

The default Docker image uses Whisper builds that require AVX2 CPU support. CPUs from before ~2013 (Intel) or ~2015 (AMD) lack AVX2 and will crash when transcribing with Whisper.

As a workaround, you can swap to the `legacy-cpu` image tag in `docker-compose.yml`:

services:
  achew:
    image: sirgibblets/achew:legacy-cpu

Parakeet is not affected.
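If you are unsure whether a Linux host supports AVX2, the CPU flags in `/proc/cpuinfo` will tell you. The helper below is a small sketch, not something shipped with Achew; `has_avx2` is a hypothetical name, and it only understands the x86 `flags` line format.

```python
def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        # Match the whole flag so that other fields mentioning "avx2" are ignored.
        if name.strip() == "flags" and "avx2" in value.split():
            return True
    return False
```

On the Docker host, pass it the contents of `/proc/cpuinfo`; a `False` result suggests the `legacy-cpu` tag is the safer choice for Whisper.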

¹ English-only models are recommended for English audio.
² See Supported Languages for the full per-variant breakdown.
³ On Windows, Parakeet requires the Visual C++ Redistributable.
⁴ MLX variants run dramatically faster, but are only available on M-series Macs via native install, not Docker.