Roadmap

Roadmap

0.0.1 — First release (done)

Default command, pipe mode, GPU, daemon mode.

  • Default command: voicsh alone starts mic recording (no subcommand)
  • Pipe mode: cat file.wav | voicsh → transcribe → stdout
  • Auto-resample WAV to 16kHz mono (linear interpolation)
  • GPU feature gates: --features cuda, vulkan, hipblas
  • Logging cleanup: all output respects -v/-vv levels consistently
  • Honest README matching actual features
  • Unix socket IPC: voicsh start / voicsh stop / voicsh toggle
  • Model stays in memory (~300MB for base.en)
  • Systemd user service: voicsh install-service
  • voicsh status shows daemon health
  • Shell completions (bash/zsh/fish)
  • voicsh init — auto-tune: benchmark hardware, recommend model, download
  • voicsh follow — stream daemon events (meter, state, transcriptions)
  • voicsh config get/set/list/dump — manage config without a text editor
  • Hallucination filtering (configurable, language-specific)
  • Fan-out mode — run English + multilingual models in parallel, pick best

0.1.0 — Voice commands (done)

Spoken punctuation, formatting, and keyboard control.

  • Punctuation: “period”, “comma”, “question mark”, “exclamation mark”, etc.
  • Whitespace: “new line”, “new paragraph”, “space”, “tab”
  • Caps toggle: “all caps” / “end caps”
  • Symbols: slash, ampersand, at-sign, dollar, hash, percent, asterisk, etc.
  • Brackets: open/close paren, brace, bracket, angle bracket
  • Key combos: “delete word” (Ctrl+Backspace), “backspace”
  • Configurable vocabulary in config.toml (user overrides merge with built-ins)
  • Rule-based (no LLM needed)
  • Multi-language: en, de, es, fr, pt, it, nl, pl, ru, ja, zh, ko
  • GNOME Shell extension (voicsh install-gnome-extension)

0.2.0 — Text refinement

LLM post-processing for polished output.

  • Local: Ollama, llama.cpp server (auto-detect running)
  • --refine default|formal|casual|code
  • Timeout + fallback to raw transcription
  • Cloud providers optional (Anthropic, OpenAI)

0.3.0 — GPU and extension testing

Container-based testing for GPU backends and GNOME integration.

  • GPU compilation gates: verify --features cuda/vulkan/hipblas build in vendor containers
    • CUDA: nvidia/cuda:12.6.1-devel-ubuntu24.04 (compile-only, no GPU needed)
    • hipblas: rocm/dev-ubuntu-24.04:6.4-complete (compile-only, no GPU needed)
    • Vulkan: libvulkan-dev + mesa-vulkan-drivers on Ubuntu (compile and run)
  • Vulkan runtime tests via lavapipe (Mesa software Vulkan, VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json)
  • GNOME extension integration tests via ghcr.io/ddterm/gnome-shell-image/fedora-43 (or other distros)
    • Headless: gnome-shell --wayland --headless --unsafe-mode --virtual-monitor 1600x960
    • Install/enable via gnome-extensions CLI, check journalctl for errors
    • Screenshots via D-Bus org.gnome.Shell.Screenshot
  • CI workflow for all of the above on GitHub Actions (standard runners, no GPU/KVM needed)

Future

  • Overlay feedback (Wayland layer-shell recording indicator)
  • Streaming word-by-word display
  • Push-to-talk (hold hotkey)
  • X11 support (xdotool/xsel)
  • Profiles (per-app settings)
  • Daemon: listen for PipeWire/PulseAudio device changes, auto-recover or show helpful message

Non-goals

  • GUI settings app (config file is enough)
  • Speaker identification
  • Real-time translation
  • Windows/macOS (Linux-first)