Benchmarking

This document describes the benchmarking suite for testing WAV transcription performance across different Whisper models, backends, and languages.

Overview

The benchmark suite provides:

CLI command for quick performance comparison
Multi-backend comparison (CPU vs GPU)
Multi-language testing
Criterion integration for statistical analysis

Benchmark CLI

Installation

The benchmark tool requires the benchmark feature:

cargo build --release --features benchmark

Usage

# Basic usage
voicsh benchmark <wav-file>

# Specific models with iterations
voicsh benchmark audio.wav tiny.en,base.en,small.en 3

# Compare backends
voicsh benchmark audio.wav --compare-backends

# Test languages
voicsh benchmark audio.wav tiny --languages en,de,es,fr

# JSON output
voicsh benchmark audio.wav all --output json > results.json

Metrics Explained

Time (ms) - Total transcription time in milliseconds
RTF (Realtime Factor) - Ratio of processing time to audio duration
- RTF < 1.0 means faster than realtime (good)
- RTF = 0.5 means processing is 2x faster than realtime
- RTF = 1.0 means processing takes same time as audio duration
- RTF > 1.0 means slower than realtime (problematic for live use)
Speed - Inverse of RTF (1/RTF), shows how many times faster than realtime
- 10x speed means processing is 10 times faster than the audio duration
CPU (%) - CPU usage percentage during transcription
Mem (MB) - Memory usage in megabytes during transcription

Multi-Backend Comparison

The benchmark tool supports comparing performance across different backends (CPU, CUDA, Vulkan, HipBLAS, OpenBLAS).

Backend Detection

Use --compare-backends to see which backends are available:

voicsh benchmark audio.wav --compare-backends

Comparing Backends

To compare different backends, compile voicsh with each backend:

# CPU benchmark
cargo build --release --no-default-features --features whisper,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-cpu.json

# CUDA benchmark (requires NVIDIA GPU and CUDA toolkit)
cargo build --release --features cuda,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-cuda.json

# Vulkan benchmark (requires Vulkan SDK)
cargo build --release --features vulkan,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-vulkan.json

Backend Requirements

CPU: No additional requirements (always available)
CUDA: NVIDIA GPU, CUDA toolkit, cuDNN
Vulkan: Vulkan SDK
HipBLAS: AMD GPU, ROCm/HIP toolkit
OpenBLAS: OpenBLAS library

Multi-Language Testing

Test the same model with different language codes to compare performance and accuracy:

voicsh benchmark audio.wav tiny --languages auto,en,de,es,fr,it

Notes:

Use multilingual models (without .en suffix) for language comparison
auto lets Whisper detect the language automatically
English-only models (.en suffix) ignore the language parameter and always use English
Performance differences between languages are usually minimal (within 5-10%)

Available Models

Model	Size	Language	Use Case
tiny.en	75 MB	English only	Fast, lower accuracy
tiny	75 MB	Multilingual	Fast, multilingual
base.en	142 MB	English only	Recommended for English
base	142 MB	Multilingual	Default, good balance
small.en	466 MB	English only	Better accuracy
small	466 MB	Multilingual	Better accuracy, multilingual
medium.en	1533 MB	English only	High accuracy
medium	1533 MB	Multilingual	High accuracy, multilingual
large	3094 MB	Multilingual	Highest accuracy

Installing Models

Models auto-download on first use or pre-download manually:

cargo run --release --features model-download -- download base.en

Models are cached in ~/.cache/voicsh/models/

Criterion Benchmarks

For detailed statistical analysis:

cargo bench --bench wav_transcription
open target/criterion/report/index.html

Provides mean, median, standard deviation, and regression detection across runs.

Interpreting Results

Interactive Use (Voice Typing): RTF < 1.0 required for comfortable use. Models: tiny.en, base.en, small.en

Batch Processing: Prioritize accuracy over speed. Models: medium.en, large

Resource-Constrained Systems: Use tiny.en or base.en

Troubleshooting

Model Not Installed: Download with cargo run --features model-download --release -- download <model>

Out of Memory: Test smaller models (tiny.en, base.en) or reduce iterations

Slow Performance: Ensure --release mode, check CPU usage, or try GPU acceleration

Development Contributing