Benchmarking
This document describes the benchmarking suite for testing WAV transcription performance across different Whisper models, backends, and languages.
Overview
The benchmark suite provides:
- CLI command for quick performance comparison
- Multi-backend comparison (CPU vs GPU)
- Multi-language testing
- Criterion integration for statistical analysis
Benchmark CLI
Installation
The benchmark tool requires the benchmark feature:
cargo build --release --features benchmark
Usage
# Basic usage
voicsh benchmark <wav-file>
# Specific models with iterations
voicsh benchmark audio.wav tiny.en,base.en,small.en 3
# Compare backends
voicsh benchmark audio.wav --compare-backends
# Test languages
voicsh benchmark audio.wav tiny --languages en,de,es,fr
# JSON output
voicsh benchmark audio.wav all --output json > results.json
Metrics Explained
- Time (ms) - Total transcription time in milliseconds
- RTF (Realtime Factor) - Ratio of processing time to audio duration (see the worked example after this list)
  - RTF < 1.0 means faster than realtime (good)
  - RTF = 0.5 means processing is 2x faster than realtime
  - RTF = 1.0 means processing takes the same time as the audio duration
  - RTF > 1.0 means slower than realtime (problematic for live use)
- Speed - Inverse of RTF (1/RTF); shows how many times faster than realtime the transcription runs
  - 10x speed means a clip is transcribed in one tenth of its duration
- CPU (%) - CPU usage percentage during transcription
- Mem (MB) - Memory usage in megabytes during transcription
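For intuition, RTF and speed can be computed by hand. A minimal sketch using shell arithmetic and bc, with made-up numbers (a 10-second clip transcribed in 3200 ms):
# Hypothetical numbers, not real benchmark output
audio_s=10
time_ms=3200
rtf=$(echo "scale=2; ($time_ms / 1000) / $audio_s" | bc)   # ~0.32 -> faster than realtime
speed=$(echo "scale=1; 1 / $rtf" | bc)                     # ~3.1x realtime
echo "RTF=$rtf  speed=${speed}x"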
Multi-Backend Comparison
The benchmark tool supports comparing performance across different backends (CPU, CUDA, Vulkan, HipBLAS, OpenBLAS).
Backend Detection
Use --compare-backends to see which backends are available:
voicsh benchmark audio.wav --compare-backends
Comparing Backends
To compare backends, build voicsh with each backend feature enabled and run the benchmark with each binary (a sketch for comparing the resulting JSON files follows the requirements list):
# CPU benchmark
cargo build --release --no-default-features --features whisper,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-cpu.json
# CUDA benchmark (requires NVIDIA GPU and CUDA toolkit)
cargo build --release --features cuda,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-cuda.json
# Vulkan benchmark (requires Vulkan SDK)
cargo build --release --features vulkan,benchmark,model-download,cli
./target/release/voicsh benchmark audio.wav tiny --output json > results-vulkan.json
Backend Requirements
- CPU: No additional requirements (always available)
- CUDA: NVIDIA GPU, CUDA toolkit, cuDNN
- Vulkan: Vulkan SDK
- HipBLAS: AMD GPU, ROCm/HIP toolkit
- OpenBLAS: OpenBLAS library
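Once each build has produced a JSON file, the results can be lined up side by side. A minimal sketch assuming jq is installed; the JSON field names depend on voicsh's output format, so adjust the filter to whatever your build actually emits:
# Print each backend's results one after another
for f in results-cpu.json results-cuda.json results-vulkan.json; do
  echo "== $f =="
  jq '.' "$f"   # narrow the filter (for example to the RTF field) once you know the schema
done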
Multi-Language Testing
Test the same model with different language codes to compare performance and accuracy:
voicsh benchmark audio.wav tiny --languages auto,en,de,es,fr,it
Notes:
- Use multilingual models (without .en suffix) for language comparison
- auto lets Whisper detect the language automatically
- English-only models (.en suffix) ignore the language parameter and always use English
- Performance differences between languages are usually minimal (within 5-10%)
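To keep per-language results around for later comparison, the same benchmark can be run once per language and each JSON saved separately (a sketch using only the flags shown above):
# Hypothetical loop: one result file per language
for lang in auto en de es fr it; do
  voicsh benchmark audio.wav tiny --languages "$lang" --output json > "results-$lang.json"
done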
Available Models
| Model | Size | Language | Use Case |
|---|---|---|---|
| tiny.en | 75 MB | English only | Fast, lower accuracy |
| tiny | 75 MB | Multilingual | Fast, multilingual |
| base.en | 142 MB | English only | Recommended for English |
| base | 142 MB | Multilingual | Default, good balance |
| small.en | 466 MB | English only | Better accuracy |
| small | 466 MB | Multilingual | Better accuracy, multilingual |
| medium.en | 1533 MB | English only | High accuracy |
| medium | 1533 MB | Multilingual | High accuracy, multilingual |
| large | 3094 MB | Multilingual | Highest accuracy |
Installing Models
Models auto-download on first use, or you can pre-download them manually:
cargo run --release --features model-download -- download base.en
Models are cached in ~/.cache/voicsh/models/
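To check which models are already cached locally:
ls -lh ~/.cache/voicsh/models/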
Criterion Benchmarks
For detailed statistical analysis:
cargo bench --bench wav_transcription
open target/criterion/report/index.html
Criterion reports the mean, median, and standard deviation for each benchmark and detects regressions across runs.
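For regression testing across code changes, Criterion's named baselines can be used (these are standard Criterion CLI flags passed after --):
# Record a baseline before making changes
cargo bench --bench wav_transcription -- --save-baseline before
# ...later, compare the current code against that baseline
cargo bench --bench wav_transcription -- --baseline before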
Interpreting Results
- Interactive Use (Voice Typing): RTF < 1.0 required for comfortable use. Models: tiny.en, base.en, small.en
- Batch Processing: Prioritize accuracy over speed. Models: medium.en, large
- Resource-Constrained Systems: Use tiny.en or base.en
Troubleshooting
- Model Not Installed: Download with cargo run --features model-download --release -- download <model>
- Out of Memory: Test smaller models (tiny.en, base.en) or reduce iterations
- Slow Performance: Build in --release mode, check CPU usage, or try GPU acceleration