
Hardware Comparison Guide

Choosing the right hardware for self-hosted TTS — from Raspberry Pi to RTX 5090, ranked by the metric that matters most: memory bandwidth.

Why Memory Bandwidth Determines TTS Speed

For autoregressive TTS decode (the generation mode of LLM-backbone TTS models), every generated token requires reading the full model weights from memory, so the GPU spends most of its time loading data, not computing.

The formula:

Max tokens/sec ≈ Memory bandwidth (GB/s) ÷ Model size (GB)

This means:

  • A 1 GB model on hardware with 1,000 GB/s bandwidth → ~1,000 tok/s max
  • The same 1 GB model on 273 GB/s bandwidth → ~273 tok/s max
  • 3.7× bandwidth difference = 3.7× speed difference (at batch size 1)

Compute (TFLOPS) only matters for prefill (processing input text and reference audio). For the actual audio token generation loop, bandwidth is king.
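As a sanity check, the formula can be expressed in a few lines of Python, using the example numbers from the bullets above (the function name is just for illustration):

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound decode ceiling: each token re-reads all weights."""
    return bandwidth_gb_s / model_size_gb

# A 1 GB model on 1,000 GB/s vs. 273 GB/s hardware (batch size 1)
fast = max_tokens_per_sec(1000, 1.0)   # ~1,000 tok/s ceiling
slow = max_tokens_per_sec(273, 1.0)    # ~273 tok/s ceiling
print(f"{fast:.0f} vs {slow:.0f} tok/s -> {fast / slow:.1f}x gap")
```

The ratio of the two ceilings is exactly the bandwidth ratio, which is why the speed gap tracks bandwidth at batch size 1.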

Hardware Specifications Comparison

GPU / Accelerator Tier

| Platform | Memory Type | Capacity | Bandwidth | Compute (BF16) | TDP | Price (2026) |
|---|---|---|---|---|---|---|
| RTX 5090 | GDDR7, 512-bit | 32 GB | 1,792 GB/s | 419 TFLOPS | 575W | ~$3,000 street |
| RTX 4090 | GDDR6X, 384-bit | 24 GB | 1,008 GB/s | 330 TFLOPS | 450W | $1,599 (discontinued) |
| RTX 3090 | GDDR6X, 384-bit | 24 GB | 936 GB/s | 71 TFLOPS | 350W | ~$700 used |
| RTX 4080 Super | GDDR6X, 256-bit | 16 GB | 736 GB/s | 198 TFLOPS | 320W | $999 |
| RTX 3060 12GB | GDDR6, 192-bit | 12 GB | 360 GB/s | 25 TFLOPS | 170W | ~$250 used |

Unified Memory Platforms

| Platform | Memory Type | Capacity | Bandwidth | Compute | TDP | Price |
|---|---|---|---|---|---|---|
| Mac Studio M3 Ultra | LPDDR5 | up to 512 GB | 800 GB/s | ~22 TFLOPS (GPU) | ~200W | $3,999–$13,000+ |
| Mac Studio M4 Max | LPDDR5x | up to 128 GB | 546 GB/s | ~18 TFLOPS (GPU) | ~120W | $1,999–$5,999 |
| DGX Spark (GB10) | LPDDR5x | 128 GB | 273 GB/s | ~100 TFLOPS BF16 | ~170W | $3,999–$4,699 |

Apple Silicon Note

Apple M4 Ultra does not exist as of March 2026. The current Mac Studio options are M4 Max and M3 Ultra.

Edge / Embedded Tier

| Platform | Memory | Capacity | Bandwidth | GPU/NPU | TDP | Price |
|---|---|---|---|---|---|---|
| Jetson AGX Orin | LPDDR5 | 32/64 GB | 205 GB/s | 2048 CUDA / 275 TOPS | 15–60W | $999–$1,599 |
| Jetson Orin Nano Super | LPDDR5 | 8 GB | 102 GB/s | 1024 CUDA / 67 TOPS | 7–25W | $249 |
| Raspberry Pi 5 | LPDDR4x | 4/8 GB | ~34 GB/s | CPU only | ~12W | $60–$80 |
| Orange Pi 5 (RK3588) | LPDDR5 | 8/16 GB | ~25–50 GB/s | CPU + 6 TOPS NPU | ~18W | $100–$160 |

TTS Benchmarks by Platform

Qwen3-TTS Performance (estimated from community reports)

| Platform | Qwen3-TTS 1.7B RTF | Qwen3-TTS 0.6B RTF | First-Token Latency |
|---|---|---|---|
| RTX 5090 | 0.48–0.55 | 0.32–0.38 | 45–62ms |
| RTX 4090 | 0.65–0.85 | 0.38–0.45 | 52–97ms |
| RTX 3090 | 0.95–1.26 | 0.52–0.68 | 78–145ms |
| RTX 4080 Super | 0.82–1.15 | 0.48–0.62 | n/a |
| RTX 3060 12GB | 1.65+ | 0.85–1.15 | n/a |

RTF < 1.0 means faster than real-time. Lower is better.
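RTF is simply synthesis wall-clock time divided by the duration of the audio produced. A minimal illustration with hypothetical timings (not a benchmark):

```python
def real_time_factor(synthesis_s: float, audio_s: float) -> float:
    """RTF = time to synthesize / duration of audio; < 1.0 beats playback."""
    return synthesis_s / audio_s

# e.g. 10 s of speech synthesized in 5 s -> RTF 0.50, i.e. 2x real-time
rtf = real_time_factor(5.0, 10.0)
print(f"RTF {rtf:.2f} = {1 / rtf:.1f}x real-time")
```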

Cross-Model Performance (NVIDIA L4 benchmark, Inferless)

| Model | 50 words | 100 words | 200 words |
|---|---|---|---|
| Kokoro-82M | <0.1s | <0.2s | <0.3s |
| MeloTTS | <0.5s | <1s | ~1.5s |
| Parler-TTS mini | ~2s | ~4s | ~8s |
| F5-TTS | ~3s | ~6s | ~12s |
| XTTS-v2 | ~8s | ~18s | ~35s |

Other Published Benchmarks

| Model | Platform | RTF | Notes |
|---|---|---|---|
| VITS (non-cloning) | RTX 3090 | 0.015 | 67× real-time — feed-forward, not autoregressive |
| Kani-TTS-2 | RTX 5080 | 0.19 | New LFM2 architecture |
| Kani-TTS-2 | RTX 4080 | ~0.20 | Similar to 5080 (bandwidth-limited) |
| VibeVoice-RT 0.5B | DGX Spark | 0.48 | Community benchmark |
| Chatterbox Turbo | Consumer GPU | up to 6× RT | 1-step distilled decoder |
| TADA-1B | A100 | 0.09 | Ultra-low codec frame rate |

LLM Token Generation as TTS Proxy

Since modern TTS models use autoregressive token generation, llama.cpp benchmarks are a reliable proxy for TTS decode throughput.

Llama 2 7B Q4_0 (single-stream decode)

| Platform | tok/s | vs RTX 4090 |
|---|---|---|
| RTX 5090 | 264–274 | 1.4× |
| RTX 4090 | 188–190 | 1.0× (baseline) |
| RTX 3090 | 160–162 | 0.85× |
| DGX Spark (batch 1) | ~20.5 | 0.11× |
| DGX Spark (batch 32) | 368 (total) | n/a |
| Mac Studio M3 Ultra (est.) | ~120–150 | ~0.7× |
| Mac Studio M4 Max (est.) | ~80–100 | ~0.5× |
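These measured numbers roughly track the bandwidth formula from earlier once real-world efficiency is factored in. A cross-check sketch, assuming Llama 2 7B at Q4_0 occupies about 3.8 GB of weights (an approximation, not a figure from the benchmark):

```python
def decode_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Theoretical tok/s if decode were perfectly bandwidth-bound."""
    return bandwidth_gb_s / weights_gb

WEIGHTS_GB = 3.8  # approx. size of Llama 2 7B Q4_0 weights (assumption)
for name, bw, measured in [("RTX 5090", 1792, 270),
                           ("RTX 4090", 1008, 189),
                           ("DGX Spark", 273, 20.5)]:
    ceiling = decode_ceiling(bw, WEIGHTS_GB)
    print(f"{name}: {measured} tok/s measured vs {ceiling:.0f} ceiling "
          f"({measured / ceiling:.0%})")
```

Under this assumption the consumer GPUs land at roughly 55–70% of the theoretical ceiling, while the Spark's larger gap is consistent with its still-maturing aarch64 software stack.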

What this means for TTS

If a TTS model generates audio tokens at the same rate as these LLM text tokens:

  • RTX 5090: ~270 audio tok/s → real-time at codecs up to ~270 Hz
  • RTX 4090: ~190 audio tok/s → real-time at codecs up to ~190 Hz
  • DGX Spark: ~20 audio tok/s (batch 1) → real-time only at ~20 Hz or lower

This is why codec frame rate matters so much on DGX Spark: Qwen3-TTS-12Hz needs only 12.5 tok/s, TADA needs 2–3 tok/s, but a 50 Hz codec needs 50 tok/s — the Spark can't keep up at batch 1 for larger models.
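Put differently, real-time decode requires platform tok/s to meet or exceed the codec's tokens-per-second-of-audio rate. A small feasibility check using the numbers above (helper name is illustrative):

```python
def realtime_capable(platform_tok_s: float, codec_tok_s: float) -> bool:
    """Real-time needs at least one audio token per codec frame interval."""
    return platform_tok_s >= codec_tok_s

SPARK_BATCH1 = 20.5  # DGX Spark single-stream decode, 7B-class model
for codec, rate in [("Qwen3-TTS-12Hz (12.5 tok/s)", 12.5),
                    ("TADA (~3 tok/s)", 3.0),
                    ("50 Hz codec (50 tok/s)", 50.0)]:
    verdict = "real-time" if realtime_capable(SPARK_BATCH1, rate) else "too slow"
    print(f"{codec}: {verdict} on DGX Spark at batch 1")
```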

Edge and ARM Hardware

Raspberry Pi 5

The default platform for Piper TTS (primary local TTS in Home Assistant).

  • Piper medium English: estimated RTF 0.1–0.2 (5–10× real-time) with 4 threads
  • Kokoro-82M int8 ONNX: approaching real-time on Pi 5 8GB (slower-than-real-time on Pi 4 2GB)
  • Home Assistant 2025.7+ added TTS streaming via Wyoming protocol, improving perceived response by ~10×

Rockchip RK3588 (Orange Pi 5, Rock 5B)

The same quad Cortex-A76 CPU cluster as the Pi 5, plus a 6 TOPS NPU that can accelerate TTS decoders.

  • RKLLaMA project: Piper decoder on NPU, encoder via ONNX
  • MMS-TTS entirely on NPU
  • Potentially faster than Pi 5 for neural TTS (no published RTF benchmarks yet)

Jetson Orin Nano Super ($249)

GPU-accelerated edge option: 1,024 CUDA cores, 8 GB, 67 TOPS.

  • Community voice pipelines: Piper + faster-whisper + small LLMs
  • sherpa-onnx explicitly supports Jetson Orin (CPU + CUDA)
  • 8 GB memory limits model size

Intel N100 Mini-PCs

~$150, ~25W under load. Kokoro via OpenVINO on Intel iGPU achieves 3× CPU speed at 15W TDP. More compute than Pi 5 at ~2× power.

Recommended Setups by Use Case

| Use Case | Hardware | TTS Model | Est. RTF | Cost | Power |
|---|---|---|---|---|---|
| Budget voice assistant | RPi 5 (8 GB) | Piper medium | 0.1–0.2 | ~$80 | ~12W |
| Higher quality edge | RPi 5 (8 GB) | Kokoro int8 ONNX | ~0.5–1.0 | ~$80 | ~12W |
| NPU-accelerated edge | Orange Pi 5 (16 GB) | Piper on RKNN | <0.2 est. | ~$130 | ~18W |
| GPU edge device | Jetson Orin Nano Super | Piper + Kokoro | Real-time | $249 | 15–25W |
| Quiet desktop/dev | Mac Studio M4 Max | Kokoro (CoreML) | 0.02–0.1 | $1,999+ | ~120W |
| Best budget GPU TTS | PC + RTX 3090 (used) | Any TTS model | 0.5–0.7 | ~$1,200 total | ~400W |
| Production TTS server | PC + RTX 4090 | Qwen3-TTS / F5-TTS | 0.4–0.9 | ~$2,500 total | ~550W |
| Maximum throughput | PC + RTX 5090 | Qwen3-TTS 1.7B | 0.48–0.55 | ~$4,000+ total | ~700W |
| Large model prototyping | DGX Spark | 30B+ TTS/LLM combos | Varies | $3,999 | ~170W |
| Huge model capacity | Mac Studio M3 Ultra | Any (via MLX) | 0.5–2.0 | $3,999+ | ~200W |

The RTX 3090 value proposition

Best Price-to-Performance in 2026

At ~$700 used, the RTX 3090's 936 GB/s bandwidth is only 7% less than the RTX 4090's 1,008 GB/s, delivering near-identical per-token generation speed. This makes the 3090 the best price-to-performance option for TTS in 2026. The main downsides are power (350W TDP) and the inability to use FP8/NVFP4 quantization (Ampere architecture).

DGX Spark: Capacity vs Speed

The DGX Spark's value proposition is model capacity, not decode speed.

Strengths

  • 128 GB unified memory — fits models that won't run on any consumer GPU
  • Two units can be paired over the built-in ConnectX-7 link for 256 GB combined (run 405B models)
  • 128 GB for simultaneous STT + LLM + TTS with room to spare
  • Silent operation, Mac Mini form factor, <100W AI workload
  • Native NVIDIA CUDA ecosystem (vs Apple's MLX)

Weaknesses

  • 273 GB/s bandwidth — single-stream decode is slow (20.5 tok/s for 7B model)
  • $3,999–$4,699 for bandwidth that a ~$700 used RTX 3090 exceeds by 3.4×
  • aarch64 software ecosystem still maturing
  • "Blackwell Noise" breaks several TTS models on GPU

When DGX Spark makes sense for TTS

DGX Spark Sweet Spots
  • Running large voice cloning models (Fish S2 Pro at 4.4B, Higgs Audio at 5.8B) that don't fit in 24 GB
  • Complete voice pipelines (STT + large LLM + TTS) in one device
  • Development and testing before deploying to production GPU servers
  • Concurrent inference (batch 32 throughput is competitive)
  • Air-gapped deployments where silence and low power matter

When DGX Spark does NOT make sense for TTS

  • Single-user real-time voice conversation (bandwidth-limited)
  • Maximum generation speed per dollar (RTX 3090 wins)
  • Models that fit in 24 GB (use a 4090 instead)

References

| Resource | URL |
|---|---|
| LMSYS DGX Spark Review | lmsys.org/blog/2025-10-13-nvidia-dgx-spark/ |
| RTX 5090 specifications | vast.ai/article/nvidia-geforce-rtx-5090-specs |
| RTX 3090 vs 4090 AI comparison | bestgpusforai.com/gpu-comparison/3090-vs-4090 |
| llama.cpp benchmarks | github.com/DandinPower/llama.cpp_bench |
| DGX Spark price comparison | glukhov.org/post/2025/10/nvidia-dgx-spark-prices/ |
| DGX Spark vs alternatives | aimultiple.com/dgx-spark-alternatives |
| Mac Studio M3 Ultra specs | lowendmac.com/2025/mac-studio-early-2025/ |
| Jetson Orin specs | nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/ |
| Kokoro on Raspberry Pi | mikeesto.com/posts/kokoro-82m-pi/ |
| Home Assistant voice | home-assistant.io/blog/2025/09/11/ai-in-home-assistant/ |
| sherpa-onnx ARM support | github.com/k2-fsa/sherpa-onnx |
| Inferless TTS benchmark | inferless.com/learn/comparing-different-text-to-speech---tts--models-part-2 |
