Choosing the right hardware for self-hosted TTS — from Raspberry Pi to RTX 5090, ranked by the metric that matters most: memory bandwidth.
Why Memory Bandwidth Determines TTS Speed
In autoregressive TTS decode (how LLM-backbone TTS models generate audio tokens), producing each token requires reading the full model weights from memory. At batch size 1, the GPU spends most of its time loading weights, not computing.
The formula:
Max tokens/sec ≈ Memory bandwidth (GB/s) ÷ Model size (GB)
This means:
- A 1 GB model on hardware with 1,000 GB/s bandwidth → ~1,000 tok/s max
- The same 1 GB model on 273 GB/s bandwidth → ~273 tok/s max
- 3.7× bandwidth difference = 3.7× speed difference (at batch size 1)
Compute (TFLOPS) only matters for prefill (processing input text and reference audio). For the actual audio token generation loop, bandwidth is king.
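As a rough sketch, the bandwidth ceiling can be computed directly. This assumes the decode loop is purely weight-bound and ignores KV-cache traffic, activations, and kernel overhead, so real throughput lands below this bound:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on autoregressive decode speed for a weight-bound loop.

    Every generated token reads the full model weights once, so the ceiling
    is simply bandwidth divided by model size. Real throughput is lower.
    """
    return bandwidth_gb_s / model_size_gb

# Examples from the text: a 1 GB model on two bandwidth tiers
max_tokens_per_sec(1000, 1.0)  # ~1,000 tok/s
max_tokens_per_sec(273, 1.0)   # ~273 tok/s
```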
Hardware Specifications Comparison
GPU / Accelerator Tier
| Platform | Memory Type | Capacity | Bandwidth | Compute (BF16) | TDP | Price (2026) |
|---|---|---|---|---|---|---|
| RTX 5090 | GDDR7, 512-bit | 32 GB | 1,792 GB/s | 419 TFLOPS | 575W | ~$3,000 street |
| RTX 4090 | GDDR6X, 384-bit | 24 GB | 1,008 GB/s | 330 TFLOPS | 450W | $1,599 (discontinued) |
| RTX 3090 | GDDR6X, 384-bit | 24 GB | 936 GB/s | 71 TFLOPS | 350W | ~$700 used |
| RTX 4080 Super | GDDR6X, 256-bit | 16 GB | 736 GB/s | 198 TFLOPS | 320W | $999 |
| RTX 3060 12GB | GDDR6, 192-bit | 12 GB | 360 GB/s | 25 TFLOPS | 170W | ~$250 used |
Unified Memory Platforms
| Platform | Memory Type | Capacity | Bandwidth | Compute | TDP | Price |
|---|---|---|---|---|---|---|
| Mac Studio M3 Ultra | LPDDR5 | up to 512 GB | 800 GB/s | ~22 TFLOPS (GPU) | ~200W | $3,999–$13,000+ |
| Mac Studio M4 Max | LPDDR5x | up to 128 GB | 546 GB/s | ~18 TFLOPS (GPU) | ~120W | $1,999–$5,999 |
| DGX Spark (GB10) | LPDDR5x | 128 GB | 273 GB/s | ~100 TFLOPS BF16 | ~170W | $3,999–$4,699 |
Apple M4 Ultra does not exist as of March 2026. The current Mac Studio options are M4 Max and M3 Ultra.
Edge / Embedded Tier
| Platform | Memory | Capacity | Bandwidth | GPU/NPU | TDP | Price |
|---|---|---|---|---|---|---|
| Jetson AGX Orin | LPDDR5 | 32/64 GB | 205 GB/s | 2048 CUDA / 275 TOPS | 15–60W | $999–$1,599 |
| Jetson Orin Nano Super | LPDDR5 | 8 GB | 102 GB/s | 1024 CUDA / 67 TOPS | 7–25W | $249 |
| Raspberry Pi 5 | LPDDR4x | 4/8 GB | ~34 GB/s | CPU only | ~12W | $60–$80 |
| Orange Pi 5 (RK3588) | LPDDR5 | 8/16 GB | ~25–50 GB/s | CPU + 6 TOPS NPU | ~18W | $100–$160 |
TTS Benchmarks by Platform
Qwen3-TTS Performance (estimated from community reports)
| Platform | Qwen3-TTS 1.7B RTF | Qwen3-TTS 0.6B RTF | First-Token Latency |
|---|---|---|---|
| RTX 5090 | 0.48–0.55 | 0.32–0.38 | 45–62ms |
| RTX 4090 | 0.65–0.85 | 0.38–0.45 | 52–97ms |
| RTX 3090 | 0.95–1.26 | 0.52–0.68 | 78–145ms |
| RTX 4080 Super | 0.82–1.15 | 0.48–0.62 | — |
| RTX 3060 12GB | 1.65+ | 0.85–1.15 | — |
RTF < 1.0 means faster than real-time. Lower is better.
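For reference, RTF is just wall-clock generation time divided by the duration of the audio produced, a one-line helper:

```python
def rtf(generation_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time to generate / duration of audio produced.

    RTF < 1.0 means faster than real-time; lower is better.
    """
    return generation_seconds / audio_seconds

# 10 s of audio generated in 5 s of wall-clock time -> RTF 0.5
```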
Cross-Model Performance (NVIDIA L4 benchmark, Inferless)
| Model | 50 words | 100 words | 200 words |
|---|---|---|---|
| Kokoro-82M | <0.1s | <0.2s | <0.3s |
| MeloTTS | <0.5s | <1s | ~1.5s |
| Parler-TTS mini | ~2s | ~4s | ~8s |
| F5-TTS | ~3s | ~6s | ~12s |
| XTTS-v2 | ~8s | ~18s | ~35s |
Other Published Benchmarks
| Model | Platform | RTF | Notes |
|---|---|---|---|
| VITS (non-cloning) | RTX 3090 | 0.015 | 67× real-time — feed-forward, not autoregressive |
| Kani-TTS-2 | RTX 5080 | 0.19 | New LFM2 architecture |
| Kani-TTS-2 | RTX 4080 | ~0.20 | Similar to 5080 (bandwidth-limited) |
| VibeVoice-RT 0.5B | DGX Spark | 0.48 | Community benchmark |
| Chatterbox Turbo | Consumer GPU | ~0.17 | Up to 6× real-time — 1-step distilled decoder |
| TADA-1B | A100 | 0.09 | Ultra-low codec frame rate |
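Published RTF figures vary with text length, sampling settings, and warm-up, so it is worth measuring on your own hardware. A minimal harness follows; the `synthesize` callable is a placeholder for whatever engine you wrap (Piper, Kokoro, etc.) that returns audio samples and a sample rate:

```python
import time

def measure_rtf(synthesize, text: str) -> float:
    """Return the RTF of one synthesis call.

    `synthesize(text)` must return (samples, sample_rate), where samples is
    any sequence of audio samples. RTF < 1.0 means faster than real-time.
    """
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed / (len(samples) / sample_rate)
```

Run it a few times and discard the first call, since model load and CUDA warm-up inflate the initial measurement.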
LLM Token Generation as TTS Proxy
Since modern TTS models generate audio tokens autoregressively on LLM-style backbones, llama.cpp decode benchmarks serve as a useful proxy for TTS decode throughput.
Llama 2 7B Q4_0 (single-stream decode)
| Platform | tok/s | vs RTX 4090 |
|---|---|---|
| RTX 5090 | 264–274 | 1.4× |
| RTX 4090 | 188–190 | 1.0× (baseline) |
| RTX 3090 | 160–162 | 0.85× |
| DGX Spark (batch 1) | ~20.5 | 0.11× |
| DGX Spark (batch 32) | 368 (total) | — |
| Mac Studio M3 Ultra (est.) | ~120–150 | ~0.7× |
| Mac Studio M4 Max (est.) | ~80–100 | ~0.5× |
What this means for TTS
If a TTS model generates audio tokens at the same rate as LLM text tokens:
- RTX 5090: ~270 audio tok/s → real-time at codecs up to ~270 Hz
- RTX 4090: ~190 audio tok/s → real-time at codecs up to ~190 Hz
- DGX Spark: ~20 audio tok/s (batch 1) → real-time only at ~20 Hz or lower
This is why codec frame rate matters so much on the DGX Spark: a ~12.5 Hz codec (as in Qwen3-TTS) needs only ~12.5 tok/s and TADA needs 2–3 tok/s, but a 50 Hz codec needs 50 tok/s — the Spark can't keep up at batch 1 for larger models.
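The arithmetic above can be checked in a couple of lines. This assumes one token per codec frame, which holds for single-codebook codecs; multi-codebook models need proportionally more tokens per frame:

```python
def can_stream_realtime(decode_tok_s: float, codec_hz: float,
                        tokens_per_frame: int = 1) -> bool:
    """True if single-stream decode keeps pace with the codec frame rate."""
    return decode_tok_s >= codec_hz * tokens_per_frame

# DGX Spark at ~20.5 tok/s (batch 1, 7B-class model):
# a 12.5 Hz codec streams in real time, a 50 Hz codec does not.
```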
Edge and ARM Hardware
Raspberry Pi 5
The default platform for Piper TTS, the primary local TTS engine in Home Assistant.
- Piper medium English: estimated RTF 0.1–0.2 (5–10× real-time) with 4 threads
- Kokoro-82M int8 ONNX: approaching real-time on Pi 5 8GB (slower-than-real-time on Pi 4 2GB)
- Home Assistant 2025.7+ added TTS streaming via Wyoming protocol, improving perceived response by ~10×
Rockchip RK3588 (Orange Pi 5, Rock 5B)
Same quad Cortex-A76 cores as the Pi 5, plus four A55 efficiency cores and a 6 TOPS NPU that can accelerate TTS decoders.
- RKLLaMA project: Piper decoder on NPU, encoder via ONNX
- MMS-TTS entirely on NPU
- Potentially faster than Pi 5 for neural TTS (no published RTF benchmarks yet)
Jetson Orin Nano Super ($249)
GPU-accelerated edge option: 1,024 CUDA cores, 8 GB, 67 TOPS.
- Community voice pipelines: Piper + faster-whisper + small LLMs
- sherpa-onnx explicitly supports Jetson Orin (CPU + CUDA)
- 8 GB memory limits model size
Intel N100 Mini-PCs
~$150, ~25W under load. Kokoro via OpenVINO on Intel iGPU achieves 3× CPU speed at 15W TDP. More compute than Pi 5 at ~2× power.
Recommended Hardware by Use Case
| Use Case | Hardware | TTS Model | Est. RTF | Cost | Power |
|---|---|---|---|---|---|
| Budget voice assistant | RPi 5 (8 GB) | Piper medium | 0.1–0.2 | ~$80 | ~12W |
| Higher quality edge | RPi 5 (8 GB) | Kokoro int8 ONNX | ~0.5–1.0 | ~$80 | ~12W |
| NPU-accelerated edge | Orange Pi 5 (16 GB) | Piper on RKNN | <0.2 est. | ~$130 | ~18W |
| GPU edge device | Jetson Orin Nano Super | Piper + Kokoro | Real-time | $249 | 15–25W |
| Quiet desktop/dev | Mac Studio M4 Max | Kokoro (CoreML) | 0.02–0.1 | $1,999+ | ~120W |
| Best budget GPU TTS | PC + RTX 3090 (used) | Any TTS model | 0.5–0.7 | ~$1,200 total | ~400W |
| Production TTS server | PC + RTX 4090 | Qwen3-TTS / F5-TTS | 0.4–0.9 | ~$2,500 total | ~550W |
| Maximum throughput | PC + RTX 5090 | Qwen3-TTS 1.7B | 0.48–0.55 | ~$4,000+ total | ~700W |
| Large model prototyping | DGX Spark | 30B+ TTS/LLM combos | Varies | $3,999 | ~170W |
| Huge model capacity | Mac Studio M3 Ultra | Any (via MLX) | 0.5–2.0 | $3,999+ | ~200W |
The RTX 3090 value proposition
At ~$700 used, the RTX 3090's 936 GB/s of bandwidth is only ~7% below the RTX 4090's 1,008 GB/s, so per-token generation speed is near-identical. That makes the 3090 the best price-to-performance option for TTS in 2026. The main downsides are power draw (350W TDP) and no FP8/NVFP4 quantization support (Ampere predates those formats).
DGX Spark: Capacity vs Speed
The DGX Spark's value proposition is model capacity, not decode speed.
Strengths
- 128 GB unified memory — fits models that won't run on any consumer GPU
- Two units can be linked over ConnectX-7 for 256 GB (run 405B-class models)
- 128 GB for simultaneous STT + LLM + TTS with room to spare
- Silent operation, Mac Mini form factor, <100W AI workload
- Native NVIDIA CUDA ecosystem (vs Apple's MLX)
Weaknesses
- 273 GB/s bandwidth — single-stream decode is slow (20.5 tok/s for 7B model)
- $3,999–$4,699 for bandwidth that a ~$700 used RTX 3090 exceeds by 3.4×
- aarch64 software ecosystem still maturing
- The "Blackwell noise" bug corrupts GPU audio output from several TTS models
When DGX Spark makes sense for TTS
- Running large voice cloning models (Fish S2 Pro at 4.4B, Higgs Audio at 5.8B) that don't fit in 24 GB
- Complete voice pipelines (STT + large LLM + TTS) in one device
- Development and testing before deploying to production GPU servers
- Concurrent inference (batch 32 throughput is competitive)
- Air-gapped deployments where silence and low power matter
When DGX Spark does NOT make sense for TTS
- Single-user real-time voice conversation (bandwidth-limited)
- Maximum generation speed per dollar (RTX 3090 wins)
- Models that fit in 24 GB (use a 4090 instead)
References
| Resource | URL |
|---|---|
| LMSYS DGX Spark Review | lmsys.org/blog/2025-10-13-nvidia-dgx-spark/ |
| RTX 5090 specifications | vast.ai/article/nvidia-geforce-rtx-5090-specs |
| RTX 3090 vs 4090 AI comparison | bestgpusforai.com/gpu-comparison/3090-vs-4090 |
| llama.cpp benchmarks | github.com/DandinPower/llama.cpp_bench |
| DGX Spark price comparison | glukhov.org/post/2025/10/nvidia-dgx-spark-prices/ |
| DGX Spark vs alternatives | aimultiple.com/dgx-spark-alternatives |
| Mac Studio M3 Ultra specs | lowendmac.com/2025/mac-studio-early-2025/ |
| Jetson Orin specs | nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/ |
| Kokoro on Raspberry Pi | mikeesto.com/posts/kokoro-82m-pi/ |
| Home Assistant voice | home-assistant.io/blog/2025/09/11/ai-in-home-assistant/ |
| sherpa-onnx ARM support | github.com/k2-fsa/sherpa-onnx |
| Inferless TTS benchmark | inferless.com/learn/comparing-different-text-to-speech---tts--models-part-2 |