[metal::malloc] Resource limit exceeded during TTS audio generation on 48GB MacBook Pro

#1
by seedds - opened

I'm attempting to generate audio using the cosyvoice3 TTS model from mlx_audio on a MacBook Pro with an M4 chip and 48GB of unified memory. Despite having substantial RAM, audio generation fails with the following error:

RuntimeError: [metal::malloc] Resource limit (499000) exceeded.

The traceback shows the failure occurring during the inverse short-time Fourier transform (istft) step inside the HiFi-GAN vocoder, specifically in the overlap-add loop that writes each output frame with .at[].add().
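For readers unfamiliar with the pattern, line 471 sits inside a per-frame overlap-add loop. Below is a hypothetical NumPy reconstruction of that loop's shape, not the actual mlx_audio code: it assumes frames of shape (batch, num_frames, n_fft) and a hop size `hop`, and it omits any windowing or normalization the real `istft` performs. The point is the loop count. MLX evaluates lazily, so the MLX version records one `.at[].add()` operation per frame until the graph is evaluated, and the number of frames grows linearly with the length of the audio.

```python
import numpy as np


def overlap_add_loop(frames: np.ndarray, hop: int) -> np.ndarray:
    """Naive overlap-add with one accumulation per frame (hypothetical reconstruction).

    frames: array of shape (batch, num_frames, n_fft); hop: hop size in samples.
    No window or window-sum normalization is applied.
    """
    batch, num_frames, n_fft = frames.shape
    out_len = n_fft + hop * (num_frames - 1)
    output = np.zeros((batch, out_len), dtype=frames.dtype)
    for i in range(num_frames):
        start = i * hop
        # NumPy updates the slice in place. The MLX line in the traceback,
        # output.at[:, start : start + n_fft].add(frames[:, i, :]), instead
        # returns a new array, adding one more operation per frame to the lazy graph.
        output[:, start : start + n_fft] += frames[:, i, :]
    return output
```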

Environment:

Device: MacBook Pro (M4 Pro, 48GB unified memory)
OS: macOS (latest stable version)
Python version: 3.12
Package: mlx_audio (TTS module, cosyvoice3 model)
Metal backend in use (via MLX)
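"Latest stable version" is hard to act on once new releases come out. A small stdlib-only sketch for collecting exact versions for a report like this one; the distribution names `mlx` and `mlx-audio` are assumed to be the installed package names, and any package that is not found is reported as such instead of raising.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("mlx", "mlx-audio")) -> dict:
    """Collect Python, macOS, CPU architecture, and package versions."""
    report = {
        "python": sys.version.split()[0],
        # platform.mac_ver() returns an empty release string off macOS.
        "macos": platform.mac_ver()[0] or "n/a",
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```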

Steps to Reproduce:

Load the cosyvoice3 TTS model.
Call generate() with valid input text.
Observe crash during audio waveform synthesis in the vocoder stage.

Error Log:
Error loading model: Audio generation failed: [metal::malloc] Resource limit (499000) exceeded.
Traceback (most recent call last):
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 1160, in generate
    audio = self._model.synthesize(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 264, in synthesize
    audio = self.mel_to_audio(mel)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 187, in mel_to_audio
    audio, _ = self.hifigan(mel, finalize=finalize)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 743, in __call__
    generated_speech = self.decode(x=speech_feat, s=s, finalize=finalize)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 706, in decode
    x = self._istft(magnitude, phase)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 637, in _istft
    return istft(
           ^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 471, in istft
    output = output.at[:, start : start + n_fft].add(frames[:, i, :])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [metal::malloc] Resource limit (499000) exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/generate.py", line 357, in generate_audio
    for i, result in enumerate(results):
                     ^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 1210, in generate
    raise RuntimeError(f"Audio generation failed: {e}")
RuntimeError: Audio generation failed: [metal::malloc] Resource limit (499000) exceeded.

Expected Behavior:
Audio should be generated successfully, leveraging the available 48GB of system memory. The model should not exceed Metal's internal resource limits for reasonable input lengths.

Actual Behavior:
Process fails with RuntimeError: [metal::malloc] Resource limit (499000) exceeded, even though system memory is far from exhausted. This suggests the issue is not with total RAM but with Metal's per-process or per-buffer allocation limits (possibly related to temporary tensor allocations during istft).
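A back-of-the-envelope calculation is consistent with the idea that per-frame work, not total memory, is what grows. The sketch below assumes CosyVoice-style HiFT vocoder settings (24 kHz output, istft hop of 4 samples) and centered framing; none of these values are confirmed for the mlx_audio port, and the premise that each per-frame update holds one Metal buffer until evaluation is likewise hypothetical.

```python
def frames_for_duration(seconds: float, sample_rate: int, hop: int) -> int:
    """Number of STFT frames covering `seconds` of audio, assuming centered framing."""
    num_samples = int(seconds * sample_rate)
    return 1 + num_samples // hop


def seconds_at_limit(limit: int, sample_rate: int, hop: int) -> float:
    """Audio duration at which the frame count reaches `limit`.

    Hypothetical model: each per-frame update holds one buffer until evaluation.
    """
    return (limit - 1) * hop / sample_rate


if __name__ == "__main__":
    SAMPLE_RATE = 24_000  # assumed output sample rate
    HOP = 4  # assumed istft hop length
    print(frames_for_duration(10, SAMPLE_RATE, HOP))  # 60001
    print(round(seconds_at_limit(499_000, SAMPLE_RATE, HOP), 1))  # 83.2
```

Under those assumptions, 10 s of audio already means about 60,000 per-frame updates, and a count of 499,000 would be reached at roughly 83 s of output, which would fit the observation that long inputs are the ones that fail.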

Additional Notes:

The error appears to come from a hard limit of 499,000 enforced by MLX's Metal backend. A limit of 499k bytes or elements would be negligible on this machine, so it is more plausibly a cap on the number of Metal buffers allocated at the same time than on the size of any single allocation.
48GB of unified memory should be more than sufficient for typical TTS workloads, which points to an inefficiency or unbounded growth of intermediate tensors in the istft implementation (e.g., a separate .at[].add() update for every frame of a long sequence; in MLX, .at[].add() returns a new array rather than updating in place).
Long input prompts or high sample-rate output may make the problem worse.
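Until the vocoder itself is fixed, one user-side workaround consistent with the note about long prompts is to synthesize long text in shorter pieces and concatenate the resulting audio. A minimal sketch of the splitting step, assuming sentence-ending punctuation is a good break point; `split_text` and its 200-character default are hypothetical, and how each chunk is passed to `generate()` is left out because this report does not show that method's signature.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Shorter chunks mean fewer istft frames per call, so under the per-frame hypothesis above each call stays further from the limit.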

Suggested Fixes / Investigations:

Chunked ISTFT processing: Avoid accumulating large intermediate tensors; process audio in smaller windows.
Memory profiling: Add diagnostics to track Metal buffer usage during synthesis.
MLX backend tuning: Expose or increase Metal resource limits if possible (or fall back to CPU for large ops).
Optimize .at[].add() usage: Replace with more memory-efficient accumulation patterns if this operation triggers large temporary allocations.
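One way to act on the chunked-ISTFT and .at[].add() suggestions is to replace the per-frame loop with a reshape-based overlap-add, which issues a number of operations proportional to n_fft / hop instead of to the number of frames. This is a NumPy sketch of that general technique, not the change that shipped in mlx_audio; it assumes n_fft is an exact multiple of hop and leaves out the window and window-sum normalization a full istft also needs. MLX provides `reshape` and `mx.pad`, so the same structure should carry over, but that port is not shown here.

```python
import numpy as np


def overlap_add_vectorized(frames: np.ndarray, hop: int) -> np.ndarray:
    """Overlap-add frames of shape (batch, num_frames, n_fft) with hop size `hop`.

    Uses n_fft // hop shifted additions instead of one update per frame.
    Assumes n_fft is a multiple of hop; no window normalization is applied.
    """
    batch, num_frames, n_fft = frames.shape
    if n_fft % hop != 0:
        raise ValueError("this sketch requires n_fft to be a multiple of hop")
    r = n_fft // hop
    num_blocks = num_frames + r - 1  # output length measured in hops

    # Split each frame into r hop-sized segments: segment k of frame i
    # lands on output block i + k.
    segments = frames.reshape(batch, num_frames, r, hop)

    total = np.zeros((batch, num_blocks, hop), dtype=frames.dtype)
    for k in range(r):
        # Shift segment k of every frame down by k blocks, then add all frames at once.
        shifted = np.pad(segments[:, :, k, :], ((0, 0), (k, r - 1 - k), (0, 0)))
        total = total + shifted
    return total.reshape(batch, num_blocks * hop)
```

With n_fft = 16 and hop = 4, for example, the loop runs 4 times regardless of how long the audio is, while the output matches the frame-by-frame version.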
MLX Community org

Thanks for reporting this. Before reading your report, I had already added an optimization that should fix this error. Please try again with the latest version (0.1.6) and let me know whether it works for you now.
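To confirm the installed release before retrying, a small stdlib-only check against the 0.1.6 version mentioned above could look like this. It assumes the distribution is named `mlx-audio` and compares only the numeric release segment, so pre-release suffixes are ignored; `packaging.version.Version` would be the more thorough choice if that dependency is available. Upgrading is the usual `pip install -U mlx-audio`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(text: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. "0.1.6" -> (0, 1, 6), "0.2.0rc1" -> (0, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(installed: str, minimum: str = "0.1.6") -> bool:
    """True if `installed` is at least `minimum` (pre-release tags are ignored)."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_mlx_audio():
    """Installed mlx-audio version string, or None if the package is absent."""
    try:
        return version("mlx-audio")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_mlx_audio()
    if current is None:
        print("mlx-audio is not installed")
    else:
        print(current, "includes the fix" if has_fix(current) else "is older than 0.1.6")
```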
