[metal::malloc] Resource limit exceeded during TTS audio generation on 48GB MacBook Pro
I'm trying to generate audio with the cosyvoice3 TTS model from mlx_audio on a MacBook Pro with an M4 Pro chip and 48GB of unified memory. Despite the large amount of memory, audio generation fails with the following error:
RuntimeError: [metal::malloc] Resource limit (499000) exceeded.
The traceback shows that the failure occurs in the inverse short-time Fourier transform (istft) inside the HiFi-GAN vocoder, on the overlap-add line that accumulates each output frame into the signal with .at[].add().
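For context on the failing line: istft rebuilds the waveform by overlap-add, summing each length-n_fft frame into the output at offset i * hop. The NumPy sketch below contrasts a per-frame loop of the kind visible in the traceback with a single vectorized scatter-add. The function names are hypothetical, windowing and window-sum normalization are omitted, and this is not mlx_audio's code. In MLX, each .at[].add() call returns a new array rather than updating in place, so a per-frame loop potentially creates one new array per frame; MLX's .at[...].add() is also meant to accumulate over repeated indices, so the single-scatter form should carry over, but that has not been checked against mlx_audio.

```python
import numpy as np


def overlap_add_loop(frames: np.ndarray, hop: int) -> np.ndarray:
    """Reference overlap-add: one accumulation step per frame (hypothetical helper)."""
    batch, num_frames, n_fft = frames.shape
    output = np.zeros((batch, n_fft + (num_frames - 1) * hop), dtype=frames.dtype)
    for i in range(num_frames):
        start = i * hop
        output[:, start : start + n_fft] += frames[:, i, :]
    return output


def overlap_add_vectorized(frames: np.ndarray, hop: int) -> np.ndarray:
    """Same result with a single scatter-add over all frames (hypothetical helper)."""
    batch, num_frames, n_fft = frames.shape
    length = n_fft + (num_frames - 1) * hop
    # Output sample index for every (frame, tap) pair, shape (num_frames, n_fft).
    idx = np.arange(num_frames)[:, None] * hop + np.arange(n_fft)[None, :]
    output = np.zeros((batch, length), dtype=frames.dtype)
    # np.add.at is unbuffered, so overlapping indices accumulate instead of overwriting.
    np.add.at(output, (slice(None), idx.reshape(-1)), frames.reshape(batch, -1))
    return output
```

Both functions return the same signal; the vectorized one replaces num_frames separate updates with one indexed accumulation.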
Environment:
Device: MacBook Pro (M4 Pro, 48GB unified memory)
OS: macOS (latest stable version)
Python version: 3.12
Package: mlx_audio (TTS module, cosyvoice3 model)
Metal backend in use (via MLX)
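The environment list above does not give exact macOS, MLX, or mlx_audio versions, which matter here because the fix mentioned in the reply is tied to a specific release. A stdlib-only helper such as the hypothetical environment_report() below can print them; the PyPI distribution names mlx and mlx-audio are assumptions about how the packages were installed.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("mlx", "mlx-audio")) -> dict:
    """Collect version details useful in an MLX/Metal bug report (hypothetical helper)."""
    if platform.system() == "Darwin":
        # mac_ver()[0] is the macOS release string.
        os_name = f"macOS {platform.mac_ver()[0]}"
    else:
        os_name = platform.platform()
    report = {
        "os": os_name,
        "machine": platform.machine(),  # "arm64" on Apple silicon
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```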
Steps to Reproduce:
Load the cosyvoice3 TTS model.
Call generate() with valid input text.
Observe the crash during waveform synthesis in the vocoder stage.
Error Log:
Error loading model: Audio generation failed: [metal::malloc] Resource limit (499000) exceeded.
Traceback (most recent call last):
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 1160, in generate
    audio = self._model.synthesize(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 264, in synthesize
    audio = self.mel_to_audio(mel)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 187, in mel_to_audio
    audio, _ = self.hifigan(mel, finalize=finalize)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 743, in __call__
    generated_speech = self.decode(x=speech_feat, s=s, finalize=finalize)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 706, in decode
    x = self._istft(magnitude, phase)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 637, in _istft
    return istft(
           ^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/hifigan.py", line 471, in istft
    output = output.at[:, start : start + n_fft].add(frames[:, i, :])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [metal::malloc] Resource limit (499000) exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/generate.py", line 357, in generate_audio
    for i, result in enumerate(results):
                     ^^^^^^^^^^^^^^^^^^
  File "/Users/f2pgod/Documents/spyder312/lib/python3.12/site-packages/mlx_audio/tts/models/cosyvoice3/cosyvoice3.py", line 1210, in generate
    raise RuntimeError(f"Audio generation failed: {e}")
RuntimeError: Audio generation failed: [metal::malloc] Resource limit (499000) exceeded.
Expected Behavior:
Audio should be generated successfully, leveraging the available 48GB of system memory. The model should not exceed Metal's internal resource limits for reasonable input lengths.
Actual Behavior:
The process fails with RuntimeError: [metal::malloc] Resource limit (499000) exceeded, even though system memory is far from exhausted. This suggests the problem is not total RAM but a per-process Metal allocation limit, probably hit by the many temporary arrays allocated during istft.
Additional Notes:
The error appears to come from MLX's Metal backend, which caps the number of Metal buffers that can be allocated at the same time at 499,000. The limit counts allocations, not bytes or elements, so a large number of small temporary arrays can hit it long before memory runs out.
48GB of unified memory should be more than sufficient for typical TTS workloads. This points to an inefficiency in the istft implementation: it accumulates frames one at a time with .at[].add(), and each such update produces a new array rather than modifying the output in place, so the number of intermediate arrays grows with the sequence length.
The problem is likely worse for long input prompts or high-sample-rate output, since both increase the number of frames.
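To see why longer output makes this worse: in a per-frame overlap-add loop, the number of iterations equals the number of frames, which grows linearly with audio duration. The hypothetical helper below computes the exact frame count F for an output of num_samples samples, from num_samples = n_fft + (F - 1) * hop_length. The parameters (24 kHz, n_fft = 16, hop 4) are illustrative assumptions, not values read from mlx_audio's vocoder config. With them, 60 s of audio is already 359,997 frames; if each loop iteration left even one or two buffers alive until the graph is evaluated, that would be the same order of magnitude as the 499,000 limit.

```python
def overlap_add_frame_count(num_samples: int, n_fft: int, hop_length: int) -> int:
    """Number of frames F with n_fft + (F - 1) * hop_length == num_samples (exact fit assumed)."""
    if num_samples < n_fft:
        return 0
    return (num_samples - n_fft) // hop_length + 1


# Illustrative parameters only; not taken from mlx_audio's cosyvoice3 model.
SAMPLE_RATE = 24_000
N_FFT = 16
HOP_LENGTH = 4

if __name__ == "__main__":
    for seconds in (5, 10, 30, 60):
        frames = overlap_add_frame_count(seconds * SAMPLE_RATE, N_FFT, HOP_LENGTH)
        # A per-frame loop performs one .at[].add() per frame.
        print(f"{seconds:>3d} s -> {frames:,} frames")
```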
Suggested Fixes / Investigations:
Chunked ISTFT processing: Avoid accumulating large intermediate tensors; process audio in smaller windows.
Memory profiling: Add diagnostics to track Metal buffer usage during synthesis.
MLX backend tuning: Expose or raise the Metal resource limit if possible, or fall back to the CPU for large operations.
Optimize .at[].add() usage: Replace the per-frame updates with a more memory-efficient accumulation pattern if they trigger large numbers of temporary allocations.
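The chunked-processing and accumulation suggestions above can be combined: fold the frames into the output one chunk at a time, with a single vectorized scatter-add per chunk into a small local buffer. The NumPy sketch below assumes that design; the function name and chunk_frames default are hypothetical, windowing and normalization are omitted, and it is not the optimization shipped in mlx_audio. It cuts the number of accumulation steps from num_frames to ceil(num_frames / chunk_frames). In an MLX port one would presumably also force evaluation of the running output after each chunk (for example with mx.eval) to keep the lazy graph small; that step is an assumption and appears only as a comment.

```python
import numpy as np


def overlap_add_chunked(frames: np.ndarray, hop: int, chunk_frames: int = 256) -> np.ndarray:
    """Overlap-add processed in chunks of frames (hypothetical helper)."""
    batch, num_frames, n_fft = frames.shape
    output = np.zeros((batch, n_fft + (num_frames - 1) * hop), dtype=frames.dtype)
    taps = np.arange(n_fft)
    for first in range(0, num_frames, chunk_frames):
        chunk = frames[:, first : first + chunk_frames, :]
        count = chunk.shape[1]
        # Fold only this chunk into a small local buffer with one scatter-add.
        local = np.zeros((batch, n_fft + (count - 1) * hop), dtype=frames.dtype)
        idx = np.arange(count)[:, None] * hop + taps[None, :]
        np.add.at(local, (slice(None), idx.reshape(-1)), chunk.reshape(batch, -1))
        # Add the chunk at its absolute offset; its tail overlaps the next chunk's head.
        start = first * hop
        output[:, start : start + local.shape[1]] += local
        # Assumption: an MLX version would evaluate the running output here.
    return output
```

Because overlap-add is linear, summing each chunk's local fold at its absolute offset gives the same signal as the full per-frame loop for any chunk size, while the working buffer per step stays at n_fft + (chunk_frames - 1) * hop samples.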
Thanks for reporting this. Even before reading your report, I had added an optimization that should fix this error. Please try again with the latest version (0.1.6) and let me know whether it works for you.