Deploy MusicGen on an NVIDIA RTX PRO 6000 Blackwell GPU with PyTorch 2.10.0+cu130 for AI-powered music generation from text prompts. This guide uses the latest CUDA 13.0 runtime optimized for Blackwell architecture.
This deployment was verified on June 10, 2026 using PyTorch 2.10.0+cu130 on an RTX PRO 6000 Blackwell GPU. The critical compatibility requirement is using PyTorch from the cu130 index—older cu124 wheels don’t target Blackwell sm_120 correctly.
MusicGen transforms text prompts into audio using Facebook’s conditional music generation model. The RTX PRO 6000 Blackwell GPU provides 96GB of VRAM and Blackwell architecture optimizations, making it ideal for music generation workloads that require substantial memory for high-quality, longer-duration audio synthesis.
This guide validates MusicGen deployment on Blackwell hardware with PyTorch 2.10.0+cu130, ensuring proper CUDA compute capability targeting. We’ll start with a quick smoke test using facebook/musicgen-small, then demonstrate scaling to larger models.
| Component | Version | Purpose |
|---|---|---|
| Ubuntu Server | 24.04 LTS | Base OS with NVIDIA drivers |
| PyTorch | 2.10.0+cu130 | Deep learning framework with CUDA 13.0 |
| Transformers | 4.57.3 | Hugging Face model library |
| MusicGen | facebook/musicgen-small | Text-to-music generation model |
| CUDA Runtime | 13.0 | GPU compute acceleration |
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| vCPU | 16 cores | 24+ cores | Model loading and audio processing |
| RAM | 144 GB | 192+ GB | Large model weights and generation buffers |
| GPU Memory | 48 GB | 96 GB | MusicGen-large requires significant VRAM |
| Storage | 100 GB | 500+ GB | Model cache and generated audio files |
| Network | 1 Gbps | 10+ Gbps | Model downloads from Hugging Face Hub |
Massed Compute VM Pricing
| SKU | Description | vCPU | RAM | Storage | Price | Capacity |
|---|---|---|---|---|---|---|
| gpu_4x_A30 | 4x A30 (24GB) | 50 | 192 GiB | 1024 GB | $1.40/hr | 0 |
| gpu_2x_l40_spot | 2x L40 (48GB) [Spot] | 26 | 144 GiB | 1250 GB | $1.55/hr | 6 |
| gpu_2x_6000_ada | 2x RTX 6000 ADA (48GB) | 26 | 144 GiB | 700 GB | $1.58/hr | 6 |
| gpu_2x_l40 | 2x L40 (48GB) | 26 | 144 GiB | 1250 GB | $1.72/hr | 6 |
| gpu_2x_l40s | 2x L40S (48GB) | 24 | 144 GiB | 1250 GB | $1.76/hr | 7 |
| gpu_1x_pro_6000_blackwell | 1x RTX PRO 6000 Blackwell (96GB) | 16 | 144 GiB | 725 GB | $2.19/hr | 10 |
The gpu_1x_pro_6000_blackwell SKU provides the optimal balance of VRAM and compute for MusicGen workloads, with native Blackwell architecture support and 96GB memory for large model inference.
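Because the generation script in this guide loads the model onto a single CUDA device, the VRAM of one GPU matters more than the total across a multi-GPU SKU. The sketch below filters the pricing table on that basis. The `Sku` class and `candidates` helper are hypothetical names for illustration, and the figures are copied from the table at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    gpus: int
    vram_per_gpu_gb: int
    price_per_hr: float
    capacity: int


# Values copied from the pricing table in this guide.
SKUS = [
    Sku("gpu_4x_A30", 4, 24, 1.40, 0),
    Sku("gpu_2x_l40_spot", 2, 48, 1.55, 6),
    Sku("gpu_2x_6000_ada", 2, 48, 1.58, 6),
    Sku("gpu_2x_l40", 2, 48, 1.72, 6),
    Sku("gpu_2x_l40s", 2, 48, 1.76, 7),
    Sku("gpu_1x_pro_6000_blackwell", 1, 96, 2.19, 10),
]


def candidates(skus, min_vram_per_gpu_gb):
    """Return in-stock SKUs whose per-GPU VRAM meets the floor, cheapest first."""
    fits = [
        s for s in skus
        if s.capacity > 0 and s.vram_per_gpu_gb >= min_vram_per_gpu_gb
    ]
    return sorted(fits, key=lambda s: s.price_per_hr)


if __name__ == "__main__":
    for floor in (48, 96):
        names = [s.name for s in candidates(SKUS, floor)]
        print(f">= {floor} GB per GPU: {names}")
```

With a 96 GB per-GPU floor only the Blackwell SKU remains; relaxing the floor to 48 GB brings the cheaper dual-GPU options back into consideration.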
Step-by-Step Deployment
Provision Blackwell GPU VM
Launch a new VM with the RTX PRO 6000 Blackwell GPU and Ubuntu 24.04 with pre-installed NVIDIA drivers:
- Product: `gpu_1x_pro_6000_blackwell`
- Image: Ubuntu Server 24.04 w/ Drivers (ID: 184)
- Region: Any region with available capacity
- Instance name: `blackwell-musicgen`
Wait for the VM to reach running status and verify SSH connectivity. If reusing a public IP, clear any old host keys:
ssh-keygen -R YOUR_VM_IP
Verify GPU Hardware
Confirm the expected Blackwell GPU configuration and driver version:
ssh ubuntu@YOUR_VM_IP \
  'nvidia-smi --query-gpu=name,driver_version,memory.total,memory.free --format=csv,noheader'
Expected output: RTX PRO 6000 Blackwell, driver 580.x or newer, approximately 96GB total VRAM.
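If you want to automate this check rather than read the output by eye, a minimal sketch along these lines could parse the single CSV line. The helper name `check_gpu_line`, the 90,000 MiB floor, and the sample line are assumptions for illustration; the exact device name and numbers on your VM will differ.

```python
def check_gpu_line(line, min_driver_major=580, min_total_mib=90_000):
    """Return a list of problems found in one nvidia-smi CSV line (empty if OK)."""
    # Expected field order: name, driver_version, memory.total, memory.free
    name, driver, total, _free = [field.strip() for field in line.split(",")]
    problems = []
    if "Blackwell" not in name:
        problems.append(f"unexpected GPU: {name}")
    if int(driver.split(".")[0]) < min_driver_major:
        problems.append(f"driver {driver} is older than {min_driver_major}.x")
    # memory.total is reported like "97887 MiB" when units are enabled.
    if int(total.split()[0]) < min_total_mib:
        problems.append(f"only {total} total VRAM")
    return problems


if __name__ == "__main__":
    # Hypothetical sample line, not captured from a real VM.
    sample = "NVIDIA RTX PRO 6000 Blackwell, 580.65.06, 97887 MiB, 97250 MiB"
    print(check_gpu_line(sample) or "GPU_CHECK_OK")
```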
Install System Dependencies
Install Python, audio libraries, and create a virtual environment:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
# Wait for cloud-init if apt is locked
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y \
  python3.12-venv python3-pip ffmpeg \
  libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev \
  libavfilter-dev libswscale-dev libswresample-dev pkg-config curl
python3 -m venv ~/musicgen-env
. ~/musicgen-env/bin/activate
python -m pip install --upgrade pip setuptools wheel
EOF
Install PyTorch with CUDA 13.0
Install PyTorch 2.10.0 from the cu130 index, which includes proper Blackwell sm_120 support:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
. ~/musicgen-env/bin/activate
python -m pip install \
  torch==2.10.0 torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu130
EOF
Verify PyTorch can access the Blackwell GPU:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
. ~/musicgen-env/bin/activate
python - <<'PY'
import torch
print(torch.__version__, torch.cuda.is_available(), torch.cuda.get_device_name(0))
a = torch.randn((2048, 2048), device='cuda', dtype=torch.float16)
b = torch.randn((2048, 2048), device='cuda', dtype=torch.float16)
torch.cuda.synchronize()
c = a @ b
torch.cuda.synchronize()
print('BLACKWELL_TORCH_OK', c.shape, c.dtype)
PY
EOF
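The matmul test shows that kernels run, but it does not say whether the installed wheel was compiled for the GPU's own architecture. One way to check, sketched here under the assumption that a strict match is wanted, is to compare the device's compute capability against the wheel's compiled architecture list. The helper names are hypothetical; `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are existing PyTorch calls.

```python
def wheel_targets_device(capability, arch_list):
    """True if the wheel's compiled arch list names the device's sm_XY target."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report():
    """Run on the VM inside the venv; needs torch and a visible GPU."""
    import torch

    capability = torch.cuda.get_device_capability(0)  # e.g. (12, 0) for sm_120
    arch_list = torch.cuda.get_arch_list()            # targets compiled into the wheel
    ok = wheel_targets_device(capability, arch_list)
    print(capability, arch_list, "ARCH_OK" if ok else "ARCH_MISSING")
    return ok
```

Calling `report()` on the VM prints both values. The check is deliberately strict: it ignores forward-compatible PTX entries (`compute_XY`) in the list, so `ARCH_MISSING` means "no native kernels", not necessarily "cannot run".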
Install MusicGen Dependencies
Install Hugging Face Transformers and acceleration libraries:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
. ~/musicgen-env/bin/activate
python -m pip install --retries 5 --timeout 60 \
  transformers==4.57.3 accelerate==1.12.0
EOF
Create Generation Script
Set up a self-contained MusicGen script with WAV output:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
mkdir -p ~/musicgen ~/musicgen/outputs
cat > ~/musicgen/generate.py <<'PY'
#!/usr/bin/env python3
import argparse
import wave

import numpy as np
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

DEFAULT_MODEL_ID = 'facebook/musicgen-small'
SAMPLE_RATE = 32000


def write_wav(path, samples):
    # Convert float samples in [-1, 1] to 16-bit mono PCM.
    samples = np.asarray(samples, dtype=np.float32)
    samples = np.clip(samples, -1.0, 1.0)
    pcm = (samples * 32767.0).astype(np.int16)
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm.tobytes())


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('prompt', nargs='?', default='lo-fi hip hop beat, mellow piano')
    parser.add_argument('--duration', type=int, default=4)
    parser.add_argument('--output', default='outputs/smoke_test.wav')
    parser.add_argument('--temperature', type=float, default=1.0)
    parser.add_argument('--top-k', type=int, default=250)
    parser.add_argument('--model-id', default=DEFAULT_MODEL_ID)
    args = parser.parse_args()

    processor = AutoProcessor.from_pretrained(args.model_id)
    model = MusicgenForConditionalGeneration.from_pretrained(
        args.model_id,
        dtype=torch.float16,
    ).to('cuda')

    inputs = processor(text=[args.prompt], padding=True, return_tensors='pt').to('cuda')
    with torch.no_grad():
        audio = model.generate(
            **inputs,
            # 50 tokens per second of audio
            max_new_tokens=int(args.duration * 50),
            do_sample=True,
            top_k=args.top_k,
            temperature=args.temperature,
        )

    # First batch item, first channel
    samples = audio[0, 0].detach().float().cpu().numpy()
    write_wav(args.output, samples)
    print(f'Wrote {args.output}')
    print('MUSICGEN_SMOKE_OK')


if __name__ == '__main__':
    main()
PY
EOF
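The script above turns a duration into a token budget by multiplying by 50, and writes audio at 32,000 samples per second. The arithmetic can be sketched on its own as follows. The helper name `plan_generation` is hypothetical, the 30-second ceiling is taken from this guide's troubleshooting advice, and the sample count is an approximation because the decoder may return slightly more or fewer samples.

```python
FRAME_RATE_HZ = 50    # tokens per second, as used in generate.py
SAMPLE_RATE = 32000   # output sample rate used in generate.py
MAX_SECONDS = 30      # ceiling recommended in this guide


def plan_generation(duration_s):
    """Clamp a requested duration and derive the token and sample budgets."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    seconds = min(duration_s, MAX_SECONDS)
    return {
        "seconds": seconds,
        "max_new_tokens": seconds * FRAME_RATE_HZ,
        "approx_samples": seconds * SAMPLE_RATE,
    }


if __name__ == "__main__":
    print(plan_generation(4))    # the smoke-test clip
    print(plan_generation(45))   # clamped to the 30-second ceiling
```

A 4-second smoke test therefore asks for 200 new tokens, and a 30-second clip asks for 1,500.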
Generate Smoke Test Audio
Run a quick smoke test with MusicGen-small to verify the complete pipeline:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
. ~/musicgen-env/bin/activate
cd ~/musicgen
python generate.py 'lo-fi hip hop beat, mellow piano' \
  --duration 4 \
  --output outputs/smoke_test.wav
EOF
Look for the MUSICGEN_SMOKE_OK confirmation message.
Verify Audio Output
Confirm the generated WAV file is valid and contains audio data:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euo pipefail
. ~/musicgen-env/bin/activate
cd ~/musicgen
ls -lah outputs/smoke_test.wav
python - <<'PY'
import wave
with wave.open('outputs/smoke_test.wav', 'rb') as wf:
    print('WAV', wf.getframerate(), wf.getnchannels(), wf.getnframes())
    assert wf.getframerate() == 32000
    assert wf.getnframes() > 1000
PY
EOF
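The check above confirms the sample rate and frame count, but a file can pass both and still be silent. A possible extension, using only the standard library, is to read the peak amplitude as well. The helper name `wav_summary` is hypothetical, and any silence threshold you apply to the result (for example a peak below 0.01) is an assumption to tune by ear.

```python
import array
import sys
import wave


def wav_summary(path):
    """Return the duration in seconds and normalized peak of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        rate = wf.getframerate()
        frames = wf.getnframes()
        pcm = array.array("h", wf.readframes(frames))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    peak = max((abs(sample) for sample in pcm), default=0)
    return {"seconds": frames / rate, "peak": peak / 32767.0}


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(wav_summary(sys.argv[1]))
```

Run it on the VM against `outputs/smoke_test.wav`; a peak close to zero suggests the generation step produced silence even though the file is structurally valid.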
Scale to Larger Models
After the smoke test passes, generate higher-quality audio with MusicGen-large:
ssh ubuntu@YOUR_VM_IP 'bash -s' <<'EOF'
set -euxo pipefail
. ~/musicgen-env/bin/activate
cd ~/musicgen
python generate.py 'epic orchestral trailer music, rising strings' \
  --model-id facebook/musicgen-large \
  --duration 30 \
  --output outputs/musicgen_large_30s.wav
EOF
Troubleshooting
Common Issues
- APT lock after cloud-init: Wait for apt processes to complete, then retry the system dependency installation step.
- CUDA sm_120 warnings: Ensure you're using PyTorch from the cu130 index exactly as specified. Older cu124 wheels lack proper Blackwell support.
- Slow package downloads: Add `--retries 5 --timeout 60` to pip install commands. Some regions may have slower access to PyPI.
- Hugging Face rate limits: Set up an HF_TOKEN environment variable with a read-only token if downloads are throttled.
- CUDA device-side asserts: Keep smoke test durations short and limit large model generations to 30 seconds maximum.
- Out of memory errors: Use `facebook/musicgen-small` for initial testing or reduce generation duration.
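The advice to reduce generation duration after an out-of-memory error can be automated. The sketch below halves the duration and retries; it is one possible policy, not part of the tested recipe. The name `generate_with_fallback` and the 2-second floor are assumptions. On the VM you would pass `oom_errors=(torch.cuda.OutOfMemoryError,)` and a callable that wraps `model.generate`.

```python
def generate_with_fallback(generate, duration_s, min_duration_s=2,
                           oom_errors=(MemoryError,)):
    """Call generate(duration_s); on an out-of-memory error, halve and retry.

    Returns (duration_used, result). Re-raises the error once halving again
    would drop below min_duration_s.
    """
    while True:
        try:
            return duration_s, generate(duration_s)
        except oom_errors:
            if duration_s // 2 < min_duration_s:
                raise
            duration_s //= 2


if __name__ == "__main__":
    # Stand-in for a real generation call: pretend anything over 8 s runs out of memory.
    def fake_generate(seconds):
        if seconds > 8:
            raise MemoryError
        return f"{seconds}s clip"

    print(generate_with_fallback(fake_generate, 30))
```

With PyTorch it is also worth calling `torch.cuda.empty_cache()` before each retry so that cached blocks from the failed attempt are released.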
Performance Optimization
- Use `torch.float16` precision to reduce memory usage
- Enable gradient checkpointing only if you fine-tune the model; it saves memory during backpropagation and does not apply to inference-only generation
- Monitor GPU memory with `nvidia-smi` during generation
- Cache model weights locally to speed up subsequent runs
Skip All of This: Deploy with an AI Agent
The complete MusicGen deployment process above exists as a tested, machine-readable recipe in the Massed Compute MCP. Instead of running each step manually, you can use an AI agent to handle the entire deployment automatically.
Add this MCP server configuration to your AI client:
{
"mcpServers": {
"massed-compute": {
"type": "http",
"url": "https://vm.massedcompute.com/api/mcp",
"headers": { "Authorization": "Bearer MC_TOKEN" }
}
}
}
Then say:
The agent will match your request against the recipe catalog, provision the right VM shape with Blackwell GPU support, install PyTorch 2.10.0+cu130 with proper CUDA targeting, set up the MusicGen environment, and run verification tests. If any step fails, the process stops and reports the exact error for debugging. This recipe was last tested on June 10, 2026.
Quick Setup Guide
For experienced users, here's the condensed deployment sequence:
- Launch `gpu_1x_pro_6000_blackwell` with Ubuntu 24.04 + drivers
- Install system packages: `python3.12-venv ffmpeg libav*`
- Create venv and install PyTorch 2.10.0+cu130 from cu130 index
- Install `transformers==4.57.3 accelerate==1.12.0`
- Create generation script with WAV output using Python wave module
- Test with `facebook/musicgen-small` for 4-second smoke clip
- Scale to `facebook/musicgen-large` for production audio (30s max)
- Copy audio files locally and terminate disposable VM
Total setup time: ~15 minutes for smoke test, ~60 minutes including large model download.
Frequently Asked Questions
01. Why is PyTorch cu130 required for Blackwell GPUs?
Blackwell GPUs such as the RTX PRO 6000 use the sm_120 compute capability, and a PyTorch wheel only runs well on them if it ships kernels built for that target. The cu130 wheels used in this guide do. Wheels built for older CUDA versions (cu124 and earlier) don't include the necessary kernels and may fall back to slower code paths or produce compatibility warnings.
02. How much VRAM does MusicGen require for different durations?
MusicGen-small typically uses 8-12GB for short clips, while MusicGen-large can consume 40-60GB for 30-second generations. The RTX PRO 6000 Blackwell's 96GB VRAM provides comfortable headroom for large model inference and longer audio sequences without memory pressure.
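As a rough cross-check, the memory taken by the weights alone can be estimated from the parameter count and the bytes per parameter. The sketch below gives only that lower bound; activations, caches, and the CUDA context come on top, and they grow with duration. The parameter counts are approximate figures assumed from the public model cards, and `weight_gb` is a hypothetical helper.

```python
# Approximate parameter counts; assumptions based on the public model cards.
APPROX_PARAMS = {
    "facebook/musicgen-small": 300e6,
    "facebook/musicgen-medium": 1.5e9,
    "facebook/musicgen-large": 3.3e9,
}


def weight_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (2 bytes per parameter for float16)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for model_id, params in APPROX_PARAMS.items():
        print(f"{model_id}: ~{weight_gb(params):.1f} GB of float16 weights")
```

For a measured figure on your own VM, `torch.cuda.max_memory_allocated()` after a generation reports the peak that PyTorch actually allocated.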
03. Can I use spot instances for MusicGen workloads?
Spot instances work well for experimental MusicGen workloads since model generation is typically short-lived (minutes, not hours). However, for production pipelines or when downloading large models, on-demand instances provide guaranteed availability and avoid potential interruptions during critical operations.
04. What audio formats does the generation script support?
The provided script outputs uncompressed 16-bit PCM WAV files at 32kHz sample rate using Python's built-in wave module. You can extend it to support MP3, FLAC, or other formats by installing additional audio libraries like pydub or soundfile, though WAV provides the best compatibility for post-processing workflows.
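Since this guide already installs `ffmpeg`, one lightweight alternative to extra Python audio libraries is to convert the WAV after generation. The sketch below builds and runs an `ffmpeg` command that infers the output format from the file extension. The helper names are hypothetical, and MP3 output assumes the installed `ffmpeg` build includes an MP3 encoder.

```python
import subprocess


def ffmpeg_convert_cmd(src, dst):
    """Build an ffmpeg command; -y overwrites dst, format follows dst's extension."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(src, dst):
    """Run the conversion and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_convert_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(ffmpeg_convert_cmd("outputs/smoke_test.wav", "outputs/smoke_test.flac")))
```

On the VM, `convert("outputs/smoke_test.wav", "outputs/smoke_test.flac")` would produce a lossless FLAC copy next to the original.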
05. How do I optimize generation speed for batch processing?
For batch processing, keep the model loaded in GPU memory between generations, use consistent batch sizes, enable mixed precision with torch.autocast, and consider using torch.compile() for PyTorch 2.x optimization. Processing multiple prompts in a single batch is more efficient than individual sequential generations.
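The batching idea can be sketched independently of the model. The helper below splits prompts into fixed-size batches and hands each batch to a callable that you supply; the names `batches` and `generate_all` and the batch size of 4 are assumptions. On the VM, the callable would reuse one loaded model and call `processor(text=chunk, padding=True, return_tensors='pt')` followed by `model.generate(...)`, as in the generation script.

```python
def batches(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_all(prompts, generate_batch, batch_size=4):
    """Run generate_batch on each batch and collect results in prompt order."""
    results = []
    for chunk in batches(prompts, batch_size):
        results.extend(generate_batch(chunk))
    return results


if __name__ == "__main__":
    # Stand-in for a real model call: one "clip" label per prompt.
    def fake_generate_batch(chunk):
        return [f"clip for: {prompt}" for prompt in chunk]

    prompts = ["lo-fi beat", "orchestral trailer", "ambient pad", "funk bass", "solo piano"]
    for line in generate_all(prompts, fake_generate_batch, batch_size=2):
        print(line)
```

Keeping the batch size constant across calls also tends to suit `torch.compile()`, which may recompile when input shapes change.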











