Multi-GPU Setup¶
Configure Smart-Diffusion for multi-GPU execution.
Prerequisites¶
- Multiple NVIDIA GPUs
- NCCL installed
- GPUs on the same node, or on nodes connected via a high-speed interconnect
Single Node, Multiple GPUs¶
Unified Launcher¶
--cfp (or parallel.cfp in the config) sets the CFG-parallel factor and must be 1 or 2.
The launcher automatically derives infer.diffusion.cp_size = total_gpus / cfp. For example, 8 GPUs with cfp = 2 gives cp_size = 4.
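The derivation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the launcher's actual code: the function name derive_cp_size and the validation rules (rejecting GPU counts that are not a multiple of cfp) are assumptions.

```python
def derive_cp_size(total_gpus: int, cfp: int) -> int:
    """Return the context-parallel size implied by total_gpus and cfp."""
    if cfp not in (1, 2):
        raise ValueError("cfp must be 1 or 2")
    # Assumed check: the GPUs must divide evenly between the CFG branches.
    if total_gpus < cfp or total_gpus % cfp != 0:
        raise ValueError("total_gpus must be a positive multiple of cfp")
    return total_gpus // cfp


if __name__ == "__main__":
    for total_gpus, cfp in [(8, 1), (8, 2), (4, 2)]:
        cp_size = derive_cp_size(total_gpus, cfp)
        print(f"total_gpus={total_gpus} cfp={cfp} -> cp_size={cp_size}")
```

With cfp = 1 every GPU goes to context parallelism; with cfp = 2 the GPUs are halved between the two CFG branches.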
Multi-Node Setup¶
Using SLURM¶
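Under SLURM, each task already receives its position in the job through environment variables. The sketch below maps the standard SLURM variables (SLURM_NTASKS, SLURM_PROCID, SLURM_LOCALID) onto the variables used in the manual configuration. It assumes one task per GPU (for example via --ntasks-per-node), and the helper name dist_env_from_slurm and the master address "node0" are hypothetical. Whether Smart-Diffusion already reads SLURM variables on its own is not stated here.

```python
import os


def dist_env_from_slurm(environ, master_addr, master_port=29500):
    """Map SLURM's per-task variables onto RANK / LOCAL_RANK / WORLD_SIZE."""
    return {
        "MASTER_ADDR": master_addr,              # host running global rank 0
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": environ["SLURM_NTASKS"],   # total tasks in the job step
        "RANK": environ["SLURM_PROCID"],         # global task index
        "LOCAL_RANK": environ["SLURM_LOCALID"],  # task index on this node
    }


if __name__ == "__main__" and "SLURM_PROCID" in os.environ:
    # "node0" is a placeholder; use the hostname of the first node in the job.
    os.environ.update(dist_env_from_slurm(os.environ, master_addr="node0"))
```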
Manual Configuration¶
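# Node 0 (first process; this layout assumes 2 nodes with 4 GPUs each)
export MASTER_ADDR=node0
export MASTER_PORT=29500
export WORLD_SIZE=8
export RANK=0
export LOCAL_RANK=0
python test_generate.py ...
# Node 1 (MASTER_ADDR, MASTER_PORT, and WORLD_SIZE must match node 0)
export MASTER_ADDR=node0
export MASTER_PORT=29500
export WORLD_SIZE=8
export RANK=4
export LOCAL_RANK=0
python test_generate.py ...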
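The values above follow a simple pattern that can be sketched as below. The sketch assumes one process per GPU and the conventional numbering RANK = node_index * gpus_per_node + local_rank; the helper name process_env is hypothetical, and the source does not confirm whether test_generate.py spawns the remaining local processes itself.

```python
def process_env(node_index, local_rank, gpus_per_node, num_nodes,
                master_addr="node0", master_port=29500):
    """Return the environment variables for one process in the job."""
    if not 0 <= node_index < num_nodes:
        raise ValueError("node_index out of range")
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank out of range")
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": str(num_nodes * gpus_per_node),
        "RANK": str(node_index * gpus_per_node + local_rank),
        "LOCAL_RANK": str(local_rank),
    }


if __name__ == "__main__":
    # Print the export lines for every process on node 1 of a 2 x 4 GPU job.
    for local_rank in range(4):
        env = process_env(node_index=1, local_rank=local_rank,
                          gpus_per_node=4, num_nodes=2)
        print("; ".join(f"export {key}={value}" for key, value in env.items()))
```

For node 1 with local rank 0 this yields RANK=4 and WORLD_SIZE=8, matching the manual configuration.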
Parallelism Strategies¶
Context Parallelism¶
Split frames across GPUs:
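As an illustration of the idea, the sketch below gives each context-parallel rank a contiguous, near-equal slice of the frame indices. How Smart-Diffusion actually partitions frames is not specified here; the contiguous split and the function name split_frames are assumptions.

```python
def split_frames(num_frames: int, cp_size: int) -> list[range]:
    """Assign each context-parallel rank a contiguous slice of frames."""
    if cp_size < 1:
        raise ValueError("cp_size must be at least 1")
    base, extra = divmod(num_frames, cp_size)
    shards = []
    start = 0
    for rank in range(cp_size):
        # The first `extra` ranks take one additional frame each.
        length = base + (1 if rank < extra else 0)
        shards.append(range(start, start + length))
        start += length
    return shards


if __name__ == "__main__":
    for rank, shard in enumerate(split_frames(num_frames=81, cp_size=4)):
        print(f"rank {rank}: frames {shard.start}..{shard.stop - 1}")
```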
CFG Parallelism¶
Enabled automatically when CFG is on and at least two GPUs are available.
Combined¶
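When both strategies are active, each GPU belongs to one CFG branch and holds one context-parallel shard. The sketch below shows one possible mapping from global rank to (CFG branch, context-parallel rank), using cp_size = total_gpus / cfp. The ordering (consecutive ranks share a CFG branch) and the name rank_layout are assumptions; the real layout may differ.

```python
def rank_layout(total_gpus: int, cfp: int) -> list[tuple[int, int, int]]:
    """Return (global_rank, cfg_branch, cp_rank) for every GPU."""
    if cfp not in (1, 2) or total_gpus < cfp or total_gpus % cfp != 0:
        raise ValueError("need cfp in {1, 2} and total_gpus a multiple of cfp")
    cp_size = total_gpus // cfp
    # Assumed ordering: ranks [0, cp_size) form branch 0, the rest branch 1.
    return [(rank, rank // cp_size, rank % cp_size) for rank in range(total_gpus)]


if __name__ == "__main__":
    for rank, cfg_branch, cp_rank in rank_layout(total_gpus=8, cfp=2):
        print(f"rank {rank}: cfg_branch={cfg_branch} cp_rank={cp_rank}")
```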
Troubleshooting¶
NCCL Timeout¶
Increase timeout:
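One way to raise the timeout, assuming the process group is created through torch.distributed (which this page does not state explicitly), is to pass a larger timeout to init_process_group. The environment variable SD_NCCL_TIMEOUT_MIN and both helper names are hypothetical and exist only for this sketch.

```python
import os
from datetime import timedelta


def collective_timeout(default_minutes: int = 30) -> timedelta:
    """Read the timeout in minutes from SD_NCCL_TIMEOUT_MIN (hypothetical)."""
    minutes = int(os.environ.get("SD_NCCL_TIMEOUT_MIN", default_minutes))
    if minutes <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    return timedelta(minutes=minutes)


def init_distributed() -> None:
    """Create the NCCL process group with the extended timeout."""
    # Imported lazily so collective_timeout() works without PyTorch installed.
    import torch.distributed as dist

    dist.init_process_group(backend="nccl", timeout=collective_timeout())
```

A longer timeout only hides the symptom if one rank is genuinely stuck, so it is worth confirming that all ranks reach the collective before raising it.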
Network Issues¶
Check connectivity:
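A minimal reachability check, using only the Python standard library, is sketched below. It tests whether a TCP connection to the master address and port can be opened from another node; the function name can_connect is hypothetical, and "node0" and 29500 are taken from the manual configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False


if __name__ == "__main__":
    print("master reachable:", can_connect("node0", 29500))
```

The check succeeds only while something is listening on the port, so run it from a worker node after rank 0 has started. Setting NCCL_DEBUG=INFO additionally makes NCCL log which network interface it selected.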