# Performance Tuning
Optimize Smart-Diffusion for maximum performance.
## Quick Wins
### 1. Use SageAttention

**Speedup:** performance testing in progress. **Quality loss:** minimal.
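Why the quality loss stays minimal: SageAttention quantizes the attention inputs (Q and K) to INT8 while accumulating in higher precision. The round trip behind that idea can be shown with a toy symmetric quantizer — a pure-Python illustration of the concept only, not SageAttention's actual kernel:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: scale into [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 values and the scale."""
    return [q * scale for q in quantized]

vals = [0.5, -1.27, 0.02]
q, scale = quantize_int8(vals)   # q == [50, -127, 2]
approx = dequantize(q, scale)    # close to vals at a fraction of the memory
```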
### 2. Enable FlexCache

**Speedup:** performance testing in progress. **Quality loss:** minimal.
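Step-level caching schemes such as FlexCache exploit the similarity of intermediate features across consecutive denoising steps: when the input to an expensive sub-computation has barely changed, the cached result is reused. A toy sketch of that reuse policy (conceptual only — the threshold and the scalar "features" are made up, and this is not Chitu's implementation):

```python
def run_with_cache(step_inputs, expensive_fn, threshold=0.05):
    """Recompute only when the input drifts past `threshold`; otherwise
    reuse the cached output from the last full computation."""
    cached_in, cached_out, full_calls = None, None, 0
    outputs = []
    for x in step_inputs:
        if cached_in is None or abs(x - cached_in) > threshold:
            cached_out = expensive_fn(x)   # full forward pass
            cached_in = x
            full_calls += 1
        outputs.append(cached_out)
    return outputs, full_calls

outs, full_calls = run_with_cache([1.0, 1.01, 1.02, 1.5], lambda x: x * 2)
print(full_calls)  # → 2 (instead of 4 full computations)
```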
### 3. Reduce Inference Steps

**Speedup:** performance testing in progress. **Quality loss:** slight.
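Diffusion sampling time is roughly linear in the step count, so the trade-off is easy to estimate. A back-of-envelope sketch (the per-step cost is a made-up placeholder, not a measured number):

```python
def sampling_cost_ms(num_steps, cost_per_step_ms=120.0):
    """Estimated sampling time: roughly linear in the step count.
    The 120 ms/step figure is a placeholder, not a measurement."""
    return num_steps * cost_per_step_ms

saved = 1 - sampling_cost_ms(30) / sampling_cost_ms(50)
print(f"50 -> 30 steps saves about {saved:.0%}")  # → saves about 40%
```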
## GPU Utilization
### Check Utilization

Target utilization: benchmarks in progress.

If utilization is low:

1. Increase the batch size (future feature)
2. Use context parallelism
3. Check for CPU bottlenecks
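One way to check utilization from a script is to parse `nvidia-smi`'s CSV output. The query flags below are standard `nvidia-smi` options; the helper names are our own:

```python
import subprocess

def parse_utilization(csv_out: str) -> list[int]:
    """Parse nvidia-smi's noheader/nounits CSV output into percentages."""
    return [int(line) for line in csv_out.splitlines() if line.strip()]

def gpu_utilization() -> list[int]:
    """Query per-GPU utilization (requires an NVIDIA driver on the host)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)
```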
## Memory Optimization
### Reduce Memory Usage

- **Strategy 1:** low-memory mode
- **Strategy 2:** lower resolution
- **Strategy 3:** SageAttention
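A rough intuition for strategy 2: latent and activation memory grow with the pixel count, so dropping resolution shrinks it sharply. A back-of-envelope sketch, assuming 16 latent channels, an 8x spatial VAE stride, and fp16 storage — typical values for video diffusion models, not measured Chitu numbers, and temporal compression of frames is ignored:

```python
def latent_bytes(height, width, frames, channels=16, vae_stride=8, bytes_per_el=2):
    """Approximate fp16 latent size for a video diffusion model,
    assuming an 8x spatial VAE stride and 16 latent channels."""
    return (height // vae_stride) * (width // vae_stride) * frames * channels * bytes_per_el

hi = latent_bytes(720, 1280, 121)   # ~56 MB
lo = latent_bytes(480, 848, 81)     # ~16 MB
print(f"720p/121f latent ~ {hi / 1e6:.0f} MB, 480p/81f ~ {lo / 1e6:.0f} MB")
```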
## Benchmarking

### Measure Performance

```python
import time

# Wall-clock the whole generation loop: keep calling chitu_generate()
# until every task in the pool has completed.
start = time.perf_counter()
while not DiffusionTaskPool.all_finished():
    chitu_generate()
elapsed = time.perf_counter() - start
print(f"Generation took {elapsed:.2f} seconds")
```
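For steadier numbers it helps to exclude warm-up iterations (compilation, allocator growth) and average several runs. A small hypothetical helper along those lines (`benchmark` is not a Chitu API):

```python
import time

def benchmark(fn, warmup=1, runs=3):
    """Mean wall-clock seconds per call of fn(), excluding warm-up runs."""
    for _ in range(warmup):
        fn()                      # warm-up: not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

For example, `benchmark(chitu_generate, warmup=1, runs=3)` once a model is loaded.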
### Expected Performance
Performance benchmarking is in progress. Results will be published once comprehensive testing is completed across different hardware configurations.
| Model | Resolution | Frames | Steps | A100 (40GB) | H100 (80GB) |
|---|---|---|---|---|---|
| 1.3B | 480x848 | 81 | 50 | To be tested | To be tested |
| 14B | 480x848 | 81 | 50 | To be tested | To be tested |
| 14B | 720x1280 | 121 | 50 | To be tested | To be tested |
Performance improvements from the optimizations above will be benchmarked as well.
## Multi-GPU Scaling

### Context Parallelism Efficiency

Multi-GPU scaling benchmarks are in progress.
| GPUs | Speedup | Efficiency |
|---|---|---|
| 1 | 1.0x | 100% |
| 2 | To be tested | To be tested |
| 4 | To be tested | To be tested |
| 8 | To be tested | To be tested |
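Once the speedups are measured, the efficiency column follows directly: efficiency is speedup divided by GPU count. For example (the 3.2x figure is illustrative, not a measurement):

```python
def parallel_efficiency(speedup: float, num_gpus: int) -> float:
    """Scaling efficiency as reported in the table above."""
    return speedup / num_gpus

print(f"{parallel_efficiency(3.2, 4):.0%}")  # a hypothetical 3.2x on 4 GPUs → 80%
```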
### CFG Parallelism

CFG parallelism performance testing is in progress.
| GPUs | Speedup |
|---|---|
| 2 | To be tested |
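Classifier-free guidance runs two denoiser passes per step, one conditional and one unconditional. The two passes are independent, which is why placing them on separate GPUs can approach a 2x speedup. The combination applied after both passes finish is the standard CFG formula, shown here on scalars purely for illustration:

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    return uncond + guidance_scale * (cond - uncond)

print(cfg_combine(0.0, 0.25, 4.0))  # → 1.0
```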
## Profiling

### Enable Debug Mode

Debug mode shows detailed timing information.