Why Smart-Diffusion?
Workload Characteristics of Diffusion
Diffusion inference is a full-stack compute-bound task:
- Sample-by-sample execution: Batching hardly improves GPU utilization; pure streaming is mandatory.
- Long activation sequences, yet relatively small models: Sequence Parallelism becomes the most economical and scalable dimension.
- In long-sequence regimes, Full Attention accounts for ~80% of end-to-end latency: operator-level optimization must target Attention first.
- Activations change mildly between denoising steps: a simple, lossy Feature Cache yields instant speed-ups.
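
The last point can be illustrated with a minimal sketch of a lossy feature cache. It assumes a fixed refresh interval and a toy `expensive_block` standing in for a transformer block; `StepFeatureCache`, `refresh_every`, and `expensive_block` are hypothetical names chosen for illustration, not Smart-Diffusion's actual API or caching policy.

```python
from typing import Callable, List, Optional


class StepFeatureCache:
    """Reuse a block's output across neighbouring denoising steps (lossy).

    Hypothetical illustration: the wrapped block is recomputed every
    `refresh_every` steps and the cached output is returned in between.
    """

    def __init__(self, block: Callable[[List[float]], List[float]],
                 refresh_every: int = 2) -> None:
        if refresh_every < 1:
            raise ValueError("refresh_every must be >= 1")
        self.block = block
        self.refresh_every = refresh_every
        self.cached: Optional[List[float]] = None
        self.recomputes = 0

    def __call__(self, step: int, x: List[float]) -> List[float]:
        # Recompute on refresh steps, or when nothing has been cached yet.
        if self.cached is None or step % self.refresh_every == 0:
            self.cached = self.block(x)
            self.recomputes += 1
        # Otherwise return the stale (approximate) features.
        return self.cached


def expensive_block(x: List[float]) -> List[float]:
    # Toy stand-in for an attention/MLP block.
    return [2.0 * v for v in x]


cache = StepFeatureCache(expensive_block, refresh_every=2)
outputs = [cache(step, [float(step)]) for step in range(6)]
print(cache.recomputes)  # 3 recomputations for 6 denoising steps
```

With `refresh_every=2`, half of the block evaluations are skipped; the price is that odd-numbered steps see features computed from the previous step's input, which is only acceptable because activations drift slowly between steps.
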
Design Philosophy of Smart-Diffusion
Three Pillars: Parallelism × Kernels × Algorithms
Each can be pursued independently, but co-design extracts the last drop of performance.
(Technical deep-dives will be released incrementally—PRs welcome!)
Service Framework for Multi-User, Multi-Task Workloads
We ship a long-running, hot-upgradable, horizontally-scalable Diffusion service—not a frozen script that cold-starts every time.
Key idea: decompose the Diffusion pipeline into composable stages and orchestrate them with a unified scheduler:
- Let users tune their own quality-efficiency trade-off: steps, CFG, cache ratio—all at runtime.
- Keep all resources saturated: FLOPs are only the first bottleneck; memory, bandwidth and CPU must be fully utilized as well.
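
The stage decomposition described above might look roughly like the following sketch. It assumes three hypothetical stages (`encode`, `denoise`, `decode`) and a single-threaded round-robin loop standing in for the unified scheduler; all names here are illustrative and do not reflect the project's real interfaces. The only knob shown is the per-request `steps` count.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Request:
    request_id: str
    steps: int                      # per-request knob, chosen at runtime
    stage_index: int = 0
    steps_done: int = 0
    trace: List[str] = field(default_factory=list)


# Each stage does one unit of work and returns True once the request
# is finished with that stage.
def encode(req: Request) -> bool:
    req.trace.append("encode")
    return True


def denoise(req: Request) -> bool:
    req.steps_done += 1
    req.trace.append(f"denoise[{req.steps_done}]")
    return req.steps_done >= req.steps


def decode(req: Request) -> bool:
    req.trace.append("decode")
    return True


STAGES: List[Callable[[Request], bool]] = [encode, denoise, decode]


def run_scheduler(requests: List[Request]) -> List[Request]:
    """Round-robin over requests, one unit of stage work at a time."""
    queue: Deque[Request] = deque(requests)
    finished: List[Request] = []
    while queue:
        req = queue.popleft()
        if STAGES[req.stage_index](req):
            req.stage_index += 1
        if req.stage_index == len(STAGES):
            finished.append(req)
        else:
            queue.append(req)       # yield to the other requests
    return finished


a = Request("a", steps=1)
b = Request("b", steps=3)
done = run_scheduler([a, b])
print([r.request_id for r in done])  # ['a', 'b']
```

Because work is scheduled per stage unit rather than per request, the short request `a` finishes while `b` is still denoising. A production scheduler would additionally place stages on different resources to keep them all busy, which this sketch does not attempt.
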
Developer Guide
Thanks for joining the Smart-Diffusion open-source community! To keep code review painless, please align on the “parameter taxonomy” first:
| Category | Life-Cycle | Location | Who Can Change | Best Practice |
|---|---|---|---|---|
| Model params | Static | chitu_core/config/models/<model>.yaml | Nobody | Tied to weights; any change is UB |
| User params | Dynamic (per-request) | DiffusionUserParams | End user | Expose necessary & sufficient knobs; avoid parameter spam |
| System params | Semi-dynamic (init-time) | chitu launch args | Ops/Scheduler | No hot-edit after init; prevents distributed state explosion |
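
One way to picture the three life-cycles is with plain dataclasses, as in the sketch below. The class names and every field (`hidden_size`, `num_steps`, `cache_ratio`, and so on) are hypothetical placeholders; in particular, `ExampleUserParams` is not the real `DiffusionUserParams`, whose fields are not described here.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ExampleModelParams:
    # Static: tied to the weights, loaded once, never edited.
    hidden_size: int
    num_layers: int


@dataclass(frozen=True)
class ExampleSystemParams:
    # Semi-dynamic: chosen at launch, immutable afterwards.
    sequence_parallel_size: int


@dataclass
class ExampleUserParams:
    # Dynamic: supplied with each request and validated on creation.
    num_steps: int = 30
    guidance_scale: float = 5.0
    cache_ratio: float = 0.0

    def __post_init__(self) -> None:
        if self.num_steps < 1:
            raise ValueError("num_steps must be >= 1")
        if not 0.0 <= self.cache_ratio <= 1.0:
            raise ValueError("cache_ratio must be in [0, 1]")


model = ExampleModelParams(hidden_size=1024, num_layers=24)
system = ExampleSystemParams(sequence_parallel_size=4)
request = ExampleUserParams(num_steps=20, cache_ratio=0.3)

try:
    model.hidden_size = 2048        # static params reject mutation
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)  # False
```

Freezing the static and init-time categories makes accidental hot-edits fail loudly, while the per-request class stays small and validates its few knobs at construction time.
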
Remember:
Every extra knob costs documentation + tests + user mental model. Flexibility ≠ surface area.