Why Smart-Diffusion?¶

Characteristics of Diffusion Inference¶

Diffusion inference is compute-intensive with these key characteristics:

Sample-by-sample execution: Batching provides minimal GPU utilization improvement; single-sample streaming is sufficient.
Long sequences, small models: With very long activation sequences but relatively small model parameters, Context Parallelism is the most cost-effective parallelization strategy.
Attention is the bottleneck: In long-sequence scenarios, Full Attention accounts for over 80% of end-to-end latency, making it the primary optimization target.
Small activation changes: Activations change minimally between denoising steps, so simple Feature Cache methods can provide significant speedups.

Smart-Diffusion Design Philosophy¶

Three Optimization Directions: Parallelism × Kernels × Algorithms¶

Each direction can be optimized independently, but combining them yields the best results.
(Technical details will be updated progressively—PRs are welcome!)

Service Framework for Multi-User, Multi-Task Workloads¶

We provide a long-running, hot-upgradable, horizontally-scalable Diffusion service, not a cold-start script.
The core idea is to decompose the Diffusion pipeline into composable stages orchestrated by a unified scheduler:

Let users tune their quality-efficiency tradeoff: inference steps, CFG, cache ratio—all adjustable at runtime.
Keep all resources fully utilized: not just compute, but also memory, bandwidth, and CPU.

Developer Guide¶

Thanks for contributing to Smart-Diffusion! To make code review easier, please understand our parameter taxonomy:

Category	Lifecycle	Location	Who Can Change	Best Practice
Model params	Static	`chitu_core/config/models/<model>.yaml`	Nobody	Tied to weights; changes will break things
User params	Dynamic (per-request)	`DiffusionUserParams`	End user	Expose only necessary parameters; keep it simple
System params	Semi-dynamic (init-time)	`chitu launch args`	Ops/Scheduler	No changes after init; prevents distributed state issues

Remember:
Every extra parameter adds documentation, testing, and user complexity. Flexibility ≠ more parameters.

Directory Structure¶

/chitu_core contains Chitu's native code. Avoid modifying ServeConfig and ParallelState unless necessary. /chitu_diffusion is our diffusion framework built on Chitu's architecture. It can be modified but should maintain the basic structure. * chitu_diffusion_main.py: Main parameters for system initialization, startup, and shutdown * backend.py: Backend built on system parameters, stores models and schedules tasks.