# Smart-Diffusion Documentation
Welcome to the Smart-Diffusion documentation! Smart-Diffusion is a high-performance diffusion model inference framework that provides extreme performance and flexible scheduling for AI-generated content (AIGC) workloads.
## What is Smart-Diffusion?
Smart-Diffusion is built on Chitu, a high-performance LLM inference framework. It extends Chitu's capabilities to support the rapidly growing Diffusion ecosystem, providing:
- **Extreme Performance**: Advanced parallelism strategies and optimized kernels
- **Flexible Architecture**: Support for multiple attention backends
- **Memory Efficiency**: Low-memory modes with intelligent model offloading
- **Smart Caching**: Feature-reuse algorithms for acceleration
- **Simple API**: Easy-to-use interface with per-request configuration
## Quick Links

- **Get Started**: Install Smart-Diffusion and run your first generation in minutes
- **User Guide**: Learn how to use Smart-Diffusion effectively
- **Performance Tuning**: Optimize your inference for speed and memory
- **API Reference**: Detailed API documentation for all components
## Key Features

### High-Performance Inference
Smart-Diffusion achieves superior performance through:
- Parallelism: Context parallelism (CP), CFG parallelism, and data parallelism
- Optimized Kernels: FlashAttention, SageAttention, SpargeAttention
- Smart Scheduling: Efficient task management and resource utilization
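To make the parallelism idea concrete, here is a minimal sketch of why CFG (classifier-free guidance) parallelism works: the conditional and unconditional denoising passes at each step are independent, so they can run concurrently on separate ranks and be combined afterwards. The function names and scalar "latents" below are purely illustrative, not Smart-Diffusion's API.

```python
def denoise(latent, embedding):
    """Stand-in for one DiT forward pass (hypothetical)."""
    return [x + embedding for x in latent]

def cfg_step(latent, cond_emb, uncond_emb, guidance_scale):
    # On real hardware, these two calls run concurrently on two ranks;
    # only the combine step needs both results.
    cond_out = denoise(latent, cond_emb)      # rank 0: conditional pass
    uncond_out = denoise(latent, uncond_emb)  # rank 1: unconditional pass
    # Classifier-free guidance: uncond + scale * (cond - uncond)
    return [u + guidance_scale * (c - u) for c, u in zip(cond_out, uncond_out)]

result = cfg_step([0.0, 1.0], cond_emb=1.0, uncond_emb=0.2, guidance_scale=5.0)
```

Because the two passes share no intermediate state, splitting them across devices roughly halves per-step latency at the cost of one extra synchronization per step.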
### Memory Efficiency
Run large models on limited hardware:
- Model Offloading: CPU offloading for DiT models and encoders
- VAE Tiling: Reduced memory usage during decoding
- Flexible Configuration: Adjustable memory levels (0-3)
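The intuition behind VAE tiling can be sketched in a few lines: decode a large latent grid in small tiles so peak memory is bounded by the tile size rather than the full frame. The `decode` stand-in below is pointwise, so tiling is exact here; a real VAE uses overlapping tiles and blends the seams. All names are illustrative assumptions, not the actual API.

```python
def decode(tile):
    # Stand-in for VAE decoding of one tile (hypothetical, pointwise).
    return [[2 * v for v in row] for row in tile]

def tiled_decode(latent, tile_size):
    """Decode `latent` tile by tile, bounding peak memory per call."""
    h, w = len(latent), len(latent[0])
    out = [[0] * w for _ in range(h)]
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            tile = [row[x:x + tile_size] for row in latent[y:y + tile_size]]
            for dy, row in enumerate(decode(tile)):
                out[y + dy][x:x + len(row)] = row
    return out

latent = [[y * 4 + x for x in range(4)] for y in range(4)]
assert tiled_decode(latent, 2) == decode(latent)  # same output, smaller working set
```

The trade-off is extra kernel launches (one per tile) in exchange for a peak-memory ceiling that no longer scales with output resolution.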
### Feature Reuse
Accelerate generation with intelligent caching:
- TeaCache: Temporal adaptive caching (CVPR24)
- PAB: Pyramid attention broadcasting (ICLR25)
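The core idea these caches share can be sketched as follows: when the model's input changes little between consecutive denoising steps, reuse the previous step's output instead of recomputing it. The class name, threshold, and scalar inputs below are illustrative assumptions, not Smart-Diffusion's implementation.

```python
class FeatureCache:
    """Skip recomputation when the input's relative change is small (sketch)."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.last_input = None
        self.last_output = None
        self.hits = 0  # number of steps where recomputation was skipped

    def __call__(self, model, x):
        if self.last_input is not None:
            rel_change = abs(x - self.last_input) / (abs(self.last_input) + 1e-8)
            if rel_change < self.threshold:
                self.hits += 1
                return self.last_output  # reuse cached features
        self.last_input, self.last_output = x, model(x)
        return self.last_output

cache = FeatureCache(threshold=0.05)
model = lambda x: x * 2  # stand-in for an expensive DiT forward pass
outputs = [cache(model, t) for t in [1.00, 1.01, 1.02, 1.50]]
```

With the inputs above, the middle two steps are served from the cache, so the expensive `model` runs only twice; the threshold trades a small quality drift for fewer forward passes.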
## Supported Models
Currently supported:
- Wan-AI/Wan2.1-T2V-1.3B (1.3B parameters)
- Wan-AI/Wan2.1-T2V-14B (14B parameters)
- Wan-AI/Wan2.2-T2V-A14B (14B parameters, two-stage)
More models coming soon!
## Architecture Overview

```mermaid
graph TD
    A[User Request] --> B[Task Pool]
    B --> C[Scheduler]
    C -->|Task| G[Generator]
    G --> TE[Text Encoder]
    G --> VE[VAE Encoder]
    TE -->|Text Embeddings| DiT[DiT Loop]
    VE -->|Latents| DiT
    DiT --> VD[VAE Decoder]
    VD --> V[Output]
```
Smart-Diffusion follows a modular architecture:
- **Task Management**: User requests are converted to tasks and added to the task pool
- **Scheduling**: The scheduler selects pending tasks for execution
- **Generation**: The generator orchestrates the full generation pipeline:
    - Text encoding (T5)
    - Iterative denoising (DiT)
    - VAE decoding
- **Output**: Generated videos are saved to disk
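The flow above can be sketched as a minimal task-pool and scheduler loop. Every class, method, and stage name here is a hypothetical stand-in for illustration, not the actual Smart-Diffusion API.

```python
from collections import deque

class TaskPool:
    """Holds pending generation tasks (sketch)."""
    def __init__(self):
        self.pending = deque()

    def submit(self, prompt):
        self.pending.append({"prompt": prompt, "status": "pending"})

def generate(task):
    # The three pipeline stages from the diagram, as string stand-ins.
    embedding = f"t5({task['prompt']})"        # text encoding (T5)
    latent = f"dit_loop({embedding})"          # iterative denoising (DiT)
    task["output"] = f"vae_decode({latent})"   # VAE decoding
    task["status"] = "done"

def run_scheduler(pool):
    """Drain the pool, running each pending task through the generator."""
    done = []
    while pool.pending:
        task = pool.pending.popleft()
        generate(task)
        done.append(task)
    return done

pool = TaskPool()
pool.submit("a cat surfing")
finished = run_scheduler(pool)
```

Decoupling submission (the pool) from execution (the scheduler) is what lets per-request configuration and batching decisions happen at scheduling time rather than at submission time.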
## Community
Join our community:
- GitHub: chen-yy20/SmartDiffusion
- Issues: Report bugs and request features
- Discussions: Ask questions and share ideas
## Next Steps
Ready to get started?
- Install Smart-Diffusion
- Run your first generation
- Explore advanced features
- Read the design philosophy
Note: Smart-Diffusion is under active development. We welcome contributions and feedback!