Smart-Diffusion Documentation

Welcome to the Smart-Diffusion documentation! Smart-Diffusion is a high-performance inference framework for diffusion models, offering fast execution and flexible scheduling for AI-generated content (AIGC) workloads.

What is Smart-Diffusion?

Smart-Diffusion is built on Chitu, a high-performance LLM inference framework, and extends Chitu's capabilities to the rapidly growing diffusion ecosystem, providing:

🚀 Extreme Performance: Advanced parallelism strategies and optimized kernels
🔧 Flexible Architecture: Support for multiple attention backends
💾 Memory Efficiency: Low-memory modes with intelligent model offloading
📊 Smart Caching: Feature reuse algorithms for acceleration
🎯 Simple API: Easy-to-use interface with per-request configuration

Quick Links

Get Started: Install Smart-Diffusion and run your first generation in minutes (see Installation)

User Guide: Learn how to use Smart-Diffusion effectively (see Basic Usage)

Performance Tuning: Optimize your inference for speed and memory (see Tuning Guide)

API Reference: Detailed API documentation for all components (see API Docs)

Key Features

High-Performance Inference

Smart-Diffusion accelerates inference through:

Parallelism: Context parallelism (CP), classifier-free guidance (CFG) parallelism, and data parallelism
Optimized Kernels: FlashAttention, SageAttention, SpargeAttention
Smart Scheduling: Efficient task management and resource utilization
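Two of the parallelism strategies listed above reduce to simple arithmetic that can be shown without a distributed runtime. The single-process sketch below is an illustration, not Smart-Diffusion's implementation: `shard_sequence` shows how context parallelism might split a token sequence into contiguous per-rank slices, and `cfg_combine` applies the standard classifier-free guidance formula, uncond + scale * (cond - uncond), to the conditional and unconditional predictions that CFG parallelism computes on separate ranks. Both function names are hypothetical; real code would operate on tensors and exchange results with collective communication.

```python
# Illustrative sketch only; function names are hypothetical, not Smart-Diffusion's API.
from typing import List, Sequence


def shard_sequence(tokens: Sequence[int], world_size: int, rank: int) -> List[int]:
    """Return the contiguous slice of `tokens` owned by `rank` (context parallelism).

    When the length does not divide evenly, lower ranks get one extra token,
    so every token is owned by exactly one rank.
    """
    base, extra = divmod(len(tokens), world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(tokens[start:end])


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond).

    Under CFG parallelism, `uncond` and `cond` are produced on two different
    ranks and gathered before this step.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    seq = list(range(10))
    print([shard_sequence(seq, 4, r) for r in range(4)])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 5.0))  # [5.0, 1.0]
```

Contiguous sharding keeps each rank's tokens adjacent, which is the usual starting point before attention is computed across shards.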

Memory Efficiency

Run large models on limited hardware:

Model Offloading: CPU offloading for DiT models and encoders
VAE Tiling: Tile-by-tile VAE decoding to lower peak memory usage
Flexible Configuration: Adjustable memory levels (0-3)
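VAE tiling trades some extra compute for lower peak memory. Instead of decoding the whole latent at once, the decoder runs on overlapping tiles whose outputs are cross-faded in the overlap regions to hide seams. The 1D sketch below illustrates the idea under simplifying assumptions: `tile_starts` and `tiled_apply` are hypothetical helpers, a Python list stands in for one spatial axis of the latent, and `fn` stands in for the VAE decoder and is assumed to preserve length. It is not Smart-Diffusion's tiling code, which would work on multi-dimensional tensors and may use different tile sizes and blending.

```python
# Illustrative 1D sketch of overlapping-tile processing; helper names are hypothetical.
from typing import Callable, List, Sequence


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile is flush with the end
    return starts


def tiled_apply(signal: Sequence[float],
                fn: Callable[[Sequence[float]], List[float]],
                tile: int, overlap: int) -> List[float]:
    """Apply `fn` one tile at a time and blend overlaps with linear ramp weights.

    Assumes `fn` returns an output of the same length as its input; a real VAE
    decoder upsamples, so output offsets would be scaled accordingly.
    """
    n = len(signal)
    out = [0.0] * n
    weight = [0.0] * n
    for s in tile_starts(n, tile, overlap):
        chunk = fn(signal[s:s + tile])  # only one tile is processed at a time
        m = len(chunk)
        for i, v in enumerate(chunk):
            # Weight ramps up from each tile edge, so overlapping tiles cross-fade.
            w = min(i + 1, m - i, overlap + 1)
            out[s + i] += w * v
            weight[s + i] += w
    # Normalize by the total weight each position received.
    return [o / w for o, w in zip(out, weight)]
```

For example, `tile_starts(100, 40, 8)` yields `[0, 32, 60]`, and `tiled_apply(x, lambda c: [2.0 * v for v in c], 40, 8)` reproduces `2 * x` because every position is a weighted average of identical per-tile results.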

Feature Reuse

Accelerate generation with intelligent caching:

TeaCache: Timestep Embedding Aware Cache (CVPR25)
PAB: Pyramid Attention Broadcast (ICLR25)
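Both methods skip work on denoising steps where the model's internals change little. The simplified sketch below illustrates the core ideas and is not Smart-Diffusion's implementation. `TeaCacheSketch` accumulates the relative L1 change of the timestep-modulated input across steps and reuses the cached residual of the transformer blocks until the accumulated change crosses a threshold; TeaCache's polynomial rescaling of this distance is omitted. `pab_should_recompute` follows the PAB idea of recomputing each attention type only every few steps inside a middle window of the schedule and reusing (broadcasting) its previous output otherwise. All names, and the broadcast ranges in the usage example, are hypothetical.

```python
# Simplified illustration of TeaCache- and PAB-style reuse; names are hypothetical.
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def rel_l1(curr: Sequence[float], prev: Sequence[float]) -> float:
    """Relative L1 distance ||curr - prev||_1 / ||prev||_1."""
    denom = sum(abs(p) for p in prev)
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    return num / denom if denom > 0 else float("inf")


class TeaCacheSketch:
    """TeaCache-style step skipping without polynomial rescaling."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.accumulated = 0.0
        self.prev_modulated: Optional[List[float]] = None
        self.cached_residual: Optional[List[float]] = None

    def step(self, modulated: List[float], hidden: List[float],
             blocks: Callable[[List[float]], List[float]]) -> Tuple[List[float], bool]:
        """Return (output, computed) for one denoising step."""
        if self.prev_modulated is None or self.cached_residual is None:
            compute = True  # first step: nothing cached yet
        else:
            self.accumulated += rel_l1(modulated, self.prev_modulated)
            compute = self.accumulated >= self.threshold
        self.prev_modulated = list(modulated)
        if compute:
            out = blocks(hidden)
            # Cache what the blocks added to their input, then reset the accumulator.
            self.cached_residual = [o - h for o, h in zip(out, hidden)]
            self.accumulated = 0.0
            return out, True
        # Skip the blocks: apply the cached residual to the current input.
        return [h + r for h, r in zip(hidden, self.cached_residual)], False


def pab_should_recompute(step: int, attn_type: str, ranges: Dict[str, int],
                         window: Tuple[int, int]) -> bool:
    """PAB-style schedule: always recompute outside `window`; inside it, recompute
    `attn_type` every `ranges[attn_type]` steps and broadcast the cached output otherwise."""
    start, end = window
    if step < start or step >= end:
        return True
    return (step - start) % ranges[attn_type] == 0
```

A usage example might pass `ranges = {"spatial": 2, "temporal": 4, "cross": 6}` (illustrative values), giving cross-attention the longest broadcast range and spatial attention the shortest, in line with PAB's observation that cross-attention outputs change least between steps.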

Supported Models

Currently supported:

Wan-AI/Wan2.1-T2V-1.3B (1.3B parameters)
Wan-AI/Wan2.1-T2V-14B (14B parameters)
Wan-AI/Wan2.2-T2V-A14B (two-stage mixture-of-experts: 27B total, 14B active parameters per step)

More models coming soon!

Architecture Overview

graph TD
    A[User Request] --> B[Task Pool]
    B --> C[Scheduler]
    C -->|Task| G[Generator]
    G --> VE[VAE Encoder]
    G --> TE[Text Encoder]
    TE -->|Text embeddings| DiT[DiT Loop]
    VE -->|Latents| DiT
    DiT --> VD[VAE Decoder]
    VD --> V[Output]

Smart-Diffusion follows a modular architecture:

Task Management: User requests are converted to tasks and added to the task pool
Scheduling: The scheduler selects pending tasks for execution
Generation: The generator orchestrates the full generation pipeline: text encoding (T5), iterative denoising (DiT), and VAE decoding
Output: Generated videos are saved to disk
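The request flow described above can be sketched in a few lines of Python. This is a single-process illustration only: `Task`, `TaskPool`, `Scheduler`, and `Generator` mirror the component names in the diagram, but their fields and methods here are hypothetical. The stage steps are stubs that stand in for the T5 text encoder, the DiT denoising loop, and the VAE decoder, and a first-in-first-out policy is assumed in place of whatever policy Smart-Diffusion's scheduler actually uses.

```python
# Illustrative sketch of the request flow; all classes and fields are hypothetical.
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Task:
    """One user request plus its per-request settings (hypothetical fields)."""
    task_id: int
    prompt: str
    num_steps: int = 4
    status: str = "pending"
    log: List[str] = field(default_factory=list)


class TaskPool:
    """Holds tasks converted from user requests."""

    def __init__(self) -> None:
        self._pending: Deque[Task] = deque()

    def add(self, task: Task) -> None:
        self._pending.append(task)

    def pop_pending(self) -> Optional[Task]:
        return self._pending.popleft() if self._pending else None


class Scheduler:
    """Selects the next pending task (FIFO is an assumption of this sketch)."""

    def __init__(self, pool: TaskPool) -> None:
        self.pool = pool

    def next_task(self) -> Optional[Task]:
        return self.pool.pop_pending()


class Generator:
    """Runs the pipeline stages in order; each stage is a stub that logs itself."""

    def run(self, task: Task) -> str:
        task.status = "running"
        task.log.append("text_encode")  # stand-in for T5 text encoding
        for step in range(task.num_steps):  # stand-in for the DiT denoising loop
            task.log.append(f"denoise_{step}")
        task.log.append("vae_decode")  # stand-in for VAE decoding
        task.status = "done"
        return f"task_{task.task_id}.mp4"  # stand-in output path; nothing is written


def serve(requests: List[str]) -> List[str]:
    """Convert requests to tasks, then schedule and run them until the pool is empty."""
    pool = TaskPool()
    for i, prompt in enumerate(requests):
        pool.add(Task(task_id=i, prompt=prompt))
    scheduler, generator = Scheduler(pool), Generator()
    outputs = []
    while (task := scheduler.next_task()) is not None:
        outputs.append(generator.run(task))
    return outputs
```

Here `serve(["a cat", "a dog"])` returns `["task_0.mp4", "task_1.mp4"]`. Keeping per-request settings such as `num_steps` on the task object is one way a scheduler could mix requests with different configurations.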

Community

Join our community:

GitHub: chen-yy20/SmartDiffusion
Issues: Report bugs and request features
Discussions: Ask questions and share ideas

Next Steps

Ready to get started?

Note: Smart-Diffusion is under active development. We welcome contributions and feedback!