# Quick Start
This guide will help you generate your first video with Smart-Diffusion in just a few minutes.
## Prerequisites
Before starting, make sure you have:
- Installed Smart-Diffusion
- Downloaded a model checkpoint (see Model Downloads)
## Model Downloads
Smart-Diffusion currently supports the Wan-T2V series models:
| Model | Size | Download |
|---|---|---|
| Wan2.1-T2V-1.3B | 1.3B | Hugging Face |
| Wan2.1-T2V-14B | 14B | Hugging Face |
| Wan2.2-T2V-A14B | 14B | Hugging Face |
Download the model checkpoint to a local directory, e.g., `/path/to/Wan2.1-T2V-1.3B`.
## Basic Generation

### Step 1: Create a Test Script

Create a file named `test_generate.py`:
```python
from chitu_diffusion import chitu_init, chitu_generate, chitu_start, chitu_terminate
from chitu_diffusion.task import DiffusionUserParams, DiffusionTask, DiffusionTaskPool
from hydra import compose, initialize

# Initialize configuration
initialize(config_path="config", version_base=None)
args = compose(config_name="wan")

# Set model checkpoint path
args.models.ckpt_dir = "/path/to/Wan2.1-T2V-1.3B"

# Initialize the backend
chitu_init(args)
chitu_start()

# Create a generation task
user_params = DiffusionUserParams(
    role="user1",
    prompt="A cat walking on grass.",
    num_inference_steps=50,
    height=480,
    width=848,
    num_frames=81,
    guidance_scale=7.0,
)

# Add the task to the pool
task = DiffusionTask.from_user_request(user_params)
DiffusionTaskPool.add(task)

# Generate until completion
while not DiffusionTaskPool.all_finished():
    chitu_generate()

# Terminate the backend
chitu_terminate()

print(f"✅ Video saved to: {task.buffer.save_path}")
```
### Step 2: Run the Script
Single GPU:
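The command for this case was lost in extraction; assuming the script above was saved as `test_generate.py`, a single-GPU run is simply:

```shell
# Run the generation script on the default GPU
python test_generate.py
```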
Multi-GPU (Single Node):
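The launch command here did not survive extraction; a typical sketch using PyTorch's `torchrun` launcher (the exact parallelism options Smart-Diffusion expects are an assumption — check the Multi-GPU Setup guide):

```shell
# Launch one process per GPU on a single node;
# adjust --nproc_per_node to match your GPU count
torchrun --nproc_per_node=4 test_generate.py
```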
Multi-Node SLURM:
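The SLURM command was also lost; a hypothetical sketch — partition, account, and launcher flags must be adapted to your cluster:

```shell
# Hypothetical: 2 nodes, 4 GPUs each, one task per GPU
srun --nodes=2 --ntasks-per-node=4 --gpus-per-task=1 python test_generate.py
```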
### Step 3: View the Output

The generated video is saved to the path printed by the script (`task.buffer.save_path`); set `save_path` in `DiffusionUserParams` to choose the location explicitly.
## Parameter Customization

### Adjust Video Properties
```python
user_params = DiffusionUserParams(
    prompt="A beautiful sunset over the ocean",
    height=720,      # Video height in pixels
    width=1280,      # Video width in pixels
    num_frames=121,  # Number of frames (higher = longer video)
    fps=24,          # Frames per second
)
```
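As a quick sanity check on these settings, clip length in seconds is simply `num_frames / fps`; for the values above:

```python
num_frames = 121
fps = 24

# 121 frames at 24 fps is roughly a five-second clip
duration_seconds = num_frames / fps
print(f"{duration_seconds:.2f} s")
```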
### Control Generation Quality
```python
user_params = DiffusionUserParams(
    prompt="A dog playing in the park",
    num_inference_steps=50,  # More steps = better quality (slower)
    guidance_scale=7.0,      # Higher = stronger prompt adherence
)
```
### Set Output Path
```python
user_params = DiffusionUserParams(
    prompt="A spaceship landing on Mars",
    save_path="./my_videos/mars_landing.mp4",
)
```
## Using Different Attention Backends

### SageAttention (Faster, Quantized)
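The code for this section did not survive extraction; a minimal sketch, assuming the Hydra override named in this guide's troubleshooting section (`infer.attn_type=sage`) applied to the `args` object from Step 1:

```python
# Switch the attention backend to SageAttention before chitu_init(args).
# The override key comes from the troubleshooting section of this guide.
args.infer.attn_type = "sage"
```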
### SpargeAttention (Fastest, Sparse)
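This section's code was also lost; by analogy with the SageAttention override, a sketch — the value `"sparge"` is an assumption, so confirm the exact name in the configuration reference:

```python
# Assumed override value, mirroring infer.attn_type=sage above
args.infer.attn_type = "sparge"
```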
## Low Memory Mode

If you encounter Out-of-Memory errors, raise the memory level:

- `0`: All models on GPU (highest performance)
- `1`: VAE uses tiling
- `2`: T5 encoder on CPU (recommended for 24GB VRAM)
- `3+`: DiT model on CPU (slowest, but works on limited VRAM)
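The config example for this section is missing; a minimal sketch using the override named in this guide's troubleshooting section (`infer.diffusion.low_mem_level=2`), applied to the `args` object from Step 1:

```python
# Level 2 keeps the T5 encoder on CPU (recommended for 24GB VRAM);
# set this before calling chitu_init(args)
args.infer.diffusion.low_mem_level = 2
```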
## Batch Generation
Generate multiple videos:
```python
# Assumes chitu_init(args) and chitu_start() from Step 1 have already run
prompts = [
    "A cat walking on grass",
    "A dog playing in the park",
    "A bird flying in the sky",
]

# Queue one task per prompt
for i, prompt in enumerate(prompts):
    user_params = DiffusionUserParams(
        role=f"user{i}",
        prompt=prompt,
        save_path=f"./outputs/video_{i}.mp4",
    )
    task = DiffusionTask.from_user_request(user_params)
    DiffusionTaskPool.add(task)

# Generate all tasks
while not DiffusionTaskPool.all_finished():
    chitu_generate()
```
## Example Outputs
Here are some example generations with different prompts:
### Example 1: Nature Scene

```python
prompt = "A serene mountain lake at sunrise, mist rising from the water"
# Resolution: 1280x720, 121 frames, 24 fps
```
### Example 2: Urban Scene

```python
prompt = "A busy city street at night, neon lights reflecting on wet pavement"
# Resolution: 848x480, 81 frames, 24 fps
```
### Example 3: Abstract

```python
prompt = "Colorful paint swirling and mixing in slow motion"
# Resolution: 720x720, 61 frames, 30 fps
```
## Common Issues

### Issue: Model Not Found

Error: `FileNotFoundError: No checkpoint files found`

Solution: Verify that `args.models.ckpt_dir` points to the directory containing the downloaded checkpoint files:
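For example (substituting your actual checkpoint directory):

```shell
# The directory should exist and contain the downloaded model files
ls /path/to/Wan2.1-T2V-1.3B
```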
### Issue: Out of Memory

Error: `CUDA out of memory`

Solutions:

1. Use a lower resolution: `height=480, width=848`
2. Enable low memory mode: `infer.diffusion.low_mem_level=2`
3. Reduce the batch size or the number of frames: `num_frames=61`
### Issue: Slow Generation

Solutions:

1. Use SageAttention: `infer.attn_type=sage`
2. Reduce inference steps: `num_inference_steps=30`
3. Enable FlexCache: `flexcache='teacache'`
## Next Steps
Now that you've generated your first video, explore:
- Advanced Features - FlexCache, CFG parallelism, etc.
- Performance Tuning - Optimize for speed
- Multi-GPU Setup - Scale to multiple GPUs
- API Reference - Detailed API documentation
## Getting Help
Need help?
- Check the FAQ
- Read the User Guide
- Ask in GitHub Discussions
- Report bugs in GitHub Issues