Skip to content

Quick Start

This guide will help you generate your first video with Smart-Diffusion in just a few minutes.

Prerequisites

Before starting, make sure you have:

  1. Installed Smart-Diffusion
  2. Downloaded a model checkpoint (see Model Downloads)

Model Downloads

Smart-Diffusion currently supports the Wan-T2V series models:

Model Size Download
Wan2.1-T2V-1.3B 1.3B Hugging Face
Wan2.1-T2V-14B 14B Hugging Face
Wan2.2-T2V-A14B 14B Hugging Face

Download the model checkpoint to a local directory, e.g., /path/to/Wan2.1-T2V-1.3B.

Basic Generation

Step 1: Create a Test Script

Create a file named test_generate.py:

from chitu_diffusion import chitu_init, chitu_generate, chitu_start, chitu_terminate
from chitu_diffusion.task import DiffusionUserParams, DiffusionTask, DiffusionTaskPool
from hydra import compose, initialize

# Initialize configuration
initialize(config_path="config", version_base=None)
args = compose(config_name="wan")

# Set model checkpoint path
args.models.ckpt_dir = "/path/to/Wan2.1-T2V-1.3B"

# Initialize the backend
chitu_init(args)
chitu_start()

# Create a generation task
user_params = DiffusionUserParams(
    role="user1",
    prompt="A cat walking on grass.",
    num_inference_steps=50,
    height=480,
    width=848,
    num_frames=81,
    guidance_scale=7.0,
)

# Add task to pool
task = DiffusionTask.from_user_request(user_params)
DiffusionTaskPool.add(task)

# Generate until completion
while not DiffusionTaskPool.all_finished():
    chitu_generate()

# Terminate backend
chitu_terminate()

print(f"✅ Video saved to: {task.buffer.save_path}")

Step 2: Run the Script

Single GPU:

bash run.sh system_config.yaml --num-nodes 1 --gpus-per-node 1 --cfp 1

Multi-GPU (Single Node):

bash run.sh system_config.yaml --num-nodes 1 --gpus-per-node 2 --cfp 2

Multi-Node SLURM:

bash run.sh system_config.yaml --num-nodes 2 --gpus-per-node 2 --cfp 2  # 4 GPUs

Step 3: View the Output

The generated video will be saved to:

./outputs/<timestamp>_<task_id>.mp4

Parameter Customization

Adjust Video Properties

user_params = DiffusionUserParams(
    prompt="A beautiful sunset over the ocean",
    height=720,          # Video height in pixels
    width=1280,          # Video width in pixels
    num_frames=121,      # Number of frames (higher = longer video)
    fps=24,              # Frames per second
)

Control Generation Quality

user_params = DiffusionUserParams(
    prompt="A dog playing in the park",
    num_inference_steps=50,  # More steps = better quality (slower)
    guidance_scale=7.0,      # Higher = more prompt adherence
)

Set Output Path

user_params = DiffusionUserParams(
    prompt="A spaceship landing on Mars",
    save_path="./my_videos/mars_landing.mp4",
)

Using Different Attention Backends

SageAttention (Faster, Quantized)

python test_generate.py \
    models.ckpt_dir=/path/to/checkpoint \
    infer.attn_type=sage

SpargeAttention (Fastest, Sparse)

python test_generate.py \
    models.ckpt_dir=/path/to/checkpoint \
    infer.attn_type=sparge

Low Memory Mode

If you encounter Out-of-Memory errors:

python test_generate.py \
    models.ckpt_dir=/path/to/checkpoint \
    infer.diffusion.low_mem_level=2

Memory levels: - 0: All models on GPU (highest performance) - 1: VAE uses tiling - 2: T5 encoder on CPU (recommended for 24GB VRAM) - 3+: DiT model on CPU (slowest but works on limited VRAM)

Batch Generation

Generate multiple videos:

prompts = [
    "A cat walking on grass",
    "A dog playing in the park",
    "A bird flying in the sky",
]

for i, prompt in enumerate(prompts):
    user_params = DiffusionUserParams(
        role=f"user{i}",
        prompt=prompt,
        save_path=f"./outputs/video_{i}.mp4",
    )
    task = DiffusionTask.from_user_request(user_params)
    DiffusionTaskPool.add(task)

# Generate all tasks
while not DiffusionTaskPool.all_finished():
    chitu_generate()

Example Outputs

Here are some example generations with different prompts:

Example 1: Nature Scene

prompt = "A serene mountain lake at sunrise, mist rising from the water"
# Resolution: 1280x720, 121 frames, 24 fps

Example 2: Urban Scene

prompt = "A busy city street at night, neon lights reflecting on wet pavement"
# Resolution: 848x480, 81 frames, 24 fps

Example 3: Abstract

prompt = "Colorful paint swirling and mixing in slow motion"
# Resolution: 720x720, 61 frames, 30 fps

Common Issues

Issue: Model Not Found

Error: FileNotFoundError: No checkpoint files found

Solution: Verify the checkpoint path is correct:

ls /path/to/checkpoint/diffusion_pytorch_model.safetensors

Issue: Out of Memory

Error: CUDA out of memory

Solutions: 1. Use lower resolution: height=480, width=848 2. Enable low memory mode: infer.diffusion.low_mem_level=2 3. Reduce batch size or frames: num_frames=61

Issue: Slow Generation

Solutions: 1. Use SageAttention: infer.attn_type=sage 2. Reduce inference steps: num_inference_steps=30 3. Enable FlexCache: flexcache='teacache'

Next Steps

Now that you've generated your first video, explore:

Getting Help

Need help?