Advanced Features

Explore Smart-Diffusion's advanced capabilities for optimal performance.

FlexCache

FlexCache reuses intermediate features across denoising steps instead of recomputing them at every step. This gives a substantial speedup with minimal quality loss, and the tradeoff is tunable.

Unified Parameters

Use the dedicated FlexCache parameter group:

- strategy: caching strategy, one of teacache, pab, or ditango
- cache_ratio: value from 0 to 1 that sets the quality-efficiency tradeoff (0 = quality-first, 1 = speed-first)
- warmup: number of initial steps that always run full compute
- cooldown: number of final steps that always run full compute

Recommended API:

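from chitu_diffusion.task import DiffusionUserParams, FlexCacheParams

user_params = DiffusionUserParams(
    prompt="A cat walking on grass",
    num_inference_steps=50,
    flexcache_params=FlexCacheParams(
        strategy="teacache",  # one of "teacache", "pab", "ditango"
        cache_ratio=0.4,      # 0 = quality-first, 1 = speed-first
        warmup=5,             # first 5 steps always run full compute
        cooldown=5,           # last 5 steps always run full compute
    ),
)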
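The way warmup and cooldown bound the region where caching can happen can be pictured with a small scheduling helper. This is a hypothetical sketch, not FlexCache's implementation. The names `full_compute_steps` and `cache_eligible_steps` are illustrative. The chosen strategy and cache_ratio still decide whether an eligible step actually reuses features.

```python
def full_compute_steps(num_steps: int, warmup: int, cooldown: int) -> list[int]:
    """Return the step indices that always run full compute (hypothetical helper)."""
    # Clamp so that warmup + cooldown can never exceed the total step count.
    warmup = min(max(warmup, 0), num_steps)
    cooldown = min(max(cooldown, 0), num_steps - warmup)
    head = list(range(warmup))
    tail = list(range(num_steps - cooldown, num_steps))
    return head + tail


def cache_eligible_steps(num_steps: int, warmup: int, cooldown: int) -> list[int]:
    """Steps where a caching strategy *may* reuse features; the strategy and
    cache_ratio still decide whether reuse actually happens."""
    forced = set(full_compute_steps(num_steps, warmup, cooldown))
    return [s for s in range(num_steps) if s not in forced]


if __name__ == "__main__":
    # Mirrors the example above: 50 steps, warmup=5, cooldown=5.
    eligible = cache_eligible_steps(50, warmup=5, cooldown=5)
    print(len(eligible), eligible[0], eligible[-1])  # 40 5 44
```

With the values from the recommended example, only steps 5 through 44 are candidates for reuse.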

TeaCache

Timestep-embedding-aware caching strategy (TeaCache) from CVPR 2025.

Usage:

from chitu_diffusion.task import DiffusionUserParams

user_params = DiffusionUserParams(
    prompt="A cat walking on grass",
    flexcache='teacache'
)

How it works:
- Monitors feature change rates across denoising steps
- Reuses features when changes are minimal
- Typically provides 30-40% speedup
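The reuse decision can be illustrated with a simplified, TeaCache-style accumulator. It sums the relative L1 change of the model input between consecutive steps and recomputes only once the accumulated change crosses a threshold. This is a sketch under stated assumptions:

- The original TeaCache method measures change on timestep-embedding-modulated inputs and rescales it with a fitted polynomial. Both refinements are omitted here.
- `TeaCacheGate` and `threshold` are hypothetical names, not this project's API.

```python
class TeaCacheGate:
    """Simplified TeaCache-style gate (hypothetical; omits polynomial rescaling)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.prev_input = None
        self.accumulated = 0.0

    @staticmethod
    def rel_l1(curr, prev):
        # Relative L1 distance: sum|curr - prev| / sum|prev|.
        num = sum(abs(c - p) for c, p in zip(curr, prev))
        den = sum(abs(p) for p in prev) or 1e-12
        return num / den

    def should_compute(self, model_input) -> bool:
        if self.prev_input is None:
            # First step: nothing is cached yet, so compute.
            self.prev_input = list(model_input)
            return True
        self.accumulated += self.rel_l1(model_input, self.prev_input)
        self.prev_input = list(model_input)
        if self.accumulated < self.threshold:
            return False  # accumulated change is small: reuse the cached output
        self.accumulated = 0.0  # recompute and reset the accumulator
        return True


if __name__ == "__main__":
    gate = TeaCacheGate(threshold=0.1)
    inputs = [[1.0, 1.0], [1.02, 1.0], [1.04, 1.0], [1.5, 1.0]]
    print([gate.should_compute(x) for x in inputs])  # [True, False, False, True]
```

The two small input changes accumulate to about 0.02 and are skipped. The large jump at the last step pushes the total past 0.1 and forces a recompute.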

Pyramid Attention Broadcast (PAB)

Pyramid-style broadcasting of attention outputs across denoising steps, from ICLR 2025.

Usage:

from chitu_diffusion.task import DiffusionUserParams

user_params = DiffusionUserParams(
    prompt="A cat walking on grass",
    flexcache='pab'
)

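How it works:
- Exploits the observation that attention outputs change little between adjacent steps in the middle of the denoising process
- Broadcasts (reuses) an attention output to subsequent steps instead of recomputing it
- Uses a different broadcast range per attention type (longest for cross-attention, then temporal, shortest for spatial), which forms the "pyramid"
- Typically provides 40-50% speedup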
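The pyramid schedule can be sketched as a per-attention-type decision. Inside a stable middle window of steps, each attention type is recomputed only every `range` steps, and its cached output is broadcast to the steps in between. Several details here are illustrative assumptions:

- The window bounds are assumed values.
- The broadcast ranges (2 for spatial, 4 for temporal, 6 for cross-attention) are assumed values.
- `pab_should_compute` is a hypothetical helper, not this project's API.

```python
# Illustrative broadcast ranges (assumed values): attention types whose outputs
# vary less between steps are reused for longer, forming the "pyramid".
BROADCAST_RANGE = {"spatial": 2, "temporal": 4, "cross": 6}


def pab_should_compute(step: int, attn_type: str, window: tuple[int, int]) -> bool:
    """Return True if this attention type must be recomputed at `step`.

    Outside the [start, end) window every attention is computed in full;
    inside it, attention is recomputed every BROADCAST_RANGE[attn_type]
    steps and the cached output is broadcast to the steps in between.
    """
    start, end = window
    if step < start or step >= end:
        return True
    return (step - start) % BROADCAST_RANGE[attn_type] == 0


if __name__ == "__main__":
    window = (5, 45)  # assumed stable window for a 50-step run
    for attn in ("spatial", "temporal", "cross"):
        computed = sum(pab_should_compute(s, attn, window) for s in range(50))
        print(attn, computed)  # spatial 30, temporal 20, cross 17
```

In this toy run, cross-attention is recomputed on only 17 of 50 steps, while spatial attention still runs on 30.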

DiTango

ASE-guided grouped compute/reuse strategy.

Usage:

from chitu_diffusion.task import DiffusionUserParams

user_params = DiffusionUserParams(
    prompt="A cat walking on grass",
    flexcache='ditango'
)

How it works:
- Estimates group-level reuse confidence from ASE
- Recomputes groups when confidence is insufficient
- Uses step-level anchor gating to control drift

Current behavior notes:
- The local partition is always computed at every step and merged separately.
- The anchor gate and the per-group compute/reuse plan are synchronized across the CFG positive and negative branches.
- cache_ratio drives both the anchor gate's aggressiveness and the quantile used to update the global ASE threshold.
- Implementation path: chitu_diffusion/flex_cache/strategy/ditango/ditango.py.
- Decision visualization output: <output_dir>/ditango_policy_step_layer_group.ppm.
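The decision flow described in the notes can be illustrated with a toy planner. This is a hypothetical sketch, not the code in ditango.py. Its assumptions:

- The per-group `scores` stand in for ASE-derived reuse confidence, with higher meaning safer to reuse.
- The threshold is taken as a per-step quantile of those scores controlled by cache_ratio.
- `reuse_threshold` and `plan_step` are illustrative names.

It only shows the order of decisions. An anchor step recomputes every group. Otherwise, groups below the threshold are recomputed. One plan is shared by both CFG branches, and the local partition is computed regardless.

```python
import math


def reuse_threshold(scores: list[float], cache_ratio: float) -> float:
    """Hypothetical quantile threshold: a larger cache_ratio lowers the bar
    so that more groups qualify for reuse."""
    cache_ratio = min(max(cache_ratio, 0.0), 1.0)
    ranked = sorted(scores)
    n_reuse = int(cache_ratio * len(ranked))
    if n_reuse == 0:
        return math.inf  # quality-first: nothing is reused
    return ranked[len(ranked) - n_reuse]


def plan_step(scores: list[float], cache_ratio: float, is_anchor: bool) -> list[str]:
    """Return a per-group plan ("compute" or "reuse") for one step.

    The same plan would be applied to both the CFG positive and negative
    branches; the local partition is always computed and is not part of
    this plan.
    """
    if is_anchor:
        return ["compute"] * len(scores)  # anchor step: refresh every group
    threshold = reuse_threshold(scores, cache_ratio)
    return ["reuse" if s >= threshold else "compute" for s in scores]


if __name__ == "__main__":
    print(plan_step([0.9, 0.2, 0.7, 0.4], cache_ratio=0.5, is_anchor=False))
    # ['reuse', 'compute', 'reuse', 'compute']
```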

Context Parallelism

Split long sequences across multiple GPUs.

Configuration:

python test_generate.py \
    infer.diffusion.cp_size=2 \
    infer.diffusion.up_limit=81

Benefits:
- Handles longer videos (more frames)
- Per-GPU activation memory for the sequence shrinks roughly in proportion to cp_size
- Near-linear speedup
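A minimal sketch of the sequence split makes the memory point concrete. It assumes tokens are divided into contiguous, equally sized chunks, with the length conceptually padded to a multiple of cp_size. `split_sequence` is a hypothetical helper. The attention communication between ranks is not shown.

```python
import math


def split_sequence(num_tokens: int, cp_size: int) -> list[range]:
    """Assign contiguous token ranges to each context-parallel rank.

    The sequence is conceptually padded up to a multiple of cp_size so
    every rank gets the same chunk length; padded positions are clipped
    from the returned ranges.
    """
    chunk = math.ceil(num_tokens / cp_size)
    return [
        range(rank * chunk, min((rank + 1) * chunk, num_tokens))
        for rank in range(cp_size)
    ]


if __name__ == "__main__":
    for rank, r in enumerate(split_sequence(num_tokens=10, cp_size=4)):
        print(rank, r.start, r.stop)
```

With cp_size=2, each rank holds roughly half of the token activations. That is where the per-GPU memory saving comes from.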

CFG Parallelism

Runs the positive (conditional) and negative (unconditional) classifier-free guidance branches in parallel on 2 GPUs.

Enabled automatically when:
- world_size >= 2
- guidance_scale > 0

Benefits:
- Up to 2x speedup of the CFG denoising passes, since each GPU computes one branch instead of both
- No extra per-GPU memory overhead
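The guidance formula shows why CFG parallelism adds little overhead. The two branches are independent forward passes, combined only at the end of each step. The sketch below assumes the common formulation uncond + guidance_scale * (cond - uncond) and simulates both ranks in one process with plain Python lists. `combine_cfg`, `fake_denoiser`, and `cfg_parallel_step` are hypothetical stand-ins. In a real run, the two predictions would be exchanged between GPUs rather than computed sequentially.

```python
def combine_cfg(cond, uncond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


def fake_denoiser(latent, prompt_embedding):
    # Hypothetical stand-in for the DiT forward pass.
    return [x * 0.5 + prompt_embedding for x in latent]


def cfg_parallel_step(latent, pos_emb, neg_emb, guidance_scale):
    # Rank 0 would run the positive (conditional) branch...
    cond = fake_denoiser(latent, pos_emb)
    # ...while rank 1 runs the negative (unconditional) branch concurrently.
    uncond = fake_denoiser(latent, neg_emb)
    # After exchanging predictions, every rank applies the same combination.
    return combine_cfg(cond, uncond, guidance_scale)


if __name__ == "__main__":
    print(cfg_parallel_step([2.0, 4.0], pos_emb=1.0, neg_emb=0.0, guidance_scale=5.0))
    # [6.0, 7.0]
```

Only the cheap elementwise combination needs both predictions. The expensive forward passes split cleanly across the two GPUs.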

Custom Prompts

Effective Prompt Engineering

Good prompts:
- "A fluffy cat walking slowly through tall green grass on a sunny day"
- "Close-up of ocean waves crashing on a rocky shore at sunset"

Less effective:
- "cat grass" (too brief)
- "A cat walking on grass and also playing with a ball while a dog runs" (too complex)

Negative Prompts (Future)

Support for negative prompts coming soon.
