FlexCache

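FlexCache is Smart-Diffusion's unified feature-reuse acceleration framework. It speeds up inference by reusing cached features instead of recomputing them at every denoising step.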

Unified User API

FlexCache is configured through a single dedicated parameter group, FlexCacheParams:

- strategy: one of teacache, pab, or ditango
- cache_ratio: required; a value in the range [0, 1]
- warmup: required; the first warmup denoising steps always run full computation (no cache reuse)
- cooldown: required; the last cooldown denoising steps always run full computation (no cache reuse)

cache_ratio is the only knob most users need to tune:

- 0.0: quality-first, conservative cache reuse
- 1.0: speed-first, aggressive cache reuse

warmup and cooldown are advanced controls.
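The warmup/cooldown window can be pictured with the short sketch below. It assumes 0-based step indices, and `is_full_compute_step` and `cache_eligible_steps` are hypothetical helpers for illustration, not part of the chitu_diffusion API. Steps inside the window are only *eligible* for reuse; whether a given step actually reuses cached features is still decided by the strategy and cache_ratio.

```python
def is_full_compute_step(step: int, num_inference_steps: int,
                         warmup: int, cooldown: int) -> bool:
    """True if 0-based `step` falls in the warmup or cooldown window (no cache reuse)."""
    in_warmup = step < warmup
    in_cooldown = step >= num_inference_steps - cooldown
    return in_warmup or in_cooldown


def cache_eligible_steps(num_inference_steps: int, warmup: int, cooldown: int) -> list[int]:
    """Steps outside warmup/cooldown, where a strategy *may* reuse cached features."""
    return [
        s for s in range(num_inference_steps)
        if not is_full_compute_step(s, num_inference_steps, warmup, cooldown)
    ]


if __name__ == "__main__":
    eligible = cache_eligible_steps(50, warmup=5, cooldown=5)
    print(len(eligible), eligible[0], eligible[-1])  # 40 5 44
```

With 50 steps and warmup=cooldown=5, steps 0-4 and 45-49 always run full compute, leaving 40 steps where cache reuse is possible.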

Strategy Mapping

Each strategy maps the same cache_ratio scale onto its own core control:

| Strategy | Internal control | Mapping direction |
| --- | --- | --- |
| TeaCache | `teacache_thresh` | higher ratio -> larger threshold -> more reuse |
| PAB | `skip_self_range` | higher ratio -> larger skip range -> more reuse |
| DiTango | `anchor_rel_err_threshold` + global `ase_threshold` quantile | higher ratio -> stronger reuse preference |
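The table only fixes the direction of each mapping. One plausible shape is a linear interpolation between per-strategy bounds, sketched below; the bounds `TEACACHE_THRESH_RANGE` and `PAB_SKIP_SELF_RANGE`, the linear form, and the helper names are all assumptions for illustration, not the values or code FlexCache actually uses.

```python
# Hypothetical bounds, for illustration only; FlexCache's real ranges are internal.
TEACACHE_THRESH_RANGE = (0.05, 0.30)  # (value at ratio 0.0, value at ratio 1.0)
PAB_SKIP_SELF_RANGE = (1, 6)


def lerp(lo: float, hi: float, ratio: float) -> float:
    """Linearly interpolate from lo (ratio 0.0) to hi (ratio 1.0)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"cache_ratio must be in [0, 1], got {ratio}")
    return lo + (hi - lo) * ratio


def teacache_thresh(cache_ratio: float) -> float:
    # Higher ratio -> larger threshold -> more reuse.
    return lerp(*TEACACHE_THRESH_RANGE, cache_ratio)


def pab_skip_self_range(cache_ratio: float) -> int:
    # Higher ratio -> larger (integer) skip range -> more reuse.
    lo, hi = PAB_SKIP_SELF_RANGE
    return round(lerp(lo, hi, cache_ratio))


if __name__ == "__main__":
    for r in (0.0, 0.45, 1.0):
        print(r, teacache_thresh(r), pab_skip_self_range(r))
```

Any monotonically increasing mapping would satisfy the table; linear interpolation is just the simplest choice that keeps 0.0 and 1.0 pinned to the ends of each range.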

All other strategy internals are fixed by design to keep the API consistent across strategies.

DiTango runtime notes:

- cache_ratio drives both anchor gating and the quantile update of the global ASE threshold.
- The local partition is always recomputed at every step, for stability.
- The strategy implementation now lives in chitu_diffusion/flex_cache/strategy/ditango/ditango.py.
- A merged decision map is written to <output_dir>/ditango_policy_step_layer_group.ppm.
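To illustrate how a cache_ratio-driven quantile threshold turns a ratio into reuse decisions, here is a minimal sketch. It assumes one scalar error score per candidate and a nearest-rank quantile; `ase_threshold`, `reuse_mask`, and the `<=` reuse rule are hypothetical and are not taken from ditango.py.

```python
def ase_threshold(errors: list[float], cache_ratio: float) -> float:
    """Return the cache_ratio-quantile of `errors` (nearest rank on sorted values)."""
    if not errors:
        raise ValueError("need at least one error sample")
    if not 0.0 <= cache_ratio <= 1.0:
        raise ValueError(f"cache_ratio must be in [0, 1], got {cache_ratio}")
    ordered = sorted(errors)
    # ratio 0.0 -> smallest observed error, ratio 1.0 -> largest.
    idx = round(cache_ratio * (len(ordered) - 1))
    return ordered[idx]


def reuse_mask(errors: list[float], cache_ratio: float) -> list[bool]:
    """True where a candidate's error is at or below the threshold (reuse its cache)."""
    thr = ase_threshold(errors, cache_ratio)
    return [e <= thr for e in errors]


if __name__ == "__main__":
    errs = [0.1, 0.4, 0.2, 0.3, 0.5]  # made-up per-candidate errors
    for r in (0.0, 0.5, 1.0):
        print(r, reuse_mask(errs, r))
```

Raising cache_ratio moves the threshold up the sorted errors, so the set of reused candidates can only grow. In this sketch, even 0.0 still reuses the single lowest-error candidate; a real implementation may treat that boundary differently.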

Usage

Recommended style

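```python
from chitu_diffusion.task import DiffusionUserParams, FlexCacheParams

params = DiffusionUserParams(
    prompt="A cat walking on grass",
    num_inference_steps=50,
    flexcache_params=FlexCacheParams(
        strategy="ditango",  # "teacache", "pab", or "ditango"
        cache_ratio=0.45,    # 0.0 = quality-first, 1.0 = speed-first
        warmup=5,            # first 5 steps always run full compute
        cooldown=5,          # last 5 steps always run full compute
    ),
)
```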

Legacy-compatible style

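```python
from chitu_diffusion.task import DiffusionUserParams

params = DiffusionUserParams(
    prompt="A cat walking on grass",
    flexcache="teacache",  # legacy form: strategy name passed as a string
)
```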

Disable FlexCache

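```python
from chitu_diffusion.task import DiffusionUserParams

params = DiffusionUserParams(
    prompt="A cat walking on grass",
    flexcache=None,  # no feature reuse for this task
)
```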

System Switch

Enable FlexCache globally at startup:

```yaml
infer:
  enable_flexcache: true
```

If enable_flexcache is false, task-level FlexCache requests are ignored.
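The precedence between the system switch and a task-level request can be expressed as a one-line rule. This sketch assumes the YAML above has already been parsed into a plain dict; `effective_flexcache` is a hypothetical helper, and treating a missing `enable_flexcache` key as disabled is an assumption, not documented behavior.

```python
from typing import Any, Optional


def effective_flexcache(config: dict, task_request: Optional[Any]) -> Optional[Any]:
    """Return the FlexCache request a task actually runs with."""
    # Assumption: a missing infer.enable_flexcache key is treated as disabled.
    enabled = bool(config.get("infer", {}).get("enable_flexcache", False))
    # System switch off: any task-level request is ignored.
    return task_request if enabled else None


if __name__ == "__main__":
    print(effective_flexcache({"infer": {"enable_flexcache": True}}, "teacache"))   # teacache
    print(effective_flexcache({"infer": {"enable_flexcache": False}}, "teacache"))  # None
```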

Parameter Validation

FlexCache validates parameters during task preparation:

- cache_ratio must be in [0, 1]
- warmup >= 0
- cooldown >= 0
- warmup + cooldown < num_inference_steps
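For checking user input before submitting a task, the same rules can be mirrored in a standalone function. `validate_flexcache` is a hypothetical helper: it follows the rules listed above but is not FlexCache's own validator, and its error messages are illustrative.

```python
def validate_flexcache(cache_ratio: float, warmup: int, cooldown: int,
                       num_inference_steps: int) -> None:
    """Raise ValueError listing every violated FlexCache rule."""
    problems = []
    if not 0.0 <= cache_ratio <= 1.0:  # also rejects NaN
        problems.append(f"cache_ratio must be in [0, 1], got {cache_ratio}")
    if warmup < 0:
        problems.append(f"warmup must be >= 0, got {warmup}")
    if cooldown < 0:
        problems.append(f"cooldown must be >= 0, got {cooldown}")
    # This guarantees at least one step lies outside warmup/cooldown.
    if warmup + cooldown >= num_inference_steps:
        problems.append(
            f"warmup + cooldown ({warmup + cooldown}) must be < "
            f"num_inference_steps ({num_inference_steps})"
        )
    if problems:
        raise ValueError("; ".join(problems))


if __name__ == "__main__":
    validate_flexcache(0.45, warmup=5, cooldown=5, num_inference_steps=50)  # passes
    try:
        validate_flexcache(1.2, warmup=30, cooldown=25, num_inference_steps=50)
    except ValueError as exc:
        print(exc)
```

Collecting all violations before raising reports every problem at once instead of making the user fix them one at a time.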