Chore(release): v6.13.5 (DO NOT MERGE) by lstein · Pull Request #9325 · invoke-ai/InvokeAI

lstein · 2026-07-01T02:26:40Z

Summary

This is the working branch for v6.13.5. Do not merge until after the final release.

Related Issues / Discussions

QA Instructions

Merge Plan

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

@Pfannkuchensack

…room before decode/encode (#9305)

* fix(qwen): estimate VAE working memory so the cache frees room before decode/encode

The Qwen Image l2i/i2l invocations called `model_on_device()` without a `working_mem_bytes` estimate, unlike the SD/SDXL path. The model cache therefore only reserved the default `device_working_mem_gb` and never evicted the resident transformer/text encoder before the VAE decode. On a near-full card (e.g. Qwen Image Edit Q8_0 with transformer + text encoder resident) the decode then OOMs trying to allocate its working set into the fragmented remainder.

Add `estimate_vae_working_memory_qwen_image()` and pass it into both the decode and encode paths so the cache makes room (evicting other models when needed) before the operation runs.

The constant is calibrated against a measured decode on an AMD W7900: at 1248x832 the decode grew CUDA reserved memory by ~10.06 GiB (implied constant ~5082), rounded up to 5500 for headroom. It tracks peak *reserved* (not just allocated) memory so that whenever the cache declines to free room (free >= estimate) the decode is still guaranteed to fit. Encode uses ~half, matching the other estimators (not independently measured).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(qwen): cover VAE working-memory estimate is passed to cache

Address review feedback from @Pfannkuchensack on #9305:

- Add test_qwen_image_working_memory.py mirroring the z-image pattern, asserting both decode and encode paths call model_on_device with the estimated working_mem_bytes (regression guard for the OOM fix).
- Clarify the qwen estimator comment: the encode constant is not independently measured (half of decode, matching siblings' ratio) and should be recalibrated against a measured encode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(qwen): recalibrate VAE working-memory constants from a measured grid

Add scripts/calibrate_qwen_vae_working_memory.py, a backend-portable (CUDA/ROCm) harness that measures peak reserved-memory growth for VAE decode/encode across a resolution grid, one fresh subprocess per point.

Calibrating on an AMD W7900 (fp16) showed the encode constant was wrong: the previous 2750 ("half of decode") under-estimated by ~2x at every measured resolution, the exact OOM mode Qwen Image Edit (which encodes a real image) would hit. Raise encode 2750 -> 6300. Decode 5500 is confirmed safe across the full 512^2..2048^2 range and left unchanged.

The grid also showed memory is super-linear in area above ~1792^2 (an attention term) and non-monotonic (likely an SDPA-backend crossover on ROCm); both documented in the estimator. Constants are the conservative ROCm side and will be max-merged with a pending NVIDIA/CUDA run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(qwen): branch VAE working-memory constants by backend (ROCm vs CUDA)

Calibrating the same fp16 grid on an NVIDIA card showed CUDA reserves ~2x (decode) to ~4x (encode) less than ROCm: the Qwen VAE is attention-heavy, and CUDA's Flash/efficient attention is O(area) and flat while the ROCm math-attention fallback is O(area^2). The backends diverge far more than any headroom, so a single constant either under-estimates on ROCm (OOM) or massively over-budgets CUDA (needless eviction).

Select constants via torch.version.hip:

decode: ROCm 5500 / CUDA 2900
encode: ROCm 6300 / CUDA 1600

Each verified to cover its measured grid (19 points/backend) with ~8% headroom. The CUDA run also confirms the linear model holds with Flash attention (the ROCm super-linear/non-monotonic behavior is a math-attention artifact), and that "encode is half of decode" is CUDA-only.

Add parametrized tests asserting the constant selected for each (operation, backend) so a refactor can't silently swap them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(backend): ruff

* calibrate: support single-file Qwen Image VAE checkpoints

The calibration script only loaded the Qwen VAE from a diffusers directory via from_pretrained, so passing a single .safetensors file failed. Add _load_vae, which loads a directory as before and handles a single-file checkpoint by loading the state dict directly: a strict load for the diffusers layout, falling back to convert_wan_vae_to_diffusers for the original Qwen-Image/Wan release layout (downsamples/residual/time_conv keys) before retrying.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Alexander Eichhorn <alex@eichhorn.dev>
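For readers skimming the commit message, the backend-branched estimate can be sketched roughly as follows. This is not the code from the PR: the function name `estimate_working_mem_bytes`, the `is_rocm` parameter, and the formula (height × width × element size × constant) are assumptions made for illustration. Only the four constants are taken from the commit message, which says the real code selects them via `torch.version.hip`.

```python
# Scaling constants quoted in the commit message, keyed by (operation, is_rocm).
SCALING_CONSTANTS: dict[tuple[str, bool], int] = {
    ("decode", True): 5500,   # ROCm
    ("decode", False): 2900,  # CUDA
    ("encode", True): 6300,   # ROCm
    ("encode", False): 1600,  # CUDA
}


def estimate_working_mem_bytes(
    operation: str,
    height: int,
    width: int,
    element_size: int = 2,
    is_rocm: bool = False,
) -> int:
    """Hypothetical linear-in-area estimate of VAE working memory, in bytes.

    Assumption: bytes = height * width * element_size * constant.
    In real code `is_rocm` would be derived from the PyTorch build,
    e.g. `torch.version.hip is not None`; it is a plain argument here
    so the sketch runs without PyTorch installed.
    """
    constant = SCALING_CONSTANTS[(operation, is_rocm)]
    return height * width * element_size * constant


if __name__ == "__main__":
    rocm_decode = estimate_working_mem_bytes("decode", 832, 1248, is_rocm=True)
    print(f"ROCm decode at 1248x832 (fp16): {rocm_decode / 2**30:.2f} GiB")
```

Under this assumed formula the ROCm decode estimate at 1248x832 in fp16 comes out a little above the ~10.06 GiB measured growth mentioned in the first commit, which fits the "rounded up for headroom" description.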
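The single-file loading fallback described in the last commit (strict load first, then convert and retry) can be sketched generically. This is not the script's `_load_vae`: the name `load_with_layout_fallback` is hypothetical, the `convert` callable only stands in for `convert_wan_vae_to_diffusers` (whose signature is not shown in this PR text), and the sketch assumes the model raises `RuntimeError` on a key mismatch, as `torch.nn.Module.load_state_dict(strict=True)` does.

```python
from typing import Any, Callable, Mapping, Protocol


class SupportsLoadStateDict(Protocol):
    """Anything with a torch-style `load_state_dict` method."""

    def load_state_dict(self, state_dict: Mapping[str, Any], strict: bool = True) -> Any: ...


def load_with_layout_fallback(
    model: SupportsLoadStateDict,
    state_dict: Mapping[str, Any],
    convert: Callable[[Mapping[str, Any]], Mapping[str, Any]],
) -> str:
    """Try a strict load; on a key mismatch, convert the layout and retry.

    Returns "diffusers" if the first strict load succeeded, or "converted"
    if the state dict had to be converted first. A failure of the second
    strict load is deliberately not caught, so a checkpoint in neither
    layout still surfaces as an error.
    """
    try:
        model.load_state_dict(state_dict, strict=True)
        return "diffusers"
    except RuntimeError:
        # Assumed: a mismatch means the checkpoint uses the original layout.
        converted = convert(state_dict)
        model.load_state_dict(converted, strict=True)
        return "converted"
```

Keeping the second load strict means the conversion is only accepted when it yields exactly the keys the model expects.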

lstein and others added 2 commits June 29, 2026 20:48

chore(release): bump version to v6.13.5.rc1

b70f81f

lstein requested review from JPPhoto, Pfannkuchensack, blessedcoolant and dunkeroni as code owners July 1, 2026 02:26

lstein marked this pull request as draft July 1, 2026 02:26

github-actions bot added the python, invocations, backend, and python-tests labels Jul 1, 2026

lstein wants to merge 2 commits into main from lstein/chore/v6.13.5
