Feat/t5 encoder gguf support by Pfannkuchensack · Pull Request #9324 · invoke-ai/InvokeAI

Pfannkuchensack · 2026-07-01T01:11:07Z

Summary

Feature: Load GGUF-quantized T5 text encoders, and show/recall the T5 encoder in image metadata.

Adds support for single-file GGUF T5 text encoders (e.g. city96/t5-v1_1-xxl-encoder-gguf, llama.cpp enc.blk.* naming) so users can run FLUX/SD3 with a small quantized T5 instead of the full ~9GB encoder. The GGUF infrastructure already existed (used by FLUX/Qwen3/Z-Image); this wires T5 into it, mirroring the existing Qwen3 GGUF encoder path.

Backend

New T5Encoder_GGUF_Config (single-file; detects enc.blk.* keys + GGML tensors) registered in the AnyModelConfig union.
New T5EncoderGGUFModel loader: remaps llama.cpp T5 keys → transformers T5, infers T5Config from tensor shapes, keeps transformer weights as GGMLTensors for the autocast cache, eagerly dequantizes the token/relative-attention-bias embeddings (embedding lookups can't run on quantized tensors) and ties encoder.embed_tokens → shared.
transformers T5 gotcha: T5DenseGatedActDense.forward casts activations to self.wo.weight.dtype unless it's torch.int8 (a bitsandbytes guard). GGML weights are torch.uint8, which slips past the guard and corrupts the feed-forward output. Worked around by rebinding the FF forward to only cast for floating-point wo weights.
Made the Qwen3 GGUF/checkpoint configs reject T5 encoders (both carry token_embd.weight; the config factory resolves multi-matches from a set, so they must be mutually exclusive — disambiguated on the enc.blk.* prefix).
Reuse the T5-XXL tokenizer already vendored in the repo instead of downloading it — moved it out of backend/anima into a neutral shared backend/t5 module used by both Anima and the GGUF loader (updated pyproject.toml package-data accordingly).
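The dtype gotcha above can be illustrated with a small dependency-free sketch. It is not the code from this PR or from transformers: dtypes are modelled as plain strings, and the function names `upstream_guard_casts` and `patched_guard_casts` are hypothetical. It only reproduces the decision described above, namely that an "anything except int8" guard lets uint8 through, while a "floating-point only" guard does not.

```python
# Dtypes are modelled as plain strings so the sketch runs without torch;
# real code would compare torch.dtype objects instead.
FLOATING_DTYPES = {"float16", "bfloat16", "float32", "float64"}


def upstream_guard_casts(activation_dtype: str, wo_dtype: str) -> bool:
    """Guard as described in the PR text: cast unless the weight dtype is int8."""
    return activation_dtype != wo_dtype and wo_dtype != "int8"


def patched_guard_casts(activation_dtype: str, wo_dtype: str) -> bool:
    """Workaround as described in the PR text: cast only for floating-point weights."""
    return activation_dtype != wo_dtype and wo_dtype in FLOATING_DTYPES


# bitsandbytes-style int8 weights: both guards leave the activations alone.
print(upstream_guard_casts("bfloat16", "int8"))   # False
print(patched_guard_casts("bfloat16", "int8"))    # False

# GGML-style uint8 storage: the upstream guard would cast activations to uint8.
print(upstream_guard_casts("bfloat16", "uint8"))  # True
print(patched_guard_casts("bfloat16", "uint8"))   # False

# Ordinary floating-point weights: both guards still cast as before.
print(upstream_guard_casts("float32", "bfloat16"))  # True
print(patched_guard_casts("float32", "bfloat16"))   # True
```

Restricting the cast to floating-point weight dtypes covers both the int8 and the uint8 case without listing every integer storage type.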

Frontend

FLUX/SD3 images already store the T5 encoder in metadata (t5_encoder), but it wasn't shown in the Recall Parameters tab. Added a T5EncoderModel metadata handler (mirrors Qwen3EncoderModel) and registered it in both the handler registry (for "Recall All") and the metadata viewer's display list, plus an i18n label.

Related Issues / Discussions

https://discord.com/channels/1020123559063990373/1149510134058471514/1521658213836001291

QA Instructions

Download a T5 GGUF encoder (e.g. t5-v1_1-xxl-encoder-Q6_K.gguf from city96/t5-v1_1-xxl-encoder-gguf) and install it via the Model Manager; it should be detected as a T5 Encoder (GGUF) model.
Build a FLUX text-to-image graph using it as the T5 encoder and generate an image; output should be coherent.
Confirm a lighter quant (e.g. Q3_K_S) also loads and generates.
Open a FLUX/SD3 image's metadata → Recall Parameters tab: a T5 Encoder row appears with a recall button; clicking it (and "Recall All") sets the T5 encoder in the generation settings.

Validated locally on Q3_K_S and Q6_K: unique model classification, correct config inference (T5 v1.1 XXL), finite bf16 forward on CUDA, and cross-quant cosine-similarity 0.94–0.999 on content tokens (confirms the key mapping). Backend config/probe tests and frontend metadata tests pass.
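For reviewers who want to repeat the cross-quant comparison, the following is a minimal sketch of a per-token cosine-similarity check. It is not the script used for the validation above. It assumes the encoder outputs of two quants have already been converted to nested Python lists of shape [tokens][hidden], and the helper names `cosine_similarity` and `per_token_similarity` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def per_token_similarity(
    hidden_a: Sequence[Sequence[float]],
    hidden_b: Sequence[Sequence[float]],
) -> List[float]:
    """Compare two [tokens][hidden] outputs row by row."""
    if len(hidden_a) != len(hidden_b):
        raise ValueError("outputs must cover the same number of tokens")
    return [cosine_similarity(ra, rb) for ra, rb in zip(hidden_a, hidden_b)]


# Toy data standing in for two quantizations of the same encoder:
# each row of the second output is a rescaled copy of the first.
quant_a = [[1.0, 0.0], [0.0, 2.0]]
quant_b = [[2.0, 0.0], [0.0, 1.0]]
print(per_token_similarity(quant_a, quant_b))
```

A wrong key mapping would typically put weights in the wrong layer or projection, so the per-token similarity between two quants of the same model would be expected to drop well below the range reported above.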

Merge Plan

Standard merge. No DB schema or redux migration changes. pyproject.toml package-data path changed (invokeai.backend.anima → invokeai.backend.t5) — a clean build picks up the vendored tokenizer at its new location.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration — N/A (no slice shape changes)
Documentation added / updated (if applicable) — N/A
Updated What's New copy (if doing a release after this PR)

Add loading support for single-file GGUF T5 encoders (e.g. city96/t5-v1_1-xxl-encoder-gguf, llama.cpp naming), mirroring the existing Qwen3 GGUF encoder path.

- Add T5Encoder_GGUF_Config (single-file, detects enc.blk.* keys + GGML tensors) and register it in the AnyModelConfig union
- Add T5EncoderGGUFModel loader: remaps llama.cpp T5 keys to transformers naming, infers T5Config from tensor shapes, dequantizes token/relative-attention-bias embeddings, ties embed_tokens to shared
- Work around transformers T5DenseGatedActDense casting activations to the uint8 GGML weight dtype (int8 guard doesn't cover uint8), which would corrupt the feed-forward output
- Reject T5 encoders in the Qwen3 GGUF/checkpoint configs so the two stay mutually exclusive (both carry token_embd.weight; the factory resolves multi-matches from a set, so this is not order-safe)

Reuse the vendored T5-XXL tokenizer instead of downloading it: move it out of Anima into a neutral invokeai/backend/t5 module shared by Anima and the GGUF loader, and update the package-data path accordingly.
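The mutual-exclusivity rule from the commit message can be sketched as two predicates over the tensor key names. This is a simplified illustration, not the probing code in the PR: the real configs also check for GGML tensors and other details, the function names are hypothetical, and the sample key lists are made up for the example. Only the `enc.blk.` prefix and the shared `token_embd.weight` key come from the description above.

```python
from typing import Iterable

T5_ENCODER_PREFIX = "enc.blk."              # llama.cpp T5 encoder block prefix
SHARED_EMBEDDING_KEY = "token_embd.weight"  # carried by both T5 and Qwen3 files


def has_t5_encoder_blocks(keys: Iterable[str]) -> bool:
    """True if any tensor name uses the T5 encoder block prefix."""
    return any(key.startswith(T5_ENCODER_PREFIX) for key in keys)


def matches_t5_gguf(keys: Iterable[str]) -> bool:
    """Hypothetical T5 probe: accept files that contain encoder blocks."""
    return has_t5_encoder_blocks(keys)


def matches_qwen3_gguf(keys: Iterable[str]) -> bool:
    """Hypothetical Qwen3 probe: needs the embedding key, rejects T5 files."""
    key_list = list(keys)
    return SHARED_EMBEDDING_KEY in key_list and not has_t5_encoder_blocks(key_list)


# Illustrative key lists (not taken from real model files).
t5_keys = ["token_embd.weight", "enc.blk.0.attn_q.weight"]
qwen3_keys = ["token_embd.weight", "blk.0.attn_q.weight"]

for name, keys in (("t5", t5_keys), ("qwen3", qwen3_keys)):
    matches = {
        "T5 GGUF": matches_t5_gguf(keys),
        "Qwen3 GGUF": matches_qwen3_gguf(keys),
    }
    # Exactly one probe should accept each file, whatever order they run in.
    print(name, matches)
```

Because each file satisfies exactly one predicate, the result does not depend on the order in which the candidate configs are tried, which matters when matches are collected from a set.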

Pfannkuchensack added 4 commits July 1, 2026 02:45

Chore Typegen + Openapi

efe6d1a

Add T5 Recalling

44991ab

Add 2 gguf T5 to the Starter Models

452ba3c

Pfannkuchensack requested review from JPPhoto, blessedcoolant, dunkeroni and lstein as code owners July 1, 2026 01:11

github-actions bot added the labels python, Root, invocations, backend, frontend, python-tests, python-deps on Jul 1, 2026

Pfannkuchensack added 3 commits July 1, 2026 21:47

Merge branch 'main' into feat/t5-encoder-gguf-support

818bca9

Chore Ruff

4d3643e

Chore Typegen

7c4b12b

Pfannkuchensack wants to merge 7 commits into invoke-ai:main from Pfannkuchensack:feat/t5-encoder-gguf-support
