Feat/t5 encoder gguf support #9324

Open
Pfannkuchensack wants to merge 7 commits into invoke-ai:main from Pfannkuchensack:feat/t5-encoder-gguf-support

@Pfannkuchensack

Collaborator

Summary

Feature: Load GGUF-quantized T5 text encoders, and show/recall the T5 encoder in image metadata.

Adds support for single-file GGUF T5 text encoders (e.g. city96/t5-v1_1-xxl-encoder-gguf, llama.cpp enc.blk.* naming) so users can run FLUX/SD3 with a small quantized T5 instead of the full ~9GB encoder. The GGUF infrastructure already existed (used by FLUX/Qwen3/Z-Image); this wires T5 into it, mirroring the existing Qwen3 GGUF encoder path.

Backend

  • New T5Encoder_GGUF_Config (single-file; detects enc.blk.* keys + GGML tensors) registered in the AnyModelConfig union.
  • New T5EncoderGGUFModel loader: remaps llama.cpp T5 keys → transformers T5 names, infers T5Config from tensor shapes, and keeps transformer weights as GGMLTensors for the autocast cache. It eagerly dequantizes the token and relative-attention-bias embeddings (embedding lookups can't run on quantized tensors) and ties encoder.embed_tokens to shared.
  • transformers T5 gotcha: T5DenseGatedActDense.forward casts activations to self.wo.weight.dtype unless it's torch.int8 (a bitsandbytes guard). GGML weights are torch.uint8, which slips past the guard and corrupts the feed-forward output. Worked around by rebinding the FF forward to only cast for floating-point wo weights.
  • Made the Qwen3 GGUF/checkpoint configs reject T5 encoders (both carry token_embd.weight; the config factory resolves multi-matches from a set, so they must be mutually exclusive — disambiguated on the enc.blk.* prefix).
  • Reuse the T5-XXL tokenizer already vendored in the repo instead of downloading it — moved it out of backend/anima into a neutral shared backend/t5 module used by both Anima and the GGUF loader (updated pyproject.toml package-data accordingly).
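
The detection and key-remapping steps in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: the name table, `is_llama_cpp_t5`, and `remap_key` are hypothetical, and the mapping assumes the usual llama.cpp T5 tensor names and the transformers T5 v1.1 (gated feed-forward) layout.

```python
import re

# Hypothetical name table (an assumption, not the PR's actual mapping):
# llama.cpp per-block tensor name -> transformers T5 sub-module path.
_BLOCK_NAMES = {
    "attn_norm": "layer.0.layer_norm",
    "attn_q": "layer.0.SelfAttention.q",
    "attn_k": "layer.0.SelfAttention.k",
    "attn_v": "layer.0.SelfAttention.v",
    "attn_o": "layer.0.SelfAttention.o",
    "attn_rel_b": "layer.0.SelfAttention.relative_attention_bias",
    "ffn_norm": "layer.1.layer_norm",
    "ffn_gate": "layer.1.DenseReluDense.wi_0",
    "ffn_up": "layer.1.DenseReluDense.wi_1",
    "ffn_down": "layer.1.DenseReluDense.wo",
}

# Top-level tensors that live outside the encoder blocks.
_TOP_LEVEL = {
    "token_embd.weight": "shared.weight",
    "enc.output_norm.weight": "encoder.final_layer_norm.weight",
}

_BLOCK_RE = re.compile(r"^enc\.blk\.(\d+)\.(\w+)\.weight$")


def is_llama_cpp_t5(keys):
    """Return True if any key uses the enc.blk.* prefix.

    token_embd.weight alone is not enough, because Qwen3 checkpoints
    carry it too; the enc.blk.* prefix is what disambiguates T5.
    """
    return any(key.startswith("enc.blk.") for key in keys)


def remap_key(key):
    """Translate one llama.cpp T5 key to a transformers-style key."""
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    match = _BLOCK_RE.match(key)
    if match is None:
        raise KeyError(f"unrecognized T5 GGUF key: {key}")
    index, name = match.groups()
    # An unknown tensor name raises KeyError rather than passing through.
    return f"encoder.block.{index}.{_BLOCK_NAMES[name]}.weight"
```

A Qwen3 config probe could call a helper like `is_llama_cpp_t5` and reject the file when it returns True, which keeps the two configs mutually exclusive regardless of the order in which the factory evaluates them.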

Frontend

  • FLUX/SD3 images already store the T5 encoder in metadata (t5_encoder), but it wasn't shown in the Recall Parameters tab. Added a T5EncoderModel metadata handler (mirrors Qwen3EncoderModel) and registered it in both the handler registry (for "Recall All") and the metadata viewer's display list, plus an i18n label.

Related Issues / Discussions

https://discord.com/channels/1020123559063990373/1149510134058471514/1521658213836001291

QA Instructions

  1. Download a T5 GGUF encoder (e.g. city96/t5-v1_1-xxl-encoder-Q6_K.gguf) and install it via the Model Manager — it should be detected as a T5 Encoder (GGUF) model.
  2. Build a FLUX text-to-image graph using it as the T5 encoder and generate an image; output should be coherent.
  3. Confirm a lighter quant (e.g. Q3_K_S) also loads and generates.
  4. Open a FLUX/SD3 image's metadata → Recall Parameters tab: a T5 Encoder row appears with a recall button; clicking it (and "Recall All") sets the T5 encoder in the generation settings.

Validated locally on Q3_K_S and Q6_K: unique model classification, correct config inference (T5 v1.1 XXL), finite bf16 forward on CUDA, and cross-quant cosine-similarity 0.94–0.999 on content tokens (confirms the key mapping). Backend config/probe tests and frontend metadata tests pass.
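
The cross-quant comparison mentioned above can be reproduced along these lines. This is a dependency-free sketch under stated assumptions: the helper names are hypothetical, and the inputs stand for per-token encoder outputs (one vector per content token) from two different quants of the same model, already converted to plain Python lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def per_token_similarity(embeddings_a, embeddings_b):
    """Compare two encoder outputs token by token.

    Each argument is a list of per-token hidden-state vectors,
    e.g. one from a Q3_K_S load and one from a Q6_K load.
    """
    if len(embeddings_a) != len(embeddings_b):
        raise ValueError("both outputs must cover the same tokens")
    return [cosine_similarity(a, b) for a, b in zip(embeddings_a, embeddings_b)]
```

If the key mapping were wrong, for example with two projections swapped, the two quants would still agree with each other, so this check is most useful alongside a comparison against a reference encoder or a visual check of generated images.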

Merge Plan

Standard merge. No DB schema or redux migration changes. The pyproject.toml package-data path changed (invokeai.backend.anima → invokeai.backend.t5); a clean build picks up the vendored tokenizer at its new location.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration — N/A (no slice shape changes)
  • Documentation added / updated (if applicable) — N/A
  • Updated What's New copy (if doing a release after this PR)

Add loading support for single-file GGUF T5 encoders (e.g.
city96/t5-v1_1-xxl-encoder-gguf, llama.cpp naming), mirroring the
existing Qwen3 GGUF encoder path.

- Add T5Encoder_GGUF_Config (single-file, detects enc.blk.* keys +
  GGML tensors) and register it in the AnyModelConfig union
- Add T5EncoderGGUFModel loader: remaps llama.cpp T5 keys to
  transformers naming, infers T5Config from tensor shapes, dequantizes
  token/relative-attention-bias embeddings, ties embed_tokens to shared
- Work around transformers T5DenseGatedActDense casting activations to
  the uint8 GGML weight dtype (int8 guard doesn't cover uint8), which
  would corrupt the feed-forward output
- Reject T5 encoders in the Qwen3 GGUF/checkpoint configs so the two
  stay mutually exclusive (both carry token_embd.weight; the factory
  resolves multi-matches from a set, so this is not order-safe)
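
The dtype guard problem in the list above comes down to two predicates, modeled here without torch so the logic can be read in isolation. The `DType` stand-in and both function names are hypothetical; the real check lives inside transformers' T5DenseGatedActDense.forward and compares torch dtypes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    """Stand-in for a torch dtype (an assumption for illustration only)."""
    name: str
    is_floating_point: bool


FLOAT32 = DType("float32", True)
BFLOAT16 = DType("bfloat16", True)
INT8 = DType("int8", False)
UINT8 = DType("uint8", False)  # dtype reported by GGML-quantized weights


def upstream_should_cast(activation, weight):
    """Guard as described above: cast unless the weight dtype is int8."""
    return activation != weight and weight != INT8


def patched_should_cast(activation, weight):
    """Workaround: cast only when the weight dtype is floating point."""
    return activation != weight and weight.is_floating_point


# uint8 slips past the int8-only guard, so activations would be cast to uint8.
assert upstream_should_cast(BFLOAT16, UINT8)
# The floating-point check leaves activations alone for quantized weights.
assert not patched_should_cast(BFLOAT16, UINT8)
# Ordinary mixed-precision casts behave the same under both predicates.
assert upstream_should_cast(FLOAT32, BFLOAT16)
assert patched_should_cast(FLOAT32, BFLOAT16)
```

Checking for a floating-point weight dtype, rather than adding uint8 to a deny-list, also covers any other integer storage dtype a quantized tensor might report.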

Reuse the vendored T5-XXL tokenizer instead of downloading it: move it
out of Anima into a neutral invokeai/backend/t5 module shared by Anima
and the GGUF loader, and update the package-data path accordingly.
github-actions bot added the python, Root, invocations, backend, frontend, python-tests, and python-deps labels on Jul 1, 2026