Feat/t5 encoder gguf support #9324
Open
Pfannkuchensack wants to merge 7 commits into
Add loading support for single-file GGUF T5 encoders (e.g. city96/t5-v1_1-xxl-encoder-gguf, llama.cpp naming), mirroring the existing Qwen3 GGUF encoder path.

- Add T5Encoder_GGUF_Config (single-file, detects enc.blk.* keys + GGML tensors) and register it in the AnyModelConfig union
- Add T5EncoderGGUFModel loader: remaps llama.cpp T5 keys to transformers naming, infers T5Config from tensor shapes, dequantizes token/relative-attention-bias embeddings, ties embed_tokens to shared
- Work around transformers T5DenseGatedActDense casting activations to the uint8 GGML weight dtype (int8 guard doesn't cover uint8), which would corrupt the feed-forward output
- Reject T5 encoders in the Qwen3 GGUF/checkpoint configs so the two stay mutually exclusive (both carry token_embd.weight; the factory resolves multi-matches from a set, so this is not order-safe)

Reuse the vendored T5-XXL tokenizer instead of downloading it: move it out of Anima into a neutral invokeai/backend/t5 module shared by Anima and the GGUF loader, and update the package-data path accordingly.
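The key remapping and shape-based config inference in the loader bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the name table is an assumed llama.cpp → transformers mapping, `remap_key` and `infer_t5_config` are hypothetical helper names, and shapes are assumed to be in PyTorch order (out_features, in_features).

```python
import re

# Assumed llama.cpp -> transformers name pairs for per-block tensors.
_BLOCK_MAP = {
    "attn_norm": "layer.0.layer_norm",
    "attn_q": "layer.0.SelfAttention.q",
    "attn_k": "layer.0.SelfAttention.k",
    "attn_v": "layer.0.SelfAttention.v",
    "attn_o": "layer.0.SelfAttention.o",
    "attn_rel_b": "layer.0.SelfAttention.relative_attention_bias",
    "ffn_norm": "layer.1.layer_norm",
    "ffn_gate": "layer.1.DenseReluDense.wi_0",
    "ffn_up": "layer.1.DenseReluDense.wi_1",
    "ffn_down": "layer.1.DenseReluDense.wo",
}
_TOP_LEVEL_MAP = {
    "token_embd.weight": "shared.weight",
    "enc.output_norm.weight": "encoder.final_layer_norm.weight",
}
_BLOCK_RE = re.compile(r"^enc\.blk\.(\d+)\.([a-z_]+)\.weight$")


def remap_key(key: str) -> str:
    """Translate one llama.cpp-style T5 tensor name to transformers naming."""
    if key in _TOP_LEVEL_MAP:
        return _TOP_LEVEL_MAP[key]
    m = _BLOCK_RE.match(key)
    if m is None or m.group(2) not in _BLOCK_MAP:
        raise KeyError(f"unrecognized T5 GGUF tensor name: {key}")
    return f"encoder.block.{m.group(1)}.{_BLOCK_MAP[m.group(2)]}.weight"


def infer_t5_config(shapes: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Derive T5Config-style fields from remapped key -> shape pairs."""
    vocab_size, d_model = shapes["shared.weight"]
    attn = "encoder.block.0.layer.0.SelfAttention"
    inner_dim, _ = shapes[f"{attn}.q.weight"]
    num_buckets, num_heads = shapes[f"{attn}.relative_attention_bias.weight"]
    d_ff, _ = shapes["encoder.block.0.layer.1.DenseReluDense.wi_0.weight"]
    # Block indices sit at position 2 of "encoder.block.<i>...." keys.
    block_ids = {int(k.split(".")[2]) for k in shapes if k.startswith("encoder.block.")}
    return {
        "vocab_size": vocab_size,
        "d_model": d_model,
        "d_kv": inner_dim // num_heads,
        "d_ff": d_ff,
        "num_heads": num_heads,
        "num_layers": max(block_ids) + 1,
        "relative_attention_num_buckets": num_buckets,
    }


if __name__ == "__main__":
    # Shapes matching T5 v1.1 XXL dimensions, keyed by llama.cpp names.
    gguf_shapes = {
        "token_embd.weight": (32128, 4096),
        "enc.blk.0.attn_q.weight": (4096, 4096),
        "enc.blk.0.attn_rel_b.weight": (32, 64),
        "enc.blk.0.ffn_gate.weight": (10240, 4096),
        "enc.blk.23.attn_q.weight": (4096, 4096),
    }
    remapped = {remap_key(k): v for k, v in gguf_shapes.items()}
    print(infer_t5_config(remapped))
```

Raising on unknown names rather than passing them through keeps a mapping mistake from silently producing a model with missing weights.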
Summary
Feature: Load GGUF-quantized T5 text encoders, and show/recall the T5 encoder in image metadata.
Adds support for single-file GGUF T5 text encoders (e.g. `city96/t5-v1_1-xxl-encoder-gguf`, llama.cpp `enc.blk.*` naming) so users can run FLUX/SD3 with a small quantized T5 instead of the full ~9 GB encoder. The GGUF infrastructure already existed (used by FLUX/Qwen3/Z-Image); this wires T5 into it, mirroring the existing Qwen3 GGUF encoder path.

Backend
- `T5Encoder_GGUF_Config` (single-file; detects `enc.blk.*` keys + GGML tensors), registered in the `AnyModelConfig` union.
- `T5EncoderGGUFModel` loader: remaps llama.cpp T5 keys → transformers T5, infers `T5Config` from tensor shapes, keeps transformer weights as `GGMLTensor`s for the autocast cache, eagerly dequantizes the token/relative-attention-bias embeddings (embedding lookups can't run on quantized tensors), and ties `encoder.embed_tokens` → `shared`.
- `T5DenseGatedActDense.forward` casts activations to `self.wo.weight.dtype` unless it's `torch.int8` (a bitsandbytes guard). GGML weights are `torch.uint8`, which slips past the guard and corrupts the feed-forward output. Worked around by rebinding the FF forward to only cast for floating-point `wo` weights.
- The Qwen3 GGUF/checkpoint configs now reject T5 encoders (both carry `token_embd.weight`; the config factory resolves multi-matches from a `set`, so they must be mutually exclusive; disambiguated on the `enc.blk.*` prefix).
- Moved the vendored T5-XXL tokenizer out of `backend/anima` into a neutral shared `backend/t5` module used by both Anima and the GGUF loader (updated `pyproject.toml` package-data accordingly).

Frontend
- The T5 encoder was already written to image metadata (`t5_encoder`), but it wasn't shown in the Recall Parameters tab. Added a `T5EncoderModel` metadata handler (mirrors `Qwen3EncoderModel`) and registered it in both the handler registry (for "Recall All") and the metadata viewer's display list, plus an i18n label.

Related Issues / Discussions
https://discord.com/channels/1020123559063990373/1149510134058471514/1521658213836001291
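The mutual-exclusivity requirement from the Backend notes can be illustrated with a small sketch. The probe names and the `classify` helper below are hypothetical and do not reflect InvokeAI's real config factory; the point is only that when matches are collected into an unordered set, two probes that both accept a file cannot be resolved by ordering, so the Qwen3 probe has to reject T5 key sets explicitly.

```python
from typing import Callable, Iterable


def is_t5_encoder(keys: set[str]) -> bool:
    # T5 encoders in llama.cpp naming carry "enc.blk.*" tensors.
    return any(k.startswith("enc.blk.") for k in keys)


def is_qwen3_encoder(keys: set[str]) -> bool:
    # Both formats carry "token_embd.weight", so T5 must be ruled out here.
    return "token_embd.weight" in keys and not is_t5_encoder(keys)


# Hypothetical registry: config name -> probe function.
PROBES: dict[str, Callable[[set[str]], bool]] = {
    "t5_encoder_gguf": is_t5_encoder,
    "qwen3_encoder_gguf": is_qwen3_encoder,
}


def classify(keys: Iterable[str]) -> str:
    """Return the single matching config name, or raise if ambiguous."""
    key_set = set(keys)
    # Matches land in a set, so there is no "first match wins" ordering.
    matches = {name for name, probe in PROBES.items() if probe(key_set)}
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, got {sorted(matches)}")
    return matches.pop()


if __name__ == "__main__":
    print(classify(["token_embd.weight", "enc.blk.0.attn_q.weight"]))
    print(classify(["token_embd.weight", "blk.0.attn_q.weight"]))
```

Dropping the `not is_t5_encoder(keys)` clause makes the first call raise, which is the ambiguity the PR avoids.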
QA Instructions
- Download a GGUF T5 encoder (e.g. `city96/t5-v1_1-xxl-encoder-Q6_K.gguf`) and install it via the Model Manager; it should be detected as a T5 Encoder (GGUF) model.
- Confirm that a second quantization (e.g. `Q3_K_S`) also loads and generates.

Validated locally on `Q3_K_S` and `Q6_K`: unique model classification, correct config inference (T5 v1.1 XXL), finite bf16 forward on CUDA, and cross-quant cosine similarity of 0.94–0.999 on content tokens (confirms the key mapping). Backend config/probe tests and frontend metadata tests pass.

Merge Plan
Standard merge. No DB schema or redux migration changes.
`pyproject.toml` package-data path changed (`invokeai.backend.anima` → `invokeai.backend.t5`); a clean build picks up the vendored tokenizer at its new location.

Checklist
- What's New copy (if doing a release after this PR)
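The cross-quant check mentioned under QA Instructions compares encoder outputs from two quantizations token by token. A dependency-free sketch of that comparison is below. It assumes each output is a list of per-token embedding vectors for the same prompt; `cosine_similarity` and `per_token_similarity` are hypothetical helper names, not code from this PR.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def per_token_similarity(
    ref: Sequence[Sequence[float]], other: Sequence[Sequence[float]]
) -> list[float]:
    """Compare two encoder outputs row by row (one row per token)."""
    if len(ref) != len(other):
        raise ValueError("outputs must cover the same number of tokens")
    return [cosine_similarity(r, o) for r, o in zip(ref, other)]


if __name__ == "__main__":
    # Toy stand-ins for two quantizations' outputs on a two-token prompt.
    out_a = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1]]
    out_b = [[0.9, 0.1, 0.5], [0.25, 0.85, 0.1]]
    print(per_token_similarity(out_a, out_b))
```

A wrong key mapping would typically show up as similarities near zero on content tokens, whereas quantization noise alone should leave them close to 1.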