fix: Use LONGBLOB for SQL registry proto columns on MySQL #6566

Open
nikolauspschuetz wants to merge 1 commit into feast-dev:master from nikolauspschuetz:fix/sql-registry-mysql-longblob-5800

Conversation

@nikolauspschuetz

What

The SQL registry stores each Feast object as a serialized protobuf in a binary column. On MySQL/MariaDB, SQLAlchemy's LargeBinary maps to BLOB, which caps at 64 KB. A single FeatureView proto routinely exceeds that, so MySQL silently truncates the write and the registry later fails to load with a protobuf DecodeError (e.g. feast serve failing to start). PostgreSQL and SQLite were never affected.
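
To make the size limits concrete, here is a minimal sketch of the MySQL BLOB family capacities and a helper that picks the smallest type able to hold a payload. The helper names (`smallest_blob_type_for`, `fits`) are hypothetical and not part of Feast; only the byte limits come from MySQL's documented storage rules.

```python
# Maximum payload sizes for the MySQL BLOB family, in bytes.
# Each type uses a 1-, 2-, 3-, or 4-byte length prefix, hence 2**(8*n) - 1.
MYSQL_BLOB_MAX_BYTES = {
    "TINYBLOB": 2**8 - 1,     # 255 bytes
    "BLOB": 2**16 - 1,        # 65,535 bytes (the "64 KB" cap)
    "MEDIUMBLOB": 2**24 - 1,  # 16,777,215 bytes
    "LONGBLOB": 2**32 - 1,    # 4,294,967,295 bytes
}


def fits(column_type: str, payload: bytes) -> bool:
    """Return True if the payload fits in the given MySQL BLOB type."""
    return len(payload) <= MYSQL_BLOB_MAX_BYTES[column_type.upper()]


def smallest_blob_type_for(payload_size: int) -> str:
    """Return the narrowest MySQL BLOB type that can hold payload_size bytes."""
    # Dict insertion order is ascending by capacity.
    for name, limit in MYSQL_BLOB_MAX_BYTES.items():
        if payload_size <= limit:
            return name
    raise ValueError(f"{payload_size} bytes exceeds LONGBLOB capacity")


if __name__ == "__main__":
    # Stand-in for a serialized proto slightly over the BLOB limit.
    fake_proto = b"\x00" * 70_000
    print(fits("BLOB", fake_proto))
    print(smallest_blob_type_for(len(fake_proto)))
```

A 70,000-byte payload does not fit in BLOB, which is the situation this PR addresses for large FeatureView protos.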

Fixes #5800.

How

  • Introduce a dialect-aware ProtoBytes type that emits LONGBLOB on MySQL and MariaDB while keeping LargeBinary's default mapping on every other dialect (BLOB on SQLite, BYTEA on PostgreSQL), and apply it to all binary proto/metadata columns.
    • MariaDB is registered separately because SQLAlchemy 2.x reports dialect.name == "mariadb".
    • The variants are chained (not variadic) so the expression also works on SQLAlchemy 1.4.x, which Feast still supports (SQLAlchemy>=1.4.19).
  • metadata.create_all only creates missing tables, so existing MySQL registries are not migrated automatically. Added a best-effort startup check (_warn_if_narrow_blob_columns) that, on MySQL/MariaDB, logs an error naming any columns still typed BLOB and points at the documented migration. It logs at ERROR (so monitoring pipelines that drop everything below ERROR still catch it) but deliberately does not refuse to start: a registry whose protos all fit in 64 KB is unaffected, and a routine upgrade should not break it. (Happy to switch to fail-fast-with-escape-hatch if maintainers prefer.)
  • Documented the migration in the SQL registry reference, including the metadata-lock risk, a mandatory "stop writers first" checklist, and online-schema-change guidance (pt-online-schema-change / gh-ost). Note: BLOB → LONGBLOB is a column type change, which InnoDB performs with ALGORITHM=COPY (a table rebuild); INPLACE is not supported.
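
For reviewers who want to see the shape of the dialect-aware type, the chained-variant idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the table and column names (`example_registry_objects`, `name`, `proto`) and the `compiled_ddl` helper are hypothetical. Only `LargeBinary`, `with_variant`, `mysql.LONGBLOB`, and `CreateTable` are real SQLAlchemy APIs.

```python
from sqlalchemy import Column, LargeBinary, MetaData, String, Table
from sqlalchemy.dialects import mysql, postgresql, sqlite
from sqlalchemy.schema import CreateTable

# One with_variant() call per dialect name. Chaining single-name calls
# (instead of passing several names to one call) keeps the expression
# valid on SQLAlchemy 1.4.x as well as 2.x.
ProtoBytes = (
    LargeBinary()
    .with_variant(mysql.LONGBLOB(), "mysql")
    .with_variant(mysql.LONGBLOB(), "mariadb")
)

metadata = MetaData()

# Hypothetical table, for illustration only.
example_table = Table(
    "example_registry_objects",
    metadata,
    Column("name", String(255), primary_key=True),
    Column("proto", ProtoBytes, nullable=False),
)


def compiled_ddl(dialect) -> str:
    """Render CREATE TABLE for the given dialect without a DB connection."""
    return str(CreateTable(example_table).compile(dialect=dialect))


if __name__ == "__main__":
    for dialect in (mysql.dialect(), postgresql.dialect(), sqlite.dialect()):
        print(dialect.name)
        print(compiled_ddl(dialect))
```

Compiling against a dialect object needs no live database, which is why the unit tests can assert on DDL strings directly.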

Tests

  • Unit (tests/unit/infra/registry/test_sql_registry.py): assert the compiled DDL emits LONGBLOB on MySQL/MariaDB, BLOB on SQLite, BYTEA on PostgreSQL; cover the startup-check paths (errors on stale, silent when migrated, skips non-MySQL, never raises on query failure, runs on both read+write engines).
  • Integration (tests/integration/registration/test_universal_registry.py): test_apply_feature_view_large_proto_roundtrip_mysql applies a FeatureView with a >64 KB proto on the live mysql_registry / mysql_registry_async fixtures and reads it back from the DB (allow_cache=False), asserting byte-identity — the regression guard for the truncation bug.
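
The startup-check paths listed above (errors on stale, silent when migrated, skips non-MySQL, never raises on query failure) can be illustrated with a small standalone sketch. Everything here is hypothetical: the function signature, the injected `fetch_column_types` callable, and the `table.column` keys are assumptions chosen for illustration, not the actual `_warn_if_narrow_blob_columns` implementation.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("registry_blob_check")


def find_narrow_blob_columns(column_types: Dict[str, str]) -> List[str]:
    """Return columns whose reported type is exactly BLOB (the 64 KB variant)."""
    return sorted(
        name
        for name, type_name in column_types.items()
        if type_name.strip().lower() == "blob"
    )


def warn_if_narrow_blob_columns(
    dialect_name: str,
    fetch_column_types: Callable[[], Dict[str, str]],
    log: logging.Logger = logger,
) -> List[str]:
    """Best-effort check: log at ERROR for stale columns, never raise."""
    # Only MySQL and MariaDB map LargeBinary to the 64 KB BLOB type.
    if dialect_name not in ("mysql", "mariadb"):
        return []
    try:
        narrow = find_narrow_blob_columns(fetch_column_types())
    except Exception:
        # A failed inspection must not block startup.
        log.debug("Could not inspect registry column types", exc_info=True)
        return []
    if narrow:
        log.error(
            "Registry columns still typed BLOB (64 KB cap): %s. "
            "See the SQL registry migration docs.",
            ", ".join(narrow),
        )
    return narrow


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    stale = {"feature_views.proto": "blob", "entities.proto": "longblob"}
    print(warn_if_narrow_blob_columns("mysql", lambda: stale))
```

Injecting the column-type lookup as a callable keeps the sketch testable without a database; the real check would presumably query the engine's schema metadata instead.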

Docs / agent guidance

  • docs/reference/registries/sql.md — new "serialized-proto columns use LONGBLOB" section + migration runbook.
  • Updated the in-repo agent rules/skills (feast-components rule pair, feast-architecture and feast-testing SKILLs) so the ProtoBytes convention and test patterns are discoverable.

@nikolauspschuetz nikolauspschuetz requested a review from a team as a code owner June 29, 2026 02:29
@nikolauspschuetz nikolauspschuetz force-pushed the fix/sql-registry-mysql-longblob-5800 branch 3 times, most recently from cef62af to b3e3790 Compare June 29, 2026 06:23
The SQL registry stores each Feast object as a serialized protobuf in a
binary column. On MySQL/MariaDB, SQLAlchemy's LargeBinary maps to BLOB,
which caps at 64 KB. A single FeatureView proto routinely exceeds that, so
MySQL silently truncates the write and the registry later fails to load
with a protobuf DecodeError (e.g. `feast serve` failing to start).
PostgreSQL and SQLite were never affected.

Introduce a dialect-aware `ProtoBytes` type that emits LONGBLOB on MySQL
and MariaDB while keeping LargeBinary's default mapping on every other
dialect, and apply it to all binary proto/metadata columns. The variants
are chained (not variadic) so the expression also works on SQLAlchemy
1.4.x, which Feast still supports.

`metadata.create_all` only creates missing tables, so existing MySQL
registries are not migrated automatically. Add a best-effort startup
warning that names any columns still typed BLOB and points operators at
the documented ALTER TABLE migration, and document the migration (with
metadata-lock / online-schema-change guidance) in the SQL registry
reference.

Tests assert the compiled DDL emits LONGBLOB on MySQL and MariaDB and
BLOB on SQLite, and cover the startup-warning paths.

Signed-off-by: Nikolaus Schuetz <nikolauspschuetz@gmail.com>
@nikolauspschuetz nikolauspschuetz force-pushed the fix/sql-registry-mysql-longblob-5800 branch from b3e3790 to f45ed4d Compare June 29, 2026 06:33
@nikolauspschuetz

Author

@franciscojavierarceo @ntkathole — would appreciate a review when you have a moment. 🙏

This fixes #5800: on MySQL/MariaDB, SQLAlchemy's LargeBinary maps to BLOB (64 KB cap), so a serialized FeatureView proto over 64 KB is silently truncated on write and later fails to load with a DecodeError (e.g. feast serve won't start). PostgreSQL/SQLite are unaffected.

Approach:

  • Dialect-aware ProtoBytes type → LONGBLOB on MySQL/MariaDB, default mapping elsewhere; applied to all serialized-proto columns. Variants are chained (not variadic) to stay compatible with the supported SQLAlchemy 1.4.x floor.
  • Best-effort startup diagnostic that logs (at ERROR) any registry proto column still typed BLOB and points at a documented ALTER TABLE migration — create_all never widens existing columns, so existing MySQL registries aren't auto-migrated.
  • Tests: DDL-compile unit tests (LONGBLOB on MySQL/MariaDB, BLOB/BYTEA elsewhere) + a live >64 KB round-trip integration test on the mysql_registry fixtures + diagnostic-path coverage.
  • Docs: SQL-registry reference gains a migration runbook (lock-safety + online-schema-change guidance). Agent skills/rules updated per the repo's .claude/rules convention.

Happy to adjust anything — including whether the startup check should hard-fail (with an env-var escape hatch) instead of logging, if you'd prefer that posture.

Development

Successfully merging this pull request may close these issues.

Feast SQL registry fails to start due to truncated FeatureView protobuf in MySQL (BLOB size limit)