fix: Use LONGBLOB for SQL registry proto columns on MySQL#6566
Open
nikolauspschuetz wants to merge 1 commit into
Open
fix: Use LONGBLOB for SQL registry proto columns on MySQL#6566nikolauspschuetz wants to merge 1 commit into
nikolauspschuetz wants to merge 1 commit into
Conversation
cef62af to
b3e3790
Compare
The SQL registry stores each Feast object as a serialized protobuf in a binary column. On MySQL/MariaDB, SQLAlchemy's LargeBinary maps to BLOB, which caps at 64 KB. A single FeatureView proto routinely exceeds that, so MySQL silently truncates the write and the registry later fails to load with a protobuf DecodeError (e.g. `feast serve` failing to start). PostgreSQL and SQLite were never affected. Introduce a dialect-aware `ProtoBytes` type that emits LONGBLOB on MySQL and MariaDB while keeping LargeBinary's default mapping on every other dialect, and apply it to all binary proto/metadata columns. The variants are chained (not variadic) so the expression also works on SQLAlchemy 1.4.x, which Feast still supports. `metadata.create_all` only creates missing tables, so existing MySQL registries are not migrated automatically. Add a best-effort startup warning that names any columns still typed BLOB and points operators at the documented ALTER TABLE migration, and document the migration (with metadata-lock / online-schema-change guidance) in the SQL registry reference. Tests assert the compiled DDL emits LONGBLOB on MySQL and MariaDB and BLOB on SQLite, and cover the startup-warning paths. Signed-off-by: Nikolaus Schuetz <nikolauspschuetz@gmail.com>
b3e3790 to
f45ed4d
Compare
Author
|
@franciscojavierarceo @ntkathole — would appreciate a review when you have a moment. 🙏 This fixes #5800: on MySQL/MariaDB, SQLAlchemy's Approach:
Happy to adjust anything — including whether the startup check should hard-fail (with an env-var escape hatch) instead of logging, if you'd prefer that posture. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
The SQL registry stores each Feast object as a serialized protobuf in a binary column. On MySQL/MariaDB, SQLAlchemy's
LargeBinarymaps toBLOB, which caps at 64 KB. A singleFeatureViewproto routinely exceeds that, so MySQL silently truncates the write and the registry later fails to load with a protobufDecodeError(e.g.feast servefailing to start). PostgreSQL and SQLite were never affected.Fixes #5800.
How
ProtoBytestype that emitsLONGBLOBon MySQL and MariaDB while keepingLargeBinary's default mapping on every other dialect (BLOBon SQLite,BYTEAon PostgreSQL), and apply it to all binary proto/metadata columns.dialect.name == "mariadb".SQLAlchemy>=1.4.19).metadata.create_allonly creates missing tables, so existing MySQL registries are not migrated automatically. Added a best-effort startup check (_warn_if_narrow_blob_columns) that, on MySQL/MariaDB, logs an error naming any columns still typedBLOBand points at the documented migration. It is logged at ERROR (so monitoring pipelines that filter below ERROR still catch it) but deliberately does not refuse to start — a registry whose protos all fit in 64 KB is unaffected and a routine upgrade should not break it. (Happy to switch to fail-fast-with-escape-hatch if maintainers prefer.)pt-online-schema-change/gh-ost). Note:BLOB→LONGBLOBis a column type change, which InnoDB performs withALGORITHM=COPY(rebuild) —INPLACEis not supported.Tests
tests/unit/infra/registry/test_sql_registry.py): assert the compiled DDL emitsLONGBLOBon MySQL/MariaDB,BLOBon SQLite,BYTEAon PostgreSQL; cover the startup-check paths (errors on stale, silent when migrated, skips non-MySQL, never raises on query failure, runs on both read+write engines).tests/integration/registration/test_universal_registry.py):test_apply_feature_view_large_proto_roundtrip_mysqlapplies aFeatureViewwith a >64 KB proto on the livemysql_registry/mysql_registry_asyncfixtures and reads it back from the DB (allow_cache=False), asserting byte-identity — the regression guard for the truncation bug.Docs / agent guidance
docs/reference/registries/sql.md— new "serialized-proto columns use LONGBLOB" section + migration runbook.feast-componentsrule pair,feast-architectureandfeast-testingSKILLs) so theProtoBytesconvention and test patterns are discoverable.