[UT][VL] Refresh TPC-H q19 plan stability golden file#12374
Draft
brijrajk wants to merge 1 commit into
The ExprId normalizer in GlutenPlanStabilitySuite uses regex `#\d+`, which inadvertently matches TPC-H string literals such as Brand#11, Brand#12, Brand#13 (p_brand values in q19's filter). Over the 264 commits since the golden file was added in apache#11805, new optimizer rules shifted the ExprId counter so Brand#12 now normalizes to Brand#6 and _pre_1#14 to _pre_1#13, causing a spurious plan mismatch.
Regenerated by running GlutenTPCHPlanStabilitySuite with SPARK_GENERATE_GOLDEN_FILES=1. Only q19/explain.txt changes; simplified.txt and all other queries are unaffected.
Verified: q19 fails on main without this fix (21/22); passes with it (22/22).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
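The collision described above can be illustrated with a minimal Scala sketch. This is not the suite's actual code: it assumes the normalizer gives each distinct `#<number>` match a sequential number in order of first appearance, and the object name, the `normalizeIds` helper, and both plan strings are invented for the example. Only the regex is taken from the root-cause description in this PR.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

object NormalizeIdsSketch {
  // Regex quoted in this PR: a '#' not preceded by "id=", then digits and an optional 'L'.
  val exprIdRegex: Regex = """(?<prefix>(?<!id=)#)\d+L?""".r

  // Assumed behavior: number each distinct match by order of first appearance.
  def normalizeIds(plan: String): String = {
    val ids = mutable.LinkedHashMap.empty[String, String]
    exprIdRegex.findAllMatchIn(plan).foreach { m =>
      ids.getOrElseUpdate(m.matched, (ids.size + 1).toString)
    }
    exprIdRegex.replaceAllIn(
      plan,
      m => Regex.quoteReplacement(m.group(1) + ids(m.matched)))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical plan fragments; only the raw ExprIds differ between them.
    val before = "Scan [a#10, b#11] Filter (p_brand#14 = Brand#12)"
    val after  = "Scan [a#12, b#13] Filter (p_brand#16 = Brand#12)"

    println(normalizeIds(before)) // Scan [a#1, b#2] Filter (p_brand#3 = Brand#4)
    println(normalizeIds(after))  // Scan [a#1, b#2] Filter (p_brand#3 = Brand#1)
  }
}
```

In the second fragment a real ExprId happens to be `#12`, so the literal `Brand#12` shares its mapping and the normalized text changes even though the query itself did not.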
Fixes #12375.
Problem
`GlutenTPCHPlanStabilitySuite` → `tpch/q19` has been failing in `spark-test-spark40` CI runs for PRs that touch Velox backend Scala files.
Root cause
`GlutenPlanStabilitySuite.glutenNormalizeIds()` uses the regex `(?<prefix>(?<!id=)#)\\d+L?`, which matches any `#<number>` in the explain text, including TPC-H string literals. The `p_brand` filter in q19 uses values `Brand#11`, `Brand#12`, `Brand#13` (actual TPC-H spec data values). These appear unquoted in the explain output. The normalizer incorrectly treats `#12` as an ExprId and remaps it sequentially. The suite code itself warns about this at lines 67–68.
What changed
The golden file was committed in #11805 (`c37fee4e5`, 2026-03-24). Since then 264 commits landed on `main`, shifting the ExprId counter. `Brand#12` now normalizes to `Brand#6` and `_pre_1#14` shifts to `_pre_1#13`.
Exact diff (original vs current):
Evidence that this is pre-existing
Ran `GlutenTPCHPlanStabilitySuite` on `main` at commit `6097b59a6` (2026-06-25, [MINOR][VL] Build Arrow 18 with patch for Power #12344), without any pending PR applied: q19 failed (21 of 22 queries passed). Then regenerated with `SPARK_GENERATE_GOLDEN_FILES=1` and re-ran: all 22 queries passed.
Only `q19/explain.txt` changed. `simplified.txt` and all other queries (q1–q18, q20–q22) are unaffected.
Why it only surfaces on PRs touching Velox backend Scala files
`spark-test-spark40` is only triggered when Velox backend Scala files are modified. Most PRs touch native C++ code, docs, or non-Velox modules and never trigger this check.
Fix
Regenerated `q19/explain.txt` by running `GlutenTPCHPlanStabilitySuite` with `SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_MODE=false`.
A proper long-term fix (tracked in #12375) would be to make `glutenNormalizeIds` skip `#N` occurrences inside string literal contexts.
Impact
`gluten-ut/spark40/src/test/resources/backends-velox/gluten-tpch-plan-stability/q19/explain.txt` changes
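For the long-term fix mentioned under Fix, one possible direction is sketched below in Scala. It is only an assumption about how the normalizer could be made literal-aware, not the change tracked in #12375: it adds a second negative lookbehind so that a `#` directly preceded by `Brand` is left alone. The object name, the `normalizeIds` helper, the sequential-numbering behavior, and the plan strings are all hypothetical.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

object LiteralAwareNormalizeSketch {
  // Hypothetical variant of the regex quoted in this PR: additionally skip
  // '#' when it directly follows the literal prefix "Brand" (case-sensitive,
  // so the attribute name p_brand#N is still normalized).
  val exprIdRegex: Regex = """(?<prefix>(?<!id=)(?<!Brand)#)\d+L?""".r

  // Assumed behavior: number each distinct match by order of first appearance.
  def normalizeIds(plan: String): String = {
    val ids = mutable.LinkedHashMap.empty[String, String]
    exprIdRegex.findAllMatchIn(plan).foreach { m =>
      ids.getOrElseUpdate(m.matched, (ids.size + 1).toString)
    }
    exprIdRegex.replaceAllIn(
      plan,
      m => Regex.quoteReplacement(m.group(1) + ids(m.matched)))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical plan fragments with shifted raw ExprIds.
    val before = "Scan [a#10, b#11] Filter (p_brand#14 = Brand#12)"
    val after  = "Scan [a#12, b#13] Filter (p_brand#16 = Brand#12)"

    // Both normalize to the same text because Brand#12 is no longer touched.
    println(normalizeIds(before)) // Scan [a#1, b#2] Filter (p_brand#3 = Brand#12)
    println(normalizeIds(after))  // Scan [a#1, b#2] Filter (p_brand#3 = Brand#12)
  }
}
```

A prefix-specific lookbehind only covers the `Brand#N` values seen in q19; a general solution would need some way to recognize literal contexts in the explain text, which this sketch does not attempt.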