
[GLUTEN-11539][VL] Improve error message when spark.io.compression.codec=none #12360

Open
brijrajk wants to merge 1 commit into apache:main from brijrajk:fix/11539-none-codec-message

@brijrajk (Contributor) commented Jun 24, 2026

What changes are proposed in this pull request?

Follow-up to #12333.

When spark.io.compression.codec=none, the error message from #12333 told users to configure spark.gluten.sql.columnar.shuffle.codec, which is misleading if their intent is to disable shuffle compression.

This PR special-cases none so that the message also points users to spark.shuffle.compress=false, which is the correct setting for disabling Gluten native shuffle compression (already handled in ColumnarShuffleWriter and ColumnarBatchSerializer).

Before (from #12333):

Gluten shuffle only supports lz4, zstd. none is not supported.
You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

After:

Gluten shuffle does not support codec 'none'. To disable shuffle compression,
set spark.shuffle.compress=false. To use a supported codec, set
spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.
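
For readers skimming the thread, the two remedies named in the new message can be written out as plain key/value settings. This is only an illustrative sketch: the object and value names are made up for this example, and the pairs would normally be passed to Spark through `--conf` or `SparkConf` rather than held in a `Map`.

```scala
// Illustrative only: ShuffleSettings and its members are hypothetical names.
object ShuffleSettings {
  // Remedy 1: turn shuffle compression off entirely.
  val disableCompression: Map[String, String] =
    Map("spark.shuffle.compress" -> "false")

  // Remedy 2: keep compression on and pick a codec Gluten supports (lz4 or zstd).
  val useSupportedCodec: Map[String, String] =
    Map("spark.gluten.sql.columnar.shuffle.codec" -> "zstd")
}
```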

Files changed

  • GlutenShuffleUtils.scala — special-cases none with a more actionable error message
  • MiscOperatorSuite.scala — updates the none regression test assertion to match

How was this patch tested?

MiscOperatorSuite — 97/97 passed locally (Spark 4.0, Velox backend).


Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Code (claude-sonnet-4-6) was used as an AI assistant during development.

Related issue: #11539

@github-actions (bot) added the CORE and VELOX labels Jun 24, 2026
@github-actions commented:

Run Gluten Clickhouse CI on x86

@zjuwangg (Contributor) left a comment:

LGTM. Thx~

      .toLowerCase(Locale.ROOT)
    val supportedCodecs = BackendsApiManager.getSettings.shuffleSupportedCodec()
    if (!supportedCodecs.contains(codec)) {
      if (codec == "none") {

Contributor (inline review comment):

Setting the value to none indicates no compression, so it should not throw an exception, correct?

@brijrajk (Contributor, Author) replied:

The throw is intentional — this was discussed in #12333 with @FelixYBW and @marin-ma. Gluten's native shuffle writer doesn't have a "no codec" code path; the correct knob to disable compression is spark.shuffle.compress=false (already handled in ColumnarShuffleWriter and ColumnarBatchSerializer). This PR improves the none case specifically by pointing users directly to that config, rather than the misleading spark.gluten.sql.columnar.shuffle.codec message from #12333.
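
To make the interaction between the two settings concrete, here is a minimal sketch of how an effective shuffle codec could be resolved. It is not Gluten's actual code: `EffectiveShuffleCodec.resolve` is a hypothetical helper, and the precedence (Gluten-specific key first, then spark.io.compression.codec) is an assumption inferred from this thread. The defaults used (`true` for spark.shuffle.compress, `lz4` for spark.io.compression.codec) are Spark's documented defaults.

```scala
import java.util.Locale

// Hypothetical helper for illustration; not Gluten's implementation.
object EffectiveShuffleCodec {
  def resolve(conf: Map[String, String]): Option[String] = {
    // spark.shuffle.compress defaults to true in Spark.
    val compress = conf.getOrElse("spark.shuffle.compress", "true").toBoolean
    if (!compress) {
      // Compression is disabled: no codec is consulted at all.
      None
    } else {
      // Assumed precedence: Gluten-specific key, then Spark's codec (default lz4).
      val codec = conf
        .get("spark.gluten.sql.columnar.shuffle.codec")
        .orElse(conf.get("spark.io.compression.codec"))
        .getOrElse("lz4")
      Some(codec.toLowerCase(Locale.ROOT))
    }
  }
}
```

Under this sketch, a job with spark.shuffle.compress=false never reaches codec validation, which is why that setting, rather than a "none" codec, is the way to switch compression off.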

            s"To use a supported codec, set ${GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key} " +
              s"to ${supportedCodecs.mkString(" or ")}.")
      }
      throw new IllegalArgumentException(

Contributor (inline review comment):

Instead of adding a special case for "none", can you just refine the current error message for all invalid input?

            throw new IllegalArgumentException(
              s"Gluten shuffle does not support codec '$codec'. " +
                s"To disable shuffle compression, set spark.shuffle.compress=false. " +
                s"To use a supported codec, set ${GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key} " +
                s"to ${supportedCodecs.mkString(" or ")}.")

In fact, Spark throws the same exception for all invalid codecs, and I can't find any Spark documentation saying that "none" means uncompressed.

@brijrajk (Contributor, Author) replied:

Good point — simplified in the latest commit. The unified message now reads:

Gluten shuffle does not support codec '$codec'. To disable shuffle compression, set spark.shuffle.compress=false. To use a supported codec, set spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

This covers all invalid inputs (including none) without special-casing.
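
A self-contained sketch of the unified check, for anyone who wants to try the message outside Gluten. The object name, the `validate` signature, and the `ShuffleCodecKey` constant are assumptions made for this example; in the PR the key comes from `GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key` and the supported list from `BackendsApiManager.getSettings.shuffleSupportedCodec()`.

```scala
import java.util.Locale

// Stand-alone approximation of the unified check; names are hypothetical.
object ShuffleCodecCheck {
  // Assumption: stands in for GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key.
  val ShuffleCodecKey = "spark.gluten.sql.columnar.shuffle.codec"

  /** Returns the normalized codec, or throws for any unsupported value. */
  def validate(rawCodec: String, supportedCodecs: Seq[String]): String = {
    val codec = rawCodec.toLowerCase(Locale.ROOT)
    if (!supportedCodecs.contains(codec)) {
      // One message for every invalid input, including "none".
      throw new IllegalArgumentException(
        s"Gluten shuffle does not support codec '$codec'. " +
          "To disable shuffle compression, set spark.shuffle.compress=false. " +
          s"To use a supported codec, set $ShuffleCodecKey " +
          s"to ${supportedCodecs.mkString(" or ")}.")
    }
    codec
  }
}
```

Calling `ShuffleCodecCheck.validate("none", Seq("lz4", "zstd"))` and `ShuffleCodecCheck.validate("snappy", Seq("lz4", "zstd"))` both take the same branch and differ only in the codec name quoted in the message.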


@brijrajk force-pushed the fix/11539-none-codec-message branch from 340d72f to 08cc799 on June 26, 2026 14:36

…dec=none

Emit a unified error for all unsupported codecs that names the
offending codec, hints at spark.shuffle.compress=false to disable
compression entirely, and lists the valid codec choices.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@brijrajk force-pushed the fix/11539-none-codec-message branch from 08cc799 to 0692720 on June 26, 2026 14:38


4 participants