[GLUTEN-11539][VL] Improve error message when spark.io.compression.codec=none #12360
brijrajk wants to merge 1 commit into
Run Gluten Clickhouse CI on x86
  .toLowerCase(Locale.ROOT)
val supportedCodecs = BackendsApiManager.getSettings.shuffleSupportedCodec()
if (!supportedCodecs.contains(codec)) {
  if (codec == "none") {
Setting the value to None indicates no compression. It should not throw an exception, correct?
The throw is intentional — this was discussed in #12333 with @FelixYBW and @marin-ma. Gluten's native shuffle writer doesn't have a "no codec" code path; the correct knob to disable compression is spark.shuffle.compress=false (already handled in ColumnarShuffleWriter and ColumnarBatchSerializer). This PR improves the none case specifically by pointing users directly to that config, rather than the misleading spark.gluten.sql.columnar.shuffle.codec message from #12333.
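To make the distinction between the two knobs concrete, here is a minimal Scala sketch. It is not Gluten's code: a plain Map stands in for SparkConf, and both helpers are hypothetical. It also assumes the codec setting is only consulted when compression is enabled, which this thread does not confirm. The two config keys are the ones named in this conversation, and the default of true for spark.shuffle.compress is Spark's documented default.

```scala
object ShuffleCompressionKnobs {
  // Key named in this thread as the switch for disabling shuffle compression.
  val ShuffleCompressKey = "spark.shuffle.compress"
  // Key named in this thread for choosing the Gluten shuffle codec.
  val GlutenShuffleCodecKey = "spark.gluten.sql.columnar.shuffle.codec"

  // Hypothetical helper: a Map stands in for SparkConf.
  // Spark's documented default for spark.shuffle.compress is true.
  def compressionEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse(ShuffleCompressKey, "true").toBoolean

  // Hypothetical helper (assumption): the codec only matters when
  // compression is enabled, so None means "write uncompressed".
  def effectiveCodec(conf: Map[String, String], defaultCodec: String): Option[String] =
    if (compressionEnabled(conf)) Some(conf.getOrElse(GlutenShuffleCodecKey, defaultCodec))
    else None
}
```

Under these assumptions, a user who wants uncompressed shuffle sets spark.shuffle.compress=false and leaves the codec setting alone, rather than asking for a codec called none.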
      s"To use a supported codec, set ${GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key} " +
      s"to ${supportedCodecs.mkString(" or ")}.")
}
throw new IllegalArgumentException(
Instead of adding a special judgment for "none", can you just refine the current error message for all invalid input?
throw new IllegalArgumentException(
s"Gluten shuffle does not support codec '$codec'. " +
s"To disable shuffle compression, set spark.shuffle.compress=false. " +
s"To use a supported codec, set ${GlutenConfig.COLUMNAR_SHUFFLE_CODEC.key} " +
s"to ${supportedCodecs.mkString(" or ")}.")
In fact, Spark throws the same exception for all invalid codecs, and I can't find any Spark documentation saying that "none" means uncompressed.
Good point — simplified in the latest commit. The unified message now reads:
Gluten shuffle does not support codec '$codec'. To disable shuffle compression, set spark.shuffle.compress=false. To use a supported codec, set spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.
This covers all invalid inputs (including none) without special-casing.
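The unified check can be sketched as a self-contained Scala object. This is an illustration under stated assumptions, not the code in GlutenShuffleUtils.scala: the object name, the validateCodec helper, and passing the supported codecs in as a Seq are all hypothetical, and the config key is hard-coded from the message quoted above instead of being read from GlutenConfig.

```scala
import java.util.Locale

object ShuffleCodecCheck {
  // Assumption: key copied from the error message quoted in this thread.
  val ColumnarShuffleCodecKey = "spark.gluten.sql.columnar.shuffle.codec"

  // Hypothetical helper: normalizes the codec name and rejects anything
  // that is not in the supported list, with one message for every case.
  def validateCodec(rawCodec: String, supportedCodecs: Seq[String]): String = {
    val codec = rawCodec.toLowerCase(Locale.ROOT)
    if (!supportedCodecs.contains(codec)) {
      val choices = supportedCodecs.mkString(" or ")
      throw new IllegalArgumentException(
        s"Gluten shuffle does not support codec '$codec'. " +
          "To disable shuffle compression, set spark.shuffle.compress=false. " +
          s"To use a supported codec, set $ColumnarShuffleCodecKey " +
          s"to $choices.")
    }
    codec
  }
}
```

Calling ShuffleCodecCheck.validateCodec("none", Seq("lz4", "zstd")) goes down the same path as any other unsupported name, which is the point of dropping the special case.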
force-pushed from 340d72f to 08cc799
…dec=none Emit a unified error for all unsupported codecs that names the offending codec, hints at spark.shuffle.compress=false to disable compression entirely, and lists the valid codec choices. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
force-pushed from 08cc799 to 0692720
What changes are proposed in this pull request?
Follow-up to #12333.
When spark.io.compression.codec=none, the error message from #12333 told users to configure spark.gluten.sql.columnar.shuffle.codec, which is misleading if their intent is to disable shuffle compression. none is special-cased to also point users to spark.shuffle.compress=false, which is the correct knob for disabling Gluten native shuffle compression (already handled in ColumnarShuffleWriter and ColumnarBatchSerializer).
Before (from #12333):
After:
Files changed
GlutenShuffleUtils.scala: special-cases none with a more actionable error message
MiscOperatorSuite.scala: updates the none regression test assertion to match
How was this patch tested?
MiscOperatorSuite: 97/97 passed locally (Spark 4.0, Velox backend).
Was this patch authored or co-authored using generative AI tooling?
Yes. Claude Code (claude-sonnet-4-6) was used as an AI assistant during development.
Related issue: #11539