From 29bea969f9a56ce19ac905118528e7677a167371 Mon Sep 17 00:00:00 2001 From: Danny McCormick Date: Fri, 26 Jun 2026 18:15:58 +0000 Subject: [PATCH 1/2] Add ValidatesRunner coverage comparison between Java and Python --- .../compatCoverage/compat-test-comparison.md | 30 +++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 .test-infra/compatCoverage/compat-test-comparison.md diff --git a/.test-infra/compatCoverage/compat-test-comparison.md b/.test-infra/compatCoverage/compat-test-comparison.md new file mode 100644 index 000000000000..e92b8410d39f --- /dev/null +++ b/.test-infra/compatCoverage/compat-test-comparison.md @@ -0,0 +1,30 @@ +# ValidatesRunner Test Coverage Comparison: Java vs Python SDK + +This document compares the coverage of `ValidatesRunner` (Java) and `it_validatesrunner` (Python) integration test suites. +These tests are used to validate that runner implementations adhere to the Apache Beam model. + +The table below maps key Beam model features to their corresponding validating tests in both SDKs. + +| Test Case | Description | Java SDK | Python SDK | +| :--- | :--- | :---: | :---: | +| **Basic ParDo** | Verifies that a basic `ParDo` transform executes correctly on elements. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L398) | ❌ | +| **ParDo Empty Input** | Verifies that `ParDo` executes correctly when the input `PCollection` is empty. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L409) | ❌ | +| **ParDo Multi-Output** | Verifies that `ParDo` can emit elements to multiple tagged outputs. Python has multiple variations (yield, return, undeclared, empty). | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L688) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L253) | +| **Basic GroupByKey** | Verifies that `GroupByKey` groups elements by their key. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L127) | ❌ | +| **GroupByKey Empty** | Verifies that `GroupByKey` behaves correctly when the input is empty. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L159) | ❌ | +| **Basic Flatten** | Verifies that `Flatten` merges multiple `PCollection`s into a single `PCollection`. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L90) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L742) | +| **Flatten Singleton** | Verifies that `Flatten` works on a list containing only a single `PCollection`. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L103) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L757) | +| **Flatten Multiple Copies** | Verifies that `Flatten` works when the same `PCollection` is passed multiple times. Note: Python version is disabled in streaming on Dataflow. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L141) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L831) | +| **Flatten Empty** | Verifies that `Flatten` works on an empty list of `PCollection`s. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L129) | ❌ | +| **Flatten Multiple Consumers** | Verifies `Flatten` when the input `PCollection`s have other consumers in the pipeline. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L323) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L866) | +| **Singleton Side Input** | Verifies consuming a `PCollection` as a singleton side input. Python has variations for different defaults and duplicate views. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L87) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L253) | +| **Empty Singleton Side Input** | Verifies behavior when a singleton side input is empty. Java expects an error; Python returns an empty side input helper. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L193) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L152) | +| **Iterable Side Input** | Verifies consuming a `PCollection` as an iterable side input. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L550) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L192) | +| **List Side Input** | Verifies consuming a `PCollection` as a list side input. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L326) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L228) | +| **Impulse** | Verifies the `Impulse` transform which produces a single empty byte array element. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ImpulseTest.java#L40) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L217) | + +## Observations + +1. **ParDo and GroupByKey**: Java has explicit `ValidatesRunner` tests for basic `ParDo` and `GroupByKey` (including empty inputs). Python lacks these in its `it_validatesrunner` suite, likely relying on them being implicitly tested by other composite transforms or running them only in unit tests (DirectRunner). +2. **Flatten**: Both SDKs have good coverage for various `Flatten` scenarios, including singleton lists, multiple copies of the same PCollection, and multiple consumers. Python lacks an explicit `it_validatesrunner` test for flattening an empty list. +3. **Side Inputs**: Both SDKs have comprehensive coverage for side inputs (Singleton, Iterable, List). Note that they handle empty singleton side inputs differently (Java throws an exception, Python returns an empty side input representation). From bc42d5a48f6c8131e8c4448d9ad6d7e1823e3dd9 Mon Sep 17 00:00:00 2001 From: Danny McCormick Date: Fri, 26 Jun 2026 18:40:16 +0000 Subject: [PATCH 2/2] Expand ValidatesRunner coverage comparison with comprehensive cross-SDK test mapping Replaces the initial stub with a full comparison table covering 100+ test cases across 14 feature categories: ParDo, State & Timers, GroupByKey, Flatten, Combine, CoGroupByKey, Side Inputs, Windowing, Metrics, TestStream, Impulse, SDF, Reshuffle, Redistribute, DoFn Lifecycle, Timestamps, Per-key Ordering, and XLang transforms. Co-Authored-By: Claude Sonnet 4 --- .../compatCoverage/compat-test-comparison.md | 178 +++++++++++++++--- 1 file changed, 156 insertions(+), 22 deletions(-) diff --git a/.test-infra/compatCoverage/compat-test-comparison.md b/.test-infra/compatCoverage/compat-test-comparison.md index e92b8410d39f..93562ce88585 100644 --- a/.test-infra/compatCoverage/compat-test-comparison.md +++ b/.test-infra/compatCoverage/compat-test-comparison.md @@ -1,30 +1,164 @@ -# ValidatesRunner Test Coverage Comparison: Java vs Python SDK +# ValidatesRunner Cross-SDK Test Coverage Comparison -This document compares the coverage of `ValidatesRunner` (Java) and `it_validatesrunner` (Python) integration test suites. -These tests are used to validate that runner implementations adhere to the Apache Beam model. +This document compares the coverage of the `ValidatesRunner` test suite between the Java SDK and the Python SDK. -The table below maps key Beam model features to their corresponding validating tests in both SDKs. +The Java SDK marks tests with `@Category(ValidatesRunner.class)` to designate tests that must pass on any compliant runner implementation. The Python SDK lacks an exact equivalent annotation; the closest analog is the set of integration tests in `ptransform_test.py`, `sideinputs_test.py`, `combiners_test.py`, `fn_runner_test.py`, `metric_test.py`, `dofn_lifecycle_test.py`, and `validate_runner_xlang_test.py`, which run core Beam primitives against a real runner (typically DirectRunner or the Portability FnApiRunner). + +A ✓ indicates the test case is covered in that SDK with a link to the test. An ✗ indicates no direct equivalent test was found in the SDK's validates-runner suite. | Test Case | Description | Java SDK | Python SDK | | :--- | :--- | :---: | :---: | -| **Basic ParDo** | Verifies that a basic `ParDo` transform executes correctly on elements. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L398) | ❌ | -| **ParDo Empty Input** | Verifies that `ParDo` executes correctly when the input `PCollection` is empty. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L409) | ❌ | -| **ParDo Multi-Output** | Verifies that `ParDo` can emit elements to multiple tagged outputs. Python has multiple variations (yield, return, undeclared, empty). | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L688) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L253) | -| **Basic GroupByKey** | Verifies that `GroupByKey` groups elements by their key. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L127) | ❌ | -| **GroupByKey Empty** | Verifies that `GroupByKey` behaves correctly when the input is empty. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L159) | ❌ | -| **Basic Flatten** | Verifies that `Flatten` merges multiple `PCollection`s into a single `PCollection`. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L90) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L742) | -| **Flatten Singleton** | Verifies that `Flatten` works on a list containing only a single `PCollection`. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L103) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L757) | -| **Flatten Multiple Copies** | Verifies that `Flatten` works when the same `PCollection` is passed multiple times. Note: Python version is disabled in streaming on Dataflow. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L141) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L831) | -| **Flatten Empty** | Verifies that `Flatten` works on an empty list of `PCollection`s. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L129) | ❌ | -| **Flatten Multiple Consumers** | Verifies `Flatten` when the input `PCollection`s have other consumers in the pipeline. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L323) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L866) | -| **Singleton Side Input** | Verifies consuming a `PCollection` as a singleton side input. Python has variations for different defaults and duplicate views. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L87) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L253) | -| **Empty Singleton Side Input** | Verifies behavior when a singleton side input is empty. Java expects an error; Python returns an empty side input helper. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L193) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L152) | -| **Iterable Side Input** | Verifies consuming a `PCollection` as an iterable side input. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L550) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L192) | -| **List Side Input** | Verifies consuming a `PCollection` as a list side input. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L326) | [✔️](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L228) | -| **Impulse** | Verifies the `Impulse` transform which produces a single empty byte array element. | [✔️](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ImpulseTest.java#L40) | [✔️](../../sdks/python/apache_beam/transforms/ptransform_test.py#L217) | +| **— ParDo —** | | | | +| ParDo — Basic Execution | A `DoFn` applied to a non-empty `PCollection` produces correctly transformed output elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L398) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L124) | +| ParDo — Empty Input | A `DoFn` applied to an empty `PCollection` produces no output elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L410) | ✗ | +| ParDo — Empty Outputs | A `DoFn` that emits nothing for every element produces an empty output `PCollection`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L426) | ✗ | +| ParDo — Pipeline Options Parameter | A `DoFn` can receive `PipelineOptions` as an annotated parameter at runtime. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L634) | ✗ | +| ParDo — Multiple Tagged Outputs | A `DoFn` that writes to multiple tagged output `PCollection`s produces the correct elements in each. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L689) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L254) | +| ParDo — Side Inputs | A `DoFn` can read from side input `PCollection`s and produce output reflecting their values. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L846) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L438) | +| ParDo — Multi-output with Side Inputs | A multi-output `DoFn` can read side inputs and emit to both main and tagged outputs correctly. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1223) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L419) | +| ParDo — Side Inputs across Multiple Windows | Side inputs are windowed correctly so each main-input window reads the side input for its corresponding window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1317) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L506) | +| ParDo — Multi-output Chaining | Chaining multiple `ParDo` transforms with tagged outputs produces correct elements at each step. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1448) | ✗ | +| ParDo — Bundle Finalization | Bundle finalization callbacks registered in `@ProcessElement` are invoked after the bundle succeeds. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1638) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1454) | +| ParDo — Error in StartBundle | An exception thrown in `@StartBundle` propagates and fails the pipeline. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1753) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L403) | +| ParDo — Error in ProcessElement | An exception thrown in `@ProcessElement` propagates and fails the pipeline. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1765) | ✗ | +| ParDo — Error in FinishBundle | An exception thrown in `@FinishBundle` propagates and fails the pipeline. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1778) | ✗ | +| ParDo — Windowing in Start/Finish Bundle | Windowing information is accessible from within `@StartBundle` and `@FinishBundle`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1791) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L350) | +| ParDo — Output with Custom Timestamp | Elements emitted from `@ProcessElement` can carry explicitly set output timestamps. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L1998) | ✗ | +| ParDo — Shift Timestamp | Timestamps can be shifted forward within the allowed skew without error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2054) | ✗ | +| **— State and Timers —** | | | | +| Value State | A stateful `DoFn` can read and write a `ValueState` cell that persists across elements within a key. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2285) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L649) | +| Bag State | A stateful `DoFn` can accumulate elements into a `BagState` cell and read them back. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2603) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L873) | +| Set State | A stateful `DoFn` can maintain a `SetState` of unique elements and read them back correctly. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2651) | ✗ | +| Map State | A stateful `DoFn` can store and retrieve key-value pairs from a `MapState`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L2704) | ✗ | +| Combining State | A stateful `DoFn` uses a `CombiningState` accumulator that merges values incrementally. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L3560) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L649) | +| Event-time Timer | A `DoFn` registers an event-time timer that fires correctly when the watermark advances past the target. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L4713) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L705) | +| Processing-time Timer | A `DoFn` registers a processing-time timer that fires when processing time advances. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L5256) | ✗ | +| Dynamic Timer (Timer Family) | A `DoFn` uses a timer family to register multiple named timers dynamically per element. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L6901) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L950) | +| OnWindowExpiration | A `DoFn` annotated with `@OnWindowExpiration` fires once per window when the window expires. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java#L7266) | ✗ | +| **— GroupByKey —** | | | | +| GroupByKey — Basic | `GroupByKey` on a non-empty `PCollection>` produces the correct grouped `PCollection>>`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L127) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L520) | +| GroupByKey — Empty Input | `GroupByKey` applied to an empty `PCollection` produces an empty output `PCollection`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L159) | ✗ | +| GroupByKey — Timestamp Combiner (Earliest) | When `EARLIEST` is set as the `TimestampCombiner`, the earliest input timestamp is propagated to each output group. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L505) | ✗ | +| GroupByKey — Timestamp Combiner (Latest) | When `LATEST` is set as the `TimestampCombiner`, the latest input timestamp is propagated to each output group. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L525) | ✗ | +| GroupByKey — Fixed Windows | `GroupByKey` groups elements separately within each fixed window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L736) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1272) | +| GroupByKey — Sliding Windows | `GroupByKey` correctly groups elements that appear in multiple overlapping sliding windows. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L777) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1257) | +| GroupByKey — Merging Windows (Sessions) | `GroupByKey` correctly merges session windows before grouping elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/GroupByKeyTest.java#L807) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1272) | +| **— Flatten —** | | | | +| Flatten — Basic | `Flatten` merges multiple `PCollection`s into a single `PCollection` containing all elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L91) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L743) | +| Flatten — Singleton List | `Flatten` on a list containing a single `PCollection` passes all elements through unchanged. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L103) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L758) | +| Flatten — Empty List | `Flatten` applied to an empty list of `PCollection`s produces an empty output `PCollection`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L129) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L750) | +| Flatten — Multiple Copies of Same PCollection | `Flatten` can include the same `PCollection` multiple times, causing each element to appear multiple times in the output. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L141) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L832) | +| Flatten — Empty Result as Side Input | An empty `PCollection` produced by `Flatten` can be used as a side input without error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L195) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L534) | +| Flatten — Multiple Consumers | `Flatten` inputs that also have other downstream consumers do not cause incorrect data sharing or duplication. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L323) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L867) | +| Flatten — Iterables | `Flatten.iterables()` on a `PCollection>` produces a flat `PCollection`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L249) | ✗ | +| Flatten — Different Coders | `Flatten` works correctly when input and output `PCollection`s have different but compatible coders. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/FlattenTest.java#L358) | ✗ | +| **— Combine —** | | | | +| Combine — Globally | `Combine.globally()` applies a `CombineFn` across all elements and produces the correct single result. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L660) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L459) | +| Combine — per Key | `Combine.perKey()` applies a merging function independently per key and produces the correct per-key results. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L891) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L485) | +| Combine — Hot Key Fanout | Hot-key fanout distributes combining work across shards and still produces the correct final result. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L706) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L520) | +| Combine — Hot Key with Accumulating Mode | Hot-key fanout combined with an accumulating-panes trigger produces correct results per pane. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L741) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L553) | +| Combine — With Side Inputs | `CombineFnWithContext` reads a side input during merging and produces the correct context-dependent result. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L995) | ✗ | +| Combine — in Fixed Windows | `Combine.globally()` and `Combine.perKey()` produce correct results within each fixed window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1063) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L709) | +| Combine — in Sliding Windows | `Combine.globally()` accumulates sliding windows correctly so elements that appear in multiple windows are counted accordingly. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1132) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L587) | +| Combine — in Session Windows | `Combine.globally()` and `Combine.perKey()` produce correct results within merged session windows. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1269) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L687) | +| Combine — Globally as Singleton Side Input | The result of `Combine.globally()` can be used as a singleton side input in a downstream `DoFn`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L1360) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L507) | +| **— CoGroupByKey —** | | | | +| CoGroupByKey — Basic | `CoGroupByKey` joins elements from multiple `PCollection`s by key and produces the correct `CoGbkResult` per key. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java#L448) | ✗ | +| CoGroupByKey — With Windowing | `CoGroupByKey` respects window boundaries so elements in different windows are grouped separately per key. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/join/CoGroupByKeyTest.java#L473) | ✗ | +| **— Side Inputs —** | | | | +| Side Input — Singleton | A `View.asSingleton()` side input is accessible from `@ProcessElement` and returns the correct single value. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L88) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L75) | +| Side Input — Windowed Singleton | A singleton side input windowed with fixed windows returns the correct value for the main input's window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L114) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L129) | +| Side Input — Empty Singleton with Default | An empty `PCollection` used as a singleton side input returns the configured default value without error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L194) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L153) | +| Side Input — Non-singleton Error | Attempting to use a multi-element `PCollection` as a singleton side input without a default raises an error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L222) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L172) | +| Side Input — List | A `View.asList()` side input is accessible as an ordered `List` containing all elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L326) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L229) | +| Side Input — Windowed List | A list side input windowed with fixed windows contains only elements from the matching window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L423) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L81) | +| Side Input — Empty List | An empty `PCollection` used as a list side input returns an empty list rather than an error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L472) | ✗ | +| Side Input — Iterable | A `View.asIterable()` side input is accessible as an `Iterable` over all elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L550) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L192) | +| Side Input — Windowed Iterable | An iterable side input windowed with fixed windows iterates only over elements in the matching window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L578) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L122) | +| Side Input — Empty Iterable | An empty `PCollection` used as an iterable side input returns an empty iterable without error. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L624) | ✗ | +| Side Input — Multimap | A `View.asMultimap()` side input allows lookup of all values for a given key. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapViewTest.java#L80) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L582) | +| Side Input — Map | A `View.asMap()` side input allows efficient single-value lookup by key. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/MapViewTest.java#L526) | ✗ | +| Side Input — Fixed-window to Fixed-window | A fixed-windowed side input consumed by a fixed-windowed main input is matched correctly window by window. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L687) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L81) | +| Side Input — Fixed-window to Global | A fixed-windowed side input consumed in a global-windowed `DoFn` falls back to the single-window default. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L728) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L95) | +| Side Input — Triggered Latest Singleton | A singleton side input derived from a triggered GBK reflects the most recently produced pane's value. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ViewTest.java#L273) | [✓](../../sdks/python/apache_beam/transforms/sideinputs_test.py#L367) | +| **— Windowing —** | | | | +| Windowing — Fixed (Partitioning) | Fixed windows assign each element to exactly one window; per-key counts and timestamps within each window are correct. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java#L107) | [✓](../../sdks/python/apache_beam/transforms/combiners_test.py#L709) | +| Windowing — Sliding (Non-Partitioning) | Sliding windows assign each element to multiple overlapping windows; element-count totals reflect that multiplicity. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java#L132) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1257) | +| Windowing — Session (Merging) | Session windows merge elements that are close in time into a single window; isolated bursts remain separate. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java#L157) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1272) | +| Windowing — Preservation | Windowing assignments are preserved through a `Flatten` followed by a windowed aggregation. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowingTest.java#L175) | ✗ | +| **— Metrics —** | | | | +| Metrics — Committed Counter | A counter metric incremented in a `DoFn` reports the correct committed value after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L317) | [✓](../../sdks/python/apache_beam/metrics/metric_test.py#L157) | +| Metrics — Committed Distribution | A distribution metric (min, max, mean, sum) reports correct committed values after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L325) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1386) | +| Metrics — Committed Gauge | A gauge metric reports the latest committed value after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L333) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1386) | +| Metrics — Committed StringSet | A string-set metric reports the correct set of committed string values after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L341) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1386) | +| Metrics — Committed BoundedTrie | A bounded-trie metric reports the correct committed trie structure after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L349) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1386) | +| Metrics — Attempted Counter | A counter metric reports the correct attempted (including retried) value after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L455) | [✓](../../sdks/python/apache_beam/metrics/metric_test.py#L157) | +| Metrics — Attempted Distribution | A distribution metric reports the correct attempted values (including retried bundles) after the pipeline finishes. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L463) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1386) | +| Metrics — Bounded Source in Split | Metrics emitted during source split and advance are reported as attempted metrics for bounded sources. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/metrics/MetricsTest.java#L495) | ✗ | +| **— TestStream —** | | | | +| TestStream — Late Data Accumulation | Late elements that arrive within the allowed lateness window are included in accumulated panes; droppably-late elements are excluded. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L89) | ✗ | +| TestStream — Processing-time Trigger | A processing-time trigger fires correctly when the `TestStream` advances the processing-time clock. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L160) | ✗ | +| TestStream — Discarding Mode | With `DISCARD_FIRED_PANES`, each pane contains only the elements that arrived since the last pane fired. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L191) | ✗ | +| TestStream — Elements near Positive Infinity | Elements timestamped near the maximum event-time boundary are assigned to the global window and processed correctly. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L273) | ✗ | +| TestStream — Multiple Streams | Two independent `TestStream` sources can coexist in the same pipeline and advance their watermarks independently. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L299) | ✗ | +| TestStream — Timers with ParDo | Event-time timers registered in a stateful `DoFn` fire correctly when a `TestStream` advances the watermark. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/testing/TestStreamTest.java#L419) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L676) | +| **— Impulse —** | | | | +| Impulse — Basic | `Impulse.create()` emits a single empty byte-array element, which can be flat-mapped into an arbitrary collection. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ImpulseTest.java#L40) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L218) | +| **— Splittable DoFn (SDF) —** | | | | +| SDF — Basic (Bounded) | A bounded SDF correctly processes each restriction, producing the expected output elements. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L144) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L974) | +| SDF — Basic (Unbounded) | An unbounded SDF processes restrictions incrementally and produces correct output until the pipeline is drained. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L150) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L974) | +| SDF — Watermark Tracking | An SDF that emits watermark estimates advances the pipeline watermark correctly as restrictions are processed. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L177) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1026) | +| SDF — Windowed Side Input | A bounded SDF can access a windowed side input and produces output that depends on the correct window's value. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L459) | ✗ | +| SDF — Additional (Tagged) Output | A bounded SDF can emit to both the main output and additional tagged outputs correctly. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L761) | ✗ | +| SDF — Checkpointing | A bounded SDF that checkpoints mid-restriction still produces the correct final output across checkpoint boundaries. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L309) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1116) | +| SDF — Bundle Finalization | Bundle finalization callbacks registered inside an SDF's `@ProcessElement` are invoked after the bundle succeeds. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L1029) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1454) | +| SDF — Lifecycle Methods | `@Setup`, `@StartBundle`, `@FinishBundle`, and `@Teardown` are called in the correct order for a bounded SDF. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L900) | ✗ | +| **— Reshuffle —** | | | | +| Reshuffle — Basic | `Reshuffle.of()` on a KV collection preserves all elements and the assigned windowing strategy. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java#L102) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1183) | +| Reshuffle — Preserves Timestamps | Element timestamps are preserved exactly after passing through `Reshuffle`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java#L123) | ✗ | +| Reshuffle — Preserves Metadata | Window assignments and pane info are preserved exactly after passing through `Reshuffle`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java#L169) | ✗ | +| Reshuffle — After GBK | `Reshuffle` works correctly after a `GroupByKey` and does not corrupt window assignments. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReshuffleTest.java#L261) | [✓](../../sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1188) | +| **— Redistribute —** | | | | +| Redistribute — Basic | `Redistribute.byKey()` preserves all elements and the assigned windowing strategy after redistribution. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/RedistributeTest.java#L111) | ✗ | +| Redistribute — Preserves Timestamps | Element timestamps are preserved exactly after passing through `Redistribute`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/RedistributeTest.java#L132) | ✗ | +| Redistribute — Preserves Metadata | Window assignments and pane info are preserved after passing through `Redistribute`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/RedistributeTest.java#L178) | ✗ | +| **— DoFn Lifecycle —** | | | | +| Lifecycle — Method Call Order | `@Setup`, `@StartBundle`, `@ProcessElement`, `@FinishBundle`, and `@Teardown` are called in the correct order per bundle. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java#L77) | [✓](../../sdks/python/apache_beam/transforms/dofn_lifecycle_test.py#L79) | +| Lifecycle — Teardown After Setup Exception | `@Teardown` is still invoked even when `@Setup` throws an exception, ensuring resource cleanup. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java#L180) | ✗ | +| Lifecycle — Teardown After StartBundle Exception | `@Teardown` is still invoked even when `@StartBundle` throws an exception, ensuring resource cleanup. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java#L193) | [✓](../../sdks/python/apache_beam/transforms/ptransform_test.py#L403) | +| Lifecycle — Teardown After ProcessElement Exception | `@Teardown` is still invoked even when `@ProcessElement` throws an exception, ensuring resource cleanup. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java#L206) | ✗ | +| Lifecycle — Teardown After FinishBundle Exception | `@Teardown` is still invoked even when `@FinishBundle` throws an exception, ensuring resource cleanup. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoLifecycleTest.java#L220) | ✗ | +| **— Timestamps —** | | | | +| WithTimestamps — Apply | `WithTimestamps.of(fn)` correctly stamps each element with the timestamp returned by the provided function. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java#L52) | ✗ | +| WithTimestamps — Backwards with Skew | Backwards timestamp assignments succeed when `withAllowedTimestampSkew` is configured with sufficient skew. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/WithTimestampsTest.java#L107) | ✗ | +| ReifyTimestamps — In Values | `ReifyTimestamps.inValues()` wraps each KV value in a `TimestampedValue` retaining the element's timestamp. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ReifyTimestampsTest.java#L44) | ✗ | +| **— Per-key Ordering —** | | | | +| Per-key Ordering — With Reshuffle | Elements delivered to a `@RequiresTimeSortedInput` DoFn are presented in ascending timestamp order per key, even after a `Reshuffle`. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/PerKeyOrderingTest.java#L101) | ✗ | +| **— Cross-Language Transforms (XLang) —** | | | | +| XLang — ParDo Single Input/Output | Cross-language ParDo maps elements from one language's `PCollection` through an expansion-service transform and produces the correct single output. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L258) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L280) | +| XLang — ParDo Multi Input/Output with Side Input | Cross-language ParDo with multiple main inputs, a side input, and multiple tagged outputs correctly routes elements to each output. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L279) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L286) | +| XLang — GroupByKey | A cross-language GroupByKey correctly groups KV pairs produced by one SDK when consumed by another. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L301) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L292) | +| XLang — GroupByKey with GBEK (Valid Secret) | Cross-language GroupByKey runs correctly when the `GBEK` pipeline option references a valid GCP Secret Manager secret. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L386) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L381) | +| XLang — GroupByKey with GBEK (Invalid Secret) | Cross-language GroupByKey fails with an appropriate error when the `GBEK` pipeline option references a malformed or nonexistent secret. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L406) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L381) | +| XLang — CoGroupByKey | A cross-language CoGroupByKey correctly joins two KV PCollections by key when the transform crosses language boundaries. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L435) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L298) | +| XLang — Combine Globally | A cross-language global Combine produces the correct aggregate when the CombineFn is implemented in a foreign SDK. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L456) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L304) | +| XLang — Combine per Key | A cross-language per-key Combine produces the correct per-key aggregates when the merging function is in a foreign SDK. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L477) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L310) | +| XLang — Flatten | A cross-language Flatten merges PCollections from a foreign-SDK expansion service into the expected single output. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L498) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L317) | +| XLang — Partition | A cross-language Partition splits a PCollection into multiple named output collections according to a foreign-SDK PartitionFn. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L520) | [✓](../../sdks/python/apache_beam/transforms/validate_runner_xlang_test.py#L323) | +| XLang — Python Dependencies | Cross-language transforms that depend on third-party Python packages succeed when those packages are correctly staged by the runner. | [✓](../../sdks/java/core/src/test/java/org/apache/beam/sdk/util/construction/ValidateRunnerXlangTest.java#L529) | ✗ | ## Observations -1. **ParDo and GroupByKey**: Java has explicit `ValidatesRunner` tests for basic `ParDo` and `GroupByKey` (including empty inputs). Python lacks these in its `it_validatesrunner` suite, likely relying on them being implicitly tested by other composite transforms or running them only in unit tests (DirectRunner). -2. **Flatten**: Both SDKs have good coverage for various `Flatten` scenarios, including singleton lists, multiple copies of the same PCollection, and multiple consumers. Python lacks an explicit `it_validatesrunner` test for flattening an empty list. -3. **Side Inputs**: Both SDKs have comprehensive coverage for side inputs (Singleton, Iterable, List). Note that they handle empty singleton side inputs differently (Java throws an exception, Python returns an empty side input representation). +1. **Strong parity**: Both SDKs cover the most fundamental primitives well — basic `ParDo`, `GroupByKey`, `Flatten`, `Combine`, side inputs (singleton, list, iterable), metrics, `Impulse`, SDF basics, `Reshuffle`, DoFn lifecycle, and all cross-language transforms. These form a solid shared foundation. + +2. **Java-only coverage**: Java has explicit validates-runner tests for many advanced scenarios that the Python suite does not directly cover: timestamp manipulation (`WithTimestamps`, `ReifyTimestamps`), per-key ordering guarantees, `Redistribute`, most `TestStream` scenarios (processing-time triggers, discarding mode, multiple streams), advanced timer patterns (processing-time timers, `@OnWindowExpiration`), set/map state types, `CoGroupByKey`, and lifecycle teardown-on-exception tests. + +3. **Windowing coverage gap**: Java's `WindowingTest` explicitly verifies partitioning, non-partitioning, and merging windowing semantics as first-class validates-runner tests. Python's windowing coverage is distributed across other test files (`fn_runner_test.py`, `combiners_test.py`) and tends to be more incidental than deliberate. + +4. **CoGroupByKey gap**: Java has dedicated validates-runner tests for `CoGroupByKey` including a windowed variant. Python's `CoGroupByKey` is used only incidentally in `ptransform_test.py` and has no dedicated windowed test in the validates-runner suite. + +5. **State and timers gap**: Java's validates-runner suite extensively covers stateful processing (value, bag, set, map, combining state) and all timer types (event-time, processing-time, timer families, `@OnWindowExpiration`). Python has basic event-time timer and value/bag state coverage in `fn_runner_test.py`, but many advanced patterns (set/map state, processing-time timers, `@OnWindowExpiration`) have no equivalent test. + +6. **Redistribute**: Java has a dedicated `RedistributeTest` with `@Category(ValidatesRunner.class)`. The Python SDK has no direct equivalent validates-runner tests for the `Redistribute` transform. + +7. **Cross-language parity**: The cross-language validates-runner suite is the most symmetric section — both Java and Python have nearly identical coverage for all core xlang transforms. The only asymmetry is `XLang — Python Dependencies`, which exists only in Java because it specifically tests Python dependency staging from the Java SDK's perspective.