Fix multipart download response metadata for presigned URL and normal paths by jencymaryjoseph · Pull Request #7077 · aws/aws-sdk-java-v2

jencymaryjoseph · 2026-06-25T17:50:19Z

Motivation and Context

When the S3 multipart async client downloads a large object in multiple part requests (partNumber for normal, ranged GETs for presigned URLs), the response metadata exposed to the customer reflects only the first part, not the full object. Customers see incorrect contentLength (part size instead of total), a partial contentRange, and meaningless composite checksum values.

Modifications

The fix has two prongs because there are two download paths with different architectures:

- Parallel path (toFile): A subscriber manages all parts concurrently and controls when
  resultFuture.complete() is called. We rewrite the response just before completing the future.
- Serial path (toBytes, custom transformers): Parts flow one at a time through a
  SplittingTransformer, which calls the customer's onResponse() with the first part's
  response. We inject a responseMapper into the splitting infrastructure that rewrites the
  response at the onResponse() delivery point, before the customer ever sees it.

Both paths use the same toFullObjectResponse() function to do the actual rewrite.
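
As a rough illustration of the serial-path injection point described above, here is a minimal sketch. It does not use the SDK's types: `ResponseListener`, `MappingListener`, and `MappingListenerDemo` are hypothetical stand-ins, and the only idea taken from this PR is that a `UnaryOperator` is applied to the first part's response before the customer's `onResponse()` runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for the customer's transformer callback; not the real SDK interface. */
interface ResponseListener<R> {
    void onResponse(R response);
}

/** Wraps a downstream listener and applies a mapper before delivering the response. */
final class MappingListener<R> implements ResponseListener<R> {
    private final ResponseListener<R> downstream;
    private final UnaryOperator<R> responseMapper;

    MappingListener(ResponseListener<R> downstream, UnaryOperator<R> responseMapper) {
        this.downstream = downstream;
        this.responseMapper = responseMapper;
    }

    @Override
    public void onResponse(R response) {
        // The rewrite happens at the delivery point, so the downstream
        // callback only ever sees the mapped response.
        downstream.onResponse(responseMapper.apply(response));
    }
}

final class MappingListenerDemo {
    /** Delivers a "first part" response through the wrapper and returns what the customer saw. */
    static List<String> deliver(String firstPartResponse, UnaryOperator<String> mapper) {
        List<String> seen = new ArrayList<>();
        ResponseListener<String> customer = seen::add;
        new MappingListener<>(customer, mapper).onResponse(firstPartResponse);
        return seen;
    }
}
```

Passing `UnaryOperator.identity()` as the mapper leaves the response untouched, which is how a wrapper like this can stay backward compatible for callers that do not need a rewrite.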

Common infrastructure (sdk-core + MultipartDownloadUtils)

Shared rewrite logic and the mechanism to inject it into the splitting infrastructure,
used by both presigned and normal paths.

- Added responseMapper (UnaryOperator) to SplittingTransformer and
  ByteArraySplittingTransformer. When the splitting infrastructure delivers the first part's
  response to the customer's transformer via onResponse(), the mapper rewrites it first.
  This is the injection point for the serial path (toBytes, custom transformers). Without it,
  the customer's onResponse() callback would see raw per-part metadata with no way to fix it
  after the fact.
- Added toFullObjectResponse(), the rewrite function itself. It takes the first part's response
  and produces what a single non-multipart GetObject would have returned:
  - contentLength → total object size (parsed from Content-Range)
  - contentRange → bytes 0-(total-1)/total
  - All checksum value fields → null when checksumType is COMPOSITE (composite checksums are
    per-part hashes that cannot be validated against the full object)
- Added splitWithResponseRewrite(), a convenience method that calls split() with
  toFullObjectResponse pre-configured as the mapper. Used by both DownloadObjectHelper
  and PresignedUrlDownloadHelper on their serial paths.
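
The rewrite rules listed above can be sketched roughly as follows. This is not the PR's toFullObjectResponse(): `PartMetadata` and `FullObjectRewriteSketch` are hypothetical, a single `checksumCrc32` field stands in for all checksum value fields, and returning the input unchanged for a malformed or non-positive total is an assumption (the PR only states a no-op when Content-Range is absent).

```java
import java.util.Optional;

/** Hypothetical, simplified metadata holder; the real GetObjectResponse has many more fields. */
final class PartMetadata {
    final Long contentLength;
    final String contentRange;  // e.g. "bytes 0-8388607/20971520"
    final String checksumType;  // "COMPOSITE" or "FULL_OBJECT"; may be null
    final String checksumCrc32; // stands in for all checksum value fields

    PartMetadata(Long contentLength, String contentRange, String checksumType, String checksumCrc32) {
        this.contentLength = contentLength;
        this.contentRange = contentRange;
        this.checksumType = checksumType;
        this.checksumCrc32 = checksumCrc32;
    }
}

final class FullObjectRewriteSketch {
    /** Extracts the total from "bytes start-end/total"; empty if absent, "*", or malformed. */
    static Optional<Long> totalSize(String contentRange) {
        if (contentRange == null) {
            return Optional.empty();
        }
        int slash = contentRange.lastIndexOf('/');
        if (slash < 0 || slash == contentRange.length() - 1) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(contentRange.substring(slash + 1).trim()));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** Rewrites first-part metadata into what a single full-object GET would report. */
    static PartMetadata toFullObject(PartMetadata first) {
        Optional<Long> total = totalSize(first.contentRange);
        if (!total.isPresent() || total.get() <= 0) {
            return first; // nothing to rewrite without a usable total
        }
        long size = total.get();
        boolean composite = "COMPOSITE".equals(first.checksumType);
        return new PartMetadata(
            size,
            "bytes 0-" + (size - 1) + "/" + size,
            first.checksumType,
            composite ? null : first.checksumCrc32); // composite values are per-part hashes
    }
}
```

For example, a first part reporting `bytes 0-8388607/20971520` would be rewritten to a content length of 20971520 and a range of `bytes 0-20971519/20971520`.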

Presigned URL path

The parallel subscriber rewrites the response before completing the future; the serial path
injects the rewrite via the responseMapper.

- Parallel: ParallelPresignedUrlMultipartDownloaderSubscriber calls toFullObjectResponse()
  before completing the result future.
- Serial: PresignedUrlDownloadHelper uses splitWithResponseRewrite() so the mapper
  fires at onResponse() delivery.
- 416 fix: Broadened the empty-object fallback catch to also match raw S3Exception with
  status 416. The serial path (via SplittingTransformer) surfaces the raw exception directly
  without wrapping it, so the original catch on EmptyObjectRangeNotSatisfiableException alone
  never matched, causing the fallback to be skipped entirely for custom transformers.
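
The broadened 416 check can be pictured with the sketch below. The exception classes here are hypothetical stand-ins (`ServiceException` for S3Exception, `EmptyObjectRangeException` for EmptyObjectRangeNotSatisfiableException), and `shouldFallBackToFullGet` is an illustrative name, not a method from this PR.

```java
/** Hypothetical stand-in for a service exception that carries an HTTP status code. */
class ServiceException extends RuntimeException {
    private final int statusCode;

    ServiceException(String message, int statusCode) {
        super(message);
        this.statusCode = statusCode;
    }

    int statusCode() {
        return statusCode;
    }
}

/** Hypothetical stand-in for the wrapper exception created on the parallel path. */
class EmptyObjectRangeException extends RuntimeException {
    EmptyObjectRangeException(Throwable cause) {
        super(cause);
    }
}

final class RangeFallbackSketch {
    private static final int RANGE_NOT_SATISFIABLE = 416;

    /** True when a ranged GET failed in a way that suggests an empty object. */
    static boolean shouldFallBackToFullGet(Throwable cause) {
        // Parallel path: the subscriber has already wrapped the 416.
        if (cause instanceof EmptyObjectRangeException) {
            return true;
        }
        // Serial path: the raw service exception arrives unwrapped.
        return cause instanceof ServiceException
               && ((ServiceException) cause).statusCode() == RANGE_NOT_SATISFIABLE;
    }
}
```

Matching on the status code as well as the wrapper type lets both download paths reach the same fallback.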

Normal (non-presigned) path

Same two-prong approach as presigned — parallel rewrites at future completion, serial
injects via responseMapper.

- Parallel: ParallelMultipartDownloaderSubscriber calls toFullObjectResponse() before
  completing the result future.
- Serial: DownloadObjectHelper uses splitWithResponseRewrite().

Testing

- Unit tests: MultipartDownloadUtilsTest, 11 tests covering toFullObjectResponse()
  (content-length/range rewrite, checksum nulling for COMPOSITE, preservation for FULL_OBJECT,
  no-op when Content-Range is absent).
- WireMock tests:
  - S3MultipartClientGetObjectWiremockTest: custom transformer receives full-object metadata
  - PresignedUrlMultipartDownloaderSubscriberWiremockTest: 416 fallback works for custom
    transformers (fails without the fix)
- Integration tests:
  - AsyncPresignedUrlExtensionTestSuite: presigned toBytes/toFile metadata assertions
  - S3MultipartClientFileDownloadIntegrationTest: normal toFile + checksumMode assertions
  - CustomTransformerMultipartIntegrationTest: custom transformer sees correct metadata
    and nulled composite checksums

Screenshots (if appropriate)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Checklist

I have read the CONTRIBUTING document
Local run of mvn install succeeds
My code follows the code style of this project
My change requires a change to the Javadoc documentation
I have updated the Javadoc documentation accordingly
I have added tests to cover my changes
All new and existing tests passed
I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

I confirm that this pull request can be released under the Apache 2 license

zoewangg · 2026-06-25T19:33:50Z

+    }
+
+    @Test
+    void multipartDownload_checksumModeEnabled_hasCorrectFullObjectMetadata() throws Exception {


Is this test necessary?

zoewangg · 2026-06-25T19:34:29Z

+    }
+
+    @Test
+    void multipartDownload_toBytes_smallObject_hasCorrectFullObjectMetadata() throws Exception {


Can we consolidate this with multipartDownload_toFile_hasCorrectFullObjectMetadata using parameterized tests?

zoewangg · 2026-06-25T19:35:26Z

    }

+    @Test
+    void getObject_withRangeRequest_preservesPartialMetadata() throws Exception {


Same here, let's try to consolidate tests with parameterized tests

zoewangg · 2026-06-25T19:35:38Z

+    }
+
+    @Test
+    void getObject_mpuObjectWithChecksumMode_hasCorrectMetadata() throws Exception {


Same here. How is checksum mode special?

zoewangg · 2026-06-25T19:36:04Z

+    }
+
    // Helper methods
+    private static void uploadMpuObjectWithChecksum() {


Checksum should be enabled by default, any reason we need to upload it with checksum?

If an object is MPU without checksumMode enabled, S3 doesn't return a checksum.
If an object is MPU with checksumMode enabled, S3 returns a FULL_OBJECT checksum.
And if it was uploaded with checksum enabled and an explicit checksum algorithm like .checksumAlgorithm(ChecksumAlgorithm.CRC32), S3 returns a COMPOSITE checksum.

zoewangg · 2026-06-25T19:37:28Z

+        if (transformer instanceof ByteArrayAsyncResponseTransformer) {
+            return (SplitResult<GetObjectResponse, T>)
+                ((ByteArrayAsyncResponseTransformer<GetObjectResponse>) transformer).split(splitConfig, mapper);
+        }


Any reason we have special logic for ByteArrayAsyncResponseTransformer? ByteArrayAsyncResponseTransformer is an internal API and not supposed to be used across modules

Oh yeah, removed the instanceof and added split(config, mapper) to the AsyncResponseTransformer interface (with ByteArrayAsyncResponseTransformer overriding it). splitWithResponseRewrite() now just calls transformer.split(splitConfig, mapper)

zoewangg · 2026-06-25T19:39:16Z

-                if (cause instanceof EmptyObjectRangeNotSatisfiableException) {
+                // Parallel path wraps it as EmptyObjectRangeNotSatisfiableException;
+                // serial path (toBytes, custom transformers) surfaces raw S3Exception.
+                if (cause instanceof EmptyObjectRangeNotSatisfiableException


Question: what is EmptyObjectRangeNotSatisfiableException?

EmptyObjectRangeNotSatisfiableException is an internal exception created by the parallel subscriber when it gets a 416 from S3 on a ranged request to an empty object. The serial path doesn't go through the subscriber, so the raw 416 S3Exception arrives without being wrapped. Planning to remove this exception class as a follow-up and just use isRangeNotSatisfiable() for all paths.

zoewangg · 2026-06-25T19:41:52Z

+             UnaryOperator.identity());
+    }
+
+    private SplittingTransformer(AsyncResponseTransformer<ResponseT, ResultT> upstreamResponseTransformer,


Can we update this ctor to take a Builder parameter? That way, we don't need to create a new ctor.

jencymaryjoseph · 2026-06-26T18:01:13Z

-                              ? progressUpdater.wrapForNonSerialFileDownload(
-                                  responseTransformer, GetObjectRequest.builder().build())
-                              : progressUpdater.wrapResponseTransformer(responseTransformer);
+        if (isS3ClientMultipartEnabled()


Fixes a test failure where bytesTransferred did not fire for presigned toBytes multipart downloads.
That path was routed to wrapForNonSerialFileDownload, which only counts bytes inside its split() override, but the serial download splits and drives onStream directly, bypassing it. It is now routed by parallelSplitSupported(), so serial toBytes uses wrapResponseTransformerForMultipartDownload (which counts in onStream), mirroring the regular download path.

zoewangg · 2026-06-26T17:46:22Z

+     * Creates a {@link SplitResult} with a response mapper applied at the upstream {@code onResponse} delivery point.
+     */
+    @SdkInternalApi
+    default SplitResult<ResponseT, ResultT> split(SplittingTransformerConfiguration splitConfig,


IMO all public methods in a public API class are inherently public APIs, so we can't really add SdkInternalApi. Should we consider folding responseMapper into SplittingTransformerConfiguration? That way, we don't have to introduce another method.

zoewangg · 2026-06-26T17:50:18Z

+        this(upstreamResponseTransformer, resultFuture, UnaryOperator.identity());
+    }
+
+    public ByteArraySplittingTransformer(AsyncResponseTransformer<ResponseT, ResponseBytes<ResponseT>>


Why do we need a new ctor? Can we just add a new parameter?

zoewangg · 2026-06-26T18:07:02Z

-                              : progressUpdater.wrapResponseTransformer(responseTransformer);
+        if (isS3ClientMultipartEnabled()
+            && presignedDownloadRequest.presignedUrlDownloadRequest().range() == null) {
+            if (responseTransformer.split(b -> b.bufferSizeInBytes(1L)).parallelSplitSupported()) {


I'm a bit concerned that invoking responseTransformer.split may have implications, for example, involving a service call (they are harmless in our implementations today, but we can't guarantee future implementations or custom implementations).

Is there another way?

zoewangg · 2026-06-26T18:08:26Z


    private final Map<Integer, ByteBuffer> buffers;

+    private final UnaryOperator<ResponseT> responseMapper;


Question: don't we need to update FileAsyncResponseTransformer as well?

zoewangg · 2026-06-26T18:10:08Z

+     * @return full-object response with total content-length, full content-range,
+     *         and checksum values nulled if checksum type is COMPOSITE
+     */
+    public static GetObjectResponse toFullObjectResponse(GetObjectResponse firstPartResponse) {


Should we include other fields such as etag, version ID etc if they are present?

jencymaryjoseph added 3 commits June 25, 2026 10:19

Add response rewrite infrastructure for multipart download metadata

c3252d8

Fix presigned URL multipart download response metadata and 416 fallback

2397de3

Fix normal multipart download response metadata

c24d9db

jencymaryjoseph requested a review from a team as a code owner June 25, 2026 17:50

zoewangg reviewed Jun 25, 2026

jencymaryjoseph added 3 commits June 26, 2026 09:29

Consolidate normal path integration tests with parameterized tests

587e5c6

Address review: Builder ctor, split(config,mapper) interface method, …

d9bc3d6

…parameterized tests

Fix transfer manager presigned serial wrapper to use correct progress…

9cf092f

… wrapper

jencymaryjoseph requested a review from zoewangg June 26, 2026 17:04

jencymaryjoseph commented Jun 26, 2026

zoewangg reviewed Jun 26, 2026
