Fix multipart download response metadata for presigned URL and normal paths #7077
    }

    @Test
    void multipartDownload_checksumModeEnabled_hasCorrectFullObjectMetadata() throws Exception {
    }

    @Test
    void multipartDownload_toBytes_smallObject_hasCorrectFullObjectMetadata() throws Exception {
Can we consolidate this with multipartDownload_toFile_hasCorrectFullObjectMetadata using parameterized tests?
    }

    @Test
    void getObject_withRangeRequest_preservesPartialMetadata() throws Exception {
Same here, let's try to consolidate tests with parameterized tests
    }

    @Test
    void getObject_mpuObjectWithChecksumMode_hasCorrectMetadata() throws Exception {
Same here. How is checksum mode special?
    }

    // Helper methods
    private static void uploadMpuObjectWithChecksum() {
Checksum should be enabled by default, any reason we need to upload it with checksum?
If an object is MPU without checksumMode enabled, S3 doesn't return a checksum.
If an object is MPU with checksumMode enabled, S3 returns a FULL_OBJECT checksum.
And if it is uploaded with checksum enabled and with an explicit checksum algorithm like .checksumAlgorithm(ChecksumAlgorithm.CRC32), S3 returns a COMPOSITE checksum.
    if (transformer instanceof ByteArrayAsyncResponseTransformer) {
        return (SplitResult<GetObjectResponse, T>)
            ((ByteArrayAsyncResponseTransformer<GetObjectResponse>) transformer).split(splitConfig, mapper);
    }
Any reason we have special logic for ByteArrayAsyncResponseTransformer? ByteArrayAsyncResponseTransformer is an internal API and not supposed to be used across modules
Oh yeah, removed the instanceof and added split(config, mapper) to the AsyncResponseTransformer interface (with ByteArrayAsyncResponseTransformer overriding it). splitWithResponseRewrite() now just calls transformer.split(splitConfig, mapper)
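A minimal sketch of that shape, using simplified stand-in types rather than the SDK's real AsyncResponseTransformer and SplitResult. The interface carries a default split overload that accepts a mapper, the byte-array implementation overrides it, and the caller dispatches polymorphically with no instanceof. Every type name below, the String strategy tag, and the single-argument signature are assumptions made for illustration only.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for SplitResult: records which split strategy was chosen
// and the mapper that will rewrite the first response.
final class SketchSplitResult<R> {
    final String strategy;
    final UnaryOperator<R> responseMapper;

    SketchSplitResult(String strategy, UnaryOperator<R> responseMapper) {
        this.strategy = strategy;
        this.responseMapper = responseMapper;
    }
}

// Hypothetical stand-in for AsyncResponseTransformer.
interface SketchTransformer<R> {
    // Default behavior shared by every transformer that needs no special handling.
    default SketchSplitResult<R> split(UnaryOperator<R> mapper) {
        return new SketchSplitResult<>("generic", mapper);
    }
}

// Hypothetical stand-in for ByteArrayAsyncResponseTransformer: overrides the default.
final class SketchByteArrayTransformer<R> implements SketchTransformer<R> {
    @Override
    public SketchSplitResult<R> split(UnaryOperator<R> mapper) {
        return new SketchSplitResult<>("byte-array", mapper);
    }
}

final class SketchDownloadUtils {
    // No instanceof check: the transformer's own override decides how to split.
    static <R> SketchSplitResult<R> splitWithResponseRewrite(SketchTransformer<R> transformer,
                                                             UnaryOperator<R> mapper) {
        return transformer.split(mapper);
    }
}
```

With this layout the cross-module caller only depends on the interface method, which is the property the review comment asks for.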
    if (cause instanceof EmptyObjectRangeNotSatisfiableException) {
    // Parallel path wraps it as EmptyObjectRangeNotSatisfiableException;
    // serial path (toBytes, custom transformers) surfaces raw S3Exception.
    if (cause instanceof EmptyObjectRangeNotSatisfiableException
Question: what is EmptyObjectRangeNotSatisfiableException?
EmptyObjectRangeNotSatisfiableException is an internal exception created by the parallel subscriber when it gets a 416 from S3 on a ranged request to an empty object. The serial path doesn't go through the subscriber, so the raw 416 S3Exception arrives without being wrapped. Planning to remove this exception class as a follow-up and just use isRangeNotSatisfiable() for all paths.
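One way the single check for all paths could look, as a rough sketch. SketchServiceException is a hypothetical stand-in for S3Exception that carries only an HTTP status code, and the Throwable-taking signature of isRangeNotSatisfiable is an assumption, since the thread does not show it. The helper unwraps CompletionException layers and then tests for status 416, so it behaves the same whether or not a subscriber wrapped the failure.

```java
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for S3Exception: only the HTTP status code matters here.
class SketchServiceException extends RuntimeException {
    final int statusCode;

    SketchServiceException(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }
}

final class SketchRangeErrors {
    private static final int RANGE_NOT_SATISFIABLE = 416;

    // Returns true when the failure, after unwrapping async wrappers, is an HTTP 416.
    static boolean isRangeNotSatisfiable(Throwable failure) {
        Throwable cause = failure;
        // Futures commonly wrap the real failure in one or more CompletionExceptions.
        while (cause instanceof CompletionException && cause.getCause() != null) {
            cause = cause.getCause();
        }
        return cause instanceof SketchServiceException
               && ((SketchServiceException) cause).statusCode == RANGE_NOT_SATISFIABLE;
    }
}
```

A fallback written against a check like this no longer depends on which download path produced the error.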
        UnaryOperator.identity());
    }

    private SplittingTransformer(AsyncResponseTransformer<ResponseT, ResultT> upstreamResponseTransformer,
Can we update this ctor to take a Builder parameter? That way, we don't need to create a new ctor.
        ? progressUpdater.wrapForNonSerialFileDownload(
            responseTransformer, GetObjectRequest.builder().build())
        : progressUpdater.wrapResponseTransformer(responseTransformer);
    if (isS3ClientMultipartEnabled()
Fixes the test failure where bytesTransferred did not fire for presigned toBytes multipart downloads.
That path was routed to wrapForNonSerialFileDownload, which only counts bytes inside its split() override, but the serial download splits and drives onStream directly, bypassing it. It is now routed by parallelSplitSupported(), so serial toBytes uses wrapResponseTransformerForMultipartDownload (counts in onStream), mirroring the regular download path.
     * Creates a {@link SplitResult} with a response mapper applied at the upstream {@code onResponse} delivery point.
     */
    @SdkInternalApi
    default SplitResult<ResponseT, ResultT> split(SplittingTransformerConfiguration splitConfig,
IMO all public methods in a public API class are inherently public APIs, so we can't really add SdkInternalApi. Should we consider folding responseMapper into SplittingTransformerConfiguration? That way, we don't have to introduce another method.
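A rough sketch of what folding the mapper into the configuration might look like. SketchSplitConfig is a hypothetical stand-in, not the real SplittingTransformerConfiguration, and making it generic in the response type is an assumption. The private constructor takes the builder, which is the same pattern suggested for the transformer constructors, and the mapper defaults to identity so existing callers would see no change.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for SplittingTransformerConfiguration with the mapper folded in.
final class SketchSplitConfig<R> {
    private final long bufferSizeInBytes;
    private final UnaryOperator<R> responseMapper;

    // Single constructor taking the builder, so a new option never needs a new constructor.
    private SketchSplitConfig(Builder<R> builder) {
        this.bufferSizeInBytes = builder.bufferSizeInBytes;
        this.responseMapper = builder.responseMapper;
    }

    static <R> Builder<R> builder() {
        return new Builder<>();
    }

    long bufferSizeInBytes() {
        return bufferSizeInBytes;
    }

    UnaryOperator<R> responseMapper() {
        return responseMapper;
    }

    static final class Builder<R> {
        private long bufferSizeInBytes = 1024L; // assumed default, for illustration only
        private UnaryOperator<R> responseMapper = UnaryOperator.identity(); // no rewrite unless asked

        Builder<R> bufferSizeInBytes(long bufferSizeInBytes) {
            this.bufferSizeInBytes = bufferSizeInBytes;
            return this;
        }

        Builder<R> responseMapper(UnaryOperator<R> responseMapper) {
            this.responseMapper = responseMapper;
            return this;
        }

        SketchSplitConfig<R> build() {
            return new SketchSplitConfig<>(this);
        }
    }
}
```

A split(config) implementation could then read config.responseMapper() at the onResponse delivery point, so no second split overload is needed on the public interface.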
        this(upstreamResponseTransformer, resultFuture, UnaryOperator.identity());
    }

    public ByteArraySplittingTransformer(AsyncResponseTransformer<ResponseT, ResponseBytes<ResponseT>>
Why do we need a new ctor? Can we just add a new parameter?
        : progressUpdater.wrapResponseTransformer(responseTransformer);
    if (isS3ClientMultipartEnabled()
        && presignedDownloadRequest.presignedUrlDownloadRequest().range() == null) {
        if (responseTransformer.split(b -> b.bufferSizeInBytes(1L)).parallelSplitSupported()) {
I'm a bit concerned that invoking responseTransformer.split may have implications, for example, involving a service call (they are harmless in our implementations today, but we can't guarantee future implementations or custom implementations).
Is there another way?
    private final Map<Integer, ByteBuffer> buffers;

    private final UnaryOperator<ResponseT> responseMapper;
Question: don't we need to update FileAsyncResponseTransformer as well?
     * @return full-object response with total content-length, full content-range,
     * and checksum values nulled if checksum type is COMPOSITE
     */
    public static GetObjectResponse toFullObjectResponse(GetObjectResponse firstPartResponse) {
Should we include other fields such as etag, version ID etc if they are present?
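For reference, the rewrite described in the Javadoc above could be sketched roughly as follows. SketchPartResponse is a hypothetical stand-in for GetObjectResponse holding only the fields under discussion, and the regex, the single CRC32 checksum field, and the guard for a non-positive total are assumptions rather than the PR's actual code. The sketch copies every field it does not rewrite, eTag included, which is one possible answer to the question about carrying other fields through.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for GetObjectResponse: only the fields discussed in this thread.
final class SketchPartResponse {
    final Long contentLength;
    final String contentRange;
    final String checksumType;  // "COMPOSITE", "FULL_OBJECT", or null
    final String checksumCrc32; // may be null
    final String eTag;

    SketchPartResponse(Long contentLength, String contentRange, String checksumType,
                       String checksumCrc32, String eTag) {
        this.contentLength = contentLength;
        this.contentRange = contentRange;
        this.checksumType = checksumType;
        this.checksumCrc32 = checksumCrc32;
        this.eTag = eTag;
    }
}

final class SketchFullObjectRewrite {
    // Matches "bytes <start>-<end>/<total>" and captures the three numbers.
    private static final Pattern CONTENT_RANGE = Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+)");

    static SketchPartResponse toFullObjectResponse(SketchPartResponse firstPart) {
        if (firstPart.contentRange == null) {
            return firstPart; // no Content-Range: nothing to rewrite
        }
        Matcher matcher = CONTENT_RANGE.matcher(firstPart.contentRange);
        if (!matcher.matches()) {
            return firstPart; // unparseable header: leave the response untouched
        }
        long total = Long.parseLong(matcher.group(3));
        if (total <= 0) {
            return firstPart; // assumed guard: avoids producing "bytes 0--1/0"
        }
        String fullRange = "bytes 0-" + (total - 1) + "/" + total;
        // A composite checksum is built from per-part hashes, so it is dropped here.
        boolean composite = "COMPOSITE".equals(firstPart.checksumType);
        String checksum = composite ? null : firstPart.checksumCrc32;
        // All other fields (here just eTag) are copied unchanged.
        return new SketchPartResponse(total, fullRange, firstPart.checksumType, checksum, firstPart.eTag);
    }
}
```

In the SDK itself, starting from firstPartResponse.toBuilder() and overriding only the rewritten fields would presumably give the same carry-everything-else behavior.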
Motivation and Context
When the S3 multipart async client downloads a large object in multiple part requests (partNumber for normal, ranged GETs for presigned URLs), the response metadata exposed to the customer reflects only the first part, not the full object. Customers see an incorrect contentLength (part size instead of total), a partial contentRange, and meaningless composite checksum values.
Modifications
The fix has two prongs because there are two download paths with different architectures:
- Parallel path: the response is delivered to the customer when resultFuture.complete() is called. We rewrite the response just before completing the future.
- Serial path: goes through SplittingTransformer, which calls the customer's onResponse() with the first part's response. We inject a responseMapper into the splitting infrastructure that rewrites the response at the onResponse() delivery point, before the customer ever sees it.
Both paths use the same toFullObjectResponse() function to do the actual rewrite.
Common infrastructure (sdk-core + MultipartDownloadUtils)
Shared rewrite logic and the mechanism to inject it into the splitting infrastructure,
used by both presigned and normal paths.
- responseMapper (UnaryOperator): added to SplittingTransformer and ByteArraySplittingTransformer. When the splitting infrastructure delivers the first part's response to the customer's transformer via onResponse(), the mapper rewrites it first. This is the injection point for the serial path (toBytes, custom transformers); without it, the customer's onResponse() callback would see raw per-part metadata with no way to fix it after the fact.
- toFullObjectResponse(): the rewrite function itself. Takes the first part's response and produces what a single non-multipart GetObject would have returned:
  - contentLength → total object size (parsed from Content-Range)
  - contentRange → bytes 0-(total-1)/total
  - checksum values → null when the checksum type is COMPOSITE (per-part hashes that cannot be validated against the full object)
- splitWithResponseRewrite(): convenience method that calls split() with toFullObjectResponse pre-configured as the mapper. Used by both DownloadObjectHelper and PresignedUrlDownloadHelper on their serial paths.
Presigned URL path
The parallel subscriber rewrites the response before completing the future; the serial path
injects the rewrite via the responseMapper.
- ParallelPresignedUrlMultipartDownloaderSubscriber calls toFullObjectResponse() before completing the result future.
- PresignedUrlDownloadHelper uses splitWithResponseRewrite() so the mapper fires at onResponse() delivery.
- 416 fallback: a ranged request to an empty object fails with an S3Exception with status 416. The serial path (via SplittingTransformer) surfaces the raw exception directly without wrapping it, so the original catch on EmptyObjectRangeNotSatisfiableException alone never matched, causing the fallback to be skipped entirely for custom transformers.
Normal (non-presigned) path
Same two-prong approach as presigned — parallel rewrites at future completion, serial
injects via responseMapper.
- ParallelMultipartDownloaderSubscriber calls toFullObjectResponse() before completing the result future.
- DownloadObjectHelper uses splitWithResponseRewrite().
Testing
- MultipartDownloadUtilsTest: 11 tests covering toFullObjectResponse() (content-length/range rewrite, checksum nulling for COMPOSITE, preservation for FULL_OBJECT, no-op when Content-Range is absent).
- S3MultipartClientGetObjectWiremockTest: custom transformer receives full-object metadata
- PresignedUrlMultipartDownloaderSubscriberWiremockTest: 416 fallback works for custom transformers (fails without the fix)
- AsyncPresignedUrlExtensionTestSuite: presigned toBytes/toFile metadata assertions
- S3MultipartClientFileDownloadIntegrationTest: normal toFile + checksumMode assertions
- CustomTransformerMultipartIntegrationTest: custom transformer sees correct metadata and nulled composite checksums