
[DISCUSS][FLIP-597][filesystem] Add common object storage stream abstractions#28547

Draft: Izeren wants to merge 1 commit into apache:master from Izeren:FLIP-597/common-fs-abstractions

Izeren (Contributor) commented Jun 25, 2026

What is the purpose of the change

This PR provides a reference implementation of FLIP-597 to accompany the FLIP discussion on the mailing list: https://lists.apache.org/thread/npjcpx3o73pg7b4p1hg073sqnj49b66j

FLIP-597 is the umbrella FLIP for supporting Hadoop-less filesystems within Flink that rely solely on cloud SDKs. This PR adds the common, cloud-agnostic stream abstractions defined in the FLIP, placed in flink-core so they are written, tested, and maintained once. These abstractions are then shared by cloud-specific filesystem implementations (AWS S3, Azure ABFS, GCS).

The purpose of sharing this code early is to:

  • Allow the community to review the concrete API design alongside the FLIP text
  • Enable contributors working on GCS or S3 native implementations to try the abstractions and provide feedback
  • Demonstrate how cloud-specific filesystems compose with these interfaces (see the FLIP for Azure and S3 examples)

All new types are annotated @Internal and @Experimental. No existing public APIs are modified.

Brief change log

  • Added ReadContext and WriteContext — immutable context descriptors for read/write operations
  • Added InputStreamOpener and OutputStreamOpener — cloud-agnostic functional interfaces for opening streams
  • Added InputStreamExtension and BufferingInputStreamExtension — extension point for customizing stream opening (e.g., buffering, decryption, compression)
  • Added RawAndWrappedInputStreams — value class pairing raw and wrapped streams for lifecycle management
  • Added ObjectStorageInputStream — thread-safe FSDataInputStream with lazy initialization, seek optimization (read-and-discard vs close-and-reopen), and composable extensions
  • Added ObjectStorageOutputStream — thread-safe FSDataOutputStream with commit-on-close semantics
  • All types placed in org.apache.flink.core.fs alongside existing FileSystem and FSDataInputStream
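To make the lazy initialization and seek optimization described above concrete, here is a minimal sketch. It is not the `ObjectStorageInputStream` from this PR: the names `OffsetOpener`, `LazySeekSketch`, `maxSkipBytes`, and `overBytes` are hypothetical, and the threshold-based choice between read-and-discard and close-and-reopen is an assumption about how such a decision could be made.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical opener: returns a stream positioned at the given byte offset. */
interface OffsetOpener {
    InputStream openAt(long offset) throws IOException;
}

/** Illustrative only; names and threshold policy are assumptions, not the PR's API. */
class LazySeekSketch extends InputStream {
    private final OffsetOpener opener;
    private final long maxSkipBytes; // assumed threshold for read-and-discard
    private InputStream current;     // stays null until the first read
    private long streamPos;          // offset the open stream is currently at
    private long targetPos;          // offset the caller asked for
    int opens;                       // number of opener calls, for demonstration

    LazySeekSketch(OffsetOpener opener, long maxSkipBytes) {
        this.opener = opener;
        this.maxSkipBytes = maxSkipBytes;
    }

    /** Convenience factory backed by an in-memory array, standing in for a remote object. */
    static LazySeekSketch overBytes(byte[] data, long maxSkipBytes) {
        return new LazySeekSketch(off -> {
            int start = (int) Math.min(off, data.length);
            return new ByteArrayInputStream(data, start, data.length - start);
        }, maxSkipBytes);
    }

    /** Records the target only; the actual work is deferred to the next read. */
    public synchronized void seek(long pos) throws IOException {
        if (pos < 0) {
            throw new IOException("Cannot seek to negative position " + pos);
        }
        targetPos = pos;
    }

    public synchronized long getPos() {
        return targetPos;
    }

    @Override
    public synchronized int read() throws IOException {
        ensureOpenAtTarget();
        int b = current.read();
        if (b >= 0) {
            streamPos++;
            targetPos++;
        }
        return b;
    }

    private void ensureOpenAtTarget() throws IOException {
        if (current != null) {
            long gap = targetPos - streamPos;
            if (gap == 0) {
                return;
            }
            if (gap > 0 && gap <= maxSkipBytes) {
                // Short forward seek: read and discard on the open stream.
                while (streamPos < targetPos && current.read() >= 0) {
                    streamPos++;
                }
                if (streamPos == targetPos) {
                    return;
                }
            }
            // Backward seek, long forward seek, or early EOF: close and reopen.
            current.close();
            current = null;
        }
        current = opener.openAt(targetPos);
        streamPos = targetPos;
        opens++;
    }

    @Override
    public synchronized void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

With `LazySeekSketch.overBytes(data, 16)`, no stream is opened until the first `read()`. A forward seek of a few bytes reuses the open stream, while a backward seek or a jump beyond the threshold triggers a reopen at the new offset. For a real object store the trade-off is between the bytes wasted by discarding and the latency of a new ranged request, so a suitable threshold would depend on the backend.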

Verifying this change

  • Unit tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no (all @Internal @Experimental)
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs (all public types have Javadoc; user-facing docs will follow with cloud-specific FLIPs)

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code

Generated-by: Claude Code

Izeren changed the title from "[DRAFT][FLIP-597][filesystem] Add common object storage stream abstractions" to "[DISCUSS][FLIP-597][filesystem] Add common object storage stream abstractions" on Jun 25, 2026

flinkbot (Collaborator) commented Jun 25, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build
