From f429bf7d05198448a2be1cbe2a1f50254c3bb12b Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Sun, 17 Aug 2025 11:27:07 -0500
Subject: [PATCH 1/6] docs: add chunk retrieval architecture diagrams (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add PlantUML diagrams documenting chunk retrieval architecture:
- chunk-source-priority.puml: Shows fallback order of chunk sources
- chunk-component-architecture.puml: Shows component relationships

Update CLAUDE.md to ensure diagrams are maintained as code changes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 CLAUDE.md                                     |  1 +
 .../src/chunk-component-architecture.puml     | 32 +++++++++++++++++++
 docs/diagrams/src/chunk-source-priority.puml  | 27 ++++++++++++++++
 3 files changed, 60 insertions(+)
 create mode 100644 docs/diagrams/src/chunk-component-architecture.puml
 create mode 100644 docs/diagrams/src/chunk-source-priority.puml

diff --git a/CLAUDE.md b/CLAUDE.md
index 38451d61a..6a1a0456e 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,6 +5,7 @@
 - Keep glossary definitions concise and focused on concepts rather than implementation details
 - Organize new terms into the appropriate existing sections
 - When modifying code, add or improve JSDoc comments where possible to enhance documentation
+- Chunk retrieval architecture diagrams are in `docs/diagrams/src/chunk-source-priority.puml` and `docs/diagrams/src/chunk-component-architecture.puml` - update these when making changes to chunk sources, caching behavior, or retrieval flow
 
 ## Releases
 
diff --git a/docs/diagrams/src/chunk-component-architecture.puml b/docs/diagrams/src/chunk-component-architecture.puml
new file mode 100644
index 000000000..0b3663739
--- /dev/null
+++ b/docs/diagrams/src/chunk-component-architecture.puml
@@ -0,0 +1,32 @@
+@startuml
+skinparam backgroundColor white
+skinparam componentStyle rectangle
+
+title Chunk Retrieval Component Architecture
+
+component "Client" as C
+
+package "AR.IO Node" {
+  component "Express Handler" as H
+  component "SQLite DB" as DB
+  component "Composite Source" as CS
+  database "File Cache" as FC
+}
+
+cloud "External Sources" {
+  component "AR.IO Peers" as P
+  component "Arweave" as A
+  component "S3" as S
+}
+
+C --> H: HTTP Request
+H --> DB: Lookup TX
+H --> CS: Get chunk
+CS --> FC: Check cache
+CS ..> P: Fetch
+CS ..> A: Fetch
+CS ..> S: Fetch
+
+note right of CS: Configurable parallelism\n(e.g., 1-3 concurrent)
+
+@enduml
\ No newline at end of file
diff --git a/docs/diagrams/src/chunk-source-priority.puml b/docs/diagrams/src/chunk-source-priority.puml
new file mode 100644
index 000000000..32590c585
--- /dev/null
+++ b/docs/diagrams/src/chunk-source-priority.puml
@@ -0,0 +1,27 @@
+@startuml
+skinparam backgroundColor white
+
+title Chunk Retrieval Source Priority
+
+left to right direction
+
+rectangle "Client Request" as CR
+
+rectangle "AR.IO Node" {
+  rectangle "1. Local Cache" as LC #90EE90
+  rectangle "2. AR.IO Peers" as AP #87CEEB
+  rectangle "3. Arweave Network" as AN #FFB6C1
+  rectangle "4. Legacy S3" as S3 #FFFFE0
+}
+
+CR --> LC: First
+LC --> AP: If miss
+AP --> AN: If fail
+AN --> S3: Last resort
+
+note bottom of LC: Fastest\n(local disk)
+note bottom of AP: Fast\n(nearby nodes)
+note bottom of AN: Slower\n(blockchain)
+note bottom of S3: Backup\n(cloud storage)
+
+@enduml
\ No newline at end of file

From 97ab1caf8845b25d4a1f1c4d2ba702917c2e3f49 Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Mon, 18 Aug 2025 15:52:33 -0500
Subject: [PATCH 2/6] docs: improve chunk component architecture diagram clarity (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Double diagram size for better readability (DPI 400)
- Update component names for accuracy:
  - "SQLite DB" → "TX Offset Index"
  - "File Cache" → "Local Cache"
  - "AR.IO Peers" → "AR.IO Network"
  - "Arweave" → "Arweave Network"
- Add separate Browser and AR.IO Peer client components
- Update abbreviations for consistency (TOI, LC, ARIO, AR)
- Change "Lookup TX" to "Lookup offset" for clarity

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 .../src/chunk-component-architecture.puml     | 27 ++++++++++---------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/docs/diagrams/src/chunk-component-architecture.puml b/docs/diagrams/src/chunk-component-architecture.puml
index 0b3663739..fd63978cf 100644
--- a/docs/diagrams/src/chunk-component-architecture.puml
+++ b/docs/diagrams/src/chunk-component-architecture.puml
@@ -1,31 +1,34 @@
 @startuml
+skinparam dpi 400
 skinparam backgroundColor white
 skinparam componentStyle rectangle
 
 title Chunk Retrieval Component Architecture
 
-component "Client" as C
+component "Browser" as B
+component "AR.IO Peer" as P
 
 package "AR.IO Node" {
   component "Express Handler" as H
-  component "SQLite DB" as DB
+  component "TX Offset Index" as TOI
   component "Composite Source" as CS
-  database "File Cache" as FC
+  database "Local Cache" as LC
 }
 
 cloud "External Sources" {
-  component "AR.IO Peers" as P
-  component "Arweave" as A
-  component "S3" as S
+  component "AR.IO Network" as ARIO
+  component "Arweave Network" as AR
+  component "S3" as S3
 }
 
-C --> H: HTTP Request
-H --> DB: Lookup TX
+B --> H: HTTP Request
+P --> H: HTTP Request
+H --> TOI: Lookup offset
 H --> CS: Get chunk
-CS --> FC: Check cache
-CS ..> P: Fetch
-CS ..> A: Fetch
-CS ..> S: Fetch
+CS --> LC: Check cache
+CS ..> ARIO: Fetch
+CS ..> AR: Fetch
+CS ..> S3: Fetch
 
 note right of CS: Configurable parallelism\n(e.g., 1-3 concurrent)
 

From c2783fa807ff13482e321c9d919316738c7f2b60 Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Mon, 18 Aug 2025 16:04:51 -0500
Subject: [PATCH 3/6] docs: rename and improve chunk retrieval cascade diagram (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Rename diagram file from chunk-source-priority to chunk-retrieval-cascade
- Update title to "Chunk Retrieval Cascade" for better clarity
- Set DPI to 300 for improved readability
- Clarify cache types: "in-memory & disk" instead of just "local disk"
- Distinguish AR.IO network (frequently used) from Arweave (complete set)
- Add rebroadcast flow from S3 back to Arweave network

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 ...urce-priority.puml => chunk-retrieval-cascade.puml} | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
 rename docs/diagrams/src/{chunk-source-priority.puml => chunk-retrieval-cascade.puml} (68%)

diff --git a/docs/diagrams/src/chunk-source-priority.puml b/docs/diagrams/src/chunk-retrieval-cascade.puml
similarity index 68%
rename from docs/diagrams/src/chunk-source-priority.puml
rename to docs/diagrams/src/chunk-retrieval-cascade.puml
index 32590c585..cae24d699 100644
--- a/docs/diagrams/src/chunk-source-priority.puml
+++ b/docs/diagrams/src/chunk-retrieval-cascade.puml
@@ -1,7 +1,8 @@
 @startuml
+skinparam dpi 300
 skinparam backgroundColor white
 
-title Chunk Retrieval Source Priority
+title Chunk Retrieval Cascade
 
 left to right direction
 
@@ -18,10 +19,11 @@ CR --> LC: First
 LC --> AP: If miss
 AP --> AN: If fail
 AN --> S3: Last resort
+S3 ..> AN: Rebroadcast
 
-note bottom of LC: Fastest\n(local disk)
-note bottom of AP: Fast\n(nearby nodes)
-note bottom of AN: Slower\n(blockchain)
+note bottom of LC: Fastest\n(in-memory & disk)
+note bottom of AP: Fast\n(frequently used)
+note bottom of AN: Slower\n(complete set)
 note bottom of S3: Backup\n(cloud storage)
 
 @enduml
\ No newline at end of file

From 811c95449a7f36784229db605da615d6207b80df Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Mon, 18 Aug 2025 16:28:25 -0500
Subject: [PATCH 4/6] docs: add contiguous data retrieval cascade diagram (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add new diagram showing contiguous data source hierarchy
- Shows 5-tier fallback system: Local Cache → Trusted Gateways → Chunk Reconstruction → TX Data → AR.IO Network
- Consistent with chunk retrieval cascade diagram naming and style
- Set to 200 DPI for optimal display size
- Based on actual data source configuration and retrieval flow

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 .../contiguous-data-retrieval-cascade.puml    | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 docs/diagrams/src/contiguous-data-retrieval-cascade.puml

diff --git a/docs/diagrams/src/contiguous-data-retrieval-cascade.puml b/docs/diagrams/src/contiguous-data-retrieval-cascade.puml
new file mode 100644
index 000000000..9c47af97e
--- /dev/null
+++ b/docs/diagrams/src/contiguous-data-retrieval-cascade.puml
@@ -0,0 +1,31 @@
+@startuml
+skinparam dpi 200
+skinparam backgroundColor white
+
+title Contiguous Data Retrieval Cascade
+
+left to right direction
+
+rectangle "Data Request" as DR
+
+rectangle "AR.IO Node" {
+  rectangle "1. Local Cache" as LC #90EE90
+  rectangle "2. Trusted Gateways" as TG #87CEEB
+  rectangle "3. Chunk Reconstruction" as CR #FFB6C1
+  rectangle "4. TX Data" as TD #DDA0DD
+  rectangle "5. AR.IO Network" as AN #FFFFE0
+}
+
+DR --> LC: First
+LC --> TG: If miss
+TG --> CR: If fail
+CR --> TD: If fail
+TD --> AN: If fail
+
+note bottom of LC: Fastest\n(verified data)
+note bottom of TG: Fast\n(external gateways)
+note bottom of CR: Reliable\n(from chunks)
+note bottom of TD: Direct\n(Arweave nodes)
+note bottom of AN: Fallback\n(AR.IO peers)
+
+@enduml
\ No newline at end of file

From 150e486c2c35b1f9f7c7b29e61d410b26b83b27c Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Mon, 18 Aug 2025 16:35:39 -0500
Subject: [PATCH 5/6] docs: add Release 47 contiguous data retrieval sequence diagram (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add sequence diagram showing new efficient data retrieval flow
- Uses root parent offset to skip bundle hierarchy traversal
- Shows offset source can be local index or network-based
- Demonstrates direct chunk fetching using calculated offset + range
- Includes caching and streaming response flow
- Set to 200 DPI for consistent display sizing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 .../release-47-contiguous-data-retrieval.puml | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)
 create mode 100644 docs/diagrams/src/release-47-contiguous-data-retrieval.puml

diff --git a/docs/diagrams/src/release-47-contiguous-data-retrieval.puml b/docs/diagrams/src/release-47-contiguous-data-retrieval.puml
new file mode 100644
index 000000000..65c737395
--- /dev/null
+++ b/docs/diagrams/src/release-47-contiguous-data-retrieval.puml
@@ -0,0 +1,34 @@
+@startuml
+skinparam dpi 200
+skinparam backgroundColor white
+
+title Release 47 Contiguous Data Retrieval
+
+actor Client
+participant "AR.IO Node" as Node
+database "Local Cache" as Cache
+participant "Offset Source" as Offset
+participant "Chunk Sources" as Chunks
+participant "Root Transaction" as Root
+
+Client -> Node: GET /tx/{id}/data
+Node -> Cache: Check local cache
+Cache --> Node: Cache miss
+
+Node -> Offset: Get root parent offset
+Offset --> Node: Root transaction ID + offset
+
+note right of Offset: Could be local index\nor found in network\n(AR.IO peers, gateways)
+
+Node -> Root: Get transaction metadata
+Root --> Node: TX size, data root
+
+Node -> Chunks: Fetch chunks for range\n(using offset + requested range)
+Chunks --> Node: Chunk data
+
+Node -> Node: Assemble chunks into\ncontiguous data
+
+Node -> Cache: Store assembled data
+Node -> Client: Stream response
+
+@enduml
\ No newline at end of file

From 123bfe90254f4acd95aa41df4fc24cf6b274f6a2 Mon Sep 17 00:00:00 2001
From: David Whittington
Date: Mon, 18 Aug 2025 16:53:46 -0500
Subject: [PATCH 6/6] docs: add Release 47 bundle header parsing sequence diagram (PE-8468)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add alternative approach using on-demand ANS-104 header parsing
- Shows root ID source providing only transaction ID (no pre-calculated offsets)
- Demonstrates dynamic offset calculation by parsing bundle headers
- Trades performance for reduced storage requirements
- Maintains same chunk fetching and assembly flow as pre-calculated approach
- Set to 200 DPI for consistent display sizing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 .../src/release-47-bundle-header-parsing.puml | 39 +++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 docs/diagrams/src/release-47-bundle-header-parsing.puml

diff --git a/docs/diagrams/src/release-47-bundle-header-parsing.puml b/docs/diagrams/src/release-47-bundle-header-parsing.puml
new file mode 100644
index 000000000..7e642958e
--- /dev/null
+++ b/docs/diagrams/src/release-47-bundle-header-parsing.puml
@@ -0,0 +1,39 @@
+@startuml
+skinparam dpi 200
+skinparam backgroundColor white
+
+title Release 47 Bundle Header Parsing
+
+actor Client
+participant "AR.IO Node" as Node
+database "Local Cache" as Cache
+participant "Root ID Source" as RootID
+participant "Bundle Parser" as Parser
+participant "Chunk Sources" as Chunks
+participant "Root Transaction" as Root
+
+Client -> Node: GET /tx/{id}/data
+Node -> Cache: Check local cache
+Cache --> Node: Cache miss
+
+Node -> RootID: Get root transaction ID
+RootID --> Node: Root transaction ID
+
+note right of RootID: Only stores root ID,\nnot pre-calculated offsets
+
+Node -> Root: Get root transaction header
+Root --> Node: Bundle header data
+
+Node -> Parser: Parse ANS-104 bundle headers
+Parser -> Parser: Calculate offset for data item
+Parser --> Node: Calculated offset
+
+Node -> Chunks: Fetch chunks for range\n(using calculated offset + range)
+Chunks --> Node: Chunk data
+
+Node -> Node: Assemble chunks into\ncontiguous data
+
+Node -> Cache: Store assembled data
+Node -> Client: Stream response
+
+@enduml
\ No newline at end of file
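
Reviewer note on PATCH 6/6: the "Calculate offset for data item" step in the bundle header parsing diagram could be sketched roughly as below. This is an illustration only, not the node's actual parser. It assumes the ANS-104 header layout (a 32-byte item count followed by one 64-byte entry per item, each a 32-byte size and a 32-byte ID) and assumes the integers are little-endian. The names `findDataItem`, `readLittleEndian`, and `ItemLocation` are hypothetical and do not come from this patch series.

```typescript
// Hypothetical result type: where a serialized data item sits inside the bundle payload.
interface ItemLocation {
  offset: number; // byte offset of the data item, relative to the start of the bundle
  size: number; // total size of the serialized data item in bytes
}

// Read an unsigned integer stored least-significant byte first (assumed encoding).
function readLittleEndian(bytes: Uint8Array, start: number, length: number): number {
  let value = 0;
  for (let i = length - 1; i >= 0; i--) {
    value = value * 256 + bytes[start + i];
  }
  if (!Number.isSafeInteger(value)) {
    throw new Error("value exceeds the safe integer range");
  }
  return value;
}

// Walk the header entries, summing the sizes of preceding items until the ID matches.
function findDataItem(header: Uint8Array, targetId: Uint8Array): ItemLocation | undefined {
  if (targetId.length !== 32) {
    throw new Error("data item IDs are expected to be 32 bytes");
  }
  if (header.length < 32) {
    throw new Error("bundle header is truncated");
  }
  const count = readLittleEndian(header, 0, 32);
  const headerEnd = 32 + 64 * count;
  if (header.length < headerEnd) {
    throw new Error("bundle header is truncated");
  }

  let offset = headerEnd; // the first data item starts right after the header table
  for (let i = 0; i < count; i++) {
    const entry = 32 + 64 * i;
    const size = readLittleEndian(header, entry, 32);
    const id = header.subarray(entry + 32, entry + 64);
    if (id.every((byte, j) => byte === targetId[j])) {
      return { offset, size };
    }
    offset += size;
  }
  return undefined; // the ID is not listed in this bundle
}
```

The returned offset points at the start of the serialized data item, not at its payload. A real implementation would still need to skip the item's own signature, owner, and tag fields, and would add the bundle's position within the root transaction before requesting chunks for the range.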