feat(arns): resolve ArNS names to IPFS CIDs (ArNS→IPFS, Phase 2) #793
vilenarios wants to merge 31 commits into
Windowed GraphQL `transactions` queries against the ClickHouse-backed indexer could return a partial page with `pageInfo.hasNextPage = false` and no error, silently stranding every subsequent page: a cursor client paging until `hasNextPage` is false would under-report results for dense wallets/ranges.
Root cause: the CH legs fetch `pageSize + 1` rows with a full-key `LIMIT 1 BY height, block_transaction_index, is_data_item, id`, but the composite then dedups by `id` alone. Stale rows that share an `id` and height while differing on `block_transaction_index` (a placeholder bti=0 left by an earlier indexing pass alongside the real bti) are not folded by the SQL `LIMIT 1 BY`, yet collapse in the id-dedup. When that shrinks the merged set to `pageSize` or fewer, `edges.length > pageSize` reads false even though the leg came back full.
Derive `hasNextPage` from each leg's raw result before the cross-leg id-dedup: a CH leg returning more than `pageSize` rows, or the SQLite leg's own `hasNextPage`, means more matching rows exist. Erring toward true is safe: the worst case is one extra page fetch that comes back empty.
No SQL or cursor changes. Adds a regression test reproducing the production shape (same id, bti 0 vs 12, both data items) collapsing a full page to `pageSize`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The function-level TSDoc still described the pre-fix behavior (hasNextPage computed against the deduped edge list) as intentional — that was the PE-9124 bug. Update it to describe deriving hasNextPage from each leg's raw result before id-dedup. Addresses CodeRabbit review feedback on PR #792. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
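A minimal sketch of the fix described in the two commits above, assuming a simplified row shape and merge step. `RowSketch`, `LegResultSketch`, and `mergePageSketch` are hypothetical names, not the repository's API, and the real composite also handles ordering and cursors, which are omitted here.

```typescript
// Hypothetical, simplified shapes; the real rows carry many more columns.
interface RowSketch {
  id: string;
  height: number;
  bti: number; // block_transaction_index
}

interface LegResultSketch {
  rows: RowSketch[];
  hasNextPage: boolean;
}

// chRows is assumed to be the raw ClickHouse leg result, fetched with
// LIMIT pageSize + 1. sqliteLeg carries its own hasNextPage.
function mergePageSketch(
  chRows: RowSketch[],
  sqliteLeg: LegResultSketch,
  pageSize: number,
): { edges: RowSketch[]; hasNextPage: boolean } {
  // Decide hasNextPage from the raw leg results, BEFORE the id-dedup can
  // shrink the merged set to pageSize or fewer.
  const hasNextPage = chRows.length > pageSize || sqliteLeg.hasNextPage;

  // Cross-leg dedup by id alone (first occurrence wins in this sketch).
  const seen = new Set<string>();
  const edges: RowSketch[] = [];
  for (const row of [...chRows, ...sqliteLeg.rows]) {
    if (seen.has(row.id)) continue;
    seen.add(row.id);
    edges.push(row);
  }

  return { edges: edges.slice(0, pageSize), hasNextPage };
}
```

With `pageSize = 2` and a CH leg returning three raw rows where two share an id (bti 0 vs 12), the deduped set has exactly two edges, yet `hasNextPage` stays true because the raw leg came back with more than `pageSize` rows.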
ANT records now carry a targetProtocol (0=Arweave, 1=IPFS) and the record
target can be an IPFS CID instead of an Arweave TX ID (@ar.io/sdk 4.0.0).
Previously the on-demand resolver read only transactionId and validated it
as a 43-char Arweave ID, so a CID-targeted name failed to resolve.
- on-demand resolver: read targetProtocol; validate the target as a CID
when protocol is IPFS, else as an Arweave ID; surface protocol on the
resolution (cached transparently as part of NameResolution).
- NameResolution: optional protocol field ('arweave' | 'ipfs'); undefined
treated as 'arweave' for backward compat (e.g. trusted-gateway hops).
- arns middleware: when a name resolves to an IPFS CID and IPFS serving is
enabled, hand off to the IPFS handler (sets ipfsCid/ipfsPath, mirroring
the IPFS subdomain middleware) instead of the Arweave data handler.
Completes the 'ArNS -> IPFS CID' phase the IPFS PR was foundation for.
Verified: typecheck + lint clean, resolver unit tests pass, and live
on-demand Solana resolution of existing names still serves via Arweave.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(gql): truthful hasNextPage when id-dedup collapses a full ClickHouse page
The turbo-s3 on-demand data source shared the default awsClient (configured from AWS_REGION/AWS_ENDPOINT and the default credentials), so it could not target a separate AWS account or endpoint.
Add a turboAwsClient that is instantiated as its own awsLite client only when BOTH TURBO_AWS_REGION and TURBO_AWS_ENDPOINT are set; otherwise it references the existing awsClient, preserving current behavior.
Credentials follow the same paradigm: TURBO_AWS_ACCESS_KEY_ID / TURBO_AWS_SECRET_ACCESS_KEY / TURBO_AWS_SESSION_TOKEN are used when provided and otherwise fall back to the default AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN; when none are set, aws-lite resolves credentials from the ambient provider chain (e.g. IAM role), matching the default client. On init failure it falls back to the shared awsClient.
Wire turboS3DataSource to use turboAwsClient and document the new vars in docs/envs.md and docker-compose.yaml.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Final-review hardening of the ArNS->IPFS feature:
- fix(cache-control): ArNS->IPFS responses no longer send immutable/1-year.
The IPFS handler only sets `immutable` for direct /ipfs/{CID} (and {CID}.host)
requests; when reached via an ArNS name (mutable name->CID binding) it keeps
the ArNS-TTL Cache-Control the ArNS middleware set, so a record repoint isn't
pinned in caches for ~a year (cf. PE-9072).
- feat(headers): emit signed `X-ArNS-Protocol: arweave|ipfs` on resolutions and
add `protocol` (+ `resolvedId`) to the /ar-io/resolver/:name JSON, so clients
know whether X-ArNS-Resolved-Id is a TX ID or a CID. Added x-arns-protocol to
TRIGGER_HEADERS so it's part of the signature.
- feat(httpsig): body-bind IPFS responses with RFC 9530 Content-Digest. The
SHA-256 is computed at cache-write time and emitted on cache hits (in
CO_SIGNABLE_HEADERS, so HTTPSIG signs it). Misses stream without it; the
signed ETag=CID still attests identity.
- test(ipfs): cache digest round-trip + legacy (digest-less) entry coverage.
- docs: rewrite the ipfs-integration.md Phase 2 section to the shipped
targetProtocol design (was speculative), glossary Target Protocol entry,
CLAUDE.md note. Documented the trusted-gateway-resolver protocol limitation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
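For readers unfamiliar with RFC 9530, here is a small sketch of how a `Content-Digest` value for SHA-256 is formed: the algorithm key `sha-256`, then the base64 digest wrapped in colons (a structured-field byte sequence). `contentDigestSha256Sketch` is a hypothetical helper name; where and when the gateway actually computes it (at cache-write time, per the commit above) is not modeled here.

```typescript
import { createHash } from 'node:crypto';

// Builds an RFC 9530 Content-Digest field value for a fully buffered body.
// Hypothetical helper; the gateway computes its digest while writing to cache.
function contentDigestSha256Sketch(body: Uint8Array): string {
  const base64 = createHash('sha256').update(body).digest('base64');
  // Byte sequences in structured fields are delimited by colons.
  return `sha-256=:${base64}:`;
}
```

A client can recompute the same value over the bytes it received and compare it with the signed header to confirm that the body matches what the gateway attested.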
Codecov Report
❌ Patch coverage is
@@                  Coverage Diff                   @@
##   PE-9067-add-ipfs-cid     #793       +/-       ##
========================================================
+ Coverage     78.32%     78.69%     +0.36%
========================================================
  Files           132        136         +4
  Lines         49403      50845      +1442
  Branches       3691       3809       +118
========================================================
+ Hits          38697      40010      +1313
- Misses        10658      10787       +129
  Partials         48         48
Compose defaults like `${TURBO_AWS_REGION:-}` pass an empty string rather than
undefined when unset on the host, so the previous `!== undefined` gate and `??`
credential fallback would wrongly enable a misconfigured dedicated Turbo client
and suppress fallback to the AWS_* credentials. Route all TURBO_AWS_* / AWS_*
reads through the existing env.varOrUndefined helper, which treats empty/
whitespace strings as unset. Also add TSDoc to hasTurboAwsConfig and
turboAwsClient.
Addresses CodeRabbit review on PR #794.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
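The empty-string pitfall can be sketched as follows. This is an illustration only: `varOrUndefinedSketch`, `hasTurboAwsConfigSketch`, and `turboAccessKeyIdSketch` are hypothetical stand-ins that take the environment as a parameter; the repository's `env.varOrUndefined` and `hasTurboAwsConfig` may differ in signature.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Treats empty or whitespace-only values as unset, which is what a Compose
// default like ${TURBO_AWS_REGION:-} produces when the host var is missing.
function varOrUndefinedSketch(env: EnvSketch, name: string): string | undefined {
  const value = env[name];
  return value === undefined || value.trim() === '' ? undefined : value;
}

// The dedicated Turbo client is only warranted when BOTH vars are really set.
function hasTurboAwsConfigSketch(env: EnvSketch): boolean {
  return (
    varOrUndefinedSketch(env, 'TURBO_AWS_REGION') !== undefined &&
    varOrUndefinedSketch(env, 'TURBO_AWS_ENDPOINT') !== undefined
  );
}

// Credential fallback: with normalized reads, `??` now falls through on ''.
function turboAccessKeyIdSketch(env: EnvSketch): string | undefined {
  return (
    varOrUndefinedSketch(env, 'TURBO_AWS_ACCESS_KEY_ID') ??
    varOrUndefinedSketch(env, 'AWS_ACCESS_KEY_ID')
  );
}
```

Without the normalization, `'' !== undefined` is true and `'' ?? fallback` yields `''`, which is the misconfiguration the commit describes.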
@coderabbitai review
✅ Action performed: review finished.
The ArNS->IPFS routing decision hinges on classifying an ANT record's target as arweave vs ipfs and validating the id accordingly. Extracted that logic from OnDemandArNSResolver into a pure, SDK-free helper (classifyResolvedTarget) and unit-tested it: arweave/ipfs by targetProtocol, undefined+unknown protocol -> arweave (fail-closed), CIDv0/v1 acceptance, and cross-format rejection (CID under arweave, TX id under ipfs, garbage).
The ArNS middleware routing itself can't be unit-tested in isolation (it imports system.ts, booting the DI graph; no middleware has unit tests for this reason); it stays covered by live e2e.
Also documented the three root/apex cases in ipfs-integration.md: a name's @ record and apex-via-APEX_ARNS_NAME route to IPFS; apex-via-APEX_TX_ID is Arweave-only (bypasses protocol routing).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
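A rough sketch of the classification described above, under stated assumptions: the function name, return shape, and the regex-based CID checks are illustrative only. The real `classifyResolvedTarget` may validate CIDs differently (for example with a multiformats parser), and the CIDv1 pattern here only covers the common base32 encoding.

```typescript
type ProtocolSketch = 'arweave' | 'ipfs';

interface ClassifiedTargetSketch {
  protocol: ProtocolSketch;
  id: string;
}

// 43 base64url characters, the Arweave TX id shape mentioned in this PR.
const ARWEAVE_ID_RE = /^[A-Za-z0-9_-]{43}$/;
// CIDv0: "Qm" followed by 44 base58btc characters (shape check only).
const CID_V0_RE = /^Qm[1-9A-HJ-NP-Za-km-z]{44}$/;
// CIDv1 in base32: "b" multibase prefix, then lowercase base32 (shape check only).
const CID_V1_BASE32_RE = /^b[a-z2-7]{58,}$/;

// Returns undefined when the target does not match the format its protocol requires.
function classifyTargetSketch(
  target: string,
  targetProtocol?: number,
): ClassifiedTargetSketch | undefined {
  // Fail closed: only an explicit 1 selects IPFS; undefined or unknown
  // values are treated as Arweave.
  const protocol: ProtocolSketch = targetProtocol === 1 ? 'ipfs' : 'arweave';
  const valid =
    protocol === 'ipfs'
      ? CID_V0_RE.test(target) || CID_V1_BASE32_RE.test(target)
      : ARWEAVE_ID_RE.test(target);
  return valid ? { protocol, id: target } : undefined;
}
```

Because the two formats have disjoint shapes under these patterns, a CID presented under the Arweave protocol (or a TX id under IPFS) is rejected rather than silently reinterpreted.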
feat(s3): optional dedicated Turbo AWS client via TURBO_AWS_* vars
…ection
Owner-filtered GraphQL `transactions` queries (the ArDrive UserDriveEntities pattern) scan tens of millions of rows because `transactions` is height-ordered while the filter is on `owner_address`, so a sparse owner's rows scatter across the full height range and finding a page trips `max_rows_to_read` (Code 158); measured 12.1M rows for an owner with only 22k rows / 46 drives.
Route eligible owner queries through the owner-ordered `owner_projection` by emitting `optimize_use_projections = 1, optimize_read_in_order = 0`, which lets the optimizer seek the owner's contiguous slice and sort the small matched set in memory (measured 12.1M -> 451K rows). A reactive height-windowing fallback retries on Code 158 for whale owners whose footprint still exceeds the cap.
Gated by CLICKHOUSE_GQL_OWNER_PROJECTION_ROUTING_ENABLED (default off; when off, queries plan exactly as before) and scoped to an Entity-Type allowlist (CLICKHOUSE_GQL_OWNER_PROJECTION_ENTITY_TYPES, default drive,folder,snapshot) so large-result `file` queries, bare-owner, and owner+other-tag queries are excluded.
Verified against ClickHouse: the routed query reads 451K rows and returns the correct page, and multi-page cursor pagination reproduces the canonical ordered result exactly (no dup/gap, correct hasNextPage).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The new owner-projection routing vars were added to config.ts and docs/envs.md but not to docker-compose.yaml's core `environment:` allowlist, so setting them in `.env` had no effect — the container never received them and the feature stayed off. Add both vars alongside the other CLICKHOUSE_GQL_* passthroughs (CLAUDE.md: keep compose and envs.md in sync). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ownerProjectionApplies now requires that ALL tags are Entity-Type tags, so owner+other-tag shapes (e.g. owner + Entity-Type=drive + App-Name) fall back to the default plan instead of routing an untested shape through the projection (matches the documented allowlist contract).
- The windowed fallback drains a dense height window via a running cursor before advancing the frontier. Previously a window returning only pageSize+1 raw rows whose dedup collapsed some ids would advance past the slice and strand the unique rows below it (short pages / wrong hasNextPage).
- Refresh the findings-doc status (implemented + env-gated + canary-validated) and drop public gateway names from it.
Adds tests for both behaviors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
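The tag-based eligibility rule above can be sketched like this. The query shape and the function signature are assumptions for illustration; the real `ownerProjectionApplies` reads its allowlist from config, and this sketch deliberately ignores `ids`, which a later commit in this PR handles as a separate qualifying shape.

```typescript
interface TagFilterSketch {
  name: string;
  values: string[];
}

interface OwnerQuerySketch {
  owners?: string[];
  tags?: TagFilterSketch[];
}

// Eligible only when: at least one owner, at least one tag (bare-owner is
// excluded), and EVERY tag is an Entity-Type tag whose values are allowlisted.
function ownerProjectionAppliesSketch(
  query: OwnerQuerySketch,
  allowedEntityTypes: ReadonlySet<string>,
): boolean {
  if (query.owners === undefined || query.owners.length === 0) return false;
  const tags = query.tags ?? [];
  if (tags.length === 0) return false;
  return tags.every(
    (tag) =>
      tag.name === 'Entity-Type' &&
      tag.values.length > 0 &&
      tag.values.every((value) => allowedEntityTypes.has(value)),
  );
}
```

Under this rule, owner + `Entity-Type=drive` routes through the projection, while owner + `Entity-Type=drive` + `App-Name` falls back to the default plan.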
feat(gql): route owner-filtered ClickHouse queries through owner_projection
These six guides (~5,462 lines) were added in one batch on 2025-07-01 (43318d8, "docs: add comprehensive AI-generated documentation"). They were never linked from docs/INDEX.md, never reviewed, and never corrected in the ~year since.
An audit against the code found them partially fabricated while presenting as authoritative "Complete Technical Documentation": invented env vars (CHUNK_POST_URLS, where the real var is PREFERRED_CHUNK_POST_NODE_URLS; SECONDARY_CHUNK_POST_MIN_SUCCESS_COUNT; CHUNK_POST_TIMEOUT_MS, where the real vars are CHUNK_POST_RESPONSE_TIMEOUT_MS/_ABORT_TIMEOUT_MS), a non-existent "secondary broadcast tier", and wrong defaults, including copy-pasteable .env examples referencing vars that do nothing. They were also the only place the now-removed dead ARWEAVE_PEER_CHUNK_POST_* vars were "documented".
Orphaned + unreviewed + net-misleading. Removing the series; accurate, maintained docs live under docs/ (indexed by docs/INDEX.md).
Removed:
- ar-io-01-architecture-overview.md
- ar-io-02-data-retrieval-complete-guide.md
- ar-io-03-arweave-connectivity-complete-guide.md
- ar-io-04-arns-name-resolution-system.md
- ar-io-05-centralization-analysis.md
- ar-io-06-database-architecture.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XQPK4TXcVoXoyFp6sNW2Lr
These three vars (MIN_SUCCESS_COUNT, MAX_PEER_ATTEMPT_COUNT, CONCURRENCY_LIMIT) were introduced in PE-7945 (f0d127a) for the original background peer-broadcast path, which has since been superseded by ArweaveCompositeClient.broadcastChunk and the live CHUNK_POST_* family. They had zero code consumers but were still plumbed through docker-compose.yaml, so an operator setting them got silence. Worse, the startup validation (MAX < MIN throws) could crash boot on a "tuning" attempt that otherwise did nothing. Remove the consts + validation and the compose passthrough. (The only docs referencing these vars were the orphaned AI-generated ar-io-0X drafts, removed wholesale in a separate PR.) Verified: zero remaining code references; config test 20/20; eslint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XQPK4TXcVoXoyFp6sNW2Lr
TxChunksDataSource's full-stream read loop terminated only on byte
accounting (bytes < size), advancing by chunkData.chunk.length. A source
or cache returning a zero-length chunk left bytes unchanged, so the loop
re-requested the same offset forever — observed in production as a
5.1M-span Honeycomb trace for tx QY5bDvdGa9Q_GcdxEFlvTJxUW5UPrSyvf6yULjbsv5g
(a 2805-byte, single-chunk L1 tx) made up of ~1.7M repeated ~0.1ms cache
hits on one offset.
Add two forward-progress guards in the read loop:
- abort on a zero-length chunk (primary cause)
- abort once the chunk count exceeds ceil(size / MAX_CHUNK_SIZE) + 1,
a backstop against pathological tx geometry
Both increment chunk_stream_aborts_total{reason} and destroy the stream
with a descriptive error instead of spinning.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
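The two forward-progress guards can be illustrated with a simplified, synchronous read loop. This is a sketch, not the `TxChunksDataSource` implementation: the real loop is streaming and asynchronous and increments `chunk_stream_aborts_total{reason}`; `readAllChunksSketch` and `FetchChunkSketch` are hypothetical, and `MAX_CHUNK_SIZE` is assumed here to be 256 KiB.

```typescript
// Assumed value for illustration (256 KiB).
const MAX_CHUNK_SIZE = 256 * 1024;

// Hypothetical synchronous chunk fetcher keyed by relative offset.
type FetchChunkSketch = (relativeOffset: number) => Uint8Array;

function readAllChunksSketch(size: number, fetchChunk: FetchChunkSketch): Uint8Array[] {
  // Backstop: no sane geometry needs more than ceil(size / MAX_CHUNK_SIZE) + 1 chunks.
  const maxChunks = Math.ceil(size / MAX_CHUNK_SIZE) + 1;
  const chunks: Uint8Array[] = [];
  let bytes = 0;

  while (bytes < size) {
    if (chunks.length >= maxChunks) {
      throw new Error(`chunk count would exceed ${maxChunks} for size ${size}`);
    }
    const chunk = fetchChunk(bytes);
    // Primary guard: a zero-length chunk would leave `bytes` unchanged and
    // make the loop re-request the same offset forever.
    if (chunk.length === 0) {
      throw new Error(`zero-length chunk at relative offset ${bytes}`);
    }
    chunks.push(chunk);
    bytes += chunk.length;
  }
  return chunks;
}
```

For the 2805-byte transaction described above, a source that keeps returning an empty chunk at offset 0 now fails on the first iteration instead of spinning.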
Honeycomb trace 7fd5b41a (tx QY5bDvdGa9Q…, a 2805-byte single-chunk L1
tx) showed the first ReadThroughChunkDataCache.getChunkDataByAny span as a
cache HIT at relative_offset 0 with tx_size 2805, followed by ~1.7M
identical-offset reads. The chunk data store held a poisoned 0-byte file
for (dataRoot, 0): FsChunkDataStore.has() reports a hit on a 0-byte file
and get() serves an empty chunk, so the TxChunksDataSource stream loop
never advanced `bytes` and re-requested the same offset forever.
Root cause: nothing validated chunk length, so a source that once returned
an empty chunk was persisted (set() writes unconditionally) and re-served
indefinitely.
Harden every layer:
- FsChunkDataStore.set: refuse to persist zero-length chunks
- FsChunkDataStore.get / getByAbsoluteOffset: treat an existing 0-byte
file as a miss, so already-poisoned entries self-heal on next refetch
- ReadThroughChunkDataCache: reject a zero-length chunk from the source
(don't cache it; throw so the retrieval cascade falls through)
- TxChunksDataSource: only treat a zero-length chunk as fatal when size > 0
New metric chunk_zero_length_total{stage} tracks rejections at
source_fetch / cache_read / cache_write. Updates the prior store test that
asserted empty chunks round-trip (the poison contract).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
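A compact in-memory sketch of the store-level hardening. It is an assumption-laden stand-in for `FsChunkDataStore` (which is filesystem-backed): `ChunkStoreSketch` and its `poisonForTest` hook are hypothetical, and whether the real `set()` throws or silently skips a zero-length chunk is not stated in the commit, so this sketch simply skips and counts.

```typescript
class ChunkStoreSketch {
  private readonly entries = new Map<string, Uint8Array>();
  // Stand-in for chunk_zero_length_total{stage}.
  readonly zeroLengthTotal = { cache_read: 0, cache_write: 0 };

  // Refuse to persist zero-length chunks so new poison cannot be written.
  set(key: string, chunk: Uint8Array): void {
    if (chunk.length === 0) {
      this.zeroLengthTotal.cache_write += 1;
      return;
    }
    this.entries.set(key, chunk);
  }

  // Treat an existing zero-length entry as a miss, so a caller refetches and
  // overwrites it (self-healing for already-poisoned entries).
  get(key: string): Uint8Array | undefined {
    const chunk = this.entries.get(key);
    if (chunk !== undefined && chunk.length === 0) {
      this.zeroLengthTotal.cache_read += 1;
      return undefined;
    }
    return chunk;
  }

  // Test-only hook simulating a 0-byte file written before the fix.
  poisonForTest(key: string): void {
    this.entries.set(key, new Uint8Array(0));
  }
}
```

The read-side check is what lets entries poisoned before the fix recover without a manual cache purge: the miss triggers a refetch, and a valid chunk replaces the empty one.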
… absolute-offset zero-length paths
Address CodeRabbit review on #799:
- Document chunkStreamAbortsTotal and chunkZeroLengthTotal (label meanings).
- Add FsChunkDataStore tests: a poisoned 0-byte by-absolute-offset entry reads as a miss, and a zero-length set() creates no absolute-offset index.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fix(chunks): reject and self-heal zero-length chunk cache poison
Multi-id transactions(ids:[...]) queries are ~99% of production TOO_MANY_ROWS failures: `id` is the last sort-key column so `id IN (...)` has no seek and relies on id_bloom, which lights up most granules (~100 ids reads ~308M rows). An owner filter alone isn't enough (~13.6M on the main table for 100 ids), but seeking the owner's slice via owner_projection drops it to ~556K (measured; verified end-to-end at 590K rows / 35ms under the 10M cap).
ownerProjectionApplies now routes owners+ids through the projection regardless of tags (the id list bounds the result). optimize_read_in_order=0 is a no-op here (id queries carry no ORDER BY); the win is purely the owner seek.
The height-windowing fallback is disabled for id queries (it is height-ordered and needs the cursor predicate, which id queries don't carry), so a whale owner (>10M footprint) + ids still surfaces the 158 (rare, no worse than before).
Gated by the existing CLICKHOUSE_GQL_OWNER_PROJECTION_ROUTING_ENABLED flag. Reduces live failures once clients add owners to their id queries; genuinely ownerless batches still need the schema-level id_bloom / id-ordered-table fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Findings doc: the Implementation section's eligibility wording said `ids` must be absent, contradicting the owners+ids extension; describe both qualifying shapes and note the windowing fallback is no-id only.
- envs.md: call out that owners+ids does NOT get the height-windowing retry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
feat(gql): extend owner_projection routing to owners+ids queries
…ry search
Cold data retrieved by absolute offset must first locate the block
containing that offset. The chain binary search did this with
~log2(height) sequential GET /block/height/{h} requests to the trusted
node (~1.5s each), frequently exceeding CHUNK_SERVE_DEADLINE_MS and
turning into 504s.
Add a local index over stable_blocks.weave_size (getBlockByWeaveOffset,
backed by the new stable_blocks_weave_size_idx) and consult it first in
ArweaveCompositeClient.binarySearchBlocks. The local result is trusted
only when the immediately-preceding block is present and ends before the
offset (a tight bracket, so no missing block can hide the true
container), and the fetched block's weave_size is re-verified; any gap,
stale index, lookup error, or unstable-tip offset falls back to the
existing chain binary search. This changes only how the block is found
-- the block returned is identical.
Resolves offset->block only. Per-transaction offsets remain chain-
authoritative (/tx/{id}/offset): per-tx weave offsets follow the block's
binary tx-ID sort order (not block_transaction_index) and v1 inline data
is not captured by data_size, so they cannot be derived from local
columns.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ERJ5pSmgZoLj4jEHhweq5B
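The "tight bracket" trust check can be sketched as a pure predicate. The shapes and the function name are hypothetical, and the boundary handling (an offset equal to a block's `weave_size` counts as inside that block) is an assumption inferred from the commit text, not confirmed against `getBlockByWeaveOffset`.

```typescript
// Hypothetical view of a stable_blocks row: height plus cumulative weave size.
interface IndexedBlockSketch {
  height: number;
  weaveSize: number;
}

interface LocalLookupSketch {
  block: IndexedBlockSketch;
  // The immediately preceding indexed block, if the local index has it.
  previous?: IndexedBlockSketch;
}

// True when the local result can be trusted without a chain binary search.
function isTightBracketSketch(offset: number, lookup: LocalLookupSketch): boolean {
  const { block, previous } = lookup;
  // A missing predecessor means a gap could hide the true container block.
  if (previous === undefined) return false;
  if (previous.height !== block.height - 1) return false;
  // The predecessor must end before the offset, and the candidate must reach it.
  return previous.weaveSize < offset && offset <= block.weaveSize;
}
```

When the predicate is false, the caller would fall back to the existing chain binary search, so the returned block is the same either way; only the lookup cost changes.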
…nk-post-config chore(config): remove dead ARWEAVE_PEER_CHUNK_POST_* env vars
…drafts docs: remove orphaned AI-generated ar-io-0X draft guides
…c, abort test
- offsets.sql: add `b.height ASC` tiebreaker so an offset that lands exactly on a weave_size shared by consecutive empty blocks deterministically resolves to the lowest such block. This matches the chain binary search's smallest-height selection and keeps the local fast path available on exact end-of-block offsets (without it the bracket guard would reject the tie and fall back).
- Add TSDoc to BlockByWeaveOffsetResult and both getBlockByWeaveOffset methods documenting the bracket semantics and the undefined fallback contract.
- Add an AbortError regression test asserting the fast path rethrows aborts instead of silently degrading into a slow chain walk.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ERJ5pSmgZoLj4jEHhweq5B
…st path
Adds a labeled counter recording the outcome of each offset->block
resolution in ArweaveCompositeClient.binarySearchBlocks: cache_hit,
local_index_hit, and fallback_{miss,untight,stale,error,no_index}. The
fast-path debug logs are only emitted at debug level, so without this
counter the local-index hit rate and fallback reasons are invisible at
the info level production runs at. Needed to measure the soak.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01ERJ5pSmgZoLj4jEHhweq5B
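To make the label set concrete, here is a dependency-free sketch of an outcome counter. The outcome names come from the commit message (with `fallback_{...}` expanded); the class itself is a hypothetical stand-in, since the commit does not name the metric or show how it is registered.

```typescript
// Outcome labels listed in the commit message.
type OffsetResolutionOutcomeSketch =
  | 'cache_hit'
  | 'local_index_hit'
  | 'fallback_miss'
  | 'fallback_untight'
  | 'fallback_stale'
  | 'fallback_error'
  | 'fallback_no_index';

// Minimal in-memory labeled counter, standing in for a real metrics client.
class OutcomeCounterSketch {
  private readonly counts = new Map<OffsetResolutionOutcomeSketch, number>();

  inc(outcome: OffsetResolutionOutcomeSketch): void {
    this.counts.set(outcome, (this.counts.get(outcome) ?? 0) + 1);
  }

  get(outcome: OffsetResolutionOutcomeSketch): number {
    return this.counts.get(outcome) ?? 0;
  }
}
```

Typing the label as a union keeps call sites from emitting an outcome the dashboards do not know about.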
perf(chunk-offset): resolve offset→block locally to avoid chain binary search
Summary
Phase 2 of the IPFS work: an ArNS name whose ANT record targets an IPFS CID (`targetProtocol: ipfs`) now resolves and serves through the gateway's IPFS path, e.g. `my-name.gateway.tld` → IPFS content, no CID in the URL.
What it does
- `@ar.io/sdk` 4.0.0 ANT records carry `targetProtocol` (0=Arweave, 1=IPFS) and the target may be a CID. The on-demand resolver reads `targetProtocol`, validates the target as a CID (vs a 43-char Arweave id), and surfaces `protocol` on the resolution (cached transparently).
- The ArNS middleware routes `protocol === 'ipfs'` resolutions to the same IPFS handler used by the path/subdomain routes (sets `ipfsCid`/`ipfsPath`); everything else serves via the Arweave data path as before.
- `NameResolution.protocol` is optional (undefined ⇒ arweave), so the change is backward compatible.
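The routing rule in the list above can be sketched as a pure decision function. All names here are hypothetical: the real middleware sets `ipfsCid`/`ipfsPath` on the request and calls the handlers directly. What happens when a name targets IPFS while IPFS serving is disabled is not stated in this PR, so the sketch returns `undefined` for that case rather than guess.

```typescript
type ResolutionProtocolSketch = 'arweave' | 'ipfs';

interface NameResolutionSketch {
  resolvedId: string;
  protocol?: ResolutionProtocolSketch; // undefined is treated as 'arweave'
}

type HandoffSketch =
  | { handler: 'ipfs'; ipfsCid: string; ipfsPath: string }
  | { handler: 'arweave'; id: string };

function chooseHandlerSketch(
  resolution: NameResolutionSketch,
  requestPath: string,
  ipfsEnabled: boolean,
): HandoffSketch | undefined {
  const protocol = resolution.protocol ?? 'arweave';
  if (protocol === 'ipfs') {
    // Unspecified by the source: left to the caller in this sketch.
    if (!ipfsEnabled) return undefined;
    return { handler: 'ipfs', ipfsCid: resolution.resolvedId, ipfsPath: requestPath };
  }
  return { handler: 'arweave', id: resolution.resolvedId };
}
```

Defaulting an absent `protocol` to `'arweave'` is what keeps older cached resolutions and trusted-gateway hops on the existing data path.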
Review hardening (2nd commit)
- Cache-Control: ArNS→IPFS responses use the ArNS TTL (mutable name→CID binding), not `immutable`; a direct `/ipfs/{CID}` stays `immutable`. Prevents pinning a stale CID for ~a year after a record update (cf. PE-9072).
- `X-ArNS-Protocol: arweave|ipfs` response header (signed; added to `TRIGGER_HEADERS`) + `protocol`/`resolvedId` in the `/ar-io/resolver/:name` JSON, so clients know whether `X-ArNS-Resolved-Id` is a TX id or a CID.
- RFC 9530 `Content-Digest` on IPFS responses: SHA-256 computed at cache-write time, emitted on cache hits, signed via `CO_SIGNABLE_HEADERS`.
- Cache digest round-trip unit test.
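The Cache-Control split in the first item can be summarized in a few lines. This is a sketch only: `ipfsCacheControlSketch` is hypothetical, and the exact header string used for direct CID requests is an assumption (the PR only says "immutable/1-year").

```typescript
// Assumed 1-year max-age for content-addressed responses.
const ONE_YEAR_SECONDS = 31_536_000;

function ipfsCacheControlSketch(viaArnsName: boolean, arnsTtlSeconds: number): string {
  if (viaArnsName) {
    // The name -> CID binding is mutable, so cache only for the ArNS TTL.
    return `public, max-age=${arnsTtlSeconds}`;
  }
  // A direct /ipfs/{CID} request is content-addressed and can be immutable.
  return `public, max-age=${ONE_YEAR_SECONDS}, immutable`;
}
```

The content behind a CID never changes, but which CID a name points at can, which is why only the direct path is safe to mark `immutable`.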
Known limitation
Protocol awareness lives in the on-demand resolver. The trusted-gateway resolver doesn't yet propagate `targetProtocol` across hops, so keep `on-demand` ahead of `gateway` in `ARNS_RESOLVER_PRIORITY_ORDER` for IPFS-targeted names. (Real fix: a protocol resolution header across gateways, as a follow-up.)
Verification
Verified end-to-end live (gateway + Kubo sidecar) against a real ArNS undername pointed at a CID with `targetProtocol: 1`:
- `X-Ar-Io-Source: ipfs`, `X-ArNS-Resolved-Id` = the CID, `X-ArNS-Protocol: ipfs`
- `Cache-Control: public, max-age=<ttl>` via ArNS; `immutable` for direct CID
- `Content-Digest` present + signed on cache hits, matches served bytes
🤖 Generated with Claude Code