Skip to content

fix(parser): require explicit provider factories#784

Closed
mariusvniekerk wants to merge 5 commits into
provider-db-backed-familyfrom
provider-explicit-registry
Closed

fix(parser): require explicit provider factories#784
mariusvniekerk wants to merge 5 commits into
provider-db-backed-familyfrom
provider-explicit-registry

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Every registered parser agent now has an explicit provider factory instead of relying on the legacy fallback. Claude.ai and ChatGPT are import-only parsers, so they use a small ProviderBase-backed import-only provider with no source discovery/watch behavior while preserving the standard unsupported parse response. The provider registry tests now fail if any future AgentDef falls through to legacyProviderFactory, which keeps new providers on the shared facade path.

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (a63405c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m7s), codex_security (codex/security, done, 33s) | Total: 2m40s

@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from a63405c to 41637b8 Compare June 20, 2026 00:06
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (41637b8)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m15s), codex_security (codex/security, done, 25s) | Total: 1m40s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (555b938)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m54s), codex_security (codex/security, done, 11s) | Total: 3m5s

@mariusvniekerk

mariusvniekerk commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (30ae7e6)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m1s), codex_security (codex/security, done, 14s) | Total: 2m15s

@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch 3 times, most recently from 2fa10c1 to 089ddd7 Compare June 20, 2026 00:24
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (089ddd7)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 2m4s), codex_security (codex/security, done, 23s) | Total: 2m27s

@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (0671d6c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m19s), codex_security (codex/security, done, 23s) | Total: 1m42s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 81b0cad to ae27e4f Compare June 20, 2026 00:42
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 0671d6c to b6de96f Compare June 20, 2026 00:42
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (b6de96f)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 1m55s), codex_security (codex/security, done, 17s) | Total: 2m12s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from ae27e4f to 0799e94 Compare June 20, 2026 00:53
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from b6de96f to 1ab0ad4 Compare June 20, 2026 00:53
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 0799e94 to 82f9e12 Compare June 20, 2026 00:57
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 1ab0ad4 to b68ac14 Compare June 20, 2026 00:57
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 82f9e12 to e8149e5 Compare June 20, 2026 01:02
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from b68ac14 to 42bcdb4 Compare June 20, 2026 01:02
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Combined Review (42bcdb4)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (claude-code/default, done, 3m11s), codex_security (claude-code/security, done, 1m21s) | Total: 4m32s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from e8149e5 to 83f8b73 Compare June 20, 2026 01:42
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 42bcdb4 to 53ea4b2 Compare June 20, 2026 01:42
@roborev-ci

roborev-ci Bot commented Jun 20, 2026

Copy link
Copy Markdown

roborev: Review Unavailable (53ea4b2)

The review agent repeatedly failed to run (likely an agent or configuration error). roborev will try again on the next commit.

Last error: agent: claude-code failed stream: stream errors: You've hit your session limit · resets 5:50am (UTC): exit status 1

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 83f8b73 to 1f17524 Compare June 21, 2026 00:41
@roborev-ci

roborev-ci Bot commented Jun 21, 2026

Copy link
Copy Markdown

roborev: Combined Review (9671172)

Synthesis unavailable. Showing individual review outputs.

claude-code — default (done)

Review Findings

Severity: Low
Location: internal/sync/engine.go:695-825 (attachProviderSources and its only callee discoverProviderSources)
Problem: These two methods are dead in production. The full-sync path uses discoverProviderFiles, and the changed-path path uses attachChangedPathProviderSources. The only caller of attachProviderSources (and therefore the only thing that reaches discoverProviderSources) is internal/sync/provider_process_test.go:1003. That is ~95 lines of unexercised production code that largely duplicates discoverProviderFiles, which will rot independently of the live path.
Fix: Remove both methods (and adjust the test), or wire attachProviderSources into the path it was intended for if it is meant to be used.


Severity: Low
Location: internal/sync/parsediff.go:614-620 (parseDiffPresencePath)
Problem: The helper is effectively return path. For a virtual path (base != path) it returns path; for a physical path (base == path) it returns base, which equals path. Both branches return the input unchanged, so the function name and structure imply a stripping/keeping distinction that does not exist. The correct exact-path behavior actually comes from storedByPath now being keyed by both full and base paths; this wrapper just obscures that.
Fix: Replace the call with job.path directly, or drop the helper.

Summary

A large, well-tested refactor that removes the legacy DiscoverFunc/FindSourceFunc adapters and makes per-agent providers authoritative for discovery, source lookup, watch planning, incremental parsing, export, and sync skip/caching; the only issues found are minor dead/redundant code.


claude-code — security (done)

I've reviewed the full diff. This is a large (65-commit, ~13k line) internal refactor that migrates the parser/sync system from per-agent function callbacks (DiscoverFunc, FindSourceFunc, WatchRootsFunc) to a unified provider abstraction, flipping all agents to provider-authoritative/import-only migration modes and removing the legacy adapter.

I focused on the changed surfaces that could touch a trust boundary:

  • cmd/agentsview/session_export.go — New provider-based source resolution. The Visual Studio Copilot multi-conversation trace filtering (the one path with an explicit disclosure concern) is preserved and extended: when the resolved source is a real trace file, the conversation ID is derived from the session ID and filtered, guarded by IsValidSessionID. The new visualStudioCopilotTraceHasConversation also gates on IsValidSessionID(rawID) before interpolating rawID into the trace-content match string, so a raw ID can't inject an arbitrary search substring. This is a local CLI over user-owned session files regardless.
  • internal/ssh/resolve.go — Only the filter predicate changed (ProviderSupportsSourceDiscovery vs DiscoverFunc != nil); the values interpolated into the generated remote shell script remain static parser.Registry constants, so no new injection.
  • internal/sync/engine.go — New provider source-discovery/attachment plumbing. The one new SQL query (countRootProviderVirtualDBSessions) is fully parameterized with ? placeholders. pathRewriter/idPrefix remote-sync machinery is moved but not semantically changed. No new exec/command sinks.
  • internal/parser/* providers — Codex incremental append parsing, fingerprint inode/device additions, factory wiring (panic on missing factory is a programmer-error guard). All operate on local, user-owned files that per the project threat model are non-adversarial.

The externally-reachable code (HTTP API, auth/bearer-token, CORS, managed Caddy config generation) and the DB/Postgres schema are not touched by this diff. No changes construct paths, queries, or commands from attacker-controlled input crossing a trust boundary, and no secrets are exposed.

No issues found.

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 4f27d46 to 39a2870 Compare June 23, 2026 23:56
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 9671172 to 245482d Compare June 23, 2026 23:56
@roborev-ci

roborev-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

roborev: Combined Review (245482d)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 9m51s), codex_security (codex/security, done, 4m54s) | Total: 14m45s

@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 39a2870 to 7d46117 Compare June 25, 2026 05:49
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from 245482d to 5a2f40d Compare June 25, 2026 05:49
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (5a2f40d)

Summary verdict: two medium correctness findings remain; no security findings were reported.

Medium

  • cmd/agentsview/main.go:926
    Provider watch plans with only missing roots now fall through to legacy watch collection and often add neither a watcher nor an unwatched poll root. For providers like Zed before threads/ exists or Reasonix before its subdirs exist, new files are not picked up by the 2-minute unwatched poll and wait for the scheduled full sync.
    Fix: Treat a non-empty provider watch plan as authoritative even when all roots are currently missing; return true with the configured agent dir as an unwatched poll root when missingRoots is non-empty and no existing root covers creation.

  • internal/sync/engine.go:6644
    The provider lookup now runs before the existing Aider stored-path validation. Aider virtual paths are positional (<history>#<idx>), and the provider accepts stored paths structurally without checking that the index still maps to the requested raw session ID, so a shifted history can make FindSourceFile/SyncSingleSession parse the wrong run instead of falling through to raw-ID lookup.
    Fix: Keep the Aider AiderRawIDAt validation before provider lookup, or make the Aider provider reject stored virtual paths under RequireFreshSource when the indexed run does not match RawSessionID, then fall back to raw-ID resolution.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 8m57s), codex_security (codex/security, done, 3m29s) | Total: 12m36s

@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (b842c9e)

Medium severity issue found; no high or critical findings.

Medium

  • cmd/agentsview/main.go:926 - When a provider watch plan resolves but every planned root is missing, collectProviderWatchRoots returns false, nil and falls back to legacy watch handling. For provider-authoritative agents without legacy WatchSubdirs, the missing configured/default root is not added to unwatchedDirs, so newly created agent directories are not picked up by the 2-minute unwatched-dir poll and may wait until the slower periodic full sync.

    Suggested fix: Treat a provider plan with missing roots as provider-handled even if no roots were added, return the configured root or missing planned roots for unwatchedDirs, and add a regression test for a provider-authoritative agent whose root does not exist at startup.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 10m45s), codex_security (codex/security, done, 6m24s) | Total: 17m16s

Make the provider registry force every agent onto an explicit facade path
instead of silently inheriting a legacy fallback factory. Remove the
legacy provider fallback entirely so an unhandled AgentDef is a loud
construction error, and represent the non-filesystem export parsers
(Claude.ai, ChatGPT) with explicit import-only providers. Mark the
concrete providers authoritative in the migration manifest and drop the
legacy-only marker.

Route FindSourceFile and SourceMtime through provider FindSource and
Fingerprint so the stack tip exercises provider-owned source identities
and composite mtimes rather than parallel legacy dispatch.

Retire the test scaffolding that depended on the removed legacy types:
per-provider tests assert concrete construction, the obsolete
legacy-fallback and legacy-only-mode registry tests are dropped, and the
zero-legacy anti-shim gate runs with an empty pending list.

With codex's legacy ShallowWatchRootsFunc removed in favor of the provider
WatchPlan, fix collectProviderWatchRoots so a WatchPlan root that does not
exist yet but lives under an already-watched ancestor no longer marks the
whole directory for unwatched polling. A not-yet-created per-session
recursive root otherwise regressed parity by polling two codex dirs that
share a watched state-directory root; the ancestor watch observes the
target's creation and a later sync establishes the deeper watch.

refactor(parser): fold export parsers onto the import-only providers

ChatGPT and Claude.ai sessions never come from disk discovery; they enter
the archive only through a one-shot import. Move the ParseChatGPTExport and
ParseClaudeAIExport free functions onto the import-only provider as the
ChatGPTExportParser and ClaudeAIExportParser methods, and route the importer
and tests through NewProvider plus a type assertion.

This removes the last provider-specific legacy parse free functions, so the
parser package now owns every agent's parse behavior on provider receivers
rather than on standalone entrypoints.

refactor(sync): remove the dead provider shadow-compare harness

No provider runs in shadow-compare mode: every agent is now either
provider-authoritative or import-only. The shadow harness that dual-ran a
side-effect-free provider parse against the legacy result and recorded the
diff was therefore never invoked at runtime.

Delete the harness end to end: the ObserveProviderSource entry point and its
comparison machinery, the Engine.observeProviderShadow hook and its two call
sites, the ProviderShadowRecorder config/field wiring, and the
ProviderMigrationShadowCompare mode (collapsing every switch that paired it
with provider-authoritative). The provider outcome validation and effect
planning helpers that the live parse path still relies on move to
provider_effects.go, which is all that file ever held that was reachable.

test(parser): drop the facade-migration anti-shim scaffolding

With every provider folded onto receiver methods and zero provider-specific
legacy parse free functions left, the migration's enforcement tests have
served their purpose. They assert the absence of named functions and that
provider files do not shim legacy entrypoints, which is only meaningful
while the stack is mid-migration; after merge they are pure maintenance drag
that breaks whenever a symbol is legitimately renamed.

Delete the per-provider Test*ProviderOwnsLegacyEntrypoints guards and the
shared anti-shim scan (provider_shim_scan_test.go). The providers' behavioral
tests remain and are what actually protect the parse paths going forward.

feat(parser): migrate aider, omp, reasonix to providers

origin/main carries three agents the facade stack never migrated: Aider,
OhMyPi, and Reasonix. After rebasing onto it, the explicit provider registry
panicked on startup because those agents had no concrete factory, and the
migration manifest still listed them as legacy-only against a manifest that no
longer defines that mode.

Give each a concrete provider so the zero-legacy registry stays intact:

- OMP shares Pi's JSONL session format, so the Pi provider is parameterized by
  AgentDef (type and ID prefix) and serves both Pi and OhMyPi; ParseOMPSession
  is folded away.
- Reasonix gets a single-file provider whose composite fingerprint folds the
  .jsonl.meta sidecar (mirroring reasonixEffectiveInfo) and whose changed-path
  classifier reproduces the project/global/archive/subagent layouts and the
  sidecar-to-transcript mapping.
- Aider gets a multi-session provider that fans one history file out into one
  session per run under "<history>#<idx>" virtual paths and force-replaces on
  parse, mirroring the Shelley shape. Per-run skip is handled by
  dropUnchangedSharedSQLiteResults (content-hash compare); remote-sync identity
  stability is preserved by threading the path rewriter through ProviderConfig
  so per-run IDs stay stable across temp extraction dirs.

The three manifest entries flip to provider-authoritative and the legacy engine
methods (processAider, processReasonix, aiderFileUnchanged, aiderIdentityPath,
classifyAiderPath) plus the now-dead legacy processFile fall-through are
removed.

The two codex append regression tests that were re-pointed onto processFile no
longer consume their re-stat result; drop the unused assignment to satisfy
staticcheck.

test(sync): restore provider runtime regressions

The shadow-compare harness removal also deleted coverage for live provider-authoritative runtime behavior. Restore those checks against the final processFile and SyncSingleSession paths with a small fake provider so future migrations cannot drop source lookup, retry data-version, skip-cache, skip-reason, not-found, or force-parse behavior silently.\n\nAlso assert the checked-in migration manifest only contains final provider modes and remove stale shadow-compare wording from live sync comments.\n\nValidation: go test -tags "fts5" ./internal/sync -run 'TestProcessFileProvider|TestSyncSingleSessionProviderAuthoritativeBypassesProviderSkipCache' -count=1; go test -tags "fts5" ./internal/parser -run 'TestProviderMigrationModes' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): make provider source lookup authoritative

Stored file_path values can be stale after provider migration, especially for virtual DB-backed sources and remote-canonical Aider histories. Treat them as lookup hints instead of source-of-truth paths so provider-owned identity and freshness decide what can be resynced.

Aider now resolves remote canonical physical and virtual hints using the same path-rewriter identity model as parse, Hermes verifies state.db contains the requested raw ID before claiming it, and DB-backed providers fall through from stale hints to raw-ID lookup while fresh deleted rows remain not found.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestDBBackedProviderStoredVirtualPathFreshness|TestDBBackedProviderRejectsInvalidStoredVirtualPaths|TestFindSourceFileProviderAuthoritativePrefersProviderOverStoredPath|TestSyncForgeMissingConversation' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

fix(watch): preserve provider watch root semantics

Provider-authoritative agents must drive live watcher setup from their WatchPlan, otherwise legacy flags such as Cowork's ShallowWatch can silently narrow recursive coverage. Keep the provider root shape when collecting watcher roots so recursive provider roots stay recursive.

Missing provider roots are only treated as covered when an existing recursive watch covers the subtree or a shallow root can observe direct creation of that missing root. This preserves Codex's shallow parent behavior without letting shallow ancestors hide deeper missing recursive roots from polling.

Validation: go test -tags "fts5" ./cmd/agentsview -run 'TestCollectWatchRoots(UsesCoworkProviderRecursiveRoot|UsesGeminiProviderMetadataRoot|UsesAntigravityCLIHistoryRoot|PreservesDirsSharingWatchRoot|HermesSessionsWatchesStateDBParent)|TestMissingWatchRootCoverageDoesNotTreatShallowAncestorAsRecursive' -count=1; go test -tags "fts5" ./cmd/agentsview ./internal/parser -run 'TestCollectWatchRoots|TestMissingWatchRootCoverage|TestCoworkProviderSourceMethods|TestGeminiProvider|TestAntigravityCLI|TestSQLiteFanoutSourceSetUsesFindDBPathAsCanonical' -count=1; go test -tags "fts5" ./cmd/agentsview ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

test(provider): cover non-sync provider callers

Provider migration removed legacy discovery and source lookup hooks, so non-sync callers need explicit coverage that they continue to use provider capabilities. Add contracts for parse-diff supported-agent resolution, token-use raw disk probing, and SSH remote directory resolution across the provider-authoritative agents called out by review.

The Cursor token-use case uses a real provider-owned source layout to prove an unsynced raw or canonical session ID resolves through provider FindSource. Comments now describe the shared disk source lookup instead of implying FindSourceFunc is still the only path.

Validation: go test -tags "fts5" ./cmd/agentsview ./internal/sync ./internal/ssh -run 'TestParseDiffSupportedAgentsIncludesProviderAuthoritativeAgents|TestParseDiffProviderAuthoritativeAgentsAreDiscoverable|TestParseDiffDiscoversProviderSources|TestResolveSessionID_ProviderAuthoritativeCursorOnDiskNotInDB|TestAgentHasDiskSourceLookupIncludesProviderAuthoritativeAgents|TestBuildResolveScript' -count=1; go test -tags "fts5" ./cmd/agentsview ./internal/sync ./internal/ssh -count=1; go fmt ./...; go vet ./...; git diff --check

docs(provider): spell out source identity contract

Provider implementers need the source-hint and freshness rules at the API boundary, not only embedded in migration review context. Document that stored file paths and fingerprint keys are advisory, provider lookup is authoritative, and persisted source identity must stay compatible with source metadata, skip-cache, data-version, PostgreSQL, and session metadata consumers.

Also update the facade design note so the current stack tip no longer claims shadow-compare mode still exists.

Validation: go test -tags "fts5" ./internal/parser -run 'TestProvider|TestProviderMigrationModes' -count=1; go fmt ./...; go vet ./...; git diff --check; mdformat applied by commit hook

test(provider): enforce anti-shim ownership policy

The migration should not rely on per-agent AgentDef hooks or exported provider-specific facade functions once a provider is authoritative. Add one maintained package-wide guard for that policy and remove the obsolete Kiro-only scan.

Aider and Reasonix were still wired through legacy DiscoverFunc/FindSourceFunc despite having provider-authoritative implementations. Remove those hook assignments and make their provider-specific parser/discovery/lookup helpers package-local so runtime callers go through the provider registry.

Validation: go test -tags "fts5" ./internal/parser -run 'TestProviderAuthoritativeAgentDefsDoNotExposeLegacyHooks|TestNoExportedProviderFacadeShims|TestNoProviderFacadeShimNamePolicyDocumentsAllowedHelpers|TestProviderAntiShimScanReadsExpectedPackage|TestAider|TestReasonix|TestProviderMigrationModes' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Aider|Reasonix|TestProviderAuthoritativeAgentDefsDoNotExposeLegacyHooks|TestNoExportedProviderFacadeShims|TestProviderMigrationModes' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; mdformat applied by commit hook

fix(parser): make import export capabilities agent-specific

ChatGPT and Claude.ai imports should advertise only the export parser surface that belongs to their agent. The shared import-only provider type made both interface assertions succeed, which hid capability loss during migration review and let callers treat either provider as supporting the other export format.

Validation: go test -tags "fts5" ./internal/parser -run 'TestImportOnlyProviderExportCapabilitiesAreAgentSpecific|TestParseChatGPTExport|TestParseClaudeAIExport' -count=1; go test -tags "fts5" ./internal/parser -count=1; go fmt ./...; go vet ./...; git diff --check

fix(parser): hash reasonix metadata sidecars

Reasonix metadata-only edits can change session fields without touching the transcript bytes. Provider fingerprinting already folded sidecar size and mtime, but the hash stayed transcript-only, so stored freshness state could miss metadata-only changes under hash-based comparisons.

Keep missing-sidecar hashes compatible with the existing transcript-only value, and add provider coverage for layout classification plus deleted sidecar/transcript behavior so the migration contract is explicit.

Validation: go test -tags "fts5" ./internal/parser -run 'TestReasonixProvider(Fingerprint|ChangedPath)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Reasonix' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): keep aider off mtime-only skip cache

Aider histories fan out one physical file into multiple virtual run rows. Letting the generic skip cache key the physical history by mtime alone can bypass the provider fingerprint and per-run DB checks, hiding same-mtime content changes, missing run rows, stale hashes, or stale data versions.

Disable generic skip caching for Aider and rely on the provider parse plus dropUnchangedSharedSQLiteResults hash/data-version filtering. This preserves correctness at the cost of reparsing the shared history before dropping unchanged runs.

Validation: go test -tags "fts5" ./internal/sync -run 'TestProcessFileAiderProvider' -count=1; go test -tags "fts5" ./internal/parser -run 'TestAiderProviderFindSourceUsesCanonicalIdentity|TestAiderProviderRemoteIdentityStable' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Aider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check

refactor(parser): extract multi-session container base

Shelley and Aider both surface one physical container (a SQLite DB / a chat history file) as many virtual member sessions, and each hand-rolled the same source-set scaffolding: discovery, watch plan, changed-path classification, the StoredFilePath/FingerprintKey/RawSessionID FindSource tiers, fingerprinting, and the container/member parse fan-out.

Introduce multiSessionContainerSourceSet, a reusable source set, provider, and factory configured entirely through functional options (with*()), so a new special case is a new option rather than a wider signature or a new interface method. Aider's remote-sync identity (PathRewriter) and its canonical-path stored fallback, and Shelley's member-presence check, are all expressed as options.

Fold shelley_provider.go and aider_provider.go onto the base. Per-provider code drops ~45% each (shelley 532->289, aider 501->274); the 469-line base is a one-time cost the remaining container-family providers (zed, kiro, opencode, db-backed, copilot) can reuse.

refactor(parser): add generic SourceSet provider plumbing

The multi-session container base shipped with its own ProviderFactory and a delegating Provider that forwarded six methods to the source set. Every future reusable base (single-file sidecar, and the rest) would re-hand-roll that same factory + forwarding shell.

Extract it once: a SourceSet interface (the Provider source methods plus Parse, minus the Definition/Capabilities/config plumbing), a sourceSetProvider wrapper that supplies ProviderBase and applies the two normalizations every provider shares (raw-session-ID injection on FindSource, the request/config machine fallback on Parse), and a generic sourceSetFactory built from def + caps + a per-config SourceSet constructor.

multiSessionContainerSourceSet now implements SourceSet directly (gaining a Parse method); newMultiSessionProviderFactory becomes a thin adapter over newSourceSetFactory. ParseIncremental stays on the ProviderBase unsupported default until a base needs it.

refactor(parser): add single-file source-set base, fold reasonix

Add singleFileSourceSet, the second reusable SourceSet base, for providers whose physical source is one file that parses into exactly one session (no virtual member paths, no fan-out). Like multiSessionContainerSourceSet it is configured through functional options (withFile*()) and plugs into newSourceSetFactory. The sidecar/composite fingerprint variance across this family stays inside each provider's withFileFingerprint closure, so the base carries no sidecar knowledge until a shared helper is warranted.

Fold reasonix onto it as the validation provider: discovery, the .jsonl.meta sidecar changed-path mapping, the WatchSubdirs-aware changed-path resolution, the composite fingerprint, and the project-hint + fingerprint stamping in parse all become option closures. The provider file drops from 457 to ~280 lines; behavior is preserved (full parser and sync suites green).

refactor(parser): fold zed onto multi-session container base

Zed is a SQLite-DB-per-root thread container like Shelley. Replace its hand-rolled factory/provider/source set with multiSessionContainerSourceSet option closures, preserving the zed specifics: the container path is root/threads/threads.db, the member fingerprint mtime comes from ZedSQLiteSourceMtime, and the container fan-out stamps the DB's own content hash (computed in the parseContainer closure, not the request fingerprint, so missing-fingerprint parses still hash the rows). sqliteDBCompositeMtime stays put; it is shared with shelley and kiro.

refactor(parser): fold visual studio copilot onto container base

Visual Studio Copilot surfaces conversations from shared trace files, but unlike the SQLite containers it discovers one source per conversation (deduped across traces, newest wins) plus a bare source per unreadable trace. Two base generalizations support that: withSourceDiscovery lets a provider emit member-level matches at discovery time, and multiSessionMatch now carries a ProjectHint surfaced on the SourceRef.

The multi-session parse closures now receive the full ParseRequest instead of just the machine string, mirroring the single-file base. This lets vs_copilot honor req.Source.ProjectHint, which the engine sets to the DB-preserved project on single-session re-sync so a user's project override is not reverted. shelley, zed, and aider closures are updated to the new signature (they read req.Machine).

parseConversation becomes a free function and the test helpers call it directly. vs_copilot keeps its virtual-paths-always-strict changed-path classification and stamps the shared trace hash on every fanned-out conversation. Provider file drops 454->250; full parser and sync suites green.

refactor(parser): fold cowork onto single-file base

Cowork sources are single Claude-format transcripts, but one transcript can yield several sessions (main plus subagents) and a parse drives removals via excluded session IDs. Generalize singleFileSourceSet's parse contract to return ([]ParseResult, []string excluded) instead of one *ParseResult, and add withAlwaysCompleteResultSet so a parse that only excludes sessions still reports a complete (not skipped) result set. SourcesForChangedPath now derives allowMissing from jsonlMissingPathFallbackAllowed(req) rather than hardcoding true, which cowork (and vibe) require and reasonix is indifferent to.

reasonix's parse closure is updated to the new slice signature. Cowork's hand-rolled factory/provider/source set become option closures; parseSession becomes the free function parseCoworkSession; the metadata-to-transcript changed-path mapping, composite mtime, and project-hint-from-metadata are all preserved. Provider file drops 523->352; full parser and sync suites green.

refactor(parser): fold vibe onto single-file base

Vibe sessions are single messages.jsonl transcripts with a sibling meta.json. Fold onto singleFileSourceSet: discovery, the messages.jsonl/meta.json changed-path mapping (strict vs session-dir-name fallback under allowMissing), the composite fingerprint, and the fallback-ID exclusion become option closures. The single result plus exclusions and the skip-on-no-session behavior ride the base's multi-result parse contract without withAlwaysCompleteResultSet, since vibe still skips when no session is parsed.

parseSession and parseVibeResult become the free functions parseVibeSession and parseVibeResultFile; the test helpers call them and the provider's Discover directly instead of a concrete *vibeProvider. Provider file drops 507->300; full parser and sync suites green.

refactor(parser): make JSONLSourceSet a SourceSet, fold amp

Add a ParseFile option to JSONLSourceSetOptions and a Parse method to JSONLSourceSet, so the directory-of-files source set (and DirectoryJSONLSourceSet) implements the full SourceSet interface and rides the generic sourceSetFactory. Parse mirrors the single-file base: empty results with no exclusions is a clean no-session skip; req.Machine is resolved by sourceSetProvider. Amp is folded as the template: its hand-rolled factory, provider, five forwarding methods, and Parse collapse to newSourceSetFactory plus an ampParseFile closure; parseSession becomes the free function parseAmpSession. 139->63 lines.

refactor(parser): fold qwen provider onto SourceSet factory

Replace the hand-rolled qwenProvider struct, factory struct, var _
Provider assertion, forwarding methods, and Parse with the generic
newSourceSetFactory plus a ParseFile option on the JSONL source set.
The ParseFile closure passes req.Source.ProjectHint as the project
hint. Convert parseSession from a *qwenProvider method to the free
function parseQwenSession and update the test helper accordingly.

refactor(parser): fold workbuddy onto SourceSet convergence

Replace the hand-rolled workBuddyProvider struct, factory, and forwarding
methods with newSourceSetFactory plus a ParseFile option, matching the amp
provider. Convert parseSession from a method to the free function
parseWorkBuddySession.

Add an optional LookupIDValid predicate to JSONLSourceSetOptions so the
generic FindSource fallback accepts WorkBuddy composite subagent IDs
(<id>:subagent:<id>), which IsValidSessionID rejects. The option defaults to
IsValidSessionID, preserving behavior for all other source-set providers.

refactor(parser): fold deepseek_tui onto SourceSet convergence

refactor(parser): fold zencoder onto SourceSet convergence

refactor(parser): thread context through JSONLSourceSet ParseFile

ParseFile now receives the parse context so directory-of-files folds can do context-aware work such as git-root project resolution; the existing single-result closures ignore it. Also adds a RawSessionIDForLookup hook that normalizes a stored raw session ID before the FindSource discovery comparison, for providers whose stored IDs carry a suffix the discovered filename stem lacks.

refactor(parser): fold iflow onto SourceSet convergence

iFlow now rides the generic source-set factory via DirectoryJSONLSourceSet's ParseFile. Its multi-result parse, relationship inference, and git-aware project resolution move into the ParseFile closure using the threaded context, and the subagent-base-ID normalization its custom FindSource performed is carried by the RawSessionIDForLookup hook.

refactor(parser): fold kimi onto SourceSet convergence

Kimi rides the generic source-set factory via ParseFile. Its colon-joined raw session IDs cannot be matched by the filename-stem discovery scan, so the wire.jsonl path reconstruction (including the agents/ subagent layout) moves to a new RawSessionIDSourceFiles hook that FindSource consults before the scan.

refactor(parser): fold kiro_ide onto SourceSet convergence

Kiro IDE rides the generic source-set factory via ParseFile. Its two on-disk layouts (old <ws-hash>:<file-hash> .chat IDs and new UUID .json files under workspace-sessions/) are resolved by reconstructing candidate paths in the RawSessionIDSourceFiles hook, since the colon-joined old IDs cannot be matched by the filename-stem discovery scan.

refactor(parser): fold qwenpaw onto SourceSet convergence

QwenPaw rides the generic source-set factory via ParseFile. Its colon-joined raw IDs are reconstructed through RawSessionIDSourceFiles; a DB-recorded file_path outside the configured roots is honored via the new StoredPathFallbackRoot hook, which synthesizes the implicit <root>/<workspace>/sessions/ layout; and the wholesale-rewrite outcome is carried by the ForceReplace option.

refactor(parser): convert JSONLSourceSet to functional options

JSONLSourceSet and DirectoryJSONLSourceSet are now built with with*() option closures plus default bundles (withContentHashing, withSymlinkFollowing) instead of a struct literal, matching the multiSessionContainer and singleFile bases. Each source only states what differs from the zero-value defaults. Constructors are lowercased (newJSONLSourceSet / newDirectoryJSONLSourceSet) as they are package-internal. Behavior is unchanged; SiblingMetadataSourceSet keeps its struct-based constructor via the shared jsonlSourceSetFromOptions.

fix(parser): make aider discovery opt-in

Aider had no central session store, so the registry gave it DefaultDirs [""], which resolves to $HOME and drove an always-on bounded walk. For a passive viewer that is a poor default: background refreshes (usage reports, desktop launches) enumerate $HOME and trigger macOS privacy prompts for Documents/Downloads/Music/Photos. Drop the default so aider is discovered only when the user opts in via AIDER_DIR or aider_dirs; a configured broad root still gets the bounded, protected-folder-pruned walk. This aligns the provider-migration branch with the opt-in behavior shipped on main in #844, which this branch predates.

fix(ssh): resolve remote aider history without DefaultDirs

Making aider discovery opt-in emptied its DefaultDirs, but the SSH resolve
script only emitted the AIDER_DIR history-file snippet while iterating that
list, so an explicitly configured remote AIDER_DIR stopped resolving any
.aider.chat.history.md files for transfer. Handle aider independently of
DefaultDirs, emitting buildAiderResolveSnippet whenever its EnvVar is set while
still avoiding any default $HOME scan.

Also cover JSONLSourceSet.FindSource RawSessionIDForLookup normalization, which
runs before both the LookupIDValid gate and the SessionIDFromPath discovery
comparison.
Rebasing the provider stack onto current origin/main brought in new
upstream tests whose expectations met the migration's refactored APIs:

- parseDiffSourceReliableForRaced gated the raced reclassification on
  def.DiscoverFunc, which the provider migration leaves nil for every
  agent, so no file-based source was ever reliable and genuine live-write
  skew was reported as a plain change. Make it an Engine method that
  reuses parseDiffAgentDiscoverable, the same provider-aware on-disk
  source predicate resolveParseDiffAgents uses.

- A Kiro SQLite store is discovered as one container source but fans out
  into one session per row, so collectAndBatch counted it as a single
  file. Add the extra sessions it produces to TotalSessions, restoring the
  per-session tally the legacy syncKiroSQLite phase reported.

- Codex no longer advertises incremental append, so an appended transcript
  is re-parsed in full and stores the raw file size and hash. Update the
  former consumed-prefix incremental assertion to that full-parse reality;
  the parsed-snapshot vs partial-tail distinction is still enforced at
  parse-diff time via CodexTranscriptConsumedSize.
Every agent is now provider-authoritative, so no AgentDef sets DiscoverFunc
or FindSourceFunc. The fields, the always-false branches that consulted them
(full-sync discovery, parse-diff discovery and discoverability, SSH resolve,
source lookup, token-use disk probing), and the now-empty parseDiffDatabaseSources
were left behind by the staged migration. Provider discovery
(discoverProviderSources / parseDiffProviderSources) and provider lookup
(findProviderSourceFile) already own every one of these paths.

Also drop the migration-era scaffolding tests that only asserted this legacy
code no longer exists: the provider_anti_shim_test.go suite (legacy-hook nil
checks plus the AST scan for exported Discover/Find/Parse/Process facade
functions) and the per-agent require.Nil(def.DiscoverFunc) registry assertions.
With the fields gone, their absence is enforced by the compiler.
Hermes FindSource aborted the whole lookup when a state.db query
errored, even though parseArchive deliberately falls back to the
transcript parser when state.db is unreadable or schema-incompatible. A
valid transcript session sitting next to a corrupt or legacy state.db
could no longer be located for resync.

Treat a state.db lookup error as non-claiming for that root and continue
to the transcript lookup, matching the parser's documented fallback.
Guard provider.findRequests with require.Len before indexing so a
missing request fails as a clear length assertion rather than a panic.
Drive the parse-diff provider-authoritative contract from the registry
so it covers every current file-based provider-authoritative agent
instead of a hand-maintained subset. Add a Codex regression test pinning
PreferStoredSource to the stored archived duplicate, contrasted with the
canonicalize-to-live behavior it opts out of.
@mariusvniekerk mariusvniekerk force-pushed the provider-db-backed-family branch from 7d46117 to 225fe95 Compare June 25, 2026 17:29
@mariusvniekerk mariusvniekerk force-pushed the provider-explicit-registry branch from b842c9e to d28f042 Compare June 25, 2026 17:29
@roborev-ci

roborev-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

roborev: Combined Review (d28f042)

Reviewed provider migration outputs: one Medium correctness finding remains; no security findings.

Medium

  • internal/sync/engine.go:6693 - FindSourceFile passes archived StoredFilePath into provider lookup before raw-ID resolution for every provider. Some source sets prefer that hint even when RequireFreshSource is true, so stale paths can win. For Aider, a stored <history>#idx can point at a different run after earlier runs are inserted or removed, bypassing the prior AiderRawIDAt validation and causing SyncSingleSession or watching to parse the wrong run or fail instead of re-resolving by raw session ID.

    Suggested fix: Only pass or prefer stored paths for providers that validate them, or make provider lookup honor RequireFreshSource by verifying the stored source still exists and matches the requested raw ID. For Aider, fall back to raw-ID lookup unless the stored virtual path recomputes to that raw ID.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 10m1s), codex_security (codex/security, done, 5m13s) | Total: 15m22s

@mariusvniekerk

Copy link
Copy Markdown
Collaborator Author

Superseded by the family-grouped collapse of this stack into 10 green branches (#876#885), which preserves every commit while cutting the branch count for review. Closing in favor of that stack.

generated by a clanker

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant