feat(parser): migrate bespoke providers (openhands, cursor, vibe, hermes, claude, cowork)#882
mariusvniekerk wants to merge 7 commits into
This change is part of the following stack:
Change managed by git-spice.
OpenHands stores each conversation as a directory with metadata and event files, so the provider needs a directory source facade rather than a JSONL file wrapper. This keeps the legacy discovery and dashed/undashed ID lookup behavior while making the composite snapshot fingerprint explicit at the provider boundary. The provider uses the existing OpenHands parser and snapshot helpers so freshness, shallow watch planning, changed-path classification, and normalized parse output stay aligned with the legacy sync path.

test(parser): opt openhands into provider shadow

OpenHands now has a concrete facade provider on this branch, so its migration mode should enter shadow comparison instead of remaining legacy-only and additive. Earlier provider opt-ins stay inherited and later provider branches own their modes.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare openhands shadow parity

OpenHands is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseOpenHandsSession. The test uses the directory snapshot source shape so the provider fingerprint path and planned data-version behavior stay visible while the branch migrates away from legacy dispatch.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesOpenHandsLegacyParser|TestOpenHandsProvider|TestParseOpenHands|TestDiscoverAndFindOpenHands|TestClassifyOnePath_OpenHands|TestProcessFileOpenHandsUsesSnapshotMtimeForRetryCache' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...
refactor(parser): fold openhands into provider

Move OpenHands discovery, source lookup, and parse ownership onto the concrete provider and delete the package-level DiscoverOpenHandsSessions, FindOpenHandsSourceFile, and ParseOpenHandsSession free functions. Discovery now walks conversation roots directly in the provider source set, raw-session-ID lookup folds the literal/dash-stripped/normalized matching into sessionDirForID, and parsing runs on a provider receiver method. The provider-neutral snapshot, session-dir predicate, and event parse helpers stay as shared free functions.

Make OpenHands provider-authoritative and remove its legacy sync dispatch: the classifyOnePath block, the processFile case arm, the OpenHands snapshot-mtime branch, and processOpenHands are gone. Sync now classifies and processes OpenHands through provider changed-path handling, which preserves the base_state.json/TASKS.json/events companion remap to the session directory and keeps the snapshot mtime driving the skip-retry cache via the provider fingerprint.

Drop the OpenHands AgentDef DiscoverFunc/FindSourceFunc hooks, remove the shadow baseline test, exempt the provider file from the shim scan, and add a guard asserting the legacy entrypoints stay deleted.
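As a rough illustration of the literal/dash-stripped/normalized lookup order mentioned above, here is a minimal sketch. The names `idCandidates` and `matchSessionDir` are hypothetical and are not the provider's `sessionDirForID`; treating "normalized" as "lowercased and dash-stripped" is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// idCandidates lists the directory names to try for a raw session ID, in
// priority order: the literal ID, the dash-stripped ID, then a normalized
// form. ASSUMPTION: "normalized" means lowercased and dash-stripped here.
func idCandidates(rawID string) []string {
	seen := map[string]bool{}
	var out []string
	add := func(s string) {
		if s != "" && !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	add(rawID)
	stripped := strings.ReplaceAll(rawID, "-", "")
	add(stripped)
	add(strings.ToLower(stripped))
	return out
}

// matchSessionDir returns the first existing directory name that matches one
// of the candidates for rawID. Hypothetical helper for illustration only.
func matchSessionDir(rawID string, dirNames []string) (string, bool) {
	present := make(map[string]bool, len(dirNames))
	for _, d := range dirNames {
		present[d] = true
	}
	for _, c := range idCandidates(rawID) {
		if present[c] {
			return c, true
		}
	}
	return "", false
}

func main() {
	dirs := []string{"0a1b2c3d4e5f", "unrelated"}
	name, ok := matchSessionDir("0a1b-2c3d-4e5f", dirs)
	fmt.Println(name, ok)
}
```

Trying the literal form first means an exact directory match is never shadowed by a looser one.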
Cursor transcript sources have two legacy layouts and select .jsonl over .txt when both exist for a session. Moving Cursor behind a concrete provider keeps that selection policy explicit at the provider boundary instead of relying on the legacy parser adapter.

The provider preserves recursive project discovery, raw/full ID lookup, stale .txt path promotion, changed-path classification, content-hash fingerprinting, and parser output normalization while using the same Cursor discovery and parsing helpers as the previous sync path.

fix(parser): preserve cursor project-scoped source selection

Cursor session IDs are only unique within an encoded project directory, but the provider was resolving stored and changed paths through a root-wide lookup. That could silently select the same transcript stem from a different project and drop valid sources during discovery. Resolve Cursor source promotion inside the project derived from the incoming path, add duplicate-stem coverage, and mark model output unsupported until the parser actually fills message models. This lets the Cursor branch enter shadow comparison as a real migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(CursorProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare cursor shadow parity

Cursor is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseCursorSession. The test uses duplicate transcript stems in different encoded project directories to lock in the current parser ID behavior while proving provider source observation stays project-scoped.
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCursorLegacyParser|TestCursorProvider|TestParseCursor|TestCursorSessionID' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/... test(sync): assert cursor provider hash parity Roborev job 2709 caught that the Cursor shadow parity fixture normalized the legacy session hash before proving the provider fingerprint matched the legacy parser hash. That left the test unable to detect a provider fingerprint regression that propagated into parsed output. Assert hash parity before normalizing the legacy session for the full struct comparison, keeping the existing duplicate-stem fixture focused on provider/legacy equivalence. Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesCursorLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check refactor(parser): fold cursor into provider Move Cursor source discovery, lookup, and parse ownership onto the concrete cursorProvider and remove the package-level DiscoverCursorSessions, FindCursorSourceFile, and ParseCursorSession free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverTranscriptPaths, cursorAddSeen, cursorFindSourceFile) on the cursor source set, and parseSession is a receiver method. Make Cursor provider-authoritative and drop its legacy sync dispatch: the classifyOnePath transcript block, the processFile case arm, the processCursor method, and its now-orphaned validateCursorContainment and findContainingDir helpers. Source classification, containment, .txt/.jsonl precedence, and project-hint decoding are all reproduced through the provider's changed-path and discovery paths, so runtime behavior is preserved. 
ParseCursorTranscriptRelPath stays a shared provider-neutral path validator used by both the engine's project enrichment and the provider. Replace the shadow-baseline test with provider API coverage plus a guard asserting the legacy entrypoints stay gone, and remove cursor from the pending-shim list.

fix(parser): cap cursor provider fingerprinting

Cursor parsing already rejects transcripts over 10 MiB, but the migrated provider fingerprint path still hashed the full source before parse. That made oversized files pay an unbounded read cost in the provider freshness path even though parse would never accept them.

Keep normal-size content hashing intact and return only metadata for oversized Cursor transcripts so parse remains the sole place that reads up to the guarded cap.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCursorProvider' -count=1; go vet ./...; git diff --check
Vibe stores transcript content in messages.jsonl while canonical session identity, title, timestamps, model, and usage can live in a sibling meta.json. Moving it behind a concrete provider keeps that companion relationship explicit at the provider boundary.

The provider preserves recursive session discovery, symlinked session directories, raw and full ID lookup through meta.json, meta-sidecar changed-path classification, effective size and mtime freshness, transcript hashing, fallback-ID exclusion, and parser output normalization through the existing Vibe parser wrapper.

fix(parser): classify removed vibe transcripts

Vibe source events need to keep working after the primary messages.jsonl has already disappeared. Routing deletion and rename-style events through the existing file check meant the watcher could ignore the exact event that should refresh or remove the stored session. Synthesize source refs only for missing-path removal semantics, keep ordinary lookups existence-checked, and pin the intentionally shallow session directory layout in provider tests. This lets the Vibe provider enter shadow comparison as a real migration step.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(VibeProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare vibe shadow parity

Vibe is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseVibeSessionWrapper. The test includes meta.json canonical ID promotion, provider-adjusted fingerprint metadata, usage events, and excluded fallback IDs so reviewers can see the migration preserves the composite source behavior.
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesVibeLegacyParser|TestVibeProvider|TestParseVibe|TestClassifyOnePath_Vibe|TestSyncVibe|TestSourceMtimeVibe|TestProcessVibe' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/... test(sync): cover vibe provider usage parity Roborev job 2711 caught that the Vibe shadow parity fixture compared empty usage slices, so it could not detect regressions in aggregate usage emission. Seed the fixture with real Vibe metadata fields for active model and nonzero stats, then assert both legacy and provider paths emit usage before comparing them. Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesVibeLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check refactor(parser): fold vibe into provider Move Vibe source discovery, lookup, and parse ownership onto the concrete vibeProvider and delete the package-level DiscoverVibeSessions, FindVibeSourceFile, and ParseVibeSessionWrapper free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverSessionPaths, findSourceFile) on the vibe source set, the isVibeMessagesFile guard moves to the provider file, and the messages.jsonl parser becomes the provider parseVibeResult/parseSession methods. Make Vibe provider-authoritative and drop its legacy sync dispatch: the classifyContainerPath classifyVibePath call and method, the processFile case arm, the processVibe method, and its now-orphaned isSessionBlocked and isSessionTrashed helpers. vibeEffectiveInfo stays as a shared composite-mtime helper used by the skip-cache and fingerprint paths. 
Because a provider has no database handle, the engine reproduces Vibe's DB-aware, file-path-scoped bookkeeping in applyProviderFilePathPolicies for single-session-per-file providers: stale stored IDs at the same source path are excluded, and a freshly parsed row is suppressed when the user already removed (trashed or deleted) the session occupying that path, so a canonical ID flipping between the meta.json session_id and the directory-name fallback no longer resurrects a hidden session. This is a no-op for stable-ID providers and skipped for multi-session sources. Drop the Vibe AgentDef DiscoverFunc/FindSourceFunc hooks, remove it from the pending shim scan list, replace the shadow-baseline test with provider API coverage plus a guard that the legacy entrypoints stay gone, and route the package and engine tests through the provider methods. The obsolete classifyOnePath Vibe test is removed; the provider's SourcesForChangedPath coverage replaces it.
Hermes can represent a configured root as either individual transcript files or as a state.db archive that fans out into multiple sessions. Moving it behind a concrete provider makes that source choice explicit instead of leaving archive behavior inside the legacy adapter path.

The provider preserves transcript discovery and lookup while treating state.db as a multi-session, force-replace source. Its fingerprint covers the archive database plus sibling transcripts so transcript-quality changes can refresh the archive source that ParseHermesArchive reads.

fix(parser): preserve hermes archive event coverage

Hermes archive discovery can normalize a configured sessions directory or direct state.db path into a sibling archive source, but the watch plan and changed-path classifier still assumed the configured root was the only event root. That left state.db updates and removed primary files invisible to provider-path sync. Normalize archive watch roots, map delete and rename-style events syntactically when primary files are gone, and cover archive-parent, sessions-directory, and direct-state roots. This lets Hermes enter shadow comparison as an actual migration branch.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(HermesProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): watch hermes archive roots syntactically

Hermes archive configs can point at the archive parent, its sessions directory, or the state.db file before the sibling archive components have been created. Watch planning needs to treat those shapes as archive roots from their paths, not from startup-time existence checks, otherwise late-created metadata or transcripts are invisible until a full sync. The transcript watch root is now retained for archive-shaped roots even when sessions/ is not present yet, while ordinary transcript-only roots keep their recursive file watch.
Validation: go test -tags "fts5" ./internal/parser -run 'TestHermesProvider' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check fix(parser): feed hermes archive roots to runtime watcher Hermes provider watch planning now knows how to follow archive-shaped roots, but the actual serve-time watcher still reads registry watch resolvers. Without a matching Hermes resolver there, the default .hermes/sessions config can miss sibling state.db creation or updates in live sync. Expose Hermes shallow archive-parent watch roots through the registry while keeping transcript roots recursive, and add shadow parity coverage so this branch remains a migration rather than an additive provider implementation. Validation: go test -tags "fts5" ./cmd/agentsview ./internal/parser ./internal/sync -run 'TestCollectWatchRootsHermesSessionsWatchesStateDBParent|TestHermesProvider|TestParseHermes|TestProviderMigrationModes|TestObserveProviderSourceMatchesHermesLegacyParser' -count=1; go test -tags "fts5" ./cmd/agentsview ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./cmd/agentsview/... ./internal/parser/... ./internal/sync/...; git diff --check fix(sync): classify hermes archive watcher events Roborev jobs 2715 and 2716 caught that Hermes archive watch roots were subscribed but the legacy SyncPaths classifier still ignored sibling state.db events. That meant live sync could wait for a periodic full sync even though the watcher saw the change. Map configured Hermes archive roots, state.db events, and direct archive transcript events back to the state.db source that processHermes already parses, while preserving transcript-only root classification for standalone Hermes session files. 
Validation: go test -tags "fts5" ./internal/sync -run TestSyncPathsHermesStateDBEventRefreshesArchive -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Test(HermesProvider|ObserveProviderSourceMatchesHermesLegacyParser|SyncPathsHermesStateDBEventRefreshesArchive)' -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): include hermes transcripts in archive skips

Roborev job 2803 caught that Hermes transcript watcher events could still be suppressed by state.db-only skip metadata after being routed to the archive source. In mixed state-db/transcript archives, state.db can be unchanged while a sibling transcript is new or updated. Use archive-effective size and mtime for state.db skip checks by folding direct transcript files from the sibling sessions directory into the snapshot, and add a regression where a transcript event refreshes an already-indexed archive.

Validation: go test -tags "fts5" ./internal/sync -run 'TestSyncPathsHermes(ArchiveTranscriptEventRefreshesArchive|StateDBEventRefreshesArchive)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -run 'Test(HermesProvider|ObserveProviderSourceMatchesHermesLegacyParser|SyncPathsHermes)' -count=1; go fmt ./...; go vet ./...; git diff --check

fix(sync): use aggregate hermes archive fingerprints

Hermes archive freshness needs the state.db sync path to compare the same aggregate fingerprint it persists. Discovering through the public Hermes session lister reselected state.db and missed sibling transcripts, so state.db events could avoid real skip-cache parity.

Enumerate direct transcript files for the archive snapshot and stamp archive parse results with the aggregate state.db fingerprint before writing.
This keeps unchanged archive syncs comparable while still refreshing when sibling transcripts change.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync; go vet ./...; make nilaway

fix(sync): apply hermes archive fingerprints consistently

Hermes archive refresh paths need to compare and persist the same aggregate fingerprint for state.db plus sibling transcripts. Otherwise cached parse skips and single-session refreshes can fall back to raw state.db metadata and miss transcript-only archive changes. Use the aggregate archive file info before generic skip-cache checks and share the archive parse-and-stamp helper between full archive processing and single-session refreshes. The regression coverage now persists the metadata, checks unchanged archive skips, and covers transcript discovery/removal behavior.

Validation: go test -tags "fts5" ./internal/sync -run 'TestHermesArchive|TestProcessFileHermes|TestProcessHermesArchive|TestSyncSingleHermesArchive' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync; go vet ./...; make nilaway

refactor(parser): fold hermes into provider

Move Hermes source discovery, lookup, and parse ownership onto the concrete hermesProvider and delete the package-level DiscoverHermesSessions, FindHermesSourceFile, ParseHermesArchive, and ParseHermesSession free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverHermesSessions, findHermesSourceFile); parse, archive parse, the state-db reader, and the transcript-archive fallback become hermesProvider methods (parseSession, parseArchive, parseStateDB, parseTranscriptArchive).

Reproduce Hermes archive behavior on the provider. The provider's archive Parse now stamps every state.db session with the state.db path plus the aggregate (state.db + direct transcripts) size and mtime, replacing the engine's stampHermesArchiveResults/hermesArchiveEffectiveInfo so a transcript-only change still refreshes the archive's stored freshness.
The new provider helpers hermesArchiveEffectiveFileInfo and hermesArchiveTranscriptFiles mirror the legacy engine aggregation (every .jsonl and session_*.json directly under the sessions directory, no dedup). The existing composite archive Fingerprint and archive watch/classify source-set methods already carried the rest.

Make Hermes provider-authoritative and drop its legacy sync dispatch: remove classifyHermesPath (and its hermesSyncArchivePaths, hermesSyncDirExists, hermesSyncTranscriptPath helpers), the processFile hermesArchiveEffectiveInfo stat hook and case arm, processHermes, parseHermesArchive, stampHermesArchiveResults, hermesArchiveEffectiveInfo, hermesArchiveTranscriptFiles, hermesArchiveSourcePaths, and the syncSingleHermesArchive special-case plus its method. Single-session resync of an archive now falls through to the generic provider path, which reparses the whole archive (ForceReplace) the same way a full sync does.

Drop the Hermes AgentDef DiscoverFunc/FindSourceFunc hooks (the provider-owned WatchRootsFunc/ShallowWatchRootsFunc stay), remove hermes_provider.go from the pending shim scan list, replace the shadow-baseline test with provider-API coverage plus a guard that the legacy entrypoints stay gone, and route the package and engine archive tests through provider methods and the provider-authoritative processFile/SyncPaths paths. Add internal/sync/provider_shadow_support_test.go defining the shared writeProviderShadowSourceFile test helper that the remaining vibe shadow test still references, which was orphaned by a predecessor commit.

test(sync): drop unused shadow source-file helper

The hermes fold left writeProviderShadowSourceFile in a dedicated test support file, but every shadow test writes its fixtures inline, so the helper has no callers and trips the unused linter. Remove the dead scaffolding.
Claude has both regular project transcripts and nested subagent transcripts, plus an existing append-only incremental parser. Moving it behind a concrete provider keeps those source shapes and optional incremental capability explicit at the provider boundary.

The provider preserves recursive project discovery, symlinked project directories, standard and subagent raw-ID lookup, changed-path classification, content hashing, project-name normalization, excluded-session reporting, relationship inference, and incremental append parsing for linear JSONL growth.

fix(parser): preserve claude provider edge events

Claude provider sync must distinguish true append idleness from files that were truncated or replaced, and watcher classification must still identify deleted primary and subagent transcripts after the file is gone. Otherwise provider-path sync can retain stale messages or miss removals. Return full-parse status for truncated incremental inputs, add missing-path classification for valid Claude source shapes, and make raw subagent lookup follow symlinked project directories like discovery does. This branch now opts Claude into shadow comparison.

Validation: go test -tags "fts5" ./internal/parser -run 'Test(ClaudeProvider|FindClaudeSourceFile|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(sync): replace claude content after file rewrites

Claude incremental parsing is append-oriented, so any fallback caused by truncation or file replacement must replace persisted messages instead of flowing through the append-preserving write path. Otherwise stale higher ordinals or stale tool rows can survive a full parse fallback. The provider now marks truncated incremental inputs as force-replace, and the legacy engine path carries forceReplace when file identity changes or the file shrinks before falling back to a full parse.
Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check fix(sync): replace claude same-size rewrites A same-size rewrite can reach the full-parse fallback when the normal skip check did not skip the file, which means the content changed even though the byte count did not. That fallback must replace persisted rows, or stale higher ordinals and tool rows can survive the parse. The regression rewrites a Claude file in place to the same byte length with fewer logical messages and verifies the stale assistant row is deleted. Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesClaudeLegacyParser|TestClaudeProviderParseIncremental|TestIncrementalSync_Claude(FileReplaced|TruncatedFileReplacesStoredMessages|SameSizeFileReplaceUsesFullParse|SameSizeInPlaceRewriteClearsStaleRows|MidStreamSplitFallsBackToFullParse|AgentIDFallbackUpdatesStoredToolCall)' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check test(sync): compare claude shadow parity Claude is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseClaudeSessionWithExclusions. The fixture exercises the project-directory source shape and verifies session, message, usage, exclusion, and data-version planning parity while preserving provider-computed file hashes. 
Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1 test(sync): cover claude provider usage exclusions Roborev job 2721 caught that the Claude shadow parity fixture only compared a plain exchange, so it did not prove provider parity for per-message token usage or /usage-only session exclusions. Add assistant message usage metadata to the normal fixture and a separate /usage-only source discovered by the provider, then assert non-empty token metadata and excluded IDs against the legacy parser. Validation: go test -tags "fts5" ./internal/sync -run TestObserveProviderSourceMatchesClaudeLegacyParser -count=1; go fmt ./...; go vet ./...; git diff --check refactor(parser): fold claude into provider Move Claude source discovery, lookup, full parse, exclusion handling, and append-only incremental parse ownership onto the concrete claudeProvider and delete the package-level DiscoverClaudeProjects, FindClaudeSourceFile, ParseClaudeSessionFrom, and ParseClaudeSessionWithExclusions free functions. The discover and find-source bodies stay as provider-neutral helpers (ClaudeProjectSessionFiles, claudeFindSourceFile) and the parse bodies become claudeParseWithExclusions and claudeParseSessionFrom; the public ParseClaudeSession wrapper and the Cowork parser (which reuses the Claude transcript format) call the shared helper, so no provider file references a legacy Discover/Find/Parse entrypoint. Make Claude provider-authoritative and drop its legacy sync dispatch: the classifyOnePath Claude block, the processFile case arm, and the processClaude method. Source classification, project resolution, and exclusion handling are reproduced through the provider's changed-path and parse paths. The provider's SourcesForChangedPath also reproduces the legacy "classify despite a transient stat error" behavior so a changed path under a momentarily unreadable parent is not dropped. 
Wire the provider-authoritative engine path to preserve Claude's DB-aware single-file semantics, which a stateless provider cannot do alone:

- tryProviderIncrementalAppend drives the provider's ParseIncremental through the shared tryIncrementalJSONL bookkeeping (session lookup, data-version and inode/device identity guards, ordinal resume, cross-sync split detection, cumulative counters, and forceReplace fallback), so append-only syncs keep the stored file hash and append rows instead of recomputing and rewriting.
- providerSingleSessionFresh reproduces the shouldSkipFile gate so an unchanged, already-synced session is skipped instead of re-parsed every full sync and a single-session resync does not reapply a worktree project mapping to an unchanged file.
- stampProviderFileIdentity stamps inode/device on parsed results so the incremental path can later detect an atomic file replacement.
- processProviderFile honors a caller-supplied file.Project as the source ProjectHint when no explicit ProviderSource was given, so a SyncSingleSession does not revert a user's project override.

The engine's expandClaudeDuplicateCandidates and dedupeClaudeDiscoveredFiles stay as provider-neutral engine-level dedup plumbing; expansion now enumerates via ClaudeProjectSessionFiles. The duplicate-candidate expansion and session-ID dedup/precedence behavior is unchanged.

Because dropping the Claude DiscoverFunc would otherwise remove Claude from surfaces that gate on DiscoverFunc != nil, parse-diff (engine and CLI flag validation) and the SSH remote resolve script now also include file-based agents that have left legacy-only mode through the provider facade, restoring Claude (and the other already-folded agents) to those surfaces.
Drop the Claude AgentDef DiscoverFunc/FindSourceFunc hooks, set its provider migration mode to ProviderAuthoritative, remove claude_provider.go from the pending shim scan list, replace the shadow baseline test with provider-API coverage plus a guard asserting the four legacy entrypoints stay gone, and re-vehicle the generic shadow-mechanism caller tests onto the still-legacy Cowork agent since Claude no longer has a legacy process arm to observe in shadow.

refactor(parser): fold ParseClaudeSession onto the Claude provider

Delete the ParseClaudeSession free function and route its only production caller (the session upload handler) plus the test suite through the Claude provider's new ParseUploadedTranscript method, exposed via the ClaudeUploadParser interface. Uploads live outside any configured root, so the method parses the staged transcript directly under the caller-supplied project. That project stays authoritative rather than being overridden by the transcript's recorded cwd, matching the prior upload behavior and unlike the discovered-session Parse path. Unexport ClassifyClaudeSystemMessage to classifyClaudeSystemMessage; it is a Claude-internal classifier with no callers outside the package. Both removals clear the last provider-specific legacy parse/classify entrypoints this branch owned.

fix(sync): skip fresh claude before fingerprinting

The Claude provider migration preserved DB freshness skipping, but only after provider fingerprinting had already hashed the whole transcript. That lost the legacy cheap size/mtime/data-version gate for unchanged files.

Run the single-session freshness check before provider fingerprinting, and pass the computed fingerprint into incremental parsing so truncation detection can distinguish appended files from zero-byte rewrites. Zero-byte truncation now forces a full replacement parse instead of reporting no new data.

Validation: go test -tags "fts5" ./internal/parser -run 'TestClaudeProviderParseIncremental(Truncated|EmptyTruncation)NeedsFullParse' -count=1; go test -tags "fts5" ./internal/sync -run 'TestIncrementalSync_ClaudeAppend|TestProcessFileProviderAuthoritativeSkipsFreshClaudeBeforeFingerprint' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
Cowork stores Claude-shaped transcripts behind local-agent metadata, so the provider boundary needs to preserve that metadata-to-transcript relationship instead of treating the files as plain Claude JSONL sources. The concrete provider keeps shallow metadata watching, metadata change classification, subagent transcript discovery, raw/full ID lookup, composite mtime freshness, and hash propagation explicit for the sync path. fix(parser): cover cowork nested watch events Cowork metadata and transcripts live below org/workspace/session directories, so a shallow root watch could not deliver the paths the provider claimed to classify. Deleted metadata also lost the JSON needed to resolve the transcript, leaving stale provider state after remove or rename events. Make the watch plan recursive for Cowork source globs, recover deleted metadata from the local session directory shape, cover removed metadata/main/subagent paths, and move Cowork into shadow comparison as its branch-local migration step. Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check fix(parser): reject ambiguous cowork metadata removal Deleted Cowork metadata can only be recovered from the local session directory shape. If that directory contains multiple main transcripts, choosing the first filesystem match would attach the event to an arbitrary source and leave the real stale source unresolved. Refuse ambiguous deleted-metadata recovery unless exactly one main transcript is present, and cover the multi-transcript case. The regular single-transcript metadata removal path remains supported. 
Validation: go test -tags "fts5" ./internal/parser -run 'Test(CoworkProvider|ProviderMigrationModes)' -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

fix(parser): validate cowork deleted metadata candidates

Cowork metadata deletion recovery scans project directories after the metadata file is gone, so it cannot rely on the normal metadata-guided resolution path. It still needs the same transcript validity rules as normal discovery: regular files only, and symlink targets must stay inside the local session directory. Apply that validation before selecting or counting fallback candidates so symlink escapes are ignored and broken symlinks do not create false ambiguity.

Validation: go test -tags "fts5" ./internal/parser -run 'TestCoworkProvider|TestResolveCoworkSessionRejectsSymlinkEscape|TestClassifyCoworkPath|TestParseCowork' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

test(sync): compare cowork shadow parity

Cowork is a sidecar-backed Claude transcript provider, so add source-level migration coverage that compares provider observation with ParseCoworkSession. The fixture includes local-agent metadata plus the nested Claude transcript and verifies session, messages, usage, excluded IDs, and data-version planning parity while preserving provider-computed hashes.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesCoworkLegacyParser|TestCoworkProvider|TestParseCowork|TestClassifyCoworkPath' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...; git diff --check

refactor(parser): fold cowork into provider

Move Cowork source discovery, lookup, parse, and changed-path classification onto the concrete coworkProvider and delete the package-level DiscoverCoworkSessions, FindCoworkSourceFile, ParseCoworkSession, and ClassifyCoworkPath free functions. Discovery and find-source bodies now live as provider-owned helpers (discoverTranscriptPaths, coworkFindSourceFile), parseSession is a receiver method, and the metadata-to-transcript classifier moves onto SourcesForChangedPath as classifyCoworkPath so a sibling local_<uuid>.json change still resolves to the session's main transcript.

Make Cowork provider-authoritative and drop its legacy sync dispatch: the classifyOnePath cowork block, the processFile case arm, and the processCowork method. The sibling-meta composite freshness is preserved on the provider's Fingerprint, which already folds CoworkSessionMtime (the max of transcript and metadata mtime) into the freshness identity so a title-only rename triggers a reparse through processProviderFile. CoworkSessionMtime stays exported and the engine's skip-cache and SourceMtime watcher-fallback blocks keep calling it, mirroring how the commandcode fold retained commandCodeEffectiveInfo.

Replace the legacy free-function tests with provider API coverage plus a guard asserting the four entrypoints stay gone, drop the shadow-baseline comparison test, relocate the shared writeProviderShadowSourceFile helper into provider_shadow_support_test.go, and remove cowork_provider.go from the pending-shim scan list.

test(sync): drop obsolete cowork shadow-legacy tests

Folding cowork into its provider removes its legacy processFile arm, so the two shadow-compare tests that built fixtures via the deleted parser.ParseCoworkSession and asserted a legacy result coexisting with the shadow provider can no longer pass: a non-authoritative cowork file now falls through to the unknown-agent default.
The shadow machinery keeps coverage through provider_shadow_test.go and the cached-skip not-comparable case.

fix(sync): skip fresh cowork provider sources

Cowork moved behind the provider-authoritative sync path, but the migrated path still fingerprinted and parsed unchanged transcripts before checking the stored file metadata. That dropped the cheap DB freshness gate the legacy Cowork path relied on and made full syncs rewrite fresh sessions unnecessarily.

Restore that gate for Cowork before provider fingerprinting, using the same transcript size plus CoworkSessionMtime identity stored in the database. Per-file force parses still bypass the gate so metadata-driven refreshes and explicit reparses continue to reach the provider.

Validation: go test -tags "fts5" ./internal/sync -run 'TestProcessFileProviderAuthoritative(SkipsFreshCoworkBeforeFingerprint|ForceParseBypassesFreshCoworkSkip)|TestSyncAllSinceCoworkMetaUpdateTriggersResync|TestSyncPathsCoworkReplacesUpdatedMessageOrdinal' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go vet ./...; git diff --check
…scan list

The Claude provider migration routes parse-diff through the provider path, which regressed live-write skew detection: a concurrently rewritten source was classified as Changed instead of Raced, tripping --fail-on-change on a daemon write. Gate raced-source reliability on parseDiffAgentDiscoverable so provider-folded agents keep the raced reclassification.

Also clear pendingShimProviderFiles: every provider in this stack is folded on the branch that introduces it, so no provider file is a standing shim and the exempt list must be empty.
Migrates the providers with bespoke source shapes onto concrete facade providers: openhands (directory snapshot), cursor, vibe, hermes, claude (recursive transcripts with incremental append parsing and subagents), and cowork (reuses the Claude transcript format).
Because Claude's migration routes parse-diff through the provider path, this branch also folds in the parse-diff raced-source reclassification, gating reliability on parseDiffAgentDiscoverable, so a concurrent daemon write is classified Raced rather than Changed and cannot trip --fail-on-change.

Where to look: claude_provider.go (the largest change) and the parse-diff change in internal/sync/parsediff.go.