Skip to content

feat(parser): migrate Qwen + QwenPaw providers#879

Open
mariusvniekerk wants to merge 3 commits into
fam/cli-jsonlfrom
fam/qwen-family
Open

feat(parser): migrate Qwen + QwenPaw providers#879
mariusvniekerk wants to merge 3 commits into
fam/cli-jsonlfrom
fam/qwen-family

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Migrates Qwen and QwenPaw onto facade providers, preserving their nested session-directory layouts and raw-session lookup behavior.

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (56db670)

Medium findings remain for Qwen provider parity; no security issues were identified.

Medium

  • internal/parser/qwen_provider.go:23 - The migrated Qwen source set never enables content hashing. qwenParseFile only copies req.Fingerprint.Hash, but Fingerprint will not populate it, so Qwen syncs now clear or omit sessions.file_hash even though the removed legacy processQwen always computed it.

    • Fix: Add WithContentHashing() to newQwenSourceSet and cover the real provider.Fingerprint path in the provider test.
  • internal/parser/qwen_provider.go:23 - Qwen no longer supports explicit resync from an existing DB file_path outside the current configured roots. FindSourceFile can still return that stored path, but provider-authoritative processing then calls FindSource and JSONLSourceSet rejects paths not under Roots, causing SyncSingleSession to fail where legacy processQwen parsed the stored path directly.

    • Fix: Add a Qwen WithStoredPathFallbackRoot that derives the implicit projects root from <root>/<encoded-project>/chats/<session>.jsonl, mirroring the QwenPaw fallback behavior.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m52s), codex_security (codex/security, done, 2m4s) | Total: 8m5s

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (e2596f2)

Summary verdict: Medium issues remain in the Qwen provider migration; no security regressions were reported.

Medium

  • Location: internal/parser/qwen_provider.go:23
    Problem: The Qwen provider does not enable content hashing, so real syncs call provider.Fingerprint() with an empty Hash, and qwenParseFile persists an empty file_hash. The legacy processQwen always computed ComputeFileHash, so this can overwrite existing Qwen hashes with NULL.
    Fix: Add WithContentHashing() to newQwenSourceSet and cover the provider Fingerprint -> Parse path in the test.

  • Location: internal/parser/qwen_provider.go:23
    Problem: Qwen provider lookup only resolves stored paths under currently configured roots. SyncSingleSessionContext still prefers the DB file_path, but if that existing stored path is outside the current QWEN_PROJECTS_DIR, processProviderFile cannot resolve it and returns qwen provider source not found; legacy Qwen parsed that stored path directly.
    Fix: Add a Qwen WithStoredPathFallbackRoot equivalent that derives the implicit root from <root>/<encoded-project>/chats/<session>.jsonl, validates the shape, and stats the file.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m53s), codex_security (codex/security, done, 3m53s) | Total: 9m55s

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (9b703fc)

Summary verdict: one Medium issue remains; no security vulnerabilities were reported.

Medium

  • internal/parser/qwen_provider.go:23
    The Qwen provider cannot resolve a DB-stored file_path that exists outside the currently configured QWEN_PROJECTS_DIR. FindSourceFile can still return that stored path, but provider processing then fails because JSONLSourceSet only accepts paths under configured roots unless a stored-path fallback is registered.
    Fix: Add a Qwen WithStoredPathFallbackRoot implementation that derives the implicit projects root from <root>/<encoded-project>/chats/<session>.jsonl, validates the shape, and confirms the file exists.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 5m39s), codex_security (codex/security, done, 3m17s) | Total: 9m3s

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (8e3814e)

Medium finding only:

  • Medium internal/parser/qwen_provider.go:23
    The Qwen provider cannot resolve a DB-stored file_path that still exists but is outside the currently configured QWEN_PROJECTS_DIR. SyncSingleSession still prefers the stored path, but JSONLSourceSet.FindSource only accepts paths under configured roots unless a StoredPathFallbackRoot is provided, so migrated Qwen sessions can fail to re-sync after config/root changes.
    Fix: Add a Qwen stored-path fallback that derives the implicit projects root from <root>/<encoded-project>/chats/<session>.jsonl, validates the shape and file existence, and add a single-session regression test like the QwenPaw one.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 7m42s), codex_security (codex/security, done, 2m16s) | Total: 10m5s

Qwen uses a nested project/chats JSONL source shape, so it is a good next provider facade slice after the shallow and directory JSONL migrations. Moving it behind a concrete provider keeps discovery, lookup, fingerprinting, and parse output explicit without introducing another source framework.

Legacy discovery accepts any one-level .jsonl file under chats while raw-session lookup still validates the requested ID before matching filename-derived IDs. The provider keeps that asymmetry, symlinked project directory and file behavior, project hints, and existing parser normalization intact.

Validation: go fmt ./...; go test -tags "fts5" ./internal/parser -run TestQwenProvider -count=1; go test -tags "fts5" ./internal/parser -count=1; make test-short; go vet ./...; git diff --check

test(parser): opt qwen into provider shadow

Qwen now has a concrete facade provider on this branch, so its migration mode should enter shadow comparison instead of remaining an additive legacy-only provider.

Lower provider opt-ins stay inherited and later branches remain responsible for their own concrete providers.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare qwen shadow parity

Qwen is shadow-compared on this branch, so add the source-level migration proof that provider observation matches ParseQwenSession for its nested project/chats layout.

This keeps reviewers focused on behavioral parity while later branches continue migrating their own provider shapes.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesQwenLegacyParser|TestQwenProvider|TestParseQwen' -count=1; go fmt ./...; go vet ./...; git diff --check; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold qwen into provider

Qwen already had a concrete provider, but the branch still kept exported legacy parser/source functions and legacy sync dispatch. That left the migration additive instead of making the provider shape authoritative.\n\nMove Qwen parsing behind the provider method, remove the registry callbacks and sync processor/classifier, and replace the shadow comparison with provider API coverage plus a guard that the old entrypoints stay gone.\n\nValidation: go test -tags "fts5" ./internal/parser -run 'TestQwen|TestParseQwenSession' -count=1 -v; go test -tags "fts5" ./internal/sync -run 'TestEngine_ClassifyPathsQwenSession|TestProviderMigration|TestObserveProvider|TestSyncSingle.*Qwen|TestQwen' -count=1 -v; go test -tags "fts5" ./internal/parser ./internal/sync ./cmd/agentsview -count=1; go vet ./...; git diff --check

fix(parser): thread ctx through qwen source lookups
QwenPaw stores rewritten JSON session snapshots under workspace-scoped sessions directories, with a second one-level subdirectory namespace for console sessions. Moving it behind a concrete provider keeps those raw ID shapes explicit while continuing to use the shared file-source mechanics where they fit.

The provider preserves workspace project hints, root and console source lookup, hidden/deeper layout rejection, symlinked workspace discovery, content-hash fingerprinting, and force-replace parse semantics for rewritten session files.

fix(parser): prune qwenpaw source traversal

QwenPaw legacy discovery only walks valid workspace directories, the sessions directory, and one real non-hidden namespace below sessions. The provider migration reused the generic recursive JSONL walker, which preserved emitted source filtering but still allowed traversal and event classification through deeper or symlinked session namespaces.

Add a shared traversal predicate to the JSONL source helper so providers can keep discovery and changed-path classification aligned when their source layouts are narrower than an unbounded recursive scan.

test(parser): opt qwenpaw into provider shadow

QwenPaw now has a concrete facade provider on this branch, so its migration mode should enter shadow comparison instead of remaining legacy-only and additive.

Earlier provider opt-ins stay inherited and later branches own their modes.

Validation: go test -tags "fts5" ./internal/parser -run TestProviderMigrationModes -count=1; go test -tags "fts5" ./internal/parser -count=1; go vet ./...; git diff --check

test(sync): compare qwenpaw shadow parity

QwenPaw is shadow-compared on this branch, so add source-level migration coverage that compares provider observation with ParseQwenPawSession.

The test covers both root session files and console subdirectory session files so the path-derived workspace/session IDs and planned data-version behavior stay visible during review.

Validation: go test -tags "fts5" ./internal/parser ./internal/sync -run 'TestObserveProviderSourceMatchesQwenPawLegacyParser|TestQwenPawProvider|TestParseQwenPaw|TestSyncSingleSession_QwenPaw|TestWriteBatchQwenPaw' -count=1; go test -tags "fts5" ./internal/parser ./internal/sync -count=1; go fmt ./...; go vet ./...; git diff --check; ./custom-gcl run --config .golangci.nilaway.yml ./internal/parser/... ./internal/sync/...

refactor(parser): fold qwenpaw into provider

Move QwenPaw parse and source-lookup ownership onto the concrete
qwenPawProvider and delete the package-level legacy entrypoints
(DiscoverQwenPawSessions, FindQwenPawSourceFile, ParseQwenPawSession)
plus their unexported discovery/traversal helpers.

ParseQwenPawSession becomes the provider parseSession method, and the
rawID resolution from FindQwenPawSourceFile moves onto the provider as
sourceFileForRawID with its traversal guard. The provider's existing
JSONLSourceSet already reproduces workspace/sessions/console discovery,
symlink handling, and hidden-subdir pruning, so the legacy
DiscoverQwenPawSessions free function is dropped entirely.

Route QwenPaw sync classification and processing through the
provider-neutral runtime by removing its legacy engine dispatch: the
classifyOnePath workspace/sessions block, the processFile case arm, and
the processQwenPaw method. The provider's SourcesForChangedPath and
forceReplace-on-parse capability preserve changed-path remapping and the
full-rewrite write semantics the legacy path provided.

Make QwenPaw provider-authoritative in the migration manifest, drop its
AgentDef DiscoverFunc/FindSourceFunc hooks, remove it from the pending
shim scan list, and replace the shadow-baseline test with provider API
coverage plus a guard that the legacy symbols stay gone.

To preserve single-session resync parity, FindSource now resolves a
DB-stored file_path that points outside any configured QWENPAW_DIR by
synthesizing a source from the path's implicit
<root>/<workspace>/sessions/ layout, recovering the workspace as
ProjectHint so a reparse keeps the canonical qwenpaw:<workspace>:<stem>
ID instead of orphaning it under an empty workspace.

fix(parser): thread ctx through qwenpaw source lookups
Update qwen and qwenpaw provider call sites to the exported source-set framework API.
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (f1b3f3a)

Summary verdict: two medium regressions need fixing; no high or critical findings were reported.

Medium

  • internal/parser/qwen_provider.go:23 - The migrated Qwen provider does not enable WithContentHashing(), so qwenParseFile receives an empty fingerprint hash and stops populating Session.File.Hash. The old processQwen path computed ComputeFileHash and persisted file_hash; this migration will clear existing Qwen file_hash values on the next sync.

    Fix: Add WithContentHashing() to newQwenSourceSet and cover provider.Fingerprint/parse hash propagation in Qwen provider tests.

  • internal/parser/qwen_provider.go:23 - Qwen single-session resyncs can no longer use an existing DB file_path that is outside the currently configured Qwen roots. FindSourceFile still returns the stored path when it exists, but the authoritative provider then re-resolves it through JSONLSourceSet, which only accepts paths under configured roots; the legacy parser processed the stored path directly.

    Fix: Add a Qwen WithStoredPathFallbackRoot implementation that derives the implicit projects root from <root>/<encoded-project>/chats/<session>.jsonl, mirroring the QwenPaw fallback behavior.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 5m38s), codex_security (codex/security, done, 3m43s) | Total: 9m30s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant