fix(cua-driver): let start_session revive an idle-reaped session (recover from "session ended") by LaZzyMan · Pull Request #2035 · trycua/cua

LaZzyMan · 2026-06-26T03:50:14Z

Problem

A caller-declared session reclaimed by the idle-TTL sweep is permanently dead, and start_session cannot bring it back — so a run that goes quiet for >5 min (the agent reasoning / writing code without calling an MCP tool) loses the driver entirely.

Seen in agent trajectories: a run does start_session{session:"X"}, works fine for ~80 tool calls (all carrying session:"X"), then idles past the TTL. After that every MCP tool — including start_session{session:"X"} itself — returns session ended; tool call ignored, with no way to recover.

Root cause

Three things compound (cua-driver-core/src/session.rs, cua-driver/src/serve.rs):

evict_idle marks an idle session ended by inserting it into ENDED_SESSIONS, which is never cleared.
touch_session no-ops on an ended id, so it can't re-arm one.
The daemon's resurrection guard (serve.rs "call" arm) rejects any call whose session is_session_ended — including start_session, which is itself a normal tool call routed through that same arm. So the one tool meant to (re)start a session is blocked before its invoke ever runs.

Net: once X is reaped, is_session_ended("X") is true forever and nothing flips it back.

Fix

session::revive_session — clears the ended tombstone + re-arms the idle-TTL clock. A superset of touch_session: equivalent for a live/new id; resumes an ended one.
start_session tool — calls revive_session instead of touch_session, so it can resume a reaped run.
serve.rs — exempts start_session from the resurrection guard in both dispatch arms so it actually runs (it is the recovery path); every other tool stays guarded so a stale in-flight call can't rebuild reaped state.
The guard message now tells the agent how to recover.

revive semantics: a later end_session re-fires the cleanup hooks (they key on ENDED_SESSIONS), which is correct — a fresh lifecycle for the id.

This is the core correctness fix and is independent of the TTL value: a short TTL just triggers reaping more often — with revival that's recoverable instead of fatal. Whether to also raise the 300 s default (CUA_DRIVER_RS_SESSION_IDLE_TTL_SECS) is a separate tuning call, intentionally left out.

Tests

New revive_resumes_an_ended_session proves an ended id is usable again after revive; the sweep-using session tests are serialized so a cross-test zero-TTL reap can't flake them.

Verification status

✅ cargo test -p cua-driver-core green (101 passed, incl. the new revive test).
✅ cargo check -p cua-driver green.
⚠️ No end-to-end daemon test: the cua-driver binary doesn't link on my machine (unrelated platform-macos Swift/Metal interop vs the local SDK), and I can't re-run the eval that surfaced this. The core logic is unit-tested and the serve.rs guard exemption is a one-line condition. A daemon-level smoke (reap a session → start_session → confirm subsequent calls work) would be a good addition on infra that can build the binary.

Summary by CodeRabbit

New Features
- You can now resume a previously ended or idle-reclaimed session by calling start_session again with the same session ID.
- Session recovery now works even after a session has been marked ended, instead of being permanently unavailable.
Bug Fixes
- Improved handling of ended sessions so non-resume tool calls return clearer guidance on how to continue.
- Fixed session lifecycle behavior so revived sessions can be used normally again and ended afterward.

A caller-declared session reclaimed by the idle-TTL sweep (the agent went quiet >5min while reasoning / writing code) was permanently dead: ENDED_SESSIONS is never cleared, touch_session no-ops on an ended id, and the daemon's resurrection guard rejected every subsequent call — including start_session itself, which routes through the same "call" arm. Once a run idled out, every MCP tool returned "session ended; tool call ignored" with no way back. - session::revive_session: clear the ended tombstone + re-arm the idle-TTL clock so an explicit start_session resumes the run (a superset of touch_session: no-op-equivalent for a live/new id). - start_session tool: call revive_session instead of touch_session. - serve.rs: exempt start_session from the resurrection guard in both dispatch arms so it actually runs (it is the recovery path); other tools stay guarded. - the guard message now tells the agent how to recover. revive is unit-tested (revive_resumes_an_ended_session); the sweep-using session tests are serialized so a cross-test zero-TTL reap can't re-end a freshly revived id. cargo test -p cua-driver-core green; cargo check -p cua-driver green.

chatgpt-codex-connector · 2026-06-26T03:50:18Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

vercel · 2026-06-26T03:50:20Z

@LaZzyMan is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-06-26T03:50:42Z

📝 Walkthrough

Walkthrough

The PR adds revive_session, updates start_session to call it, and changes daemon call dispatch so ended sessions can resume via start_session on Unix and Windows. Session tests now serialize TTL sweeps and cover revive/end lifecycle behavior.

Changes

Session revival flow

Layer / File(s)	Summary
Session revival state `libs/cua-driver/rust/crates/cua-driver-core/src/session.rs`	Adds `revive_session`, serializes TTL-sweep tests, and extends session lifecycle coverage for ending, reviving, sweeping, and ending again.
start_session revival `libs/cua-driver/rust/crates/cua-driver-core/src/session_tools.rs`	Updates the `start_session` description and switches invocation from `touch_session` to `revive_session`.
Ended-session dispatch `libs/cua-driver/rust/crates/cua-driver/src/serve.rs`	Allows `start_session` through the ended-session guard on Unix and Windows and changes the blocked-call response text for other tools.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant call_dispatch as "serve.rs call dispatch"
  participant start_tool as "StartSessionTool::invoke"
  participant revive_session as "session::revive_session"
  participant ENDED_SESSIONS
  participant SESSION_ACTIVITY

  Client->>call_dispatch: call(tool_name, effective_session=session_id)
  alt tool_name == "start_session"
    call_dispatch->>start_tool: invoke(session_id)
    start_tool->>revive_session: revive_session(session_id)
    revive_session->>ENDED_SESSIONS: remove tombstone
    revive_session->>SESSION_ACTIVITY: store fresh timestamp
    start_tool-->>call_dispatch: session resumes
    call_dispatch-->>Client: ok
  else tool_name != "start_session" and session ended
    call_dispatch-->>Client: ok + call start_session with same session
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nibbled the tombstone, hop and cheer,
revived the session, crystal clear.
The idle clock gave one more beat,
and start_session hopped to its seat.
With a tiny thump, the run’s alive!
✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: allowing start_session to revive an idle-reaped session.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/cua-driver/rust/crates/cua-driver-core/src/session.rs`:
- Around line 138-152: revive_session currently clears the session-layer
tombstone but leaves the overlay-layer tombstone in place, so revived sessions
still get blocked by the cursor renderer. Update revive_session in session.rs to
also clear the overlay state for the same session_id, ideally by introducing or
invoking a dedicated revive hook that removes the key from RenderState::ended in
platform-macos/src/overlay.rs. Make sure the fix aligns with the existing
end-session cleanup path so revived sessions can start rendering cursors again
without changing the trackable-session guard behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a03f76d9-2fd4-452c-a837-f2286fe6501b

📥 Commits

Reviewing files that changed from the base of the PR and between e08574c and ad8cbb4.

📒 Files selected for processing (3)

libs/cua-driver/rust/crates/cua-driver-core/src/session.rs
libs/cua-driver/rust/crates/cua-driver-core/src/session_tools.rs
libs/cua-driver/rust/crates/cua-driver/src/serve.rs

coderabbitai · 2026-06-26T03:54:43Z

+pub fn revive_session(session_id: &str) {
+    if !is_trackable(session_id) {
+        return;
+    }
+    // Clear the permanent tombstone so `is_session_ended` reads false again and
+    // the daemon's resurrection guard stops rejecting calls for this id. A
+    // later `end_session` re-fires the cleanup hooks (fire_session_end is keyed
+    // on this set), which is correct — it's a fresh lifecycle for the id.
+    ended_sessions().lock().unwrap().remove(session_id);
+    // Re-arm the idle-TTL clock so the revived session is tracked again.
+    activity()
+        .lock()
+        .unwrap()
+        .insert(session_id.to_owned(), Instant::now());
+}


🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # 1. Find revive_session and any hooks it fires. rg -nP -C3 '\bfn revive_session\b' libs/cua-driver/rust/crates # 2. Show how the overlay `ended` tombstone is mutated — is it ever removed from? fd -t f overlay.rs libs/cua-driver/rust/crates/platform-macos \ --exec rg -nP -C3 '\bended\b' # 3. Look for any revive/resume hook registration that could clear overlay state. rg -nP -C2 'revive|resume_session|register_session_(begin|revive)_hook' \ libs/cua-driver/rust/crates

Repository: trycua/cua

Length of output: 10236

revive_session clears the session tombstone but not the overlay-layer cursor tombstone — revived sessions will fail to render cursors.

The cursor overlay (platform-macos/src/overlay.rs) maintains a separate ended: HashSet<CursorKey> that permanently blocks command processing for tombstoned keys. As seen in lines 143–145 and 255–257 of overlay.rs, any Cmd or seed_start for a key in this set is silently dropped to prevent "ghost-cursor resurrection."

revive_session in session.rs (lines 138–152) correctly clears the session-layer tombstone (ended_sessions) but fails to clear the overlay-layer tombstone. Consequently:

The daemon accepts tool calls for the revived session.

The overlay guard continues to drop all cursor commands for that session_id.

The cursor never reappears, violating the "RESUMES the session" contract in session_tools.rs (line 79).

Add a call to clear the overlay tombstone (e.g., via a fire_session_revive hook that removes the session_id from RenderState::ended) to ensure the render layer respects the session lifecycle reset.

Relevant code sections

session.rs:

pub fn revive_session(session_id: &str) { if !is_trackable(session_id) { return; } // ... ended_sessions().lock().unwrap().remove(session_id); // Clears session tombstone // MISSING: Clear overlay tombstone here activity() .lock() .unwrap() .insert(session_id.to_owned(), Instant::now()); }

overlay.rs (guard logic):

// Line 143-145 if map.ended.contains(&key) { return None; // Drops command silently }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libs/cua-driver/rust/crates/cua-driver-core/src/session.rs` around lines 138 - 152, revive_session currently clears the session-layer tombstone but leaves the overlay-layer tombstone in place, so revived sessions still get blocked by the cursor renderer. Update revive_session in session.rs to also clear the overlay state for the same session_id, ideally by introducing or invoking a dedicated revive hook that removes the key from RenderState::ended in platform-macos/src/overlay.rs. Make sure the fix aligns with the existing end-session cleanup path so revived sessions can start rendering cursors again without changing the trackable-session guard behavior.

Ports the fix from upstream trycua/cua#2035 into the vendored driver. When a session is reaped for idleness, a subsequent start_session with the same id failed instead of resuming it. Revive the ended session in place so the agent can continue rather than getting a hard error.

…coordinates (QwenLM#5896) * feat(cua-driver): vendor trycua/cua driver with 1000-normalized coordinate support Vendor libs/cua-driver from trycua/cua into packages/cua-driver as the basis for qwen-code's computer-use backend, adding an opt-in relative (1000x1000 normalized) coordinate mode for Qwen-VL clients. - coord_norm.rs: 0-1000 <-> pixel conversion, per-(pid,window_id) size cache, tools/list description rewrite (TDD, 27 tests) - ToolRegistry: normalized field + invoke input/output hooks - protocol.rs: system-instruction coordinate wording switched by mode - serve.rs: daemon list path description rewrite (input_schema aware) - main.rs: CUA_DRIVER_RS_COORDINATE_SPACE env seed Default coordinate_space=pixels => zero behavior change for existing pixel clients. Set CUA_DRIVER_RS_COORDINATE_SPACE=normalized_1000 to enable. Excludes rust/target build output. * feat(cua-driver): make normalized coordinate scale configurable Add CUA_DRIVER_RS_COORDINATE_SCALE (default 1000) so the normalization full-scale can absorb the Qwen 999-vs-1000 cookbook ambiguity without a recompile. norm_to_px/px_to_norm now take an explicit scale; denormalize_args reads the process-wide COORDINATE_SCALE seeded once at startup from env. * ci(cua-driver): add cross-platform release workflow for vendored driver Standalone GitHub Action that builds, signs, and releases the vendored cua-driver under packages/cua-driver. Adapted from upstream trycua/cua cd-rust-cua-driver.yml: - macOS: universal binary (lipo arm64+x86_64), codesigned + notarized into CuaDriver.app using qwen-code's existing secrets (MAC_CSC_LINK cert + App Store Connect API key notarization); Developer ID identity is auto-discovered from the imported cert. - Linux: x86_64 + arm64, built in debian:11 for a glibc 2.31 floor. - Windows: x86_64 + arm64, unsigned (no EV cert, matches upstream). - Release: softprops/action-gh-release on cua-driver-rs-v* tags or manual dispatch, prerelease. Triggered by tag push (cua-driver-rs-v*) or workflow_dispatch. * chore(cua-driver): rebrand vendored driver as qwen-cua-driver Rename the vendored trycua/cua driver so the fork installs and runs independently of any upstream trycua install: - binary cua-driver -> qwen-cua-driver - bundle CuaDriver.app -> QwenCuaDriver.app - bundle id com.trycua.driver -> com.qwencode.cua-driver Updates the cargo/uia manifests, Info.plist, bundle/proxy launch paths, permission/health-report wording, the install/build scripts, and the cross-platform release workflow. * feat(cua-driver): finish relative-coordinate mode — toggle, scale, zoom/move_cursor - CUA_DRIVER_RS_COORDINATE_SPACE is now a 1/0 toggle (via is_env_truthy); default off keeps pixel mode byte-identical to upstream. - Thread CUA_DRIVER_RS_COORDINATE_SCALE through every coordinate surface (was hardcoded 1000): input denormalization already used it; now the rewritten screenshot dims, the tool/param descriptions, and the agent instructions track the configured scale too. - Normalize zoom (window basis) and move_cursor (screen basis) inputs and rewrite their descriptions, alongside click/double_click/right_click/drag. - Fix zoom on downscaled (Retina) windows: apply the get_window_state resize ratio so the crop lands on the region the agent saw. Normalized mode only; pixel-mode zoom unchanged. All coordinate behavior stays gated on the normalized flag, so the default (pixels) path is unchanged from upstream. * chore(cua-driver): add upstream-sync script (git subtree unusable here) `git subtree split --prefix=libs/cua-driver` hangs on a commit deep in trycua/cua's history, so the subtree add/pull workflow isn't usable for the vendored driver (and a pull would re-split + re-hang every time). Add scripts/sync-from-upstream.sh instead: it git-diffs two upstream refs (never walks the full history, so it dodges the hang), reprefixes the libs/cua-driver delta to packages/cua-driver, and `git apply --reject`s it on top of our local changes — conflicts land as *.rej for manual fixup. Record the vendored version in .vendored-from and document the migration + sync method in the design doc. * chore(cua-driver): exclude vendored driver from qwen-code ESLint The vendored packages/cua-driver tree carries upstream JS (e.g. the test-harness Electron app) that doesn't follow qwen-code's lint rules and fails CI. It is not a workspace package (no package.json) and is not qwen-code TypeScript, so add it to eslint.config.js global ignores — alongside packages/desktop/** — the standard treatment for vendored code. * fix(cua-driver): let start_session revive an idle-reaped session Ports the fix from upstream trycua/cua#2035 into the vendored driver. When a session is reaped for idleness, a subsequent start_session with the same id failed instead of resuming it. Revive the ended session in place so the agent can continue rather than getting a hard error. * fix(cua-driver): retry daemon socket writes on EAGAIN Ports the fix from upstream trycua/cua#2036 into the vendored driver. A non-blocking daemon socket can return EAGAIN/EWOULDBLOCK mid-write when the peer's receive buffer is momentarily full. The driver treated that as fatal and dropped the connection. Add a bounded retry/poll loop (mirror of the read-side socket_io helper) so transient back-pressure no longer kills the session; only a real timeout or hard error fails the write. * fix(cua-driver/linux): stop reporting bare "Clicked" for X11 synthetic clicks Ports the fix from upstream trycua/cua#2025 into the vendored driver. On X11, clicks are delivered via XSendEvent synthetic events, which many toolkits (GTK/SDL/Allegro) ignore because send_event is set. The driver still reported a flat success ("Clicked"), masking that nothing happened. Report the synthetic-delivery caveat honestly so the agent can fall back instead of assuming the click landed. (platform-linux crate is not built on macOS; verified by clean upstream apply and covered by upstream + release-workflow Linux CI.) * fix(cua-driver/windows): list empty-/null-title top-level windows Ports the fix from upstream trycua/cua#2021 into the vendored driver. list_windows filtered out any top-level window whose title was empty or null, so legitimate targets (splash screens, some Electron/game windows, tool windows) were invisible to the agent and unclickable. Include empty-title windows, using class name / process as a fallback label. (platform-windows crate is not built on macOS; verified by clean upstream apply and covered by upstream + release-workflow Windows CI.) * chore(cua-driver): track cherry-picked upstream PRs; fix vendored-from The vendored copy is actually at cua-driver-rs-v0.6.7 (workspace version and all 0.6.7->0.6.8 delta files confirm it), but .vendored-from had drifted to 0.6.8 during an earlier sync-script trial whose code delta was not kept. Left as-is it would make a future sync diff 0.6.8->newer and silently skip the real 0.6.7->0.6.8 fixes. Correct it back to 0.6.7. Also record the four not-yet-merged upstream PRs we carry as cherry-picks (trycua/cua#2021/QwenLM#2025/QwenLM#2035/QwenLM#2036) in .vendored-patches.md, and have sync-from-upstream.sh point at it so the next sync reconciles them. * ci(cua-driver): satisfy repo yamllint on the release workflow The vendored-driver release workflow tripped 114 quoted-strings violations under the repo's .yamllint (quote-type: single, required). Single-quote all string scalars to match every other workflow in .github/workflows. While reformatting, the release-notes body also got its paragraph blank lines collapsed and still referenced the old CUA_DRIVER_RS_COORDINATE_SPACE= normalized_1000 value — restore the blank lines and update it to the current 0/1 toggle (default 0 = off; optional CUA_DRIVER_RS_COORDINATE_SCALE=1000). * chore(cua-driver): sync vendored driver to cua-driver-rs-v0.6.8 First real run of scripts/sync-from-upstream.sh: it 3-way-applied the upstream 0.6.7->0.6.8 delta onto our local fork. 10/12 files applied cleanly; the 2 rejects (install.ps1, _install-rust.sh) were already-applied baked-version bumps (0.6.6->0.6.7, our copies were already at 0.6.7), i.e. no real conflict. 0.6.8 brings: Wayland input path (platform-linux), linux health_report + overlay tweaks, a platform-macos build.rs step, and dependency bumps. Version moved to 0.6.8 across the workspace. Verified our work survived the sync untouched: the relative-coordinate shim (coord_norm/protocol) and all four cherry-picked PRs (socket_io/session + linux/windows) are intact — in particular the 0.6.8 edit to platform-linux tools/impl_.rs landed alongside our QwenLM#2025 change with no collision. macOS cargo check + 132 core tests green. (platform-linux/windows + the binary integration test build only on their own runners; upstream CI covers those.) * ci(cua-driver): add a dry_run gate to the release workflow Mirror the desktop-release / release dry-run pattern: a workflow_dispatch dry_run boolean input (default true). The cross-platform build + package jobs always run and upload their artifacts; the GitHub Release job now publishes only on a tag push or an explicit dry_run=false dispatch. Lets us rehearse the whole build/package pipeline (dry_run=true, notarize=false) and inspect the produced artifacts without cutting a release. A branch push (no tag, not a dispatch) likewise builds without releasing.

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix(cua-driver): let start_session revive an idle-reaped session (recover from "session ended")#2035

fix(cua-driver): let start_session revive an idle-reaped session (recover from "session ended")#2035
LaZzyMan wants to merge 1 commit into
trycua:mainfrom
LaZzyMan:fix/session-revive-after-idle

LaZzyMan commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

chatgpt-codex-connector Bot commented Jun 26, 2026

Uh oh!

vercel Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

LaZzyMan commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Fix

Tests

Verification status

Summary by CodeRabbit

Uh oh!

chatgpt-codex-connector Bot commented Jun 26, 2026

Uh oh!

vercel Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LaZzyMan commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading