fix(cua-driver/linux): stop reporting bare "✅ Clicked" for X11 synthetic clicks (#2022)#2025
fix(cua-driver/linux): stop reporting bare "✅ Clicked" for X11 synthetic clicks (#2022)#2025LaZzyMan wants to merge 1 commit into
Conversation
…tic clicks (trycua#2022) X11 pixel clicks are injected via XSendEvent, whose events carry a send_event flag that GTK menus/popups (under a pointer grab), SDL and Allegro deliberately ignore — the click is dispatched but never reaches the app, yet the driver reported a bare "✅ Clicked", so an agent believes it landed. Keyboard moved to XTEST for this exact reason (trycua#1805); the pointer path stays on XSendEvent by design (no-focus-steal contract). Until clicks can land focus-free (trycua#2022), expose the uncertainty on the X11 synthetic path: the click result now names the mechanism and points at the reliable AT-SPI element_index route. The Wayland virtual-pointer path and AT-SPI element clicks deliver for real and keep the plain success line. Adds unit tests for the message builder and records the pointer limitation in the input module doc + LINUX.md.
|
@LaZzyMan is attempting to deploy a commit to the Cua Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughLinux click guidance now notes that synthetic X11 pointer clicks can be filtered by some toolkits and points callers to element-indexed clicks. The Linux click tool also changes its coordinate success text for X11 synthetic delivery and adds tests for both message variants. ChangesLinux click guidance and X11 click result text
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…c clicks Ports the fix from upstream trycua/cua#2025 into the vendored driver. On X11, clicks are delivered via XSendEvent synthetic events, which many toolkits (GTK/SDL/Allegro) ignore because send_event is set. The driver still reported a flat success ("Clicked"), masking that nothing happened. Report the synthetic-delivery caveat honestly so the agent can fall back instead of assuming the click landed. (platform-linux crate is not built on macOS; verified by clean upstream apply and covered by upstream + release-workflow Linux CI.)
…coordinates (QwenLM#5896) * feat(cua-driver): vendor trycua/cua driver with 1000-normalized coordinate support Vendor libs/cua-driver from trycua/cua into packages/cua-driver as the basis for qwen-code's computer-use backend, adding an opt-in relative (1000x1000 normalized) coordinate mode for Qwen-VL clients. - coord_norm.rs: 0-1000 <-> pixel conversion, per-(pid,window_id) size cache, tools/list description rewrite (TDD, 27 tests) - ToolRegistry: normalized field + invoke input/output hooks - protocol.rs: system-instruction coordinate wording switched by mode - serve.rs: daemon list path description rewrite (input_schema aware) - main.rs: CUA_DRIVER_RS_COORDINATE_SPACE env seed Default coordinate_space=pixels => zero behavior change for existing pixel clients. Set CUA_DRIVER_RS_COORDINATE_SPACE=normalized_1000 to enable. Excludes rust/target build output. * feat(cua-driver): make normalized coordinate scale configurable Add CUA_DRIVER_RS_COORDINATE_SCALE (default 1000) so the normalization full-scale can absorb the Qwen 999-vs-1000 cookbook ambiguity without a recompile. norm_to_px/px_to_norm now take an explicit scale; denormalize_args reads the process-wide COORDINATE_SCALE seeded once at startup from env. * ci(cua-driver): add cross-platform release workflow for vendored driver Standalone GitHub Action that builds, signs, and releases the vendored cua-driver under packages/cua-driver. Adapted from upstream trycua/cua cd-rust-cua-driver.yml: - macOS: universal binary (lipo arm64+x86_64), codesigned + notarized into CuaDriver.app using qwen-code's existing secrets (MAC_CSC_LINK cert + App Store Connect API key notarization); Developer ID identity is auto-discovered from the imported cert. - Linux: x86_64 + arm64, built in debian:11 for a glibc 2.31 floor. - Windows: x86_64 + arm64, unsigned (no EV cert, matches upstream). - Release: softprops/action-gh-release on cua-driver-rs-v* tags or manual dispatch, prerelease. Triggered by tag push (cua-driver-rs-v*) or workflow_dispatch. * chore(cua-driver): rebrand vendored driver as qwen-cua-driver Rename the vendored trycua/cua driver so the fork installs and runs independently of any upstream trycua install: - binary cua-driver -> qwen-cua-driver - bundle CuaDriver.app -> QwenCuaDriver.app - bundle id com.trycua.driver -> com.qwencode.cua-driver Updates the cargo/uia manifests, Info.plist, bundle/proxy launch paths, permission/health-report wording, the install/build scripts, and the cross-platform release workflow. * feat(cua-driver): finish relative-coordinate mode — toggle, scale, zoom/move_cursor - CUA_DRIVER_RS_COORDINATE_SPACE is now a 1/0 toggle (via is_env_truthy); default off keeps pixel mode byte-identical to upstream. - Thread CUA_DRIVER_RS_COORDINATE_SCALE through every coordinate surface (was hardcoded 1000): input denormalization already used it; now the rewritten screenshot dims, the tool/param descriptions, and the agent instructions track the configured scale too. - Normalize zoom (window basis) and move_cursor (screen basis) inputs and rewrite their descriptions, alongside click/double_click/right_click/drag. - Fix zoom on downscaled (Retina) windows: apply the get_window_state resize ratio so the crop lands on the region the agent saw. Normalized mode only; pixel-mode zoom unchanged. All coordinate behavior stays gated on the normalized flag, so the default (pixels) path is unchanged from upstream. * chore(cua-driver): add upstream-sync script (git subtree unusable here) `git subtree split --prefix=libs/cua-driver` hangs on a commit deep in trycua/cua's history, so the subtree add/pull workflow isn't usable for the vendored driver (and a pull would re-split + re-hang every time). Add scripts/sync-from-upstream.sh instead: it git-diffs two upstream refs (never walks the full history, so it dodges the hang), reprefixes the libs/cua-driver delta to packages/cua-driver, and `git apply --reject`s it on top of our local changes — conflicts land as *.rej for manual fixup. Record the vendored version in .vendored-from and document the migration + sync method in the design doc. * chore(cua-driver): exclude vendored driver from qwen-code ESLint The vendored packages/cua-driver tree carries upstream JS (e.g. the test-harness Electron app) that doesn't follow qwen-code's lint rules and fails CI. It is not a workspace package (no package.json) and is not qwen-code TypeScript, so add it to eslint.config.js global ignores — alongside packages/desktop/** — the standard treatment for vendored code. * fix(cua-driver): let start_session revive an idle-reaped session Ports the fix from upstream trycua/cua#2035 into the vendored driver. When a session is reaped for idleness, a subsequent start_session with the same id failed instead of resuming it. Revive the ended session in place so the agent can continue rather than getting a hard error. * fix(cua-driver): retry daemon socket writes on EAGAIN Ports the fix from upstream trycua/cua#2036 into the vendored driver. A non-blocking daemon socket can return EAGAIN/EWOULDBLOCK mid-write when the peer's receive buffer is momentarily full. The driver treated that as fatal and dropped the connection. Add a bounded retry/poll loop (mirror of the read-side socket_io helper) so transient back-pressure no longer kills the session; only a real timeout or hard error fails the write. * fix(cua-driver/linux): stop reporting bare "Clicked" for X11 synthetic clicks Ports the fix from upstream trycua/cua#2025 into the vendored driver. On X11, clicks are delivered via XSendEvent synthetic events, which many toolkits (GTK/SDL/Allegro) ignore because send_event is set. The driver still reported a flat success ("Clicked"), masking that nothing happened. Report the synthetic-delivery caveat honestly so the agent can fall back instead of assuming the click landed. (platform-linux crate is not built on macOS; verified by clean upstream apply and covered by upstream + release-workflow Linux CI.) * fix(cua-driver/windows): list empty-/null-title top-level windows Ports the fix from upstream trycua/cua#2021 into the vendored driver. list_windows filtered out any top-level window whose title was empty or null, so legitimate targets (splash screens, some Electron/game windows, tool windows) were invisible to the agent and unclickable. Include empty-title windows, using class name / process as a fallback label. (platform-windows crate is not built on macOS; verified by clean upstream apply and covered by upstream + release-workflow Windows CI.) * chore(cua-driver): track cherry-picked upstream PRs; fix vendored-from The vendored copy is actually at cua-driver-rs-v0.6.7 (workspace version and all 0.6.7->0.6.8 delta files confirm it), but .vendored-from had drifted to 0.6.8 during an earlier sync-script trial whose code delta was not kept. Left as-is it would make a future sync diff 0.6.8->newer and silently skip the real 0.6.7->0.6.8 fixes. Correct it back to 0.6.7. Also record the four not-yet-merged upstream PRs we carry as cherry-picks (trycua/cua#2021/QwenLM#2025/QwenLM#2035/QwenLM#2036) in .vendored-patches.md, and have sync-from-upstream.sh point at it so the next sync reconciles them. * ci(cua-driver): satisfy repo yamllint on the release workflow The vendored-driver release workflow tripped 114 quoted-strings violations under the repo's .yamllint (quote-type: single, required). Single-quote all string scalars to match every other workflow in .github/workflows. While reformatting, the release-notes body also got its paragraph blank lines collapsed and still referenced the old CUA_DRIVER_RS_COORDINATE_SPACE= normalized_1000 value — restore the blank lines and update it to the current 0/1 toggle (default 0 = off; optional CUA_DRIVER_RS_COORDINATE_SCALE=1000). * chore(cua-driver): sync vendored driver to cua-driver-rs-v0.6.8 First real run of scripts/sync-from-upstream.sh: it 3-way-applied the upstream 0.6.7->0.6.8 delta onto our local fork. 10/12 files applied cleanly; the 2 rejects (install.ps1, _install-rust.sh) were already-applied baked-version bumps (0.6.6->0.6.7, our copies were already at 0.6.7), i.e. no real conflict. 0.6.8 brings: Wayland input path (platform-linux), linux health_report + overlay tweaks, a platform-macos build.rs step, and dependency bumps. Version moved to 0.6.8 across the workspace. Verified our work survived the sync untouched: the relative-coordinate shim (coord_norm/protocol) and all four cherry-picked PRs (socket_io/session + linux/windows) are intact — in particular the 0.6.8 edit to platform-linux tools/impl_.rs landed alongside our QwenLM#2025 change with no collision. macOS cargo check + 132 core tests green. (platform-linux/windows + the binary integration test build only on their own runners; upstream CI covers those.) * ci(cua-driver): add a dry_run gate to the release workflow Mirror the desktop-release / release dry-run pattern: a workflow_dispatch dry_run boolean input (default true). The cross-platform build + package jobs always run and upload their artifacts; the GitHub Release job now publishes only on a tag push or an explicit dry_run=false dispatch. Lets us rehearse the whole build/package pipeline (dry_run=true, notarize=false) and inspect the produced artifacts without cutting a release. A branch push (no tag, not a dispatch) likewise builds without releasing.
Summary
First, conservative step on #2022. On Linux/X11 the pixel
clicktool injectsButtonPress/ButtonReleaseviaXSendEvent, whose events carry asend_event = Trueflag that some toolkits deliberately ignore — GTKmenus/popups while they hold a pointer grab, SDL, and Allegro are the known
cases. The synthetic click is dispatched but never reaches the application's
event loop, yet the driver returned a bare
✅ Clicked at (x, y). Same silentfalse-success the project already treats as a real bug for the keyboard
(#1805, XSendEvent→XTEST) and for Wayland (#1921→#1994, #1982→#1992) — but here
the cause is the
send_eventflag on X11.This PR does not try to make the click land (that's #2022's real fix, which
needs the no-focus-steal machinery — MPX master + shield grabs — already used by
parallel_mouse_drag). It just stops over-claiming: on the X11 synthetic paththe
clickresult now names the delivery mechanism and points at the reliableAT-SPI
element_indexroute. The Wayland virtual-pointer path and the AT-SPIelement-click path deliver for real, so they keep the plain success line.
Why message-only, and why not gated per-app
XSendEventgives no per-click delivery confirmation, so — unlike the Wayland"no
$DISPLAY" case in #1994 — we can't know a given click was dropped. AWM_CLASS deny-list (mirroring
terminal.rs) was considered but rejected: thisrepo has no SDL/Allegro fixtures to seed accurate
WM_CLASSsubstrings from, andguessing them would be worse than honest. So the caveat is attached to the whole
X11 synthetic path. Happy to narrow it (e.g. gate on "target exposes no AT-SPI
tree", or a seeded deny-list once real
WM_CLASSsamples exist) if you'd prefer.Changes
tools/impl_::coord_click_result_text(x, y, count, x11_synthetic)— pure,unit-tested message builder. Plain
✅ Clicked …when the click was deliveredby a path that actually lands (Wayland / AT-SPI element); on the X11 synthetic
path it appends the uncertainty + the
element_indexremediation.ClickToolcoordinate path passes!wayland::is_wayland()asx11_synthetic.cfg(target_os = "linux")).input/mod.rsmodule comment now records the pointer-click limitation(counterpart to the existing keyboard/terminal note) and cross-refs cua-driver (Linux/X11): pixel
clickvia XSendEvent is silently ignored by toolkits that check thesend_eventflag (GTK menus/popups, SDL, Allegro) — reports ✅ Clicked but no input lands #2022;LINUX.mdcapability table no longer presents pixel click as unconditionallyreliable.
Verification
The message builder was compiled and checked at the workspace edition (2021,
clean). The crate is
cfg(target_os = "linux")and cannot be built on themacOS box this was authored on — the Linux compile of the call site + the unit
tests rely on this PR's CI. Opening as draft until CI is green.
Refs #2022. Related: #1805 (keyboard XTEST), #1994/#1992 (Wayland fail-loud).
Summary by CodeRabbit
Documentation
Bug Fixes