Skip to content

fix(cua-driver/linux): stop reporting bare "✅ Clicked" for X11 synthetic clicks (#2022)#2025

Open
LaZzyMan wants to merge 1 commit into
trycua:mainfrom
LaZzyMan:lazzy/linux-x11-synthetic-click-honesty
Open

fix(cua-driver/linux): stop reporting bare "✅ Clicked" for X11 synthetic clicks (#2022)#2025
LaZzyMan wants to merge 1 commit into
trycua:mainfrom
LaZzyMan:lazzy/linux-x11-synthetic-click-honesty

Conversation

@LaZzyMan

@LaZzyMan LaZzyMan commented Jun 25, 2026

Copy link
Copy Markdown

Summary

First, conservative step on #2022. On Linux/X11 the pixel click tool injects
ButtonPress/ButtonRelease via XSendEvent, whose events carry a
send_event = True flag that some toolkits deliberately ignore — GTK
menus/popups while they hold a pointer grab, SDL, and Allegro are the known
cases. The synthetic click is dispatched but never reaches the application's
event loop, yet the driver returned a bare ✅ Clicked at (x, y). Same silent
false-success the project already treats as a real bug for the keyboard
(#1805, XSendEvent→XTEST) and for Wayland (#1921#1994, #1982#1992) — but here
the cause is the send_event flag on X11.

This PR does not try to make the click land (that's #2022's real fix, which
needs the no-focus-steal machinery — MPX master + shield grabs — already used by
parallel_mouse_drag). It just stops over-claiming: on the X11 synthetic path
the click result now names the delivery mechanism and points at the reliable
AT-SPI element_index route. The Wayland virtual-pointer path and the AT-SPI
element-click path deliver for real, so they keep the plain success line.

Why message-only, and why not gated per-app

XSendEvent gives no per-click delivery confirmation, so — unlike the Wayland
"no $DISPLAY" case in #1994 — we can't know a given click was dropped. A
WM_CLASS deny-list (mirroring terminal.rs) was considered but rejected: this
repo has no SDL/Allegro fixtures to seed accurate WM_CLASS substrings from, and
guessing them would be worse than honest. So the caveat is attached to the whole
X11 synthetic path. Happy to narrow it (e.g. gate on "target exposes no AT-SPI
tree", or a seeded deny-list once real WM_CLASS samples exist) if you'd prefer.

Changes

Verification

The message builder was compiled and checked at the workspace edition (2021,
clean). The crate is cfg(target_os = "linux") and cannot be built on the
macOS box this was authored on
— the Linux compile of the call site + the unit
tests rely on this PR's CI. Opening as draft until CI is green.

Refs #2022. Related: #1805 (keyboard XTEST), #1994/#1992 (Wayland fail-loud).

Summary by CodeRabbit

  • Documentation

    • Clarified Linux click behavior for background input, including cases where synthetic clicks may be ignored by some apps.
    • Added guidance to use element-based clicking when coordinate clicks are unreliable.
  • Bug Fixes

    • Updated click status messages to better reflect when a click may not reach the target.
    • Preserved the standard success message for normal click paths.

…tic clicks (trycua#2022)

X11 pixel clicks are injected via XSendEvent, whose events carry a
send_event flag that GTK menus/popups (under a pointer grab), SDL and
Allegro deliberately ignore — the click is dispatched but never reaches
the app, yet the driver reported a bare "✅ Clicked", so an agent
believes it landed. Keyboard moved to XTEST for this exact reason
(trycua#1805); the pointer path stays on XSendEvent by design (no-focus-steal
contract).

Until clicks can land focus-free (trycua#2022), expose the uncertainty on the
X11 synthetic path: the click result now names the mechanism and points
at the reliable AT-SPI element_index route. The Wayland virtual-pointer
path and AT-SPI element clicks deliver for real and keep the plain
success line. Adds unit tests for the message builder and records the
pointer limitation in the input module doc + LINUX.md.
@vercel

vercel Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@LaZzyMan is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 322e8d2d-441f-4ec4-9920-e7af2f723f2f

📥 Commits

Reviewing files that changed from the base of the PR and between 9e84c1d and 4c90b95.

📒 Files selected for processing (3)
  • libs/cua-driver/rust/Skills/cua-driver/LINUX.md
  • libs/cua-driver/rust/crates/platform-linux/src/input/mod.rs
  • libs/cua-driver/rust/crates/platform-linux/src/tools/impl_.rs

📝 Walkthrough

Walkthrough

Linux click guidance now notes that synthetic X11 pointer clicks can be filtered by some toolkits and points callers to element-indexed clicks. The Linux click tool also changes its coordinate success text for X11 synthetic delivery and adds tests for both message variants.

Changes

Linux click guidance and X11 click result text

Layer / File(s) Summary
Click guidance docs
libs/cua-driver/rust/Skills/cua-driver/LINUX.md, libs/cua-driver/rust/crates/platform-linux/src/input/mod.rs
The Linux click documentation adds the synthetic-event caveat and directs callers toward the AT-SPI element_index route.
Synthetic click result text
libs/cua-driver/rust/crates/platform-linux/src/tools/impl_.rs
ClickTool uses a helper to vary coordinate click success text for X11 synthetic delivery, and tests cover both message variants.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

  • trycua/cua issue 2022: Both changes describe X11 synthetic click delivery being filtered and steer callers toward element_index.

Possibly related PRs

  • trycua/cua#1789: Both PRs modify the Linux click tool implementation in libs/cua-driver/rust/crates/platform-linux/src/tools/impl_.rs.
  • trycua/cua#1871: Both PRs touch the Linux X11 synthetic click path and related input/mod.rs docs.
  • trycua/cua#1949: Both PRs update the Linux skill documentation for pixel/synthetic click expectations.

Poem

I’m a rabbit with twitchy ears,
Hopping past X11 woes and fears.
If clicks go shy, I nudge the way,
To element-indexed hops today. 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: changing Linux X11 synthetic click reporting to avoid a misleading bare success message.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@LaZzyMan LaZzyMan marked this pull request as ready for review June 25, 2026 06:16
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

LaZzyMan added a commit to QwenLM/qwen-code that referenced this pull request Jun 26, 2026
…c clicks

Ports the fix from upstream trycua/cua#2025 into the vendored driver.

On X11, clicks are delivered via XSendEvent synthetic events, which many
toolkits (GTK/SDL/Allegro) ignore because send_event is set. The driver still
reported a flat success ("Clicked"), masking that nothing happened. Report
the synthetic-delivery caveat honestly so the agent can fall back instead of
assuming the click landed.

(platform-linux crate is not built on macOS; verified by clean upstream apply
and covered by upstream + release-workflow Linux CI.)
shenyankm pushed a commit to shenyankm/qwen-code that referenced this pull request Jun 26, 2026
…coordinates (QwenLM#5896)

* feat(cua-driver): vendor trycua/cua driver with 1000-normalized coordinate support

Vendor libs/cua-driver from trycua/cua into packages/cua-driver as the
basis for qwen-code's computer-use backend, adding an opt-in relative
(1000x1000 normalized) coordinate mode for Qwen-VL clients.

- coord_norm.rs: 0-1000 <-> pixel conversion, per-(pid,window_id) size
  cache, tools/list description rewrite (TDD, 27 tests)
- ToolRegistry: normalized field + invoke input/output hooks
- protocol.rs: system-instruction coordinate wording switched by mode
- serve.rs: daemon list path description rewrite (input_schema aware)
- main.rs: CUA_DRIVER_RS_COORDINATE_SPACE env seed

Default coordinate_space=pixels => zero behavior change for existing
pixel clients. Set CUA_DRIVER_RS_COORDINATE_SPACE=normalized_1000 to
enable. Excludes rust/target build output.

* feat(cua-driver): make normalized coordinate scale configurable

Add CUA_DRIVER_RS_COORDINATE_SCALE (default 1000) so the normalization
full-scale can absorb the Qwen 999-vs-1000 cookbook ambiguity without a
recompile. norm_to_px/px_to_norm now take an explicit scale; denormalize_args
reads the process-wide COORDINATE_SCALE seeded once at startup from env.

* ci(cua-driver): add cross-platform release workflow for vendored driver

Standalone GitHub Action that builds, signs, and releases the vendored
cua-driver under packages/cua-driver. Adapted from upstream trycua/cua
cd-rust-cua-driver.yml:

- macOS: universal binary (lipo arm64+x86_64), codesigned + notarized into
  CuaDriver.app using qwen-code's existing secrets (MAC_CSC_LINK cert +
  App Store Connect API key notarization); Developer ID identity is
  auto-discovered from the imported cert.
- Linux: x86_64 + arm64, built in debian:11 for a glibc 2.31 floor.
- Windows: x86_64 + arm64, unsigned (no EV cert, matches upstream).
- Release: softprops/action-gh-release on cua-driver-rs-v* tags or manual
  dispatch, prerelease.

Triggered by tag push (cua-driver-rs-v*) or workflow_dispatch.

* chore(cua-driver): rebrand vendored driver as qwen-cua-driver

Rename the vendored trycua/cua driver so the fork installs and runs
independently of any upstream trycua install:
- binary cua-driver -> qwen-cua-driver
- bundle CuaDriver.app -> QwenCuaDriver.app
- bundle id com.trycua.driver -> com.qwencode.cua-driver

Updates the cargo/uia manifests, Info.plist, bundle/proxy launch paths,
permission/health-report wording, the install/build scripts, and the
cross-platform release workflow.

* feat(cua-driver): finish relative-coordinate mode — toggle, scale, zoom/move_cursor

- CUA_DRIVER_RS_COORDINATE_SPACE is now a 1/0 toggle (via is_env_truthy);
  default off keeps pixel mode byte-identical to upstream.
- Thread CUA_DRIVER_RS_COORDINATE_SCALE through every coordinate surface
  (was hardcoded 1000): input denormalization already used it; now the
  rewritten screenshot dims, the tool/param descriptions, and the agent
  instructions track the configured scale too.
- Normalize zoom (window basis) and move_cursor (screen basis) inputs and
  rewrite their descriptions, alongside click/double_click/right_click/drag.
- Fix zoom on downscaled (Retina) windows: apply the get_window_state resize
  ratio so the crop lands on the region the agent saw. Normalized mode only;
  pixel-mode zoom unchanged.

All coordinate behavior stays gated on the normalized flag, so the default
(pixels) path is unchanged from upstream.

* chore(cua-driver): add upstream-sync script (git subtree unusable here)

`git subtree split --prefix=libs/cua-driver` hangs on a commit deep in
trycua/cua's history, so the subtree add/pull workflow isn't usable for
the vendored driver (and a pull would re-split + re-hang every time).

Add scripts/sync-from-upstream.sh instead: it git-diffs two upstream refs
(never walks the full history, so it dodges the hang), reprefixes the
libs/cua-driver delta to packages/cua-driver, and `git apply --reject`s it
on top of our local changes — conflicts land as *.rej for manual fixup.
Record the vendored version in .vendored-from and document the migration +
sync method in the design doc.

* chore(cua-driver): exclude vendored driver from qwen-code ESLint

The vendored packages/cua-driver tree carries upstream JS (e.g. the
test-harness Electron app) that doesn't follow qwen-code's lint rules and
fails CI. It is not a workspace package (no package.json) and is not
qwen-code TypeScript, so add it to eslint.config.js global ignores —
alongside packages/desktop/** — the standard treatment for vendored code.

* fix(cua-driver): let start_session revive an idle-reaped session

Ports the fix from upstream trycua/cua#2035 into the vendored driver.

When a session is reaped for idleness, a subsequent start_session with the
same id failed instead of resuming it. Revive the ended session in place so
the agent can continue rather than getting a hard error.

* fix(cua-driver): retry daemon socket writes on EAGAIN

Ports the fix from upstream trycua/cua#2036 into the vendored driver.

A non-blocking daemon socket can return EAGAIN/EWOULDBLOCK mid-write when the
peer's receive buffer is momentarily full. The driver treated that as fatal
and dropped the connection. Add a bounded retry/poll loop (mirror of the
read-side socket_io helper) so transient back-pressure no longer kills the
session; only a real timeout or hard error fails the write.

* fix(cua-driver/linux): stop reporting bare "Clicked" for X11 synthetic clicks

Ports the fix from upstream trycua/cua#2025 into the vendored driver.

On X11, clicks are delivered via XSendEvent synthetic events, which many
toolkits (GTK/SDL/Allegro) ignore because send_event is set. The driver still
reported a flat success ("Clicked"), masking that nothing happened. Report
the synthetic-delivery caveat honestly so the agent can fall back instead of
assuming the click landed.

(platform-linux crate is not built on macOS; verified by clean upstream apply
and covered by upstream + release-workflow Linux CI.)

* fix(cua-driver/windows): list empty-/null-title top-level windows

Ports the fix from upstream trycua/cua#2021 into the vendored driver.

list_windows filtered out any top-level window whose title was empty or null,
so legitimate targets (splash screens, some Electron/game windows, tool
windows) were invisible to the agent and unclickable. Include empty-title
windows, using class name / process as a fallback label.

(platform-windows crate is not built on macOS; verified by clean upstream
apply and covered by upstream + release-workflow Windows CI.)

* chore(cua-driver): track cherry-picked upstream PRs; fix vendored-from

The vendored copy is actually at cua-driver-rs-v0.6.7 (workspace version and
all 0.6.7->0.6.8 delta files confirm it), but .vendored-from had drifted to
0.6.8 during an earlier sync-script trial whose code delta was not kept. Left
as-is it would make a future sync diff 0.6.8->newer and silently skip the real
0.6.7->0.6.8 fixes. Correct it back to 0.6.7.

Also record the four not-yet-merged upstream PRs we carry as cherry-picks
(trycua/cua#2021/QwenLM#2025/QwenLM#2035/QwenLM#2036) in .vendored-patches.md, and have
sync-from-upstream.sh point at it so the next sync reconciles them.

* ci(cua-driver): satisfy repo yamllint on the release workflow

The vendored-driver release workflow tripped 114 quoted-strings violations
under the repo's .yamllint (quote-type: single, required). Single-quote all
string scalars to match every other workflow in .github/workflows.

While reformatting, the release-notes body also got its paragraph blank lines
collapsed and still referenced the old CUA_DRIVER_RS_COORDINATE_SPACE=
normalized_1000 value — restore the blank lines and update it to the current
0/1 toggle (default 0 = off; optional CUA_DRIVER_RS_COORDINATE_SCALE=1000).

* chore(cua-driver): sync vendored driver to cua-driver-rs-v0.6.8

First real run of scripts/sync-from-upstream.sh: it 3-way-applied the upstream
0.6.7->0.6.8 delta onto our local fork. 10/12 files applied cleanly; the 2
rejects (install.ps1, _install-rust.sh) were already-applied baked-version
bumps (0.6.6->0.6.7, our copies were already at 0.6.7), i.e. no real conflict.

0.6.8 brings: Wayland input path (platform-linux), linux health_report +
overlay tweaks, a platform-macos build.rs step, and dependency bumps. Version
moved to 0.6.8 across the workspace.

Verified our work survived the sync untouched: the relative-coordinate shim
(coord_norm/protocol) and all four cherry-picked PRs (socket_io/session +
linux/windows) are intact — in particular the 0.6.8 edit to platform-linux
tools/impl_.rs landed alongside our QwenLM#2025 change with no collision. macOS
cargo check + 132 core tests green. (platform-linux/windows + the binary
integration test build only on their own runners; upstream CI covers those.)

* ci(cua-driver): add a dry_run gate to the release workflow

Mirror the desktop-release / release dry-run pattern: a workflow_dispatch
dry_run boolean input (default true). The cross-platform build + package jobs
always run and upload their artifacts; the GitHub Release job now publishes
only on a tag push or an explicit dry_run=false dispatch.

Lets us rehearse the whole build/package pipeline (dry_run=true, notarize=false)
and inspect the produced artifacts without cutting a release. A branch push
(no tag, not a dispatch) likewise builds without releasing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant