feat(driver): add Windows element query geometry and verified actions by mustbearnold · Pull Request #1993 · trycua/cua

mustbearnold · 2026-06-23T09:07:38Z

Summary

Adds reliable Windows GUI automation surfaces that let agents address UI elements semantically and verify intended state changes instead of treating low-level OS dispatch success as task success.

Windows driver surfaces

Enriches get_window_state element records with stable geometry and metadata while preserving legacy fields.
Adds get_element_geometry for cached element-token/index geometry lookups.
Adds find_element for focused semantic element queries by label, role, automation id, class name, and text.
Adds click_verified, a verified click transaction with pre/post UIA snapshots and explicit expected-label predicates.
Adds set_value_verified, a verified text/value transaction that wraps set_value and verifies the requested value/label appears in post-state.
Adds compact verified-action state deltas (added_labels/removed_labels, added_texts/removed_texts, plus total counts) so callers get a bounded explanation of what changed without dumping the full UI tree.
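
The compact delta described in the last item can be illustrated with a short Python sketch. This is not the driver's Rust implementation: the function name `summarize_delta`, the `sample_cap` parameter and its default, and the two `*_count` field names are assumptions made for illustration. Only `added_labels` and `removed_labels` are names taken from the description above.

```python
def summarize_delta(pre_labels, post_labels, sample_cap=5):
    """Return a bounded summary of label changes between two snapshots.

    Samples are capped at `sample_cap` (hypothetical parameter); the count
    fields report the total number of unique additions and removals.
    """
    pre, post = set(pre_labels), set(post_labels)
    added = sorted(post - pre)      # labels only present after the action
    removed = sorted(pre - post)    # labels only present before the action
    return {
        "added_labels": added[:sample_cap],
        "removed_labels": removed[:sample_cap],
        "added_label_count": len(added),      # hypothetical field name
        "removed_label_count": len(removed),  # hypothetical field name
    }
```

Reporting totals next to capped samples lets a caller see that a large change happened without receiving the whole UI tree.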

Safety / reliability

Hardens model-supplied coordinate parsing in the Python UI-TARS loop.
Separates os_dispatch_success from state_changed, verified, expected_change_satisfied, and final success in verified action tools.
Returns structured diagnostics and error results when dispatch succeeds but the expected post-state is not observed.

Verification

Source checks on branch pr/windows-10x-agent-runtime at 705585d:

cargo check -p platform-windows                                      passed
cargo test -p platform-windows --lib                                 80 passed
cargo build -p cua-driver                                            passed

Harness/unit checks:

py -m pytest tests/test_find_geometry_smoke.py tests/test_bench_cleanup_metrics.py -q  13 passed

Runtime smoke:

py scripts/cua_driver_smoke.py --runs 2 --task all --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result:

6/6 passed

Covers:

find_element_geometry_smoke
click_verified_smoke
set_value_verified_smoke

Post-smoke cleanup:

{"Calculator": [], "cua-mcp-bench-setv": []}

Latest raw smoke artifact:

reports/smoke/find-geometry-smoke-20260623-090217.jsonl

Additional local runtime samples also passed earlier:

py scripts/cua_driver_smoke.py --runs 5 --task all --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  15/15 passed
py scripts/cua_driver_bench.py --runs 3 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe             12/12 passed across Calculator, Notepad, Terminal, Explorer

Diff stat

.../rust/crates/platform-windows/src/msaa.rs       |  10 +
 .../crates/platform-windows/src/tools/impl_.rs     | 675 +++++++++++++++++++--
 .../rust/crates/platform-windows/src/uia/mod.rs    |  19 +-
 .../agent/cua_agent/loops/coordinate_parser.py     |  36 ++
 libs/python/agent/cua_agent/loops/uitars.py        |  13 +-
 .../agent/tests/test_uitars_coordinate_parser.py   |  40 ++
 6 files changed, 749 insertions(+), 44 deletions(-)

Commit stack

705585d feat(driver): summarize verified action state diffs
c26e2e8 feat(driver): add verified set value transaction
7533ad8 feat(driver): add verified click transaction
5616363 feat(driver): add find element query tool
723a2da feat(driver): expose cached element geometry
5175d3f feat(driver): enrich windows structured element geometry
3d9e5ad security: harden model-supplied coordinate parsing

Notes for reviewers

The verified transaction tools intentionally wrap existing primitive implementations rather than replacing them, preserving existing routing/token/cache behavior.
In verified tools, success means os_dispatch_success && verified; the result of the primitive dispatch alone is exposed separately as os_dispatch_success.
Added/removed state-delta samples are capped; count fields report total unique additions/removals.
Smoke and benchmark harnesses use deterministic, isolated Calculator, Notepad, Terminal, and Explorer temp resources and verify cleanup.
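
The success rule stated in the notes above can be sketched in Python as follows. The field names mirror those in the PR description, but the `VerifiedResult` class itself is hypothetical and the actual tools are implemented in Rust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedResult:
    """Hypothetical container for the flags a verified tool reports."""

    os_dispatch_success: bool  # the OS accepted the click / set_value call
    state_changed: bool        # pre and post snapshots differ
    verified: bool             # the expected post-state was observed

    @property
    def success(self) -> bool:
        # Dispatch success alone is never enough; the expectation must hold too.
        return self.os_dispatch_success and self.verified
```

For example, `VerifiedResult(True, False, False).success` is `False`: the OS dispatched the input, but nothing confirmed the intended effect.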

Summary by CodeRabbit

New Features
- Added verified interaction tools for UI clicks and value changes with pre/post-action state validation.
- Added new tools for element discovery and geometry (bounds in screen and window coordinates).
Improvements
- Enhanced Windows UI elements with class name plus enabled/visible/selected/focused metadata.
- Enriched exported element records with richer identifiers, text fields, and geometry.
- Replaced unsafe coordinate parsing with a validated parser that rejects non-literals, non-numerics, wrong shapes, and non-finite values.
Tests
- Added unit tests for coordinate parsing and UI state flag derivation.

vercel · 2026-06-23T09:07:44Z

Someone is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-06-23T09:08:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0bbfce67-c127-4d29-907f-c975f6257b73

📥 Commits

Reviewing files that changed from the base of the PR and between 705585d and e78213e.

📒 Files selected for processing (4)

libs/cua-driver/rust/crates/platform-windows/src/msaa.rs
libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs
libs/python/agent/cua_agent/loops/coordinate_parser.py
libs/python/agent/tests/test_uitars_coordinate_parser.py

🚧 Files skipped from review as they are similar to previous changes (3)

libs/python/agent/cua_agent/loops/coordinate_parser.py
libs/python/agent/tests/test_uitars_coordinate_parser.py
libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs

📝 Walkthrough

Two independent changes: (1) The Windows UIA/MSAA layer gains class_name, enabled, visible, selected, and focused fields on UiaNode; a shared structured_element_record helper replaces inline element serialization; four new tools (find_element, click_verified, get_element_geometry, set_value_verified) are added, registered, and tested. (2) A new parse_uitars_coordinates helper replaces eval() in the Python UITARS action loop with ast.literal_eval-based safe parsing, validated and tested.

Changes

Windows UIA Node Enrichment and New Tools

Layer / File(s)	Summary
UiaNode model: class_name and state fields `libs/cua-driver/rust/crates/platform-windows/src/uia/mod.rs`	`UiaNode` gains `class_name`, `enabled`, `visible`, `selected`, `focused` fields; `UIA_ClassNamePropertyId` is added to the bulk property prefetch loop; the cached UIA walker reads cached class name and populates all new fields when constructing both actionable and non-actionable nodes.
MSAA state-bit derivation and node enrichment `libs/cua-driver/rust/crates/platform-windows/src/msaa.rs`	MSAA state flag constants and helper functions convert optional `accState` values into `enabled`, `visible`, `selected`, and `focused` booleans; `state_int` is extracted from `acc.get_accState`; both actionable and non-emitting MSAA node construction branches populate the enriched `UiaNode` state fields; unit tests verify state-flag derivation logic.
`structured_element_record` helper and `get_window_state` refactor `libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs`	Introduces shared helpers to convert `UiaNode` into enriched JSON element records with stable IDs, label/value/text fields, backend, depth/parent info, screen and window-relative geometry, and explicit null/error fields for missing rects; `get_window_state` is refactored to derive `target_window_bounds`, call the helper for all element records, and include `capture_scope`, `capture_mode`, and screenshot metadata fields.
New tools: `find_element`, `click_verified`, `get_element_geometry`, `set_value_verified` `libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs`	Four new tool structs are defined and implemented: `find_element` filters a bounded UIA walk and returns enriched element records; `click_verified` captures pre/post snapshots around a click and returns verification booleans; `get_element_geometry` reads cached bounds in screen and window coordinate spaces; `set_value_verified` diffs UIA text around a `set_value` call and reports verification results.
Tool registration and unit tests `libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs`	`build_registry` registers all four new tools; unit tests cover click-verified expectation transitions, UIA walk timeout behavior, screenshot metadata coordinate-space/scale, `structured_element_record` stable ID/token/geometry error handling, tool registry schema advertising, label diff summarization, `set_value_verified` text extraction, and click button enum invariants.

Safe Coordinate Parsing for UITARS

Layer / File(s)	Summary
`parse_uitars_coordinates`: implementation, integration, and tests `libs/python/agent/cua_agent/loops/coordinate_parser.py`, `libs/python/agent/cua_agent/loops/uitars.py`, `libs/python/agent/tests/test_uitars_coordinate_parser.py`	`parse_uitars_coordinates` uses `ast.literal_eval` with boolean rejection, float coercion, `math.isfinite` validation, and 2-to-4 value normalization; all `eval()` calls for coordinate parsing in `uitars.py` (click, double-click, right-click, scroll, drag) are replaced with this helper; parametrized tests verify accepted 2/4-element formats and rejected syntax/non-numeric/non-finite inputs.
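
A minimal sketch of the parsing approach summarized in the row above, written independently of the PR's code. The name `parse_coordinates_sketch` is hypothetical, and the reading of "2-to-4 value normalization" as repeating a point to form a degenerate box is an assumption; the actual `parse_uitars_coordinates` may differ.

```python
import ast
import math


def parse_coordinates_sketch(text):
    """Parse '(x, y)' or '(x1, y1, x2, y2)' into four finite floats."""
    try:
        # literal_eval only accepts Python literals, so calls and names fail.
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a literal: {text!r}") from exc
    if not isinstance(parsed, (tuple, list)) or len(parsed) not in (2, 4):
        raise ValueError("expected 2 or 4 coordinates")
    coords = []
    for item in parsed:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise ValueError(f"non-numeric coordinate: {item!r}")
        try:
            value = float(item)
        except OverflowError as exc:  # e.g. an integer too large for a float
            raise ValueError(f"coordinate out of range: {item!r}") from exc
        if not math.isfinite(value):  # rejects inf and nan, e.g. from 1e309
            raise ValueError(f"non-finite coordinate: {item!r}")
        coords.append(value)
    if len(coords) == 2:
        coords = coords * 2  # assumption: a point becomes (x, y, x, y)
    return tuple(coords)
```

Every rejection path raises `ValueError`, so a caller in the action loop would need only one exception type to handle malformed model output.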

Sequence Diagram(s)

sequenceDiagram
  rect rgba(135, 206, 235, 0.5)
    Note over Caller,UiaWalker: click_verified flow
    Caller->>ClickVerifiedTool: invoke(element_token, expected_label_present)
    ClickVerifiedTool->>UiaWalker: pre-snapshot label extraction
    UiaWalker-->>ClickVerifiedTool: pre_labels set
    ClickVerifiedTool->>click: execute OS click dispatch
    click-->>ClickVerifiedTool: os_dispatch_success
    ClickVerifiedTool->>UiaWalker: post-snapshot label extraction
    UiaWalker-->>ClickVerifiedTool: post_labels set
    ClickVerifiedTool->>ClickVerifiedTool: compute label deltas, verify expected_label_present/absent
    ClickVerifiedTool-->>Caller: {os_dispatch_success, state_changed, verified, success}
  end
  rect rgba(144, 238, 144, 0.5)
    Note over Caller,UiaWalker: set_value_verified flow
    Caller->>SetValueVerifiedTool: invoke(value, expected_value, expected_label_present)
    SetValueVerifiedTool->>UiaWalker: pre-snapshot text extraction
    UiaWalker-->>SetValueVerifiedTool: pre_texts
    SetValueVerifiedTool->>set_value: delegate set_value call
    set_value-->>SetValueVerifiedTool: result
    SetValueVerifiedTool->>UiaWalker: post-snapshot text extraction
    UiaWalker-->>SetValueVerifiedTool: post_texts
    SetValueVerifiedTool->>SetValueVerifiedTool: diff texts, evaluate expected substrings
    SetValueVerifiedTool-->>Caller: {value_found, label_found, state_changed, success}
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 A rabbit once feared the sly eval() call,
So it parsed with literal_eval—no tricks at all!
Then MSAA nodes got class_name and visible too,
Four shiny new tools for the Windows UI crew.
Each click now comes verified, each value confirmed—
The warren is safer, the code firmly firmed! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 49.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main objective: adding Windows element query and geometry tools plus verified action capabilities for reliable GUI automation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

mustbearnold · 2026-06-23T09:12:07Z

Posted PR and added a more complex local benchmark slice for the 10–20x computer-use goal.

PR: #1993

New complex benchmark task added locally:

terminal_explorer_file_workflow: Cua Driver launches cmd.exe to create a unique sentinel file in an isolated temp folder, launches File Explorer to that generated folder, verifies both the file contents and Explorer window, then closes/removes the benchmark resources.

Latest local benchmark command:

py scripts/cua_driver_bench.py --runs 2 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result: 10/10 passed across five task classes:

{
  "calculator_basic_click": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "explorer_temp_folder": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "notepad_edit_save": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_explorer_file_workflow": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_sentinel_file": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  }
}

Cleanup verified: no Calculator, benchmark Notepad, or benchmark Explorer windows remained.

Note: Vercel check currently reports Authorization required to deploy; CodeRabbit was pending at last check.

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/cua-driver/rust/crates/platform-windows/src/msaa.rs`:
- Around line 182-187: The UiaNode construction in the MSAA implementation
hard-codes the enabled, visible, selected, and focused fields instead of
querying the actual accessibility state. Replace these hard-coded values by
calling get_accState() on the accessible object (using the same self_var
parameter pattern used for get_accRole, get_accName, and get_accDefaultAction),
define STATE_SYSTEM constants for the relevant bit flags, check the returned
VARIANT to extract the correct state values, and populate the enabled, visible,
selected, and focused fields based on the actual state flags. Apply this same
change to both locations where these hard-coded values appear, following the
existing error-handling pattern used for the other get_accXxx method calls.

In `@libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs`:
- Around line 1102-1103: The find_element function (and similarly click_verified
and set_value_verified at lines 1223-1236 and 3703-3714) awaits the
spawn_blocking UIA walk without applying a timeout, which can cause the tools to
hang indefinitely if a UIA provider blocks. Apply the same timeout guard that is
already implemented in get_window_state to these functions by wrapping the
.await call with a timeout mechanism, so that if the UIA walk takes too long, it
returns a structured error instead of hanging the tool.
- Around line 3648-3655: The searchable_texts_from_nodes function currently
collects a broad set of text fields including metadata like automation_id,
class_name, and control_type alongside actual value/text content (name, value,
help_text). This causes expected_value verification to match against control
type names or class identifiers instead of actual content. Create two separate
lists: a narrower list containing only name, value, and help_text for
expected_value verification, and keep the broader list including automation_id,
class_name, and control_type for expected_label_present searches. Update the
callers of searchable_texts_from_nodes (including those around lines 3718-3719)
to use the appropriate narrower or broader list depending on whether they are
verifying actual values or searching for label presence.
- Around line 1240-1244: The current logic for present_ok and absent_ok only
checks the final post_labels state against expectations without verifying an
actual state change occurred from before to after the click. If a label was
already absent before the click, expected_absent will still be satisfied even if
the click had no effect. Capture the label state before the click operation
(pre_labels), then modify the present_ok check to verify the label transitioned
from absent to present (or was already present) and the absent_ok check to
verify the label transitioned from present to absent (or was already absent),
ensuring that success only returns true when the requested state change actually
occurred, not just when the final state happens to match expectations.
- Line 596: The stable_id field is incorrectly named and computed since it
includes idx and name parameters that can change across snapshots when the UI
tree reorders or labels change. Either refactor the stable_id calculation in the
format string to only include truly stable provider identifiers like backend,
pid, hwnd, and automation_id (removing idx and name), or rename this field to
something like snapshot_id or debug_id to accurately reflect that it is a
snapshot-local identifier rather than a durable stable identifier that clients
can persist across snapshots.
- Around line 876-882: The screenshot metadata being populated in the structured
JSON uses a hardcoded scale_factor of 1.0, but this value becomes incorrect when
the image is resized by resize_png_if_needed. Capture the original width and
height before calling resize_png_if_needed, then after the resize operation
completes and returns the new width (w) and height (h), calculate the actual
scale_factor by dividing the original width by the new width (orig_w / w).
Update the scale_factor field in the structured JSON with this calculated value
instead of the hardcoded 1.0 to accurately reflect the resize operation that
occurred.

In `@libs/python/agent/cua_agent/loops/coordinate_parser.py`:
- Around line 30-33: After converting the item to float in the coords append
operation, add validation to ensure the resulting float value is finite before
appending it to the coords list. Check that the float value is not infinity or
NaN (which can occur when coercing pathological numerics like 1e309), and raise
a ValueError with an appropriate message if the value is not finite. This
ensures downstream integer pixel operations do not fail due to infinite
coordinate values.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: da9ca25c-427a-4071-a598-77bcba67000b

📥 Commits

Reviewing files that changed from the base of the PR and between c898d7b and 705585d.

📒 Files selected for processing (6)

libs/cua-driver/rust/crates/platform-windows/src/msaa.rs
libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs
libs/cua-driver/rust/crates/platform-windows/src/uia/mod.rs
libs/python/agent/cua_agent/loops/coordinate_parser.py
libs/python/agent/cua_agent/loops/uitars.py
libs/python/agent/tests/test_uitars_coordinate_parser.py

mustbearnold · 2026-06-23T09:19:47Z

Added a browser benchmark slice locally as the next step toward harder 10–20x computer-use evals.

New task:

browser_local_html_open: creates an isolated local HTML file with a unique sentinel, launches Microsoft Edge with a dedicated benchmark-only user-data-dir and --new-window, verifies the browser window and sentinel text through Cua Driver accessibility queries, then closes/kills only that benchmark-owned Edge process and removes temp resources.

Latest command:

py scripts/cua_driver_bench.py --runs 2 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result: 12/12 passed across six task classes:

{
  "browser_local_html_open": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "calculator_basic_click": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "explorer_temp_folder": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "notepad_edit_save": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_explorer_file_workflow": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_sentinel_file": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  }
}

Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained.

Local helper tests now pass 15/15.

mustbearnold · 2026-06-23T09:25:59Z

Added a harder multi-step browser interaction benchmark locally.

New task:

browser_button_state_change: creates an isolated local HTML page with a unique ready/done sentinel, launches Edge with a benchmark-only profile, verifies the initial ready text, finds the button through accessibility, invokes it using click_verified, verifies the post-click done text through the verified transaction's pre/post state delta, and cleans the benchmark-owned browser profile/window.

Latest command:

py scripts/cua_driver_bench.py --runs 2 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result: 14/14 passed across seven task classes:

{
  "browser_button_state_change": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "browser_local_html_open": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "calculator_basic_click": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "explorer_temp_folder": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "notepad_edit_save": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_explorer_file_workflow": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_sentinel_file": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  }
}

Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained.

Local helper tests now pass 16/16.

mustbearnold · 2026-06-24T03:37:34Z

Ignoring the Vercel check as requested, I ran a longer sample of the full local benchmark suite.

Command:

py scripts/cua_driver_bench.py --runs 5 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result: 35/35 passed across seven task classes:

{
  "browser_button_state_change": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "browser_local_html_open": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "calculator_basic_click": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "explorer_temp_folder": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "notepad_edit_save": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "terminal_explorer_file_workflow": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  },
  "terminal_sentinel_file": {
    "total": 5,
    "passed": 5,
    "verified": 5,
    "success": 5,
    "cleanup_success": 5
  }
}

Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained.

Raw artifact:

reports/benchmarks/cua-driver-baseline-20260624-033221.jsonl

mustbearnold · 2026-06-24T04:13:49Z

Pushed CodeRabbit finite-coordinate hardening fix.

Commit: f2f1fb2 fix(agent): reject non-finite UITARS coordinates

Verification run before commit:

py -m pytest libs/python/agent/tests/test_uitars_coordinate_parser.py -q  # 13 passed
py -m pytest tests/test_bench_cleanup_metrics.py tests/test_find_geometry_smoke.py -q  # 16 passed
git diff --check  # passed
py scripts/cua_driver_smoke.py --runs 1 --task all --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 3/3 passed

mustbearnold · 2026-06-24T04:22:33Z

Pushed CodeRabbit stable-id follow-up fix.

Commit: a25bbeb fix(driver): distinguish stable and snapshot element ids

What changed:

stable_id now uses durable provider identity when available: backend:pid:hwnd:automation_id:<id>.
Snapshot-local identity moved to snapshot_debug_id.
Existing element_token remains the snapshot token for within-snapshot lookup.
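
The split between the two identifiers might look like the Python sketch below. The `stable_id` format follows the description above. The `snapshot_debug_id` layout, the `None` fallback when no automation id exists, and the function name `make_ids` are assumptions for illustration only.

```python
def make_ids(backend, pid, hwnd, automation_id, idx, name):
    """Build a durable id and a snapshot-local id for one element."""
    stable_id = None  # assumption: no durable id without an automation id
    if automation_id:
        stable_id = f"{backend}:{pid}:{hwnd}:automation_id:{automation_id}"
    # Assumed layout: includes idx and name, which can change between snapshots.
    snapshot_debug_id = f"{backend}:{pid}:{hwnd}:{idx}:{name}"
    return {"stable_id": stable_id, "snapshot_debug_id": snapshot_debug_id}
```

Under these assumptions, the same control keeps its `stable_id` when the tree reorders or its label changes, while `snapshot_debug_id` changes with either.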

Verification:

cargo test -p platform-windows structured_element_record_tests --lib  # 3 passed
cargo test -p platform-windows --lib  # 80 passed
cargo check -p platform-windows  # passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:25:00Z

Pushed CodeRabbit screenshot metadata fix.

Commit: 4c8635b fix(driver): report scaled screenshot metadata

What changed:

Added screenshot_metadata helper.
screenshot.scale_factor now reflects original_width / resized_width when the screenshot is resized.
screenshot.coordinate_space reports scaled_window_pixels when resizing occurred.
screenshot.original_width is included for clients that need to map image coordinates back to window coordinates.
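
To show how a client could use these fields, here is a hedged Python sketch. The `scale_factor` formula and the `scaled_window_pixels` value come from the comment above. The `window_pixels` value for the unresized case, the function names, and the use of one scale for both axes (which assumes the resize preserves aspect ratio) are assumptions.

```python
def screenshot_metadata_sketch(original_width, resized_width):
    """Describe how a (possibly resized) screenshot relates to window pixels."""
    if original_width <= 0 or resized_width <= 0:
        raise ValueError("widths must be positive")
    resized = resized_width != original_width
    return {
        "scale_factor": original_width / resized_width if resized else 1.0,
        # "window_pixels" is an assumed name for the unresized case.
        "coordinate_space": "scaled_window_pixels" if resized else "window_pixels",
        "original_width": original_width,
    }


def image_to_window(x, y, meta):
    """Map a point in the screenshot image back to window coordinates."""
    scale = meta["scale_factor"]
    return (x * scale, y * scale)
```

With a 2560-pixel-wide window resized to 1280, `scale_factor` is 2.0, so image point (100, 50) maps to window point (200.0, 100.0).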

Verification:

cargo test -p platform-windows screenshot_metadata_tests --lib  # 2 passed
cargo test -p platform-windows --lib  # 82 passed
cargo check -p platform-windows  # passed
py scripts/cua_driver_smoke.py --runs 1 --task find_geometry --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 1/1 passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:28:19Z

Pushed CodeRabbit UIA timeout fix.

Commit: ef0f8d9 fix(driver): bound verified UIA snapshot walks

What changed:

Added shared bounded UIA walk await helper with a 4s timeout.
Applied it to find_element, click_verified pre/post snapshots, and set_value_verified pre/post snapshots.
Timeout returns a structured tool error instead of allowing a provider hang to stall the tool indefinitely.
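
The same pattern, sketched in Python with `asyncio` rather than the driver's Rust runtime. The helper name `bounded_walk` and the shape of the error dictionary are assumptions; only the 4s default mirrors the comment above.

```python
import asyncio


async def bounded_walk(walk_fn, timeout_s=4.0):
    """Run a blocking walk in a worker thread and bound how long we wait."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, walk_fn)
    try:
        nodes = await asyncio.wait_for(future, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Structured error instead of an indefinite hang (hypothetical shape).
        return {"ok": False, "error": "uia_walk_timeout", "timeout_s": timeout_s}
    return {"ok": True, "nodes": nodes}
```

In this Python sketch the timeout only bounds the wait. The worker thread itself keeps running until `walk_fn` returns, so a truly stuck provider would still occupy a thread.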

Verification:

cargo test -p platform-windows uia_walk_timeout_tests --lib  # 1 passed
cargo test -p platform-windows --lib  # 83 passed
cargo check -p platform-windows  # passed
py scripts/cua_driver_smoke.py --runs 1 --task all --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 3/3 passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:30:19Z

Pushed CodeRabbit set-value verification fix.

Commit: 3fa8e57 fix(driver): verify set value against value text

What changed:

Added a narrow value_texts_from_nodes path for expected_value.
set_value_verified.expected_value no longer succeeds by matching role/class/automation metadata such as Button, Edit, or MSAA.
expected_label_present continues to use the broader searchable text path.
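
A small Python sketch of the narrow and broad text paths. The field lists follow the CodeRabbit comment earlier in this thread (name, value, help_text versus those plus automation_id, class_name, control_type). The function names, the dictionary node shape, and case-sensitive substring matching are assumptions.

```python
VALUE_FIELDS = ("name", "value", "help_text")
METADATA_FIELDS = ("automation_id", "class_name", "control_type")


def _collect(nodes, fields):
    """Gather non-empty field values from node dictionaries."""
    return [str(node[f]) for node in nodes for f in fields if node.get(f)]


def value_texts(nodes):
    """Narrow path: content only, used for expected_value."""
    return _collect(nodes, VALUE_FIELDS)


def searchable_texts(nodes):
    """Broad path: content plus metadata, used for expected_label_present."""
    return _collect(nodes, VALUE_FIELDS + METADATA_FIELDS)


def contains(texts, expected):
    """True if any collected text contains the expected substring."""
    return any(expected in text for text in texts)
```

For an edit control whose value is "hello", `contains(value_texts(nodes), "Edit")` is false even though the control type is "Edit", which is the false positive the fix removes.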

Verification:

cargo test -p platform-windows set_value_verified_text_tests --lib  # 1 passed
cargo test -p platform-windows --lib  # 84 passed
cargo check -p platform-windows  # passed
py scripts/cua_driver_smoke.py --runs 1 --task set_value_verified --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 1/1 passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:32:43Z

Pushed CodeRabbit click expectation transition fix.

Commit: 97e5fdd fix(driver): require click expectation transitions

What changed:

click_verified now evaluates expected present/absent labels against pre/post state, not only final state.
Already-satisfied expectations are surfaced as already_satisfied and do not count as expected_change_satisfied.
Prevents wrong-window/no-op clicks from passing just because an expected absent label was absent before the click.
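
The transition rule can be sketched in Python as follows. The two output names come from the comment above. The function name `check_label`, its parameters, and exact-match label membership are assumptions, and the driver's Rust logic may combine several expectations differently.

```python
def check_label(label, want_present, pre_labels, post_labels):
    """Classify one expectation against pre- and post-click label sets."""
    # Did the expectation hold before the click? After it?
    held_before = (label in pre_labels) == want_present
    holds_after = (label in post_labels) == want_present
    return {
        # True before and after: the click did not cause it.
        "already_satisfied": held_before and holds_after,
        # Only a real transition counts as the expected change.
        "expected_change_satisfied": (not held_before) and holds_after,
    }
```

A no-op click with `want_present=False` for a label that was never on screen yields `already_satisfied` rather than `expected_change_satisfied`, so it no longer passes verification by accident.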

Verification:

cargo test -p platform-windows click_verified_expectation_tests --lib  # 2 passed
cargo test -p platform-windows --lib  # 86 passed
cargo check -p platform-windows  # passed
py scripts/cua_driver_smoke.py --runs 1 --task click_verified --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 1/1 passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:36:12Z

Pushed CodeRabbit MSAA state fix.

Commit: e78213e fix(driver): derive MSAA element state flags

What changed:

Reads get_accState(CHILDID_SELF) in the MSAA walker.
Derives enabled, visible, selected, and focused from STATE_SYSTEM_* flags instead of hard-coding/defaulting them.
Keeps rect presence as part of visibility so offscreen/invisible/no-rect MSAA nodes do not appear visible.
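
A Python sketch of this kind of flag derivation. The constant values are the standard MSAA `STATE_SYSTEM_*` bits from `oleacc.h`. The function name `derive_state`, the choice of which bits feed each field, and returning `None` when no state is available are assumptions and may not match the Rust helpers.

```python
STATE_SYSTEM_UNAVAILABLE = 0x00000001
STATE_SYSTEM_SELECTED = 0x00000002
STATE_SYSTEM_FOCUSED = 0x00000004
STATE_SYSTEM_INVISIBLE = 0x00008000
STATE_SYSTEM_OFFSCREEN = 0x00010000


def derive_state(state, has_rect):
    """Turn an optional accState bitmask into element state booleans."""
    if state is None:
        # Assumption: unknown state is reported as None, not guessed.
        return {"enabled": None, "visible": None, "selected": None, "focused": None}
    hidden = bool(state & (STATE_SYSTEM_INVISIBLE | STATE_SYSTEM_OFFSCREEN))
    return {
        "enabled": not (state & STATE_SYSTEM_UNAVAILABLE),
        # A node without a rect is never reported as visible.
        "visible": has_rect and not hidden,
        "selected": bool(state & STATE_SYSTEM_SELECTED),
        "focused": bool(state & STATE_SYSTEM_FOCUSED),
    }
```

For instance, a state of `0x8001` (unavailable and invisible) gives `enabled` and `visible` both false regardless of whether a rect is present.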

Verification:

cargo test -p platform-windows state_flag_tests --lib  # 2 passed
cargo test -p platform-windows --lib  # 88 passed
cargo check -p platform-windows  # passed
py scripts/cua_driver_smoke.py --runs 1 --task find_geometry --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe  # 1/1 passed
git diff --check  # passed

mustbearnold · 2026-06-24T04:36:53Z

@coderabbitai review

coderabbitai · 2026-06-24T04:37:02Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

mustbearnold · 2026-06-24T04:38:53Z

Post-review-fix full local benchmark run.

Command:

py scripts/cua_driver_bench.py --runs 2 --driver-bin upstream/cua/libs/cua-driver/rust/target/debug/cua-driver.exe

Result: 14/14 passed across seven task classes after the latest CodeRabbit follow-up fixes.

{
  "browser_button_state_change": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "browser_local_html_open": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "calculator_basic_click": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "explorer_temp_folder": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "notepad_edit_save": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_explorer_file_workflow": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  },
  "terminal_sentinel_file": {
    "total": 2,
    "passed": 2,
    "verified": 2,
    "success": 2,
    "cleanup_success": 2
  }
}

Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained.

Raw artifact:

reports/benchmarks/cua-driver-baseline-20260624-043654.jsonl

CUA Windows 10x Agent added 7 commits June 23, 2026 20:04

3d9e5ad security: harden model-supplied coordinate parsing
5175d3f feat(driver): enrich windows structured element geometry
723a2da feat(driver): expose cached element geometry
5616363 feat(driver): add find element query tool
7533ad8 feat(driver): add verified click transaction
c26e2e8 feat(driver): add verified set value transaction
705585d feat(driver): summarize verified action state diffs

coderabbitai Bot reviewed Jun 23, 2026

f2f1fb2 fix(agent): reject non-finite UITARS coordinates
a25bbeb fix(driver): distinguish stable and snapshot element ids
4c8635b fix(driver): report scaled screenshot metadata
ef0f8d9 fix(driver): bound verified UIA snapshot walks
3fa8e57 fix(driver): verify set value against value text
97e5fdd fix(driver): require click expectation transitions
e78213e fix(driver): derive MSAA element state flags
