feat(driver): add Windows element query geometry and verified actions#1993
feat(driver): add Windows element query geometry and verified actions#1993mustbearnold wants to merge 14 commits into
Conversation
|
Someone is attempting to deploy a commit to the Cua Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughTwo independent changes: (1) The Windows UIA/MSAA layer gains ChangesWindows UIA Node Enrichment and New Tools
Safe Coordinate Parsing for UITARS
Sequence Diagram(s)sequenceDiagram
rect rgba(135, 206, 235, 0.5)
Note over Caller,UiaWalker: click_verified flow
Caller->>ClickVerifiedTool: invoke(element_token, expected_label_present)
ClickVerifiedTool->>UiaWalker: pre-snapshot label extraction
UiaWalker-->>ClickVerifiedTool: pre_labels set
ClickVerifiedTool->>click: execute OS click dispatch
click-->>ClickVerifiedTool: os_dispatch_success
ClickVerifiedTool->>UiaWalker: post-snapshot label extraction
UiaWalker-->>ClickVerifiedTool: post_labels set
ClickVerifiedTool->>ClickVerifiedTool: compute label deltas, verify expected_label_present/absent
ClickVerifiedTool-->>Caller: {os_dispatch_success, state_changed, verified, success}
end
rect rgba(144, 238, 144, 0.5)
Note over Caller,UiaWalker: set_value_verified flow
Caller->>SetValueVerifiedTool: invoke(value, expected_value, expected_label_present)
SetValueVerifiedTool->>UiaWalker: pre-snapshot text extraction
UiaWalker-->>SetValueVerifiedTool: pre_texts
SetValueVerifiedTool->>set_value: delegate set_value call
set_value-->>SetValueVerifiedTool: result
SetValueVerifiedTool->>UiaWalker: post-snapshot text extraction
UiaWalker-->>SetValueVerifiedTool: post_texts
SetValueVerifiedTool->>SetValueVerifiedTool: diff texts, evaluate expected substrings
SetValueVerifiedTool-->>Caller: {value_found, label_found, state_changed, success}
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Posted PR and added a more complex local benchmark slice for the 10–20x computer-use goal. PR: #1993 New complex benchmark task added locally:
Latest local benchmark command: Result: 10/10 passed across five task classes: {
"calculator_basic_click": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"explorer_temp_folder": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"notepad_edit_save": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_explorer_file_workflow": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_sentinel_file": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
}
}Cleanup verified: no Calculator, benchmark Notepad, or benchmark Explorer windows remained. Note: Vercel check currently reports |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libs/cua-driver/rust/crates/platform-windows/src/msaa.rs`:
- Around line 182-187: The UiaNode construction in the MSAA implementation
hard-codes the enabled, visible, selected, and focused fields instead of
querying the actual accessibility state. Replace these hard-coded values by
calling get_accState() on the accessible object (using the same self_var
parameter pattern used for get_accRole, get_accName, and get_accDefaultAction),
define STATE_SYSTEM constants for the relevant bit flags, check the returned
VARIANT to extract the correct state values, and populate the enabled, visible,
selected, and focused fields based on the actual state flags. Apply this same
change to both locations where these hard-coded values appear, following the
existing error-handling pattern used for the other get_accXxx method calls.
In `@libs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rs`:
- Around line 1102-1103: The find_element function (and similarly click_verified
and set_value_verified at lines 1223-1236 and 3703-3714) awaits the
spawn_blocking UIA walk without applying a timeout, which can cause the tools to
hang indefinitely if a UIA provider blocks. Apply the same timeout guard that is
already implemented in get_window_state to these functions by wrapping the
.await call with a timeout mechanism, so that if the UIA walk takes too long, it
returns a structured error instead of hanging the tool.
- Around line 3648-3655: The searchable_texts_from_nodes function currently
collects a broad set of text fields including metadata like automation_id,
class_name, and control_type alongside actual value/text content (name, value,
help_text). This causes expected_value verification to match against control
type names or class identifiers instead of actual content. Create two separate
lists: a narrower list containing only name, value, and help_text for
expected_value verification, and keep the broader list including automation_id,
class_name, and control_type for expected_label_present searches. Update the
callers of searchable_texts_from_nodes (including those around lines 3718-3719)
to use the appropriate narrower or broader list depending on whether they are
verifying actual values or searching for label presence.
- Around line 1240-1244: The current logic for present_ok and absent_ok only
checks the final post_labels state against expectations without verifying an
actual state change occurred from before to after the click. If a label was
already absent before the click, expected_absent will still be satisfied even if
the click had no effect. Capture the label state before the click operation
(pre_labels), then modify the present_ok check to verify the label transitioned
from absent to present (or was already present) and the absent_ok check to
verify the label transitioned from present to absent (or was already absent),
ensuring that success only returns true when the requested state change actually
occurred, not just when the final state happens to match expectations.
- Line 596: The stable_id field is incorrectly named and computed since it
includes idx and name parameters that can change across snapshots when the UI
tree reorders or labels change. Either refactor the stable_id calculation in the
format string to only include truly stable provider identifiers like backend,
pid, hwnd, and automation_id (removing idx and name), or rename this field to
something like snapshot_id or debug_id to accurately reflect that it is a
snapshot-local identifier rather than a durable stable identifier that clients
can persist across snapshots.
- Around line 876-882: The screenshot metadata being populated in the structured
JSON uses a hardcoded scale_factor of 1.0, but this value becomes incorrect when
the image is resized by resize_png_if_needed. Capture the original width and
height before calling resize_png_if_needed, then after the resize operation
completes and returns the new width (w) and height (h), calculate the actual
scale_factor by dividing the original width by the new width (orig_w / w).
Update the scale_factor field in the structured JSON with this calculated value
instead of the hardcoded 1.0 to accurately reflect the resize operation that
occurred.
In `@libs/python/agent/cua_agent/loops/coordinate_parser.py`:
- Around line 30-33: After converting the item to float in the coords append
operation, add validation to ensure the resulting float value is finite before
appending it to the coords list. Check that the float value is not infinity or
NaN (which can occur when coercing pathological numerics like 1e309), and raise
a ValueError with an appropriate message if the value is not finite. This
ensures downstream integer pixel operations do not fail due to infinite
coordinate values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: da9ca25c-427a-4071-a598-77bcba67000b
📒 Files selected for processing (6)
libs/cua-driver/rust/crates/platform-windows/src/msaa.rslibs/cua-driver/rust/crates/platform-windows/src/tools/impl_.rslibs/cua-driver/rust/crates/platform-windows/src/uia/mod.rslibs/python/agent/cua_agent/loops/coordinate_parser.pylibs/python/agent/cua_agent/loops/uitars.pylibs/python/agent/tests/test_uitars_coordinate_parser.py
|
Added a browser benchmark slice locally as the next step toward harder 10–20x computer-use evals. New task:
Latest command: Result: 12/12 passed across six task classes: {
"browser_local_html_open": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"calculator_basic_click": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"explorer_temp_folder": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"notepad_edit_save": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_explorer_file_workflow": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_sentinel_file": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
}
}Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained. Local helper tests now pass 15/15. |
|
Added a harder multi-step browser interaction benchmark locally. New task:
Latest command: Result: 14/14 passed across seven task classes: {
"browser_button_state_change": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"browser_local_html_open": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"calculator_basic_click": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"explorer_temp_folder": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"notepad_edit_save": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_explorer_file_workflow": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_sentinel_file": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
}
}Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained. Local helper tests now pass 16/16. |
|
Ignoring Vercel as requested, I ran a longer full local complex benchmark sample. Command: Result: 35/35 passed across seven task classes: {
"browser_button_state_change": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"browser_local_html_open": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"calculator_basic_click": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"explorer_temp_folder": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"notepad_edit_save": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"terminal_explorer_file_workflow": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
},
"terminal_sentinel_file": {
"total": 5,
"passed": 5,
"verified": 5,
"success": 5,
"cleanup_success": 5
}
}Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained. Raw artifact: |
|
Pushed CodeRabbit finite-coordinate hardening fix. Commit: Verification run before commit: |
|
Pushed CodeRabbit stable-id follow-up fix. Commit: What changed:
Verification: |
|
Pushed CodeRabbit screenshot metadata fix. Commit: What changed:
Verification: |
|
Pushed CodeRabbit UIA timeout fix. Commit: What changed:
Verification: |
|
Pushed CodeRabbit set-value verification fix. Commit: What changed:
Verification: |
|
Pushed CodeRabbit click expectation transition fix. Commit: What changed:
Verification: |
|
Pushed CodeRabbit MSAA state fix. Commit: What changed:
Verification: |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
Post-review-fix full local benchmark run. Command: Result: 14/14 passed across seven task classes after the latest CodeRabbit follow-up fixes. {
"browser_button_state_change": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"browser_local_html_open": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"calculator_basic_click": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"explorer_temp_folder": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"notepad_edit_save": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_explorer_file_workflow": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
},
"terminal_sentinel_file": {
"total": 2,
"passed": 2,
"verified": 2,
"success": 2,
"cleanup_success": 2
}
}Cleanup verified: no Calculator, benchmark Notepad, benchmark Explorer, or benchmark Browser windows remained. Raw artifact: |
Summary
Adds reliable Windows GUI automation surfaces that let agents address UI elements semantically and verify intended state changes instead of treating low-level OS dispatch success as task success.
Windows driver surfaces
get_window_stateelement records with stable geometry and metadata while preserving legacy fields.get_element_geometryfor cached element-token/index geometry lookups.find_elementfor focused semantic element queries by label, role, automation id, class name, and text.click_verified, a verified click transaction with pre/post UIA snapshots and explicit expected-label predicates.set_value_verified, a verified text/value transaction that wrapsset_valueand verifies the requested value/label appears in post-state.added_labels/removed_labels,added_texts/removed_texts, plus total counts) so callers get a bounded explanation of what changed without dumping the full UI tree.Safety / reliability
os_dispatch_successfromstate_changed,verified,expected_change_satisfied, and finalsuccessin verified action tools.Verification
Source checks on branch
pr/windows-10x-agent-runtimeat705585d:Harness/unit checks:
Runtime smoke:
Result:
Covers:
find_element_geometry_smokeclick_verified_smokeset_value_verified_smokePost-smoke cleanup:
{"Calculator": [], "cua-mcp-bench-setv": []}Latest raw smoke artifact:
Additional local runtime samples also passed earlier:
Diff stat
Commit stack
Notes for reviewers
successin verified tools meansos_dispatch_success && verified; primitive dispatch success alone is exposed separately.Summary by CodeRabbit
New Features
Improvements
Tests