Grapheme-cluster aware cursor, hit-testing and soft wrap on top of #5922 #5988
xyos wants to merge 20 commits into
… calculations # Conflicts: # src/virtual_renderer.js
… rendering
- add getGraphemeCluster/getGraphemeBoundaries helpers (Intl.Segmenter) to lang
- snap cursor to grapheme cluster boundaries in moveCursorTo (ZWJ, combining marks)
- use fontMetrics.textWidth in textToScreenCoordinates instead of column * charWidth
- iterate grapheme clusters in $pixelToColumn hit-testing; treat span seams inclusively; clamp clicks left of line start to column 0
- stop rendering U+200D as invalid invisible so ZWJ emoji stay intact (ajaxorg#5813)
- render line-start RLE as plain text when rtl/rtlText enabled instead of red dot (ajaxorg#5423)
- add experiments/widthchar.html browser test harness for variable-width rendering covering ajaxorg#460 ajaxorg#4142 ajaxorg#5431 ajaxorg#5813 ajaxorg#4602 ajaxorg#5436 ajaxorg#5423 ajaxorg#3753 ajaxorg#3866 ajaxorg#3617
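For readers who have not opened the diff, the boundary helper and the cursor snap described in this commit can be pictured roughly as below. This is a minimal sketch, not the PR's code: `graphemeBoundaries` and `snapColumn` are hypothetical names, and snapping backwards to the cluster start is an assumption (the real `moveCursorTo` may pick a direction differently).

```javascript
// Sketch only: hypothetical stand-ins for the lang helpers named above.
// Returns the code-unit offsets at which grapheme clusters start or end.
function graphemeBoundaries(text) {
    var boundaries = [0];
    if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
        var segmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});
        for (var part of segmenter.segment(text))
            boundaries.push(part.index + part.segment.length);
        return boundaries;
    }
    // Fallback without Intl.Segmenter: only surrogate pairs stay together.
    var i = 0;
    while (i < text.length) {
        var high = text.charCodeAt(i);
        var low = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
        var isPair = high >= 0xD800 && high <= 0xDBFF && low >= 0xDC00 && low <= 0xDFFF;
        i += isPair ? 2 : 1;
        boundaries.push(i);
    }
    return boundaries;
}

// Assumption: a column inside a cluster snaps back to the cluster start.
function snapColumn(text, column) {
    var snapped = 0;
    for (var boundary of graphemeBoundaries(text)) {
        if (boundary > column) break;
        snapped = boundary;
    }
    return snapped;
}

// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy: 11 UTF-16 code units.
var family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
var line = "a" + family + "b";
```

With `Intl.Segmenter` available, a column landing anywhere inside the family emoji in `line` snaps back to the offset just after `a`, and a column past the end of the line clamps to the line length.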
- $getDisplayTokens marks continuation code units of grapheme clusters (surrogate pairs, combining marks, ZWJ sequences) with CHAR_CONT so wrap-split candidates can see cluster boundaries
- $computeWrapSplits/addSplit moves any split falling inside a cluster back to the cluster start, or past it when the cluster is wider than the wrap limit, so soft wrap never tears a grapheme apart
- unit tests for ZWJ/combining-mark/surrogate wrap splits
- widthchar harness: wrapped editor with emoji/CJK/hebrew lines checking splits land on grapheme boundaries, caret round-trips, and arrow walks
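The split rule in the second bullet can be illustrated in isolation. The sketch below is an assumption-laden simplification: `adjustSplit` and `clusterBoundaries` are hypothetical names, offsets are plain UTF-16 code units, and the real `$computeWrapSplits` works on display tokens rather than on the raw string.

```javascript
// Sketch only: hypothetical helpers, not the code in edit_session.
function clusterBoundaries(text) {
    var segmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});
    var result = [0];
    for (var part of segmenter.segment(text))
        result.push(part.index + part.segment.length);
    return result;
}

// rowStart: offset where the current screen row begins.
// split: candidate split offset proposed by the wrap limit.
function adjustSplit(text, rowStart, split) {
    var boundaries = clusterBoundaries(text);
    var clusterStart = 0;
    var clusterEnd = text.length;
    for (var i = 0; i < boundaries.length; i++) {
        if (boundaries[i] === split) return split; // already on a boundary
        if (boundaries[i] < split) {
            clusterStart = boundaries[i];
        } else {
            clusterEnd = boundaries[i];
            break;
        }
    }
    // Move back to the cluster start; if that would leave the row empty
    // (the cluster alone is wider than the limit), move past the cluster.
    return clusterStart > rowStart ? clusterStart : clusterEnd;
}

var family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
var text = "ab" + family + "cd"; // cluster occupies offsets 2..13
```

For `text`, a candidate split at offset 5 on a row starting at 0 moves back to 2, while the same candidate on a row that already starts at 2 moves forward to 13 so the row is never empty.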
One of the public type files has been updated, please make sure there are no backwards incompatible changes done in the PR.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5988      +/-   ##
==========================================
+ Coverage   93.73%   93.90%   +0.17%
==========================================
  Files         640      641       +1
  Lines      137743   138873    +1130
  Branches    14487    14802     +315
==========================================
+ Hits       129113   130411    +1298
+ Misses       8630     8462     -168
Screen columns now count grapheme clusters instead of UTF-16 code units, so a ZWJ emoji occupies one logical column instead of eleven:
- lang.mayContainGraphemeClusters: cheap regexp gate (marks, ZWJ, variation selectors, astral code points, conjoining jamo) so plain ASCII/CJK/hebrew lines skip segmentation entirely
- $getStringScreenWidth walks grapheme boundaries; doc<->screen mapping (documentToScreenPosition/screenToDocumentPosition) follows since both funnel through it
- $getDisplayTokens emits one display token per cluster and carries a parallel docLengths array; $computeWrapSplits consumes it so wrap budget is measured in visual columns and doc split offsets stay in code units; CHAR_CONT hack removed
- text layer $renderToken returns grapheme column counts and computes tab stops from grapheme columns
- font_metrics $findColumnPosition/$pixelToColumn map screen columns to text-node offsets via grapheme boundaries
- widthchar harness: graphemeColumns check; updated wrap/surrogate test expectations to the new column semantics
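To make the gate-then-segment idea concrete, here is a rough sketch. The regular expression is a deliberately simplified assumption covering only the classes listed in the first bullet (it is not the PR's actual pattern, and it misses further cluster-forming characters), and `screenColumns` is a hypothetical name that ignores tabs and full-width characters.

```javascript
// Sketch only: simplified gate, NOT the regexp used in lang.
// Covers marks (incl. variation selectors), ZWNJ/ZWJ, conjoining jamo
// and astral code points; anything else takes the fast path.
var MAY_CLUSTER = /[\p{M}\u200C\u200D\u1100-\u11FF\u{10000}-\u{10FFFF}]/u;

var graphemeSegmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});

// Hypothetical helper: counts one column per grapheme cluster.
function screenColumns(text) {
    if (!MAY_CLUSTER.test(text))
        return text.length; // fast path: one code unit per column
    var columns = 0;
    for (var part of graphemeSegmenter.segment(text))
        columns++;
    return columns;
}

var family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
```

Under these assumptions `screenColumns(family)` is 1 although `family.length` is 11, and a plain ASCII line never reaches the segmenter. Both paths have to agree on what forms a cluster, otherwise the same line ends up with two column systems.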
Pushed a third commit (423a67e) completing the column model rework: 1 grapheme cluster = 1 screen column. Screen columns now count grapheme clusters instead of UTF-16 code units, which addresses the "column must be column" model discussed in #5431.
Verified in Chrome via the |
Found by benchmarking against the pre-grapheme baseline:
- $getStringScreenWidth: iterate the segmenter lazily (lang.forEachGrapheme) when maxScreenColumn is bounded so the early exit stays O(maxScreenColumn); bounded query on a 54k-unit emoji line: 3.7ms -> 0.03ms per call. Without a bound use countGraphemes/getGraphemeBoundaries (no per-cluster callback).
- lang.getGraphemeBoundaries: memoize the last few segmentations; cursor rendering and doc<->screen mapping repeatedly segment equal line prefixes (documentToScreenPosition on a 54k-unit emoji line: 8.1ms -> 0.2ms per call)
- $renderToken: segment the token value once and reuse a monotonic boundary cursor for tab stops instead of re-segmenting the prefix per tab (9.6k-unit emoji TSV line render: 356ms -> 3ms)
- $pixelToColumn: gate per-text-node segmentation on mayContainGraphemeClusters
- document why line-start RLE renders as plain text only in rtl modes
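The bounded early exit in the first bullet might look roughly like this. It is a sketch under assumptions: `offsetOfColumn` is a hypothetical name (not `lang.forEachGrapheme`), and it relies on the `Intl.Segmenter` iterator yielding segments on demand rather than segmenting the whole line up front.

```javascript
// Sketch only: hypothetical bounded walk over grapheme clusters.
var lazySegmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});

// Returns the code-unit offset reached after at most maxColumn clusters.
function offsetOfColumn(text, maxColumn) {
    if (maxColumn <= 0) return 0;
    var column = 0;
    var offset = 0;
    for (var part of lazySegmenter.segment(text)) {
        offset = part.index + part.segment.length;
        if (++column >= maxColumn) break; // stop without visiting the rest
    }
    return offset;
}

var family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
var longLine = family.repeat(5000); // 55,000 code units
```

Asking for column 3 of `longLine` visits three clusters and returns offset 33; the remaining clusters are never iterated by this loop, which is the property the benchmark numbers above depend on.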
- mayContainGraphemeClusters missed cluster-forming characters outside
\p{M}: prepend chars (arabic number signs U+0600-0605 U+06DD U+070F
U+0890 U+0891 U+08E2, malayalam dot reph U+0D4E), spacing marks
(thai/lao AM U+0E33 U+0EB3), ZWNJ and halfwidth katakana voicing marks.
Gated fast paths counted these as code units while ungated paths counted
clusters, giving two different column systems for the same line.
Verified against an exhaustive BMP scan of Intl.Segmenter behavior.
- text layer: align token boundaries to grapheme cluster boundaries before
  rendering so a tokenizer splitting inside a
cluster (keycap emoji digit+FE0F+20E3) cannot desync rendered spans
from the session's whole-line segmentation
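As an illustration of the token alignment described in the last bullet, the sketch below pushes every token end forward to the next cluster boundary of the whole line. The name `alignTokensToGraphemes`, the `{type, value}` token shape used here, and the choice to extend the earlier token (rather than shrink it) are assumptions for the example, not details taken from the diff.

```javascript
// Sketch only: hypothetical alignment pass over a line's tokens.
var tokenSegmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});

function alignTokensToGraphemes(tokens) {
    var line = tokens.map(function(token) { return token.value; }).join("");
    var boundaries = new Set([0]);
    for (var part of tokenSegmenter.segment(line))
        boundaries.add(part.index + part.segment.length);

    var aligned = [];
    var start = 0; // offset where the next output token begins
    var end = 0;   // end offset of the current input token
    for (var token of tokens) {
        end += token.value.length;
        // Extend the token until it ends on a cluster boundary.
        var snapped = Math.max(end, start);
        while (!boundaries.has(snapped)) snapped++;
        // Tokens fully absorbed by the previous one are dropped.
        if (snapped > start)
            aligned.push({type: token.type, value: line.slice(start, snapped)});
        start = snapped;
    }
    return aligned;
}

// A tokenizer that split the keycap emoji 1 + U+FE0F + U+20E3 after the digit.
var tokens = [
    {type: "constant.numeric", value: "1"},
    {type: "text", value: "\uFE0F\u20E3 ok"}
];
var aligned = alignTokensToGraphemes(tokens);
```

Here the numeric token grows to hold the full keycap sequence and the text token shrinks to `" ok"`, so every rendered span starts and ends on a boundary that whole-line segmentation also reports.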
Ran a multi-agent adversarial review of the branch (4 review dimensions: correctness, consumer regressions, performance, edge cases), with each finding verified by re-running executable repros against both this branch and the pre-grapheme baseline.
Confirmed and fixed (5dc32fb, 1d8fb13)
Performance (measured, worst-case scenarios):
Correctness:
Verified as sound by the review (with executable repros)
Known trade-off (documented in code, feedback welcome)
Rendering a line-start RLE (U+202B) as plain text when
Remaining pre-existing gaps (unchanged from base):
Builds on #5922 (initial range.getClientRects()-based text measurement). This adds two commits on top of that branch that make the cursor/column model and soft wrap grapheme-cluster aware and fix related rendering issues found while testing against a real browser.
Changes on top of #5922
Grapheme-cluster cursor model & pixel-accurate coordinates (50aff8d)
- lang.getGraphemeCluster() / lang.getGraphemeBoundaries() helpers using Intl.Segmenter (grapheme granularity), with a surrogate-pair fallback when unavailable.
- Selection.moveCursorTo snaps to grapheme cluster boundaries instead of only surrogate pairs, so arrow keys traverse ZWJ emoji (👨‍👩‍👧‍👦) and combining marks (ã̤) in one step.
- VirtualRenderer.textToScreenCoordinates uses fontMetrics.textWidth() instead of column × characterWidth, so forward and inverse coordinate mappings agree.
- FontMetrics.$pixelToColumn hit-testing iterates real grapheme clusters, treats span seams inclusively (fixes clicks with showInvisibles falling between spans), and clamps clicks left of line start to column 0.
- Line-start RLE renders as plain text when rtl/rtlText is enabled (fixes ace.js: placing a red dot when clicking "enter" #5423, same approach as Fix: make RTL new line a valid character #5434); Trojan-Source highlighting still applies outside RTL mode.
Grapheme-aware soft wrap (5d2bc1d)
- $getDisplayTokens marks continuation code units of grapheme clusters with a new CHAR_CONT display token (only for lines containing chars ≥ U+0300).
- $computeWrapSplits never splits inside a cluster: a split landing mid-cluster moves back to the cluster start, or past the cluster when it is wider than the wrap limit. Before this, a wrap limit crossing 👨‍👩‍👧‍👦 split its 11 code units across screen rows.
Issues this addresses
#460, #4142, #5431, #5813, #4602, #5423, #5436, #3753, #3866, #3617
Testing
- experiments/widthchar.html: a browser harness with monospace/proportional/RTL/wrapped editors over pathological lines (BMP & ZWJ emoji, combining marks, CJK, tabs, Hebrew, U+FE0F, APL fallback glyphs). It verifies cursor DOM position against Range ground truth, textToScreen→screenToText round-trips at every grapheme boundary, arrow-walk stops, selection-marker coverage, wrap-split boundaries, and the RTL red-dot/backspace scenarios: 2,400+ assertions passing in Chrome, 0 failures (baseline before these commits: ~280 failures).
- wrapLine split never breaks grapheme clusters in edit_session_test.js; full suite unchanged otherwise (1,480 passing).
Testing Page
Open kitchen-sink @ 1d8fb139b0e049a066490d7ebf07015e4565e088
Open kitchen-sink @ 3d5b9198934ec623b161b2d55faa5b3390c0dbd7