Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741
Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741jkmassel wants to merge 11 commits into
.strings handling: unquoted strings, inline comments, and merge prefixing#741Conversation
`StringsFileValidationHelper.find_duplicated_keys` tokenized only quoted keys, so a valid ASCII-plist file using unquoted strings — `CFBundleName = WordPress;`, common in `InfoPlist.strings` — raised `Invalid character`. The state machine now also tokenizes unquoted keys and values, accepting the same character set `plutil` allows for unquoted strings (alphanumerics plus `_ . - $ : /`). The scanner only ever runs on input `plutil` has already accepted, so matching its grammar keeps a parseable file from tripping it.
`L10nHelper.strings_file_type` gains an opt-in `assume_valid:` flag that skips the `plutil -lint` validity check when the caller has already parsed the file, avoiding a redundant `plutil` invocation. Default behavior is unchanged for every existing caller.
`ios_lint_localizations` open-coded a `strings_file_type == :text ? scan : warn` gate. That logic now lives in `StringsFileValidationHelper.scan_for_duplicate_keys`, which detects the file format and returns a `[:scanned | :unsupported_format | :unscannable, payload]` tri-state; each caller maps it to its own policy. As a side effect, `ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a valid-but-not-flat-`.strings` file — it warns via `UI.important` and skips it.
`StringsFileValidationHelper`'s tokenizer only recognized comments in its `:root` context, so a comment placed between the tokens of a statement — `"key" /* note */ = "value";`, `"key" = /* note */ "value";`, or `"key" = "value" /* note */;` — raised `Invalid character` even though `plutil` accepts all three. The scanner now carries a `resume_context` so a comment returns to the state it interrupted instead of always dropping back to `:root`. The one ambiguous position — a `/` right after `=`, which could begin a comment or a `/usr/bin`-style unquoted value — is disambiguated one character later via a new `:maybe_comment_or_value` state, so `/`-leading unquoted values still parse.
`L10nHelper.merge_strings` rewrites each key to add the caller's prefix, but its line matcher only handled `[A-Z0-9_]` keys with a *quoted* value. An unquoted key containing `. - $ : /`, or any line with an *unquoted* value (`CFBundleName = WordPress;`), was written to the merged file without the prefix while still being bookkept *with* it — an inconsistency that can resurface the collisions the prefix avoids and break downstream key extraction. The matcher now accepts the full unquoted-string character set `plutil` allows. The unquoted-value case keys on the fact that an unquoted string can't contain spaces, so it won't mistake comment prose (`and = a red herring ...`) for a key/value pair — a case the fixtures deliberately guard.
.strings duplicate-key detection (unquoted values, shared scan helper).strings handling: unquoted strings, inline comments, and merge prefixing
A temporal localization guardrail: compares two versions of a base-language `.strings` file and fails when an existing key's value changes its format placeholders (count, position, or argument type), which silently breaks every translation filed under that key. - `StringPlaceholdersHelper` — a pure-Ruby, platform-agnostic primitive that reduces a value's printf/`NSString` specifiers to a canonical signature. - `ios_lint_localization_placeholder_changes` — the iOS action wrapping it, with fail-closed guards for duplicate keys and mixed positional/non-positional specifiers. Builds on the `.strings` parsing and `scan_for_duplicate_keys` helper from #741.
These specs fail against the current `merge_strings`, pinning a gap left by PR #741: the duplicate-key scanner and `plutil`-based bookkeeping both accept `.strings` comments *between* a statement's tokens (e.g. `CFBundleName /* c */ = WordPress;`), but the line-based key rewrite only prefixes when `=` and the value are adjacent. A key behind such a comment is written unprefixed while bookkept *with* the prefix — resurfacing the very key collisions the prefix exists to prevent, as the second spec demonstrates by merging two files into a single collapsed key. --- Generated with the help of Claude Code, https://claude.com/claude-code Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Assert parsed keys so the regression captures the merge contract without depending on raw output formatting. --- Generated with the help of Codex, https://openai.com/codex Co-Authored-By: Codex GPT-5 <noreply@openai.com>
| duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path) | ||
| duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty? | ||
| else | ||
| status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path) |
| # The value is required to be a single unquoted token ending in `;` — an unquoted string can't contain spaces, so | ||
| # this won't mistake comment prose like `and = a red herring for when…` (spaces in the "value") for a key/value pair. | ||
| line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*)([a-zA-Z0-9_.$:/-]+\s*;)}u, "\\1\"#{prefix}\\2\"\\3\\4") | ||
| end |
There was a problem hiding this comment.
My AI noticed that merge_strings doesn't handle inter-token comments.
Could this logic handle the same inter-token comment placements that StringsFileValidationHelper now accepts? (https://github.com/wordpress-mobile/release-toolkit/pull/741/changes#diff-6e464dfe8ccd3974026649f8aaa02f9f8518261ef391f13bdb9535ef46d2b908R111-R116)
For example, CFBundleName /* c */ = WordPress; is accepted by plutil and by the duplicate-key scanner after this PR, but merge_strings still writes it unprefixed while bookkeeping it as prefixed.
That can reintroduce collisions in the merged file.
I added failing specs in #742 to pin the gap.
AI suggests: A tokenizer-backed rewrite would be ideal, but a comment-aware interim rewrite would also close the regression.
`merge_strings` prefixed keys with line-based regexes that assume the key, `=`, and value are adjacent, so an inter-token `.strings` comment — `AppName /* c */ = WordPress;` or `CFBundleName = WordPress /* trailing */;` — broke the match and the key was written to the merged file without the prefix, while the `plutil`-derived bookkeeping still recorded it *with* the prefix. The inconsistency could resurface the very collisions the prefix exists to avoid (two files sharing a key behind a comment merged to a single colliding key). Replace the regexes with `StringsFileValidationHelper.prefix_keys`, a comment-aware tokenizer that reuses the same state machine `find_duplicated_keys` uses: it prefixes a key wherever the tokenizer enters one, regardless of surrounding comments, and leaves `key = value`-looking text inside a comment alone. Bookkeeping and rewriting now agree by construction. Fixes the failing specs added in #742.
…alizations.rb Co-authored-by: Gio Lodi <gio@mokacoding.com>
|
Nicely done @jkmassel ! I got my agent to poke around more and it found this #750 To be honest, I can't tell whether that's a legit concern for our use case because If you need this addressed ASAP, then by all means let's merge and ship. But if you have the time to investigate further, then maybe it'd be nice to address? |
Summary
Related improvements to the existing
.stringshandling — the duplicate-key scanner (StringsFileValidationHelper) and themerge_stringshelper — extracted from #738 so the newios_lint_localization_placeholder_changesaction, which builds on these, can be reviewed on its own. This PR stands alone and is independently useful.The common thread: each change accepts more of the unquoted-string grammar
plutilitself allows, so legitimateInfoPlist.strings-style input — unquoted keys/values, inter-token comments — is handled instead of rejected.Changes
1. Parse unquoted
.stringsvalues and widen the unquoted grammarStringsFileValidationHelperalready tokenized unquoted keys; it now also tokenizes unquoted values and accepts the full character setplutilallows for unquoted strings (alphanumerics plus_ . - $ : /). So anInfoPlist.strings-style file likeCFBundleName = WordPress;parses instead of raisingInvalid character. The scanner only ever runs on inputplutilhas already accepted, so matching its grammar keeps a parseable file from tripping it; a genuinely un-tokenizable plist (e.g. a nested-dictionary value) still fails closed.2. Accept comments placed between a statement's tokens
StringsFileValidationHelperrecognized comments only in its top-level:rootcontext, so a comment between a statement's tokens —"key" /* note */ = "value";,"key" = /* note */ "value";, or"key" = "value" /* note */;— raisedInvalid character, even thoughplutilaccepts all three. The scanner now threads aresume_contextso a comment returns to the state it interrupted instead of always dropping back to:root. The lone ambiguous spot — a/right after=, which could begin a comment or a/usr/bin-style unquoted value — is disambiguated one character later via a new:maybe_comment_or_valuestate, so/-leading unquoted values still parse. ✅3. Centralize duplicate-key detection behind
scan_for_duplicate_keysios_lint_localizationsopen-coded astrings_file_type == :text ? scan : warngate. That logic now lives inStringsFileValidationHelper.scan_for_duplicate_keys, which detects the file format and returns a[:scanned | :unsupported_format | :unscannable, payload]tri-state; each caller maps it to its own policy. As a side effect,ios_lint_localizations'check_duplicate_keysno longer crashes the lane on a valid-but-not-flat-.stringsfile — it warns viaUI.importantand skips it.4.
assume_valid:to skip a redundantplutil -lintL10nHelper.strings_file_typegains an opt-inassume_valid:flag that skips theplutil -lintvalidity check when the caller has already parsed the file. Default behavior is unchanged for every existing caller. (Used by #738.)5. Apply the
merge_stringskey prefix to unquoted keys and valuesL10nHelper.merge_stringsrewrites each key to add the caller's prefix, but its line matcher only handled[A-Z0-9_]keys with a quoted value. An unquoted key containing. - $ : /(e.g. a reverse-DNS key), or any line with an unquoted value (CFBundleName = WordPress;), was written to the merged file without the prefix — while still being bookkept with it. That inconsistency can resurface the very collisions the prefix exists to avoid and break downstream key extraction. The matcher now accepts the full unquoted-string setplutilallows; the unquoted-value case keys on the fact that an unquoted string can't contain spaces, so it won't mistake comment prose likeand = a red herring …for a key/value pair (a case the fixtures deliberately guard). ✅Test Plan
bundle exec rubocop— no violations.bundle exec rspec— 914 examples, 0 failures; new coverage for unquoted values, the widened key set,scan_for_duplicate_keys, the graceful warn-and-skip,assume_valid, inter-token comments, andmerge_stringsunquoted-key/value prefixing.Related
ios_lint_localization_placeholder_changesaction #738 will be rebased to depend on this.