diff --git a/CHANGELOG.md b/CHANGELOG.md index 5f693ce7f..691391c0f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,12 +14,19 @@ _None_ ### Bug Fixes +- `StringsFileValidationHelper.find_duplicated_keys` now parses *unquoted* keys and values (valid `.strings` syntax, common in `InfoPlist.strings`) instead of raising `Invalid character`, matching the character set `plutil` accepts for unquoted strings (alphanumerics plus `_ . - $ : /`). This lets `ios_lint_localizations`' `check_duplicate_keys` work on `InfoPlist.strings`-style files. [#741] +- `StringsFileValidationHelper.find_duplicated_keys` now also accepts comments placed *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), which `plutil` allows, instead of raising `Invalid character` on the `/`. [#741] +- `ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a file that parses as a property list but isn't a flat `.strings` the scanner can tokenize (e.g. a `` value); it now warns via `UI.important` and skips that file. [#741] +- `L10nHelper.merge_strings` now applies the key prefix via a comment-aware tokenizer, so it correctly prefixes unquoted keys containing `. - $ : /`, lines with an *unquoted* value (e.g. `CFBundleName = WordPress;`), and keys sitting behind an inter-token `.strings` comment (e.g. `CFBundleName /* note */ = WordPress;`) — matching the grammar `plutil` accepts. Previously the line-based matcher left those keys written to the merged file without the prefix while still bookkeeping them *with* it, leaving the output inconsistent with the reported keys (which could resurface the very collisions the prefix avoids and break downstream key extraction). [#741] +- `StringsFileValidationHelper.find_duplicated_keys` (and `scan_for_duplicate_keys`) now tokenize dictionary- and array-valued entries (`"k" = { … };`, `"k" = ( … );`, nesting allowed), skipping the container body — so a `:text` file that uses them is scanned for duplicate *top-level* keys instead of returning `:unscannable`. [#750] +- `L10nHelper.merge_strings` now prefixes the outer key of a dictionary/array-valued entry and copies the value through verbatim, instead of crashing on it — valid input `plutil` accepts that the flat-`.strings` tokenizer previously couldn't rewrite. If a file still holds a construct the tokenizer can't rewrite, its lines are copied through unprefixed with a `UI.important` warning (and its keys are bookkept unprefixed to match, so a genuine cross-file collision is still reported rather than silently collapsed) instead of aborting the whole merge. [#750] - Bump `faraday` and `nokogiri` to address security vulnerabilities. [#749] - Bump `concurrent-ruby` to address CVE-2026-54904 / GHSA-h8w8-99g7-qmvj. [#751] ### Internal Changes -_None_ +- Centralized `.strings` duplicate-key detection behind `StringsFileValidationHelper.scan_for_duplicate_keys`, which detects the file format and returns a `[:scanned | :unsupported_format | :unscannable, payload]` tri-state that callers map to their own policy (warn-and-skip vs fail-closed). `ios_lint_localizations` now uses it. [#741] +- `L10nHelper.strings_file_type` (and `StringsFileValidationHelper.scan_for_duplicate_keys`) accept `assume_valid:` to skip a redundant `plutil -lint` when the caller has already parsed the file. [#741] ## 14.9.0 diff --git a/lib/fastlane/plugin/wpmreleasetoolkit/actions/ios/ios_lint_localizations.rb b/lib/fastlane/plugin/wpmreleasetoolkit/actions/ios/ios_lint_localizations.rb index 42b051d55..c93481502 100644 --- a/lib/fastlane/plugin/wpmreleasetoolkit/actions/ios/ios_lint_localizations.rb +++ b/lib/fastlane/plugin/wpmreleasetoolkit/actions/ios/ios_lint_localizations.rb @@ -56,15 +56,20 @@ def self.find_duplicated_keys(params) language = File.basename(File.dirname(file), '.lproj') path = File.join(params[:input_dir], file) - file_type = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: path) - if file_type == :text - duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path) - duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty? - else + status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path) + case status + when :scanned + duplicate_keys[language] = payload.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless payload.empty? + when :unsupported_format UI.important <<~WRONG_FORMAT - File `#{path}` is in #{file_type} format, while finding duplicate keys only make sense on files that are in ASCII-plist format. - Since your files are in #{file_type} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call. + File `#{path}` is in #{payload} format, while finding duplicate keys can only occur on files that are in ASCII-plist format. + Since your files are in #{payload} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call. WRONG_FORMAT + when :unscannable + UI.important <<~UNSCANNABLE + Could not check `#{path}` for duplicate keys: #{payload.strip} + The file parses as a property list but isn't a flat `.strings` file the duplicate-key scanner understands, so duplicate detection was skipped for it. + UNSCANNABLE end end diff --git a/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_l10n_helper.rb b/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_l10n_helper.rb index 26ed9ef66..5f6a63792 100644 --- a/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_l10n_helper.rb +++ b/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_l10n_helper.rb @@ -15,18 +15,24 @@ class L10nHelper # Returns the type of a `.strings` file (XML, binary or ASCII) # # @param [String] path The path to the `.strings` file to check + # @param [Boolean] assume_valid Skip the `plutil -lint` validity check when the caller has already + # confirmed the file parses (e.g. via `read_strings_file_as_hash`), avoiding a redundant + # `plutil` invocation. Only the format detection (`file`) then runs. # @return [Symbol] The file format used by the `.strings` file. Can be one of: # - `:text` for the ASCII-plist file format (containing typical `"key" = "value";` lines) # - `:xml` for XML plist file format (can be used if machine-generated, especially since there's no official way/tool to generate the ASCII-plist file format as output) # - `:binary` for binary plist file format (usually only true for `.strings` files converted by Xcode at compile time and included in the final `.app`/`.ipa`) # - `nil` if the file does not exist or is neither of those format (e.g. not a `.strings` file at all) # - def self.strings_file_type(path:) + def self.strings_file_type(path:, assume_valid: false) return :text if File.empty?(path) # If completely empty file, consider it as a valid `.strings` files in textual format - # Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file) - _, status = Open3.capture2('/usr/bin/plutil', '-lint', path) - return nil unless status.success? + # Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file). + # A caller that has already parsed the file can skip this redundant check via `assume_valid: true`. + unless assume_valid + _, status = Open3.capture2('/usr/bin/plutil', '-lint', path) + return nil unless status.success? + end # If it is a valid property-list file, determine the actual format used format_desc, status = Open3.capture2('/usr/bin/file', path) @@ -72,6 +78,11 @@ def self.read_utf8_lines(file) # @note The method is able to handle input files which are using different encodings, # guessing the encoding of each input file using the BOM (and defaulting to UTF8). # The generated file will always be in utf-8, by convention. + # @note Dictionary- and array-valued entries (`"k" = { … };`, `"k" = ( … );`, nesting allowed) are + # prefixed on their outer key with the value preserved verbatim. If a file still holds some + # construct the tokenizer can't rewrite, its lines are copied through unprefixed with a warning + # (and its keys are then bookkept unprefixed too, so the reported duplicates stay accurate) + # rather than aborting the whole merge. # # @raise [RuntimeError] If one of the paths provided is not in text format (but XML or binary instead), or if any of the files are missing. # @@ -88,20 +99,37 @@ def self.merge_strings(paths:, output_path:) raise "The file `#{input_file}` does not exist or is of unknown format." if fmt.nil? raise "The file `#{input_file}` is in #{fmt} format but we currently only support merging `.strings` files in text format." unless fmt == :text - string_keys = read_strings_file_as_hash(path: input_file).keys.map { |k| "#{prefix}#{k}" } - duplicates += (string_keys & all_keys_found) # Find duplicates using Array intersection, and add those to duplicates list - all_keys_found += string_keys + raw_keys = read_strings_file_as_hash(path: input_file).keys tmp_file.write("/* MARK: - #{File.basename(input_file)} */\n\n") - # Read line-by-line to reduce memory footprint during content copy - read_utf8_lines(input_file).each do |line| - unless prefix.nil? || prefix.empty? - # The `/u` modifier on the RegExps is to make them UTF-8 - line.gsub!(/^(\s*")/u, "\\1#{prefix}") # Lines starting with a quote are considered to be start of a key; add prefix right after the quote - line.gsub!(/^(\s*)([A-Z0-9_]+)(\s*=\s*")/ui, "\\1\"#{prefix}\\2\"\\3") # Lines starting with an identifier followed by a '=' are considered to be an unquoted key (typical in InfoPlist.strings files for example) - end - tmp_file.write(line) + # Add the prefix to every key. We tokenize via `StringsFileValidationHelper.prefix_keys` rather than + # matching keys with a line-based regex, so that keys are found regardless of where `.strings` comments + # sit (e.g. `CFBundleName /* note */ = WordPress;`) and `key = value`-looking text inside a comment is + # left alone. It also handles dictionary/array values (`"k" = { … };`) — prefixing the outer key and + # copying the value verbatim. + lines = read_utf8_lines(input_file) + applied_prefix = prefix + begin + lines = Fastlane::Helper::Ios::StringsFileValidationHelper.prefix_keys(lines: lines, prefix: prefix) + rescue StandardError => e + # `plutil` may still accept a construct the tokenizer can't rewrite: it parses fine (so the file + # clears the `:text` gate above) yet `prefix_keys` raises on it. Fail soft: copy this file's lines + # through unprefixed rather than aborting the whole merge — mirroring the scanner path, where + # `scan_for_duplicate_keys` returns `:unscannable` instead of crashing the lane. `lines` is untouched + # by the raise (the assignment above never completes), so it still holds the original file contents, + # and `applied_prefix` records that the keys went out *unprefixed* so the bookkeeping below matches. + applied_prefix = '' + UI.important("Could not add prefix `#{prefix}` to the keys in `#{input_file}` (#{e.message}); copying its lines through unprefixed.") end + + # Bookkeep the keys as they were actually written — prefixed, or unprefixed on the fail-soft path. + # Doing this *after* the rewrite keeps the reported duplicates consistent with the merged file even + # when prefixing fell back, so a genuine collision is still surfaced rather than silently collapsed. + string_keys = raw_keys.map { |k| "#{applied_prefix}#{k}" } + duplicates += (string_keys & all_keys_found) # Find duplicates using Array intersection, and add those to duplicates list + all_keys_found += string_keys + + lines.each { |line| tmp_file.write(line) } tmp_file.write("\n") end tmp_file.close # ensure we flush the content to disk diff --git a/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb b/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb index 890b8138a..5a0144e28 100644 --- a/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb +++ b/lib/fastlane/plugin/wpmreleasetoolkit/helper/ios/ios_strings_file_validation_helper.rb @@ -5,24 +5,101 @@ module Helper module Ios class StringsFileValidationHelper # context can be one of: - # :root, :maybe_comment_start, :in_line_comment, :in_block_comment, - # :maybe_block_comment_end, :in_quoted_key, - # :after_quoted_key_before_eq, :after_quoted_key_and_equal, - # :in_quoted_value, :after_quoted_value - State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key) + # :root, :maybe_comment_start, :maybe_comment_or_value, :in_line_comment, :in_block_comment, + # :maybe_block_comment_end, :in_quoted_key, :in_unquoted_key, + # :after_quoted_key_before_eq, :after_quoted_key_and_eq, + # :in_quoted_value, :in_unquoted_value, :after_quoted_value + # + # `resume_context` holds the context to return to once a comment ends. Comments are valid not only at + # the top level but also *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), so a + # comment must resume the state it interrupted rather than always dropping back to `:root`. + # `depth` tracks how deeply nested we are inside a container value (`{ … }` / `( … )`); see `:in_container_value`. + State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key, :resume_context, :depth) + + # Characters allowed in an *unquoted* string — a key or a value. Unquoted strings are valid + # `.strings` syntax (the old-style ASCII property-list format) and are common in `InfoPlist.strings` + # (e.g. `CFBundleName = WordPress;`). `plutil` accepts alphanumerics plus `_ . - $ : /` in an + # unquoted string, so we match the same set: this scanner only ever runs on input `plutil` has + # already accepted, and matching its grammar keeps a file it parses from tripping the scanner. + # An unquoted key runs until the first whitespace or `=`; an unquoted value until whitespace or `;`. + UNQUOTED_STRING_CHARACTER = %r{[a-zA-Z0-9_.$:/-]}u + + # Enter a comment from an inter-token position, remembering where to resume once it ends. + # (`state.context` is still the originating state when a transition lambda runs — it's only reassigned + # to the lambda's return value afterwards — so this captures the state the comment interrupts.) + ENTER_COMMENT = lambda do |state, _c| + state.resume_context = state.context + :maybe_comment_start + end + + # A `/` between `=` and the value is ambiguous: it can start a comment (`/* … */` or `// …`) or be the + # first character of an unquoted value (e.g. a path like `/usr/bin`). Defer the decision by one character + # via `:maybe_comment_or_value`. + ENTER_COMMENT_OR_VALUE = lambda do |state, _c| + state.resume_context = state.context + :maybe_comment_or_value + end + + # Restore the context a comment interrupted (defaults to `:root`), then clear the saved context. + RESUME_AFTER_COMMENT = lambda do |state, _c| + resume = state.resume_context || :root + state.resume_context = :root + resume + end + + # A value can be a nested container — a dictionary `{ … }` or an array `( … )` — which `plutil` accepts + # (e.g. `"k" = { a = b; };` or `"k" = ( "a", "b" );`). We don't rewrite or record anything *inside* a + # container: its inner keys are not top-level keys, and `prefix_keys` copies the value through verbatim. + # We only need to find the matching close delimiter, so we just count nesting depth. `OPEN_CONTAINER` + # doubles as the entry transition from `:after_quoted_key_and_eq` and as the nested-open transition. + OPEN_CONTAINER = lambda do |state, _c| + state.depth += 1 + :in_container_value + end + + # Close one level of nesting. The value is only finished — and we go back to expecting the terminating + # `;` — once depth returns to 0; otherwise we're still inside an outer container. + CLOSE_CONTAINER = lambda do |state, _c| + state.depth -= 1 + state.depth.zero? ? :after_quoted_value : :in_container_value + end + + # A `/` inside a container may start a comment — whose body can contain `{ } ( ) ; "` that must NOT count + # toward nesting — or just be an ordinary value character (e.g. a path). Defer the decision one char, + # resuming the container in either case. + ENTER_CONTAINER_COMMENT = lambda do |state, _c| + state.resume_context = :in_container_value + :maybe_container_comment + end TRANSITIONS = { root: { /\s/u => :root, '/' => :maybe_comment_start, - '"' => :in_quoted_key + '"' => :in_quoted_key, + # An unquoted key, e.g. `CFBundleName = "…";` as used by `InfoPlist.strings`. + UNQUOTED_STRING_CHARACTER => lambda do |state, c| + state.buffer.write(c) + :in_unquoted_key + end }, maybe_comment_start: { '/' => :in_line_comment, /\*/u => :in_block_comment }, + # Reached only from `:after_quoted_key_and_eq`, where a leading `/` might begin a comment or an + # unquoted value. A `*` or `/` confirms a comment; anything else means the `/` was the value's first + # character (values aren't buffered, so we just continue scanning the value). + maybe_comment_or_value: { + /\*/u => :in_block_comment, + '/' => :in_line_comment, + /./mu => lambda do |state, _c| + state.resume_context = :root + :in_unquoted_value + end + }, in_line_comment: { - "\n" => :root, + "\n" => RESUME_AFTER_COMMENT, /./u => :in_line_comment }, in_block_comment: { @@ -30,7 +107,7 @@ class StringsFileValidationHelper /./mu => :in_block_comment }, maybe_block_comment_end: { - '/' => :root, + '/' => RESUME_AFTER_COMMENT, /./mu => :in_block_comment }, in_quoted_key: { @@ -44,21 +121,78 @@ class StringsFileValidationHelper :in_quoted_key end }, + in_unquoted_key: { + # The key ends at the first whitespace or `=`. Whitespace still expects an `=` next; + # an `=` moves straight on to the value. + /[\s=]/u => lambda do |state, c| + state.found_key = state.buffer.string.dup + state.buffer = StringIO.new + c == '=' ? :after_quoted_key_and_eq : :after_quoted_key_before_eq + end, + UNQUOTED_STRING_CHARACTER => lambda do |state, c| + state.buffer.write(c) + :in_unquoted_key + end + }, after_quoted_key_before_eq: { + # A comment may sit between the key and the `=` (e.g. `"key" /* note */ = "value";`). + '/' => ENTER_COMMENT, /\s/u => :after_quoted_key_before_eq, '=' => :after_quoted_key_and_eq }, after_quoted_key_and_eq: { + # A `/` here may start a comment or an unquoted value (which can contain `/`); disambiguate one char + # later. This entry must precede `UNQUOTED_STRING_CHARACTER` below, which also matches `/`. + '/' => ENTER_COMMENT_OR_VALUE, /\s/u => :after_quoted_key_and_eq, - '"' => :in_quoted_value + '"' => :in_quoted_value, + # A container value — a dictionary `{ … }` or an array `( … )`, which may nest (e.g. `"k" = { a = b; };`). + /[{(]/u => OPEN_CONTAINER, + # An unquoted value, e.g. `CFBundleName = WordPress;` as used by `InfoPlist.strings`. + UNQUOTED_STRING_CHARACTER => :in_unquoted_value }, in_quoted_value: { '"' => :after_quoted_value, /./mu => :in_quoted_value }, + in_unquoted_value: { + # The value ends at the first whitespace or the terminating `;`. Its contents are irrelevant + # to duplicate-key detection, so — unlike a key — we don't buffer it. + ';' => :root, + /\s/u => :after_quoted_value, + UNQUOTED_STRING_CHARACTER => :in_unquoted_value + }, after_quoted_value: { + # A comment may sit between the value and the terminating `;` (e.g. `"key" = "value" /* note */;`). + '/' => ENTER_COMMENT, /\s/u => :after_quoted_value, ';' => :root + }, + # Inside a container value (`{ … }` / `( … )`). We ignore the contents — inner keys aren't top-level + # keys and the value is copied verbatim by `prefix_keys` — and only track nesting so we can find the + # matching close. Quoted strings and comments are entered explicitly because their bodies can contain + # `{ } ( ) ;` that must not affect the depth count; everything else (`=`, `;`, `,`, whitespace, unquoted + # text, newlines) is consumed by the catch-all, which must stay LAST so the specific keys win first. + in_container_value: { + /[{(]/u => OPEN_CONTAINER, + /[})]/u => CLOSE_CONTAINER, + '"' => :in_container_quoted_string, + '/' => ENTER_CONTAINER_COMMENT, + /./mu => :in_container_value + }, + # A quoted string inside a container. Skipped wholesale (its `{ } ( ) ; ,` are literal, not structural) + # until the closing quote returns us to the container. Escapes are handled globally (see the escape + # branch in `find_duplicated_keys`, whose allow-list includes this context). + in_container_quoted_string: { + '"' => :in_container_value, + /./mu => :in_container_quoted_string + }, + # One char after a `/` inside a container: `*`/`/` confirm a comment (which resumes the container once + # it ends), anything else means the `/` was just a value character and we stay in the container. + maybe_container_comment: { + /\*/u => :in_block_comment, + '/' => :in_line_comment, + /./mu => :in_container_value } }.freeze @@ -70,7 +204,7 @@ class StringsFileValidationHelper def self.find_duplicated_keys(file:) keys_with_lines = Hash.new { |h, k| h[k] = [] } - state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil) + state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil, resume_context: :root, depth: 0) # Using our `each_utf8_line` helper instead of `File.readlines` ensures we can also read files that are # encoded in UTF-16, yet process each of their lines as a UTF-8 string, so that `RegExp#match?` don't throw @@ -81,7 +215,7 @@ def self.find_duplicated_keys(file:) # This is more straightforward than having to account for it in the `TRANSITIONS` table. if state.in_escaped_ctx || c == '\\' # Just because we check for escaped characters at the global level, it doesn't mean we allow them in every context. - allowed_contexts_for_escaped_characters = %i[in_quoted_key in_quoted_value in_block_comment in_line_comment] + allowed_contexts_for_escaped_characters = %i[in_quoted_key in_quoted_value in_block_comment in_line_comment in_container_quoted_string] raise "Found escaped character outside of allowed contexts on line #{line_no + 1} (current context: #{state.context})" unless allowed_contexts_for_escaped_characters.include?(state.context) state.buffer.write(c) if state.context == :in_quoted_key @@ -106,6 +240,75 @@ def self.find_duplicated_keys(file:) keys_with_lines.keep_if { |_, lines| lines.count > 1 } end + + # Detects the file format and, when applicable, scans for duplicate keys — in one step, so + # callers don't each re-implement the "`:text`-only" gate. `find_duplicated_keys` only + # understands the flat ASCII-plist syntax; an xml/binary plist can't be tokenized by it (though + # `plutil` collapses any duplicate to its last value when parsing those anyway). + # + # @param [String] file The path to the `.strings` file to inspect. + # @param [Boolean] assume_valid Forwarded to `strings_file_type`: skip the redundant `plutil -lint` + # when the caller has already confirmed the file parses. + # @return [Array] A `[status, payload]` pair, one of: + # - `[:scanned, { key => [lines] }]` — a `:text` file we tokenized (hash empty if none). + # - `[:unsupported_format, format]` — not a `:text` file (`:xml`, `:binary`, or `nil`); not scanned. + # - `[:unscannable, error_message]` — a `:text` file the tokenizer couldn't read. + # Each caller decides how to react (warn-and-skip, fail closed, …) from this one source of truth. + def self.scan_for_duplicate_keys(file:, assume_valid: false) + format = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: file, assume_valid: assume_valid) + return [:unsupported_format, format] unless format == :text + + [:scanned, find_duplicated_keys(file: file)] + rescue StandardError => e + [:unscannable, e.message] + end + + # Rewrites `.strings` lines so every key carries `prefix`, leaving comments, values, whitespace and + # formatting untouched. A quoted key gets the prefix inside its quotes (`"key"` → `"key"`); an + # unquoted key is wrapped in quotes (`key` → `"key"`). Because it tokenizes the file the same way + # `find_duplicated_keys` does, it is comment-aware: a key sitting behind an inter-token comment (e.g. + # `key /* note */ = value;`) is still prefixed, and `key = value`-looking text *inside* a comment is left + # alone — a distinction a line-based regex can't reliably make. It is likewise container-aware: a + # dictionary or array value (`"k" = { … };` / `"k" = ( … );`, nesting allowed) has only its outer key + # prefixed, with the value — including any keys *inside* it — copied through verbatim. + # + # @param [Array] lines The file's lines, already decoded to UTF-8 (e.g. via `L10nHelper.read_utf8_lines`). + # @param [String] prefix The prefix to insert before every key. A nil/empty prefix returns `lines` unchanged. + # @return [Array] The rewritten lines. + def self.prefix_keys(lines:, prefix:) + return lines if prefix.nil? || prefix.empty? + + state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil, resume_context: :root, depth: 0) + lines.map do |line| + rewritten = +'' + line.each_char do |c| + # Escaped characters only occur inside quoted strings or comments — never around a key boundary — + # so they're copied through verbatim (mirroring `find_duplicated_keys`' global escape handling). + if state.in_escaped_ctx || c == '\\' + state.in_escaped_ctx = !state.in_escaped_ctx + rewritten << c + next + end + + previous_context = state.context + (_, transition) = TRANSITIONS[previous_context].find { |regex, _| c.match?(regex) } || [nil, nil] + raise "Invalid character `#{c}` found (current context: #{previous_context})" if transition.nil? + + state.context = transition.is_a?(Proc) ? transition.call(state, c) : transition + + if previous_context == :root && state.context == :in_quoted_key + rewritten << c << prefix # opening `"` of a quoted key — the prefix goes inside the quotes + elsif previous_context == :root && state.context == :in_unquoted_key + rewritten << '"' << prefix << c # first char of an unquoted key — open a quote + prefix, then the char + elsif previous_context == :in_unquoted_key && state.context != :in_unquoted_key + rewritten << '"' << c # the unquoted key just ended — close the quote, then the delimiter + else + rewritten << c + end + end + rewritten + end + end end end end diff --git a/spec/ios_l10n_helper_spec.rb b/spec/ios_l10n_helper_spec.rb index e9609240b..e7e492ebb 100644 --- a/spec/ios_l10n_helper_spec.rb +++ b/spec/ios_l10n_helper_spec.rb @@ -39,6 +39,14 @@ def file_encoding(path) expect(File).to exist(invalid_fixture) expect(described_class.strings_file_type(path: invalid_fixture)).to be_nil end + + it 'skips the redundant `plutil -lint` check when `assume_valid: true`, still detecting the format' do + text_fixture = fixture('Localizable-utf16.strings') + allow(Open3).to receive(:capture2).and_call_original + expect(Open3).not_to receive(:capture2).with('/usr/bin/plutil', '-lint', anything) + + expect(described_class.strings_file_type(path: text_fixture, assume_valid: true)).to eq(:text) + end end describe '#merge_strings' do @@ -92,6 +100,131 @@ def file_encoding(path) end end + it 'prefixes unquoted keys with unquoted values and keys containing `. - $ : /`' do + # Regression: prefixing must cover the full unquoted-string grammar `plutil` accepts — unquoted *values* + # (e.g. `CFBundleName = WordPress;`) and keys containing `. - $ : /` — so the written file stays consistent + # with the prefixed keys we report. (Previously only `[A-Z0-9_]` keys with a *quoted* value were prefixed, + # leaving these written without the prefix and resurfacing the very collisions the prefix avoids.) + content = <<~STRINGS + CFBundleName = WordPress; + com.automattic.app-id = "X"; + "QuotedKey" = "Y"; + STRINGS + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + input_file = File.join(tmp_dir, 'InfoPlist.strings') + File.write(input_file, content) + output_file = File.join(tmp_dir, 'output.strings') + described_class.merge_strings(paths: { input_file => 'pfx.' }, output_path: output_file) + merged = File.read(output_file) + expect(merged).to include('"pfx.CFBundleName" = WordPress;') + expect(merged).to include('"pfx.com.automattic.app-id" = "X";') + expect(merged).to include('"pfx.QuotedKey" = "Y";') + end + end + + it 'prefixes keys when `.strings` comments sit between a statement\'s tokens' do + # Regression: the duplicate-key scanner accepts comments *between* a statement's tokens (e.g. + # `CFBundleName /* c */ = WordPress;`), and bookkeeping derives keys from `plutil`, which agrees. + # But the line-based rewrite only prefixes when `=` and the value are adjacent, so an inter-token + # comment leaves the key written unprefixed while still bookkept *with* the prefix — the exact + # collision/inconsistent-output the prefix exists to prevent. + content = <<~STRINGS + CFBundleName = WordPress /* trailing */; + AppName /* between key and = */ = WordPress; + DisplayName /* between key and = */ = "WordPress"; + STRINGS + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + input_file = File.join(tmp_dir, 'InfoPlist.strings') + File.write(input_file, content) + output_file = File.join(tmp_dir, 'output.strings') + described_class.merge_strings(paths: { input_file => 'pfx.' }, output_path: output_file) + merged_keys = described_class.read_strings_file_as_hash(path: output_file).keys + expect(merged_keys).to contain_exactly('pfx.CFBundleName', 'pfx.AppName', 'pfx.DisplayName') + end + end + + it 'does not silently drop a key to a collision when an inter-token comment blocks prefixing' do + # The harm of the gap above: two files each carrying the same key behind an inter-token comment are + # bookkept under distinct prefixes (so no duplicate is reported), yet both are written verbatim — + # producing a genuine duplicate in the merged file that `plutil` later collapses to a single value. + content = "CFBundleName /* c */ = WordPress;\n" + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + file_a = File.join(tmp_dir, 'A.strings') + file_b = File.join(tmp_dir, 'B.strings') + File.write(file_a, content) + File.write(file_b, content) + output_file = File.join(tmp_dir, 'output.strings') + described_class.merge_strings(paths: { file_a => 'a.', file_b => 'b.' }, output_path: output_file) + merged_keys = described_class.read_strings_file_as_hash(path: output_file).keys + expect(merged_keys).to contain_exactly('a.CFBundleName', 'b.CFBundleName') + end + end + + it 'prefixes the outer key of a nested-dictionary value and preserves the value verbatim' do + # A dictionary/array value (`"k" = { … };`) is valid `:text` that `plutil` accepts; the tokenizer now + # prefixes the outer key and copies the container body through unchanged, rather than failing to rewrite it. + content = %("k" = { a = b; };\n) + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + input_file = File.join(tmp_dir, 'InfoPlist.strings') + File.write(input_file, content) + output_file = File.join(tmp_dir, 'output.strings') + + expect(FastlaneCore::UI).not_to receive(:important) + described_class.merge_strings(paths: { input_file => 'pfx.' }, output_path: output_file) + + # Outer key prefixed, nested value untouched — and the result still parses back to the prefixed key. + expect(File.read(output_file)).to include('"pfx.k" = { a = b; };') + expect(described_class.read_strings_file_as_hash(path: output_file).keys).to contain_exactly('pfx.k') + end + end + + it 'keeps a container-valued key distinct from a same-named key in another file (no silent collision)' do + # Before containers were tokenizable, this file was written unprefixed while bookkept *with* the prefix, so + # a same-named key elsewhere collided in the merged output yet went unreported and was silently collapsed by + # `plutil`. Now the key is actually prefixed, so the two stay distinct and neither value is clobbered. + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + flat = File.join(tmp_dir, 'A.strings') + nested = File.join(tmp_dir, 'B.strings') + File.write(flat, %("MyKey" = "original";\n)) + File.write(nested, %("MyKey" = { sub = val; };\n)) + output_file = File.join(tmp_dir, 'output.strings') + + duplicates = described_class.merge_strings(paths: { flat => nil, nested => 'pfx.' }, output_path: output_file) + + expect(duplicates).to be_empty + merged = described_class.read_strings_file_as_hash(path: output_file) + expect(merged.keys).to contain_exactly('MyKey', 'pfx.MyKey') + expect(merged['MyKey']).to eq('original') # the flat file's value is not clobbered + end + end + + it 'falls back to copying a file through unprefixed — and bookkeeps it unprefixed — when prefixing raises' do + # Backstop for any construct the tokenizer still can't rewrite: `merge_strings` warns and copies the file + # through unprefixed rather than aborting. Because it then bookkeeps those keys *unprefixed* (matching what + # was written), a genuine collision with another file is still reported instead of silently collapsing. + Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| + dest = File.join(tmp_dir, 'A.strings') + weird = File.join(tmp_dir, 'B.strings') + File.write(dest, %("shared" = "one";\n)) + File.write(weird, %("shared" = "two";\n)) + output_file = File.join(tmp_dir, 'output.strings') + + # Force the fail-soft path for the prefixed file only, regardless of its content. + allow(Fastlane::Helper::Ios::StringsFileValidationHelper).to receive(:prefix_keys).and_wrap_original do |orig, **kwargs| + raise 'boom' if kwargs[:prefix] == 'pfx.' + + orig.call(**kwargs) + end + expect(FastlaneCore::UI).to receive(:important).with(a_string_including('Could not add prefix `pfx.`').and(a_string_including('unprefixed'))) + + duplicates = described_class.merge_strings(paths: { dest => nil, weird => 'pfx.' }, output_path: output_file) + + # B was written unprefixed, so its `shared` collides with A's `shared` — and that collision is REPORTED. + expect(duplicates).to eq(['shared']) + expect(File.read(output_file)).to include('"shared" = "two";') + end + end + it 'returns duplicate keys found' do paths = { fixture('Localizable-utf16.strings') => nil, fixture('non-latin-utf16.strings') => nil } Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir| diff --git a/spec/ios_lint_localizations_spec.rb b/spec/ios_lint_localizations_spec.rb index af357ab3a..9afbab6ca 100644 --- a/spec/ios_lint_localizations_spec.rb +++ b/spec/ios_lint_localizations_spec.rb @@ -194,7 +194,7 @@ def run_l10n_linter_test(data_file:, check_duplicate_keys: nil, fail_on_strings_ File.write(File.join(fr_lproj, 'Localizable.strings'), File.read(xml_file)) expected_message = <<~EXPECTED_WARNING - File `#{fr_lproj}/Localizable.strings` is in xml format, while finding duplicate keys only make sense on files that are in ASCII-plist format. + File `#{fr_lproj}/Localizable.strings` is in xml format, while finding duplicate keys can only occur on files that are in ASCII-plist format. Since your files are in xml format, you should probably disable the `check_duplicate_keys` option from this `ios_lint_localizations` call. EXPECTED_WARNING expect(FastlaneCore::UI).to receive(:important).with(expected_message) @@ -228,4 +228,54 @@ def run_l10n_linter_test(data_file:, check_duplicate_keys: nil, fail_on_strings_ ) end end + + # Regression coverage for the unquoted-key/value parsing in `StringsFileValidationHelper`, which used + # to raise `Invalid character` on *unquoted* keys (valid `.strings` syntax, common in `InfoPlist.strings`) + # and so crashed `check_duplicate_keys`. It now parses unquoted keys and detects duplicates among them. + context 'duplicate-key detection on files with unquoted keys' do + let(:input_dir) { Dir.mktmpdir('a8c-lint-l10n-unquoted-') } + + after { FileUtils.remove_entry(input_dir) } + + def write_localizable(lang, content) + lproj = File.join(input_dir, "#{lang}.lproj") + FileUtils.mkdir_p(lproj) + File.write(File.join(lproj, 'Localizable.strings'), content) + end + + it 'parses unquoted keys instead of raising, and detects duplicates among them' do + write_localizable('en', <<~STRINGS) + okay_key = "value"; + NSCameraUsageDescription = "Take photos"; + NSCameraUsageDescription = "Take a photo"; + "quoted_key" = "value"; + STRINGS + + result = nil + expect { result = described_class.find_duplicated_keys({ input_dir: input_dir }) }.not_to raise_error + expect(result).to eq('en' => ['`NSCameraUsageDescription` was found at multiple lines: 2, 3']) + end + + it 'reports no duplicates when unquoted keys are unique (mixed with quoted keys)' do + write_localizable('en', <<~STRINGS) + CFBundleName = "WordPress"; + NSCameraUsageDescription = "Take photos"; + "quoted_key" = "value"; + STRINGS + + expect(described_class.find_duplicated_keys({ input_dir: input_dir })).to eq({}) + end + + it 'warns and skips (without crashing) a file that parses as a plist but is not a flat `.strings`' do + # A `` value is a valid old-style plist that `plutil` accepts but the duplicate-key scanner can't + # tokenize. This used to crash the lane (the scanner raised, uncaught); now it is surfaced via + # `UI.important` and the file is skipped. (Dictionary/array values, by contrast, are now tokenized.) + write_localizable('en', "\"k\" = <48656c6c6f>;\n") + expect(FastlaneCore::UI).to receive(:important).with(/Could not check .* for duplicate keys/) + + result = nil + expect { result = described_class.find_duplicated_keys({ input_dir: input_dir }) }.not_to raise_error + expect(result).to eq({}) + end + end end diff --git a/spec/ios_strings_file_validation_helper_spec.rb b/spec/ios_strings_file_validation_helper_spec.rb index a183f682a..54f86a317 100644 --- a/spec/ios_strings_file_validation_helper_spec.rb +++ b/spec/ios_strings_file_validation_helper_spec.rb @@ -54,14 +54,161 @@ end end - context 'when there are unquoted keys' do - it 'returns an error' do - # Unquoted strings are currently not supported by our validation helper in its current state, despite being a valid syntax, because we considered - # that it was not worth adding complexity to our state machine logic for this use case — we expect all the `.strings` files we plan to validate will - # come from GlotPress exports, and will thus always have their keys quoted. - # If support for unquoted strings is added to our validation helper in the future, feel free to update this test example accordingly. - expect { described_class.find_duplicated_keys(file: File.join(test_data_dir, 'ios_l10n_helper', 'expected-merged.strings')) } - .to raise_error(RuntimeError, 'Invalid character `I` found on line 21, col 1') + context 'when there are unquoted keys and values' do + # Unquoted strings — keys and values — are valid `.strings` syntax (the old-style ASCII plist format) + # and are common in `InfoPlist.strings` (e.g. `CFBundleDisplayName = WordPress;`). + it 'parses them alongside quoted keys without raising, finding no duplicates among unique keys' do + # `expected-merged.strings` mixes quoted (`key1`–`key3`) and unquoted (`InfoKey1`–`InfoKey3`) keys. + expect(described_class.find_duplicated_keys(file: File.join(test_data_dir, 'ios_l10n_helper', 'expected-merged.strings'))).to be_empty + end + + it 'detects duplicates among unquoted keys, reporting each occurrence line' do + content = <<~STRINGS + CFBundleName = "WordPress"; + NSCameraUsageDescription = "Take photos"; + CFBundleName = "Jetpack"; + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('CFBundleName' => [1, 3]) + end + end + + it 'parses unquoted *values* (not just keys) without raising, and finds duplicates among them' do + # `CFBundleName = WordPress;` (both key and value unquoted) is valid ASCII-plist that `plutil` + # parses; the scanner must tokenize it rather than choking on the unquoted value. + content = <<~STRINGS + CFBundleName = WordPress; + CFBundleShortVersionString = 1.0; + CFBundleName = Jetpack; + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('CFBundleName' => [1, 3]) + end + end + + it 'parses unquoted keys containing `.`, `-`, `_`, `$`, `:`, and `/` (the chars `plutil` allows)' do + content = <<~STRINGS + com.example.app-name_2 = "v"; + a$b:c/d = "v"; + a$b:c/d = "w"; + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('a$b:c/d' => [2, 3]) + end + end + end + + context 'when comments appear between the tokens of a statement' do + # Comments are valid `.strings` syntax not only on their own line but also *between* the tokens of a + # statement — after a key, around the `=`, or before the terminating `;`. `plutil` accepts all of these, + # so the scanner must tokenize them rather than raising `Invalid character` on the `/`. + it 'parses comments after a key, around the `=`, and before the `;` without raising' do + content = <<~STRINGS + "afterKey" /* note */ = "1"; + "aroundEq" = /* note */ "2"; + "beforeSemicolon" = "3" /* note */; + unquotedKey /* note */ = unquotedValue; + STRINGS + with_tmp_file(named: 'Localizable.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to be_empty + end + end + + it 'still detects duplicate keys in a file that also contains inline comments' do + content = <<~STRINGS + "dup" /* first */ = "1"; + "unique" = "x"; + "dup" = "2" /* second */; + STRINGS + with_tmp_file(named: 'Localizable.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('dup' => [1, 3]) + end + end + + it 'does not mistake a `/`-leading unquoted value for the start of a comment' do + # A `/` right after `=` may begin a comment OR an unquoted value (e.g. a path or URL); the latter, + # which `plutil` accepts, must still parse rather than be swallowed as a comment. + content = <<~STRINGS + "path" = /usr/bin/tool; + "url" = https://example.com/x; + "path" = /opt; + STRINGS + with_tmp_file(named: 'Localizable.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('path' => [1, 3]) + end + end + end + + context 'when a value is a nested dictionary or array' do + # `plutil` accepts container values in the old-style ASCII plist (e.g. `"k" = { a = b; };` or `"k" = ( … );`), + # and they can nest. The tokenizer skips the container body — its inner keys are not top-level keys — while + # still tracking nesting so delimiters hidden inside quoted strings or comments don't end the value early, + # and still detecting duplicate *top-level* keys. + it 'does not treat keys inside a dictionary value as top-level keys' do + content = <<~STRINGS + "k" = { a = b; c = d; }; + "j" = "v"; + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to be_empty + end + end + + it 'detects duplicate top-level keys whose values are containers' do + content = <<~STRINGS + "k" = { a = b; }; + "k" = ( "x", "y" ); + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('k' => [1, 2]) + end + end + + it 'tracks nesting through delimiters hidden inside quoted strings and comments' do + # The inner `}` and `;` in the quoted `"c;d}"` and in the `/* } ; */` comment must NOT be read as + # structural — the container ends only at the real closing brace, so the second `"k"` is a duplicate. + content = <<~STRINGS + "k" = { a = { b = "c;d}"; }; /* } ; */ e = f; }; + "k" = 1; + STRINGS + with_tmp_file(named: 'InfoPlist.strings', content: content) do |path| + expect(described_class.find_duplicated_keys(file: path)).to eq('k' => [1, 2]) + end + end + end + + describe '.scan_for_duplicate_keys' do + it 'returns `[:scanned, duplicates]` for a `:text` file, with the duplicate hash (empty if none)' do + with_tmp_file(named: 'dups.strings', content: "\"k\" = \"a\";\n\"k\" = \"b\";\n") do |path| + expect(described_class.scan_for_duplicate_keys(file: path)).to eq([:scanned, { 'k' => [1, 2] }]) + end + with_tmp_file(named: 'unique.strings', content: "\"k\" = \"a\";\n\"j\" = \"b\";\n") do |path| + expect(described_class.scan_for_duplicate_keys(file: path)).to eq([:scanned, {}]) + end + end + + it 'returns `[:unsupported_format, format]` for a non-`:text` (XML) plist, without scanning' do + in_tmp_dir do |dir| + path = File.join(dir, 'x.strings') + Fastlane::Helper::Ios::L10nHelper.generate_strings_file_from_hash(translations: { 'a' => 'b' }, output_path: path) + status, format = described_class.scan_for_duplicate_keys(file: path) + expect(status).to eq(:unsupported_format) + expect(format).to eq(:xml) + end + end + + it 'returns `[:scanned, …]` for a `:text` plist whose values are (now tokenizable) nested dictionaries or arrays' do + with_tmp_file(named: 'nested.strings', content: "\"k\" = { a = b; };\n\"j\" = ( \"x\", \"y\" );\n") do |path| + expect(described_class.scan_for_duplicate_keys(file: path)).to eq([:scanned, {}]) + end + end + + it 'returns `[:unscannable, message]` for a `:text` plist the tokenizer still cannot read (e.g. a `` value)' do + with_tmp_file(named: 'data.strings', content: "\"k\" = <48656c6c6f>;\n") do |path| + status, message = described_class.scan_for_duplicate_keys(file: path) + expect(status).to eq(:unscannable) + expect(message).to match(/Invalid character/) + end end end end