Skip to content
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,15 @@ _None_

### Bug Fixes

_None_
- `StringsFileValidationHelper.find_duplicated_keys` now parses *unquoted* keys and values (valid `.strings` syntax, common in `InfoPlist.strings`) instead of raising `Invalid character`, matching the character set `plutil` accepts for unquoted strings (alphanumerics plus `_ . - $ : /`). This lets `ios_lint_localizations`' `check_duplicate_keys` work on `InfoPlist.strings`-style files. [#741]
- `StringsFileValidationHelper.find_duplicated_keys` now also accepts comments placed *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), which `plutil` allows, instead of raising `Invalid character` on the `/`. [#741]
- `ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a file that parses as a property list but isn't a flat `.strings` (e.g. a nested-dictionary value); it now warns via `UI.important` and skips that file. [#741]
- `L10nHelper.merge_strings` now applies the key prefix to unquoted keys containing `. - $ : /` and to lines with an *unquoted* value (e.g. `CFBundleName = WordPress;`), matching the unquoted-string grammar `plutil` accepts. Previously those keys were written to the merged file without the prefix while still being bookkept *with* it, leaving the output inconsistent with the reported keys (which could resurface the very collisions the prefix avoids and break downstream key extraction). [#741]

### Internal Changes

_None_
- Centralized `.strings` duplicate-key detection behind `StringsFileValidationHelper.scan_for_duplicate_keys`, which detects the file format and returns a `[:scanned | :unsupported_format | :unscannable, payload]` tri-state that callers map to their own policy (warn-and-skip vs fail-closed). `ios_lint_localizations` now uses it. [#741]
- `L10nHelper.strings_file_type` (and `StringsFileValidationHelper.scan_for_duplicate_keys`) accept `assume_valid:` to skip a redundant `plutil -lint` when the caller has already parsed the file. [#741]

## 14.7.0

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -56,15 +56,20 @@ def self.find_duplicated_keys(params)
language = File.basename(File.dirname(file), '.lproj')
path = File.join(params[:input_dir], file)

file_type = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: path)
if file_type == :text
duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path)
duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty?
else
status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice appraoch.

case status
when :scanned
duplicate_keys[language] = payload.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless payload.empty?
when :unsupported_format
UI.important <<~WRONG_FORMAT
File `#{path}` is in #{file_type} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
Since your files are in #{file_type} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call.
File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
Comment thread
jkmassel marked this conversation as resolved.
Outdated
Since your files are in #{payload} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call.
WRONG_FORMAT
when :unscannable
UI.important <<~UNSCANNABLE
Could not check `#{path}` for duplicate keys: #{payload.strip}
The file parses as a property list but isn't a flat `.strings` file the duplicate-key scanner understands, so duplicate detection was skipped for it.
UNSCANNABLE
end
end

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,18 +15,24 @@ class L10nHelper
# Returns the type of a `.strings` file (XML, binary or ASCII)
#
# @param [String] path The path to the `.strings` file to check
# @param [Boolean] assume_valid Skip the `plutil -lint` validity check when the caller has already
# confirmed the file parses (e.g. via `read_strings_file_as_hash`), avoiding a redundant
# `plutil` invocation. Only the format detection (`file`) then runs.
# @return [Symbol] The file format used by the `.strings` file. Can be one of:
# - `:text` for the ASCII-plist file format (containing typical `"key" = "value";` lines)
# - `:xml` for XML plist file format (can be used if machine-generated, especially since there's no official way/tool to generate the ASCII-plist file format as output)
# - `:binary` for binary plist file format (usually only true for `.strings` files converted by Xcode at compile time and included in the final `.app`/`.ipa`)
# - `nil` if the file does not exist or is neither of those format (e.g. not a `.strings` file at all)
#
def self.strings_file_type(path:)
def self.strings_file_type(path:, assume_valid: false)
return :text if File.empty?(path) # If completely empty file, consider it as a valid `.strings` files in textual format

# Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file)
_, status = Open3.capture2('/usr/bin/plutil', '-lint', path)
return nil unless status.success?
# Start by checking it seems like a valid property-list file (and not e.g. an image or plain text file).
# A caller that has already parsed the file can skip this redundant check via `assume_valid: true`.
unless assume_valid
_, status = Open3.capture2('/usr/bin/plutil', '-lint', path)
return nil unless status.success?
end

# If it is a valid property-list file, determine the actual format used
format_desc, status = Open3.capture2('/usr/bin/file', path)
Expand Down Expand Up @@ -96,9 +102,17 @@ def self.merge_strings(paths:, output_path:)
# Read line-by-line to reduce memory footprint during content copy
read_utf8_lines(input_file).each do |line|
unless prefix.nil? || prefix.empty?
# The `/u` modifier on the RegExps is to make them UTF-8
# The `/u` modifier on the RegExps is to make them UTF-8.
# The unquoted-key character set below matches what `plutil` accepts (alphanumerics plus
# `_ . - $ : /`, as in `StringsFileValidationHelper::UNQUOTED_STRING_CHARACTER`) so that keys
# containing `. - $ : /` (e.g. reverse-DNS keys) are prefixed like any other.
line.gsub!(/^(\s*")/u, "\\1#{prefix}") # Lines starting with a quote are considered to be start of a key; add prefix right after the quote
line.gsub!(/^(\s*)([A-Z0-9_]+)(\s*=\s*")/ui, "\\1\"#{prefix}\\2\"\\3") # Lines starting with an identifier followed by a '=' are considered to be an unquoted key (typical in InfoPlist.strings files for example)
# Lines starting with an unquoted key followed by a `=` and a *quoted* value (typical in `InfoPlist.strings`).
line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*")}u, "\\1\"#{prefix}\\2\"\\3")
# Lines starting with an unquoted key followed by a `=` and an *unquoted* value (e.g. `CFBundleName = WordPress;`).
# The value is required to be a single unquoted token ending in `;` — an unquoted string can't contain spaces, so
# this won't mistake comment prose like `and = a red herring for when…` (spaces in the "value") for a key/value pair.
line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*)([a-zA-Z0-9_.$:/-]+\s*;)}u, "\\1\"#{prefix}\\2\"\\3\\4")
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My AI noticed that merge_strings doesn't handle inter-token comments.

Could this logic handle the same inter-token comment placements that StringsFileValidationHelper now accepts? (https://github.com/wordpress-mobile/release-toolkit/pull/741/changes#diff-6e464dfe8ccd3974026649f8aaa02f9f8518261ef391f13bdb9535ef46d2b908R111-R116)

For example, CFBundleName /* c */ = WordPress; is accepted by plutil and by the duplicate-key scanner after this PR, but merge_strings still writes it unprefixed while bookkeeping it as prefixed.
That can reintroduce collisions in the merged file.

I added failing specs in #742 to pin the gap.

AI suggests: A tokenizer-backed rewrite would be ideal, but a comment-aware interim rewrite would also close the regression.

tmp_file.write(line)
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,32 +5,83 @@ module Helper
module Ios
class StringsFileValidationHelper
# context can be one of:
# :root, :maybe_comment_start, :in_line_comment, :in_block_comment,
# :maybe_block_comment_end, :in_quoted_key,
# :after_quoted_key_before_eq, :after_quoted_key_and_equal,
# :in_quoted_value, :after_quoted_value
State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key)
# :root, :maybe_comment_start, :maybe_comment_or_value, :in_line_comment, :in_block_comment,
# :maybe_block_comment_end, :in_quoted_key, :in_unquoted_key,
# :after_quoted_key_before_eq, :after_quoted_key_and_eq,
# :in_quoted_value, :in_unquoted_value, :after_quoted_value
#
# `resume_context` holds the context to return to once a comment ends. Comments are valid not only at
# the top level but also *between* the tokens of a statement (e.g. `"key" /* note */ = "value";`), so a
# comment must resume the state it interrupted rather than always dropping back to `:root`.
State = Struct.new(:context, :buffer, :in_escaped_ctx, :found_key, :resume_context)

# Characters allowed in an *unquoted* string — a key or a value. Unquoted strings are valid
# `.strings` syntax (the old-style ASCII property-list format) and are common in `InfoPlist.strings`
# (e.g. `CFBundleName = WordPress;`). `plutil` accepts alphanumerics plus `_ . - $ : /` in an
# unquoted string, so we match the same set: this scanner only ever runs on input `plutil` has
# already accepted, and matching its grammar keeps a file it parses from tripping the scanner.
# An unquoted key runs until the first whitespace or `=`; an unquoted value until whitespace or `;`.
UNQUOTED_STRING_CHARACTER = %r{[a-zA-Z0-9_.$:/-]}u

# Enter a comment from an inter-token position, remembering where to resume once it ends.
# (`state.context` is still the originating state when a transition lambda runs — it's only reassigned
# to the lambda's return value afterwards — so this captures the state the comment interrupts.)
ENTER_COMMENT = lambda do |state, _c|
state.resume_context = state.context
:maybe_comment_start
end

# A `/` between `=` and the value is ambiguous: it can start a comment (`/* … */` or `// …`) or be the
# first character of an unquoted value (e.g. a path like `/usr/bin`). Defer the decision by one character
# via `:maybe_comment_or_value`.
ENTER_COMMENT_OR_VALUE = lambda do |state, _c|
state.resume_context = state.context
:maybe_comment_or_value
end

# Restore the context a comment interrupted (defaults to `:root`), then clear the saved context.
RESUME_AFTER_COMMENT = lambda do |state, _c|
resume = state.resume_context || :root
state.resume_context = :root
resume
end

TRANSITIONS = {
root: {
/\s/u => :root,
'/' => :maybe_comment_start,
'"' => :in_quoted_key
'"' => :in_quoted_key,
# An unquoted key, e.g. `CFBundleName = "…";` as used by `InfoPlist.strings`.
UNQUOTED_STRING_CHARACTER => lambda do |state, c|
state.buffer.write(c)
:in_unquoted_key
end
},
maybe_comment_start: {
'/' => :in_line_comment,
/\*/u => :in_block_comment
},
# Reached only from `:after_quoted_key_and_eq`, where a leading `/` might begin a comment or an
# unquoted value. A `*` or `/` confirms a comment; anything else means the `/` was the value's first
# character (values aren't buffered, so we just continue scanning the value).
maybe_comment_or_value: {
/\*/u => :in_block_comment,
'/' => :in_line_comment,
/./mu => lambda do |state, _c|
state.resume_context = :root
:in_unquoted_value
end
},
in_line_comment: {
"\n" => :root,
"\n" => RESUME_AFTER_COMMENT,
/./u => :in_line_comment
},
in_block_comment: {
/\*/ => :maybe_block_comment_end,
/./mu => :in_block_comment
},
maybe_block_comment_end: {
'/' => :root,
'/' => RESUME_AFTER_COMMENT,
/./mu => :in_block_comment
},
in_quoted_key: {
Expand All @@ -44,19 +95,48 @@ class StringsFileValidationHelper
:in_quoted_key
end
},
in_unquoted_key: {
# The key ends at the first whitespace or `=`. Whitespace still expects an `=` next;
# an `=` moves straight on to the value.
/[\s=]/u => lambda do |state, c|
state.found_key = state.buffer.string.dup
state.buffer = StringIO.new
c == '=' ? :after_quoted_key_and_eq : :after_quoted_key_before_eq
end,
UNQUOTED_STRING_CHARACTER => lambda do |state, c|
state.buffer.write(c)
:in_unquoted_key
end
},
after_quoted_key_before_eq: {
# A comment may sit between the key and the `=` (e.g. `"key" /* note */ = "value";`).
'/' => ENTER_COMMENT,
/\s/u => :after_quoted_key_before_eq,
'=' => :after_quoted_key_and_eq
},
after_quoted_key_and_eq: {
# A `/` here may start a comment or an unquoted value (which can contain `/`); disambiguate one char
# later. This entry must precede `UNQUOTED_STRING_CHARACTER` below, which also matches `/`.
'/' => ENTER_COMMENT_OR_VALUE,
/\s/u => :after_quoted_key_and_eq,
'"' => :in_quoted_value
'"' => :in_quoted_value,
# An unquoted value, e.g. `CFBundleName = WordPress;` as used by `InfoPlist.strings`.
UNQUOTED_STRING_CHARACTER => :in_unquoted_value
},
in_quoted_value: {
'"' => :after_quoted_value,
/./mu => :in_quoted_value
},
in_unquoted_value: {
# The value ends at the first whitespace or the terminating `;`. Its contents are irrelevant
# to duplicate-key detection, so — unlike a key — we don't buffer it.
';' => :root,
/\s/u => :after_quoted_value,
UNQUOTED_STRING_CHARACTER => :in_unquoted_value
},
after_quoted_value: {
# A comment may sit between the value and the terminating `;` (e.g. `"key" = "value" /* note */;`).
'/' => ENTER_COMMENT,
/\s/u => :after_quoted_value,
';' => :root
}
Expand All @@ -70,7 +150,7 @@ class StringsFileValidationHelper
def self.find_duplicated_keys(file:)
keys_with_lines = Hash.new { |h, k| h[k] = [] }

state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil)
state = State.new(context: :root, buffer: StringIO.new, in_escaped_ctx: false, found_key: nil, resume_context: :root)

# Using our `each_utf8_line` helper instead of `File.readlines` ensures we can also read files that are
# encoded in UTF-16, yet process each of their lines as a UTF-8 string, so that `RegExp#match?` don't throw
Expand Down Expand Up @@ -106,6 +186,28 @@ def self.find_duplicated_keys(file:)

keys_with_lines.keep_if { |_, lines| lines.count > 1 }
end

# Detects the file format and, when applicable, scans for duplicate keys — in one step, so
# callers don't each re-implement the "`:text`-only" gate. `find_duplicated_keys` only
# understands the flat ASCII-plist syntax; an xml/binary plist can't be tokenized by it (though
# `plutil` collapses any duplicate to its last value when parsing those anyway).
#
# @param [String] file The path to the `.strings` file to inspect.
# @param [Boolean] assume_valid Forwarded to `strings_file_type`: skip the redundant `plutil -lint`
# when the caller has already confirmed the file parses.
# @return [Array] A `[status, payload]` pair, one of:
# - `[:scanned, { key => [lines] }]` — a `:text` file we tokenized (hash empty if none).
# - `[:unsupported_format, format]` — not a `:text` file (`:xml`, `:binary`, or `nil`); not scanned.
# - `[:unscannable, error_message]` — a `:text` file the tokenizer couldn't read.
# Each caller decides how to react (warn-and-skip, fail closed, …) from this one source of truth.
def self.scan_for_duplicate_keys(file:, assume_valid: false)
format = Fastlane::Helper::Ios::L10nHelper.strings_file_type(path: file, assume_valid: assume_valid)
return [:unsupported_format, format] unless format == :text

[:scanned, find_duplicated_keys(file: file)]
rescue StandardError => e
[:unscannable, e.message]
end
end
end
end
Expand Down
30 changes: 30 additions & 0 deletions spec/ios_l10n_helper_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,14 @@ def file_encoding(path)
expect(File).to exist(invalid_fixture)
expect(described_class.strings_file_type(path: invalid_fixture)).to be_nil
end

it 'skips the redundant `plutil -lint` check when `assume_valid: true`, still detecting the format' do
text_fixture = fixture('Localizable-utf16.strings')
allow(Open3).to receive(:capture2).and_call_original
expect(Open3).not_to receive(:capture2).with('/usr/bin/plutil', '-lint', anything)

expect(described_class.strings_file_type(path: text_fixture, assume_valid: true)).to eq(:text)
end
end

describe '#merge_strings' do
Expand Down Expand Up @@ -92,6 +100,28 @@ def file_encoding(path)
end
end

it 'prefixes unquoted keys with unquoted values and keys containing `. - $ : /`' do
# Regression: prefixing must cover the full unquoted-string grammar `plutil` accepts — unquoted *values*
# (e.g. `CFBundleName = WordPress;`) and keys containing `. - $ : /` — so the written file stays consistent
# with the prefixed keys we report. (Previously only `[A-Z0-9_]` keys with a *quoted* value were prefixed,
# leaving these written without the prefix and resurfacing the very collisions the prefix avoids.)
content = <<~STRINGS
CFBundleName = WordPress;
com.automattic.app-id = "X";
"QuotedKey" = "Y";
STRINGS
Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir|
input_file = File.join(tmp_dir, 'InfoPlist.strings')
File.write(input_file, content)
output_file = File.join(tmp_dir, 'output.strings')
described_class.merge_strings(paths: { input_file => 'pfx.' }, output_path: output_file)
merged = File.read(output_file)
expect(merged).to include('"pfx.CFBundleName" = WordPress;')
expect(merged).to include('"pfx.com.automattic.app-id" = "X";')
expect(merged).to include('"pfx.QuotedKey" = "Y";')
end
end

it 'returns duplicate keys found' do
paths = { fixture('Localizable-utf16.strings') => nil, fixture('non-latin-utf16.strings') => nil }
Dir.mktmpdir('a8c-release-toolkit-l10n-helper-tests-') do |tmp_dir|
Expand Down
Loading