Fix single-line comment regex stripping URLs inside template literals #453
Open
mschroettle wants to merge 1 commit into
The regex `/\/\/.*$/m` in stripComments() matches `//` anywhere on a line,
including inside template literals (backtick strings) when extractStrings()
fails to extract them — e.g. when nested ${...} interpolations confuse the
extraction regex.
Adding negative lookbehind `(?<!:)` prevents matching `//` when preceded by
`:`, which protects URL schemes (https://, http://, ftp://, etc.) inside
any string context that might leak through string extraction.
Tests added: 4 new cases in dataProvider() covering URL preservation in
template literals + no-regression for legitimate line comments. All 185
existing tests still pass.
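The effect of the lookbehind can be sketched outside the project. The snippet below is an illustration only: it uses Python's `re` module rather than the PHP code in `src/JS.php`, the helper name `strip_line_comments` is hypothetical, and the input line is the pattern quoted in the Validation section, assumed to have already leaked past string extraction.

```python
import re

# The two patterns discussed in this PR, written in Python's regex syntax.
# Assumption: this mirrors the PHP pattern for illustration; it is not the project's code.
OLD = re.compile(r"//.*$", re.MULTILINE)
NEW = re.compile(r"(?<!:)//.*$", re.MULTILINE)


def strip_line_comments(src, pattern):
    """Hypothetical helper: remove everything the pattern matches."""
    return pattern.sub("", src)


# A template literal that string extraction failed to protect.
leaked = "r[n]=`https://www.youtube.com/embed/${i}`;"
# An ordinary trailing line comment that must still be removed.
comment = "var a = 1; // trailing comment"

old_leaked = strip_line_comments(leaked, OLD)    # URL cut off after "https:"
new_leaked = strip_line_comments(leaked, NEW)    # left untouched
new_comment = strip_line_comments(comment, NEW)  # comment still stripped

print(repr(old_leaked))
print(repr(new_leaked))
print(repr(new_comment))
```

The lookbehind only checks the single character before `//`; it does not know whether the match sits inside a string, which is why the PR describes it as a safeguard rather than a replacement for proper string extraction.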
Pull Request: Fix template-literal URL corruption in single-line comment stripping
Description
The single-line comment regex `/\/\/.*$/m` corrupts URLs inside template literals when `extractStrings()` fails to extract them (e.g. with nested `${...}` interpolations in complex code).
Adds a negative lookbehind `(?<!:)` so `//` is only stripped when not preceded by `:`, preventing corruption of URL schemes like `https://`, `http://`, `ftp://`, etc. inside any string context that might leak through string extraction.
Changes
`src/JS.php` (1 line)
`tests/JS/JSTest.php` (new test cases)
New cases were added to `dataProvider()`. The third and fourth cases confirm the fix does NOT regress legitimate comment stripping.
Rationale
The fundamental issue is that `extractStrings('\'"')` uses a flat regex that does not understand template-literal-with-nested-`${...}` syntax, so some templates leak through unextracted. Fixing that properly requires a real JS tokenizer, which is out of scope.
This PR is the defense-in-depth fix at the comment-stripping layer: even if a template literal slips past extraction, the `(?<!:)` lookbehind prevents the most common corruption (URL schemes).
Validation
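Tested against real-world breakage: `wp-dark-mode/assets/js/app.min.js` v5.3.5 (82 KB plugin file with `` r[n]=`https://www.youtube.com/embed/${i}` `` patterns).
Before the fix, acorn fails to parse the output with `Unexpected token ':'`, because the URL is truncated to `` `https:<…` ``. After the fix, the URL is preserved as `` `https://…` ``.
Existing test suite still passes.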