Skip to content

Fix single-line comment regex stripping URLs inside template literals#453

Open
mschroettle wants to merge 1 commit into
matthiasmullie:masterfrom
mschroettle:fix/template-literal-url-stripping
Open

Fix single-line comment regex stripping URLs inside template literals#453
mschroettle wants to merge 1 commit into
matthiasmullie:masterfrom
mschroettle:fix/template-literal-url-stripping

Conversation

@mschroettle

Copy link
Copy Markdown

Pull Request: Fix template-literal URL corruption in single-line comment stripping

Description

Single-line comment regex /\/\/.*$/m corrupts URLs inside template literals when extractStrings() fails to extract them (e.g. with nested ${...} interpolations in complex code).

Adds a negative lookbehind (?<!:) so // is only stripped when not preceded by : — preventing corruption of URL schemes like https://, http://, ftp://, etc. inside any string context that might leak through string extraction.

Changes

src/JS.php (1 line)

-        $this->registerPattern('/\/\/.*$/m', '');
+        $this->registerPattern('/(?<!:)\/\/.*$/m', '');

tests/JS/JSTest.php (new test cases)

Added to dataProvider():

        // template literal containing URL — `//` in `https://` must not be
        // treated as line comment, or the URL path is stripped and the
        // template literal becomes unclosed
        $tests[] = array(
            'var url=`https://example.com/${id}/path`',
            'var url=`https://example.com/${id}/path`',
        );

        // ditto with multiple URLs in the same script (real-world worst case)
        $tests[] = array(
            "var a=`https://a.example/${i}`;var b=`http://b.example/${j}`",
            "var a=`https://a.example/${i}`;var b=`http://b.example/${j}`",
        );

        // legitimate line comment after code on same line still stripped
        $tests[] = array(
            "var a=1;// this comment must still be removed\nvar b=2",
            'var a=1;var b=2',
        );

        // `//` at start of line still stripped
        $tests[] = array(
            "// leading comment\nvar a=1",
            'var a=1',
        );

The third and fourth cases confirm the fix does NOT regress legitimate comment stripping.

Rationale

The fundamental issue is that extractStrings('\'"') uses a flat regex that does not understand template-literal-with-nested-${...}` syntax, so some templates leak through unextracted. Fixing that properly requires a real JS tokenizer, which is out of scope.

This PR is the defense-in-depth fix at the comment-stripping layer: even if a template literal slips past extraction, the (?<!:) lookbehind prevents the most common corruption (URL schemes). It is:

  • One byte added to the regex
  • No behavioral change for valid JS line comments (which never follow :)
  • No regression in existing tests (verified locally)

Validation

Tested against real-world breakage: wp-dark-mode/assets/js/app.min.js v5.3.5 (82 KB plugin file with r[n]=\https://www.youtube.com/embed/${i}\`` patterns).

Metric Without fix With fix
Output size 67,875 bytes 81,825 bytes
acorn-parses output Unexpected token ':' ✅ valid
Template literal URL preserved \https:<…`` \https://…``

Existing test suite still passes.

References

The regex `/\/\/.*$/m` in stripComments() matches `//` anywhere on a line,
including inside template literals (backtick strings) when extractStrings()
fails to extract them — e.g. when nested ${...} interpolations confuse the
extraction regex.

Adding negative lookbehind `(?<!:)` prevents matching `//` when preceded by
`:`, which protects URL schemes (https://, http://, ftp://, etc.) inside
any string context that might leak through string extraction.

Tests added: 4 new cases in dataProvider() covering URL preservation in
template literals + no-regression for legitimate line comments. All 185
existing tests still pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant