fix: sanitize raw HTML in MDXISH & MDX renderers#1526
Conversation
|
|
||
| // TODO: Skipped about the mdxish engine fails this test since it wraps the <pre> in a <p> tag | ||
| // Rendering looks correct, so skip this for now until we decide if we want to fix this or not | ||
| it.skip.each(renderingEngines)('%s: renders the html in a `<pre>` tag if safeMode={true}', (_label, renderContent) => { |
There was a problem hiding this comment.
This is now resolved
| * load remote resources, or open a foreign-content (MathML/SVG) parsing context | ||
| * that lets `<script>` survive namespace-confusion bypasses. | ||
| */ | ||
| const DANGEROUS_TAG_NAMES = new Set([ |
There was a problem hiding this comment.
From research, it seems that these tags are the main dangerous tags that might expose XSS vulnerability. It's quite extensive.
There was a problem hiding this comment.
If we've got object and applet should we also have embed?
There was a problem hiding this comment.
Yeah good point!
There was a problem hiding this comment.
Wait sorry actually legacy magic block embeds translates to raw embeds, so I don't think we can strip it, at least for mdxish 😢 . Might worth transforming them to our Embed component, but need to ensure the previous rendering logic is retained.
|
|
||
| // PascalCase names are custom React components (e.g. `<Callout>`), not host | ||
| // elements; their `on*`/url-like values are component props, not DOM handlers. | ||
| const isComponentName = (name: string): boolean => /^[A-Z]/.test(name); |
There was a problem hiding this comment.
The main thing to be careful is that we don't sanitise custom MDX components accidentally.
| }, | ||
| ]); | ||
| rehypePlugins.push([rehypeSanitize, sanitizeSchema]); | ||
| } else { |
There was a problem hiding this comment.
Seems like the main MDX path has never sanitise these scripts. They don't get executed during compilation, but later during execution.
kevinports
left a comment
There was a problem hiding this comment.
RE:
Soften iframe/svg? Currently removed wholesale (parity with md), all stripped of tags are in the DANGEROUS_TAG_NAMES list in dangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want to
I'm inclined to allow SVG here because it's a common image format people inline in an html doc. I think we'd still have attribute-level sanitization which should cover our bases there?
It seems like for iframe we could strip them and tell folks to use the Embed block for this right?
| * load remote resources, or open a foreign-content (MathML/SVG) parsing context | ||
| * that lets `<script>` survive namespace-confusion bypasses. | ||
| */ | ||
| const DANGEROUS_TAG_NAMES = new Set([ |
There was a problem hiding this comment.
If we've got object and applet should we also have embed?
|
|
||
| if (isMdxJsxElement(node) && node.attributes) { | ||
| node.attributes = node.attributes.filter(attr => { | ||
| if (attr.type !== 'mdxJsxAttribute' || typeof attr.name !== 'string') return true; // keep `{...spread}` |
There was a problem hiding this comment.
I don't follow this "// keep {...spread}" comment.
Does this mean a spread like {...{onclick: 'alert(1)'} will survive sanitization?
There was a problem hiding this comment.
Yes it will since we don't sanitise expressions yet, this PR is for top level script tags & attribute / properties sanitisation. We can extend it to expressions, but would need to be careful to not accidentally strip legit content in them.
There was a problem hiding this comment.
This plugin factory could probably just live at the bottom of dangerous-html.ts instead of a separate file for this wrapper.
There was a problem hiding this comment.
Should there be some unit tests that cover this function directly?
There was a problem hiding this comment.
Yep good point!
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🔗 Linked repositories identifiedCodeRabbit considers these linked repositories for cross-repo context during reviews:
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR adds a rehype plugin, ChangesRelated issues: None provided. Sequence Diagram(s)sequenceDiagram
participant compile as compile()
participant mdxish as mdxish()
participant plugin as rehypeStripDangerousHtml
participant tree as HAST/MDX tree
compile->>plugin: add to non-md rehypePlugins
mdxish->>plugin: run after rehypeRaw
plugin->>tree: visit nodes
plugin->>tree: remove dangerous elements and attributes
plugin-->>compile: sanitized output
plugin-->>mdxish: sanitized output
Poem Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
__tests__/lib/compile-sanitize.test.tsx (1)
29-32: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winAdd explicit
vbscript:and dangerousdata:payloads here.The current fixture only feeds
javascript:intoexecute(), so a regression on the other blocked schemes would still pass. Add concrete inputs for those schemes to lock in the full URL-sanitization contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/lib/compile-sanitize.test.tsx` around lines 29 - 32, The compile-sanitize test only exercises the javascript: case, so it does not protect the full URL-sanitization contract. Update the test in compile-sanitize.test.tsx around the execute() fixture to include explicit vbscript: and dangerous data: payload inputs alongside the existing javascript: case, and keep the href assertions in the same test so it verifies all blocked schemes remain non-executable.__tests__/processor/plugin/dangerous-html.test.ts (1)
124-134: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winAdd a regression for expression-valued URL attributes.
The MDX JSX tests only cover string URL values. Add a host-element case where
hreforformActionis an expression node resolving to a dangerous scheme, matching the sanitizer gap above.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@__tests__/processor/plugin/dangerous-html.test.ts` around lines 124 - 134, The MDX JSX dangerous-attribute tests only cover string-valued URL props, so add a regression in dangerous-html.test.ts for host JSX elements whose href or formAction is an expression node resolving to a javascript: (or other dangerous) URL. Update the MDX JSX section alongside the existing jsx() / stripDangerousHtml(root(node)) case to assert that expression-valued URL attributes are also removed while safe attributes like id remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@__tests__/lib/mdxish/sanitize-raw-html.test.ts`:
- Around line 109-118: The test around mdxish sanitization is over-preserving
dangerous URL props on PascalCase components. Update the behavior in the mdxish
sanitize path and the corresponding expectation in sanitize-raw-html.test.ts so
custom components like TestComponent still keep safe props such as onClick but
do not retain javascript: values on href/src-style URL props unless they are
explicitly allowlisted. Use the existing mdxish and findElementByTagName test
setup to verify the dangerous prop is stripped or blocked instead of locked in.
In `@lib/compile.ts`:
- Around line 83-87: The `compile` path currently lets `...opts` override
`rehypePlugins`, which can drop `rehypeStripDangerousHtml` from the MDX
pipeline. Update the `compile` function so caller options are merged before the
explicit plugin list, or otherwise ensure user-supplied `rehypePlugins` are
appended while `rehypeStripDangerousHtml` is always added last. Use the
`compile` function and the `rehypePlugins`/`rehypeStripDangerousHtml` symbols to
locate the fix.
In `@processor/plugin/dangerous-html.ts`:
- Around line 129-134: The MDX attribute sanitizer in dangerous-html.ts only
rejects dangerous URL values when they are strings, so host elements can still
receive non-string URL props like href expressions. Update the filter inside the
node.attributes processing to treat any URL-bearing attribute on host MDX
elements as unsafe unless it is a clearly safe string literal, using the
existing helpers isUrlAttribute and isDangerousUrl in the attr.name/attr.value
check.
---
Nitpick comments:
In `@__tests__/lib/compile-sanitize.test.tsx`:
- Around line 29-32: The compile-sanitize test only exercises the javascript:
case, so it does not protect the full URL-sanitization contract. Update the test
in compile-sanitize.test.tsx around the execute() fixture to include explicit
vbscript: and dangerous data: payload inputs alongside the existing javascript:
case, and keep the href assertions in the same test so it verifies all blocked
schemes remain non-executable.
In `@__tests__/processor/plugin/dangerous-html.test.ts`:
- Around line 124-134: The MDX JSX dangerous-attribute tests only cover
string-valued URL props, so add a regression in dangerous-html.test.ts for host
JSX elements whose href or formAction is an expression node resolving to a
javascript: (or other dangerous) URL. Update the MDX JSX section alongside the
existing jsx() / stripDangerousHtml(root(node)) case to assert that
expression-valued URL attributes are also removed while safe attributes like id
remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 73d07070-afbb-4a74-8810-65b56e65c59c
📒 Files selected for processing (8)
__tests__/components/HTMLBlock.test.tsx__tests__/lib/compile-sanitize.test.tsx__tests__/lib/mdxish/sanitize-raw-html.test.ts__tests__/processor/plugin/dangerous-html.test.tslib/compile.tslib/mdxish.tspackage.jsonprocessor/plugin/dangerous-html.ts
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
readmeio/ai(manual)readmeio/gitto(manual)readmeio/markdown(manual)readmeio/readme(manual) → reviewed against branchdimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathmlinstead of the default branch
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/compile.ts`:
- Around line 70-75: The remark plugin merge order in compile() is wrong because
userRemarkPlugins are added before tailwindTransformer, allowing caller plugins
to run on the pre-Tailwind AST. Update the plugin assembly in lib/compile.ts so
tailwindTransformer is inserted before appending userRemarkPlugins, keeping the
intended order in the remarkPlugins array.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ffde9a6f-f2b9-4853-ac63-b6a29a8a2476
📒 Files selected for processing (5)
__tests__/lib/compile-sanitize.test.tsx__tests__/lib/mdxish/sanitize-raw-html.test.ts__tests__/processor/plugin/dangerous-html.test.tslib/compile.tsprocessor/plugin/dangerous-html.ts
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
readmeio/ai(manual)readmeio/gitto(manual)readmeio/markdown(manual)readmeio/readme(manual) → reviewed against branchdimas/rm-17024-stored-xss-in-hub-docs-renderer-via-mathmlinstead of the default branch
✅ Files skipped from review due to trivial changes (1)
- tests/lib/mdxish/sanitize-raw-html.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/lib/compile-sanitize.test.tsx
- tests/processor/plugin/dangerous-html.test.ts
- processor/plugin/dangerous-html.ts
🎯 What does this PR do?
Context
The was a reported vulnerability where a stored XSS in the hub docs renderer lets a non-owner put raw HTML in a guide body that runs JS in any viewer's session (incl. the owner's). From there an attacker reads same-origin endpoints (
…/apikeys,…/projects/me) and exfiltrates the live API key +custom_login.jwt_secret→ project takeover. Reported payload:<math><mtext><script>…</script></mtext></math>.Root cause: raw HTML was never sanitized on the new engines:
rehypeRaw(raw HTML → real elements) with no sanitizer after it.compileonly sanitizes whenformat === 'md'; main docs render in the default MDX format, which was not sanitized.mdpath was already protected byrehype-sanitize.Correction to the report: it assumed bare
<script>was already stripped and only MathML bypassed it. On mdxish & MDX (without nonmdformat) nothing is stripped, bare<script>,<img onerror>, andjavascript:all execute; the MathML wrapper isn't required (verified against the pipeline). So a MathML-only patch would've been insufficient.Fix
Sanitize the rendered content in the AST level. Created an AST-level stripper shared by both the mdxish and MDX
compilepipelines:on*handlers andjavascript:/vbscript:/executable-data:URLs (incl. whitespace/control-char obfuscation).element(mdxish/md) and MDX JSX (mdxJsxFlowElement/mdxJsxTextElement).on*/url values are React props) but recurses through them so nested raw HTML is still cleaned.Considerations & decisions
hast-util-sanitize(themdpath) is allowlist-based and would strip every custom component + prop — which is why it'smd-only. mdxish/MDX trees have first-class component nodes, so a denylist that preserves them is the fit. Trade-off: a denylist can miss a novel vector (see CSP below).jsdomon the server render path.Note
This sanitisation is not applied to HTMLBlocks since it's for more deliberate HTML execution. For MDXISH & MDX, by default scripts are stripped anyway
What no longer works (intended)
This affects ffects mdxish/MDX docs with hand-authored raw HTML (all already blocked on
md/legacy): inline<style>, raw<iframe>(use the Embed component), inline<svg>/<math>,<object>/<link>/<meta>/<base>, inlineon*handlers,javascript:URLs. If an MDXISH or MDX uses those HTML in their doc, it will no longer work!To discuss
iframe/svg? Currently removed wholesale (parity withmd), all stripped of tags are in theDANGEROUS_TAG_NAMESlist indangerous-html.ts. We definitely want to strip script & math tags, but for the others I'm not sure if we want todangerouslySetInnerHTMLsinks (HTMLBlock)🧪 QA tips
End-to-end in the readme app
Locally link this branch to readme
alert, noapikeys/network request, no frames/SVG, page styling unaffected; the control block (bold/links/code/div/callout) renders.md): same content (note the<style>{ }block hits a pre-existing MDX compile error from CSS braces — use the@importvariant, or test<style>alone) → same result.<Callout>/<Tabs>/<Image>/<Embed>/[block:embed], table alignment, inlinestyle=/class=, normal markdown all render unchanged.Sample doc content
After the fix every block above is inert; only the bold / link / code / div / callout controls render.
How to Replicate the Vulnerability in the report:
This is what it would look like in the app:
Screen.Recording.2026-06-26.at.10.44.54.pm.mov
📸 Screenshot or Loom
Before
In this demo, notice how in MDXISH & MDX, scripts, iframes, and svg, get exectued on View
Screen.Recording.2026-06-26.at.8.07.06.pm.mov
After
None of those top level tags execute
Screen.Recording.2026-06-26.at.8.08.11.pm.mov