Improve Model-Based Guardrails in LLM Prompt Injection Prevention by crony-io · Pull Request #2249 · OWASP/CheatSheetSeries

crony-io · 2026-06-23T16:04:59Z

This changes the reference in the Model-Based Guardrails section to the idea that the dual-LLM pattern, described by Simon Willison is the strongest architectural form of building AI assistants that can resist prompt injection. As said by Willison himself in his 11th April 2025 update, there is a potential flaw in his Dual LLM proposal, but CaMeL (released by Google DeepMind in this paper and this GitHub repo), proposes a new direction for mitigating prompt injection attacks.

🚩 If your PR is related to grammar/typo mistakes, please double-check the file for other mistakes in order to fix all the issues in the current cheat sheet.

Please make sure that for your contribution:

In case of a new Cheat Sheet, you have used the Cheat Sheet template.
All the markdown files do not raise any validation policy violation, see the policy.
All the markdown files follow these format rules.
All your assets are stored in the assets folder.
All the images used are in the PNG format.
Any references to websites have been formatted as [TEXT](URL)
You verified/tested the effectiveness of your contribution (e.g., the defensive code proposed is really an effective remediation? Please verify it works!).
The CI build of your PR pass, see the build status here.

Scope and sourcing (required)

This PR is focused: it modifies a single cheat sheet, or a small coordinated set, and the scope is described in the PR body.
Every technical claim, recommendation, or threat assertion added in this PR is supported by a primary source (RFC, NIST, OWASP standard, vendor documentation, peer-reviewed research) linked inline as [text](URL).
I have read each source I cite and confirm it actually supports the claim. I have not relied on summaries, hearsay, or model-generated citations.

AI Tool Usage Disclosure (required for all PRs)

Please select exactly one of the following options. PRs that leave this section blank will be closed.

I have NOT used any AI tool to generate the contents of this PR.
I have used AI tools to generate the contents of this PR. I have verified
the contents and I affirm the results. The LLM used is [llm name and version]
and the prompt used is [your prompt here]. I have independently verified every citation and technical claim against the cited source. [Feel free to add more details if needed]

jmanico · 2026-06-23T20:36:10Z

This is promising but purely theoretical.

Very few production AI systems implement this today.

It requires:

Data classification
Policy engine
Metadata propagation
Enforcement layer

Most companies aren’t doing this yet, or are even close to this.

crony-io · 2026-06-24T17:41:55Z

Hi @jmanico, i completely agree with you, CaMeL is indeed bleeding edge, and the architectural lift required is far beyond what most companies are currently implementing.
My primary motivation for this PR is that the current cheat sheet explicitly names Simon Willison's Dual LLM pattern as the "strongest architectural form." However, Willison updated that exact article to point out a flaw in his design (it remains vulnerable to data-flow manipulation) and cites CaMeL as the necessary architectural evolution to close that gap.

randomstuff · 2026-06-26T09:41:14Z

CaMeL is indeed bleeding edge, and the architectural lift required is far beyond what most companies are currently implementing.

In that case, maybe just adding a reference/further reading link would be better?

Further reading:

CaMeL (CApabilities for MachinE Learning), described by Google DeepMind.

Disclaimer: I have not looked at the paper at all 😄

crony-io · 2026-06-26T14:44:02Z

Sure, maybe i can remove all the description about CaMeL, keep the original text and just add something like this after?

The strongest architectural form of this idea is the dual-LLM pattern, described by Simon Willison. A privileged LLM holds the tools but never reads untrusted content directly. A quarantined LLM reads untrusted content but cannot take action. The privileged model receives only structured summaries or labels from the quarantined one, which breaks the path that injected instructions need to reach the actor.

A new architectural form of this idea is CaMeL (CApabilities for MachinE Learning). It improves upon the original Dual-LLM pattern to prevent injected data from manipulating tool arguments, as said by Willison in his blog CaMeL offers a promising new direction for mitigating prompt injection attacks
Further reading: Defeating Prompt Injections by Design from Google DeepMind and their GitHub repo for examples.

Update LLM_Prompt_Injection_Prevention_Cheat_Sheet.md

2c3ff99

Fix markdown errors

d98c6c0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Model-Based Guardrails in LLM Prompt Injection Prevention#2249

Improve Model-Based Guardrails in LLM Prompt Injection Prevention#2249
crony-io wants to merge 2 commits into
OWASP:masterfrom
crony-io:master

crony-io commented Jun 23, 2026

Uh oh!

jmanico commented Jun 23, 2026

Uh oh!

crony-io commented Jun 24, 2026

Uh oh!

randomstuff commented Jun 26, 2026 •

edited

Loading

Uh oh!

crony-io commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

crony-io commented Jun 23, 2026

Scope and sourcing (required)

AI Tool Usage Disclosure (required for all PRs)

Uh oh!

jmanico commented Jun 23, 2026

Uh oh!

crony-io commented Jun 24, 2026

Uh oh!

randomstuff commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

crony-io commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

randomstuff commented Jun 26, 2026 •

edited

Loading