Improve Model-Based Guardrails in LLM Prompt Injection Prevention#2249
Improve Model-Based Guardrails in LLM Prompt Injection Prevention#2249crony-io wants to merge 2 commits into
Conversation
|
This is promising but purely theoretical. Very few production AI systems implement this today. It requires:
Most companies aren’t doing this yet, or are even close to this. |
|
Hi @jmanico, i completely agree with you, CaMeL is indeed bleeding edge, and the architectural lift required is far beyond what most companies are currently implementing. |
In that case, maybe just adding a reference/further reading link would be better?
Disclaimer: I have not looked at the paper at all 😄 |
|
Sure, maybe i can remove all the description about CaMeL, keep the original text and just add something like this after?
|
This changes the reference in the Model-Based Guardrails section to the idea that the dual-LLM pattern, described by Simon Willison is the strongest architectural form of building AI assistants that can resist prompt injection. As said by Willison himself in his 11th April 2025 update, there is a potential flaw in his Dual LLM proposal, but CaMeL (released by Google DeepMind in this paper and this GitHub repo), proposes a new direction for mitigating prompt injection attacks.
Please make sure that for your contribution:
[TEXT](URL)Scope and sourcing (required)
[text](URL).AI Tool Usage Disclosure (required for all PRs)
Please select exactly one of the following options. PRs that leave this section blank will be closed.
the contents and I affirm the results. The LLM used is
[llm name and version]and the prompt used is
[your prompt here]. I have independently verified every citation and technical claim against the cited source. [Feel free to add more details if needed]