RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)#677
potiuk wants to merge 7 commits into
Rebased onto current master, which already added AGENTS.md and SECURITY.md. Keeps both maintainer files and adds the detailed THREAT_MODEL.md plus the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointers. Generated-by: Claude Opus 4.8 (1M context)
35879b0 to d4f0fdd
This PR looks like it needs answers from the developers before it can move forward.
Yes, absolutely. It's enough for you to answer the questions in PR comments, and I will update the PR accordingly.
In answer to the first question, there is a PR to ensure we have this covered. |
Incorporates the Creadur PMC's PR apache#677 review:
- archive walker confirmed unbounded (in-memory extraction) -> §9 gap + §10
- XML/DOCTYPE hardening noted as in-flight PMC PR (§14 Q3, link pending)
- documents RAT write mode (--addLicense) as trusted-input / out-of-model
- notes CLI/Ant/Maven front-ends are generated from a common core
- §15 corrected: SECURITY.md already exists (added via apache#671)

Generated-by: Claude Opus 4.8
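Since the archive walker was confirmed to be unbounded, here is a minimal sketch of what bounding in-memory archive reads can look like. It is a generic java.util.zip example, not RAT's ArchiveWalker code; the class name BoundedZipReader and the limit values are hypothetical. It caps the number of entries and the total number of decompressed bytes actually produced, rather than trusting sizes declared in entry headers. A nesting-depth cap for archives inside archives would need an extra counter carried through the recursive descent and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch (not RAT's ArchiveWalker): read ZIP entries into memory under hard caps. */
final class BoundedZipReader {
    private final int maxEntries;     // hypothetical limit on entries per archive
    private final long maxTotalBytes; // hypothetical limit on decompressed bytes per archive

    BoundedZipReader(int maxEntries, long maxTotalBytes) {
        this.maxEntries = maxEntries;
        this.maxTotalBytes = maxTotalBytes;
    }

    /** Returns entry name -> decompressed bytes; throws IOException as soon as a cap is exceeded. */
    Map<String, byte[]> read(InputStream in) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        byte[] buf = new byte[8192];
        long totalBytes = 0;
        int entryCount = 0;
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (++entryCount > maxEntries) {
                    throw new IOException("archive has more than " + maxEntries + " entries");
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // Count the bytes the inflater actually produces instead of trusting the
                // size declared in the entry header, which the archive's author controls.
                while ((n = zin.read(buf, 0, buf.length)) != -1) {
                    totalBytes += n;
                    if (totalBytes > maxTotalBytes) {
                        throw new IOException("decompressed size exceeds " + maxTotalBytes + " bytes");
                    }
                    out.write(buf, 0, n);
                }
                if (!entry.isDirectory()) {
                    // The entry name is only used as a map key; nothing is written to disk.
                    contents.put(entry.getName(), out.toByteArray());
                }
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Build a small, highly compressible archive in memory: one entry of 1 MiB of zeros.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("zeros.bin"));
            zos.write(new byte[1 << 20]);
            zos.closeEntry();
        }
        BoundedZipReader reader = new BoundedZipReader(1_000, 64 * 1024);
        try {
            reader.read(new ByteArrayInputStream(bos.toByteArray()));
            System.out.println("archive accepted");
        } catch (IOException e) {
            System.out.println("archive rejected: " + e.getMessage());
        }
    }
}
```

Because entries stay in memory and are never extracted to disk, entry names here are only map keys, so the sketch has no path-traversal exposure; the caps address resource exhaustion only.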
Thanks
Still open, if you have a moment (one line each is plenty): Q1 (confirm the untrusted-input case is the one to model), Q2 (RAT makes no network connections), Q5 (Whisker/Tentacles share the profile), and Q6 (do you want us to add the same pointer files to creadur-whisker/-tentacles, or will you?). One note on CI: the failing "Build and analyze" (CodeQL) check is unrelated to this PR. It's a docs-only change (three .md files), so it neither introduces nor is affected by that build job; it looks pre-existing or flaky on the branch.
@potiuk There is one more point that has not been discussed. RAT allows developers to extend the matching algorithms; see https://creadur.apache.org/rat/license_def.html#Matchers. The upshot is that third parties can create new matchers and use them in license checks. Matchers differ from license checks in that license checks use matchers: for example, the Apache 2.0 license check uses matchers, and the matchers scan the contents of the file (as a String) looking for matches. This means that a custom matcher would have access to all the text of every file selected for scanning. However, this is defined in the configuration and is under the control of the developer using RAT.
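The point that custom matchers see the full text of every scanned file can be illustrated with a small, hypothetical Java sketch. TextMatcher, PhraseMatcher, RecordingMatcher and LicenseCheck are invented for illustration and are not RAT's real matcher API (the license_def.html page linked above documents that); the "passes if any matcher matches" rule in LicenseCheck is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the matcher / license-check relationship; NOT RAT's real API. */
final class MatcherDemo {

    /** A matcher is handed the complete text of each scanned file. */
    interface TextMatcher {
        boolean matches(String fileText);
    }

    /** Built-in-style matcher: looks for a fixed phrase anywhere in the text. */
    static final class PhraseMatcher implements TextMatcher {
        private final String phrase;

        PhraseMatcher(String phrase) { this.phrase = phrase; }

        @Override
        public boolean matches(String fileText) { return fileText.contains(phrase); }
    }

    /** Stand-in for third-party code: it can do anything with the text, e.g. keep a copy. */
    static final class RecordingMatcher implements TextMatcher {
        final List<String> seen = new ArrayList<>();

        @Override
        public boolean matches(String fileText) {
            seen.add(fileText);
            return false;
        }
    }

    /** A license check composed from matchers; here it passes if any matcher matches. */
    static final class LicenseCheck {
        private final List<TextMatcher> matchers;

        LicenseCheck(List<TextMatcher> matchers) { this.matchers = List.copyOf(matchers); }

        boolean appliesTo(String fileText) {
            for (TextMatcher m : matchers) {
                if (m.matches(fileText)) {
                    return true; // later matchers are not consulted after the first match
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        RecordingMatcher spy = new RecordingMatcher();
        LicenseCheck check = new LicenseCheck(List.of(spy, new PhraseMatcher("Apache License")));
        String file = "/* Licensed under the Apache License, Version 2.0 */\nclass Foo {}";
        System.out.println("license detected: " + check.appliesTo(file));
        System.out.println("custom matcher saw " + spy.seen.get(0).length() + " characters");
    }
}
```

Nothing in this composition stops the recording matcher from keeping or forwarding what it sees, which is why the matcher set is best treated as operator-trusted configuration rather than something the scanned tree can supply.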
@potiuk The Sonar build only runs on specific branches and for specific PRs, as the credentials are not shared with all PRs/builds due to ASF restrictions.
#679 is the PR that does the XXE hardening. |
Per Claudenw (PR apache#677): RAT lets operators define custom matcher classes that see all scanned file text, but the matcher set is operator-defined config (not attacker-supplied), so it's OUT-OF-MODEL: trusted-input — same posture as the write mode. Generated-by: Claude Opus 4.8 (1M context)
Thanks
Still one open item: the XXE-hardening PR number (§14 Q3). I've left §8 #2 tentative pending it. Whenever you drop the number, I'll cite it and flip XXE from "hardening in flight" to a provided property. No rush. The remaining §14 questions (Q1 untrusted-input posture, Q2 no-network, Q5 Whisker/Tentacles profile, Q6 sibling pointer files) are still open whenever convenient; one line each is plenty.
@potiuk - thanks again:
…d (external entities disabled, apache#679 hardens DOCTYPE), no-network confirmed (XSLT xsl:include caveat), correct archive path-handling (read to memory, no extract-to-disk → no path traversal), Whisker/Tentacles deferred
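On the XSLT xsl:include caveat: one generic JAXP way to keep stylesheet compilation off the network is to restrict the external-access properties on the TransformerFactory. This is a minimal sketch assuming the JDK's built-in XSLT processor (other processors may handle these attributes differently); it is not taken from RAT's code, and RestrictedXslt is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: compile XSLT with external stylesheet/DTD access switched off. */
final class RestrictedXslt {

    static TransformerFactory newRestrictedFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        // JAXP 1.5 access properties: "" means no protocol (file, http, jar, ...) is allowed
        // for xsl:include / xsl:import / document() targets or external DTDs.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return tf;
    }

    static Templates compile(String xsl) throws TransformerConfigurationException {
        return newRestrictedFactory().newTemplates(new StreamSource(new StringReader(xsl)));
    }

    public static void main(String[] args) {
        String remoteInclude =
                "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:include href=\"http://example.invalid/remote.xsl\"/></xsl:stylesheet>";
        try {
            compile(remoteInclude);
            System.out.println("compiled (unexpected)");
        } catch (TransformerConfigurationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With both properties set to the empty string, an include of any URL fails at compile time, local file includes included; a real configuration might allow only "file".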
Thanks @ottlinger and @Claudenw — that's everything we needed. I've folded all your answers into
With every §14 question answered, the model is ready to ratify whenever the PMC's happy with it. (The red check is the CodeQL "Build and analyze" job, which is unrelated to these doc-only changes; all 13 build/test matrix jobs pass.)
@potiuk If we merge this PR, are we done and does that trigger the next step, or is there anything else that needs to be done on our side? Thanks
@ottlinger @Claudenw, a quick note on the red check: that's only the SonarCloud job, which can't run on PRs because the ASF doesn't share its Sonar credentials with PR builds (as you noted). It's not an actual test failure, so it's safe to merge past it. And yes, merging this is the last step on your side; once it lands, I'll verify that the AGENTS.md → SECURITY.md → THREAT_MODEL.md chain resolves and Creadur/RAT enters the queue. (Starting with RAT and adding Tentacles/Whisker later is fine.)
@Claudenw Anything to add from your side? I'm fine with merging so we can move forward. Thx
Claudenw left a comment
I will need to spend more time thinking about the threat model. I think that with the license headers and the XXE questions resolved, we can proceed.
- THREAT_MODEL.md: use https for the license URL; clarify that --addLicense write mode inserts text from the operator-controlled license definition, not from the scanned tree.
- AGENTS.md, SECURITY.md: add the ASF license header.
- SECURITY.md: replace the TBD intro with a short security-posture summary and a "Reporting a Vulnerability" section.

Generated-by: Claude Opus 4.8 (1M context)
Thanks for the careful review, @Claudenw and @ottlinger — really appreciate it. All six threads are addressed in a6978b8 and resolved:
Should be good to go now; shout if anything else stands out. Thanks again! 🙏
BTW, @Claudenw: usually (to save tokens) we either use no licence header or the short SPDX form of it, and we usually exclude AGENTS.md/SECURITY.md from releases, to be 100% in line with the expectation of full licences in released source packages. We had a lengthy discussion about it without a complete conclusion, but the discussion seemed to converge on: "If you are not releasing the .md files, it's fine to have no licence, SPDX, or some other short version of it." This is generally OK according to https://www.apache.org/legal/src-headers.html#is-a-short-form-of-the-source-header-available, where one such short form is allowed in justified cases, and SPDX seems to be another good form. We usually exclude those files via .rat excludes and leave them out of source releases. The long thread about it (with my assumptions summarized) is here: https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53. If you wish, we can use either approach. Greetings from NY, Claude :)
…e labels Answers Claudenw's review note (does apache#679 impact the XXE data-flow line?): the §5a/§8 text already records that RAT disables external entities + the apache#679 DOCTYPE hardening, but the data-flow diagram and the input/residual tables still labelled XXE a bare "surface". Annotate those three labels with the mitigation so the diagram is consistent with §5a/§8 apache#2. Generated-by: Claude Opus 4.8 (1M context)
Consistency with THREAT_MODEL.md (§5a / §8 apache#2): since RAT-560 (apache#679) RAT builds XML parsers via the hardened StandardXmlFactory (DOCTYPE + external entities disabled), so XXE is actively prevented. Lead with that; keep the operator-trusted-config argument as defense-in-depth. Generated-by: Claude Opus 4.8 (1M context)
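For reference, parser hardening of the kind described here (DOCTYPE and external entities disabled) looks roughly like the following generic JAXP recipe. It is a minimal sketch, not the actual StandardXmlFactory from RAT-560/apache#679; HardenedXml is a hypothetical name, and the feature URIs are the ones understood by the JDK's built-in Xerces-based parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical sketch of a hardened DOM parser factory (not RAT's StandardXmlFactory). */
final class HardenedXml {

    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Refuse any DOCTYPE: without a DTD there are no entity declarations, external or internal.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense-in-depth, should the DOCTYPE rule ever be relaxed.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    static Document parse(String xml) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parsed root: " + parse("<config/>").getDocumentElement().getTagName());
        String xxe = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        try {
            parse(xxe);
            System.out.println("DOCTYPE accepted (unexpected)");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the DOCTYPE outright is the strongest of these settings, since entities can only be declared in a DTD; it therefore also rules out entity-expansion bombs, not just XXE.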
What
Adds a threat model for Apache Creadur (RAT) at the Creadur PMC's request (GLASSWING / Mythos scan pre-flight):
- THREAT_MODEL.md: the model (rubric).
- SECURITY.md + AGENTS.md: disclosure pointer + the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md chain.

The model in brief
RAT is modelled as an in-process build/CLI license-audit tool — not a network service, and explicitly not a security/vulnerability scanner. Its security-relevant case is auditing untrusted input: the XML configuration (XXE surface) and archive descent (decompression-bomb surface). Findings that require RAT to process input the operator already trusts (the normal case — your own source tree) are out of model.
DRAFT — you own it; two quick technical confirmations
Because RAT is small, the §8-vs-§9 split hinges on two facts I've left as section 14 questions:
- Does XMLConfigurationReader disable DOCTYPE/external entities (XXE-safe)?
- Does ArchiveWalker bound decompression (size/depth/entry-count)?

Your answers turn those from "open question" into either a provided property (§8) or a documented gap + downstream note (§9). Also Q6: want me to add the same chain to creadur-whisker and creadur-tentacles so all three are discoverable?

Generated by the ASF Security team's threat-model tooling (Claude Opus); reviewed before opening.