From d4f0fdd434e2a68361987bafa7b8d7f53185dd17 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sun, 14 Jun 2026 03:19:35 +0200 Subject: [PATCH 1/7] Add THREAT_MODEL.md and wire AGENTS.md/SECURITY.md to it Rebased onto current master, which already added AGENTS.md and SECURITY.md. Keeps both maintainer files and adds the detailed THREAT_MODEL.md plus the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointers. Generated-by: Claude Opus 4.8 (1M context) --- AGENTS.md | 7 ++ SECURITY.md | 8 ++ THREAT_MODEL.md | 247 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 262 insertions(+) create mode 100644 THREAT_MODEL.md diff --git a/AGENTS.md b/AGENTS.md index cd97ef78f..f8448af2b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -31,3 +31,10 @@ consult before producing output. - Don't add unnecessary dependencies - Follow the existing codebase patterns and conventions - Test your solutions when possible with unit or integration tests + +## Security + +Security model: [SECURITY.md](./SECURITY.md), which links to the project's +threat model at [THREAT_MODEL.md](./THREAT_MODEL.md). Consult the threat model +for the project's in-scope / out-of-scope declarations and known non-findings +before reporting security issues. diff --git a/SECURITY.md b/SECURITY.md index ec85a3c83..d3a5b9f23 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -9,3 +9,11 @@ TBD - Configuration files and XSLT documents passed to RAT are operator-controlled configuration, not request input. Reports claiming SSRF or path traversal via these resolvers, based on the assumption that the resource name is attacker-controlled, are out of scope under the documented threat model. XML and XSLT authorship, as well as resource configuration, are privileged operations. - Applications that thread untrusted input into XML configuration or XSLT documents should validate that input before passing it to RAT. Responsibility for such validation rests with the application, not with RAT. 
+ +## Threat Model + +The full Apache Creadur RAT threat model — scope and intended use, trust +boundaries, the security properties RAT provides and disclaims, the adversary +model, and known non-findings — is documented in +[THREAT_MODEL.md](./THREAT_MODEL.md). The scope notes above are a summary; +THREAT_MODEL.md is the detailed companion. diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md new file mode 100644 index 000000000..3bbb889e8 --- /dev/null +++ b/THREAT_MODEL.md @@ -0,0 +1,247 @@ + +# Apache Creadur (RAT) — Threat Model + +## §1 Header + +- **Project:** Apache Creadur — primarily **RAT (Release Audit Tool)** + (`apache/creadur-rat`), with sibling tools **Whisker** + (`apache/creadur-whisker`, license-documentation generator) and **Tentacles** + (`apache/creadur-tentacles`, release-bundle analyzer). This model is written + in `creadur-rat` and covers the Creadur dev-tool family; Whisker/Tentacles + share RAT's trust profile (§2). +- **Written against:** `main`/`master` @ HEAD (2026-06). +- **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta + rubric) at the Creadur PMC's request (path 3). +- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified. +- **Reporting cross-reference:** §8-violating findings via the ASF security + process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this doc. +- **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each + *(inferred)* has a §14 open question. +- **Draft confidence:** ~14 documented / 0 maintainer / 16 inferred. + +**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a +source tree, matches files against configurable license/header definitions, and +reports unapproved or unknown licenses. It runs as a **CLI**, an **Ant task**, +or a **Maven plugin** — always **in the developer's or CI's own process**, +never as a network service. Whisker generates license documentation; Tentacles +inspects staged release bundles. 
None is a server. + +## §2 Scope and intended use + +Intended use: a project maintainer or CI job runs RAT over a codebase to verify +license compliance before a release or on each change. The two inputs are the +**tree being audited** (files, including archives RAT descends into) and the +**RAT configuration** (XML/text license + matcher definitions). + +Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are +normally trusted too** (your own source, your own config) — but RAT is +sometimes pointed at **untrusted input**: a CI job auditing an untrusted +contribution/PR, or auditing a downloaded third-party artifact. That is the +case the model cares about. *(inferred — Q1.)* + +**Component families.** + +| Family | Entry point | Untrusted-input exposure | In model? | +| --- | --- | --- | --- | +| File walking + license matching | `Reporter`, walkers | scanned file **content/paths** | **Yes** | +| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if attacker-supplied) | **Yes** (XXE surface) | +| **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | **Yes** (decompression-bomb surface) | +| CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) | wrappers — trusted | +| Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note | + +## §3 Out of scope (explicit non-goals) + +- **RAT as a security scanner.** RAT checks *license* compliance; it is **not** + a vulnerability scanner or a security gate. "RAT didn't catch X security + issue" is not in scope. *(documented — purpose.)* +- **Audit *correctness* as a security property.** A missed/false license match + is a correctness bug, not a vulnerability (unless it crosses a resource bound, + §8). *(inferred.)* +- **The build/CI environment** RAT runs in, and the trust of the source tree + when RAT is deliberately run on your own (trusted) code — the dominant, + intended case. 
Findings whose only impact requires running RAT on input you + already trust are `OUT-OF-MODEL: trusted-input`. +- **Test resources** (the deliberately-odd license fixtures under + `*/src/test/resources/`) — those are test data, not a target. + +## §4 Trust boundaries and data flow + +The boundary is **the input RAT is pointed at** — files and configuration. +RAT's security questions only arise when that input is **untrusted**: + +``` +caller invokes RAT (CLI/Ant/Maven) on a directory + a config + │ trusted invocation + ▼ +read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted +walk tree -> for each file: read content, match licenses + └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path surface if archive is untrusted + ▼ +report (approved / unapproved / unknown) +``` + +**Reachability precondition (triager's test):** a finding is in-model only if it +is triggered by **untrusted input** (a hostile file/archive/config) that a +*realistic* RAT deployment processes — e.g. CI auditing an untrusted PR. A +finding that requires the operator to feed RAT input they already control is +`OUT-OF-MODEL: trusted-input` (§3). + +## §5 Assumptions about the environment + +- A JRE; RAT reads the filesystem it is pointed at and writes a report. It opens + **no network connections** and runs no services. *(inferred — Q2, the + no-network claim is high-value to confirm.)* +- The XML parser behaviour depends on the platform JAXP unless RAT configures it + (§5a/§8). *(inferred — Q3.)* + +## §5a Build-time and configuration variants + +RAT has no security-mode flag. The security-relevant configuration is whether +its **XML config parser disables DOCTYPE/external entities** and whether the +**archive walker bounds decompression** (depth/size/entry count). Both are +hardcoded behaviours, not operator knobs — confirmed in §14. There is no +"insecure default toggle"; the question is simply what the parser/walker do by +default. 
*(inferred — Q3/Q4.)* + +## §6 Assumptions about inputs + +| Input | Attacker-controllable? (untrusted-run) | Concern | +| --- | --- | --- | +| scanned file content | **yes** | parsed/read; resource use | +| scanned file paths / archive entry names | **yes** | path handling on archive extraction | +| archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth | +| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity | +| invocation arguments | no — trusted caller | — | + +## §7 Adversary model + +- **In scope:** the party who controls the files/archives/config that an + *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited + by CI, or the author of a third-party artifact being audited. Capabilities: + craft a malicious archive (zip bomb), a hostile XML config (XXE), or + pathological file content. *(inferred — Q1.)* +- **Out of scope:** an attacker who controls the RAT invocation or the trusted + source tree (the normal case — they already own the build). + +## §8 Security properties the project provides + +1. **Bounded resource use on untrusted archives** — the archive walker should + not allow a small input to cause unbounded CPU/memory (decompression-bomb / + nested-archive defence). *Violation:* OOM/hang from a crafted archive. + *Severity:* security (DoS) when RAT audits untrusted input. *(inferred — + Q4: confirm whether bounds exist; this may be a §8 property or a §9 gap.)* +2. **Safe XML configuration parsing** — the config reader should reject + DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a + crafted config. *Severity:* critical when config is untrusted. *(inferred — + Q3: confirm DOCTYPE handling — may be §8 or §9.)* +3. **No ambient network/side effects** — RAT does filesystem I/O only. + *Violation:* unexpected outbound connection. 
*(inferred — Q2.)* + +(Whether items 1–2 are *provided* properties or *disclaimed* gaps depends on the +maintainer's answers in §14; they are listed here as the relevant questions.) + +## §9 Security properties the project does *not* provide + +- **No safety guarantee when run on fully untrusted input without sandboxing**, + if the §14 answers reveal the XML parser/archive walker are not hardened. In + that case: treat RAT-on-untrusted-input as you would any parser — sandbox it. +- **It is not a security/vulnerability scanner** (§3); a clean RAT report says + nothing about security. +- **Well-known classes (parser/archive tools):** XXE via configuration, + decompression bombs / nested-archive blowup, and path handling on archive + entries — the standard risks of any tool that parses XML and descends into + archives. + +## §10 Downstream responsibilities + +- When auditing **untrusted** input (CI on untrusted PRs, third-party + artifacts), run RAT with resource limits / in a sandbox, and do not feed it + attacker-controlled **configuration**. +- Keep RAT updated; pin the version in your build. +- For your own (trusted) source tree — the normal case — no special handling. + +## §11 Known misuse patterns + +- **Running RAT on untrusted archives/config in CI** without resource limits, + expecting it to be hardened against decompression bombs / XXE. +- **Treating a clean RAT report as a security sign-off** (it is a license check). + +## §11a Known non-findings (recurring false positives) + +- **"RAT reads/parses files it is told to scan"** on a **trusted** tree — that + is the function; `OUT-OF-MODEL: trusted-input` (§3/§6). +- **Odd/invalid license fixtures under `src/test/resources/`** — test data, not + a target. `OUT-OF-MODEL: unsupported-component` (§3). +- **"RAT didn't detect a security vulnerability"** — out of purpose (§3). 
+- **XML parsing / archive reading flagged generically** without an untrusted- + input path — non-finding unless the reachability precondition (§4) is met. + +## §12 Conditions that would change this model + +- RAT gaining a network surface or a server mode. +- A change to the XML parser hardening or archive-walker bounds (§5a/§8). +- A report unroutable to a §13 disposition → revise §8/§9. + +## §13 Triage dispositions + +| Disposition | Meaning | Licensed by | +| --- | --- | --- | +| `VALID` | A §8 property breaks via untrusted input on a realistic run. | §8, §6, §7 | +| `VALID-HARDENING` | A §11 misuse is too easy (e.g. no archive bound). | §11 | +| `OUT-OF-MODEL: trusted-input` | Requires RAT to process input the operator already trusts. | §6 | +| `OUT-OF-MODEL: adversary-not-in-scope` | Needs control of the RAT invocation/host. | §7 | +| `OUT-OF-MODEL: unsupported-component` | Test fixtures / out-of-purpose. | §3 | +| `BY-DESIGN: property-disclaimed` | "Not a security scanner", trusted-input runs. | §9 | +| `KNOWN-NON-FINDING` | Matches §11a. | §11a | +| `MODEL-GAP` | Unroutable. | triggers §12 | + +## §14 Open questions for the maintainers + +**Wave 1 — the load-bearing ones.** + +- **Q1.** Confirm the intended trust posture: RAT runs in-process for a + trusted caller; inputs are normally trusted, but the security-relevant case is + RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts). + Is that the case you want modelled, or do you consider all RAT input trusted + (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.) +- **Q3.** Does `XMLConfigurationReader` disable DOCTYPE / external entities + (XXE-safe)? If yes, §8 #2 stands; if no, it's a §9 gap + a §10 responsibility. +- **Q4.** Does `ArchiveWalker` bound decompression (size/depth/entry-count) so a + crafted archive can't exhaust memory/CPU? §8 #1 vs §9 gap. 
+ +**Wave 2 — surface.** + +- **Q2.** Confirm RAT makes no network connections and has no side effects beyond + reading the scanned tree and writing the report. (§5/§8.) +- **Q5.** Whisker and Tentacles — same trust profile (in-process dev tool, + inputs normally trusted)? Any input either processes that this RAT model + doesn't cover (e.g. Tentacles fetching/inspecting remote bundles)? (§2.) + +**Wave 3 — coexistence.** + +- **Q6.** This adds `THREAT_MODEL.md` + `SECURITY.md` + `AGENTS.md` to + `creadur-rat`. Want matching pointer files (AGENTS.md → SECURITY.md → this + model) added to `creadur-whisker` and `creadur-tentacles` so all three are + discoverable, or will you add them? (§1/§15.) + +## §15 Appendix — existing-policy back-map + +No in-repo `SECURITY.md` exists today; this PR adds one (ASF security-process +pointer) plus `AGENTS.md`. Once the §14 answers land (especially Q3/Q4), the +§8/§9 split firms up and the same chain can be added to `creadur-whisker` and +`creadur-tentacles`. 
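Review note on Q3 (§8 #2): the property being asked about, "disables DOCTYPE / external entities", can be illustrated with a minimal JAXP sketch. This is not RAT's `XMLConfigurationReader` code. The class `HardenedXmlSketch`, its methods, and the sample documents are hypothetical; only standard JAXP calls and the well-known Xerces `disallow-doctype-decl` feature name are used, and the sketch assumes the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

/** Hypothetical illustration only; not taken from the RAT code base. */
public class HardenedXmlSketch {

    /** Builds a DOM parser that refuses any DOCTYPE declaration, which rules out XXE. */
    public static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the implementation to apply its built-in processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject documents that carry a DOCTYPE at all (no internal or external entities).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the document parses, false if the hardened parser rejects it. */
    public static boolean parses(String xml) throws Exception {
        try {
            newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException rejected) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Arbitrary sample documents; the element names are made up for this sketch.
        String benign = "<config><entry id=\"x\"/></config>";
        String hostile =
                "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///nonexistent\">]><r>&x;</r>";
        System.out.println("benign parses:  " + parses(benign));
        System.out.println("hostile parses: " + parses(hostile));
    }
}
```

If the config reader behaves like `newHardenedBuilder()` here, §8 #2 would hold as a provided property. If it relies on platform JAXP defaults instead, the answer depends on the JRE and any JAXP system properties in effect.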
From aaf95bc5e98b6f9df6506612d9aa5878a1facb9e Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Tue, 16 Jun 2026 21:27:45 -0400 Subject: [PATCH 2/7] THREAT_MODEL.md: fold in PMC review answers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Incorporates the Creadur PMC's PR #677 review: - archive walker confirmed unbounded (in-memory extraction) -> §9 gap + §10 - XML/DOCTYPE hardening noted as in-flight PMC PR (§14 Q3, link pending) - documents RAT write mode (--addLicense) as trusted-input / out-of-model - notes CLI/Ant/Maven front-ends are generated from a common core - §15 corrected: SECURITY.md already exists (added via #671) Generated-by: Claude Opus 4.8 --- THREAT_MODEL.md | 74 +++++++++++++++++++++++++++++++++++-------------- 1 file changed, 53 insertions(+), 21 deletions(-) diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index 3bbb889e8..cb4504160 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -32,7 +32,8 @@ process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this doc. - **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each *(inferred)* has a §14 open question. -- **Draft confidence:** ~14 documented / 0 maintainer / 16 inferred. +- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer + answers folded in from PR #677 review, 2026-06). **What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a source tree, matches files against configurable license/header definitions, and @@ -62,8 +63,14 @@ case the model cares about. 
*(inferred — Q1.)* | **XML configuration reader** | `XMLConfigurationReader` | the **config** (if attacker-supplied) | **Yes** (XXE surface) | | **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | **Yes** (decompression-bomb surface) | | CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) | wrappers — trusted | +| **License-header insertion (write mode)** | `--addLicense` / editors | **modifies files in the audited tree** (operator-invoked) | trusted-input (§3) | | Whisker / Tentacles | their CLIs | same dev-tool profile | sibling — §2 note | +**Note (PMC, review).** The CLI, Ant task, and Maven plugin front-ends are +generated from a common option core, so any security-relevant behaviour (or +gap) in that core transfers automatically to all three UIs — a finding in one +front-end's handling generally applies to all of them. *(maintainer.)* + ## §3 Out of scope (explicit non-goals) - **RAT as a security scanner.** RAT checks *license* compliance; it is **not** @@ -78,6 +85,12 @@ case the model cares about. *(inferred — Q1.)* already trust are `OUT-OF-MODEL: trusted-input`. - **Test resources** (the deliberately-odd license fixtures under `*/src/test/resources/`) — those are test data, not a target. +- **RAT's header-insertion / file-modification mode** (`--addLicense` and the + editors) — RAT can *write* license headers into the audited files, mutating + the tree. This is explicitly operator-invoked against the operator's own + (trusted) sources; a run that modifies files the operator already controls is + `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is + noted here so the boundary is explicit rather than silent.) *(maintainer.)* ## §4 Trust boundaries and data flow @@ -114,9 +127,12 @@ finding that requires the operator to feed RAT input they already control is RAT has no security-mode flag. 
The security-relevant configuration is whether its **XML config parser disables DOCTYPE/external entities** and whether the **archive walker bounds decompression** (depth/size/entry count). Both are -hardcoded behaviours, not operator knobs — confirmed in §14. There is no -"insecure default toggle"; the question is simply what the parser/walker do by -default. *(inferred — Q3/Q4.)* +hardcoded behaviours, not operator knobs. The **archive walker does not bound +decompression** — it extracts entry contents into an in-memory buffer (Apache +Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit +(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened via +a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3 +pending PR link.)* ## §6 Assumptions about inputs @@ -140,20 +156,24 @@ default. *(inferred — Q3/Q4.)* ## §8 Security properties the project provides -1. **Bounded resource use on untrusted archives** — the archive walker should - not allow a small input to cause unbounded CPU/memory (decompression-bomb / - nested-archive defence). *Violation:* OOM/hang from a crafted archive. - *Severity:* security (DoS) when RAT audits untrusted input. *(inferred — - Q4: confirm whether bounds exist; this may be a §8 property or a §9 gap.)* +1. **Bounded resource use on untrusted archives** — **not currently provided.** + The archive walker (`ArchiveWalker`) uses Apache Commons Compress + `ArchiveStreamFactory` and extracts entry contents into an **in-memory + buffer** held until the document is processed, with no decompression / + size / depth / entry-count bound — so a crafted archive can exhaust memory + (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream + responsibility (§10), not a provided property. *(maintainer — confirmed by + the Creadur PMC in PR #677 review, 2026-06.)* 2. **Safe XML configuration parsing** — the config reader should reject DOCTYPE/external entities (no XXE). 
*Violation:* file read / SSRF via a - crafted config. *Severity:* critical when config is untrusted. *(inferred — - Q3: confirm DOCTYPE handling — may be §8 or §9.)* + crafted config. *Severity:* critical when config is untrusted. The PMC has + noted a hardening PR is in flight addressing this (§14 Q3); pending its link + this stays tentative. *(maintainer / Q3 pending PR link.)* 3. **No ambient network/side effects** — RAT does filesystem I/O only. *Violation:* unexpected outbound connection. *(inferred — Q2.)* -(Whether items 1–2 are *provided* properties or *disclaimed* gaps depends on the -maintainer's answers in §14; they are listed here as the relevant questions.) +(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer; +item 2 firms up once the §14 Q3 XXE-hardening PR is linked.) ## §9 Security properties the project does *not* provide @@ -162,6 +182,11 @@ maintainer's answers in §14; they are listed here as the relevant questions.) that case: treat RAT-on-untrusted-input as you would any parser — sandbox it. - **It is not a security/vulnerability scanner** (§3); a clean RAT report says nothing about security. +- **Decompression-bomb / archive resource exhaustion** — **confirmed not + bounded.** Archives are extracted into an in-memory buffer with no + size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so + RAT pointed at untrusted archives can OOM. Runs over untrusted archives must + be sandboxed / resource-limited (§10). *(maintainer.)* - **Well-known classes (parser/archive tools):** XXE via configuration, decompression bombs / nested-archive blowup, and path handling on archive entries — the standard risks of any tool that parses XML and descends into @@ -219,10 +244,15 @@ maintainer's answers in §14; they are listed here as the relevant questions.) RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts). 
Is that the case you want modelled, or do you consider all RAT input trusted (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.) -- **Q3.** Does `XMLConfigurationReader` disable DOCTYPE / external entities - (XXE-safe)? If yes, §8 #2 stands; if no, it's a §9 gap + a §10 responsibility. -- **Q4.** Does `ArchiveWalker` bound decompression (size/depth/entry-count) so a - crafted archive can't exhaust memory/CPU? §8 #1 vs §9 gap. +- **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight + ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link + to cite**; once landed §8 #2 becomes a provided property.)* Does + `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)? +- **Q4.** *(Answered — PMC, PR #677: no bound. Archives are extracted into an + in-memory buffer (Commons Compress `ArchiveStreamFactory`) held until the + document is processed, so a crafted archive can OOM. Resolved as a §9 gap + + §10 responsibility; §8 #1 is **not** a provided property.)* Does + `ArchiveWalker` bound decompression (size/depth/entry-count)? **Wave 2 — surface.** @@ -241,7 +271,9 @@ maintainer's answers in §14; they are listed here as the relevant questions.) ## §15 Appendix — existing-policy back-map -No in-repo `SECURITY.md` exists today; this PR adds one (ASF security-process -pointer) plus `AGENTS.md`. Once the §14 answers land (especially Q3/Q4), the -§8/§9 split firms up and the same chain can be added to `creadur-whisker` and -`creadur-tentacles`. +A basic `SECURITY.md` was introduced via #671 (ASF security-process pointer); +this PR **appends** the `AGENTS.md` → `SECURITY.md` → `THREAT_MODEL.md` +discoverability pointer to it and adds `AGENTS.md`. With the §14 Q4 answer in +(archive walker unbounded → §9 gap) and Q3 pending its hardening-PR link, the +§8/§9 split is firming up; the same chain can be added to `creadur-whisker` and +`creadur-tentacles` (§14 Q6). 
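Review note on the archive gap (§8 #1 / §9): the missing bound can be pictured with a small sketch of a size-capped in-memory read. This is not RAT's `ArchiveWalker` code and does not use Commons Compress. The class `BoundedReadSketch`, the method `readBounded`, and the 4096-byte cap are hypothetical, chosen only to show what a per-entry limit would look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; not taken from the RAT code base. */
public class BoundedReadSketch {

    /** Copies a stream into memory, failing once more than maxBytes have been read. */
    public static byte[] readBounded(InputStream in, long maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
            if (total > maxBytes) {
                // Stop before buffering the oversized chunk.
                throw new IOException(
                        "entry exceeds " + maxBytes + " bytes; refusing to buffer it");
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // An entry under the cap is buffered in full.
        byte[] ok = readBounded(new ByteArrayInputStream(new byte[1024]), 4096);
        System.out.println("buffered " + ok.length + " bytes");
        // An entry over the cap is rejected instead of growing the buffer.
        try {
            readBounded(new ByteArrayInputStream(new byte[10_000]), 4096);
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

A cap like this addresses only per-entry size. Nested-archive depth and entry count would need their own limits, which is why §10 still recommends sandboxing or resource-limiting runs over untrusted archives.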
From 2026dfb4bf93c2bc764bfce3a2ce969d7f572a75 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Thu, 18 Jun 2026 18:03:07 -0400 Subject: [PATCH 3/7] =?UTF-8?q?THREAT=5FMODEL.md:=20add=20custom-matcher?= =?UTF-8?q?=20extension=20surface=20(=C2=A73)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per Claudenw (PR #677): RAT lets operators define custom matcher classes that see all scanned file text, but the matcher set is operator-defined config (not attacker-supplied), so it's OUT-OF-MODEL: trusted-input — same posture as the write mode. Generated-by: Claude Opus 4.8 (1M context) --- THREAT_MODEL.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index cb4504160..4ec61331e 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -91,6 +91,15 @@ front-end's handling generally applies to all of them. *(maintainer.)* (trusted) sources; a run that modifies files the operator already controls is `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is noted here so the boundary is explicit rather than silent.) *(maintainer.)* +- **Custom matchers / matcher extensions** + () — RAT lets the + operator define custom matcher classes in its configuration, and a custom + matcher sees the full text of every file selected for scanning. Because the + matcher set is operator-defined configuration under the control of whoever + runs RAT (not attacker-supplied), a custom matcher reading scanned text is + `OUT-OF-MODEL: trusted-input` — the same posture as any operator-supplied + extension code (cf. the write mode above). (Raised by the PMC in review.) 
+ *(maintainer — Claudenw.)* ## §4 Trust boundaries and data flow From f9c44b3a8356a8e195ff712225f3fc24ad81f51b Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sun, 21 Jun 2026 15:20:26 -0400 Subject: [PATCH 4/7] =?UTF-8?q?Fold=20PMC=20review=20answers=20(PR=20#677)?= =?UTF-8?q?:=20Q1-Q6=20resolved=20=E2=80=94=20XXE=20provided=20(external?= =?UTF-8?q?=20entities=20disabled,=20#679=20hardens=20DOCTYPE),=20no-netwo?= =?UTF-8?q?rk=20confirmed=20(XSLT=20xsl:include=20caveat),=20correct=20arc?= =?UTF-8?q?hive=20path-handling=20(read=20to=20memory,=20no=20extract-to-d?= =?UTF-8?q?isk=20=E2=86=92=20no=20path=20traversal),=20Whisker/Tentacles?= =?UTF-8?q?=20deferred?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- THREAT_MODEL.md | 138 ++++++++++++++++++++++++++---------------------- 1 file changed, 76 insertions(+), 62 deletions(-) diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index 4ec61331e..b04492a8b 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -27,13 +27,14 @@ - **Written against:** `main`/`master` @ HEAD (2026-06). - **Author:** ASF Security team, via the threat-model-producer rubric (Scovetta rubric) at the Creadur PMC's request (path 3). -- **Status:** DRAFT — under maintainer review (2026-06-10). Not yet ratified. +- **Status:** DRAFT — all §14 questions answered in the PR #677 review + (ottlinger, Claudenw; 2026-06-21); ready to ratify at the PMC's discretion. - **Reporting cross-reference:** §8-violating findings via the ASF security process ([`SECURITY.md`](SECURITY.md)); §3/§9 findings closed citing this doc. - **Provenance legend:** *(documented)* / *(maintainer)* / *(inferred)* — each *(inferred)* has a §14 open question. -- **Draft confidence:** ~14 documented / 5 maintainer / 11 inferred (maintainer - answers folded in from PR #677 review, 2026-06). +- **Draft confidence:** ~14 documented / 16 maintainer / 1 inferred (all §14 + questions answered in the PR #677 review, 2026-06). 
**What it is.** RAT is a **build-time / CLI license-auditing tool**: it walks a source tree, matches files against configurable license/header definitions, and @@ -53,7 +54,8 @@ Caller trust level: the developer/CI invoking RAT is trusted. The **inputs are normally trusted too** (your own source, your own config) — but RAT is sometimes pointed at **untrusted input**: a CI job auditing an untrusted contribution/PR, or auditing a downloaded third-party artifact. That is the -case the model cares about. *(inferred — Q1.)* +case the model cares about. *(maintainer — Claudenw, PR #677: confirmed; RAT +config is operator-trusted, the scanned files may be untrusted.)* **Component families.** @@ -126,10 +128,16 @@ finding that requires the operator to feed RAT input they already control is ## §5 Assumptions about the environment - A JRE; RAT reads the filesystem it is pointed at and writes a report. It opens - **no network connections** and runs no services. *(inferred — Q2, the - no-network claim is high-value to confirm.)* -- The XML parser behaviour depends on the platform JAXP unless RAT configures it - (§5a/§8). *(inferred — Q3.)* + **no network connections** and runs no services — RAT runs locally and only + opens files. The one operator-reachable exception is an XSLT stylesheet using + `xsl:include` to pull a remote resource; XSLT stylesheets are trusted, + operator-controlled config, so that is `OUT-OF-MODEL: trusted-input` (§3). + (Build tooling — Maven/Ant — may fetch dependencies, but RAT itself does not.) + *(maintainer — Claudenw + ottlinger, PR #677.)* +- The XML parser behaviour depends on the platform JAXP and is configurable via + the standard [JAXP system properties](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jaxp/jaxp.html); + RAT disables external entities by default (§5a / §8 #2). 
*(maintainer — + Claudenw, PR #677.)* ## §5a Build-time and configuration variants @@ -139,16 +147,16 @@ its **XML config parser disables DOCTYPE/external entities** and whether the hardcoded behaviours, not operator knobs. The **archive walker does not bound decompression** — it extracts entry contents into an in-memory buffer (Apache Commons Compress `ArchiveStreamFactory`) with no size/depth/entry-count limit -(§8/§9, maintainer-confirmed). XML-parser DOCTYPE handling is being hardened via -a PMC PR (§14 Q3). There is no "insecure default toggle". *(maintainer / Q3 -pending PR link.)* +(§8/§9, maintainer-confirmed). The XML config reader **disables external +entities**; DOCTYPE handling is further hardened by PR #679 (§8 #2). There is no +"insecure default toggle". *(maintainer — Claudenw, PR #677; hardening in #679.)* ## §6 Assumptions about inputs | Input | Attacker-controllable? (untrusted-run) | Concern | | --- | --- | --- | | scanned file content | **yes** | parsed/read; resource use | -| scanned file paths / archive entry names | **yes** | path handling on archive extraction | +| scanned file paths / archive entry names | **yes** | reported as labels only — entries are read into memory, never extracted to disk, so no zip-slip / path-traversal-on-write surface *(maintainer)* | | archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth | | RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity | | invocation arguments | no — trusted caller | — | @@ -159,7 +167,7 @@ pending PR link.)* *untrusted-input* RAT run processes — e.g. a contributor whose PR is audited by CI, or the author of a third-party artifact being audited. Capabilities: craft a malicious archive (zip bomb), a hostile XML config (XXE), or - pathological file content. *(inferred — Q1.)* + pathological file content. 
*(maintainer — Claudenw, PR #677.)* - **Out of scope:** an attacker who controls the RAT invocation or the trusted source tree (the normal case — they already own the build). @@ -173,16 +181,19 @@ pending PR link.)* (OOM). This is therefore a **disclaimed gap (§9)** plus a downstream responsibility (§10), not a provided property. *(maintainer — confirmed by the Creadur PMC in PR #677 review, 2026-06.)* -2. **Safe XML configuration parsing** — the config reader should reject - DOCTYPE/external entities (no XXE). *Violation:* file read / SSRF via a - crafted config. *Severity:* critical when config is untrusted. The PMC has - noted a hardening PR is in flight addressing this (§14 Q3); pending its link - this stays tentative. *(maintainer / Q3 pending PR link.)* -3. **No ambient network/side effects** — RAT does filesystem I/O only. - *Violation:* unexpected outbound connection. *(inferred — Q2.)* - -(Item 1 is resolved as a disclaimed §9 gap per the maintainer's archive answer; -item 2 firms up once the §14 Q3 XXE-hardening PR is linked.) +2. **Safe XML configuration parsing (no XXE)** — **provided.** The config reader + has **external entities disabled**; DOCTYPE handling is further hardened by + PR #679. *Violation:* file read / SSRF via a crafted config. *Severity:* + critical when config is untrusted. *(maintainer — Claudenw, PR #677; hardening + in #679.)* +3. **No ambient network/side effects** — RAT does filesystem I/O only; it opens + no network connections on default settings. (Sole exception: an + operator-supplied XSLT `xsl:include` pointing at a remote resource — trusted + config, `OUT-OF-MODEL`.) *Violation:* unexpected outbound connection. + *(maintainer — Claudenw + ottlinger, PR #677.)* + +(Item 1 is a disclaimed §9 gap per the maintainer's archive answer; item 2 is a +provided property — external entities disabled, with PR #679 hardening DOCTYPE.) 
## §9 Security properties the project does *not* provide @@ -196,10 +207,12 @@ item 2 firms up once the §14 Q3 XXE-hardening PR is linked.) size/depth/entry-count limit (Commons Compress `ArchiveStreamFactory`), so RAT pointed at untrusted archives can OOM. Runs over untrusted archives must be sandboxed / resource-limited (§10). *(maintainer.)* -- **Well-known classes (parser/archive tools):** XXE via configuration, - decompression bombs / nested-archive blowup, and path handling on archive - entries — the standard risks of any tool that parses XML and descends into - archives. +- **Well-known classes (parser/archive tools):** decompression bombs / + nested-archive blowup remain the live untrusted-archive risk. XXE via + configuration is mitigated (external entities disabled, §8 #2). Path-traversal + on archive entries does **not** apply: RAT reads entries into memory and never + extracts them to disk, so an entry label like `bar/baz.zip#/junk.txt` is a + report string, not a write path. *(maintainer — Claudenw, PR #677.)* ## §10 Downstream responsibilities @@ -246,43 +259,44 @@ item 2 firms up once the §14 Q3 XXE-hardening PR is linked.) ## §14 Open questions for the maintainers -**Wave 1 — the load-bearing ones.** - -- **Q1.** Confirm the intended trust posture: RAT runs in-process for a - trusted caller; inputs are normally trusted, but the security-relevant case is - RAT auditing **untrusted** input (CI on untrusted PRs, third-party artifacts). - Is that the case you want modelled, or do you consider all RAT input trusted - (which would move XXE/archive items to `OUT-OF-MODEL: trusted-input`)? (§2/§7.) -- **Q3.** *(Partially answered — PMC, PR #677: a hardening PR is in flight - ensuring DOCTYPE / external-entity handling is covered. **Pending the PR link - to cite**; once landed §8 #2 becomes a provided property.)* Does - `XMLConfigurationReader` disable DOCTYPE / external entities (XXE-safe)? -- **Q4.** *(Answered — PMC, PR #677: no bound. 
Archives are extracted into an - in-memory buffer (Commons Compress `ArchiveStreamFactory`) held until the - document is processed, so a crafted archive can OOM. Resolved as a §9 gap + - §10 responsibility; §8 #1 is **not** a provided property.)* Does - `ArchiveWalker` bound decompression (size/depth/entry-count)? - -**Wave 2 — surface.** - -- **Q2.** Confirm RAT makes no network connections and has no side effects beyond - reading the scanned tree and writing the report. (§5/§8.) -- **Q5.** Whisker and Tentacles — same trust profile (in-process dev tool, - inputs normally trusted)? Any input either processes that this RAT model - doesn't cover (e.g. Tentacles fetching/inspecting remote bundles)? (§2.) - -**Wave 3 — coexistence.** - -- **Q6.** This adds `THREAT_MODEL.md` + `SECURITY.md` + `AGENTS.md` to - `creadur-rat`. Want matching pointer files (AGENTS.md → SECURITY.md → this - model) added to `creadur-whisker` and `creadur-tentacles` so all three are - discoverable, or will you add them? (§1/§15.) +All wave-1/2/3 questions were answered by the Creadur PMC in the PR #677 review +(ottlinger, Claudenw; 2026-06) and folded above. Kept here as a resolved record. + +- **Q1 — trust posture (answered, Claudenw).** Confirmed: RAT configuration + (XSLT stylesheets, config files, license definitions, custom matchers) is + trusted/operator-controlled; the scanned **files** may be untrusted (CI + auditing a third-party PR/artifact). The attack surface is whatever can break + out of the scanning stream under default settings. Folded into §2 / §7. +- **Q2 — no network (answered, Claudenw + ottlinger).** Confirmed: RAT opens no + network connections; it only reads files. Sole exception is an operator-set + XSLT `xsl:include` to a remote resource (trusted config → `OUT-OF-MODEL`). + Build tooling (Maven/Ant) may fetch dependencies, but RAT itself does not. + Folded into §5 / §8 #3. 
+- **Q3 — XXE / XML parser (answered, Claudenw).** External entities are + **disabled** in the config reader; DOCTYPE handling is further hardened by + PR #679. JAXP behaviour is configurable via the standard JAXP system + properties. §8 #2 is now a **provided** property. +- **Q4 — archive bound (answered, PMC).** No bound — entries are read into an + in-memory buffer (Commons Compress) with no size/depth/entry-count limit, and + OOM is **not** guarded ("we probably should add a limit but at this time we do + not"). Resolved as a §9 gap + §10 responsibility; §8 #1 is **not** provided. + Entries are never extracted to disk, so there is no path-traversal-on-write + surface (§6 / §9). +- **Q5 / Q6 — Whisker / Tentacles coexistence (answered, ottlinger).** + Development on Whisker/Tentacles is low right now; the PMC prefers to **start + with RAT** and add the sibling pointer files (AGENTS.md → SECURITY.md → model) + to `creadur-whisker` / `creadur-tentacles` later, to reduce noise. This PR is + therefore scoped to `creadur-rat`; the sibling chain is a deferred follow-up. + +With every question resolved, this model is ready to move from DRAFT to ratified +at the PMC's discretion. ## §15 Appendix — existing-policy back-map A basic `SECURITY.md` was introduced via #671 (ASF security-process pointer); this PR **appends** the `AGENTS.md` → `SECURITY.md` → `THREAT_MODEL.md` -discoverability pointer to it and adds `AGENTS.md`. With the §14 Q4 answer in -(archive walker unbounded → §9 gap) and Q3 pending its hardening-PR link, the -§8/§9 split is firming up; the same chain can be added to `creadur-whisker` and -`creadur-tentacles` (§14 Q6). +discoverability pointer to it and adds `AGENTS.md`. With every §14 question +answered — Q4 archive walker unbounded → §9 gap, Q3 external entities disabled + +PR #679 hardening DOCTYPE → §8 #2 provided — the §8/§9 split is settled. 
The same +pointer chain will be added to `creadur-whisker` and `creadur-tentacles` as a +deferred follow-up (§14 Q5/Q6). From a6978b840a4405f9f5f248a6ec5a96ce7ac9bb47 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sat, 27 Jun 2026 03:29:44 -0400 Subject: [PATCH 5/7] Address review feedback on THREAT_MODEL / SECURITY / AGENTS - THREAT_MODEL.md: use https for the license URL; clarify that --addLicense write mode inserts text from the operator-controlled license definition, not from the scanned tree. - AGENTS.md, SECURITY.md: add the ASF license header. - SECURITY.md: replace the TBD intro with a short security-posture summary and a "Reporting a Vulnerability" section. Generated-by: Claude Opus 4.8 (1M context) --- AGENTS.md | 51 ++++++++++++++----------------------------------- SECURITY.md | 28 ++++++++++++++++++++++++++- THREAT_MODEL.md | 4 ++-- 3 files changed, 43 insertions(+), 40 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index f8448af2b..97e91f376 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,40 +1,17 @@ -# Agent guidance + -# BEST PRACTICES -- **DRY (Don't Repeat Yourself)**: Avoid code duplication -- **SOLID Principles**: Follow object-oriented design principles -- **Error Handling**: Always handle potential errors gracefully -- **Security**: Consider security implications in your code -- **Version Control**: Write clear commit messages, referencing Apache Creadur Jira tickets if possible - -# COMMUNICATION -- Explain your approach before implementing -- Break down complex solutions into steps -- Provide examples when helpful -- Ask clarifying questions when requirements are unclear - -# RESTRICTIONS -- Always ask before making breaking changes -- Don't add unnecessary dependencies -- Follow the existing codebase patterns and conventions -- Test your solutions when possible with unit or integration tests - -## Security - -Security model: [SECURITY.md](./SECURITY.md), which links to the project's -threat model at [THREAT_MODEL.md](./THREAT_MODEL.md). 
Consult the threat model -for the project's in-scope / out-of-scope declarations and known non-findings -before reporting security issues. diff --git a/SECURITY.md b/SECURITY.md index d3a5b9f23..ed9024a2b 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -1,6 +1,32 @@ + + # Security -TBD +Apache Creadur RAT (Release Audit Tool) runs as a CLI, an Ant task, or a Maven +plugin in the developer's or CI's own process — it is not a network service. It +audits a source tree against operator-controlled license and header definitions. + +## Reporting a Vulnerability + +Please report suspected security vulnerabilities privately to the Apache Security +Team at security@apache.org, following the +[ASF vulnerability handling process](https://www.apache.org/security/). Please do +not report security issues on public issue trackers or mailing lists. ## Known Non-Findings diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index b04492a8b..dce996b36 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -6,7 +6,7 @@ (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - http://www.apache.org/licenses/LICENSE-2.0 + https://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, @@ -92,7 +92,7 @@ front-end's handling generally applies to all of them. *(maintainer.)* the tree. This is explicitly operator-invoked against the operator's own (trusted) sources; a run that modifies files the operator already controls is `OUT-OF-MODEL: trusted-input`. (Raised by the PMC in review — write mode is - noted here so the boundary is explicit rather than silent.) *(maintainer.)* + noted here so the boundary is explicit rather than silent.) 
The header text RAT writes in comes from the operator-controlled license/header definition (a controlled input file), not from the scanned tree, so write mode cannot inject attacker-chosen content. *(maintainer — ottlinger / Claudenw, PR #677.)* - **Custom matchers / matcher extensions** () — RAT lets the operator define custom matcher classes in its configuration, and a custom From bb3809285af755105340798c8664b6b1a8b7bfa8 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sat, 27 Jun 2026 03:45:57 -0400 Subject: [PATCH 6/7] THREAT_MODEL.md: note RAT-560/#679 XXE mitigation on the surface labels MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Answers Claudenw's review note (does #679 impact the XXE data-flow line?): the §5a/§8 text already records that RAT disables external entities + the #679 DOCTYPE hardening, but the data-flow diagram and the input/residual tables still labelled XXE a bare "surface". Annotate those three labels with the mitigation so the diagram is consistent with §5a/§8 #2. Generated-by: Claude Opus 4.8 (1M context) --- THREAT_MODEL.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md index dce996b36..15ed198cb 100644 --- a/THREAT_MODEL.md +++ b/THREAT_MODEL.md @@ -62,7 +62,7 @@ config is operator-trusted, the scanned files may be untrusted.)* | Family | Entry point | Untrusted-input exposure | In model? 
| | --- | --- | --- | --- | | File walking + license matching | `Reporter`, walkers | scanned file **content/paths** | **Yes** | -| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if attacker-supplied) | **Yes** (XXE surface) | +| **XML configuration reader** | `XMLConfigurationReader` | the **config** (if attacker-supplied) | **Yes** (XXE surface — mitigated: external entities disabled, RAT-560/#679) | | **Archive walker** | `ArchiveWalker` | archives in the tree (zip/jar/tar) | **Yes** (decompression-bomb surface) | | CLI / Ant task / Maven plugin | wrappers | invocation args (trusted caller) | wrappers — trusted | | **License-header insertion (write mode)** | `--addLicense` / editors | **modifies files in the audited tree** (operator-invoked) | trusted-input (§3) | @@ -112,7 +112,7 @@ RAT's security questions only arise when that input is **untrusted**: caller invokes RAT (CLI/Ant/Maven) on a directory + a config │ trusted invocation ▼ -read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted +read configuration (XMLConfigurationReader) ── XXE surface if config is untrusted (mitigated: external entities disabled, RAT-560/#679) walk tree -> for each file: read content, match licenses └─ ArchiveWalker descends into zip/jar/tar ── decompression-bomb / path surface if archive is untrusted ▼ @@ -158,7 +158,7 @@ entities**; DOCTYPE handling is further hardened by PR #679 (§8 #2). 
There is n | scanned file content | **yes** | parsed/read; resource use | | scanned file paths / archive entry names | **yes** | reported as labels only — entries are read into memory, never extracted to disk, so no zip-slip / path-traversal-on-write surface *(maintainer)* | | archives (zip/jar/tar) in the tree | **yes** | decompression bomb / nested-archive depth | -| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity | +| RAT XML configuration | **maybe** (only if config is attacker-supplied) | XXE / external entity — mitigated by RAT-560/#679 (external entities disabled) | | invocation arguments | no — trusted caller | — | ## §7 Adversary model From b3464143306574de15e84b0c347cc4f4f3f8fb91 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Sat, 27 Jun 2026 03:48:34 -0400 Subject: [PATCH 7/7] SECURITY.md: flip the XXE known-non-finding to a provided property MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Consistency with THREAT_MODEL.md (§5a / §8 #2): since RAT-560 (#679) RAT builds XML parsers via the hardened StandardXmlFactory (DOCTYPE + external entities disabled), so XXE is actively prevented. Lead with that; keep the operator-trusted-config argument as defense-in-depth. Generated-by: Claude Opus 4.8 (1M context) --- SECURITY.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/SECURITY.md b/SECURITY.md index ed9024a2b..86ecb297f 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -30,11 +30,11 @@ not report security issues on public issue trackers or mailing lists. ## Known Non-Findings -- Static code analysis may report `XXE_DOCUMENT` vulnerabilities because RAT reads XML and XSLT files provided as user input. +- Static analyzers may report `XXE_DOCUMENT` on RAT's XML/XSLT reading. 
As of RAT-560 ([#679](https://github.com/apache/creadur-rat/pull/679)) RAT builds its XML parsers through the hardened `StandardXmlFactory`, which disables DOCTYPE and external general/parameter entities — so XXE is actively prevented and these reports are false positives against the hardened factory. - - Configuration files and XSLT documents passed to RAT are operator-controlled configuration, not request input. Reports claiming SSRF or path traversal via these resolvers, based on the assumption that the resource name is attacker-controlled, are out of scope under the documented threat model. XML and XSLT authorship, as well as resource configuration, are privileged operations. + - Defense in depth: the configuration files and XSLT documents RAT reads are operator-controlled configuration, not request input, so the resource names are not attacker-controlled in the first place. Reports asserting SSRF or path traversal via these resolvers (assuming an attacker-controlled resource name) are out of scope under the documented threat model — XML and XSLT authorship, as well as resource configuration, are privileged operations. - - Applications that thread untrusted input into XML configuration or XSLT documents should validate that input before passing it to RAT. Responsibility for such validation rests with the application, not with RAT. + - Applications that thread untrusted input into XML configuration or XSLT documents should still validate that input before passing it to RAT. Responsibility for such validation rests with the application, not with RAT. ## Threat Model