feat: add backoffice review capture endpoint + reviews composition (SITES-43974)#2652
feat: add backoffice review capture endpoint + reviews composition (SITES-43974)#2652tathagat2241 wants to merge 5 commits into
Conversation
|
This PR will trigger a minor release when merged. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
absarasw
left a comment
There was a problem hiding this comment.
Thanks Tathagat — substantial PR. A lot lands cleanly against the spec:
- FR-09 (mandatory
event_id) — no server fallback; rejects missing + non-UUID with 400; PG 23505 collapses to 200 with the existing row. Tests cover all three paths. - FR-10 (source bound to route) — controller stamps
REVIEW_SOURCES.BACKOFFICEfrom the route; rejects body-supplied source mismatches with 400. Route variant matches the per-source design. - Cross-tenant write guard —
accessControlUtil.hasAccess(site)rejects with 403 when caller lacks org access; suggestion's opportunity site-id is verified against the URL site-id. - Server-derived reviewer_id, 8 KB cap with 413, comprehensive secret-pattern scrub (PEM, AWS, GH/GL PATs, Slack, JWT, Bearer, basic-auth-URL, Adobe hosts/emails — covers each pattern in a dedicated test), idempotency tests, OpenAPI updated with
SuggestionReviewschema + per-status responses,?include=reviews,patchesopt-in for raw patches, 503 when feedback store unavailable, graceful empty-reviews on query error. Test coverage is genuinely strong (20+ cases on the controller + a per-pattern coverage test on the scrubber).
Findings
Five Important issues inline. They cluster on two themes:
- Security tightenings the spec was explicit about — markdown sanitisation is denylist-shaped (the file's own JSDoc says "Allowlist-ish") where the spec called for an allowlist sanitiser (nh3 / bleach-equivalent); the
data:URI strip missesdata:image/svg+xml(known XSS vector); per-(reviewer_id, site_id)rate-limit is missing (OpenAPI documents 429 but the controller can't return it). - Correctness gaps —
organization_idfallback would substitute asite_idinto the org column on the unlikely path the fallback fires;state_transitionis free-text passed straight through (no enum validation), so the LA corpus can receive arbitrary strings.
A few Minor observations not worth PR comments — happy to mention on Slack if useful: scrub-hits emitted as a log line not a metric counter (depends on whether SpaceCat's metrics derive from logs), reviewer_id stored as IMS email vs the v4-design's ims_sub wording (spec is internally inconsistent; impl matches the DB-migration comment), Buffer.byteLength size check happens before redactFeedbackContent (correct ordering — protects against giant payloads regardless of HTML content), scrubbed = Object.entries(scrubHits) could be Object.keys(...).length (style).
Assessment
Ready to merge? With fixes.
The security findings are real — the spec was explicit about an allowlist sanitiser, and the missing rate-limit + incomplete data: strip combined open a small but actual XSS-via-rationale surface that ESEs are trusted-but-not-infallible carriers of. The organization_id fallback bug is unlikely to fire in practice but would silently corrupt the corpus if it did. The state_transition enum gap is data-quality.
None of these block the overall design; all are addressable in this PR without restructuring.
Merge prereq from the PR description (carried forward)
The PR body already notes that PR #1696 (spacecat-shared) must publish before this PR can merge — the controller statically imports REVIEW_SOURCES, REVIEW_VERDICTS, REJECTION_CATEGORIES, FEEDBACK_TIERS, verdictToSignal, toReviewView. Acknowledged. ✓
ramboz
left a comment
There was a problem hiding this comment.
Verdict: REQUEST CHANGES
Well-structured, well-tested PR — the controller follows the repo's established auth/validation patterns, idempotency is handled correctly, and test coverage is thorough (24 controller cases + redaction unit tests). But two hard blockers and one cross-repo contract item must be resolved before merge.
Cross-repo note (added during synthesis): two of this agent's flags were cleared by cross-checking the other PRs:
- "should be a
wrpc_RPC" — DB PR 718 intentionally exposes no write RPC and grantspostgrest_writerINSERT; direct insert is the design. Not a divergence.- "is the writer JWT wired?" — confirmed: shared
service/index.js:71setsAuthorization: BearerfromPOSTGREST_API_KEY; existing write controllers (e.g.brands.js) rely on it. Cleared (operational caveat: thePOSTGREST_API_KEYsecret must be present in the Lambda env — already true for current postgres writes).
Blockers
B1 — Dependency pinned to an untrusted personal gist tarball. package.json:88 pins @adobe/spacecat-shared-data-access to https://gist.github.com/tathagat2241/.../adobe-spacecat-shared-data-access-3.79.0.tgz. Supply-chain risk (unversioned, unsigned, single-author, mutable URL); will not pass review/release. Replace with the registry version published by PR 1696. Those six imported symbols (REVIEW_SOURCES, REVIEW_VERDICTS, REJECTION_CATEGORIES, FEEDBACK_TIERS, verdictToSignal, toReviewView) don't exist in any published data-access version today, and are statically imported at module top-level, so a wrong/missing version fails the entire suggestions controller load. Do not merge until 1696 has landed + published.
B2 — reviewer_id stores the IMS user GUID but the contract says "email". src/controllers/suggestions.js (~line 2871): const reviewerId = profile?.email ?? null;. For IMS callers, profile.email is the IMS user_id GUID, not a mailbox — documented elsewhere in this same file. Yet schemas.yaml and the PR both say "IMS email." This mislabels identity for the entire corpus/export/training pipeline. Either use the real address (profile.trial_email) or rename/redocument the field as an opaque IMS user id. The unit test uses a fake ese@adobe.com so it never catches the GUID behavior.
Concerns
C1 — Direct table INSERT vs RPC. (Resolved during synthesis — see cross-repo note above; DB uses writer INSERT grant, no write RPC.)
C2 — Writer auth / JWT to PostgREST. (Resolved — wired via shared, used by existing controllers.)
C3 — The markdown "sanitizer" is regex-based and bypassable. src/support/feedback-redaction.js:sanitizeMarkdown. Concrete gaps: event-handler stripper requires whitespace (/\son\w+/) so <svg/onload=...> is not stripped; .replace(/javascript:/gi,'') is text-deletion not neutralization ([x](javascript:evil()) → [x](evil())); only script|style|iframe removed (<object>/<embed>/<a href> pass). Acceptable only as documented defense-in-depth if detail_markdown is never rendered as trusted HTML downstream. Since it feeds an S3 corpus + possibly a UI via ?include=reviews, confirm the render path escapes; if anything renders it as HTML this is stored XSS → blocker. Recommend sanitize-html/DOMPurify or plain text, and document the "escape at render" assumption.
C4 — organization_id fallback is wrong. ~line 2901: organization_id: site.getOrganizationId?.() ?? opportunity.getSiteId?.(). The fallback is a site id, not an org id — silently corrupts the column if getOrganizationId is ever nullish. Use null (or fail).
C5 — Secret-scrub can corrupt legitimate patch data and isn't reversible. scrubDeep rewrites previous_fix/edited_fix. For a corpus that captures code patches, false positives (a base64 blob matching jwt, any sk-... string) silently mangle the stored fix, degrading training data with only a count for audit. Consider scrub metadata or gating scrub to detail_markdown (free text) only.
C6 — ?include=reviews has no pagination/limit. select('*').order('event_time') — a suggestion with many reviews returns all rows, and ?include=...,patches pulls full patch JSONB. Add a .limit(N) and confirm a (suggestion_id, event_time) index DB-side (718 provides idx_feedback_event_suggestion). Query parsing itself is correct (GET reads include from query string, no injection); no N+1 for single-suggestion GET.
Nits
- N1 — Read path returns
[]on missing postgrest client while write path returns 503. Fine but worth a comment that it's intentional. - N2 —
detailMarkdown8 KB cap usesBuffer.byteLength > 8192(bytes) but OpenAPI saysmaxLength: 8192(chars). Multi-byte UTF-8 skew. - N3 —
state_transitionstored free-form with no whitelist. - N4 —
opportunity_type: opportunity.getType?.() ?? null— confirm DB tolerates arbitrary type strings (it does; unconstrained). - N5 — Test
reviewer_id: 'ese@adobe.com'reinforces the B2 mislabel; after the fix, add a GUID-shaped test.
Strengths
- Authorization matches the repo baseline exactly (UUID validation →
Site.findById→accessControlUtil.hasAccess→ 403); route added torouteRequiredCapabilitieswithCAP_SUGGESTION_WRITE. sourceroute-bound (rejects body spoofing, FR-10);event_ididempotency via PG23505→ 200-with-existing-row, tested (FR-09).- Enum validation uses the shared constants +
verdictToSignal— no hand-rolled lists. Exactly the cross-repo discipline needed (once the dep is fixed). - Raw patches not echoed by default;
?include=...,patchesis explicit opt-in. deriveFeedbackTierand reviews fetch fail soft (never throw).- Genuinely thorough tests (happy path, all validation failures, 403/404 ownership, idempotency, 500/503, tier derivation, scrub metric, all five
?includepermutations). OpenAPI complete.
|
B2 — kept profile.email (it's the stable IMS user id, good for reviewer-continuity) but relabeled it everywhere as an opaque IMS user identifier, not a mailbox (controller comment, schemas.yaml, feedback_event.reviewer_id column comment) + added a GUID-shaped test. |
What
POST /sites/:siteId/opportunities/:opportunityId/suggestions/:suggestionId/backoffice-reviews — captures an ESE verdict. Mandatory client-supplied event_id (idempotent via PG 23505→200), IMS auth + org check, verdict→signal, server-derived reviewer, tier from entitlement, secret-scrub + markdown sanitise. Raw patches not echoed.
⚠️ Merge checklist (required — do before merge)
Extends the suggestion fetch with ?include=reviews (read-time composition; ,patches opt-in).
New feedback-redaction util + OpenAPI (SuggestionReview schema) + tests.
not done
Bump @adobe/spacecat-shared-data-access to the version published by PR 2 (it currently pins 3.75.1, which lacks the new exports). The capture handler statically imports them; without the bump the controller fails to load.
Testing
Syntax + redaction logic + OpenAPI YAML validated. Controller + util unit tests added (mocha/coverage run in CI).
Please ensure your pull request adheres to the following guidelines:
describe here the problem you're solving.
If the PR is changing the API specification:
yet. Ideally, return a 501 status code with a message explaining the feature is not implemented yet.
If the PR is changing the API implementation or an entity exposed through the API:
If the PR is introducing a new audit type:
Related Issues
https://jira.corp.adobe.com/browse/SITES-39001
Thanks for contributing!