
feat: add backoffice review capture endpoint + reviews composition (SITES-43974)#2652

Open
tathagat2241 wants to merge 5 commits into main from SITES-43974-feedback-event-pipeline

@tathagat2241

Contributor

What

POST /sites/:siteId/opportunities/:opportunityId/suggestions/:suggestionId/backoffice-reviews — captures an ESE verdict. Mandatory client-supplied event_id (idempotent via PG 23505→200), IMS auth + org check, verdict→signal, server-derived reviewer, tier from entitlement, secret-scrub + markdown sanitise. Raw patches not echoed.
Extends the suggestion fetch with ?include=reviews (read-time composition; ,patches opt-in).
New feedback-redaction util + OpenAPI (SuggestionReview schema) + tests.
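A minimal sketch of the idempotent capture described above. It assumes a hypothetical `store` adapter with `insert(row)` and `findByEventId(eventId)` methods; neither name comes from the PR. The database's unique constraint on event_id does the deduplication, and the handler only translates PostgreSQL's 23505 (unique_violation) into a 200 that returns the existing row.

```javascript
// Illustrative sketch: `store` is a hypothetical adapter, not the PR's code.
const UNIQUE_VIOLATION = '23505'; // PostgreSQL SQLSTATE for unique_violation

async function captureReview(store, row) {
  try {
    const created = await store.insert(row);
    // Assumption: 201 for a fresh insert (the PR text only specifies the 200 replay).
    return { status: 201, body: created };
  } catch (err) {
    if (err && err.code === UNIQUE_VIOLATION) {
      // The client-supplied event_id already exists: return the stored row.
      const existing = await store.findByEventId(row.event_id);
      return { status: 200, body: existing };
    }
    throw err; // any other database error propagates (e.g. to a 500)
  }
}

// In-memory stand-in that mimics a unique index on event_id.
function createInMemoryStore() {
  const rows = new Map();
  return {
    async insert(row) {
      if (rows.has(row.event_id)) {
        const err = new Error('duplicate key value violates unique constraint');
        err.code = UNIQUE_VIOLATION;
        throw err;
      }
      rows.set(row.event_id, row);
      return row;
    },
    async findByEventId(eventId) {
      return rows.get(eventId) ?? null;
    },
  };
}
```

Relying on the constraint rather than a read-before-write check means two concurrent retries carrying the same event_id cannot both insert.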
⚠️ Merge checklist (required — do before merge)

  • Not done: bump @adobe/spacecat-shared-data-access to the version published by PR 2 (it currently pins 3.75.1, which lacks the new exports). The capture handler statically imports those exports, so without the bump the controller fails to load.
Testing

Syntax + redaction logic + OpenAPI YAML validated. Controller + util unit tests added (mocha/coverage run in CI).
Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description, or, if no issue has been created, describe here the problem you're solving.
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

If the PR is changing the API specification:

  • make sure you add a "Not implemented yet" note to the endpoint description if the implementation is not ready yet. Ideally, return a 501 status code with a message explaining that the feature is not implemented yet.
  • make sure you add at least one example of the request and response.

If the PR is changing the API implementation or an entity exposed through the API:

  • make sure you update the API specification and the examples to reflect the changes.

If the PR is introducing a new audit type:

  • make sure you update the API specification with the type, the schema of the audit result, and an example

Related Issues

https://jira.corp.adobe.com/browse/SITES-39001

Thanks for contributing!

@github-actions


This PR will trigger a minor release when merged.

@codecov

codecov Bot commented Jun 22, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@absarasw left a comment

Contributor


Thanks Tathagat — substantial PR. A lot lands cleanly against the spec:

  • FR-09 (mandatory event_id) — no server fallback; rejects missing + non-UUID with 400; PG 23505 collapses to 200 with the existing row. Tests cover all three paths.
  • FR-10 (source bound to route) — controller stamps REVIEW_SOURCES.BACKOFFICE from the route; rejects body-supplied source mismatches with 400. Route variant matches the per-source design.
  • Cross-tenant write guard: accessControlUtil.hasAccess(site) rejects with 403 when the caller lacks org access; the suggestion's opportunity site-id is verified against the URL site-id.
  • Server-derived reviewer_id, 8 KB cap with 413, comprehensive secret-pattern scrub (PEM, AWS, GH/GL PATs, Slack, JWT, Bearer, basic-auth-URL, Adobe hosts/emails — covers each pattern in a dedicated test), idempotency tests, OpenAPI updated with SuggestionReview schema + per-status responses, ?include=reviews,patches opt-in for raw patches, 503 when feedback store unavailable, graceful empty-reviews on query error. Test coverage is genuinely strong (20+ cases on the controller + a per-pattern coverage test on the scrubber).
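The FR-09 and FR-10 checks can be sketched as one pure validator. This is an illustration, not the controller's code: the UUID regex accepts any 8-4-4-4-12 hex string without checking the version nibble, and `routeSource` stands in for REVIEW_SOURCES.BACKOFFICE, whose actual string value is not shown in this PR.

```javascript
// Illustrative validator for the two FR rules; not the controller's code.
// Accepts any 8-4-4-4-12 hex UUID (version nibble not checked).
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateCapture(body, routeSource) {
  // FR-09: event_id is mandatory and client-supplied; no server fallback.
  if (typeof body?.event_id !== 'string' || !UUID_RE.test(body.event_id)) {
    return { status: 400, message: 'event_id must be a client-supplied UUID' };
  }
  // FR-10: source is bound to the route; a mismatching body value is rejected.
  if (body.source !== undefined && body.source !== routeSource) {
    return { status: 400, message: `source must match the route (${routeSource})` };
  }
  return null; // valid
}
```

On a null result, the handler would stamp `source = routeSource` regardless of what the body contained, so a client can never write under another source.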

Findings

Five Important issues inline. They cluster on two themes:

  • Security tightenings the spec was explicit about — markdown sanitisation is denylist-shaped (the file's own JSDoc says "Allowlist-ish") where the spec called for an allowlist sanitiser (nh3 / bleach-equivalent); the data: URI strip misses data:image/svg+xml (known XSS vector); per-(reviewer_id, site_id) rate-limit is missing (OpenAPI documents 429 but the controller can't return it).
  • Correctness gaps: the organization_id fallback would substitute a site_id into the org column on the (unlikely) path where the fallback fires; state_transition is free text passed straight through (no enum validation), so the LA corpus can receive arbitrary strings.
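To make the denylist/allowlist distinction concrete, here is a sketch of an allowlist-shaped link check using the WHATWG `URL` parser built into Node. The scheme list (`http:`, `https:`, `mailto:`) is an assumption for illustration, and a scheme check is only one piece of the full allowlist sanitiser the spec calls for.

```javascript
// Allowlist-shaped link check (illustrative; the scheme list is an assumption).
const ALLOWED_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

function isSafeHref(href) {
  let url;
  try {
    // A base lets relative links parse; they resolve to https: and pass.
    url = new URL(href, 'https://relative.invalid/');
  } catch {
    return false; // unparseable input is rejected, not "cleaned"
  }
  return ALLOWED_SCHEMES.has(url.protocol);
}
```

Because the parser lowercases the scheme, trims leading whitespace, and drops embedded tabs before the scheme is read, variants such as `JaVaScRiPt:` or `java\tscript:` are rejected without being enumerated, and `data:image/svg+xml` fails simply by not being on the list.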

A few Minor observations not worth PR comments — happy to mention on Slack if useful: scrub-hits emitted as a log line not a metric counter (depends on whether SpaceCat's metrics derive from logs), reviewer_id stored as IMS email vs the v4-design's ims_sub wording (spec is internally inconsistent; impl matches the DB-migration comment), Buffer.byteLength size check happens before redactFeedbackContent (correct ordering — protects against giant payloads regardless of HTML content), scrubbed = Object.entries(scrubHits) could be Object.keys(...).length (style).

Assessment

Ready to merge? With fixes.

The security findings are real. The spec was explicit about an allowlist sanitiser, and the missing rate limit combined with the incomplete data: strip leaves a small but real XSS-via-rationale surface; ESEs are trusted, but not infallible, carriers of that content. The organization_id fallback bug is unlikely to fire in practice but would silently corrupt the corpus if it did. The state_transition enum gap is a data-quality issue.
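A per-(reviewer_id, site_id) limit that would let the controller actually return the documented 429 might look like the fixed-window sketch below. The helper name and limits are hypothetical; the in-memory Map lives in a single Lambda instance, so a shared store would be needed for a global limit, and the sketch never evicts stale keys.

```javascript
// Fixed-window limiter keyed by (reviewer_id, site_id). Hypothetical helper:
// state lives in this process only and stale keys are never evicted.
function createRateLimiter({ limit, windowMs, now = () => Date.now() }) {
  const windows = new Map(); // key -> { start, count }

  return function check(reviewerId, siteId) {
    const key = `${reviewerId}\u0000${siteId}`; // NUL separator avoids key collisions
    const t = now();
    const current = windows.get(key);

    if (!current || t - current.start >= windowMs) {
      windows.set(key, { start: t, count: 1 }); // open a new window
      return { allowed: true };
    }
    if (current.count >= limit) {
      // Seconds until the window resets, suitable for a Retry-After header.
      const retryAfterSec = Math.ceil((current.start + windowMs - t) / 1000);
      return { allowed: false, status: 429, retryAfterSec };
    }
    current.count += 1;
    return { allowed: true };
  };
}
```

The controller would call `check(reviewerId, siteId)` after deriving the reviewer server-side and respond with 429 when `allowed` is false.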

None of these block the overall design; all are addressable in this PR without restructuring.

Merge prereq from the PR description (carried forward)

The PR body already notes that PR #1696 (spacecat-shared) must publish before this PR can merge — the controller statically imports REVIEW_SOURCES, REVIEW_VERDICTS, REJECTION_CATEGORIES, FEEDBACK_TIERS, verdictToSignal, toReviewView. Acknowledged. ✓

Comment thread src/support/feedback-redaction.js
Comment thread src/support/feedback-redaction.js Outdated
Comment thread src/controllers/suggestions.js
Comment thread src/controllers/suggestions.js Outdated
Comment thread src/controllers/suggestions.js

@ramboz left a comment

Contributor


Verdict: REQUEST CHANGES

Well-structured, well-tested PR — the controller follows the repo's established auth/validation patterns, idempotency is handled correctly, and test coverage is thorough (24 controller cases + redaction unit tests). But two hard blockers and one cross-repo contract item must be resolved before merge.

Cross-repo note (added during synthesis): two of this agent's flags were cleared by cross-checking the other PRs:

  • "should be a wrpc_ RPC" — DB PR 718 intentionally exposes no write RPC and grants postgrest_writer INSERT; direct insert is the design. Not a divergence.
  • "is the writer JWT wired?" — confirmed: shared service/index.js:71 sets Authorization: Bearer from POSTGREST_API_KEY; existing write controllers (e.g. brands.js) rely on it. Cleared (operational caveat: the POSTGREST_API_KEY secret must be present in the Lambda env — already true for current postgres writes).

Blockers

B1 — Dependency pinned to an untrusted personal gist tarball. package.json:88 pins @adobe/spacecat-shared-data-access to https://gist.github.com/tathagat2241/.../adobe-spacecat-shared-data-access-3.79.0.tgz. Supply-chain risk (unversioned, unsigned, single-author, mutable URL); will not pass review/release. Replace with the registry version published by PR 1696. Those six imported symbols (REVIEW_SOURCES, REVIEW_VERDICTS, REJECTION_CATEGORIES, FEEDBACK_TIERS, verdictToSignal, toReviewView) don't exist in any published data-access version today, and are statically imported at module top-level, so a wrong/missing version fails the entire suggestions controller load. Do not merge until 1696 has landed + published.

B2 — reviewer_id stores the IMS user GUID but the contract says "email". src/controllers/suggestions.js (~line 2871): const reviewerId = profile?.email ?? null;. For IMS callers, profile.email is the IMS user_id GUID, not a mailbox — documented elsewhere in this same file. Yet schemas.yaml and the PR both say "IMS email." This mislabels identity for the entire corpus/export/training pipeline. Either use the real address (profile.trial_email) or rename/redocument the field as an opaque IMS user id. The unit test uses a fake ese@adobe.com so it never catches the GUID behavior.

Concerns

C1 — Direct table INSERT vs RPC. (Resolved during synthesis — see cross-repo note above; DB uses writer INSERT grant, no write RPC.)

C2 — Writer auth / JWT to PostgREST. (Resolved — wired via shared, used by existing controllers.)

C3 — The markdown "sanitizer" is regex-based and bypassable. src/support/feedback-redaction.js:sanitizeMarkdown. Concrete gaps: the event-handler stripper requires whitespace (/\son\w+/), so <svg/onload=...> is not stripped; .replace(/javascript:/gi,'') is text deletion, not neutralization ([x](javascript:evil()) becomes [x](evil())); only script|style|iframe are removed (<object>, <embed>, and <a href> pass). This is acceptable only as documented defense-in-depth if detail_markdown is never rendered as trusted HTML downstream. Since it feeds an S3 corpus and possibly a UI via ?include=reviews, confirm the render path escapes; if anything renders it as HTML, this is stored XSS → blocker. Recommend sanitize-html/DOMPurify or plain text, and document the "escape at render" assumption.
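The gaps listed in C3 can be reproduced with helpers shaped like the quoted patterns. These are illustrative reconstructions, not copies of feedback-redaction.js; the expected values in the comments follow from the regexes exactly as written here.

```javascript
// Denylist helpers modelled on the patterns quoted in C3 (illustrative only;
// not copied from feedback-redaction.js).
const hasEventHandler = (s) => /\son\w+/i.test(s);
const stripJavascriptScheme = (s) => s.replace(/javascript:/gi, '');
const stripDenylistedTags = (s) =>
  s.replace(/<(script|style|iframe)\b[^>]*>[\s\S]*?<\/\1>/gi, '');

// Gap 1: no whitespace before "onload", so the handler check never fires.
const svgPayload = '<svg/onload=alert(1)>';
// Gap 2: the scheme text is deleted, but the link survives with a new target.
const linkPayload = '[x](javascript:evil())';
// Gap 3: a nested token reassembles after a single deletion pass.
const nestedPayload = '[x](javajavascript:script:evil())';
// Gap 4: tags outside the denylist pass through untouched.
const objectPayload = '<object data="https://attacker.example/x"></object>';

console.log(hasEventHandler(svgPayload)); // false
console.log(stripJavascriptScheme(linkPayload)); // [x](evil())
console.log(stripJavascriptScheme(nestedPayload)); // [x](javascript:evil())
console.log(stripDenylistedTags(objectPayload) === objectPayload); // true
```

Gap 3 is the one that turns "text deletion" into an exploitable bypass: the output of the sanitiser is itself a `javascript:` link.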

C4 — organization_id fallback is wrong. ~line 2901: organization_id: site.getOrganizationId?.() ?? opportunity.getSiteId?.(). The fallback is a site id, not an org id — silently corrupts the column if getOrganizationId is ever nullish. Use null (or fail).
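A sketch of the C4 fix, keeping the optional-call shape quoted above. The `required` flag is a hypothetical way to express the review's "(or fail)" alternative.

```javascript
// C4 fix sketch: never fall back to a different id type for organization_id.
function resolveOrganizationId(site, { required = false } = {}) {
  // Previously: site.getOrganizationId?.() ?? opportunity.getSiteId?.()
  const orgId = site?.getOrganizationId?.() ?? null;
  if (orgId === null && required) {
    // Hypothetical "fail" variant: surface the problem instead of storing null.
    throw new Error('site has no organization id');
  }
  return orgId;
}
```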

C5 — Secret-scrub can corrupt legitimate patch data and isn't reversible. scrubDeep rewrites previous_fix/edited_fix. For a corpus that captures code patches, false positives (a base64 blob matching jwt, any sk-... string) silently mangle the stored fix, degrading training data with only a count for audit. Consider scrub metadata or gating scrub to detail_markdown (free text) only.
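One way to address C5's audit concern is to record which pattern hit which field, and to let the caller choose the field list (all three per spec §11.3, or detail_markdown only). The two patterns below are a small illustrative subset of what the PR covers, and this sketch only handles string fields, whereas scrubDeep in the PR walks nested values.

```javascript
// Illustrative subset of secret patterns; the PR's scrubber covers many more.
const SECRET_PATTERNS = {
  aws_access_key: /\bAKIA[0-9A-Z]{16}\b/g,
  bearer_token: /\bBearer\s+[A-Za-z0-9\-._~+\/]+=*/g,
};

// Scrub one string, counting hits per pattern name.
function scrubField(value) {
  const hits = {};
  let text = value;
  for (const [name, re] of Object.entries(SECRET_PATTERNS)) {
    text = text.replace(re, () => {
      hits[name] = (hits[name] ?? 0) + 1;
      return `[REDACTED:${name}]`;
    });
  }
  return { text, hits };
}

// Scrub the chosen string fields and keep per-field metadata for auditing.
function scrubRecord(record, fields) {
  const out = { ...record };
  const scrubMeta = {};
  for (const field of fields) {
    if (typeof out[field] !== 'string') continue; // sketch: strings only
    const { text, hits } = scrubField(out[field]);
    out[field] = text;
    if (Object.keys(hits).length > 0) scrubMeta[field] = hits;
  }
  return { record: out, scrubMeta };
}
```

Storing `scrubMeta` alongside the row would show exactly which patch fields were rewritten, which a single aggregate counter cannot.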

C6 — ?include=reviews has no pagination/limit. select('*').order('event_time') — a suggestion with many reviews returns all rows, and ?include=...,patches pulls full patch JSONB. Add a .limit(N) and confirm a (suggestion_id, event_time) index DB-side (718 provides idx_feedback_event_suggestion). Query parsing itself is correct (GET reads include from query string, no injection); no N+1 for single-suggestion GET.
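A sketch of the bounded reviews read, assuming a postgrest-js-style query builder (`from`, `select`, `eq`, `order`, `limit`) that resolves to `{ data, error }`. The feedback_event table and the suggestion_id, event_time, previous_fix, and edited_fix columns come from this thread; the non-patch column list is a hypothetical projection. Selecting patch columns only on opt-in keeps the patch JSONB off the wire by default.

```javascript
const REVIEWS_LIMIT = 100; // matches the .limit(100) the author added
const BASE_COLUMNS = 'event_id,reviewer_id,event_time,detail_markdown'; // hypothetical projection
const PATCH_COLUMNS = 'previous_fix,edited_fix';

async function fetchReviews(client, suggestionId, { includePatches = false } = {}) {
  if (!client) return []; // read path degrades to empty (N1)
  const columns = includePatches ? `${BASE_COLUMNS},${PATCH_COLUMNS}` : BASE_COLUMNS;
  try {
    const { data, error } = await client
      .from('feedback_event')
      .select(columns)
      .eq('suggestion_id', suggestionId)
      .order('event_time', { ascending: true }) // oldest first, as in the original
      .limit(REVIEWS_LIMIT);
    return error ? [] : (data ?? []);
  } catch {
    return []; // fail soft: never throw from the read path
  }
}
```

With oldest-first ordering, the limit keeps the first 100 reviews; if the newest ones matter more, ordering descending and reversing in memory is the alternative.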

Nits

  • N1 — Read path returns [] on missing postgrest client while write path returns 503. Fine but worth a comment that it's intentional.
  • N2 — detailMarkdown 8 KB cap uses Buffer.byteLength > 8192 (bytes) but OpenAPI says maxLength: 8192 (chars). Multi-byte UTF-8 skew.
  • N3 — state_transition stored free-form with no whitelist.
  • N4 — opportunity_type: opportunity.getType?.() ?? null — confirm DB tolerates arbitrary type strings (it does; unconstrained).
  • N5 — Test reviewer_id: 'ese@adobe.com' reinforces the B2 mislabel; after the fix, add a GUID-shaped test.
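N2 in concrete terms: the controller's Buffer.byteLength check counts UTF-8 bytes, while JSON Schema's maxLength counts characters, so the two limits disagree as soon as the text contains non-ASCII content. The helper names here are illustrative.

```javascript
// N2 illustrated (Node: Buffer is a global). Helper names are illustrative.
const DETAIL_CAP = 8192;

const utf8Bytes = (s) => Buffer.byteLength(s, 'utf8');
// Code points, which is what JSON Schema's maxLength counts
// (String.prototype.length counts UTF-16 code units instead).
const codePoints = (s) => [...s].length;

// Byte-based check, mirroring the controller's Buffer.byteLength > 8192.
const exceedsByteCap = (s) => utf8Bytes(s) > DETAIL_CAP;
// Character-based check, mirroring the OpenAPI maxLength: 8192.
const exceedsCharCap = (s) => codePoints(s) > DETAIL_CAP;

const accented = '\u00e9'.repeat(5000); // 5000 characters, 2 bytes each
console.log(codePoints(accented), utf8Bytes(accented)); // 5000 10000
console.log(exceedsCharCap(accented), exceedsByteCap(accented)); // false true

const emoji = '\u{1F600}';
console.log(emoji.length, codePoints(emoji), utf8Bytes(emoji)); // 2 1 4
```

The byte cap is the one that actually bounds payload size, so aligning the OpenAPI description to "8192 UTF-8 bytes" is the smaller change.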

Strengths

  • Authorization matches the repo baseline exactly (UUID validation → Site.findById → accessControlUtil.hasAccess → 403); the route is added to routeRequiredCapabilities with CAP_SUGGESTION_WRITE.
  • source route-bound (rejects body spoofing, FR-10); event_id idempotency via PG 23505 → 200-with-existing-row, tested (FR-09).
  • Enum validation uses the shared constants + verdictToSignal — no hand-rolled lists. Exactly the cross-repo discipline needed (once the dep is fixed).
  • Raw patches not echoed by default; ?include=...,patches is explicit opt-in.
  • deriveFeedbackTier and reviews fetch fail soft (never throw).
  • Genuinely thorough tests (happy path, all validation failures, 403/404 ownership, idempotency, 500/503, tier derivation, scrub metric, all five ?include permutations). OpenAPI complete.
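The enum discipline praised above could extend to state_transition (N3) with a small allowlist check. The transition names below are placeholders, since the real values are not listed in this thread and would ideally come from a shared constant like the other enums; treating the field as optional is also an assumption.

```javascript
// Placeholder transition names (hypothetical); the real list would come from
// a shared constant, as REVIEW_VERDICTS does for verdicts.
const STATE_TRANSITIONS = Object.freeze(['pending_to_approved', 'pending_to_rejected']);

function validateStateTransition(value) {
  // Assumption: the field is optional, and absence is stored as null.
  if (value === undefined || value === null) {
    return { ok: true, value: null };
  }
  if (typeof value !== 'string' || !STATE_TRANSITIONS.includes(value)) {
    return {
      ok: false,
      status: 400,
      message: `state_transition must be one of: ${STATE_TRANSITIONS.join(', ')}`,
    };
  }
  return { ok: true, value };
}
```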

@tathagat2241

Contributor Author

B2 — kept profile.email (it's the stable IMS user id, good for reviewer-continuity) but relabeled it everywhere as an opaque IMS user identifier, not a mailbox (controller comment, schemas.yaml, feedback_event.reviewer_id column comment) + added a GUID-shaped test.
C3 — added / to the strip list; kept the whitespace-anchored event-handler match (broadening to / mangles legit markdown URLs); documented the allowlist-sanitiser follow-up and the render-escape assumption (Backoffice uses react-markdown without rehype-raw).
C6 — added .limit(100) to ?include=reviews (backed by idx_feedback_event_suggestion).
C5 — kept scrubbing all three fields per spec §11.3; feedback_capture.scrub_hit_total is the audit trail.
C4 / Comment 3 (org_id), state_transition enum, data:svg, 429 were addressed earlier in the PR.
B1 — acknowledged; the gist pin is temporary and will be swapped for the registry version once #1696 publishes (documented merge prerequisite).
N2 (byte vs char cap) — minor; can align the OpenAPI note if you'd like.
