Skip to content

feat(edge-optimize): source CloudFront control-plane from tokowaka-client#2682

Open
ABHA61 wants to merge 90 commits into
mainfrom
feat/edge-optimize-cf-automation-simplified
Open

feat(edge-optimize): source CloudFront control-plane from tokowaka-client#2682
ABHA61 wants to merge 90 commits into
mainfrom
feat/edge-optimize-cf-automation-simplified

Conversation

@ABHA61

@ABHA61 ABHA61 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Warning

DO NOT MERGE — review only.

What

Removes the local CloudFront "Optimize at Edge" control-plane and sources it
from @adobe/spacecat-shared-tokowaka-client instead.

Why

The control-plane has been moved into the shared client (adobe/spacecat-shared#1722)
so it can be reused and the llmo controller stays thin.

Changes

  • Deleted src/support/edge-optimize.js (1722 LOC) and
    src/support/edge-optimize-edge-code.js — moved to the client verbatim.
  • Deleted test/support/edge-optimize.test.js — coverage moved to the
    client (100%).
  • src/controllers/llmo/llmo.js — imports the 12 control-plane functions +
    calculateForwardedHost from the client; no behavioural change.
  • test/controllers/llmo/llmo.test.js — adapted mocks to the client module.
  • package.json / package-lock.json — dependency repointed to the gist test
    build (temporary, see warning above).

Net

−3900 / +40 in source; behaviour unchanged. 537 llmo controller tests pass;
lint clean; the migrated import graph bundles cleanly.

🤖 Generated with Claude Code

Akash Bhardwaj and others added 30 commits June 19, 2026 01:03
POST /sites/:siteId/llmo/edge-optimize-bootstrap-url returns a CloudFormation
quick-create URL with a server-side presigned template URL, so a customer can
create the cross-account Edge Optimize connector role in their own AWS account
without a public S3 bucket and without any S3 access of their own. Presigning is
done with the service execution role.

Includes route + capability registration, OpenAPI spec, and unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The getRouteHandlers "segregates static and dynamic routes" test asserts the
exact set of routes; add the new dynamic route to the expected list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hardcode EDGE_OPTIMIZE_TEMPLATE_BUCKET and EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN
fallbacks so the dev/ci branch deploy returns a quick-create URL before those
env vars are wired into Vault/secrets. Marked TEMPORARY / TODO REMOVE —
revert before merge/prod (values must come from env config).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
Use llmo-edgeoptimize-cf-template (in 682033462621, where the service deploys
and signs) so the dev role reads it same-account; stage customer fetches via
the presigned URL. Still TEMPORARY / TODO REMOVE before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
…ault

The TEMPORARY hardcoded EDGE_OPTIMIZE_TEMPLATE_BUCKET default makes the
bucket always set, so the 'not configured' guard can no longer be hit via
an empty env. Exercise the same guard via the missing S3 client instead.
TODO: restore the empty-bucket variant when the temp default is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten the default lifetime of the bootstrap template presigned URL from
1h to 15m. The customer opens the quick-create link immediately, so a
shorter TTL shrinks the leak window. A leaked URL only grants GetObject on
the single template object until expiry; still override via
EDGE_OPTIMIZE_PRESIGN_TTL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(Phase 2)

Backend for the CloudFront 'Deploy routing' wizard's read steps. The
api-service assumes the customer's cross-account connector role server-side
(no AWS creds in the browser):

- New src/support/edge-optimize.js: assumeConnectorRole (STS AssumeRole with
  the per-session external ID) + listCloudFrontDistributions.
- POST /sites/:siteId/llmo/edge-optimize/connect - verifies the role is
  assumable (returns { connected } so the UI can poll while the customer
  creates the role); POST .../edge-optimize/distributions - lists the
  account's CloudFront distributions. Both gated by site access + LLMO admin
  and added to INTERNAL_ROUTES (not exposed to S2S).
- Adds @aws-sdk/client-sts and @aws-sdk/client-cloudfront.
- Unit tests for the support module (mocked SDK) and both handlers; route +
  capability lists updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document the already-shipped connect and distributions endpoints plus the
new read-only prerequisites, origins, and behaviors endpoints for the
CloudFront "Deploy routing" wizard. Adds shared connector/distribution
request schemas and per-endpoint response definitions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ints (Phase 2)

Add three read-only CloudFront wizard endpoints, mirroring the existing
connect/distributions handlers (12-digit accountId + externalId validation,
site/access/LLMO-admin gate, assumed-role calls, badRequest on failure):

- POST /sites/:siteId/llmo/edge-optimize/prerequisites -> checkEdgeOptimizePrerequisites
  reports connectorRole + cloudFrontRead checks (ok/false + detail, never 500)
- POST /sites/:siteId/llmo/edge-optimize/origins -> getEdgeOptimizeOrigins
  returns origins + hasEdgeOptimizeOrigin detection
- POST /sites/:siteId/llmo/edge-optimize/behaviors -> getEdgeOptimizeBehaviors
  returns default + ordered cache behaviors

Adds getDistributionConfig() support fn (GetDistributionConfigCommand) and
unit tests for the support fn, controller handlers, and route registration.
All three routes added to INTERNAL_ROUTES (admin/IMS-only, not S2S).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sync per-step endpoints that assume the customer connector role server-side
and perform one CloudFront write each (no AWS creds in the browser):

- create-origin: add the EdgeOptimize_Origin (env EDGE_OPTIMIZE_ORIGIN_DOMAIN,
  default dev.edgeoptimize.net) via UpdateDistribution (ETag IfMatch).
- create-function: create/update + publish the edgeoptimize-routing CF Function
  (bot-routing JS ported from the standalone wizard).
- apply-cache: add EO headers to the behavior's custom cache policy (common
  path; legacy ForwardedValues / managed-policy clone left as TODO).
- create-lambda: create exec role (bounded IAM-propagation retry) + the
  edgeoptimize-origin Lambda@Edge and publish a version.
- apply-associations: wire the function (viewer-request) + Lambda (origin-
  request/response) onto the selected behavior.
- verify: server-side bot-vs-human probe; passed requires x-edgeoptimize-request-id
  (x-edgeoptimize-fo = failover, not success).

Adds @aws-sdk/client-iam + @aws-sdk/client-lambda. All AWS ops use ETag
read-modify-write. Embedded function/Lambda code ported verbatim from the
connect-aws-wizard; Lambda code inlined per the helix-deploy bundling rule.
Mocked-SDK unit tests for all 6 support fns + handlers; routes/capabilities
+ OpenAPI updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The Default(*) behavior commonly uses an AWS-managed cache policy, which
cannot be updated (UpdateCachePolicy -> 'update is not allowed for this
policy'). applyEdgeOptimizeCacheHeaders now ports the full standalone-wizard
logic with all three scenarios:
- legacy (ForwardedValues, no CachePolicyId): add EO headers there + MinTTL 0
- custom policy: UpdateCachePolicy to add EO headers (existing path)
- managed policy: CLONE into a custom edgeoptimize-cache policy with the EO
  headers, then repoint the behavior to it (idempotent by name)

Adds GetCachePolicy/ListCachePolicies/CreateCachePolicy. Support tests
rewritten to dispatch by command name and cover all three scenarios.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…endpoint

Fixes the Lambda step's three failure modes (timeout, 'update in progress',
no existence check):
- waitForLambdaIdle now gates on State Active AND LastUpdateStatus !=
  InProgress (was State only), so we never hit ResourceConflictException
  ('update is in progress') on a retry after a slow/timed-out first call.
- createEdgeOptimizeLambda is fully idempotent: if a published numbered
  version already matches the current code, reuse it (no update/publish);
  otherwise update + publish. So a retry after a CDN first-byte timeout
  returns immediately instead of conflicting.
- New read-only POST /sites/:siteId/llmo/edge-optimize/lambda-status
  (getEdgeOptimizeLambdaStatus) so the wizard can detect on entry and poll
  whether the function already exists with a published version.

Support tests rewritten to dispatch by command name; status tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dly)

create-lambda no longer blocks on a fresh function becoming Active (which
exceeded the CDN first-byte timeout -> 503). It now ensures the role + kicks
off the function create and returns { status: 'provisioning' | 'ready' }
immediately; the UI polls until a published version exists. Also:
- buildLambdaZip uses a fixed timestamp so CodeSha256 is deterministic
  (no version churn).
- lambda-status now reports roleExists + a ready flag (role is created
  synchronously by the create ack) so the wizard can show role + function
  state and check on entry.
- Removed the in-request waitForLambdaIdle/UpdateFunctionCode blocking path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
edge-optimize.test.js re-ran esmock() in beforeEach, re-instantiating the
mocked AWS SDK module graph on every test. As this file grew this session it
accumulated enough memory to push the 12.5k-test suite past the 4GB V8 heap
limit (worker OOM -> '1 failing: Worker terminated' + lost-worker coverage
dropping below the 90% gate). Move esmock to a single before() hook and reset
only the send stubs per test. Suite run time for this file drops from ~4min
to ~7s and the leak is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The create-origin step created the EdgeOptimize_Origin without its custom
headers, so the routing function's request could not authenticate to Edge
Optimize or resolve the customer host - Verify never returned an
x-edgeoptimize-request-id.

- createEdgeOptimizeOrigin now sets x-edgeoptimize-api-key (site EO API key),
  x-forwarded-host (customer host), and optional x-edgeoptimize-fetcher-key,
  mirroring the standalone wizard + CloudFormation installer.
- Self-heals: an origin created header-less by the earlier version is patched
  in place on re-run (returns updated: true).
- Handler derives both server-side - api key from the tokowaka metaconfig
  (apiKeys[0]), forwarded host from calculateForwardedHost(site.baseURL) - so
  no new UI input; gateEdgeOptimizeWizard now returns the site to avoid a
  second fetch.
- Verify: documented the prod TODO (probe the customer's real domain, not the
  *.cloudfront.net domain) - behavior unchanged for dev testing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tighten the dev-only default for EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN from the
whole dev account (arn:aws:iam::682033462621:root) to the exact assuming
identity - the spacecat-api-service Lambda execution role
(arn:aws:iam::682033462621:role/spacecat-role-lambda-generic) - shrinking the
blast radius of the connector-role trust. No AWS-side change needed; the
assuming identity is already that role. Prod must still set this via env to the
prod execution role ARN (no in-code default) - tracked in the punch list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Refresh the pre-release @adobe/spacecat-shared-tokowaka-client tarball to the
latest #1722 build (verify-probe AbortController timeout, distribution
pagination + GetDistribution propagation lookup, cdn/cloudfront folder move,
buildCloudfrontFunctionCode rename) so #2682 can be tested on dev against the
current shared changes before #1722 merges.

Co-authored-by: Cursor <cursoragent@cursor.com>
ABHA61 and others added 2 commits June 26, 2026 12:58
…CloudFront

Mirror the existing Cloudflare onboarding convention (listLlmoCloudflareZones /
deployLlmoCloudFrontWorker) so the CloudFront "Optimize at Edge" wizard no longer
collides with the older product-level Edge Optimize feature or the bare "cf" abbrev.

- Rename the 15 wizard operationIds + exported handlers + the permissions handler
  (getEdgeOptimizePermissions -> getLlmoCloudFrontPermissions) and the private
  helpers (gate/parse/resolve) in src/controllers/llmo/llmo.js, plus the route-map
  wiring in src/routes/index.js and the gate comment in facs-capabilities.js.
- Rename OpenAPI path-item anchors site-llmo-edge-optimize-* -> site-llmo-cloudfront-*,
  the operationIds, the shared request schemas edge-optimize-*-request ->
  cloudfront-*-request, and response field cfFunctionArn -> cloudFrontFunctionArn.
- Update unit tests (describe names + controller-method invocations).

HTTP route paths (/sites/:siteId/llmo/onboarding/cloudfront/*) are unchanged. The
pre-existing Edge Optimize symbols (checkEdgeOptimizeStatus, enableEdgeOptimize,
probeEdgeOptimizeConnectivity, /llmo/edge-optimize-* routes) and the tokowaka-client
import names are intentionally left untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>
…FunctionArn

Reverts the broad operationId/handler/helper + OpenAPI key renames from 7c89f11
(those were out of the agreed scope). Keeps only the approved bare-cf fix
cfFunctionArn -> cloudFrontFunctionArn in the OpenAPI response shape, and
re-pins the tokowaka-client gist to revision c245747 (which carries the same
field rename in spacecat-shared #1722) so dev testing matches.

Co-authored-by: Cursor <cursoragent@cursor.com>
ABHA61 added 2 commits June 26, 2026 13:17
…-automation-simplified

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	docs/index.html
@ABHA61 ABHA61 requested a review from MysticatBot June 26, 2026 07:51
@MysticatBot

Copy link
Copy Markdown

Mysticat review failed: Claude CLI crashed (exit 1): stderr= stdout={"type":"result","subtype":"error_max_budget_usd","duration_ms":8,"duration_api_ms":1556945,"is_error":true,"num_turns":1,"stop_reason":null,"session_id":"40b42d9e-c031-4e6f-9e75-5ef5cde0a34a","total_cost_usd":10.474445650000005,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_i

@ABHA61

ABHA61 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Status update on the review feedback (code at 5c398f46):

Fixed

  • Route namespace — all 16 wizard endpoints moved under /sites/{siteId}/llmo/onboarding/cloudfront/*.
  • Internal errors → 500 — wizard handlers' catch blocks now return internalServerError (generic client message, full error logged server-side); up-front validation still returns 400.
  • lambdaVersionArn validation — enforced with ^arn:aws:lambda:us-east-1:\d{12}:function:[A-Za-z0-9_-]+:\d+$ and the ARN account segment must equal the request accountId.
  • Duplicated credential checks — centralised into parseEoCredentials(...).
  • POST-for-reads rationale — documented in the OpenAPI descriptions ("read-only").
  • Error sanitisation — unexpected errors return a generic message; raw AWS details (ARNs/account ids) are logged server-side only.

Not required (by design)

  • externalId not bound to site/session — confused-deputy protection is provided by a per-session, randomly generated externalId created at connector-role setup and reused for the assume; a new session generates a new id, so there is no static secret to bind. Can add an explicit server-side site↔externalId record if you'd prefer defence-in-depth.
  • resolveEoTarget error-as-value — kept intentionally; the { error } Response is returned immediately by the caller, so it doesn't actually interleave with the try/catch flow.

Pending before merge (intentional)

Deferred (optional follow-up)

  • Split the ~3200-line llmo.js — planned as a follow-up to extract the CloudFront wizard into src/controllers/llmo/llmo-cloudfront.js (parallel to llmo-cloudflare.js); kept out of this PR to limit churn.
  • Full HOF/middleware for the gate/validate/assume boilerplate — partially done via parseEoCredentials; a thin wrapper can follow.
  • js-yaml lazy import — minor; can switch the permissions endpoint to a dynamic import.

…ucture

Move the 16 CloudFront "Optimize at Edge" onboarding-wizard handlers out of
llmo.js into a dedicated LlmoCloudFrontController with short method names,
mirroring LlmoCloudflareController. Routes move from
/llmo/onboarding/cloudfront/* to /llmo/cdn-onboard/cloudfront/* and OpenAPI
operationIds become *LlmoCloudFront*. Pre-existing product-level edge-optimize
endpoints and the tokowaka-client import names are left untouched.

Co-authored-by: Cursor <cursoragent@cursor.com>
@ABHA61 ABHA61 requested review from MysticatBot and removed request for MysticatBot June 26, 2026 10:51

@MysticatBot MysticatBot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ABHA61,

⚠ Degraded review - no spec document was found for this change (searched the PR links, the touched repos' docs, the architecture/guidelines docs, and linked Jira). This review covers code-level quality but could not validate the change against an agreed design, so confidence is reduced. Add a spec link (PR template section 4) and re-request review for a full-confidence pass.

Verdict: Request changes - error handling gap missed from last round, plus unsanitized code-generation input needs defense-in-depth.
Complexity: HIGH - large diff; API surface + dependency change + cross-account AWS operations.
Changes: Extracts the CloudFront "Optimize at Edge" control-plane into a dedicated controller sourced from the shared tokowaka client, adding 16 wizard endpoints (14 files).
Note: Recommend a human read before merge - this change adds a new API surface (docs/openapi/llmo-api.yaml) with cross-account AWS operations. The bot review is a complement to, not a replacement for, a human read here.

Note: CI build is pending. The PR is explicitly marked DO NOT MERGE pending adobe/spacecat-shared#1722 and the gist dependency replacement.

Must fix before merge

  1. [Important] connect outer catch returns 400 with raw error message (inconsistent with all other handlers) - src/controllers/llmo/llmo-cloudfront.js:226 (details inline)
  2. [Important] targetedPaths passed unsanitized to CloudFront Function code-generation - src/controllers/llmo/llmo-cloudfront.js:497 (details inline)
Non-blocking (3): minor issues and suggestions
  • nit: getPermissions inner catch returns badRequest for an S3 read failure (server-side problem, should be internalServerError) - src/controllers/llmo/llmo-cloudfront.js:770
  • nit: No format validation on distributionId - CloudFront IDs match ^[A-Z0-9]{12,14}$; adding a regex in parseEoCredentials would reject garbage early - src/controllers/llmo/llmo-cloudfront.js:181
  • suggestion: Add a test case that passes an actual targetedPaths array to createRoutingFunction and verifies it is forwarded to the downstream function (currently only the null case is tested) - test/controllers/llmo/llmo-cloudfront.test.js:1001

Previously flagged, now resolved

  • Controller extracted into dedicated llmo-cloudfront.js (968 lines vs prior 3200-line monolith)
  • Catch blocks return internalServerError with generic messages (except connect, see finding 1)
  • lambdaVersionArn validated with regex + account-ID cross-check
  • Routes moved under /sites/{siteId}/llmo/cdn-onboard/cloudfront/* namespace
  • Credential checks centralized in parseEoCredentials

Skill: pr-review | Model: us.anthropic.claude-opus-4-6-v1[1m] | Duration: 3m 12s | Cost: $7.34 | Commit: e29e782e3a56b26674e3452d68164da785e11024
If this code review was useful, please react with 👍. Otherwise, react with 👎.

Comment thread src/controllers/llmo/llmo-cloudfront.js Outdated
}
} catch (error) {
log.error(`Failed to connect edge optimize role for site ${siteId}:`, error);
return badRequest(cleanupHeaderValue(error.message));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): The outer catch in connect returns badRequest(cleanupHeaderValue(error.message)) while every other handler returns internalServerError(...) with a generic message. This leaks internal error details (DynamoDB table names, connection info) and returns 400 (implying client fault) for server-side failures.

This was supposedly addressed from the prior review (finding: broad catch-all returns 400) but this one handler was missed.

Fix:

} catch (error) {
  log.error(`Failed to connect edge optimize role for site ${siteId}:`, error);
  return internalServerError('Failed to connect the edge optimize role, please try again');
}

const { Site } = dataAccess;
const targetedPaths = Array.isArray(context.data?.targetedPaths)
? context.data.targetedPaths
: null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): targetedPaths is validated only as "is it an array or null" and then forwarded directly to createEdgeOptimizeRoutingFunction. If the downstream function interpolates these values into CloudFront Function code (JavaScript executed at the CDN edge), a caller could inject arbitrary code.

Even if the shared client validates, defense in depth is warranted for inputs that flow into code generation.

Fix: Add boundary validation before passing downstream:

if (targetedPaths !== null) {
  if (targetedPaths.length > 50) {
    return badRequest('targetedPaths must not exceed 50 entries');
  }
  if (!targetedPaths.every((p) => typeof p === 'string' && p.length > 0 && p.length <= 256
    && /^[a-zA-Z0-9/_.*-]+$/.test(p))) {
    return badRequest('Each targetedPaths entry must be a non-empty path pattern (max 256 chars, alphanumeric/slashes/wildcards only)');
  }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is useful defense-in-depth, but we are not taking it as a current release blocker. The FE wizard does not expose arbitrary targeted paths today, and the shared code serializes the array into generated Function code rather than concatenating raw path text. I added this to the CloudFront punch list as a P1 follow-up: add controller-level targetedPaths limits before passing downstream (max entries, max length, accepted path characters).

@MysticatBot MysticatBot added the needs-human-review AI reviewer recommends a human read before merge label Jun 26, 2026
@ABHA61

ABHA61 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Review feedback update (latest pushed code 9b974316)

Fixed

  • Route namespace: all CloudFront wizard routes and OpenAPI paths are now under /sites/{siteId}/llmo/cdn-onboard/cloudfront/*, aligned with the existing Cloudflare convention.
  • connect outer catch: now logs the real error server-side and returns internalServerError("Failed to connect the edge optimize role, please try again") instead of returning 400 with the raw error message.
  • Shared-client refresh: api-service now consumes the rebuilt shared tarball and imports the AWS-style shared aliases.

Deferred / asking for next release

  • rest pending items seems p1 to me, Can we take this in the next release .

Focused verification after rebase: llmo-cloudfront.test.js = 165 passing; eslint on the CloudFront controller/test files passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-reviewed Reviewed by AI complexity:high AI-assessed PR complexity: HIGH needs-human-review AI reviewer recommends a human read before merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants