feat(llmo): CloudFront onboarding — Option A (Adobe-managed wizard)#2679

Open
ABHA61 wants to merge 72 commits into main from feat/llmo-cloudfront-onboarding-option-a

ABHA61 (Contributor) commented Jun 24, 2026

DO NOT MERGE: demo branch for the LLMO CloudFront "Optimize at Edge" onboarding. One intentional demo-only aid must be reverted before this can go to main (see "Demo-only (revert before merge)" below).

What this is

Final (productionization) branch for the Option A (Adobe-managed assume-role wizard) CloudFront onboarding backend. Supersedes the old Option-B demo PR #2664; the customer-managed installer endpoint is removed here.

Changes

  • Removed Option-B installer backend: getEdgeOptimizeInstallerUrl handler + controller export, GET /sites/:siteId/llmo/edge-optimize/installer-url route + required-capability entry, the OpenAPI path/$ref, and the associated unit tests.
  • Deploy orchestrator: added a Propagation step (gates Verify until the distribution status reaches Deployed) and a richer Verify probe that compares bot vs. human requests by UA, HTTP status, x-edgeoptimize-request-id, and x-edgeoptimize-fo (failover).
  • Edge code extracted to src/support/edge-optimize-edge-code.js (Lambda@Edge handler + CloudFront routing function), imported + re-exported. Bundle-safe (no sibling-file reads) — validated by the CI bundle-build gate.
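The Verify rule used throughout this PR (a bot request must come back with `x-edgeoptimize-request-id`, while `x-edgeoptimize-fo` means failover, not success) can be sketched as a pure classifier. This is a minimal illustration, not the code in `edge-optimize.js`: the name `classifyVerifyProbe`, the probe shape `{ ua, status, headers }`, and the requirement that the human probe returns 2xx/3xx are assumptions.

```javascript
// Hypothetical classifier for the bot-vs-human Verify probe.
// Assumed probe shape: { ua, status, headers } with lower-cased header names.
function classifyVerifyProbe(bot, human) {
  const botRequestId = bot.headers['x-edgeoptimize-request-id'] ?? null;
  // x-edgeoptimize-fo marks a failover response: it must never count as a pass.
  const failover = 'x-edgeoptimize-fo' in bot.headers;
  // Assumption: the human request is served normally by the customer origin.
  const humanOk = human.status >= 200 && human.status < 400;
  const passed = Boolean(botRequestId) && !failover && humanOk;

  let reason = 'ok';
  if (!botRequestId) reason = 'bot response missing x-edgeoptimize-request-id';
  else if (failover) reason = 'bot response served via failover (x-edgeoptimize-fo)';
  else if (!humanOk) reason = `human probe returned HTTP ${human.status}`;

  return {
    passed, failover, botRequestId, botStatus: bot.status, humanStatus: human.status, reason,
  };
}
```

Keeping the classifier free of network calls lets the pass/failover distinction be unit-tested with plain objects.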

Demo-only (revert before merge)

  • runEdgeOptimizeDeployStep verify probes the distribution's own *.cloudfront.net (distDomain) because the dev test domain isn't pointed at the distribution. Restore the customer's real forwardedHost (one commented line in edge-optimize.js) before merge.

Verification

  • 592 unit tests pass · OpenAPI valid · CI green (type-check + bundle-build + it-postgres).

Paired with project-elmo-ui PR (Option-A-only FE).

🤖 Generated with Claude Code

Akash Bhardwaj and others added 30 commits June 19, 2026 01:03
POST /sites/:siteId/llmo/edge-optimize-bootstrap-url returns a CloudFormation
quick-create URL with a server-side presigned template URL, so a customer can
create the cross-account Edge Optimize connector role in their own AWS account
without a public S3 bucket and without any S3 access of their own. Presigning is
done with the service execution role.

Includes route + capability registration, OpenAPI spec, and unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
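A quick-create link of this kind wraps the already-presigned template URL in the CloudFormation console's `#/stacks/create/review` fragment, passing stack parameters as `param_<Name>`. The sketch below assumes that link format; the helper name `buildQuickCreateUrl`, the stack name, and the `ExternalId` parameter are hypothetical. Presigning itself would happen earlier with the service execution role (e.g. `getSignedUrl` from `@aws-sdk/s3-request-presigner`).

```javascript
// Hypothetical helper: build a CloudFormation console quick-create link around a
// template URL the service has already presigned. Parameter names are assumptions.
function buildQuickCreateUrl({ region, presignedTemplateUrl, stackName, params = {} }) {
  // URLSearchParams percent-encodes the presigned URL's own '?', '&' and '='.
  const query = new URLSearchParams({ templateURL: presignedTemplateUrl, stackName });
  for (const [name, value] of Object.entries(params)) {
    query.set(`param_${name}`, String(value));
  }
  return `https://${region}.console.aws.amazon.com/cloudformation/home`
    + `?region=${encodeURIComponent(region)}#/stacks/create/review?${query}`;
}
```

Because the customer's browser fetches the template only through the presigned URL, the bucket can stay private and the customer needs no S3 permissions.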
The getRouteHandlers "segregates static and dynamic routes" test asserts the
exact set of routes; add the new dynamic route to the expected list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hardcode EDGE_OPTIMIZE_TEMPLATE_BUCKET and EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN
fallbacks so the dev/ci branch deploy returns a quick-create URL before those
env vars are wired into Vault/secrets. Marked TEMPORARY / TODO REMOVE —
revert before merge/prod (values must come from env config).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
Use llmo-edgeoptimize-cf-template (in 682033462621, where the service deploys
and signs) so the dev role reads it same-account; stage customer fetches via
the presigned URL. Still TEMPORARY / TODO REMOVE before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p-url' into feat/llmo-edge-optimize-bootstrap-url
…ault

The TEMPORARY hardcoded EDGE_OPTIMIZE_TEMPLATE_BUCKET default makes the
bucket always set, so the 'not configured' guard can no longer be hit via
an empty env. Exercise the same guard via the missing S3 client instead.
TODO: restore the empty-bucket variant when the temp default is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten the default lifetime of the bootstrap template presigned URL from
1h to 15m. The customer opens the quick-create link immediately, so a
shorter TTL shrinks the leak window. A leaked URL only grants GetObject on
the single template object until expiry. The TTL can still be overridden via
EDGE_OPTIMIZE_PRESIGN_TTL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
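The default-plus-override behaviour can be sketched as a small resolver. The function name is hypothetical, and the sketch assumes EDGE_OPTIMIZE_PRESIGN_TTL is given in seconds, the unit of `getSignedUrl`'s `expiresIn` option; the 7-day cap reflects the SigV4 presigned-URL maximum.

```javascript
// Hypothetical resolver for the presigned-URL lifetime (seconds, assumed unit).
const DEFAULT_PRESIGN_TTL_SECONDS = 15 * 60;
// SigV4 presigned URLs cannot be valid for longer than 7 days.
const MAX_PRESIGN_TTL_SECONDS = 7 * 24 * 60 * 60;

function resolvePresignTtl(env) {
  const raw = env.EDGE_OPTIMIZE_PRESIGN_TTL;
  if (raw === undefined || raw === '') return DEFAULT_PRESIGN_TTL_SECONDS;
  const ttl = Number(raw);
  // Fall back to the default rather than presigning with a nonsensical value.
  if (!Number.isInteger(ttl) || ttl <= 0) return DEFAULT_PRESIGN_TTL_SECONDS;
  return Math.min(ttl, MAX_PRESIGN_TTL_SECONDS);
}
```

The result would then be passed as `getSignedUrl(s3Client, new GetObjectCommand({ Bucket, Key }), { expiresIn })`.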
…(Phase 2)

Backend for the CloudFront 'Deploy routing' wizard's read steps. The
api-service assumes the customer's cross-account connector role server-side
(no AWS creds in the browser):

- New src/support/edge-optimize.js: assumeConnectorRole (STS AssumeRole with
  the per-session external ID) + listCloudFrontDistributions.
- POST /sites/:siteId/llmo/edge-optimize/connect - verifies the role is
  assumable (returns { connected } so the UI can poll while the customer
  creates the role); POST .../edge-optimize/distributions - lists the
  account's CloudFront distributions. Both gated by site access + LLMO admin
  and added to INTERNAL_ROUTES (not exposed to S2S).
- Adds @aws-sdk/client-sts and @aws-sdk/client-cloudfront.
- Unit tests for the support module (mocked SDK) and both handlers; route +
  capability lists updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
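The connect endpoint's "returns { connected } so the UI can poll" contract can be sketched as a wrapper that never throws. `checkConnection` is a hypothetical name, and the injected `assumeRole` stands in for the real STS AssumeRole call with the per-session external ID; its signature here is an assumption.

```javascript
// Hypothetical connect check: report { connected } instead of throwing, so the
// UI can keep polling while the customer's stack is still creating the role.
async function checkConnection(assumeRole, { roleArn, externalId }) {
  try {
    // Assumed to resolve to STS-style Credentials ({ AccessKeyId, ... }).
    const creds = await assumeRole({ roleArn, externalId });
    return { connected: Boolean(creds && creds.AccessKeyId) };
  } catch (err) {
    // An access-denied error is expected until the role exists and trusts the service.
    return { connected: false, reason: err.name || 'Error' };
  }
}
```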
Document the already-shipped connect and distributions endpoints plus the
new read-only prerequisites, origins, and behaviors endpoints for the
CloudFront "Deploy routing" wizard. Adds shared connector/distribution
request schemas and per-endpoint response definitions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ints (Phase 2)

Add three read-only CloudFront wizard endpoints, mirroring the existing
connect/distributions handlers (12-digit accountId + externalId validation,
site/access/LLMO-admin gate, assumed-role calls, badRequest on failure):

- POST /sites/:siteId/llmo/edge-optimize/prerequisites -> checkEdgeOptimizePrerequisites
  reports connectorRole + cloudFrontRead checks (ok/false + detail, never 500)
- POST /sites/:siteId/llmo/edge-optimize/origins -> getEdgeOptimizeOrigins
  returns origins + hasEdgeOptimizeOrigin detection
- POST /sites/:siteId/llmo/edge-optimize/behaviors -> getEdgeOptimizeBehaviors
  returns default + ordered cache behaviors

Adds getDistributionConfig() support fn (GetDistributionConfigCommand) and
unit tests for the support fn, controller handlers, and route registration.
All three routes added to INTERNAL_ROUTES (admin/IMS-only, not S2S).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
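The shared "12-digit accountId + externalId validation" could look like the sketch below. The function name is hypothetical; the external-ID pattern is the one STS documents for `ExternalId` (2 to 1224 characters of `[\w+=,.@:/-]`), and whether the service enforces exactly this pattern is an assumption.

```javascript
// Hypothetical request validation shared by the wizard handlers.
const ACCOUNT_ID_RE = /^\d{12}$/;
// Character set and length STS accepts for ExternalId.
const EXTERNAL_ID_RE = /^[\w+=,.@:\/-]{2,1224}$/;

function validateConnectorInput(body) {
  const { accountId, externalId } = body ?? {};
  const errors = [];
  if (typeof accountId !== 'string' || !ACCOUNT_ID_RE.test(accountId)) {
    errors.push('accountId must be a 12-digit AWS account ID');
  }
  if (typeof externalId !== 'string' || !EXTERNAL_ID_RE.test(externalId)) {
    errors.push('externalId is missing or malformed');
  }
  // A non-empty list would be returned to the caller as a 400 (badRequest).
  return { ok: errors.length === 0, errors };
}
```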
Sync per-step endpoints that assume the customer connector role server-side
and perform one CloudFront write each (no AWS creds in the browser):

- create-origin: add the EdgeOptimize_Origin (env EDGE_OPTIMIZE_ORIGIN_DOMAIN,
  default dev.edgeoptimize.net) via UpdateDistribution (ETag IfMatch).
- create-function: create/update + publish the edgeoptimize-routing CF Function
  (bot-routing JS ported from the standalone wizard).
- apply-cache: add EO headers to the behavior's custom cache policy (common
  path; legacy ForwardedValues / managed-policy clone left as TODO).
- create-lambda: create exec role (bounded IAM-propagation retry) + the
  edgeoptimize-origin Lambda@Edge and publish a version.
- apply-associations: wire the function (viewer-request) + Lambda (origin-
  request/response) onto the selected behavior.
- verify: server-side bot-vs-human probe; passed requires x-edgeoptimize-request-id
  (x-edgeoptimize-fo = failover, not success).

Adds @aws-sdk/client-iam + @aws-sdk/client-lambda. All AWS ops use ETag
read-modify-write. Embedded function/Lambda code ported verbatim from the
connect-aws-wizard; Lambda code inlined per the helix-deploy bundling rule.
Mocked-SDK unit tests for all 6 support fns + handlers; routes/capabilities
+ OpenAPI updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
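The "ETag read-modify-write" pattern these steps share can be sketched generically: read the config and its ETag, mutate a copy, write back with `IfMatch`. `updateWithEtag` is a hypothetical name; `getConfig`/`updateConfig` stand in for `GetDistributionConfigCommand`/`UpdateDistributionCommand`. Re-reading on `PreconditionFailed` (CloudFront's ETag-mismatch error) is an added assumption, not something the commit states.

```javascript
// Hypothetical ETag read-modify-write helper. `mutate` edits the config in place
// and may return false to signal "already in the desired state, nothing to write".
async function updateWithEtag({ getConfig, updateConfig, mutate, maxAttempts = 3 }) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const { DistributionConfig, ETag } = await getConfig();
    // Mutate a copy so a failed write never leaves a half-applied object around.
    const next = structuredClone(DistributionConfig);
    if (mutate(next) === false) return { updated: false, ETag };
    try {
      const res = await updateConfig({ DistributionConfig: next, IfMatch: ETag });
      return { updated: true, ETag: res.ETag };
    } catch (err) {
      // Someone changed the config after our read: re-read and re-apply
      // instead of overwriting their change.
      if (err.name !== 'PreconditionFailed' || attempt === maxAttempts) throw err;
    }
  }
  throw new Error('maxAttempts must be at least 1');
}
```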
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The Default(*) behavior commonly uses an AWS-managed cache policy, which
cannot be updated (UpdateCachePolicy -> 'update is not allowed for this
policy'). applyEdgeOptimizeCacheHeaders now ports the full standalone-wizard
logic with all three scenarios:
- legacy (ForwardedValues, no CachePolicyId): add EO headers there + MinTTL 0
- custom policy: UpdateCachePolicy to add EO headers (existing path)
- managed policy: CLONE into a custom edgeoptimize-cache policy with the EO
  headers, then repoint the behavior to it (idempotent by name)

Adds GetCachePolicy/ListCachePolicies/CreateCachePolicy. Support tests
rewritten to dispatch by command name and cover all three scenarios.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
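The three-way dispatch can be sketched as a planner over the behavior and the attached policy's type. `planCacheHeaderChange` is hypothetical; it assumes `policyType` is the `Type` (`'managed'` or `'custom'`) that ListCachePolicies reports for the behavior's `CachePolicyId`.

```javascript
// Hypothetical planner for the legacy / custom / managed scenarios.
// `behavior` is a CloudFront (Default)CacheBehavior object.
function planCacheHeaderChange(behavior, policyType) {
  if (!behavior.CachePolicyId) {
    // Legacy behavior: headers live in ForwardedValues, and MinTTL is set to 0.
    return { scenario: 'legacy', action: 'add EO headers to ForwardedValues, set MinTTL 0' };
  }
  if (policyType === 'custom') {
    return { scenario: 'custom', action: 'UpdateCachePolicy to add EO headers' };
  }
  if (policyType === 'managed') {
    // AWS-managed policies cannot be updated: clone, then repoint the behavior.
    return { scenario: 'managed', action: 'clone into custom policy with EO headers, repoint behavior' };
  }
  throw new Error(`unknown cache policy type: ${policyType}`);
}
```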
…endpoint

Fixes the Lambda step's three failure modes (timeout, 'update in progress',
no existence check):
- waitForLambdaIdle now gates on State Active AND LastUpdateStatus !=
  InProgress (was State only), so we never hit ResourceConflictException
  ('update is in progress') on a retry after a slow/timed-out first call.
- createEdgeOptimizeLambda is fully idempotent: if a published numbered
  version already matches the current code, reuse it (no update/publish);
  otherwise update + publish. So a retry after a CDN first-byte timeout
  returns immediately instead of conflicting.
- New read-only POST /sites/:siteId/llmo/edge-optimize/lambda-status
  (getEdgeOptimizeLambdaStatus) so the wizard can detect on entry and poll
  whether the function already exists with a published version.

Support tests rewritten to dispatch by command name; status tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
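The two checks described here map onto fields the Lambda API returns: `State` and `LastUpdateStatus` from GetFunctionConfiguration, and `Version`/`CodeSha256` from ListVersionsByFunction. The function names below are hypothetical; the sketch only shows the gating and reuse logic.

```javascript
// Hypothetical idle check: Active alone is not enough, because an Active
// function can still be mid-update (writes then hit ResourceConflictException).
function isLambdaIdle(config) {
  return config.State === 'Active' && config.LastUpdateStatus !== 'InProgress';
}

// Hypothetical reuse check: return the newest published version whose code
// matches, or null. Lambda@Edge needs a numbered version, so $LATEST is skipped.
function findReusableVersion(versions, currentCodeSha256) {
  const match = versions
    .filter((v) => v.Version !== '$LATEST' && v.CodeSha256 === currentCodeSha256)
    .sort((a, b) => Number(b.Version) - Number(a.Version))[0];
  return match ? match.Version : null;
}
```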
…dly)

create-lambda no longer blocks on a fresh function becoming Active (which
exceeded the CDN first-byte timeout -> 503). It now ensures the role exists, kicks
off function creation, and returns { status: 'provisioning' | 'ready' }
immediately; the UI polls until a published version exists. Also:
- buildLambdaZip uses a fixed timestamp so CodeSha256 is deterministic
  (no version churn).
- lambda-status now reports roleExists + a ready flag (role is created
  synchronously by the create ack) so the wizard can show role + function
  state and check on entry.
- Removed the in-request waitForLambdaIdle/UpdateFunctionCode blocking path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
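With create-lambda returning immediately, the waiting moves to the caller. A minimal polling loop over lambda-status might look like this; `pollUntilReady`, its options, and the default interval are hypothetical, with `getStatus` and `sleep` injected so the loop can be tested without timers.

```javascript
// Hypothetical client-side poll: re-read status until `ready` or attempts run out.
async function pollUntilReady(getStatus, {
  intervalMs = 5000,
  maxAttempts = 60,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const status = await getStatus();
    if (status.ready) return { ready: true, attempts: attempt, status };
    // No sleep after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return { ready: false, attempts: maxAttempts };
}
```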
edge-optimize.test.js re-ran esmock() in beforeEach, re-instantiating the
mocked AWS SDK module graph on every test. As the file grew, the re-created graphs
accumulated enough memory to push the 12.5k-test suite past the 4GB V8 heap
limit (worker OOM -> '1 failing: Worker terminated' + lost-worker coverage
dropping below the 90% gate). Move esmock to a single before() hook and reset
only the send stubs per test. Suite run time for this file drops from ~4min
to ~7s and the leak is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt-wizard' into feat/llmo-edge-optimize-cloudfront-wizard
The create-origin step created the EdgeOptimize_Origin without its custom
headers, so the routing function's request could not authenticate to Edge
Optimize or resolve the customer host - Verify never returned an
x-edgeoptimize-request-id.

- createEdgeOptimizeOrigin now sets x-edgeoptimize-api-key (site EO API key),
  x-forwarded-host (customer host), and optional x-edgeoptimize-fetcher-key,
  mirroring the standalone wizard + CloudFormation installer.
- Self-heals: an origin created header-less by the earlier version is patched
  in place on re-run (returns updated: true).
- Handler derives both server-side - api key from the tokowaka metaconfig
  (apiKeys[0]), forwarded host from calculateForwardedHost(site.baseURL) - so
  no new UI input; gateEdgeOptimizeWizard now returns the site to avoid a
  second fetch.
- Verify: documented the prod TODO (probe the customer's real domain, not the
  *.cloudfront.net domain) - behavior unchanged for dev testing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
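The self-heal behaviour can be sketched as an "ensure headers" function over CloudFront's origin shape (`CustomHeaders: { Quantity, Items: [{ HeaderName, HeaderValue }] }`). `ensureOriginHeaders` is a hypothetical name; it returns `updated: true` only when a header had to be added or changed.

```javascript
// Hypothetical self-healing header merge for an existing CloudFront origin.
// Optional headers passed as undefined/null are skipped.
function ensureOriginHeaders(origin, required) {
  // Copy each item so the caller's origin object is never mutated.
  const items = (origin.CustomHeaders?.Items ?? []).map((h) => ({ ...h }));
  let updated = false;
  for (const [name, value] of Object.entries(required)) {
    if (value === undefined || value === null) continue;
    const existing = items.find((h) => h.HeaderName.toLowerCase() === name.toLowerCase());
    if (!existing) {
      items.push({ HeaderName: name, HeaderValue: value });
      updated = true;
    } else if (existing.HeaderValue !== value) {
      existing.HeaderValue = value;
      updated = true;
    }
  }
  return { origin: { ...origin, CustomHeaders: { Quantity: items.length, Items: items } }, updated };
}
```

A re-run against an already-correct origin returns `updated: false`, so the caller can skip the UpdateDistribution write entirely.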
Tighten the dev-only default for EDGE_OPTIMIZE_TRUSTED_PRINCIPAL_ARN from the
whole dev account (arn:aws:iam::682033462621:root) to the exact assuming
identity - the spacecat-api-service Lambda execution role
(arn:aws:iam::682033462621:role/spacecat-role-lambda-generic) - shrinking the
blast radius of the connector-role trust. No AWS-side change needed; the
assuming identity is already that role. Prod must still set this via env to the
prod execution role ARN (no in-code default) - tracked in the punch list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
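The narrowed trust corresponds to a standard cross-account trust policy: one exact role principal plus an `sts:ExternalId` condition. The builder below is hypothetical (the real policy lives in the connector CloudFormation template); it also rejects `:root` principals to illustrate why the account-wide default was dropped.

```javascript
// Hypothetical builder for the connector role's trust policy.
function buildConnectorTrustPolicy(trustedPrincipalArn, externalId) {
  if (/:root$/.test(trustedPrincipalArn)) {
    // A :root principal lets any identity in that account with sts:AssumeRole
    // permission assume the role: the wider blast radius this change removes.
    throw new Error('expected a role ARN, not an account root principal');
  }
  return {
    Version: '2012-10-17',
    Statement: [{
      Effect: 'Allow',
      Principal: { AWS: trustedPrincipalArn },
      Action: 'sts:AssumeRole',
      Condition: { StringEquals: { 'sts:ExternalId': externalId } },
    }],
  };
}
```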
ABHA61 and others added 2 commits June 25, 2026 08:54
…e status

Association (data safety): applyEdgeOptimizeAssociations no longer replaces the
behavior's FunctionAssociations / LambdaFunctionAssociations wholesale — it MERGES,
preserving every association on event types EO does not own (e.g. a viewer-response
function or lambda) and (re)setting only EO's own slots (viewer-request function +
origin-request/origin-response lambda). A non-EO association already on a slot EO
needs is now REFUSED with a clear message instead of silently clobbered; the plan
surfaces the same conflict so Review blocks before deploy.

Role visibility: the plan's Lambda@Edge step now reports the execution role — it
inspects an existing role (trust = lambda + edgelambda + the logs inline policy) and
says whether it is correctly configured and will be reused, will be created, or will
be corrected (the deploy already conforms it). Previously Review said nothing about
the role even when one existed from a prior partial run.

68 support tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
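The merge-not-replace rule can be sketched over CloudFront's association items (`{ EventType, FunctionARN }` or `{ EventType, LambdaFunctionARN }`). `mergeAssociations` and the `isEoArn` predicate are hypothetical; the sketch keeps associations on event types EO does not own, refuses foreign ones on EO's slots, and (re)sets EO's own.

```javascript
// Hypothetical merge of EO associations into a behavior's existing ones.
// `arnKey` selects the ARN field; `isEoArn` decides whether an ARN belongs to EO.
function mergeAssociations(existingItems, eoItems, { arnKey, isEoArn }) {
  const eoSlots = new Set(eoItems.map((a) => a.EventType));
  const conflicts = existingItems.filter(
    (a) => eoSlots.has(a.EventType) && !isEoArn(a[arnKey]),
  );
  if (conflicts.length > 0) {
    // Refuse instead of silently clobbering a customer's association.
    const slots = conflicts.map((a) => a.EventType).join(', ');
    throw new Error(`non-Edge Optimize association already on: ${slots}`);
  }
  // Preserve every association on event types EO does not own.
  const kept = existingItems.filter((a) => !eoSlots.has(a.EventType));
  const items = [...kept, ...eoItems];
  return { Quantity: items.length, Items: items };
}
```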
…adobe-<distId>

No behavior change. The plan + deploy already match the existing clone by its FULL
derived name (Managed- stripped, then -adobe-<distId>), not by suffix — so if the
customer re-points the behavior to a different source policy, we create a clone
matching the CURRENT source rather than reusing one built from a different base.
Documents the decision to prevent a regression to suffix matching.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
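The naming decision can be pinned down as a full-name derivation plus an exact-match lookup. Both function names are hypothetical; the rule itself (strip `Managed-`, then append `-adobe-<distId>`) is the one stated above.

```javascript
// Hypothetical clone-name derivation: strip "Managed-", append "-adobe-<distId>".
function cloneNameFor(sourcePolicyName, distributionId) {
  return `${sourcePolicyName.replace(/^Managed-/, '')}-adobe-${distributionId}`;
}

// Match on the FULL derived name, never on the "-adobe-<distId>" suffix alone,
// so a behavior re-pointed to a different source policy gets a fresh clone.
function findExistingClone(customPolicyNames, sourcePolicyName, distributionId) {
  const wanted = cloneNameFor(sourcePolicyName, distributionId);
  return customPolicyNames.find((name) => name === wanted) ?? null;
}
```

For example, `Managed-CachingOptimized` on distribution `E2ABC` maps to `CachingOptimized-adobe-E2ABC`; an existing `CachingDisabled-adobe-E2ABC` clone shares the suffix but is correctly not reused.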
… when function is ready

The deploy step gated the lambda step on `ls.ready` only, so a published function
with a missing OR mis-configured execution role was marked done without recreating/
healing the role — contradicting the plan's 'role will be created/corrected' message.

getEdgeOptimizeLambdaStatus now also reports roleOk (exists + trust(lambda+edgelambda)
+ EdgeOptimizeLambdaLogging policy) by reusing inspectEdgeOptimizeLambdaRole, and the
gate is now `ls.ready && ls.roleOk`. When the role is missing/mis-configured it falls
through to createEdgeOptimizeLambda, which (re)creates the role + heals its trust/logs
policy; the function already exists so it reuses the published version and returns
ready. The plan derives its role note from roleExists/roleOk (no extra IAM read).

Tests: getStatus roleOk; deploy-step gate rows — ready+role-missing → recreated,
ready+role-misconfigured → healed, ready+role-ok → no role writes (no churn). 71 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
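The roleOk definition and the new gate can be sketched as two predicates. `isRoleOk`, `shouldSkipLambdaStep`, and the `role` summary shape (`{ trustPolicy, inlinePolicyNames }`, with the trust document already URL-decoded and parsed from IAM GetRole) are assumptions; the required principals `lambda.amazonaws.com` and `edgelambda.amazonaws.com` are what Lambda@Edge execution roles must trust.

```javascript
// Hypothetical roleOk check: exists + trusts lambda and edgelambda + has the
// EdgeOptimizeLambdaLogging inline policy.
const REQUIRED_SERVICES = ['lambda.amazonaws.com', 'edgelambda.amazonaws.com'];

function isRoleOk(role) {
  if (!role) return false; // role missing
  const trusted = new Set();
  for (const stmt of role.trustPolicy.Statement ?? []) {
    if (stmt.Effect !== 'Allow') continue;
    // Principal.Service may be a single string or an array.
    for (const svc of [].concat(stmt.Principal?.Service ?? [])) trusted.add(svc);
  }
  const trustOk = REQUIRED_SERVICES.every((svc) => trusted.has(svc));
  return trustOk && role.inlinePolicyNames.includes('EdgeOptimizeLambdaLogging');
}

// Skip the step only when the function is ready AND the role is healthy;
// otherwise fall through to create, which (re)creates or heals the role.
function shouldSkipLambdaStep(ls) {
  return Boolean(ls.ready && ls.roleOk);
}
```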
New `environment: 'production' | 'stage'` request field (default production) on the
plan + deploy endpoints. resolveEoTarget(site, environment) resolves the EO target:
production = the site's own baseURL/apiKey/host (today); stage = the single
stagingDomains[0] → that stage site (same org) → its metaconfig apiKey +
calculateForwardedHost. The resolved apiKey + forwardedHost flow into originHeaders,
so verify (which reads originHeaders.forwardedHost on restore) is env-aware for free.
plan response now returns targetDomain. OpenAPI updated (deploy env field + documented
plan path/schema). 608 tests pass; stage paths + 400s covered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
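A rough shape for `resolveEoTarget(site, environment)` is sketched below. The site fields (`baseURL`, `organizationId`, `stagingDomains`) and the injected `deps` (`getApiKey`, `forwardedHostFor`, `findSiteByBaseURL`) are assumptions; in the service the last three would read the tokowaka metaconfig, call `calculateForwardedHost`, and look up the stage site. Thrown errors correspond to the 400s mentioned above.

```javascript
// Hypothetical sketch of environment-aware EO target resolution.
async function resolveEoTarget(site, environment, deps) {
  const env = environment ?? 'production';
  const targetFor = async (s) => ({
    targetDomain: s.baseURL,
    apiKey: await deps.getApiKey(s),
    forwardedHost: deps.forwardedHostFor(s.baseURL),
  });
  if (env === 'production') return targetFor(site);
  if (env !== 'stage') throw new Error(`unsupported environment: ${env}`);

  const stagingDomain = site.stagingDomains?.[0];
  if (!stagingDomain) throw new Error('site has no staging domain configured');
  const stageSite = await deps.findSiteByBaseURL(stagingDomain);
  // The stage site must belong to the same organization as the production site.
  if (!stageSite || stageSite.organizationId !== site.organizationId) {
    throw new Error('staging site not found in the same organization');
  }
  return targetFor(stageSite);
}
```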
ABHA61 and others added 2 commits June 25, 2026 18:08
…ity map

Adds the 16 admin-only LLMO CloudFront Optimize-at-Edge onboarding wizard
routes to INTERNAL_ROUTES, fixing the routeFacsCapabilities invariant that
broke after merging main into this branch.

Co-authored-by: Cursor <cursoragent@cursor.com>