82 changes: 82 additions & 0 deletions docs/decisions/006-sites-base-url-search.md
@@ -0,0 +1,82 @@
# ADR-006: Substring Base-URL Search on `GET /sites`

## Status
Accepted

## Context
The Experience Success Studio back-office UI ("Backoffice") Sites page let operators find a site
to manage. Its only mechanism was to **load every site** (`GET /sites`, cursor-paginated at 500/page)
into the browser and filter client-side. With ~18k sites that meant ~36 **sequential** cursor
requests (each page's cursor is only known after the previous response resolves) — a 15–22s blank
load before the table was usable. See SITES-47203.

`GET /sites` had no server-side search: only cursor pagination, an exact `GET /sites/by-base-url/:baseURL`
lookup, and an exact `GET /sites/:siteId` lookup. So "find the site whose URL contains `icici`"
forced the full client-side bulk load.

Two facts shaped the decision:

- **Cursor pagination cannot be parallelized.** You cannot issue page _N+1_ without page _N_'s
cursor, and `GET /sites` exposes neither offset nor a total count. So the sequential walk is
inherent — the fix is to *not load everything*, not to page faster.
- **The data layer migrated from DynamoDB to PostgreSQL (via PostgREST).** `spacecat-shared-data-access`
now backs `Site` with PostgREST and its collection query API supports `ilike`/`like`/`contains`
filters and offset pagination. On DynamoDB a substring search would have been a full-partition
scan (anti-pattern); on Postgres `ILIKE '%…%'` is a normal, cheap query.
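
To make the first point concrete, here is a minimal in-memory simulation of the cursor walk. It is a sketch under stated assumptions, not the Backoffice implementation: `fetchPage` is a hypothetical stand-in for `GET /sites?limit=500&cursor=...`, and the cursor encoding (a stringified offset) is invented for illustration. It only shows why the request count scales with table size: each call needs the cursor returned by the previous one.

```javascript
// Hypothetical stand-in for GET /sites?limit=500&cursor=<cursor>.
// The cursor format (a stringified offset) is an assumption for illustration only.
const PAGE_SIZE = 500;
const ALL_SITES = Array.from({ length: 18000 }, (_, i) => ({ id: `site-${i}` }));

function fetchPage(cursor) {
  const start = cursor === null ? 0 : Number(cursor);
  const end = start + PAGE_SIZE;
  const hasMore = end < ALL_SITES.length;
  return {
    sites: ALL_SITES.slice(start, end),
    pagination: { hasMore, cursor: hasMore ? String(end) : null },
  };
}

// The bulk load: page N+1 cannot be requested until page N has returned its cursor.
function loadAllSites() {
  const sites = [];
  let cursor = null;
  let hasMore = true;
  let requests = 0;
  while (hasMore) {
    const page = fetchPage(cursor);
    requests += 1;
    sites.push(...page.sites);
    cursor = page.pagination.cursor;
    hasMore = page.pagination.hasMore;
  }
  return { sites, requests };
}

const walk = loadAllSites();
console.log(walk.requests, walk.sites.length); // prints "36 18000"
```

With 18,000 rows at 500 per page the loop issues 36 dependent requests, which matches the ~36 sequential round trips described above.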

## Decision
Add an optional **`baseUrlLike`** query parameter to `GET /sites`:

`GET /sites?baseUrlLike=<substring>&limit=<N>`

- Maps to `Site.all({}, { where: (attr, op) => op.ilike(attr.baseURL, '%<escaped>%'), limit: N+1, order: 'asc' })`.
The data-access `where` builder is called with `(attr, op)`: `attr` maps model fields to DB columns and
`op` carries the operators. No `spacecat-shared-data-access` change was required because the `ilike`
operator already exists. (`order: 'asc'` sorts by the index's order fields with the primary key as a
deterministic tiebreaker; see `base.collection`'s `#getOrderFields`.)
- **Validation:** `baseUrlLike` must be 3 to 256 characters after trimming (otherwise `400`), and it
cannot be combined with `cursor` (also `400`). LIKE metacharacters (`%`, `_`, `\`) in user input are
escaped, so callers cannot inject wildcards.
- **Top-N + "more exists":** `limit` defaults to 50, capped at `MAX_LIMIT` (500). We fetch `N+1` rows
and trim to `N`; the extra row drives `pagination.hasMore`, which the UI surfaces as a
"refine your search" hint. Response shape: `{ sites: [...], pagination: { limit, hasMore, baseUrlLike } }`
— the `baseUrlLike` echo is the deploy-ordering discriminator (see Consequences).
- **Authorization is unchanged** — the new branch runs after the existing admin / S2S `site:readAll`
check. Non-admin (org-scoped) callers continue to receive `403` on `GET /sites`; the Backoffice
client falls back to the org-scoped sites endpoint (a small, bounded set) and filters it
client-side. The complex org/delegated-sites endpoint was intentionally left untouched.
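
The validation and top-N rules above can be sketched as three small pure functions. This is an illustrative sketch, not the controller code: the helper names `escapeLike`, `resolveLimit`, and `trimTopN` are hypothetical, and returning `null` for an invalid limit is an assumption standing in for the `400` response. The constants and the escaping regex mirror the values used in this PR.

```javascript
const SEARCH_DEFAULT_LIMIT = 50;
const MAX_LIMIT = 500;

// Escape LIKE metacharacters (backslash, %, _) so user input is matched literally.
function escapeLike(input) {
  return input.replace(/([\\%_])/g, '\\$1');
}

// Resolve the effective limit: default 50, clamped to MAX_LIMIT.
// Returns null for invalid input (assumption: the caller maps null to a 400).
function resolveLimit(limitParam) {
  if (limitParam === undefined || limitParam === null || limitParam === '') {
    return SEARCH_DEFAULT_LIMIT;
  }
  const parsed = parseInt(limitParam, 10);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    return null;
  }
  return Math.min(parsed, MAX_LIMIT);
}

// Given up to limit + 1 fetched rows, trim to limit and report whether more exist.
function trimTopN(rows, limit) {
  return { sites: rows.slice(0, limit), hasMore: rows.length > limit };
}

console.log(escapeLike('100%_off'));        // prints "100\%\_off"
console.log(resolveLimit('9999'));          // prints 500
console.log(trimTopN(['a', 'b', 'c'], 2));  // prints { sites: [ 'a', 'b' ], hasMore: true }
```

Fetching one extra row avoids a separate `COUNT(*)` query: the extra row is never returned, it only signals that the result was truncated.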

## Alternatives considered

- **Client-side progressive rendering** (render pages as they stream in). Rejected: it only traded
the blank spinner for ~15s of a churning, re-sorting table, and never addressed the root cause —
shipping ~18k rows to the browser. (This was an earlier PR, since abandoned.)
- **Parallel page fetching.** Impossible: cursor pagination has no offset/total, so pages must be
sequential. Even hypothetically, ~36 concurrent 500-row reads carry 429 / DB-load risk for
negligible benefit.
- **Prefix-only search (`begins_with`).** This was the *DynamoDB-idiomatic* option (efficient on the
`baseURL` sort key). It is moot now that the backend is Postgres, and substring is the better UX
(matches anywhere, so the stored `https://`/`www.` prefix doesn't get in the way).
- **Dedicated search index (OpenSearch).** Correct for large-scale fuzzy/multi-field search, but
heavy infrastructure and unjustified for an internal tool at this scale.

## Consequences
- The Backoffice **Sites page** drops the bulk-load (and its two rarely-used dropdown filters): it now
searches by base-URL substring or looks a site up by exact ID. See OneAdobe/experience-success-studio-backoffice#332.
(The legacy `getSites` bulk walk still backs `LLMOptimizerData.js` — eliminating that is tracked as a
separate follow-up; this ADR does not address it.)
- **Deploy ordering.** The Backoffice client always sends `limit`, so an *older* API deployment would
ignore `baseUrlLike` and return unfiltered cursor results. To avoid silent wrong results, the search
response echoes `pagination.baseUrlLike`; the client treats a missing/mismatched echo as "search
unsupported" and surfaces an error. Deploy the API before (or with) the Backoffice change.
- **No trigram index yet.** `base_url` has a UNIQUE btree but no `pg_trgm` GIN index, so a leading-wildcard
`ILIKE '%…%'` is a sequential scan. At ~18k small rows this is single-digit-ms in Postgres and only
matches cross the wire, so it is acceptable for now. **Deferred follow-up:** add
`CREATE EXTENSION pg_trgm` + a GIN trigram index on `sites.base_url` (owned by `mysticat-data-service`)
if/when table growth or latency warrants index-accelerated substring search.
- The contract is additive and backward-compatible: existing cursor-paginated and legacy flat-array
behavior of `GET /sites` is unchanged.
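
The deploy-ordering guard can be sketched on the client side as follows. This is a hypothetical illustration, not the Backoffice code: the function name `extractSearchResults` and the error message are assumptions. It only demonstrates the rule stated above, that a missing or mismatched `pagination.baseUrlLike` echo is treated as "search unsupported".

```javascript
// Hypothetical client-side guard: accept the response only if the API echoed
// the same trimmed query it was asked to search for.
function extractSearchResults(body, query) {
  const expected = query.trim();
  const echoed = body?.pagination?.baseUrlLike;
  if (echoed !== expected) {
    // An older deployment ignores baseUrlLike and returns unfiltered cursor results.
    throw new Error('Site search is not supported by this API deployment');
  }
  return { sites: body.sites, hasMore: body.pagination.hasMore };
}

// Response from a deployment that supports the search.
const supported = {
  sites: [{ baseURL: 'https://www.example.com' }],
  pagination: { limit: 50, hasMore: false, baseUrlLike: 'example' },
};
console.log(extractSearchResults(supported, ' example ').sites.length); // prints 1

// Response from an older deployment: cursor envelope, no echo.
const legacy = { sites: [], pagination: { limit: 50, hasMore: true, cursor: 'abc' } };
try {
  extractSearchResults(legacy, 'example');
} catch (e) {
  console.log(e.message); // prints "Site search is not supported by this API deployment"
}
```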

## References
- SITES-47203
- API change: this PR (adobe/spacecat-api-service)
- Backoffice consumer: OneAdobe/experience-success-studio-backoffice#332
38 changes: 38 additions & 0 deletions docs/openapi/schemas.yaml
@@ -833,6 +833,44 @@ SitePagedResponse:
required:
- sites
- pagination
SiteSearchResponse:
type: object
description: |
Response for the `baseUrlLike` substring search on `GET /sites`. Unlike
`SitePagedResponse`, this path is not cursor-iterable: `pagination` carries no
`cursor`, and instead echoes the trimmed `baseUrlLike` query so a client can
confirm the search was actually applied (an older deployment that ignores
`baseUrlLike` but honors `limit` would return `SitePagedResponse` with no echo).
properties:
sites:
type: array
items:
$ref: './schemas.yaml#/SiteListItem'
pagination:
type: object
properties:
limit:
description: The maximum number of items returned (default 50, clamped to 500)
type: integer
example: 50
hasMore:
description: Indicates whether more matching sites exist beyond this result
type: boolean
example: false
baseUrlLike:
description: |
The trimmed `baseUrlLike` query that was applied (3-256 chars). Echoed so
clients can confirm the search ran; absent on deployments that ignore the
param.
type: string
example: "adobe"
required:
- limit
- hasMore
- baseUrlLike
required:
- sites
- pagination
SiteWithLatestAuditList:
type: array
items:
44 changes: 38 additions & 6 deletions docs/openapi/sites-api.yaml
@@ -130,6 +130,11 @@ sites:
- **Paginated (with `limit` and/or `cursor`):** returns `{ sites, pagination }` with
the full unfiltered result set. Iterate using `pagination.cursor` until
`pagination.hasMore` is `false` to fetch all sites.
- **Substring search (with `baseUrlLike`):** returns `{ sites, pagination }` where
`sites` are those whose `baseURL` contains the (case-insensitive) substring.
`pagination` is `{ limit, hasMore, baseUrlLike }` (no `cursor`; this branch is not
cursor-iterable). Defaults to a limit of 50 (clamped to 500). The search term
must be 3 to 256 characters after trimming (otherwise 400).

Required capabilities: admin access (legacy admin path) or S2S `site:readAll`.
operationId: getSites
@@ -160,6 +165,27 @@ sites:
type: string
maxLength: 256
example: "eyJvZmZzZXQiOjEwMH0="
- name: baseUrlLike
in: query
required: false
description: |
Case-insensitive substring to match against each site's `baseURL`. When
provided, the response uses a non-cursor paginated envelope
(`{ sites, pagination: { limit, hasMore, baseUrlLike } }`), where
`pagination.baseUrlLike` echoes the trimmed query so a client can confirm
the search was applied (an older deployment that ignores this param would
omit the echo). Length is enforced **after trimming**: must be at least 3
and at most 256 characters (otherwise 400). LIKE wildcards (`%`, `_`, `\`)
in the value are escaped and treated literally. The `limit` parameter
(default 50, clamped to 500) bounds the result size; `cursor` is not used
on this path.
schema:
type: string
# Bounds are enforced server-side AFTER trimming surrounding whitespace, so
# minLength/maxLength here are documentation of the effective post-trim limits.
minLength: 3
maxLength: 256
example: "adobe"
responses:
'200':
description: |
@@ -168,13 +194,18 @@ sites:
application/json:
schema:
description: |
Two response shapes are possible, selected by the request and
unambiguous by JSON type, so no `discriminator` is used (nor can one
be expressed — the legacy branch is a top-level array, which has no
Three response shapes are possible, selected by the request and
unambiguous by JSON type/shape, so no `discriminator` is used (nor can
one be expressed — the legacy branch is a top-level array, which has no
property to discriminate on):
- When `limit` and/or `cursor` is provided → **`SitePagedResponse`**,
a JSON object with `sites` and `pagination`.
- When neither is provided → **`SiteList`**, the legacy top-level
- When `baseUrlLike` is provided → **`SiteSearchResponse`**, a JSON
object with `sites` and a non-cursor `pagination`
(`{ limit, hasMore, baseUrlLike }`). The `baseUrlLike` echo lets a
client confirm the search ran on the deployment it hit.
- When `limit` and/or `cursor` is provided (without `baseUrlLike`) →
**`SitePagedResponse`**, a JSON object with `sites` and a
cursor-based `pagination`.
- When none are provided → **`SiteList`**, the legacy top-level
JSON array.

This resolves to a single shape once the legacy path is sunset.
@@ -186,6 +217,7 @@ sites:
- $ref: './schemas.yaml#/SiteList'
deprecated: true
- $ref: './schemas.yaml#/SitePagedResponse'
- $ref: './schemas.yaml#/SiteSearchResponse'
'400':
$ref: './responses.yaml#/400'
'401':
79 changes: 79 additions & 0 deletions src/controllers/sites.js
@@ -139,6 +139,7 @@ const MONTH_DAYS = 30;
const TOTAL_METRICS = 'totalMetrics';
const BRAND_PROFILE_AGENT_ID = 'brand-profile';
const DEFAULT_LIMIT = 100;
const SEARCH_DEFAULT_LIMIT = 50;
const MAX_LIMIT = 500;

/**
@@ -430,6 +431,13 @@ function SitesController(ctx, log, env) {
* Gets all sites with cursor-based pagination. Accessible to admin callers (legacy admin path)
* and to S2S consumers that hold the `site:readAll` capability - see
* `docs/s2s/READALL_CAPABILITY_DESIGN.md`.
*
* Optional `baseUrlLike` query param: when provided (3-256 chars after trim),
* performs a case-insensitive substring search on `baseURL` and returns a non-cursor
* `{ sites, pagination: { limit, hasMore, baseUrlLike } }` response. The trimmed query
* is echoed back in `pagination.baseUrlLike` so a client can confirm its search was
* applied even if it hits an older deployment that ignores the param. LIKE wildcards in
* the input are escaped so callers cannot inject their own wildcards.
* @returns {Promise<Response>} Paginated sites response
*/
const getAll = async (context) => {
@@ -447,6 +455,77 @@ function SitesController(ctx, log, env) {

const limitParam = context?.data?.limit;
const cursor = context?.data?.cursor || null;

// Optional substring search by base URL. Runs after the authz check (so
// unauthorized callers still get 403) and before the cursor/legacy branches.
const baseUrlLike = context?.data?.baseUrlLike;
if (hasText(baseUrlLike) && hasText(cursor)) {
// baseUrlLike search does not paginate via cursor; accepting both would
// silently discard the cursor and mislead the client into thinking
// cursor pagination is active. Reject the combination explicitly.
return badRequest('cursor is not supported with baseUrlLike');
}
if (hasText(baseUrlLike)) {
const q = baseUrlLike.trim();
if (q.length < 3) {
return badRequest('baseUrlLike must be at least 3 characters');
}
if (q.length > 256) {
return badRequest('baseUrlLike exceeds maximum length');
}

const parsedLimit = hasText(limitParam) ? parseInt(limitParam, 10) : SEARCH_DEFAULT_LIMIT;
if (!Number.isInteger(parsedLimit) || parsedLimit <= 0) {
return badRequest('limit must be a positive integer');
}
const effectiveLimit = Math.min(parsedLimit, MAX_LIMIT);

// Escape LIKE special chars so user input cannot inject its own wildcards.
const escaped = q.replace(/([\\%_])/g, '\\$1');

// Fetch one extra row to detect whether more results exist beyond the limit.
// The data-access `where` builder passes (attrs, op): `attrs` maps model
// fields to DB columns, `op` carries the operators. (NOT `s => s.ilike(...)`.)
let rows;
try {
rows = await Site.all({}, {
where: (attr, op) => op.ilike(attr.baseURL, `%${escaped}%`),
limit: effectiveLimit + 1,
order: 'asc',
});
} catch (e) {
// Re-throw so the framework still returns a 500 — the point here is a
// searchable, prefixed log line, not swallowing the error.
log.error(`[sites][baseUrlLike] query failed requestId=${requestId}`, e);
throw e;
}
let list;
if (Array.isArray(rows)) {
list = rows;
} else if (Array.isArray(rows?.data)) {
list = rows.data;
} else {
log.warn(`[sites][baseUrlLike] unexpected Site.all shape; returning empty requestId=${requestId}`);
list = [];
}
const hasMore = list.length > effectiveLimit;
const sites = list.slice(0, effectiveLimit).map((site) => SiteDto.toListJSON(site));

if (s2sResult.allowed) {
log.info(`[s2s-readall] GET /sites (baseUrlLike) granted clientId=${s2sResult.clientId} consumerId=${s2sResult.consumerId} capability=${CAP_SITE_READ_ALL} count=${sites.length} requestId=${requestId}`);
}

// Unconditional observability for both admin and S2S paths. Never log the raw
// query value (URLs may be sensitive) — only its length and result counts.
log.info(`[sites][baseUrlLike] qlen=${q.length} count=${sites.length} hasMore=${hasMore} requestId=${requestId}`);

// Echo the trimmed query in the pagination so a new client can confirm its
// search was actually applied. An older deployment that ignores `baseUrlLike`
// but still honors `limit` would return the cursor envelope with unfiltered
// sites and no `baseUrlLike` echo — letting clients detect the version skew.
return ok({ sites, pagination: { limit: effectiveLimit, hasMore, baseUrlLike: q } });
}

const paginated = hasText(limitParam) || hasText(cursor);

if (cursor !== null) {