-
Notifications
You must be signed in to change notification settings - Fork 13
feat(SITES-47203): add baseUrlContains substring search to GET /sites #2693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
habansal
wants to merge
11
commits into
main
Choose a base branch
from
fix/SITES-47203-sites-base-url-search
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
e14e9ce
feat(SITES-47203): add baseUrlLike substring search to GET /sites
habansal 30fa80c
docs(SITES-47203): ADR-006 for baseUrlLike substring search on GET /s…
habansal 218045f
fix(SITES-47203): address review feedback on baseUrlLike search
habansal 167521d
docs(SITES-47203): ADR-006 — clarify LLMO bulk-load remains + documen…
habansal 3ac4bb8
test(SITES-47203): integration coverage for baseUrlLike search (real …
habansal fd8c1f6
fix(SITES-47203): use correct (attrs, op) where-builder for baseUrlLike
habansal 32c8d0c
fix(SITES-47203): address MysticatBot review — cursor+baseUrlLike gua…
habansal e115b34
docs(SITES-47203): fix ADR-006 where-signature + response shape (Myst…
habansal b63dca2
Merge branch 'main' into fix/SITES-47203-sites-base-url-search
habansal b222558
test(SITES-47203): cover S2S-readall log branch in baseUrlLike path
habansal b564cf0
feat(SITES-47203): rename baseUrlLike->baseUrlContains, add offset pa…
habansal File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,88 @@ | ||
| # ADR-006: Substring Base-URL Search on `GET /sites` | ||
|
|
||
| ## Status | ||
| Accepted | ||
|
|
||
| ## Context | ||
| The Experience Success Studio back-office UI ("Backoffice") Sites page let operators find a site | ||
| to manage. Its only mechanism was to **load every site** (`GET /sites`, cursor-paginated at 500/page) | ||
| into the browser and filter client-side. With ~18k sites that meant ~36 **sequential** cursor | ||
| requests (each page's cursor is only known after the previous response resolves) — a 15–22s blank | ||
| load before the table was usable. See SITES-47203. | ||
|
|
||
| `GET /sites` had no server-side search: only cursor pagination, an exact `GET /sites/by-base-url/:baseURL` | ||
| lookup, and an exact `GET /sites/:siteId` lookup. So "find the site whose URL contains `icici`" | ||
| forced the full client-side bulk load. | ||
|
|
||
| Two facts shaped the decision: | ||
|
|
||
| - **Cursor pagination cannot be parallelized.** You cannot issue page _N+1_ without page _N_'s | ||
| cursor, and `GET /sites` exposes neither offset nor a total count. So the sequential walk is | ||
| inherent — the fix is to *not load everything*, not to page faster. | ||
| - **The data layer migrated from DynamoDB to PostgreSQL (via PostgREST).** `spacecat-shared-data-access` | ||
| now backs `Site` with PostgREST and its collection query API supports `ilike`/`like`/`contains` | ||
| filters and offset pagination. On DynamoDB a substring search would have been a full-partition | ||
| scan (anti-pattern); on Postgres `ILIKE '%…%'` is a normal, cheap query. | ||
|
|
||
| ## Decision | ||
| Add an optional **`baseUrlContains`** query parameter to `GET /sites`: | ||
|
|
||
| `GET /sites?baseUrlContains=<substring>&limit=<N>&offset=<M>` | ||
|
|
||
| - Maps to `Site.all({}, { where: (attr, op) => op.ilike(attr.baseURL, '%<escaped>%'), limit: N+1, cursor: <offset-cursor>, order: 'asc' })`. | ||
| The data-access `where` builder passes `(attrs, op)`: `attrs` maps model fields to DB columns and | ||
| `op` carries the operators. No `spacecat-shared-data-access` change was required — the `ilike` | ||
| operator already exists. (`order: 'asc'` sorts by the index's order fields with the primary key as a | ||
| deterministic tiebreaker — see `base.collection`'s `#getOrderFields`.) The data-access layer exposes | ||
| no public `offset` option — it paginates via an opaque, offset-encoded cursor (`postgrest.utils` | ||
| `encodeCursor`, which is not exported). The controller therefore builds the same | ||
| `base64(JSON.stringify({ offset }))` cursor inline to reach the requested offset; if a direct | ||
| `offset` option is added upstream, switch to it. | ||
| - **Validation:** `baseUrlContains` must be ≥ 3 characters (trimmed); LIKE wildcards (`%`, `_`, `\`) in | ||
| user input are escaped so callers cannot inject wildcards. `offset` defaults to 0 and must be a | ||
| non-negative integer (otherwise 400). | ||
| - **Top-N + "more exists":** `limit` defaults to 50, capped at `MAX_LIMIT` (500). We fetch `N+1` rows | ||
| at the requested `offset` and trim to `N`; the extra row drives `pagination.hasMore`, which the UI | ||
| surfaces as a "refine your search" hint / next-page affordance. Response shape: | ||
| `{ sites: [...], pagination: { limit, offset, hasMore, baseUrlContains } }` | ||
| — the `baseUrlContains` echo is the deploy-ordering discriminator (see Consequences). | ||
| - **Authorization is unchanged** — the new branch runs after the existing admin / S2S `site:readAll` | ||
| check. Non-admin (org-scoped) callers continue to receive `403` on `GET /sites`; the Backoffice | ||
| client falls back to the org-scoped sites endpoint (a small, bounded set) and filters it | ||
| client-side. The complex org/delegated-sites endpoint was intentionally left untouched. | ||
|
|
||
| ## Alternatives considered | ||
|
|
||
| - **Client-side progressive rendering** (render pages as they stream in). Rejected: it only traded | ||
| the blank spinner for ~15s of a churning, re-sorting table, and never addressed the root cause — | ||
| shipping ~18k rows to the browser. (This was an earlier PR, since abandoned.) | ||
| - **Parallel page fetching.** Impossible: cursor pagination has no offset/total, so pages must be | ||
| sequential. Even hypothetically, ~36 concurrent 500-row reads carry 429 / DB-load risk for | ||
| negligible benefit. | ||
| - **Prefix-only search (`begins_with`).** This was the *DynamoDB-idiomatic* option (efficient on the | ||
| `baseURL` sort key). It is moot now that the backend is Postgres, and substring is the better UX | ||
| (matches anywhere, so the stored `https://`/`www.` prefix doesn't get in the way). | ||
| - **Dedicated search index (OpenSearch).** Correct for large-scale fuzzy/multi-field search, but | ||
| heavy infrastructure and unjustified for an internal tool at this scale. | ||
|
|
||
| ## Consequences | ||
| - The Backoffice **Sites page** drops the bulk-load (and its two rarely-used dropdown filters): it now | ||
| searches by base-URL substring or looks a site up by exact ID. See OneAdobe/experience-success-studio-backoffice#332. | ||
| (The legacy `getSites` bulk walk still backs `LLMOptimizerData.js` — eliminating that is tracked as a | ||
| separate follow-up; this ADR does not address it.) | ||
| - **Deploy ordering.** The Backoffice client always sends `limit`, so an *older* API deployment would | ||
| ignore `baseUrlContains` and return unfiltered cursor results. To avoid silent wrong results, the search | ||
| response echoes `pagination.baseUrlContains`; the client treats a missing/mismatched echo as "search | ||
| unsupported" and surfaces an error. Deploy the API before (or with) the Backoffice change. | ||
| - **No trigram index yet.** `base_url` has a UNIQUE btree but no `pg_trgm` GIN index, so a leading-wildcard | ||
| `ILIKE '%…%'` is a sequential scan. At ~18k small rows this is single-digit-ms in Postgres and only | ||
| matches cross the wire, so it is acceptable for now. **Deferred follow-up:** add | ||
| `CREATE EXTENSION pg_trgm` + a GIN trigram index on `sites.base_url` (owned by `mysticat-data-service`) | ||
| if/when table growth or latency warrants index-accelerated substring search. | ||
| - The contract is additive and backward-compatible: existing cursor-paginated and legacy flat-array | ||
| behavior of `GET /sites` is unchanged. | ||
|
|
||
| ## References | ||
| - SITES-47203 | ||
| - API change: this PR (adobe/spacecat-api-service) | ||
| - Backoffice consumer: OneAdobe/experience-success-studio-backoffice#332 | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.