Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -86,46 +86,47 @@ output/
└── da/<page-slug>.html ← DA-source body fragment
```

**Rewrite relative asset paths.** When a source uses relative paths
like `assets/photos/foo.jpg`, `url(./images/bar.png)`,
`<link href="assets/css/site.css">`, they resolve against our serving
host (`localhost:3000/drafts/...` or
`<branch>--<repo>--<owner>.aem.page/<da-root>/...`) — where they 404.
They need rewriting to one of three target forms, depending on
**asset strategy** (recorded in `decisions.json["assetStrategy"]`):

1. **`absolute`** (default for publicly hosted sources):
`https://<source-host>/path/to/assets/...`. Source host serves the
binaries directly; EDS preview sideloads any `<img src>` URL into
Media Bus on first preview (see da-content `media.md` §2).
2. **`vendor`** (local-only source, accepted repo size impact):
Copy the asset tree into `./assets/` in the repo. Template /
fragment / page-CSS refs become root-relative `/assets/...`. DA
cell refs use absolute branch URLs
(`https://<branch>--<repo>--<owner>.aem.page/assets/...`).
3. **`da-media`** (cleanest long-term):
Upload binaries to DA `/media/<scope>/<file>` via the bundled
`<SKILL_DIR>/scripts/da-media-upload.mjs` script. Template /
fragment / page-CSS / DA cell refs all use
`https://content.da.live/<org>/<repo>/media/<scope>/<file>`. The
uploader emits a `media-mapping.json` of local-path →
content.da.live URL that Generate consumes for the rewrites.

| Aspect | absolute | vendor | da-media |
|---------------------------|-------------|--------------|--------------|
| Repo size | unchanged | +N MB | unchanged |
| Branch-independent assets | N/A | No | **Yes** |
| Local-only source | No | Yes | Yes |
| Initial-run effort | none | curl+sed | uploader |
| Tooling required | none | none | bundled `.mjs` |
| Reusable across runs | N/A | per-run | **Yes** |
| DA-cell image URL form | source host | branch URL | content.da.live |
| Delivered image URL | `./media_<hash>` (sideload) | `./media_<hash>` | `./media_<hash>` |

For fonts specifically, even under `da-media`, place font files in Code
Bus `/fonts/<file>.woff2` (or `.otf`) per da-content `media.md`
§13.2 decision tree. Fonts upload would be a DA media-bus mismatch
(SVG/PNG/JPG/MP4 are Media Bus; fonts are Content Bus).
**Asset handling is per-asset, decided mechanically.** The
`asset-collect.mjs` script (run in Phase 1) scans `index.html` for
in-scope asset references — raster images, videos, and fonts — and
assigns a per-asset strategy based on reachability and type:

| Asset reachability | Asset type | Strategy | Destination |
|---|---|---|---|
| Stable CDN (fonts.gstatic.com, cdn.jsdelivr.net, etc.) | any | `absolute` — leave URL as-is | no download |
| Ephemeral host (claudeusercontent.com, etc.) | font | `vendor` — always download | `input/fonts/` → `/fonts/` |
| Ephemeral host | image, video | `da-media` — always download | `input/images/` or `input/videos/` |
| Reachable public host + public base URL | any | `absolute` — leave URL as-is | no download |
| Reachable public host + local base URL (snapshot) | font | `vendor` — download | `input/fonts/` → `/fonts/` |
| Reachable public host + local base URL (snapshot) | image, video | `da-media` — download | `input/images/` or `input/videos/` |
| Local / private IP | font | `vendor` — download | `input/fonts/` → `/fonts/` |
| Local / private IP | image, video | `da-media` — download | `input/images/` or `input/videos/` |

**Never vendor images or videos into the git repo.** Binary content
assets belong in DA Media Bus, not Code Bus. Only code assets (fonts,
SVGs under 40KB, scripts, stylesheets) go in the git repo.

The script downloads assets, normalizes filenames, and rewrites
references in `index.html` to relative paths (`fonts/foo.woff2`,
`images/hero.jpg`). Phase 3 transforms these depending on context:

- **Template / fragment / CSS**: root-relative paths (`/fonts/...`,
`/images/...`). Browser resolves against code-bus host.
- **DA cell image refs**: relative paths (`images/hero.jpg`). The
calling pipeline's `rewriteImageRefs` rewrites these to DA Media
Bus URLs before pushing to DA. Do NOT use absolute branch URLs —
they break the pipeline's pattern matching.

The `da-media-upload.mjs` script (Phase 5) uploads `da-media` assets
to DA and emits final `content.da.live` URLs; DA cells should be
updated to those final URLs after upload.

| Aspect | absolute | vendor | da-media |
|---|---|---|---|
| Asset types | any | fonts only | images, videos |
| Repo size | unchanged | +font files (small) | unchanged |
| DA-cell URL form | source host (leave as-is) | N/A (fonts are in CSS, not DA cells) | relative (`images/...`) → pipeline rewrites |
| Delivered image URL | `./media_<hash>` (sideload) | `./media_<hash>` | `./media_<hash>` |

This applies to template HTML, fragment HTML, DA cell values
referencing images, and any CSS `url()` references.
Expand Down Expand Up @@ -203,15 +204,15 @@ or consequences that aren't in `da-content`:
page styling hooks to CSS-on-structure (`:has()`, sibling
selectors) or swap to a preserved semantic element before emitting.

3. **`<img>` URLs in DA cells are stricter than template/fragment
HTML.** DA cells require absolute URLs (`da-content` §9).
Template and fragment HTML can use root-relative `/assets/...`
because the browser resolves those against the rendered page host
(= code-bus host). The DA pipeline is what's stricter, not the
browser. Acceptable absolute forms in Snowflake DA cells:
- Public source page: `https://<source-host>/<path>/image.png`
- Vendored same-branch assets: `https://<branch>--<repo>--<owner>.aem.page/assets/...`
- DA media: `https://content.da.live/<org>/<repo>/media_<sha>...`
3. **`<img>` URLs in DA cells must be relative, not absolute.**
Phase 3 writes relative paths (`images/hero.jpg`) in DA cell
image refs. The calling pipeline's `rewriteImageRefs` matches
these patterns and rewrites them to DA Media Bus URLs before
the DA push. Absolute branch URLs
(`https://<branch>--...aem.page/images/...`) break the
pipeline's pattern matching. Template and fragment HTML use
root-relative paths (`/images/...`) instead — the browser
resolves those against the code-bus host.

### Slot rules in the template

Expand Down Expand Up @@ -335,19 +336,16 @@ animations to settle, then capture. Save each as
cannot reach the source's assets. Three options, in order of
preference:

1. **Vendor the referenced assets under `/assets/` in the repo.**
Same paths work locally and on production via code-bus. Same-origin
so no CORS issues for fonts. Trade-off is repo size. Mechanical
steps:
- `cp -R <source-assets-dir> ./assets/`
- Remove `.DS_Store`s and unreferenced files
- Rename any directory containing spaces (AEM CLI 404s on
URL-encoded `%20`)
- In template/fragments/CSS: rewrite localhost URLs to
root-relative `/assets/...`
- In DA doc cells: rewrite localhost URLs to ABSOLUTE branch URLs
(`https://<branch>--<repo>--<owner>.aem.page/assets/...`) — Media
Bus requires absolute (see Generate phase rule #4).
1. **Deploy assets to the repo** (`fonts/`, `images/`, `videos/`).
`asset-collect.mjs` handles this automatically — it downloads,
normalizes, and rewrites refs in Phase 1. Wire (Phase 4) copies
them to repo root. Same paths work locally and on production via
code-bus. Same-origin so no CORS issues for fonts. Mechanical
rules:
- In template/fragments/CSS: root-relative `/fonts/...`,
`/images/...`
- In DA doc cells: relative paths (`images/...`) — the pipeline
rewrites these to DA Media Bus URLs before the DA push.
2. **Migrate assets to DA `/media/`.** Cleaner long-term; requires
tooling not yet in scope.
3. **Skip production round-trip.** Lowest effort; ask the user first
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -101,45 +101,36 @@ curl -fsS "$SOURCE_URL" -o "${PROJ}/input/index.html"
Validate: file size > 0, response was HTML (look for `<!doctype html`
or `<html` in the first 200 bytes).

## Identify and fetch external assets
## Collect and normalize assets

Look for these patterns in the saved HTML:

- `<link rel="stylesheet" href="...">` — external CSS
- `<script src="...">` — external JS
- For each match, if the href is relative or points to the same host
as the source, fetch it into `input/` (preserve filename).

Skip CDN resources (Google Fonts, jsdelivr, etc.) that are publicly
available; they'll be referenced directly in the converted artifacts.
Local/same-host external assets are saved so the project is self-
contained.
Run the asset-collect script to discover in-scope asset references
(raster images, videos, fonts), download local/unreachable ones,
normalize filenames, rewrite references in `index.html`, and emit
`asset-manifest.json`. Always runs — no-op when already collected.

```bash
INPUT="${PROJ}/input"
# Extract candidate external assets
grep -oE '<(link|script)[^>]*(href|src)="[^"]+"' "$INPUT/index.html" \
| grep -oE '(href|src)="[^"]+"' \
| sed -E 's/^(href|src)="//;s/"$//' \
| sort -u \
| while IFS= read -r ref; do
case "$ref" in
http://*|https://*)
host=$(echo "$ref" | sed -E 's|^https?://([^/]+)/.*|\1|')
src_host=$(echo "$SOURCE_URL" | sed -E 's|^https?://([^/]+)/.*|\1|')
if [ "$host" != "$src_host" ]; then continue; fi
;;
esac
url=$(node -e "
const base = process.argv[1];
const ref = process.argv[2];
process.stdout.write(new URL(ref, base).toString());
" "$SOURCE_URL" "$ref")
filename=$(basename "$url" | sed 's/?.*$//')
[ -n "$filename" ] && curl -fsS "$url" -o "$INPUT/$filename" || true
done
node "<SKILL_DIR>/scripts/asset-collect.mjs" \
--input "${PROJ}/input" \
--base-url "$SOURCE_URL"
```

The script handles:
- HTML tags (`<img src>`, `<img srcset>`, `<picture>`, `<video>`, inline `style` url())
- CSS `url()` inside inline `<style>` blocks (including `@font-face src`)
- Per-asset strategy: local/ephemeral fonts → `vendor` (saved to `input/fonts/`),
local/ephemeral images/videos → `da-media` (saved to `input/images/` or `input/videos/`),
stable CDN URLs → `absolute` (left as-is).
Ephemeral hosts (e.g. claudeusercontent.com) are always downloaded.
When base URL is localhost (snapshot mode), all non-CDN external assets
are downloaded too.
- Hash-named font files: renamed to human-readable names from `@font-face` context
- References in `index.html` rewritten to normalized relative paths

Validate the manifest: check `asset-manifest.json` was written, review
any `warnings` entries (especially cross-origin font CORS risks).

Read `asset-manifest.json` to note asset stats for the init summary.

## Write a stub README.md for the project

```markdown
Expand All @@ -155,6 +146,8 @@ Status: capturing
## Update state and finish

Set `state.phase = "capture"` and `state.phaseStatus = "complete"`,
append `state.captureCompletedAt = "<timestamp>"`. Save state.json.
append `state.captureCompletedAt = "<timestamp>"` and
`state.assetStats` from `input/asset-manifest.json`'s `stats` block.
Save state.json.

Continue to Phase 2 (Analyze).
Original file line number Diff line number Diff line change
Expand Up @@ -89,19 +89,22 @@ For the HTML at `<projectsDir>/<NNN>-<slug>/input/index.html`:
- Reference a public CDN
- Concatenate into the per-template animations file

10. **Asset references** — count all relative paths
(`assets/...`, `./images/...`, etc.) and absolute paths to
external hosts. Decide the asset strategy:
- **Public source host**: rewrite to absolute URLs pointing at
that host
- **Local-only source host** (`localhost`, `127.0.0.1`): two
paths — vendor to `/assets/` in the repo (proven; ~38 MB
acceptable for one-off bespoke), or migrate to DA `/media/`
(out of scope unless asked). Default: vendor.
- **NOTE**: image URLs INSIDE DA cells must be absolute even
when template/fragment refs are root-relative (Media Bus
requires absolute). See `knowledge/learnings.md` 2026-05-19
Media Bus entry.
10. **Asset manifest** — read `input/asset-manifest.json` produced by
`asset-collect.mjs` in Phase 1. The script has already classified
every in-scope asset (raster images, videos, fonts), decided its
per-asset strategy, downloaded local/unreachable ones, and rewritten
references in `index.html`. No manual asset classification is needed.

Record the manifest's `stats` summary and any `warnings` in
`notes.md`. Surface CORS warnings for cross-origin fonts as a risk
item. Record the manifest path in `decisions.json` under
`assetManifest`.

**Reminder on DA cell image URLs**: DA cell image refs must use
relative paths (`images/filename.jpg`), not absolute branch URLs.
The calling pipeline's `rewriteImageRefs` rewrites these to DA
Media Bus URLs before the DA push. This is a Generate-phase
concern — note it here so Generate doesn't miss it.

## Produce `notes.md`

Expand Down Expand Up @@ -267,9 +270,13 @@ this directly. Suggested shape:
"externalLibs": [
{ "name": "lenis", "css": "assets/lenis.min.css", "js": "assets/lenis.min.js", "strategy": "vendor" }
],
"assetStrategy": "vendor",
"assetBase": "http://127.0.0.1:8080/path/to/page/",
"vendorAssetsTo": "assets/"
"assetManifest": "input/asset-manifest.json",
"assetStats": {
"total": 4,
"byStrategy": { "absolute": 2, "vendor": 2, "da-media": 0 },
"byType": { "font": 2, "image": 2, "video": 0 }
},
"assetWarnings": []
}
```

Expand Down
Loading
Loading