Open
25 commits
b21396a
feat(web): add web plugin with browser automation and page analysis s…
catalan-adobe May 30, 2026
e97c742
refactor(web): commit to playwright-cli as single browser layer
catalan-adobe Jun 2, 2026
9440ec0
fix(web): fix eval assertions and stripEnvelope bug
catalan-adobe Jun 2, 2026
7b0ec67
fix(web): fix four HIGH review findings
catalan-adobe Jun 2, 2026
c6f0a5a
fix(web): fix MEDIUM review findings
catalan-adobe Jun 2, 2026
daec9af
docs(domain-mask): remove internal proxy mechanics section
catalan-adobe Jun 2, 2026
7006e1d
docs(browser-probe): condense Step 3 into table, trim consumer section
catalan-adobe Jun 2, 2026
b91087c
docs(browser-probe): remove duplicate signal table, tighten mapping t…
catalan-adobe Jun 2, 2026
8968388
docs(cdp-ext-pilot): split tips into troubleshooting reference
catalan-adobe Jun 2, 2026
3869a7c
docs(page-prep): trim ~50 lines via reference extraction
catalan-adobe Jun 2, 2026
a8e6265
docs(page-prep): remove repeated cmp-match/heuristic explanations and…
catalan-adobe Jun 2, 2026
8a83e96
docs(page-prep): move format schemas to references/formats.md
catalan-adobe Jun 2, 2026
1721e4b
docs(page-prep): trim explanatory rationale passages
catalan-adobe Jun 2, 2026
f4f1b18
docs(page-prep): remove narration, IIFE explanation, restated mode info
catalan-adobe Jun 2, 2026
9bdf86a
docs(reduce-page): drop why-pattern block, compress Phase 1 JSON, rem…
catalan-adobe Jun 2, 2026
05cfa64
docs(visual-tree): remove Pipeline section and redundant tip
catalan-adobe Jun 2, 2026
e4c8b20
refactor(web): rename reduce-page → page-reduce, visual-tree → page-tree
catalan-adobe Jun 2, 2026
3687651
fix(browser-probe): resolve symlinks in isMain check, fallback for mi…
catalan-adobe Jun 4, 2026
1e20661
fix(cdp-ext-pilot): fall back to tab mode when no content script cont…
catalan-adobe Jun 4, 2026
7202e67
fix(page-prep): correct playwright-cli screenshot syntax in Step 9b
catalan-adobe Jun 4, 2026
d4aa441
fix(page-prep): exclude off-screen elements from DOM residual check
catalan-adobe Jun 4, 2026
14793cd
fix(page-collect): write tmp files inside output dir, not /tmp/
catalan-adobe Jun 4, 2026
d3330a8
docs(web): add contributor docs for playwright-cli constraints and lo…
catalan-adobe Jun 4, 2026
08a2324
feat(web): add page-langs skill for webpage language detection
catalan-adobe Jun 8, 2026
9897b94
docs(page-langs): add validation checkpoint and error-recovery guidance
catalan-adobe Jun 8, 2026
14 changes: 14 additions & 0 deletions .claude-plugin/marketplace.json
@@ -48,6 +48,20 @@
"repository": "https://github.com/adobe/skills",
"license": "Apache-2.0"
},
{
"name": "web",
"source": "./plugins/web",
"description": "Browser automation and web page analysis skills using playwright-cli: connect via CDP, probe bot protection, dismiss overlays, capture DOM trees, reduce pages to skeletons, extract page resources.",
"version": "1.0.0",
"category": "web",
"keywords": ["browser", "playwright", "cdp", "web-scraping", "page-analysis", "automation"],
"author": {
"name": "Adobe"
},
"homepage": "https://github.com/adobe/skills",
"repository": "https://github.com/adobe/skills",
"license": "Apache-2.0"
},
{
"name": "aem-edge-delivery-services",
"source": "./plugins/aem/edge-delivery-services",
3 changes: 3 additions & 0 deletions .github/CODEOWNERS
@@ -38,3 +38,6 @@

# Stardust
/plugins/stardust @paolomoz

# Web (browser automation and page analysis)
/plugins/web @catalan-adobe
11 changes: 11 additions & 0 deletions plugins/web/.claude-plugin/plugin.json
@@ -0,0 +1,11 @@
{
  "name": "web",
  "description": "Browser automation and web page analysis skills using playwright-cli: connect via CDP, probe CDN bot protection, dismiss overlays, capture spatial DOM trees, reduce pages to skeletons, and extract structured page resources.",
  "version": "1.0.0",
  "author": {
    "name": "Adobe"
  },
  "repository": "https://github.com/adobe/skills",
  "license": "Apache-2.0",
  "keywords": ["browser", "playwright", "cdp", "web-scraping", "page-analysis", "automation"]
}
83 changes: 83 additions & 0 deletions plugins/web/docs/playwright-cli-constraints.md
@@ -0,0 +1,83 @@
# playwright-cli Constraints

All web plugin skills use `playwright-cli` as their browser layer. This document
covers constraints that affect skill authors: behaviours that differ from the
Playwright API and can break your skill, sometimes silently, if you're not aware
of them.

## File Path Restrictions

`playwright-cli` restricts all file I/O to the **project root** and the
**`.playwright-cli/`** directory. Absolute paths outside these roots are denied
at runtime with a `File access denied` error.

Affected commands:
- `screenshot --filename <path>`
- `run-code --filename <path>`

**Do not use `os.tmpdir()` or `/tmp/` for any file that playwright-cli reads or
writes.** Use the output directory (which must be project-relative) or
`.playwright-cli/` instead.

```js
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// ✗ Breaks: /tmp/ is outside the allowed roots
const rejectedPath = join(tmpdir(), `my-skill-${process.pid}-config.json`);

// ✓ Works: outputDir is the skill's project-relative output directory
const configPath = join(outputDir, `.tmp-${process.pid}-config.json`);
```

Clean up temp files after use to avoid polluting the output directory.

## Screenshot Syntax

The `screenshot` command takes an **optional element selector** as its positional
argument, not a file path. Passing a file path as the positional argument causes
an `Unexpected token while parsing css selector` error.

```bash
# ✗ Wrong — path is parsed as a CSS selector
playwright-cli -s <session> screenshot /path/to/file.png

# ✓ Correct — use --filename flag
playwright-cli -s <session> screenshot --filename .playwright-cli/file.png
```

The `-s <session>` flag is required. The path must be within the allowed roots
(see above). After saving, use the `Read` tool to view the image.

## eval Expression Constraints

`playwright-cli eval` wraps your input as `() => (EXPR)` internally. This means:

- **Semicolons silently fail**: the wrapper expects a single expression, not
  multiple statements separated by `;`. The command exits 0 but returns nothing.
- **`return` is not valid**: you're inside an arrow function expression body.
- **IIFEs work**: `(function(){ ...; return value; })()` is a valid expression.
- **Comma operator works** for chaining side effects:
  `(a.remove(), b.remove(), 'done')`

```bash
# ✗ Silent failure: semicolons split the input into statements
playwright-cli eval "a.remove(); b.remove(); 'done'"

# ✓ Comma operator
playwright-cli eval "(a.remove(), b.remove(), 'done')"

# ✓ IIFE
playwright-cli eval "(function(){ a.remove(); b.remove(); return 'done'; })()"
```
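
One way to avoid the semicolon pitfall is to build the expression in code rather than by hand. The helper below is a minimal sketch: `toEvalExpression` is a hypothetical name, not part of `playwright-cli`. It wraps a list of statements in an IIFE so that `eval` always receives a single expression.

```javascript
// Hypothetical helper (not part of playwright-cli): wrap statements in an
// IIFE so that `playwright-cli eval` receives one expression.
function toEvalExpression(statements, resultExpr) {
  const body = statements
    .map((s) => s.trim().replace(/;+$/, '')) // drop trailing semicolons
    .filter((s) => s.length > 0)             // skip empty entries
    .map((s) => `${s};`)                     // re-terminate each statement
    .join(' ');
  return `(function(){ ${body} return ${resultExpr}; })()`;
}

const expr = toEvalExpression(['a.remove()', 'b.remove();'], "'done'");
console.log(expr);
// prints: (function(){ a.remove(); b.remove(); return 'done'; })()
```

The resulting string can be passed as the single argument to `playwright-cli eval`, with shell quoting handled by the caller.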

## initScript Path Resolution

When building a `--config` JSON that includes `browser.initScript`, paths must
also be within the allowed roots. Temp script files written to `/tmp/` will be
rejected.

Write initScript files to the output directory or `.playwright-cli/` and clean
them up after the session closes.

## Session Naming

Session names passed via `-s <name>` persist across calls in the same
working directory. Always close sessions explicitly with
`playwright-cli -s <name> close` to avoid stale sessions blocking future runs.
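
A script can make the explicit close hard to forget by wrapping its work in `try`/`finally`. The sketch below assumes a caller-supplied `run` callback that executes `playwright-cli` with an argument list (for example through `child_process.execFileSync`). Both `withSession` and `run` are hypothetical names used only for illustration.

```javascript
// Sketch: always close a named session, even when the body throws.
// `run` is a hypothetical callback that executes playwright-cli with the
// given argument array; it is injected so the helper stays side-effect free.
function withSession(name, run, body) {
  // Prefix every command with the session flag.
  const cli = (...args) => run(['-s', name, ...args]);
  try {
    return body(cli);
  } finally {
    cli('close'); // runs on success and on failure
  }
}

// Usage with a fake runner that only records the commands it would execute.
const issued = [];
withSession('demo', (args) => issued.push(args.join(' ')), (cli) => {
  cli('screenshot', '--filename', '.playwright-cli/file.png');
});
console.log(issued);
```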
64 changes: 64 additions & 0 deletions plugins/web/docs/testing-locally.md
@@ -0,0 +1,64 @@
# Testing Skills Locally

This document explains how to test changes to web plugin skills in a Claude Code
session before opening a PR.

## Setup

Copy the skills you want to test into a project-scope `.claude/skills/` directory
in your worktree. Claude Code loads project-scope skills in any session started
from that worktree, so your local copies take effect there unless a globally
installed skill has the same name (see Precedence Limitation).

```bash
# From the worktree root
mkdir -p .claude/skills

for skill in plugins/web/skills/*/; do
  cp -r "$skill" ".claude/skills/$(basename "$skill")"
done
```

Use copies, not symlinks. Symlinks to directories cause a path mismatch in
`isMain` guards that use `import.meta.url` — the guard sees the real path but
`process.argv[1]` has the symlink path, so the script's `main()` never runs.
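
For illustration, a guard that tolerates symlinks can resolve both paths before comparing them. This is a minimal sketch, not the code the skills' scripts actually use; `isMainModule` is a hypothetical name.

```javascript
import { realpathSync } from 'node:fs';
import { fileURLToPath } from 'node:url';

// Sketch of a symlink-safe entry-point check. Both sides are resolved to
// real paths, so a symlinked skill directory no longer causes a mismatch.
function isMainModule(moduleUrl, argvPath = process.argv[1]) {
  if (!argvPath) return false; // e.g. code started with `node -e`
  try {
    return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(argvPath);
  } catch {
    return false; // a path that cannot be resolved is treated as "not main"
  }
}

// Typical use at the bottom of an ES module script:
//   if (isMainModule(import.meta.url)) main();
```

Copies remain the simpler choice for local testing, because they do not depend on every script carrying a guard like this.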

## Precedence Limitation

Project-scope skills take effect only for skill names that
**do not already exist globally**. If a user has `cdp-connect` installed globally,
your project-scope copy of `cdp-connect` is ignored and the global version wins.

This means:
- **New skills** (e.g. `browser-probe`, `page-tree`, `page-reduce`) — project-scope
works correctly; invoke them with the `Skill` tool as normal.
- **Updated existing skills** — the global version loads. To test changes, either
update the global install directly (`~/.claude/skills/<name>/`) or read and
follow the project-local `SKILL.md` manually, pointing scripts at the local path.

## Syncing Edits Back

The `.claude/skills/` directory is untracked (add it to `.gitignore` if needed).
Edits you make to test a fix must be **manually synced back** to `plugins/web/skills/`
before committing — the repo tracks the plugin source, not the test copies.

```bash
# After editing .claude/skills/<name>/scripts/foo.js
cp .claude/skills/<name>/scripts/foo.js plugins/web/skills/<name>/scripts/foo.js
git add plugins/web/skills/<name>/scripts/foo.js
```

## Starting a Test Session

Start Claude Code from the worktree root. The project-scope skills load at
session start — changes to `.claude/skills/` after session start are not picked up
until the next session.

```bash
cd <worktree-root>
claude
```

Invoke skills via the `Skill` tool as you normally would. The base directory
printed at skill load time confirms which copy loaded:
- `Base directory: /path/to/worktree/.claude/skills/<name>` → project-scope copy
- `Base directory: /Users/<you>/.claude/skills/<name>` → global install
1 change: 1 addition & 0 deletions plugins/web/skills/browser-probe/.releaserc.json
@@ -0,0 +1 @@
{"extends": "../../../../../release.config.cjs"}
130 changes: 130 additions & 0 deletions plugins/web/skills/browser-probe/SKILL.md
@@ -0,0 +1,130 @@
---
name: browser-probe
license: Apache-2.0
compatibility: Requires playwright-cli on PATH. Run `playwright-cli --help` for usage.
description: >-
  Probe a URL with escalating headless browser configurations to detect CDN bot
  protection (Akamai, Cloudflare, DataDome, AWS WAF) and produce a
  browser-recipe.json that downstream playwright-cli consumers use to bypass
  blocking. Runs an automated escalation ladder: default headless → stealth
  script injection → User-Agent override → system Chrome (TLS fingerprint fix)
  → persistent profile. Use BEFORE any playwright-cli interaction with an
  untested domain. Triggers on: browser probe, site blocked, headless blocked,
  CDN blocking, bot detection, browser recipe, can't load page, 403 error page,
  access denied.
---

# Browser Probe

Detect CDN bot protection blocking headless Chrome and produce a browser recipe
for downstream `playwright-cli` consumers. Node 22+ required. No npm
dependencies.

## When to Use

Run **before** any `playwright-cli` interaction with an untested domain, or when
a downstream script reports a blocked/empty page (403, "access denied", "captcha").

## Script Location

```bash
if [[ -n "${CLAUDE_SKILL_DIR:-}" ]]; then
  PROBE_DIR="${CLAUDE_SKILL_DIR}/scripts"
else
  PROBE_DIR="$(dirname "$(command -v browser-probe.js 2>/dev/null || \
    find ~/.claude -path "*/browser-probe/scripts/browser-probe.js" \
    -type f 2>/dev/null | head -1)")"
fi
```

## Workflow

### Step 1 — Run the probe

```bash
node "$PROBE_DIR/browser-probe.js" "$URL" "$OUTPUT_DIR"
```

The script tries up to 5 browser configurations, stopping at the first success:

1. **default** — headless Chromium (baseline)
2. **stealth** — headless Chromium + JS stealth init script (patches `navigator.webdriver`, plugins, languages)
3. **stealth-ua** — headless Chromium + JS stealth + User-Agent override (removes `HeadlessChrome` from HTTP UA header via `--user-agent` launch arg)
4. **chrome** — system Chrome (`--browser=chrome`) + JS stealth + UA override (fixes TLS fingerprint detection)
5. **persistent** — system Chrome + JS stealth + UA override + persistent profile (cookie/session challenges)

Output: `$OUTPUT_DIR/probe-report.json`

### Step 2 — Read the report

Load `probe-report.json`. Check `firstSuccess`:
- If non-null: a configuration worked. Proceed to Step 3.
- If null: all configurations failed. Skip to Step 5.

### Step 3 — Interpret results

Match `detectedSignals` against the Provider Signature Table in
`references/stealth-config.md` to confirm why blocking occurred and validate
that `firstSuccess` is the minimum sufficient config.

### Step 4 — Generate recipe

Write `browser-recipe.json` to `$OUTPUT_DIR`:

```json
{
  "url": "<probed URL>",
  "generated": "<ISO timestamp>",
  "cliConfig": {
    "browser": {
      "browserName": "chromium",
      "launchOptions": { "channel": "<from firstSuccess step>" }
    }
  },
  "stealthInitScript": "<full script from stealth-config.md if stealth was needed>",
  "notes": "<1-2 sentence explanation of what was detected and why this config>"
}
```

**Config mapping from `firstSuccess`:**

| firstSuccess | channel | args | stealthInitScript |
|---|---|---|---|
| `default` | — | — | null |
| `stealth` | — | — | from reference |
| `stealth-ua` | — | `--user-agent=<realistic UA>` | from reference |
| `chrome` | `chrome` | `--user-agent=<realistic UA>` | from reference |
| `persistent` | `chrome` | `--user-agent=<realistic UA>` | from reference |

If `firstSuccess` is `persistent`, add `"persistent": true` to the recipe.
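
The mapping table can be expressed as a small function. The sketch below is illustrative only: `buildRecipe` is a hypothetical name, and placing `args` under `launchOptions` is an assumption, since the recipe template above does not show where the table's `args` column goes. The User-Agent string and the stealth script are supplied by the caller.

```javascript
// Sketch: map `firstSuccess` from probe-report.json to a recipe object.
// Assumption (not confirmed by the skill docs): `args` lives under
// `launchOptions` next to `channel`.
const KNOWN = ['default', 'stealth', 'stealth-ua', 'chrome', 'persistent'];
const NEEDS_STEALTH = new Set(['stealth', 'stealth-ua', 'chrome', 'persistent']);
const NEEDS_UA = new Set(['stealth-ua', 'chrome', 'persistent']);
const NEEDS_CHROME = new Set(['chrome', 'persistent']);

function buildRecipe({ url, firstSuccess, userAgent, stealthScript, notes = '' }) {
  if (!KNOWN.includes(firstSuccess)) {
    // Covers null: when every configuration failed, no recipe is written.
    throw new Error(`No usable configuration: ${firstSuccess}`);
  }
  const launchOptions = {};
  if (NEEDS_CHROME.has(firstSuccess)) launchOptions.channel = 'chrome';
  if (NEEDS_UA.has(firstSuccess)) launchOptions.args = [`--user-agent=${userAgent}`];

  const recipe = {
    url,
    generated: new Date().toISOString(),
    cliConfig: { browser: { browserName: 'chromium', launchOptions } },
    stealthInitScript: NEEDS_STEALTH.has(firstSuccess) ? stealthScript : null,
    notes,
  };
  if (firstSuccess === 'persistent') recipe.persistent = true;
  return recipe;
}
```

Throwing on a null `firstSuccess` keeps the "no recipe on failure" rule from Step 5 enforceable in code.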

### Step 5 — Report results

**If a configuration worked:**
```
Browser probe complete for <url>.
Working config: <firstSuccess>
Detected: <detectedSignals or "no bot protection detected">
Recipe: <path to browser-recipe.json>
```

**If all configurations failed:**
```
Browser probe failed for <url>. No headless configuration could load the page.
Tried: default, stealth, stealth-ua, chrome, persistent
Detected signals: <detectedSignals>

Options:
1. Use --headed flag for manual browser interaction
2. Provide pre-captured data (DOM snapshot, screenshots) manually
3. Check if the URL requires authentication or VPN access
```

Do NOT produce a recipe when all steps fail. Do NOT silently continue
with a broken configuration.
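
The branching between the two outcomes can be sketched as a pure function. `summarizeProbe` is a hypothetical name, and the code assumes `detectedSignals` is an array of strings; the actual report schema may differ.

```javascript
// Sketch of the Step 2 / Step 5 branching on `firstSuccess`.
// Assumption: `detectedSignals` is an array of strings.
function summarizeProbe(url, report, recipePath) {
  const signals = (report.detectedSignals ?? []).join(', ');
  if (report.firstSuccess) {
    return [
      `Browser probe complete for ${url}.`,
      `Working config: ${report.firstSuccess}`,
      `Detected: ${signals || 'no bot protection detected'}`,
      `Recipe: ${recipePath}`,
    ].join('\n');
  }
  // Failure: no recipe path is reported because none is produced.
  return [
    `Browser probe failed for ${url}. No headless configuration could load the page.`,
    'Tried: default, stealth, stealth-ua, chrome, persistent',
    `Detected signals: ${signals}`,
  ].join('\n');
}
```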

## How Consumers Use the Recipe

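Write the recipe's `cliConfig` object to a JSON file inside the project root or
`.playwright-cli/` and pass `--config=<path-to-cliConfig>` to `playwright-cli open`.
If the recipe has `stealthInitScript`, save the script to a file in the same
allowed locations and reference that file from `browser.initScript` in the config
(not via `eval`, which is expression-only). If `"persistent": true`, also pass
`--persistent`. Run `playwright-cli --help` for the full command reference.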
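
A consumer could derive the config and the `open` arguments from a recipe roughly as follows. This is a sketch under stated assumptions: `planOpen` is a hypothetical name, `browser.initScript` is assumed to accept an array of file paths, and the caller is assumed to have already written the stealth script and the config to files inside the allowed locations.

```javascript
// Sketch: derive the config object and `playwright-cli` arguments from a
// recipe. Assumptions: `browser.initScript` takes an array of file paths,
// and the files at `configPath` / `initScriptPath` are written by the caller.
function planOpen(recipe, { session, configPath, initScriptPath }) {
  const config = structuredClone(recipe.cliConfig); // leave the recipe untouched
  if (recipe.stealthInitScript) {
    config.browser.initScript = [initScriptPath];
  }
  const args = ['-s', session, 'open', `--config=${configPath}`];
  if (recipe.persistent === true) args.push('--persistent');
  return { config, args };
}
```

The caller would serialize `config` to `configPath`, then run `playwright-cli` with `args`; navigation to the probed URL is left out of the sketch.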
18 changes: 18 additions & 0 deletions plugins/web/skills/browser-probe/evals/evals.json
@@ -0,0 +1,18 @@
{
  "skill_name": "browser-probe",
  "evals": [
    {
      "id": 1,
      "prompt": "Check if https://example.com has bot protection and get a browser recipe for it",
      "expected_output": "A browser-recipe.json is generated showing the detected protection level and recommended configuration.",
      "files": [],
      "assertions": [
        {
          "type": "command_succeeds",
          "command": "node --check scripts/browser-probe.js",
          "description": "Browser probe script has valid syntax."
        }
      ]
    }
  ]
}
1 change: 1 addition & 0 deletions plugins/web/skills/browser-probe/package.json
@@ -0,0 +1 @@
{ "name": "browser-probe", "version": "0.0.0-semantically-released", "private": true }