adobe · catalan-adobe · May 30, 2026 · Jun 2, 2026 · Jun 2, 2026 · Jun 2, 2026
@@ -48,6 +48,20 @@
       "repository": "https://github.com/adobe/skills",
       "license": "Apache-2.0"
     },
+    {
+      "name": "web",
+      "source": "./plugins/web",
+      "description": "Browser automation and web page analysis skills: detect the browser layer, connect via CDP, probe bot protection, dismiss overlays, capture DOM trees, reduce pages to skeletons, extract page resources.",
+      "version": "1.0.0",
+      "category": "web",
+      "keywords": ["browser", "playwright", "cdp", "web-scraping", "page-analysis", "automation"],
+      "author": {
+        "name": "Adobe"
+      },
+      "homepage": "https://github.com/adobe/skills",
+      "repository": "https://github.com/adobe/skills",
+      "license": "Apache-2.0"
+    },
     {
       "name": "aem-edge-delivery-services",
       "source": "./plugins/aem/edge-delivery-services",

@@ -38,3 +38,6 @@
 
 # Stardust
 /plugins/stardust                                      @paolomoz
+
+# Web (browser automation and page analysis)
+/plugins/web                                           @catalan-adobe
@@ -0,0 +1,11 @@
+{
+  "name": "web",
+  "description": "Browser automation and web page analysis skills: detect the available browser layer, connect via CDP, probe CDN bot protection, dismiss overlays, capture spatial DOM trees, reduce pages to skeletons, and extract structured page resources.",
+  "version": "1.0.0",
+  "author": {
+    "name": "Adobe"
+  },
+  "repository": "https://github.com/adobe/skills",
+  "license": "Apache-2.0",
+  "keywords": ["browser", "playwright", "cdp", "web-scraping", "page-analysis", "automation"]
+}
@@ -0,0 +1 @@
+{"extends": "../../../../../release.config.cjs"}
@@ -0,0 +1,160 @@
+---
+name: browser-probe
+license: Apache-2.0
+description: >-
+  Probe a URL with escalating headless browser configurations to detect CDN bot
+  protection (Akamai, Cloudflare, DataDome, AWS WAF) and produce a
+  browser-recipe.json that downstream playwright-cli consumers use to bypass
+  blocking. Runs an automated escalation ladder: default headless → stealth
+  script injection → system Chrome (TLS fingerprint fix) → persistent profile.
+  Use BEFORE any playwright-cli interaction with an untested domain. Triggers
+  on: browser probe, site blocked, headless blocked, CDN blocking, bot
+  detection, browser recipe, can't load page, 403 error page, access denied.
+---
+
+# Browser Probe
+
+Detect CDN bot protection blocking headless Chrome and produce a browser recipe
+for downstream `playwright-cli` consumers. Node 22+ required. No npm
+dependencies.
+
+## When to Use
+
+Run this skill **before** any `playwright-cli` interaction with a domain you
+haven't tested, or when a downstream script reports a blocked page. Common
+triggers:
+
+- First interaction with a new domain
+- `capture-snapshot.js` produces empty/error snapshots
+- Page title contains "error", "denied", "blocked", "captcha"
+- HTTP 403 responses from headless browser
+
+## Script Location
+
+```bash
+if [[ -n "${CLAUDE_SKILL_DIR:-}" ]]; then
+  PROBE_DIR="${CLAUDE_SKILL_DIR}/scripts"
+else
+  PROBE_DIR="$(dirname "$(command -v browser-probe.js 2>/dev/null || \
+    find ~/.claude -path "*/browser-probe/scripts/browser-probe.js" \
+    -type f 2>/dev/null | head -1)")"
+fi
+```
+
+## Workflow
+
+### Step 1 — Run the probe
+
+```bash
+node "$PROBE_DIR/browser-probe.js" "$URL" "$OUTPUT_DIR"
+```
+
+The script tries up to 5 browser configurations, stopping at the first success:
+
+1. **default** — headless Chromium (baseline)
+2. **stealth** — headless Chromium + JS stealth init script (patches `navigator.webdriver`, plugins, languages)
+3. **stealth-ua** — headless Chromium + JS stealth + User-Agent override (removes `HeadlessChrome` from HTTP UA header via `--user-agent` launch arg)
+4. **chrome** — system Chrome (`--browser=chrome`) + JS stealth + UA override (fixes TLS fingerprint detection)
+5. **persistent** — system Chrome + JS stealth + UA override + persistent profile (cookie/session challenges)
+
+Output: `$OUTPUT_DIR/probe-report.json`
+
+### Step 2 — Read the report
+
+Load `probe-report.json`. Check `firstSuccess`:
+- If non-null: a configuration worked. Proceed to Step 3.
+- If null: all configurations failed. Skip to Step 5.
+
+### Step 3 — Interpret results
+
+Load the stealth configuration reference at `references/stealth-config.md` and match the
+`detectedSignals` array against the Provider Signature Table.
+
+Key interpretation rules:
+- `cloudfront-block` detected, or `stealth` fails but `stealth-ua` succeeds →
+  CloudFront WAF UA-based blocking (matches `HeadlessChrome` in the HTTP
+  User-Agent header). Common on pharma/enterprise sites. Simple fix,
+  no TLS concerns. `stealth-ua` is the minimum working config.
+- `cloudfront` without `cloudfront-block` → CloudFront present but not
+  actively blocking. Default config may work.
+- `akamai-server` or `akamai-bot-manager` → TLS fingerprint blocking.
+  System Chrome is the fix. Stealth + UA alone is insufficient.
+- `cloudflare-ray` without `cloudflare-challenge` → Cloudflare present
+  but not actively blocking. Default config may work.
+- `cloudflare-challenge` → Active JS challenge. System Chrome + stealth
+  + UA usually resolves it.
+- `datadome` → Aggressive detection. System Chrome + stealth + UA required.
+- `aws-waf` → Usually UA-based. Stealth + UA often sufficient.
+- No signals + blocked → Unknown protection. Persistent profile is last
+  resort.
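The interpretation rules above can be read as a lookup from signal name to the lowest ladder rung that is expected to work. The following is a minimal sketch under that reading: `expectedConfig`, `LADDER`, and `SIGNAL_RUNG` are hypothetical names, and the mapping encodes the "usually"/"often" wording of the rules as expectations rather than guarantees. The probe's actual `firstSuccess` remains authoritative.

```javascript
// Escalation ladder in the order the probe tries it.
const LADDER = ['default', 'stealth', 'stealth-ua', 'chrome', 'persistent'];

// Signal name -> rung suggested by the interpretation rules (assumed mapping).
const SIGNAL_RUNG = {
  'cloudfront': 'default',
  'cloudflare-ray': 'default',
  'cloudfront-block': 'stealth-ua',
  'aws-waf': 'stealth-ua',
  'akamai-server': 'chrome',
  'akamai-bot-manager': 'chrome',
  'cloudflare-challenge': 'chrome',
  'datadome': 'chrome',
};

function expectedConfig(detectedSignals, blocked) {
  const known = detectedSignals.filter((s) => Object.hasOwn(SIGNAL_RUNG, s));
  // No recognizable signal: persistent profile is the last resort if blocked.
  if (known.length === 0) return blocked ? 'persistent' : 'default';
  // Otherwise pick the highest rung that any detected signal calls for.
  return known
    .map((s) => SIGNAL_RUNG[s])
    .reduce((a, b) => (LADDER.indexOf(a) >= LADDER.indexOf(b) ? a : b));
}

console.log(expectedConfig(['cloudflare-ray', 'cloudflare-challenge'], true));
console.log(expectedConfig([], true));
```

Comparing this expectation with the reported `firstSuccess` is one way to spot surprising results, for example a site with Akamai signals that loaded at the `default` rung.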
+
+### Step 4 — Generate recipe
+
+Write `browser-recipe.json` to `$OUTPUT_DIR`:
+
+```json
+{
+  "url": "<probed URL>",
+  "generated": "<ISO timestamp>",
+  "cliConfig": {
+    "browser": {
+      "browserName": "chromium",
+      "launchOptions": { "channel": "<from firstSuccess step>" }
+    }
+  },
+  "stealthInitScript": "<full script from stealth-config.md if stealth was needed>",
+  "notes": "<1-2 sentence explanation of what was detected and why this config>"
+}
+```
+
+**Config mapping from `firstSuccess`:**
+
+| firstSuccess | cliConfig.browser.launchOptions | stealthInitScript |
+|---|---|---|
+| `default` | `{}` (no channel, no args) | `null` (not needed) |
+| `stealth` | `{}` (no channel, no args) | Full stealth script from reference |
+| `stealth-ua` | `{ "args": ["--user-agent=<realistic UA>"] }` | Full stealth script from reference |
+| `chrome` | `{ "channel": "chrome", "args": ["--user-agent=<realistic UA>"] }` | Full stealth script from reference |
+| `persistent` | `{ "channel": "chrome", "args": ["--user-agent=<realistic UA>"] }` | Full stealth script from reference |
+
+If `firstSuccess` is `persistent`, add a `"persistent": true` field to the
+recipe so consumers know to use `--persistent`.
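The config mapping above is mechanical enough to express as a small function. This is a sketch, not the skill's implementation: `buildRecipe` and its parameter names are hypothetical, while the recipe field names follow the template in this step. It refuses to build a recipe when `firstSuccess` is null or unrecognized.

```javascript
const STEPS = ['default', 'stealth', 'stealth-ua', 'chrome', 'persistent'];

// Hypothetical helper implementing the firstSuccess -> recipe mapping table.
function buildRecipe({ url, firstSuccess, stealthScript, userAgent, notes = '' }) {
  // A null or unknown firstSuccess means no recipe should be produced.
  if (!STEPS.includes(firstSuccess)) {
    throw new Error(`no working configuration (firstSuccess=${firstSuccess})`);
  }
  const rung = STEPS.indexOf(firstSuccess);
  const launchOptions = {};
  // chrome and persistent use the system Chrome channel.
  if (rung >= STEPS.indexOf('chrome')) launchOptions.channel = 'chrome';
  // stealth-ua and above add the User-Agent launch argument.
  if (rung >= STEPS.indexOf('stealth-ua')) {
    launchOptions.args = [`--user-agent=${userAgent}`];
  }

  const recipe = {
    url,
    generated: new Date().toISOString(),
    cliConfig: { browser: { browserName: 'chromium', launchOptions } },
    // Only the default rung works without the stealth script.
    stealthInitScript: firstSuccess === 'default' ? null : stealthScript,
    notes,
  };
  if (firstSuccess === 'persistent') recipe.persistent = true;
  return recipe;
}

const example = buildRecipe({
  url: 'https://example.com',
  firstSuccess: 'chrome',
  stealthScript: '(function() { /* stealth script from the reference */ })()',
  userAgent: '<realistic UA>',
  notes: 'Placeholder note for illustration.',
});
console.log(JSON.stringify(example, null, 2));
```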
+
+### Step 5 — Report results
+
+**If a configuration worked:**
+```
+Browser probe complete for <url>.
+  Working config: <firstSuccess>
+  Detected: <detectedSignals or "no bot protection detected">
+  Recipe: <path to browser-recipe.json>
+```
+
+**If all configurations failed:**
+```
+Browser probe failed for <url>. No headless configuration could load the page.
+  Tried: default, stealth, stealth-ua, chrome, persistent
+  Detected signals: <detectedSignals>
+
+  Options:
+  1. Use --headed flag for manual browser interaction
+  2. Provide pre-captured data (DOM snapshot, screenshots) manually
+  3. Check if the URL requires authentication or VPN access
+```
+
+Do NOT produce a recipe when all steps fail. Do NOT silently continue
+with a broken configuration.
+
+## How Consumers Use the Recipe
+
+Any script using `playwright-cli` can consume `browser-recipe.json`:
+
+1. Write `cliConfig` to a temp file (e.g., `/tmp/probe-cli-config.json`)
+2. If recipe has `stealthInitScript`, write it to a temp file and add
+   it to the config's `browser.initScript` array (do NOT use
+   `playwright-cli eval` — eval only accepts pure expressions, not
+   multi-statement scripts)
+3. Pass `--config=/tmp/probe-cli-config.json` to `playwright-cli open`
+4. Proceed with normal `goto <url>` and workflow
+
+If recipe has `"persistent": true`, also pass `--persistent` to `open`.
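The consumer steps above can be sketched as a pure planning function that returns the files to write and the arguments for `playwright-cli open`, leaving the actual file writes and process spawn to the caller. `planOpen` and the stealth script file name are assumptions made for illustration; the config path and the `--config` and `--persistent` flags are the ones named in the steps.

```javascript
// Hypothetical helper: turn a parsed browser-recipe.json into a plan.
function planOpen(recipe, tmpDir = '/tmp') {
  const configPath = `${tmpDir}/probe-cli-config.json`;
  const scriptPath = `${tmpDir}/probe-stealth-init.js`; // assumed file name

  // Copy cliConfig so the recipe object itself is not mutated.
  const config = structuredClone(recipe.cliConfig);
  const files = {};

  if (recipe.stealthInitScript) {
    // The script goes into its own file, referenced from browser.initScript.
    files[scriptPath] = recipe.stealthInitScript;
    config.browser.initScript = [...(config.browser.initScript ?? []), scriptPath];
  }
  files[configPath] = JSON.stringify(config, null, 2);

  const args = ['open', `--config=${configPath}`];
  if (recipe.persistent === true) args.push('--persistent');
  return { files, args };
}

const plan = planOpen({
  cliConfig: { browser: { browserName: 'chromium', launchOptions: { channel: 'chrome' } } },
  stealthInitScript: '(function() { /* stealth script */ })()',
  persistent: true,
});
console.log(plan.args.join(' '));
console.log(Object.keys(plan.files));
```

Keeping the function free of side effects makes it easy to check the resulting config before anything is written to disk.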
@@ -0,0 +1,18 @@
+{
+  "skill_name": "browser-probe",
+  "evals": [
+    {
+      "id": 1,
+      "prompt": "Check if https://example.com has bot protection and get a browser recipe for it",
+      "expected_output": "A browser-recipe.json is generated showing the detected protection level and recommended configuration.",
+      "files": [],
+      "assertions": [
+        {
+          "type": "command_succeeds",
+          "command": "node -e \"require('./scripts/browser-probe.js')\"",
+          "description": "Browser probe script loads without syntax errors."
+        }
+      ]
+    }
+  ]
+}
@@ -0,0 +1 @@
+{ "name": "browser-probe", "version": "0.0.0-semantically-released", "private": true }
@@ -0,0 +1,98 @@
+# Stealth Configuration Reference
+
+## Stealth Init Script
+
+Inject via `initScript` in the playwright-cli config (NOT via `eval` —
+eval only accepts pure expressions, not multi-statement scripts). Write
+this script to a temp file and add the path to `browser.initScript` in
+the config. It runs before any page JS loads, patching browser
+fingerprints that headless detection relies on.
+
+```js
+(function() {
+  // Hide webdriver property (primary headless signal)
+  Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
+
+  // Add realistic plugins (headless Chrome has empty plugins array)
+  Object.defineProperty(navigator, 'plugins', {
+    get: () => [
+      { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
+      { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', description: '' },
+      { name: 'Native Client', filename: 'internal-nacl-plugin', description: '' },
+    ],
+  });
+
+  // Set realistic languages (headless may report empty)
+  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
+
+  // Add chrome runtime object (missing in headless)
+  window.chrome = { runtime: {} };
+})()
+```
+
+## User-Agent Override
+
+Chromium's headless mode injects `HeadlessChrome` into the HTTP User-Agent
+header. Many WAFs (especially CloudFront) use simple string matching on this
+token as a first-pass bot filter. This is an HTTP-level signal — JS stealth
+patches cannot change it.
+
+Fix: pass a realistic UA via Chrome launch arg in a `playwright-cli` config file:
+
+```json
+{
+  "browser": {
+    "browserName": "chromium",
+    "launchOptions": {
+      "args": ["--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"]
+    }
+  }
+}
+```
+
+Usage: `playwright-cli -s=<session> open --config=<path-to-config>`
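As a rough illustration of where a "realistic" UA can come from, one option is to take the UA the headless browser reports and drop the `HeadlessChrome` token, so the version stays aligned with the browser actually in use. The helper name `uaOverrideConfig` and the sample UA string below are illustrative assumptions, not part of the skill.

```javascript
// Hypothetical helper: derive a non-headless UA and wrap it in a config
// object shaped like the playwright-cli config shown above.
function uaOverrideConfig(headlessUa) {
  // Headless Chromium reports "HeadlessChrome/<version>" in place of "Chrome/<version>".
  const ua = headlessUa.replace('HeadlessChrome/', 'Chrome/');
  return {
    browser: {
      browserName: 'chromium',
      launchOptions: { args: [`--user-agent=${ua}`] },
    },
  };
}

// Sample headless UA string, for illustration only.
const sample =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/120.0.0.0 Safari/537.36';
console.log(JSON.stringify(uaOverrideConfig(sample), null, 2));
```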
+
+## Stealth HTTP Headers
+
+These headers mimic a real Chrome session. They are currently not injectable
+via `playwright-cli` (no `extraHTTPHeaders` support). Documented for future
+use or for scripts that use the Playwright API directly.
+
+| Header | Value |
+|--------|-------|
+| `Accept` | `text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8` |
+| `Accept-Language` | `en-US,en;q=0.9` |
+| `Accept-Encoding` | `gzip, deflate, br` |
+| `Cache-Control` | `no-cache` |
+| `Sec-Ch-Ua` | `"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"` |
+| `Sec-Ch-Ua-Mobile` | `?0` |
+| `Sec-Ch-Ua-Platform` | `"macOS"` |
+| `Sec-Fetch-Dest` | `document` |
+| `Sec-Fetch-Mode` | `navigate` |
+| `Sec-Fetch-Site` | `none` |
+| `Sec-Fetch-User` | `?1` |
+| `Upgrade-Insecure-Requests` | `1` |
+
+## Provider Signature Table
+
+Maps observable signals (from `playwright-cli network` response headers and
+page content) to CDN bot detection providers and typical remedies.
+
+| Signal | Provider | Confidence | Typical fix |
+|--------|----------|------------|-------------|
+| `server: AkamaiGHost` or `server: AkamaiNetStorage` | Akamai | medium | System Chrome (`--browser=chrome`) — TLS fingerprint |
+| `bm_sz` cookie in `set-cookie` | Akamai Bot Manager | high | System Chrome — TLS fingerprint |
+| `_abck` cookie in `set-cookie` | Akamai Bot Manager | high | System Chrome — TLS fingerprint |
+| `stealth` blocked + `stealth-ua` succeeds (no provider headers) | CloudFront UA filter | high | UA override (`--user-agent` launch arg) |
+| `cf-ray` header present | Cloudflare | medium | Stealth script often sufficient |
+| Page title contains "Just a moment" or "Checking your browser" | Cloudflare Challenge | high | System Chrome + stealth |
+| `x-datadome` header present | DataDome | high | System Chrome + stealth |
+| `x-amzn-waf-action` header present | AWS WAF | medium | Stealth script + UA override (UA-based detection) |
+| `x-cdn: Imperva` or `x-iinfo` header | Incapsula/Imperva | medium | System Chrome + stealth |
+| Page title contains "Access Denied" + `server: AkamaiGHost` | Akamai hard block | high | System Chrome — TLS fingerprint |
+| `server: CloudFront` or `x-amz-cf-id` header | CloudFront | medium | Stealth script + UA override (often UA-based) |
+| Page title contains "The request could not be satisfied" | CloudFront WAF block | high | UA override or stealth script |
+| `stealth` (JS-only) succeeds, `default` blocked | JS fingerprint detection | high | Stealth script sufficient |
+| `stealth` fails but `stealth-ua` succeeds | HTTP UA-based blocking | high | UA override (`--user-agent` launch arg) |
+| Page title matches `/error\|denied\|blocked\|403\|captcha/i` + no known provider | Unknown WAF | low | Escalate to persistent profile |
+| `status: 403` + `bodyLength < 500` | Generic block | low | Escalate through all steps |
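To make the header and title rows of the Provider Signature Table concrete, here is a minimal matcher over a plain response summary. It is a sketch under assumptions: `matchProviders` and its input shape (`headers`, `title`, `status`, `bodyLength`) are hypothetical, and the rows that depend on probe step outcomes or on combinations of signals are left out.

```javascript
// Hypothetical matcher for the header/title rows of the signature table.
function matchProviders({ headers = {}, title = '', status = 0, bodyLength = 0 }) {
  // Normalize header names to lowercase and values to strings.
  const h = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), String(v)]),
  );
  const server = h['server'] ?? '';
  const cookies = h['set-cookie'] ?? '';
  const hits = [];
  const add = (provider, confidence) => hits.push({ provider, confidence });

  if (/AkamaiGHost|AkamaiNetStorage/.test(server)) add('Akamai', 'medium');
  if (/\b(bm_sz|_abck)=/.test(cookies)) add('Akamai Bot Manager', 'high');
  if ('cf-ray' in h) add('Cloudflare', 'medium');
  if (/Just a moment|Checking your browser/.test(title)) add('Cloudflare Challenge', 'high');
  if ('x-datadome' in h) add('DataDome', 'high');
  if ('x-amzn-waf-action' in h) add('AWS WAF', 'medium');
  if (h['x-cdn'] === 'Imperva' || 'x-iinfo' in h) add('Incapsula/Imperva', 'medium');
  if (/cloudfront/i.test(server) || 'x-amz-cf-id' in h) add('CloudFront', 'medium');
  if (title.includes('The request could not be satisfied')) add('CloudFront WAF block', 'high');
  if (status === 403 && bodyLength < 500) add('Generic block', 'low');
  return hits;
}

console.log(
  matchProviders({
    headers: { 'CF-Ray': '0000000000000000-XXX', Server: 'cloudflare' },
    title: 'Just a moment...',
    status: 403,
    bodyLength: 4200,
  }),
);
```

A response can match several rows at once; treating the highest-confidence hit as the primary diagnosis is one reasonable policy, but that choice is not specified by the table.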