[WIP] Improve blocking pattern detection in submit mode #2804

Draft
yyq1043-cloud wants to merge 3 commits into soxoj:main from yyq1043-cloud:improve-blocking-detection

@yyq1043-cloud

Contributor

Summary

Expand Cloudflare and anti-bot detection in the submit mode ("check_features_manually") to catch more blocking patterns with clearer error messages.

Before:
Only detected three markers: /cdn-cgi/challenge-platform, the "now: " pattern, and "Sorry, you have been blocked".

After:
Detects 6 common blocking patterns:

  • Cloudflare challenge platform
  • Cloudflare turnstile ("Now checking your browser")
  • Cloudflare general blocks
  • Generic "blocked" pages
  • Access denied pages
  • Cloudflare-specific header presence

Example improvement:
When a Cloudflare turnstile is detected, the user now sees:

Cloudflare turnstile detected, skipping

instead of the generic message.
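
A minimal sketch of how such a pattern-to-message mapping might look. The names `BLOCKING_PATTERNS` and `detect_blocking` are hypothetical and are not taken from the actual diff. The marker strings and labels follow this description; every message other than the turnstile one, and the exact "Access denied" string, are assumptions.

```python
from typing import Optional

# Hypothetical table of (marker substring, user-facing message).
# Only the turnstile message is quoted from the PR description;
# the other messages are illustrative assumptions.
BLOCKING_PATTERNS = [
    ("/cdn-cgi/challenge-platform", "Cloudflare challenge platform detected, skipping"),
    ("Now checking your browser", "Cloudflare turnstile detected, skipping"),
    ("Sorry, you have been blocked", "Cloudflare block detected, skipping"),
    ("Access denied", "Access denied page detected, skipping"),
]


def detect_blocking(body: str) -> Optional[str]:
    """Return the message for the first marker found in the body, else None."""
    for marker, message in BLOCKING_PATTERNS:
        if marker in body:
            return message
    return None
```

Keeping the markers in one ordered table means the first match decides the message, so more specific markers should be listed before generic ones.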

Closes #2668 (partial: addresses the "filter by errors" TODO)

yyq1043 added 3 commits June 26, 2026 20:05
Adds quick-access links to web archives (Wayback Machine and archive.is)
in profile URL report blocks, allowing users to see historical snapshots
of profile pages when they still exist but are no longer accessible.

Closes soxoj#247

Expand Cloudflare/anti-bot detection to catch more blocking patterns
and provide clearer error messages for each type.

Addresses the 'filter by errors' TODO from submit mode improvements.

When a site blocks automated access, include the HTTP status codes
from the existing/non-existing account responses in the error message
for easier debugging.

@yyq1043-cloud

Contributor Author

Hi! I'd like to request review for this PR: Improve blocking pattern detection in submit mode
Requesting review from: @soxoj
Thank you! 🙂

@soxoj

soxoj commented Jun 29, 2026

Owner

Thanks for the PR! A few blockers before this can be merged:

  1. Scope mismatch. The title and #2668 (Complete --submit mode: urlProbe, activation, cookies, status_codes, update-existing) say "submit mode blocking detection", but 2 of 3 files (simple_report.tpl, simple_report_pdf.tpl) add web.archive.org / archive.is links to HTML/PDF reports, which is unrelated. Please split into two PRs.

  2. Wrong pattern. "CF-Chl-Alg-List:" is an HTTP header name, but first_html_response is the response body, so headers won't appear there and this check never fires. Also, "Now checking your browser" is labeled "Cloudflare turnstile", but that string is the old interstitial challenge page; Turnstile is a widget with different markers.

  3. Silent regression. The original "\t\t\t\tnow: " pattern is dropped without explanation; please keep it or justify removal.

  4. No tests for the new patterns.

  5. English-only markers. Many sites in maigret are non-English; consider whether matching only English strings is enough, or if these need locale-agnostic signals.
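
A rough sketch of what points 2 to 4 could look like in practice: header names are matched against the response headers, body markers against the body, and the original "\t\t\t\tnow: " marker is kept. The function `find_block_signal` and both marker tuples are hypothetical names for illustration, not maigret's actual code. The header name is the one quoted above and is assumed, not verified, to be a real blocking signal.

```python
from typing import Mapping, Optional

# Substrings searched for in the response body (all quoted in this thread).
BODY_MARKERS = (
    "/cdn-cgi/challenge-platform",
    "\t\t\t\tnow: ",
    "Sorry, you have been blocked",
)

# Header names, lowercased; checked against headers, never against the body.
HEADER_MARKERS = ("cf-chl-alg-list",)


def find_block_signal(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Return a short tag naming the first blocking signal found, else None."""
    header_names = {name.lower() for name in headers}
    for name in HEADER_MARKERS:
        if name in header_names:
            return f"header:{name}"
    for marker in BODY_MARKERS:
        if marker in body:
            return f"body:{marker.strip()}"
    return None


def test_header_is_not_matched_in_body():
    # The header name appearing as body text must not count as a signal.
    assert find_block_signal({}, "CF-Chl-Alg-List: 1") is None


def test_original_now_marker_is_kept():
    assert find_block_signal({}, "\t\t\t\tnow: 1700000000") == "body:now:"
```

Header-based signals are independent of the page language, which partly addresses point 5; body strings would still need per-locale variants or a different signal.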

@soxoj soxoj marked this pull request as draft June 29, 2026 13:50
@soxoj soxoj changed the title from "Improve blocking pattern detection in submit mode" to "[WIP] Improve blocking pattern detection in submit mode" Jun 29, 2026
@yyq1043-cloud

Contributor Author

Hi! I'd like to request review for this PR: [WIP] Improve blocking pattern detection in submit mode
Requesting review from: @soxoj
Thank you! 🙂

