Skip to content

fix: support CONNECT and absolute-form parsing with http >= 1.4.1#912

Open
SameerVers3 wants to merge 1 commit into
cloudflare:mainfrom
SameerVers3:fix/http-parsing-bug
Open

fix: support CONNECT and absolute-form parsing with http >= 1.4.1#912
SameerVers3 wants to merge 1 commit into
cloudflare:mainfrom
SameerVers3:fix/http-parsing-bug

Conversation

@SameerVers3

Copy link
Copy Markdown

Fix HTTP/1 CONNECT and absolute-form request parsing failures when using http crate >= 1.4.1.

Changes:

  • Parse non-origin targets as a full Uri in set_raw_path().
  • Safely handle empty paths in raw_path() to avoid panics.

Fixes #909

@SameerVers3 SameerVers3 force-pushed the fix/http-parsing-bug branch from 55667e9 to 41fef7e Compare June 14, 2026 08:24
@torinnd

torinnd commented Jun 15, 2026

Copy link
Copy Markdown

(with a similar caveat as my issue re: LLM-assistance, plus me not having access to patched_http1)

Thanks for picking this up so quickly — the parse_uri form-detection approach is cleaner than what I'd been carrying locally, and I've adopted it.

One note about the non-UTF-8 path: The lossy branch now routes through parse_uri, so a target that isn't valid UTF-8 and doesn't start with / goes to str::parse::<Uri>() instead of path_and_query(). Since Uri::from_str treats a no-scheme/no-slash string as authority-form rather than a path, I think this changes the behavior the two patched_http1-gated tests rely on:

let raw_path = b"Hello\xF0\x90\x80World"; // invalid UTF-8, no leading slash
let req = RequestHeader::build("GET", &raw_path[..], None).unwrap();
assert_eq!("Hello\u{FFFD}World", req.uri.path_and_query().unwrap());

Previously path_and_query() (with the patched http crate) produced a path-and-query of Hello\u{FFFD}World. With parse_uri, the lossy string "Hello\u{FFFD}World" has no scheme and no leading slash, so parse::<Uri>() should parse it as authority-form (or reject it), making path_and_query() return None and the .unwrap() panic. So I'd expect test_invalid_path and test_override_invalid_path to fail under --features patched_http1.

I can't verify directly since I'm on the stock http. Is patched_http1 exercised in CI here? And is the reinterpretation of non-slash lossy targets as authority-form intended (in which case those two tests probably want updating) or should the lossy branch keep using path_and_query() to preserve the old path-form behavior?

@SameerVers3

Copy link
Copy Markdown
Author

Yes, you are right, I missed that. Thanks for pointing it out!

I'll keep using the original path_and_query() builder for the lossy branch to preserve the path-form behavior.

@SameerVers3 SameerVers3 force-pushed the fix/http-parsing-bug branch from 41fef7e to 507cbad Compare June 15, 2026 20:37
@torinnd

torinnd commented Jun 16, 2026

Copy link
Copy Markdown

Yeah, that seems more correct! One nit, I think this is now a bit stale:

// store the raw path bytes only if it is invalid utf-8
raw_path_fallback: Vec<u8>, // can also be Box<[u8]>

Maybe it's "raw target bytes used when serializing, set for non-UTF-8 paths and for URIs with no path-and-query (authority-form / absolute-form without a path)"?

@SameerVers3

Copy link
Copy Markdown
Author

Good catch! I will update the comment

@SameerVers3 SameerVers3 force-pushed the fix/http-parsing-bug branch from 507cbad to 066363f Compare June 16, 2026 16:31
@duke8253 duke8253 added the bug Something isn't working label Jun 19, 2026
@torinnd

torinnd commented Jun 22, 2026

Copy link
Copy Markdown

@andrewhavck fyi, I think the CI failures in this feature are unrelated. If it's interesting, I've sicced an LLM with context of the codebase on the issue and it said the following:

It's a teardown ordering bug in the test, not a defect in read_body_or_idle. Request and response are the two halves of stream 1. The server writes its response with end_of_stream = false and never sends a response EOS — it only calls read_body_or_idle to wait for the client's request EOS. The client reads one response DATA frame, sends its request EOS, then returns, dropping the still-open response RecvStream. Dropping an open RecvStream makes the h2 client emit RST_STREAM(CANCEL) on stream 1, which resets the whole stream and races the server's read — so read_body_or_idle intermittently observes Reset(StreamId(1), CANCEL, Remote) instead of the clean empty-DATA-EOS and the .unwrap() panics.

Surfacing a remote CANCEL as an error is correct, so the fix belongs in the test: hold the client's stream open until the server has finished reading the request EOS (e.g. a shared tokio::sync::Notify signalled after read_body_or_idle returns), so teardown happens strictly after the read. I haven't reproduced it locally — the analysis is from reading the test — so worth confirming before landing.

I don't do much Rust development, but I've ran into similar test tear-down races in similar codebases in other languages and think the assessment is plausible.

@CodyPubNub

Copy link
Copy Markdown
Contributor

Ran into the same http 1.4.1 breakage and put up #923 as an alternative, but a few things land differently and I figured they're worth comparing:

  • No-path absolute-form: the raw_path_fallback here re-emits http://host on the wire, but §3.2.1 wants origin-form to send at least /. Mine normalizes to / and keeps the authority on the Uri so callers can still get at it.
  • CONNECT: parse::<Uri>() also takes host-only (example.com) and userinfo (user@host:port). §3.2.3 is just uri-host ":" port, so those should get rejected.
  • http://host?q: worth canonicalizing the stored Uri, not only the wire bytes, since proxy_h2 rebuilds the H2 :path straight off uri.path_and_query().

Re the lossy branch above, I agree keeping it on path_and_query() is the right call. One note: test_invalid_path/test_override_invalid_path already fail at base on http ≥ 1.4.1 (non-UTF-8, no leading slash), so that's pre-existing either way.

Either approach works for me. I'm happy to fold into whichever the maintainers want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP/1 CONNECT and absolute-form requests fail to parse with http crate >= 1.4.1 (set_raw_path rejects non-origin-form request targets)

5 participants