It is incorrect to "normalize" // in HTTP URL paths

The claim
The argument is that collapsing // to / inside an HTTP URL path is not normalization. RFC 3986, the generic URI syntax, allows empty path segments, so a double slash literally encodes a zero-length segment between two separators. In plain English: // is not a typo. It's syntactically meaningful.
The technical heart of the matter
RFC 3986 defines segment = *pchar, which permits an empty segment, and path-abempty = *( "/" segment ), which therefore allows a slash followed by an empty segment. Dot-segments ('.' and '..') are the only special cases: they exist for hierarchical reference resolution and are removed during it. All other path segments, including empty ones, are opaque to the generic syntax. HTTP (RFC 9110) reuses the same segment grammar for its URIs and request targets. So any transformation that collapses // to / changes the parsed sequence of segments and thus changes the identifier the URI denotes. Simple as that.
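To make the segment argument concrete, here is a minimal Python sketch that splits a path into the segments the grammar describes. The helper name path_segments and the example.com URLs are illustrative assumptions, not something from the RFCs or the original post; urlsplit is the standard-library parser and leaves the path untouched.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list[str]:
    """Return the path segments of a URL, keeping empty segments.

    Hypothetical helper for illustration: it reads the path as
    zero or more ("/" segment) pairs, as in path-abempty.
    """
    path = urlsplit(url).path
    if not path:
        return []
    # Drop the "/" that introduces the first segment, then split on the rest.
    return path[1:].split("/")


single = path_segments("http://example.com/a/b")
double = path_segments("http://example.com/a//b")
print(single)  # two segments
print(double)  # three segments, the middle one empty
```

Under this reading, /a/b has two segments while /a//b has three, one of them zero-length, so the two paths are different sequences rather than two spellings of the same one.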
Why you should care
Because collapsing empty segments can change which resource a request targets. Caches, reverse proxies, CDNs, routers and origin servers may treat /a//b and /a/b differently, and if one layer collapses the slashes while another does not, the layers disagree about what was requested. That can break redirects, invalidate caches, create security oddities, or just lead to subtle bugs that are maddening to debug. Think “not the same page” rather than “cleaner URL.”
Practical takeaway
Don’t apply file‑system intuition here. URL path normalization should follow the RFCs: remove dot‑segments where the resolution rules require it, but don’t erase zero‑length segments by fiat. If you run web infrastructure or write URL handling code, test how your stack treats consecutive slashes and consult RFC 3986 and RFC 9110 before you start rewriting paths.
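As a sketch of what RFC-shaped normalization can look like, the function below follows the remove_dot_segments procedure from RFC 3986 section 5.2.4 as I read it. The Python function name and structure are my own, and this is an illustration rather than a vetted implementation. Note that it never collapses consecutive slashes on its own account.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4.

    Illustrative sketch only: empty segments are treated like any
    other segment and are never collapsed.
    """
    output: list[str] = []
    while path:
        if path.startswith("../"):          # step A
            path = path[3:]
        elif path.startswith("./"):         # step A
            path = path[2:]
        elif path.startswith("/./"):        # step B
            path = path[2:]
        elif path == "/.":                  # step B
            path = "/"
        elif path.startswith("/../"):       # step C
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                 # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):           # step D
            path = ""
        else:                               # step E: move one segment over
            start = 1 if path.startswith("/") else 0
            nxt = path.find("/", start)
            if nxt == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:nxt])
                path = path[nxt:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))  # example from the RFC
print(remove_dot_segments("/a//b"))             # double slash survives
```

The first call reproduces the RFC's own example, which resolves to /a/g. The second returns /a//b unchanged, which is the point: dot-segment removal is the only path rewriting the generic syntax prescribes.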
Sources: runxiyu.org, Hacker News