dev101.io

URL Parser

Break URLs into parts — scheme, host, port, path, query, fragment, IDN, and PSL.

Runs online in your browser · No install · No account · Free


How to use URL Parser

  1. Paste a URL into the top input — absolute, relative, with credentials, or with an IPv6 literal host.
  2. Click Add base URL if the URL is relative or protocol-relative, then paste an absolute base like `https://example.com`.
  3. Read the Cleaned URL card at the top for the canonical form after WHATWG normalisation.
  4. Scan the Parts card for protocol, origin, host, hostname, port, pathname, hash, and credentials.
  5. Check the Domain breakdown row for the subdomain / SLD / TLD split and any IDN / punycode pair.
  6. Inspect each path segment for raw, decoded, and re-encoded forms — segments with a `re-encodes` badge change under round-trip.
  7. Edit the Query parameters table to tweak keys, values, duplicates, or ordering; the Cleaned URL updates on every keystroke.
  8. Copy any field with its per-row Copy button, or hit Share to produce a link that restores the whole session.

URL Parser

A client-side URL inspector. Paste any URL — absolute, relative, with credentials, with an IPv6 literal host, with a punycode IDN — and see every component split out with per-field copy buttons, an editable query-parameter table, and a live Cleaned URL at the top. Runs in your browser, never touches a server, uses only the WHATWG URL and URLSearchParams globals plus a hand-rolled RFC 3492 punycode decoder for IDN display.

Why a dedicated URL parser

Every time you debug an OAuth redirect, a tracking-link misconfiguration, or a CDN rewrite rule, you end up writing the same one-liner: `const u = new URL(x); u.searchParams.getAll("utm_source"); u.pathname.split("/")`. It works, but silent bugs creep in. Is localhost a domain or a special case? Is the port 0 or an empty string? Did that trailing slash become an empty last segment or get stripped? Is xn--something a punycode label, and what's the Unicode form? The built-in URL parser hands you the raw fields but leaves you to answer these questions by hand, every time.
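Some of these questions can be checked directly in a console. The following is a minimal sketch with a made-up URL; the variable names are illustrative and not part of the tool.

```javascript
// Hypothetical URL chosen only to illustrate the gotchas.
const probe = new URL("https://example.com:443/a/b/?utm_source=x&utm_source=y");

// The default port for https is reported as an empty string, not "443" or 0.
const port = probe.port;

// A trailing slash leaves an empty last segment; the leading slash leaves an empty first one.
const segments = probe.pathname.split("/");

// getAll returns every value for a key; get would return only the first.
const sources = probe.searchParams.getAll("utm_source");

console.log({ port, segments, sources });
```

Here `port` is `""`, `segments` is `["", "a", "b", ""]`, and `sources` is `["x", "y"]`.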

The URL Parser tool runs the full pipeline in one go. It classifies the host as domain, ipv4, or ipv6. It splits domains against a built-in mini-PSL so sub.example.co.uk correctly shows co.uk as the TLD and example as the SLD. It percent-decodes every path segment and compares against the re-encoded form, flagging segments that the browser would re-normalise. It groups duplicate query keys so ?a=1&a=2&a=3 shows three rows with an ×3 badge, and it lets you edit those rows live — including reordering — with the Cleaned URL staying in sync on every keystroke. The whole state round-trips through the URL fragment, which browsers never transmit to servers, so a shareable parse of a gnarly production link never exposes the link itself.

What it does

  • Component breakdown. Protocol, origin, host (with port), hostname, port, pathname, hash — every WHATWG URL field with a one-click copy button.
  • Credential handling. Percent-decodes username and password into a redacted display; a Reveal toggle shows plaintext on demand, Hide masks it again.
  • Host classification. Marks IPv4 (127.0.0.1), IPv6 ([::1]:80), and domain hosts with distinct badges. Empty hosts (data URLs) are flagged; opaque schemes like mailto: and javascript: are recognised.
  • IDN / punycode. Shows both Unicode and punycode forms for hosts with xn-- labels via a built-in RFC 3492 decoder. Unicode input is displayed in its ASCII-compatible encoding (what the DNS sees) next to the human form (what you typed).
  • PSL split. Extracts subdomain / SLD / TLD using ~80 curated two-label suffixes. Falls back to single-label TLDs for .com, .io, .dev, and intranet hosts like localhost.
  • Path segments. Lists each segment with raw, decoded, and re-encoded forms. A warning badge appears when re-encoding differs from raw — a common CDN-routing landmine.
  • Editable query table. Key, value, reorder, delete, add — every edit round-trips through URLSearchParams so duplicate keys and special characters are preserved. A per-row dup badge (×3) surfaces repeated keys.
  • Relative resolution. Provide a base URL and relative, root-relative, or protocol-relative inputs resolve using the WHATWG algorithm — the same one <a href> uses.
  • Cleaned URL. The canonical output after new URL(...) lowercases the scheme and host, strips default ports, and normalises percent-encoding. Copy with one click.
  • Shareable state. Input URL and optional base round-trip through the hash fragment, so shared links restore the whole session without hitting a server.
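As an illustration of the Cleaned URL normalisation, the sketch below feeds a deliberately messy, made-up URL to the WHATWG parser. It shows only what the native `URL` constructor does, not the tool's own code.

```javascript
// Hypothetical input with an uppercase scheme and host, an explicit default
// port, dot segments in the path, and a raw space in the query.
const messy = "HTTPS://Example.COM:443/a/./b/../c?q=a b#Frag";

// The parser lowercases scheme and host, drops the default port,
// resolves the dot segments, and percent-encodes the space.
const cleaned = new URL(messy).href;

console.log(cleaned); // https://example.com/a/c?q=a%20b#Frag
```

The fragment keeps its original case; only the scheme and host are case-folded.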

Under the hood

The transform is a thin layer over two browser primitives: the WHATWG URL parser and URLSearchParams. Both have been living-standard since 2012 and ship in every evergreen browser, so the tool inherits the same parse behaviour as fetch(), <a href>, and window.location. That is load-bearing: a URL parser that disagrees with the browser's own parser lies to you right when you need the truth.

On top of that layer, the tool adds four enrichment passes:

  • Host classification. A regex for IPv4 dotted-quads and a check for the bracketed […] IPv6 literal form run against URL.hostname. Anything else is a domain; an empty hostname (data:, javascript:, some file: URLs) is flagged empty.
  • Mini Public Suffix List. A curated table of ~80 two-label suffixes — co.uk, com.au, co.jp, com.br, co.in, co.kr, com.cn, gouv.fr, gc.ca, and friends — is checked against the last two labels. On a hit, the next label becomes the SLD; on a miss, the last label is the TLD. Classifies most real-world hostnames without the ~250 KB Mozilla PSL.
  • Query decoding. URLSearchParams is iterated directly to preserve duplicate keys — URL.searchParams.get("k") returns only the first value and silently swallows multiplicity bugs. Entries keep original order and are also grouped by key so the UI can badge duplicates while you edit each occurrence.
  • Punycode decoding. A minimal RFC 3492 decoder runs per label on any xn-- prefix, so mixed-script hostnames render the Unicode form of each label while leaving ASCII labels untouched. No external library, no Intl dependency, no network.
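The first two passes can be sketched roughly as follows. This is an approximation under stated assumptions: the suffix set is a four-entry stand-in for the tool's ~80-entry table, and `classifyHost`, `splitDomain`, and the regex are hypothetical names, not the tool's actual source.

```javascript
// Illustrative subset only; the tool's real table is not reproduced here.
const TWO_LABEL_SUFFIXES = new Set(["co.uk", "com.au", "co.jp", "com.br"]);
// Loose dotted-quad shape check (assumption: no per-octet range validation).
const IPV4_RE = /^\d{1,3}(\.\d{1,3}){3}$/;

// Classify the value of URL.hostname.
function classifyHost(hostname) {
  if (hostname === "") return "empty";
  if (hostname.startsWith("[") && hostname.endsWith("]")) return "ipv6";
  if (IPV4_RE.test(hostname)) return "ipv4";
  return "domain";
}

// Split a domain into subdomain / SLD / TLD using the mini suffix table.
function splitDomain(hostname) {
  const labels = hostname.split(".");
  // On a hit the last two labels form the TLD; on a miss only the last one does.
  const suffixSize = TWO_LABEL_SUFFIXES.has(labels.slice(-2).join(".")) ? 2 : 1;
  const tld = labels.slice(-suffixSize).join(".");
  const sld = labels.length > suffixSize ? labels[labels.length - suffixSize - 1] : null;
  const rest = labels.slice(0, Math.max(0, labels.length - suffixSize - 1));
  return { subdomain: rest.join(".") || null, sld, tld };
}

console.log(classifyHost(new URL("http://[::1]:80/").hostname)); // ipv6
console.log(splitDomain("sub.example.co.uk"));
```

For `sub.example.co.uk` the sketch yields subdomain `sub`, SLD `example`, and TLD `co.uk`; for `localhost` it yields a TLD of `localhost` with no SLD, which is one possible reading of the single-label fallback.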

Path segments get careful treatment. /caf%C3%A9/list splits into ["caf%C3%A9", "list"]. Each is decoded with decodeURIComponent (guarded for malformed sequences), then re-encoded with encodeURIComponent. When the re-encoded form differs from the raw one (+ standing in for a space, lowercase hex, legacy encodings), the segment gets a re-encodes badge. That warning saves thirty minutes of tracing CDN 404s when a legacy encoding survives the cache key but fails the origin lookup.
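A minimal sketch of that round-trip check follows. The function name `inspectSegments`, the returned field names, and the choice to drop only the leading empty segment are assumptions made for illustration.

```javascript
// Decode and re-encode each path segment, flagging those that change.
function inspectSegments(pathname) {
  return pathname
    .split("/")
    .slice(1) // drop the empty string before the leading slash
    .map((raw) => {
      let decoded = null;
      try {
        decoded = decodeURIComponent(raw);
      } catch {
        // Malformed sequence such as a lone %E9: leave decoded as null.
      }
      const reencoded = decoded === null ? null : encodeURIComponent(decoded);
      return { raw, decoded, reencoded, reencodes: reencoded !== null && reencoded !== raw };
    });
}

console.log(inspectSegments("/caf%C3%A9/list")); // neither segment is flagged
console.log(inspectSegments("/a+b/caf%c3%a9"));  // both segments are flagged
```

In the second call, `a+b` re-encodes to `a%2Bb` and `caf%c3%a9` re-encodes to `caf%C3%A9`, so both differ from their raw form.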

Credentials are percent-decoded and masked by default; Reveal opts into plaintext without persisting outside component state. The cleaned URL keeps credentials verbatim because silently stripping them would change the URL's meaning.

Privacy

URLs in production logs sit next to trace IDs, session cookies, user IDs, and signed-token query parameters that are effectively bearer credentials. A server-side URL parser would receive every paste in its request logs, so debugging a production redirect could leak the signed JWT in ?auth=… to whoever runs the site. This tool runs entirely in browser JavaScript against the native URL API. No fetch, no XHR, no analytics on inputs, no third-party scripts between your clipboard and the output grid. The Share button encodes state into the URL fragment (#s=…), which browsers never transmit to servers by design. Close the tab and the inputs are gone — no local storage, no IndexedDB, no service-worker cache.

What's not supported (yet)

  • Full Mozilla Public Suffix List. The mini-PSL covers common two-label country-code suffixes; anything else falls back to a single-label TLD. Use tldts in Node for the full list.
  • data: URL inner parsing. Data URLs are marked opaque; the embedded MIME type and body are not decoded here — use the Base64 and URL Encode tools.
  • mailto: / javascript: opaque schemes. They parse and surface href and protocol, but get no structured host/path breakdown — there is no host or path to split.
  • Matrix parameters. Semicolon-separated path params (;jsessionid=…) are not split; handle them in application code.
  • Signed-URL verification. AWS SigV4, GCS signed URLs, Azure SAS — parsed and editable, but signatures are not verified or regenerated.

Related tools

  • URL Encode / Decode — percent-encode or decode individual URL values before pasting back into the parser.
  • Fix Protocol — normalise inputs missing a scheme (www.example.com, //example.com) before parsing.
  • Slugify — convert human strings into canonical ASCII-safe path segments so the parser doesn't flag a re-encodes warning on every segment.
  • JWT Decode — inspect JSON Web Tokens that travel in query parameters like ?auth=… or ?id_token=….
  • JSON Formatter — pretty-print JSON values extracted from a ?state= parameter.

Frequently asked questions

How is this different from just calling new URL() in the console?

The WHATWG `URL` constructor gives you the raw fields — protocol, host, pathname, search, hash — but stops there. It doesn't split the path into segments, doesn't enumerate duplicate query keys as grouped rows, doesn't tell you whether the hostname is IPv4, IPv6, or a domain, doesn't run a public-suffix-list split into subdomain/SLD/TLD, and doesn't surface the Unicode display form next to the punycode one. The URL Parser tool runs all of that on top of `new URL()` so you can see every component at a glance, round-trip edits in place, and copy any single field without selecting text out of the browser console output.

Does it support internationalised domain names (IDN)?

Yes. When you paste a Unicode hostname like `https://例え.jp`, the WHATWG `URL` parser converts it to punycode (`xn--r8jz45g.jp`) — that's how it gets sent on the wire and that's what the tool stores in the cleaned URL. The tool also ships a built-in RFC 3492 punycode decoder, so when a hostname contains `xn--` labels the Unicode display form is shown next to the ASCII form with an `idn` badge. No external library required, no server round-trip, and the decoder runs per label so mixed-script hostnames like `shop.xn--80ak6aa92e.com` render each label correctly.
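For readers curious what a per-label decoder looks like, here is a compact sketch of the RFC 3492 decoding procedure. It is not the tool's source: the function names are hypothetical, and the overflow checks the RFC recommends are omitted for brevity.

```javascript
// Bootstring parameters for Punycode, from RFC 3492 section 5.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Digit values: a-z and A-Z map to 0..25, 0-9 map to 26..35, anything else is invalid.
function digitValue(code) {
  if (code >= 0x30 && code <= 0x39) return code - 0x30 + 26;
  if (code >= 0x41 && code <= 0x5a) return code - 0x41;
  if (code >= 0x61 && code <= 0x7a) return code - 0x61;
  return -1;
}

// Bias adaptation, RFC 3492 section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : Math.floor(delta / 2);
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - T_MIN) * T_MAX) / 2) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Decode one label with the "xn--" prefix already removed.
function decodePunycodeLabel(input) {
  const output = []; // code points
  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  const basic = Math.max(input.lastIndexOf("-"), 0);
  // Everything before the last delimiter is copied as literal ASCII.
  for (let j = 0; j < basic; j++) output.push(input.charCodeAt(j));
  let pos = basic > 0 ? basic + 1 : 0;
  while (pos < input.length) {
    const oldI = i;
    // Read one variable-length integer into i.
    for (let w = 1, k = BASE; ; k += BASE) {
      if (pos >= input.length) throw new RangeError("truncated punycode input");
      const digit = digitValue(input.charCodeAt(pos++));
      if (digit < 0) throw new RangeError("invalid punycode digit");
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const len = output.length + 1;
    bias = adapt(i - oldI, len, oldI === 0);
    n += Math.floor(i / len);
    i %= len;
    output.splice(i++, 0, n); // insert code point n at position i
  }
  return String.fromCodePoint(...output);
}

// Decode only the xn-- labels of a hostname, leaving ASCII labels untouched.
function hostToUnicode(hostname) {
  return hostname
    .split(".")
    .map((label) => (label.toLowerCase().startsWith("xn--") ? decodePunycodeLabel(label.slice(4)) : label))
    .join(".");
}

console.log(hostToUnicode("xn--r8jz45g.jp")); // 例え.jp
```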

Why don't you use the full Mozilla Public Suffix List?

The full PSL is a ~250 KB text file with thousands of entries and its own licensing considerations. For a privacy-first browser tool, shipping it would blow the bundle budget for a feature most users don't need. The URL Parser instead carries a curated table of about 80 common two-label suffixes (`co.uk`, `com.au`, `co.jp`, `com.br`, and friends), which correctly classifies the vast majority of real-world URLs. Anything else falls back to a one-label TLD, which is the right answer for `example.com`, `acme.io`, and other hosts registered directly under a single-label public suffix. If you need the full PSL, the Node ecosystem has `tldts` and `psl`; we deliberately don't bundle them.

Can I edit the query parameters in place?

Yes. The Query parameters table lets you edit any key or value, delete rows, reorder them with the up/down arrows, and add new rows at the bottom. Every edit re-assembles the URL live using `URLSearchParams` — which means duplicate keys are preserved, special characters are re-encoded correctly, and the Cleaned URL at the top stays in sync. When you're done, hit Copy cleaned URL to grab the result, or Share to produce a link that restores the whole state.
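The same kind of edit can be reproduced by hand with `URLSearchParams`. The sketch below is an approximation of the idea, using a made-up URL and treating the table rows as a plain array of `[key, value]` pairs; it is not the tool's implementation.

```javascript
// Hypothetical URL with a duplicate key and an encoded space.
const editable = new URL("https://example.com/search?tag=js&tag=ts&q=a%20b");

// Each table row is one [key, value] pair, in original order.
const rows = [...editable.searchParams.entries()];

rows[1][1] = "go";         // edit the second tag value
rows.push(["page", "2"]);  // add a row at the bottom
rows.unshift(rows.pop());  // reorder: move the new row to the top

// Re-assemble the query; duplicate keys survive because rows stay a list.
editable.search = new URLSearchParams(rows).toString();

console.log(editable.href);
// https://example.com/search?page=2&tag=js&tag=go&q=a+b
```

Note that `URLSearchParams` serialises a space as `+`, so `q=a%20b` comes back as `q=a+b`. Both decode to the same value in a query string.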

How do I get query params out of a URL with this tool?

Paste the URL into the top input. The Query parameters table below the Parts card enumerates every `key=value` pair in original order, with a per-row Copy button for the value. Duplicate keys like `?tag=js&tag=ts&tag=go` appear as three separate rows with an `×3` badge so you can see the multiplicity without running `searchParams.getAll`. If you only want one value, click its row; if you want the full query string, copy the `search` field from the Parts card; if you want the object form, inspect the grouped view and copy the JSON-like structure row by row.
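Outside the tool, the grouped view can be approximated in a few lines. This is a sketch only; `groupParams` is a hypothetical helper name.

```javascript
// Group query values by key while keeping first-seen key order.
function groupParams(url) {
  const groups = new Map();
  for (const [key, value] of new URL(url).searchParams) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

const grouped = groupParams("https://example.com/?tag=js&tag=ts&tag=go&page=1");
for (const [key, values] of grouped) {
  // Prints "tag ×3: js, ts, go" and then "page ×1: 1".
  console.log(`${key} ×${values.length}: ${values.join(", ")}`);
}
```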

Does the tool handle relative URLs?

Yes, with a base URL. Click Add base URL to reveal the second input, paste an absolute URL like `https://example.com/app/` into it, and then type `../admin?x=1` into the top input. The WHATWG resolver applies the standard algorithm — path-relative references climb the base path, root-relative references replace it, protocol-relative references inherit the scheme — and the resulting absolute URL appears in the Cleaned URL card. This is the exact resolution `<a href>` uses in the browser, so what you see is what a click would produce.
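The three cases can be reproduced with the two-argument `URL` constructor. The base matches the example above; the `/login` and `cdn.example.net` inputs are made up for illustration.

```javascript
const base = "https://example.com/app/";

// Path-relative: climbs out of /app/ before appending.
const pathRelative = new URL("../admin?x=1", base).href;

// Root-relative: replaces the whole base path.
const rootRelative = new URL("/login", base).href;

// Protocol-relative: keeps only the base scheme.
const protocolRelative = new URL("//cdn.example.net/x.js", base).href;

console.log(pathRelative);     // https://example.com/admin?x=1
console.log(rootRelative);     // https://example.com/login
console.log(protocolRelative); // https://cdn.example.net/x.js
```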

What happens with userinfo credentials like https://user:pass@host?

They are parsed, percent-decoded, and shown in the Credentials row of the Parts card, but masked by default. Click Reveal to see the plaintext form; click Hide to mask it again. The cleaned URL preserves the credentials verbatim because stripping them would silently change the URL's meaning — a common source of 401s when people copy a URL out of a debugging tool. If you want a credential-free copy, open the cleaned URL, remove the `user:pass@` segment, and paste it back in; the tool re-parses instantly.
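If you would rather strip credentials in code than by hand, the `URL` setters can do it. A short sketch with a made-up URL, independent of the tool:

```javascript
// Hypothetical URL whose password contains an encoded "@".
const withCreds = new URL("https://user:p%40ss@example.com/path");

// The getters return the percent-encoded form, so decode for display.
const username = decodeURIComponent(withCreds.username);
const password = decodeURIComponent(withCreds.password);

// Clearing both fields removes the userinfo part from the serialised URL.
withCreds.username = "";
withCreds.password = "";

console.log(username, password); // user p@ss
console.log(withCreds.href);     // https://example.com/path
```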
