TransWikia.com

Which special characters are safe to use in a URL?

Webmasters Asked on December 15, 2021

5 Answers

RFC 2396 is actually obsolete and was superseded by RFC 3986.

The unreserved special characters (other than letters and digits) that are safe to use without encoding are:

- . _ and ~
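As a quick check of that list, here is a minimal sketch using Python's standard `urllib.parse.quote` (assuming Python 3.7 or later, where `~` is treated as unreserved). The sample strings are made up for illustration.

```python
from urllib.parse import quote

# RFC 3986 unreserved punctuation: hyphen, period, underscore, tilde.
UNRESERVED_PUNCTUATION = "-._~"

# safe="" asks quote() to escape everything it can, including "/".
encoded_unreserved = quote(UNRESERVED_PUNCTUATION, safe="")
encoded_other = quote("a b/c?d", safe="")

print(encoded_unreserved)  # the four characters pass through unchanged
print(encoded_other)       # space, "/" and "?" become %20, %2F and %3F
```

Letters and digits pass through the same way; everything else is percent-encoded unless it is listed in `safe`.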

Answered by alds on December 15, 2021

This question popped up first, of course, when I googled "URL safe characters", as most people would. I think it's worth putting up a straightforward answer to a concise question. From the horse's... ugh, RFC2396's... I mean, Sir Timothy's mouth:

2.3. Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

   Unreserved characters can be escaped without changing the semantics
   of the URI, but this should not be done unless the URI is being used
   in a context that does not allow the unescaped character to appear.

"Upper and lower case letters" in this context are understood as defined earlier, in section 1.6 of the same standard:

The following definitions are common to many elements:

   alpha    = lowalpha | upalpha

   lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
              "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
              "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

   upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
              "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
              "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

   digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
              "8" | "9"

   alphanum = alpha | digit

So the answer is, URL-safe characters are good old ASCII-7 Latin characters A through Z in lower and upper case, decimal digits 0 through 9, and a handful of non-alphanumerics explicitly enumerated in the mark production rule of the grammar in Sec. 2.3.
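For what it's worth, the grammar quoted above is small enough to turn into a checker. This is only a sketch of the RFC2396 `unreserved` production; the function names `is_rfc2396_unreserved` and `unsafe_chars` are made up for illustration and are not part of any library.

```python
import string

# "mark" production from RFC2396, Sec. 2.3, as quoted above.
RFC2396_MARK = "-_.!~*'()"

# unreserved = alphanum | mark, with alphanum = alpha | digit (Sec. 1.6).
RFC2396_UNRESERVED = frozenset(
    string.ascii_letters + string.digits + RFC2396_MARK
)


def is_rfc2396_unreserved(text):
    """Return True if every character of text is in the unreserved set."""
    return all(ch in RFC2396_UNRESERVED for ch in text)


def unsafe_chars(text):
    """Return the sorted, de-duplicated characters outside the set."""
    return sorted({ch for ch in text if ch not in RFC2396_UNRESERVED})


print(is_rfc2396_unreserved("my-file_v2.0(final)"))
print(unsafe_chars("array[content] with spaces"))
```

The first call accepts the name because parentheses are in `mark`; the second reports the space and the square brackets.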


If the question is specifically about HTTP/HTTPS URLs (note that RFC2396 defines the URI), the semantics of the RFC2396 syntax as resource locators for the HTTP[S] protocol are currently standardised by RFC7230, Sec. 2.7. Nevertheless, it would not be future-proof to infer that the set of "URL-safe" characters is larger than the one defined by RFC2396 merely because RFC7230 Sec. 2.7 does not treat the extra characters specially; a future RFC7230 update may ascribe semantics to more characters outside the "URL-safe" RFC2396 set, rendering such an inference ex statu quo invalid.

TL;DR: the safest, most future-proof approach is to treat the set of URL-safe characters defined in RFC2396 as the largest possible and non-extensible, and not to extend it with characters that are currently okay/safe/non-special per RFC7230, because that may change. The RFC2396 set, in contrast, cannot.

Answered by kkm on December 15, 2021

The answers here are good, but there is one more exception I think is worth mentioning: non-English characters. Referencing this SF question here, characters like ñ (as in Español) are perfectly legitimate, IF they have been encoded in your DNS correctly.

You have to use Punycode within your DNS to get them to resolve in modern browsers (the entry for español is xn--espaol-zwa), but these are now perfectly safe to use in domain names, and they're easy for non-English speakers to type as well.
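The Punycode form quoted above can be reproduced with Python's built-in `idna` codec. Treat this as a sketch only: that codec follows the older IDNA 2003 rules, so results for some other labels may differ from what current browsers or registries use.

```python
# The label from the answer above.
label = "español"

# Encode to the ASCII-compatible form that goes into DNS.
ascii_label = label.encode("idna").decode("ascii")

# Decode it back to confirm the conversion is reversible.
round_trip = ascii_label.encode("ascii").decode("idna")

print(ascii_label)
print(round_trip)
```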

Answered by Mark Henderson on December 15, 2021

The safe characters are a-z, A-Z, 0-9, and _ - (underscore and minus), besides the reserved characters that are used for the parameters.

Other characters will cause problems to some degree. Example: if one parameter is an array, ?param=array[content], IE will show a URL with the square brackets URL-encoded, which looks ugly and is impossible to dictate.
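To see what the encoded form of that bracket example looks like, here is a small sketch with Python's `urllib.parse`. It shows generic percent-encoding only and does not model any particular browser.

```python
from urllib.parse import quote, unquote

raw_query = "param=array[content]"

# Keep "=" literal so the key/value structure survives;
# "[" and "]" are not in the safe list, so they get escaped.
encoded_query = quote(raw_query, safe="=")

# Decoding restores the original text.
decoded_query = unquote(encoded_query)

print(encoded_query)
print(decoded_query)
```

The brackets come out as `%5B` and `%5D`, which is the hard-to-read form the answer complains about.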

But the problem is not only that it's ugly. Let's say you have a JPG with a character outside the safe ones in its name: many times the browser will be unable to download it and will get a 404. This is a problem with older browsers and some mobile browsers.

How to test this?

  • Put a bunch of images/JS/CSS files with the characters you want to test in their names on a public page with many visitors
  • Make the 404 page send you an email every time it gets a hit

I have an inbox with 14000 emails proving my point.

Answered by The Disintegrator on December 15, 2021

The following characters have special meaning in the path component of your URL (the path component is everything before the '?'):

  ";" | "/" | "?"

In addition to those, the following characters have special meaning in the query part of your URL (everything after '?'). Therefore, if they are after the '?' you need to escape them:

  ":" | "@" | "&" | "=" | "+" | "$" | ","

For a more in-depth explanation, see the RFC.
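As a rough illustration of escaping those characters, the sketch below uses Python's `urllib.parse`. The host `example.com`, the path segment, and the parameter values are all made up for the example.

```python
from urllib.parse import quote, urlencode

# Hypothetical data containing the special characters listed above.
path_segment = "reports;2021/final?"
params = {"q": "a&b=c", "email": "me@example.com", "price": "$1,000+"}

# safe="" escapes ";", "/" and "?" so they stay inside one path segment.
encoded_segment = quote(path_segment, safe="")

# urlencode() escapes each key and value, then joins them with "&" and "=".
encoded_query = urlencode(params)

url = f"https://example.com/files/{encoded_segment}?{encoded_query}"
print(url)
```

Only the individual values are escaped; the `?`, `&` and `=` that structure the URL are added afterwards and stay literal.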

Answered by Thomas Bonini on December 15, 2021
