TransWikia.com

Escape a variable for use as content of another script

Unix & Linux Asked by Walf on December 31, 2021

This question is not about how to write a properly escaped string literal. I couldn’t find any related question that isn’t about how to escape variables for direct consumption within a script or by other programs.

My goal is to enable a script to generate other scripts. This is because the tasks in the generated scripts will run anywhere from 0 to n times on another machine, and the data from which they are generated may change before they’re run (again), so doing the operations directly, over a network will not work.

Given a known variable that may contain special characters such as single quotes, I need to write that out as a fully escaped string literal, e.g. a variable foo containing bar'baz should appear in the generated script as:

qux='bar'\''baz'

which would be written by appending "qux=$foo_esc" to the other lines of script. I did it using Perl like this:

foo_esc="'`perl -pe 's/('\'')/\\1\\\\\\1\\1/g' <<<"$foo"`'"

but this seems like overkill.

I have had no success in doing it with bash alone. I have tried many variations of these:

foo_esc="'${file//'/'\''}'"
foo_esc="'${file//'/'\''}'"

but either extra slashes appear in the output (when I do echo "$foo"), or they cause a syntax error (expecting further input if done from the shell).

6 Answers

TL;DR: skip to the conclusion.

While several shells/tools have builtin quoting operators some of which have already been mentioned in a few answers, I'd like to stress here that many are unsafe to use depending on:

  • what is being quoted
  • context in which the quoted string is used.
  • the locale in which the quoted output is generated
  • the locale in which that generated quoted output is later used.

Several things to consider:

  • in some contexts, it's important the empty string be represented as '' or "". For instance, if it's to be used in sh -c "cmd $quoted_output" it matters if we want what was quoted to be passed as one argument to cmd. In sh -c "var=$quoted_output; ...", it doesn't matter whether the empty string is represented as '', "" or as the empty string.

    The $var:q operator of zsh represents the empty string as the empty string, not '', "" nor $''.

    The ${var@Q} operator of bash (itself copied from mksh which behaves differently in this regard), represents an empty $var as '', but an unset $var as the empty string:

    $ empty_var= bash -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
    <''>
    <>
    $ empty_var= mksh -c 'printf "<%s>\n" "${empty_var@Q}" "${unset_var@Q}"'
    <''>
    <''>
    $ empty_var= zsh -c 'printf "<%s>\n" "${empty_var:q}" "${unset_var:q}"'
    <>
    <>
    
  • some of those quoting operators will use a combination of '...', \, "..." or $'...'. The syntax of the latter varies between shells and between versions of a given shell. So for those operators that do use it or can use it depending on the input, it's important that the result be used in the same shell (and same version thereof). That applies at least to:

    • the printf %q of GNU printf, bash, ksh93, zsh
    • zsh's $var:q, ${(q)var}, ${(q+)var}, ${(qqqq)var},
    • mksh's ${var@Q}
    • bash's ${var@Q},
    • the typeset/declare/export -p output of ksh93, mksh, zsh
    • the alias/set output of bash, ksh93, mksh, zsh
    • the xtrace output of ksh93, mksh, zsh

    In any case $'...' is not (yet¹) a standard sh quoting operator, and beware that non-Bourne-like shells such as rc, es, akanga, fish have completely different quoting syntax. There is simply no way to quote a string in a way that is compatible with every shell in existence (though see this other Q&A for some ways to work around it).

  • some shells decode their input as characters before interpreting the code in it, some don't, and some do it sometimes, and sometimes not.

    Some shells (like bash) also make their syntax conditional on the locale. For instance, token delimiters in the syntax are the characters considered as blanks in the locale in yash and bash (though in bash, that only works for single-byte ones). Some shells also rely on the locale's character classification to decide what characters are valid in a variable name. So for instance Stéphane=1 could be interpreted as an assignment in one locale, or as the invocation of the Stéphane=1 command in another.

    The sequence of bytes 0xa3 0x5c represents the £\ string in the ISO-8859-1 (aka latin1) character set, the α character in BIG5, or an invalid sequence of bytes in UTF-8. \ happens to be a special character in the shell syntax, including within "..." and $'...'. ` is also a (dangerous) character whose encoding can be found in the encoding of other characters in some locales.

    Byte 0xa0 is the non-breaking-space character in a great number of single-byte character sets and that character is considered as blank in some locales on some systems, and as such as a token delimiter in the syntax of bash or yash there.

    That byte is also found in the UTF-8 encoding of thousands of characters including many alphabetical ones (like à, encoded as 0xc3 0xa0).

    I'm not aware of any charset in use in any locale of any ASCII-based systems that have characters whose encoding contains the encoding of ' though.

    Some shell quoting operators output $'\u00e9' or $'\u[e9]' for the é character for instance. And that in turn, when used, depending on the shell, and the locale at the time of interpreting or running the code that uses it will be expanded to its UTF-8 encoding or in the locale's encoding (with variation in behaviour if the locale doesn't have that character).

    So, it's not only important that the resulting string be used in the same shell and shell version, but also that it be used in the same locale (at least for those shells that do some character encoding/decoding). And even then, several shells (including bash) have or have had bugs in that regard.

    Any quoting operator that uses $'...', "...", or backslash for quoting or that leaves some non-ASCII characters unquoted is potentially unsafe.

    Or in other words, only the ones that use '...' are safe in that regard. That leaves:

    • zsh's ${(qq)var} operator
    • The alias output of dash/bash/bosh (at least current versions).
    • The export -p of dash/bosh (at least current versions).
    • the set output of dash (at least current versions).

    Though of those only the first is documented and committed to always use single quotes (though note the caveat about rcquotes below).

    Also note that yash can't cope with data that can't be decoded in the locale's charset, so there's no way to pass arbitrary data to that shell (at least in the current version).

    Ironically, the output of the locale utility has the problem (as it's required to use "..." to output implied settings), and it's typically intended to be used to input code in a locale that is different from that where locale was invoked (to restore the locale).

  • The NUL character (0 byte) cannot occur in an environment variable or in arguments of a command that is executed by way of the execve() system call (that's a limitation of that system call that takes those env and arguments strings as C-style NUL-delimited strings). Except in zsh, NUL cannot be found in shell variables or builtin arguments or more generally shell code either.

    A 0 byte however can be read and written alright from/to a file or pipe or any I/O mechanism.

    In zsh it can be stored in a variable, read and written, passed as argument to builtins like in any modern programming language (such as python or perl).

    But bear in mind that if you quote a NUL with any method that leaves it as-is (as opposed to $'\0', $'\x0', $'\u0000', $'\C@' for instance), regardless of how it is quoted, the result cannot be passed in an argument or env var to an executed command, and no other shell will be able to make use of that NUL character.

    That's worth bearing in mind if you take external input in zsh, as in IFS= read -r var. If a NUL byte is included in that line read from stdin, $var and ${(qq)var} will contain it, which may restrict what you can do with it.

    That's one case where using the $'...' form of quoting can be preferable (if the other caveats associated with that form of quoting (see above) can be addressed).

  • If the resulting quoted text is to be used in shell code located inside backticks, beware that there's an extra layer of backslash interpretation. Always use $(...) in place of `...`.

  • Some characters are only special in some context. For instance = is special in the words that precede the command name (as in a=1 cmd arg), but not after² (as in cmd a=1), though there are some special cases in some shells for commands like export, readonly...

    ~ is special in some contexts and not others.

    Not all quoting operators will quote those.

    Some characters are special in some shells but not in others, or only when some option is enabled...

    Even digits are special in some contexts. For instance sh -c "echo ${quoted_text}>file" would not output the quoted text in file, if 2 was not quoted as '2' for instance.

  • in zsh, the rcquotes option affects how single-quoted strings are interpreted (and generated by its quoting operators). When enabled, a single quote can be represented in a single-quoted string with '' like in the rc shell. For instance, "foo'bar" can also be written 'foo''bar'.

    So it's important that the quoted string generated when rcquotes is enabled be only interpreted by zsh instances that also have rcquotes enabled.

    A ${(qq)var} produced by a zsh with or without rcquotes should be safe to use in zsh -o rcquotes, but note that in zsh -o rcquotes, concatenating single-quoted strings would result in a single quote being inserted between them.

    $ quoted_text="'*'"
    $ zsh -o rcquotes -c "echo $quoted_text$quoted_text"
    *'*
    

    same as:

    $ rc -c "echo $quoted_text$quoted_text"
    *'*
    

    You can work around it by inserting "" in between the two:

    $ zsh -o rcquotes -c "echo $quoted_text""$quoted_text"
    **
    

    While in rc and derivatives (where "..." is not a quoting operator, '...' being the only kind of quotes, hence the need to be able to insert ' within them), you'd use ^:

    $ rc -c "echo $quoted_text^$quoted_text"
    **
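
Two of the considerations listed above, the empty string losing its quotes and an unquoted digit sitting right before a redirection, can be reproduced with plain sh. This is only a sketch: the helper names count_args and redirect_demo and the temporary file are illustrative, not part of any tool mentioned here.

```shell
# Pitfall 1: the empty string must survive as '' to remain an argument.
count_args() {
  # $1 is spliced into the inner shell's code, like generated script text.
  sh -c "set -- $1; echo \$#"
}
count_args "''"   # prints 1: '' is still one (empty) argument
count_args ""     # prints 0: the argument has vanished

# Pitfall 2: an unquoted digit directly before ">" is parsed as a file descriptor.
redirect_demo() {
  # $1 is the spliced text; $2 is the output file, seen by the inner shell as its $1.
  sh -c "echo $1>\"\$1\"" sh "$2"
}
out=$(mktemp)
redirect_demo "2" "$out" >/dev/null   # inner code: echo 2>"$1"  (file stays empty)
size_unquoted=$(wc -c < "$out")
redirect_demo "'2'" "$out"            # inner code: echo '2'>"$1" (file contains 2)
content_quoted=$(cat "$out")
rm -f "$out"
printf 'unquoted: %s bytes, quoted: %s\n' "$((size_unquoted))" "$content_quoted"
```

The last line should report 0 bytes for the unquoted case and 2 for the quoted one, which is why quoting "everything", digits included, is the conservative choice.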
    

In conclusion

The only quoting method that is safe (if we limit to Bourne-like shells and disregard yash and `...` or rogue locales, and assume the data doesn't contain NUL characters) is single quoting of everything (even the empty string, even characters you'd imagine never to be a problem), and represent the single quote character itself as \' or "'" outside of the single-quotes, as was the initial intent in your question.

To do that you can use:

  • zsh's ${(qq)var} operator (or "${(qq@)array}" for an array), assuming the rcquotes option is not enabled.

  • a function like:

    shquote() {
      LC_ALL=C awk -v q="'" '
        BEGIN{
          for (i=1; i<ARGC; i++) {
            gsub(q, q "\\" q q, ARGV[i])
            printf "%s ", q ARGV[i] q
          }
          print ""
        }' "$@"
    }
    

    or

    shquote() {
      perl -le "print join ' ', map {q(') . s/'/'\\\\''/gr . q(')} @ARGV" -- "$@"
    }
    
  • ksh93/zsh/bash/mksh:

    quoted_text=\'${1//\'/\'\\\'\'}\'
    

    (don't double-quote the expansion and don't use it outside of scalar variable assignments, or you'll run into compatibility problems between different versions of bash (see description of compat41 option))
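
For a shell that has none of these operators, the same single-quote-everything approach can be sketched in plain POSIX sh with sed. This is not taken from the answer above: it relies on a commonly used sed idiom, the names shquote_sed, foo and script are illustrative, and like the other methods it assumes the value contains no NUL bytes.

```shell
# Wrap a value in single quotes, writing each embedded ' as '\''.
shquote_sed() {
  # The trailing newline added by printf is stripped again by $(...) at the call site.
  printf '%s\n' "$1" | LC_ALL=C sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

foo="bar'baz"
script=$(mktemp)

# Generate a script whose assignment embeds the quoted value.
cat > "$script" <<EOF
qux=$(shquote_sed "$foo")
printf '%s\n' "\$qux"
EOF

cat "$script"   # first line: qux='bar'\''baz'
sh "$script"    # prints bar'baz
rm -f "$script"
```

Because the closing quote is added after the last line, trailing newlines inside the value end up inside the quotes and are preserved when the generated script runs.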


¹ The POSIX specification of $'...' was initially targeted for Issue 8 of the Single UNIX Specification, expected to be released in 2021 at the earliest, but it looks like it's not going to make it (consensus on a resolution was not reached in time). So, we'll probably have to wait at least another decade before $'...' is added to the standard.

² except when the -k (keyword) option of the Bourne shell and some of its derivatives is enabled

Answered by Stéphane Chazelas on December 31, 2021

In PHP, you can use the escapeshellarg function, that transforms a general string into a bash argument string.

Answered by Kiruahxh on December 31, 2021

There are several solutions to quote a var value:

  1. alias
    In most shells where alias is available (except csh, tcsh and probably other csh-like shells):

    $ alias qux=bar\'baz
    $ alias qux
    qux='bar'\''baz'
    

    Yes, this works in many sh-like shells like dash or ash.

  2. set
    Also in most shells (again, not csh):

    $ qux=bar\'baz
    $ set | grep '^qux='
    qux='bar'\''baz'
    
  3. typeset
    In some shells (ksh, bash and zsh at least):

    $ qux=bar\'baz
    $ typeset -p qux
    typeset qux='bar'\''baz'            # this is zsh, quoting style may
                                         # be different for other shells.
    
  4. export
    First do:

    export qux=bar\'baz
    

    Then use:
    export -p | grep 'qux='
    export -p qux

  5. quote
    echo "${qux@Q}"
    echo "${(qq)qux}" # from one to four q's may be used.

Answered by ImHere on December 31, 2021

Bash provides a printf builtin with %q format specifier, which performs shell escaping for you, even in older (<4.0) versions of Bash:

printf '[%q]\n' "Ne'er do well"
# Prints [Ne\'er\ do\ well]

printf '[%q]\n' 'Sneaky injection $( whoami ) `ls /root`'
# Prints [Sneaky\ injection\ \$\(\ whoami\ \)\ \`ls\ /root\`]

This trick can also be used to return arrays of data from a function:

function getData()
{
  printf '%q ' "He'll say hi" 'or `whoami`' 'and then $( byebye )'
}

declare -a DATA="( $( getData ) )"
printf 'DATA: [%q]\n' "${DATA[@]}"
# Prints:
# DATA: [He\'ll\ say\ hi]
# DATA: [or\ \`whoami\`]
# DATA: [and\ then\ \$\(\ byebye\ \)]

Note that the Bash printf builtin is different than the printf utility which comes bundled with most Unix-like operating systems. If, for some reason, the printf command invokes the utility instead of the builtin, you can always execute builtin printf instead.
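
One thing worth checking before relying on %q for generated scripts is the caveat raised in the first answer: for some inputs bash switches to the $'...' form, which is not standard sh syntax. The sketch below assumes bash is installed and calls it explicitly, so the surrounding code is plain sh; the variable names are illustrative.

```shell
# A value containing a newline makes bash's printf %q use $'...' quoting.
value='line one
line two'
quoted=$(bash -c 'printf %q "$1"' bash "$value")
printf '%s\n' "$quoted"   # prints $'line one\nline two'

# Round trip in bash, where $'...' is understood.
restored=$(bash -c "printf %s $quoted")
[ "$restored" = "$value" ] && echo "round trip OK in bash"
```

The quoted text round-trips in bash, but a script containing it is only safe to hand to a shell that parses $'...' the same way.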

Answered by Dejay Clayton on December 31, 2021

Bash has a parameter expansion option for exactly this case:

${parameter@Q} The expansion is a string that is the value of parameter quoted in a format that can be reused as input.

So in this case:

foo_esc="${foo@Q}"

This is supported in Bash 4.4 and up. There are several options for other forms of expansion as well, and for specifically generating complete assignment statements (@A).

Answered by Michael Homer on December 31, 2021

I guess I didn't RTFM. It can be done like so:

q_mid=\'\\\'\'
foo_esc="'${foo//\'/$q_mid}'"

Then echo "$foo_esc" gives the expected 'bar'\''baz'


How I'm actually using it is with a function:

function esc_var {
    local mid_q=\'\\\'\'
    printf '%s' "'${1//\'/$mid_q}'"
}

...

foo_esc="`esc_var "$foo"`"

Modifying this to use the printf built-in from Dejay's solution:

function esc_vars {
    printf ' %q' "$@" | cut -b 2-
}

To heed Stéphane's warnings about incompatibilities between different versions of bash, regarding single quotes inside double-quoted expansions, the bullet-proof function becomes:

esc_vars() {
    local fmt
    fmt='%s'
    local v
    while [ $# -gt 0 ]; do
        v=\'${1//\'/\'\\\'\'}\'
        printf "$fmt" "$v"
        fmt=' %s'
        shift
    done
}

Answered by Walf on December 31, 2021
