Why did "terminal commands" never get a version of SQL "parameterized queries"?

Question

I was taught horrible bad practice when I initially "learned" SQL, which baked in user-submitted input with quotes and attempted to "escape" this (in the beginning, I didn't even escape it at all...). I then had to spend many years unlearning this, to instead do things like:
SELECT * FROM table WHERE id = $1;

And then the $1's data is sent separately to the database, not part of the actual query string, to make it impossible for "SQL injections" to happen.
However, terminal commands frequently need to be sent untrusted user input, such as:
generate_PDF.exe --template="a path goes here" --title-of-report="arbitrary title from user"

Every time I have to run such a command, I'm scared to death that my "terminal argument escape" function isn't working correctly, or has some unknown bug, so that users can make a title along the lines of "; rm -rf /; to execute arbitrary code on my machine.
This becomes even more of a serious issue when the normal "OS quotes" cannot be used, such as:
pg_dump --format custom --file "a real path" --exclude-table="schema name"."table name"

The "schema name"."table name" part has to be provided in full from the user, and thus I have to attempt to verify the syntax myself, as it cannot just be quoted in its entirety with the "terminal argument escaper" function wrapping it all. (Even if it might be possible in this specific context, I'm talking in general and just using this as an example of when it gets "hairy".)
This has made me wonder why the terminal commands, for example in PHP (since I use this myself for everything) cannot be done like this:
pg_dump --format custom --file $1 --exclude-table=$2

And then we send the actual arguments separately as an array of strings, just like with the "parameterized queries" in SQL databases?
Please note that the $1 and $2 here do not refer to PHP variables, but to "placeholders" for the "engine" which interprets this and which lives either in PHP or the OS.
Why is this not a thing? Or maybe it is, only I haven't heard of it? I'm continuously baffled by how many things which I constantly need and use just "sit there and rot" while they keep releasing a new programming language every week which nobody uses. I feel more and more frustrated about how "stale" everything I care about seems, but this risks getting off-topic, so I'll stick to the question I've just asked for now.

Ljm Dullaart · Answer

Quoting is not your problem. Not sanitizing your input is a problem.
Passing untrusted user data to any underlying middleware (database, shell or anything) is a recipe for disaster. You need to bring the trust in your input data up to the level that you are sure that the middleware can handle it.
Most languages allow the command line arguments to be passed as separate arguments to the function calls. PHP does so with:
pcntl_exec ( string $path [, array $args [, array $envs ]] ) : void

But that does not permit you to just passing directly user data to it, without sanitizing it. If you have an HP printer, you may try to put x01b;(0U in your title string and see what that does to your reports.

wireghoul · Answer

The answer is that no-one ever designed an OS/shell system (AFAIK) that supported executing commands as a two step operation. The reason why parameterized statements are so great and safe is because they do the operation in essentially two steps, first they parse the syntax (with placeholders instead of data) then they fill in the data in the parse tree and execute the statement. This prevents the data from ever having syntactical impact on the query. However this requires the database system/server/library to all support this method.
Unfortunately since safe interfaces don't exist for everything you will often see the security advice "sanitize input" because the consumer code using these interfaces don't have the option of redesigning them and thus input validation becomes the easiest option to implement.
Having said all of that, there are things that can make using userdata in shell commands safer:

Passing data to the command via stdin rather than arguments
Terminate command arguments by adding -- to the command
Correctly escaping data
Replace user data with fixed values if possible
Validate/Sanitize input

Functions like PHP's escapeshellcmd and escapeshellarg can help, but are actually quite complex to use correctly and does not eliminate all types of attacks. I recently gave a talk that covers much of the above at a virtual OWASP event, you might find it interesting. https://www.youtube.com/watch?v=t2tKkiQxfH0

Steffen Ullrich · Answer

Or maybe it is, only I haven't heard of it?

Many languages provide a way to give command and arguments as separate parameters and thus avoid any interpretation done by the shell. This addresses the same problem as separating SQL commands and parameters with parameter binding. For example in Perl:
system("pg_dump", "--format custom", "--file", $1, "--exclude-table", $2)

Similar functionality is provided by the various exec functions in libc where also command and arguments are strictly separated. Or subprocess in Python, os.exec.Command in Golang, system in Ruby etc. I did not find similar functionality in PHP though, i.e. it seems to insist on using a single command string interpreted by the shell with all the problems this causes.

multithr3at3d · Answer

The core issue is that raw data is being passed to a shell, causing shell interpretation of metacharacters (`, $(), ;, |, ||, &, &&, etc), allowing arbitrary chaining/nesting of commands.
When you are working in the shell, the usual way to avoid interpretation of shell characters is to surround the offending argument with single quotes. Many languages offer a similar solution to the problem.
In PHP, you have escapeshellarg, which wraps single quotes around an argument, causing the shell to interpret the argument as only a single argument without metacharacter interpretation. The main inconvenience is that you need to call this function on every single argument that could have been provided by the user. There is also escapeshellcmd, which can be wrapped around the entire command instead to ensure that exactly one command is executed.
These methods still spook me a bit, since you are basically relying on the function's ability to properly escape 100% of unwanted characters, and maybe some strange edge cases remain where it could be bypassed. See @Steffen's answer for other languages that are a bit cleaner.

Why did "terminal commands" never get a version of SQL "parameterized queries"?

4 Answers

Add your own answers!

Ask a Question