
Proof of the undecidability of compiler code optimization

Computer Science Asked by Stephen Mwangi on November 23, 2021

While reading Compilers by Alfred Aho, I came across this statement:

The problem of generating the
optimal target code from a source program is undecidable in general.

The Wikipedia entry on optimizing compilers reiterates the same without a proof.

Here’s my question: Is there a proof (formal or informal) of why this statement is true? If so, please provide it.

5 Answers

Like many undecidability results about program analysis, this is a consequence of Rice's theorem.

Consider the function $g := x \mapsto \mathsf{if}\; f(x)\; \mathsf{then}\; 0\; \mathsf{else}\; x$, where $f$ is some boolean-valued function of $x$. An optimal optimizer must optimize calls to this function to $0$ if $f(x)$ is true for every $x$.

The property “this function evaluates to $\mathsf{true}$ regardless of the value of its argument” is a non-trivial property of partial functions, so by Rice's theorem there is no algorithm to decide it.

Taking the contrapositive: for any correct optimizer $T$, there exists a function $f_T$ which is always true, but such that $T$ does not recognize that $f_T$ is always true and therefore does not optimize calls to $g$ to $0$.

The one part of this proof that I didn't model formally above is “an optimal optimizer must optimize …”: I didn't define what it means for an optimizer to be optimal. This requires a cost model over the programming language. One possible cost model is that every function call costs $1$. In this model, replacing $g(x)$ by $0$ decreases the cost by at least 1 so an optimal optimizer must do it.

This example works with any cost model where a function application has a nonzero cost and the cost of evaluating a program is at least the cost of evaluating any subprogram that gets evaluated as part of the evaluation of the larger program. For example, any “sane” small-step semantics is of this type: calling a function is a beta reduction, and the cost of evaluating a program is the number of instances of reduction rules that get applied.
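To make this construction concrete, here is a minimal sketch in Python. The particular predicate f below is my own illustrative choice (any always-true but non-obvious predicate would do), not part of the original argument.

def f(x: int) -> bool:
    # An always-true predicate, phrased so that seeing this requires a bit of
    # reasoning: a square has the same parity as its base.
    return (x * x) % 2 == x % 2

def g(x: int) -> int:
    # The function g from the argument above: if f(x) then 0 else x.
    return 0 if f(x) else x

def g_optimized(x: int) -> int:
    # The rewrite an optimal optimizer would have to perform whenever f is
    # true on every input: the call to f disappears entirely.
    return 0

# g and g_optimized agree on every input precisely because f is always true;
# deciding that property for an arbitrary f is what Rice's theorem rules out.
assert all(g(x) == g_optimized(x) for x in range(1000))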

Answered by Gilles 'SO- stop being evil' on November 23, 2021

With some programming languages and computation models, you could even say that optimizing programs is the same as running them — just with a subset of computation rules. Then, if you allow enough optimizations to happen, it is clear that optimization faces the same non-termination problems as running programs.

Let me elaborate on my claim by providing a concrete example of a made-up language and its computational semantics.

Consider the following source code of a very primitive imperative language [1].

i := 0;;

IF (i == 0) do:
  SKIP
;;

WHILE true do:
  i := i + (1 + 1) ;;
  i := i + 3

I hope the language grammar becomes clear from the above sample. It supports statements such as assignments (variable name := ...), if conditions (IF ... do:), while loops (WHILE ... do:), and sequenced statements (... ;; ...). Moreover, it allows scalar expressions such as boolean conditions (... == ...), true, false, and simple arithmetic like ... + 3 within some of the previous statements.

Evaluation by Rules

Let us now invent a computational semantics for running such programs. Concretely, we will do so in terms of a small-step semantics, i.e. a binary evaluation relation between statements paired with variable states: we write p,Γ ⤳ p',Γ' if program p with variable state Γ transitions (computes, evaluates) to program p' with variable state Γ'. The interpreter for our language then works as follows: given a program p, it picks the default start state Γ_ini for variables and reduces as long as possible:

p,Γ_ini ⤳ p',Γ' ⤳ p'',Γ'' ⤳ ... ⤳ RET

Hopefully, this ends with a special program RET denoting termination. But it may very well happen that running does not terminate at all. For instance, our sample program invokes such non-terminating behavior — at least with the (intuitive) semantics we give next.

We give the following rules on statements, where S, S', T are meta variables for statements, E, E' are meta variables for expressions, and Γ is a meta variable for contexts; all of them are implicitly universally quantified.

  • if S,Γ ⤳ S',Γ', then (S ;; T),Γ ⤳ (S' ;; T),Γ'
  • always (SKIP ;; S),Γ ⤳ S,Γ
  • if E,Γ ⇝ E', then (X := E),Γ ⤳ (X := E'),Γ
  • if E,Γ ⇝ E', then (IF E do: S),Γ ⤳ (IF E' do: S),Γ
  • always (IF true do: S),Γ ⤳ S,Γ
  • always (IF false do: S),Γ ⤳ SKIP,Γ
  • always (WHILE E do: S),Γ ⤳ (IF E do: S ;; WHILE E do: S),Γ

where ⇝ is a similar small-step relation on expressions that I omit here. There, E,Γ ⇝ E' means that expression E in context Γ transitions to expression E'. Note that since expressions cannot change the variable state in our language, we omit the context Γ on the right-hand side of ⇝.
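As a concrete, somewhat simplified illustration, here is a sketch of these statement rules as a Python step function over an ad-hoc tuple AST (my own encoding, not part of the answer). It deviates from the rules above in one place: expressions are evaluated in a single step rather than via the omitted small-step relation ⇝, so an assignment reduces to SKIP in one step.

# Statements:  ('skip',), ('assign', name, expr), ('seq', s, t),
#              ('if', expr, s), ('while', expr, s)
# Expressions: ('num', n), ('bool', b), ('var', name), ('add', e1, e2), ('eq', e1, e2)

def eval_expr(e, env):
    tag = e[0]
    if tag in ("num", "bool"):
        return e[1]
    if tag == "var":
        return env[e[1]]
    if tag == "add":
        return eval_expr(e[1], env) + eval_expr(e[2], env)
    if tag == "eq":
        return eval_expr(e[1], env) == eval_expr(e[2], env)
    raise ValueError(f"unknown expression {e!r}")

def step(stmt, env):
    """One evaluation step: stmt,env ⤳ stmt',env' (None once only SKIP is left)."""
    tag = stmt[0]
    if tag == "seq":
        s, t = stmt[1], stmt[2]
        if s[0] == "skip":                      # (SKIP ;; T),Γ ⤳ T,Γ
            return t, env
        s2, env2 = step(s, env)                 # step inside the left statement
        return ("seq", s2, t), env2
    if tag == "assign":                         # (X := E),Γ ⤳ SKIP,Γ[X ↦ value]
        new_env = dict(env)
        new_env[stmt[1]] = eval_expr(stmt[2], env)
        return ("skip",), new_env
    if tag == "if":                             # IF true do: S  /  IF false do: S
        return (stmt[2], env) if eval_expr(stmt[1], env) else (("skip",), env)
    if tag == "while":                          # unfold: IF E do: (S ;; WHILE E do: S)
        return ("if", stmt[1], ("seq", stmt[2], stmt)), env
    return None                                 # SKIP alone: the program has terminated

# Running the terminating prefix of the sample program:
prog = ("seq", ("assign", "i", ("num", 0)),
               ("if", ("eq", ("var", "i"), ("num", 0)), ("skip",)))
state = (prog, {})
while state[0] != ("skip",):
    state = step(*state)
print(state)   # (('skip',), {'i': 0})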

Optimization by Restricted Evaluation

How can we now formulate optimization rules for our language? For instance, our intuition demands that in the program above the statement IF (i == 0) do: SKIP optimize to nothing. It turns out we can achieve this with the very same tool of small-step semantics. For our purposes, we give the following set of optimization rules for the optimization relation ⤅:

  • if S,Γ ⤅ S',Γ', then (S ;; T),Γ ⤅ (S' ;; T),Γ'
  • if for all Δ we have T,Δ ⤅ T',Δ, then (S ;; T),Γ ⤅ (S ;; T'),Γ (unlike evaluation, optimization need not proceed left to right)
  • always (IF E do: SKIP),Γ ⤅ SKIP,Γ
  • always (SKIP ;; S),Γ ⤅ S,Γ

With them, we see that our program above indeed first optimizes to i := 0 ;; (SKIP ;; WHILE true do: ...) (where I have added explicit parentheses) and then to i := 0 ;; (WHILE true do: ...), as desired.

Note that, in contrast to the evaluation rules, exhaustive application of the optimization rules above does terminate (at least I hope this can be proven by induction). But this is just a consequence of our still rather naive optimizations. If we optimized many more things, we would again enter possibly non-terminating territory.
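Here is a sketch of what exhaustively applying these four rules could look like, reusing the tuple encoding from the interpreter sketch above; for these particular rules a single bottom-up pass already reaches a fixed point, which also makes the termination claim easy to believe. The state Γ is ignored because none of the rules inspect it.

def optimize(stmt):
    """Apply the four ⤅ rules above bottom-up; one pass reaches a fixed point."""
    if stmt[0] == "seq":
        s = optimize(stmt[1])          # congruence on the left of ;;
        t = optimize(stmt[2])          # congruence on the right of ;;
        if s == ("skip",):             # SKIP ;; T  ⤅  T
            return t
        return ("seq", s, t)
    if stmt[0] == "if" and stmt[2] == ("skip",):   # IF E do: SKIP  ⤅  SKIP
        return ("skip",)
    return stmt                        # no rule reaches inside WHILE or IF bodies

# The sample program from the top of this answer:
sample = ("seq",
          ("assign", "i", ("num", 0)),
          ("seq",
           ("if", ("eq", ("var", "i"), ("num", 0)), ("skip",)),
           ("while", ("bool", True),
            ("seq",
             ("assign", "i", ("add", ("var", "i"),
                                     ("add", ("num", 1), ("num", 1)))),
             ("assign", "i", ("add", ("var", "i"), ("num", 3)))))))

print(optimize(sample))
# prints ('seq', ('assign', 'i', ('num', 0)), ('while', ...)) with the loop body unchanged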

Correctness of Optimization

Importantly, optimization rules need to be derivable from the evaluation rules for sane programs, i.e. be a subset in some sense. Otherwise, our optimizations would be wrong. Regarding sanity, for instance our third optimization rule can only be derived if we assume that for E occurring in the if condition, we always have either E,Γ ⇝ ... ⇝ true or E,Γ ⇝ ... ⇝ false. Moreover, our second rule is only derivable if we assume that the S contained therein never gets stuck. In richer languages, S may even throw an exception.

However, both of the previous assumptions usually do hold if our language is typed, our type theory ensures soundness ("well-typed programs never get stuck/throw exceptions"), and the input program for optimization is actually well-typed. Indeed, the definitions of sanity in many programming-language optimizers include well-typedness as a necessary condition. The C language is a prime example of a case where sanity of programs encompasses much more than well-typedness alone: many compiler optimizations are only correct if the input program does not exhibit undefined behavior.


[1]: the language and its semantics are heavily inspired from the language Imp presented in the Software Foundations series, Volume 1: Logical Foundations by Pierce, de Amorim, Casinghino, Gaboardi, Greenberg, Hriţcu, Sjöberg, Yorgey et al.

Answered by ComFreek on November 23, 2021

For most interesting optimisations, I think this is implied by Rice's theorem. For real numbers, Richardson's theorem is also relevant here.

Answered by Pseudonym on November 23, 2021

It’s quite easy to write a program that will either print the smallest even N >= 4 which is not the sum of two primes, or run forever.

Run it through the optimiser. The optimised output would be either a program printing some long string of digits, or an empty loop. So that’s one hard mathematical problem your optimiser would have to crack.
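Here is a sketch in Python of the program described above, with a deliberately naive primality test; it prints the smallest counterexample to Goldbach's conjecture if one exists, and otherwise runs forever.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_sum_of_two_primes(n: int) -> bool:
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1))

n = 4
while True:                  # presumably never prints: no counterexample is known
    if not is_sum_of_two_primes(n):
        print(n)
        break
    n += 2

# A perfect optimizer would have to decide whether the print is ever reached,
# i.e. settle Goldbach's conjecture, in order to fold this whole program into
# either a single print statement or an empty infinite loop.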

Answered by gnasher729 on November 23, 2021

An optimized program must have the same behavior as the original program. Consider the following program:

int main() {
    f();
    g();
}

Here, it is guaranteed that $f$ is a pure function. The only question is: does it finish its execution? If it does, then we can replace main()'s body with g(). Otherwise, we should replace it with an infinite loop. Unfortunately, deciding whether f() finishes its execution is undecidable.

Another example is a program whose body is print(f(42)), where f is pure. The optimal program would simply replace f(42) with its value. However, no algorithm can do this in general. We may try to compute the value at compile time, but that computation may never finish.
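As an illustration (the concrete f below is my own choice, not from the answer): take f to be a pure function counting Collatz steps. It happens to terminate on 42, but a compiler that wants to fold the call into a constant has no general way of knowing this in advance short of running it.

def f(n: int) -> int:
    # Pure: counts Collatz steps from n. Whether this loop terminates for
    # every starting value is the open Collatz conjecture.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(f(42))   # the "optimal" program is literally: print(8)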

Another example (this time without infinite loops): assume that your program defines a context-free grammar and $f(x)$ checks whether the string $x$ belongs to the language defined by this grammar (for any CFG we can build such an $f$ automatically). Then, if $f$ is constantly "true", the snippet

if (f(x)) {
    g()
}

can be optimized to g(). Unfortunately, checking whether a grammar accepts all strings is known as the universality problem, and it is undecidable.
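To make "for any CFG we can build such an $f$ automatically" concrete, here is a sketch of a generic CYK membership test for a grammar already in Chomsky normal form; the toy grammar below, which generates every nonempty string over {a, b}, is my own hypothetical example.

def cyk(word, terminal_rules, binary_rules, start):
    """CYK membership test for a grammar in Chomsky normal form."""
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = nonterminals deriving the substring word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, terminal in terminal_rules:
            if terminal == ch:
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, (left, right) in binary_rules:
                    if left in table[i][split - 1] and \
                       right in table[i + split][length - split - 1]:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Hypothetical CNF grammar generating every nonempty string over {a, b}:
#   S -> a | b | A S | B S,   A -> a,   B -> b
TERMINAL_RULES = [("S", "a"), ("S", "b"), ("A", "a"), ("B", "b")]
BINARY_RULES   = [("S", ("A", "S")), ("S", ("B", "S"))]

def f(x: str) -> bool:
    return cyk(x, TERMINAL_RULES, BINARY_RULES, "S")

print(f("abba"), f("a"), f("bbb"))   # True True True
# f is in fact constantly true on nonempty {a, b}-strings, so `if (f(x)) g()`
# could be optimized to `g()`; deciding that for an arbitrary grammar is the
# undecidable universality problem.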

Answered by user114966 on November 23, 2021
