Introduction to Scalar Evolution with Iteration-based Intuition

May 21, 2020
Re-Edit: September 03, 2021

In this article I'll try to explain Scalar Evolution building on loop-based intuition rather than math-based recurrence relations. It seems to me that as programmers, it's easier for us to think in terms of loops rather than in terms recursive relationships.



What is Scalar Evolution ?

Scalar Evolution or SCEV as it's often abbreviated, is, in a very broad sense, an analysis of change (hence the "evolution" part) of scalar quantities in a program. To make that more concrete, think of our all time favorite loop:

for (int i = 0; i < UpperBound; ++i) {
  ...
}

And suppose that you want to understand how i changes. Here, obviously it changes by a constant +1 step, but now take a look at the loop below:

int main(void) {
  int i, j, k;
  i = 0;
  j = 0;
  k = 0;
  while (i < 200) {
    i += j;
    j += k;
    k += 1;
  }
  // {0,+,0,+,0,+,1}
}

It starts to get quite complicated all of a sudden. Even if we return back to the simple +1 loop, we don't have any notation to describe it.

But before we go there, we should explore the space of "change" a little more. "Change" or "evolution" are pretty generic terms. I mean:

int i = 0;
i = i + 1;
i = i + 3;
i = i + 7;

One could say that i changes or evolves here and I would agree. What is more, the evolution doesn't seem to have a particular pattern, which implies that we won't be able to describe it with a consistent, closed-form entity. These are all things that we have to take into consideration when trying to understand, analyze and describe evolution of scalar variables so that we can know and set limits.

For Scalar Evolution in particular, we constrain ourselves to "evolution of scalar variables within (natural) loops". And for it to have any meaning, this evolution must have some kind of pattern, specifically it should start from some value and then get incremented by a value in every iteration. Such evolutions can be described by chains of recurrences.

(Chain of) Recurrences

A recurrence is a mathematical model that describes a sequence which has a starting value and a step. The notation is: {starting value, operation, step}. For example, the recurrence {0,+,1} describes a value that starts from 0 and increments by 1.

We usually think of recurrences in terms of recursion. For example, when we see the recurrence above, we think of the sequence:
a0 = 0
an = an-1 + 1, n >= 1

But, the whole reason I'm writing this article is because thinking of recurrences, and thus scalar evolutions, with recursion is a little bit odd to me. It's great if we're talking about math and stuff (also consider that math don't have the concept of "loop") but in the context of programming, especially since we're describing changes in loops, thinking in terms of recursion seems unintuitive.

Instead, we can think of recurrences in terms of loops, that is, when seeing a recurrence, we can try to think of a loop that produces the sequence. There are two benefits to that.

First, programmers are usually more comfortable with iterative thinking than recursive thinking. So, if you come across a recurrence and you want to understand what sequence it describes, it will probably be easier to think of a loop that describes it rather than a recursive entity. Second, recurrences in compilers are used to describe values that appear inside loops. So, if you are imagining a recurrence as a loop, it will be more natural to see where it came from in the code.

So, the simplest kind of recurrence is the one we saw, which is {0,+,1}. I guess it's pretty easy for you to write a loop that increments a value by 1 and here is it:

for (int i = 0; i < UpperBound; ++i) {
  ...
}

There's a caveat here, which is that i gets values up to UpperBound, whatever that is. The sequence described above is infinite. The important thing to realize is that recurrences are used to describe the change of a variable and not its range or its set of values. For us the important thing is that it gets incremented by 1 every time and not up to where it goes (well... that's usually as important, but it's not on-topic right now).

Now, consider something more complicated: Chain of recurrences. The idea here is that the step is not a constant but something that evolves as well. And how do we describe something that evolves ? You guessed it... With a recurrence. For example, consider this recurrence: {0,+,{0,+,1}}. It describes a value that starts from 0 and gets incremented by something that starts from 0 and gets incremented by one.

Thinking of it abstractly, and/or in terms of recursion, can become confusing very quickly. But once you start thinking of it in terms of code and loops, it becomes intuitive. For instance, here, you want a value that increments by something in every iteration, but this something itself increments by 1 in every iteration. Thus, we naturally end up to the loop below:

int i = 0;
int j = 0;
while (i < UpperBound) {
  i += j;
  ++j;
}

Note that when writing it out, usually the extra braces are omitted, i.e. the above would be written as {0,+,0,+,1}.

For me, this simple chain of recurrences is quite hard to understand if I describe it with mathematical notation in a recursive way. But viewing it with the iteration-based approach, it seems quite easy. In fact, we can go crazy with chain of recurrences and as long as we stay within anything that any sane person would need to write into a loop, it's actually pretty easily understandable, for example, consider the recurrence: SCEV for i: {0,+,1,+,2,+,3,+,4}. Again, not easy to play with the values inside the loop, following the same reasoning as before: I want x to be incrementing by y, but y also needs to increment by z etc. So, we have:

 1 int i, j, k, l;
 2 i = 0;
 3 j = 1;
 4 k = 2;
 5 l = 3;
 6 while (i < UpperBound) {
 7   i += j;
 8   j += k;
 9   k += l;
10   l += 4;
11 }

Another thing you might want to consider is when a recurrence appears at the starting value and not at the step. Then, it is not called a chain of recurrences (and I don't know how it's called) but this is not the important thing here. How that: {{0,+,1},+,1} would be described with a loop ?

This is a classic example of nesting loops. But I may not even have to tell you that. You want a value that evolves, which implies that you want a loop, call it L1. Then, you want the starting value of this loop to evolve. This implies that you need a surrounding loop L2 and so you end up with:

for (int i = 0; i < UpperBound; ++i) {
  for (int j = i; j < UpperBound; ++j) {
    // SCEV for j: {{0,+,1},+,1} 
  }
}

That was a very short introduction to SCEV and chain of recurrences and before we end it, I want to mention some important caveats. First of all, the (chain of) recurrences that were described are often called "add recurrences" because you add something every time (in which the starting value and the something you add themselves can be chains of recurrences). There can be many other types of recurrences like "sub recurrence" or "mul recurrence" with their respective meanings but add recurrences are by far the most used in compilers.

Second, note that SCEV is based on mathematical frameworks, of which compiler people have implemented only a small fraction. This is not because they were lazy. For example, in LLVM, I think that this "limited" SCEV infrastracture might be something like 20.000 lines of code (of which I don't pretend to understand more than 0.02%).

SCEV in LLVM

I don't intend to make this article LLVM-focused and you can stop here if you don't care about it. For anyone interested, I'll describe some basics of SCEV in LLVM. First of all, your best friend is the command-line argument combo
-scalar-evolution -analyze in opt. This will run the scalar evolution analysis pass and it will annotate the code with its results. For example, see this godbolt snippet.

You'll notice that LLVM gives us more info than just the chain of recurrences, like trip counts, exit values, unsigned and signed ranges etc. Of particular interest is the exit value, which is the value that the variable will have after the last iteration of the loop. This is important for a lot of uses.

Another thing I should note is that the SCEV of a value is context-sensitive. Remember that we said that scalar evolution describes the evolution of a scalar value inside a loop. But which loop? If you consider again the following loop nest:

for (int i = 0; i < UpperBound; ++i) {
  int j = i;
  for (; j < UpperBound; ++j) {
    // SCEV for j inside the j loop: {{0,+,1},+,1} 
  }
  // SCEV for j inside the i loop: UpperBound 
}

Notice that the SCEV of j in the i-loop is the constant UpperBound. We know that after every run of the j-loop, j will end up equal to UpperBound. So, the context is important, and since the context we care about is loops, the context we define is loops. In LLVM, this is called a scope (when talking about SCEV; the term "scope" has many other uses in LLVM) and there's even a (widely-used) procedure called getSCEVAtScope(), in which we give it a Value and a Loop and it returns us the SCEV of this value inside the specific loop.

One implication of this is that asking for the SCEV of a value outside a loop is effectively asking its exit value. That is because "outside" of a loop is considered either the part before the loop or after the loop. The value before the loop is usually useless (or the value hasn't even been defined before the loop), so we're usually concerned about the value after the loop.

Note that all these are true for the outermost loops too. In LLVM, the parent of an outermost loop (which could be considered the function that encloses it) represents a scope as well. But, remember that functions like getSCEVAtScope() need a Loop for the scope, so we can't pass a Function. For this reason, we pass NULL and that signifies that we want the parent function.

Related Educational Material

I want to finish this discussion with some educational material that I think is helpful.