How Target-Independent is Your IR?


Mar 07, 2023

An esoteric exploration of the target independence of compiler IRs.



Target-independent IRs and target-dependent optimizations

The main reason to have a target-independent IR is that many optimizations are independent of the target. The idea, then, is to implement these optimizations once, on a target-independent IR, rather than once for every source-target combination.

This idea works great when we are dealing with optimizations that are actually target-independent. A canonical example is dead-code elimination. The characteristics of the target don't matter when deleting code: you just delete as much as you can, and it is beneficial no matter what the target is.
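
To make this concrete, here is a minimal sketch (my own illustration, in C++, not taken from any particular compiler) of what dead-code elimination does:

    int f(int x) {
        int unused = x * 41 + 7; // dead: the value is never read
        return x + 1;
    }

    // After dead-code elimination, the function is equivalent to:
    int f_after_dce(int x) {
        return x + 1;
    }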

Things get more complicated when we have transformations which are target-independent but the decision-making involved is not. This requires a little analysis. When talking about a compiler optimization, it makes sense to think of it as having at least three components: the transformation, the legality constraints, and the profitability analysis (shameless plug of my thesis, which talks about this distinction extensively). The transformation is the kind of change we perform on the code, like moving, inserting, or deleting code. For example, in dead-code elimination, we literally delete statements; in loop tiling, we strip-mine loops by certain factors and interchange them. Legality is concerned with when it is legal to perform the transformation, and profitability analysis with when it is profitable to do it, and with what parameters.

For many optimizations, the transformation is target-independent but the profitability analysis is not. For example, you don't need to know anything about the target to perform loop tiling, which implies that the language/IR does not need to capture target details; you can perform loop tiling even in a source language like Rust. However, to do loop tiling effectively, you do need to know about the target: at the very least, you should know the sizes of the caches.
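
As a minimal sketch (my own, with made-up names and sizes), here is what a tiled loop nest might look like in C++; the transformation itself is target-independent, but choosing TILE well is not:

    constexpr int N = 1024;
    constexpr int TILE = 64; // a profitable value depends on the target's caches

    void copy_transposed(const double (&a)[N][N], double (&b)[N][N]) {
        // The original i/j loops have been strip-mined by TILE and interchanged,
        // so that each tile stays resident in cache while it is being used.
        for (int ii = 0; ii < N; ii += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; ++i)
                    for (int j = jj; j < jj + TILE; ++j)
                        b[j][i] = a[i][j];
    }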

This gave rise to a popular direction in optimizing compilers where the IR on which transformations are performed is target-independent, but target-specific information is also made available so that passes can perform the transformations effectively. In that way, we get a nice trade-off: we still have one IR, but we perform optimizations specialized per target.

That's at least the narrative I had in my mind until recently...

How target-independent is the LLVM IR?

Let's now take the most popular compiler IR ever, the LLVM IR, and examine its target (and later, source) independence. But first, we should think a bit harder about the definition of target independence.

The simple, and conventional, definition is that the IR does not have entities that are tied to a particular target. In that sense, the LLVM IR is mostly target-independent. For example, its core operations like add are not tied to a particular target.

In practice, the LLVM IR does not fully abide by this definition. For example, the inreg attribute is target-specific, and the same is true for a ton of intrinsics. But I would say these are not big problems, or at least, they are not that intellectually intriguing.

The more interesting problem is that the definition above is too simplistic to capture some adverse effects of target dependence that an independent IR is supposed to avoid. In particular, for an IR to be independent, we had better add the constraint that whatever front-end lowers to it does not need to know the eventual target. This follows from one of the main goals of target-independent IRs, which is to avoid the explosion of source-target combinations.

In that sense, the LLVM IR is not target-independent, but the reasons are subtler. Consider this example: some simple C code translated, by Clang, to LLVM IR, for 3 different targets. Theoretically, the IR should have been the same for all three. The simplistic reason they differ is the Application Binary Interface (ABI). For example, with the x86-64 ABI (section 3.2.3), it's ok to treat the struct as a single integer, both for passing it and returning it.[1] On the other hand, the MIPS ABI (see "Argument Passing") tells us that if the function returns a struct, then the first argument should be a pointer into which the result will be stored.
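
As a hedged reconstruction of the kind of code the example refers to (the exact struct and the exact IR depend on the Clang version and target triple; the signatures below are only approximate):

    struct S {
        int a;
        int b;
    };

    extern "C" S add_one(S s) { return { s.a + 1, s.b + 1 }; }

    // x86-64 (System V): the whole struct is classified as INTEGER and fits in
    // a single eightbyte, so Clang coerces it to one i64, roughly:
    //
    //     define i64 @add_one(i64 %s)
    //
    // 32-bit MIPS (O32): the struct result instead comes back through a hidden
    // pointer passed as the first argument (an sret parameter), roughly:
    //
    //     define void @add_one(ptr sret(%struct.S) %result, ...)
    //
    // where "..." stands for the struct argument itself, passed however the
    // MIPS ABI dictates.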

But the question is: why does the front-end have to deal with this? Notice that in LLVM IR we can both pass and return a struct by value. So, we would expect Clang to emit that straightforward IR for all targets and let each back-end translate it to the target-specific form. To understand the discrepancy, we need to look deeper into what ABIs define.

Notice that the need for an ABI arises only when we are dealing with concepts that are higher-level than, or simply absent from, the target's assembly language. For example, the reason we need calling conventions is that the concept of an argument does not exist in, e.g., x86 assembly. It is a higher-level concept, which can nevertheless be implemented using that assembly. The problem is that it can be implemented in many different ways, so for programs to function correctly, all compilers need to agree to implement it in the same way.

The problem of dependence in the IR arises when the concept is higher-level than, or non-existent in, the IR too! For example, the LLVM IR does not have classes (though one can of course implement them in many ways). To see why this is a problem, suppose the ABI defines calling conventions specific to classes. This is, for example, the reality in the x86-64 ABI, which states: "If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference ...".
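
As a hedged sketch of the kind of type this rule applies to (the names are made up for illustration):

    struct Handle {
        int fd;
        ~Handle(); // non-trivial destructor, defined elsewhere
    };

    // Because ~Handle is non-trivial, the ABI requires an argument of this type
    // to be passed by "invisible reference": the caller materializes a temporary
    // and passes its address, even though the source code passes it "by value".
    void use(Handle h);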

The front-end can't implement classes arbitrarily and then expect the back-end to apply the calling convention, because the concept of a class is lost in the translation! In other words, it would be very hard for a back-end to, essentially, reverse-engineer that certain sequences of instructions implement the concept of a class.

A subtler but similar example is something like the int type in C, whose bit-width depends on the target. The LLVM IR does not have such a type. It only has integers of a clearly specified width. A front-end, then, needs to know the target, so that it knows what bit-width to use for an int.
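
For example (a hedged illustration; the widths shown are the typical ones for these targets):

    int counter;  // lowered to i32 on x86-64 or AArch64, but to i16 on AVR or MSP430
    long offset;  // i64 on x86-64 Linux, i32 on 32-bit targets and on 64-bit Windows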

Given this problem, there are two solutions. The first is to translate such concepts using patterns that are predictable by the back-end, so that the back-end can recognize them. But this solution is unreliable, and the ABI needs to be applied perfectly: if, for example, the back-end fails to identify a class and apply the corresponding calling convention, the program can literally crash. The other solution is basically what LLVM employs, which is to have every front-end deal with the ABI.

The repercussions

Now, let's consider the drawbacks of the situation we are facing. The first and obvious one is the engineering effort and code duplication involved in all these front-ends replicating similar code just to deal with the ABI. This gave rise to libraries like llvm-abi.

Second, because the back-end is not responsible for the calling conventions, it is effectively agnostic: it will translate anything it is given, whether or not it follows the conventions. Since it is not the back-end that is responsible, it has to assume that whoever is responsible did the job correctly.

Finally, to return to our original problem, because we need different calling conventions depending on the architecture, LLVM IR is not really target-independent. You cannot generate IR without knowing the target. Furthermore, can you ship your LLVM IR to a different target than the one the front-end targeted? No. Can you combine two IR files generated for different targets? No.

Bonus: source independence

But at least the IR is source-independent, right? Right? Not necessarily. Similarly to our definition of target independence, I would say that an IR is source-independent if it can express any source and if IR files for different sources can be combined. Given our discussion above, we can already see that the latter is not necessarily true. For example, can you combine IRs generated for the same target but originating from different source languages? Not necessarily. If there were an ABI defined around Rust, then even when compiling for the same target, its IR wouldn't necessarily be compatible with the IR generated from C.

The former is not a given either. For example, Rust apparently had a hard time translating to LLVM IR. One example is that infinite loops are fine in Rust in cases where they are not fine in C++. Because the LLVM IR follows the C++ memory model (including its forward-progress assumptions), Rust simply could not translate such loops to it faithfully.
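
A hedged sketch of the issue, in C++ (the Rust line appears only as a comment, for comparison):

    void cxx_spin() {
        // In C++, a loop with no observable side effects may be assumed to
        // terminate, so an optimizer is allowed to remove it entirely.
        while (true) { }
    }

    // The Rust equivalent must actually loop forever:
    //     fn rust_spin() { loop {} }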

Random Notes

  • Something to keep in mind is that this article is meant to be more of an abstract, intriguing exploration than a showcase of practical problems. In particular, it is unclear to me whether these problems bother anyone too much. As I mentioned, there are libraries like llvm-abi and a bunch of threads on the LLVM mailing list, but I'm not sure anyone really cares. The reason is probably that there are just too few ABIs for the combinatorics to make a mess.
  • This article was heavily inspired by a discussion I had with Fabian Giesen. In fact, it was him who showed me the example with the struct.


Footnotes

  1. The x86-64 ABI is a bit complicated and it's out of scope to explain it here. The TL;DR is that the struct has only two fields, both in the INTEGER class, and thus the whole struct is classified as INTEGER. Because it fits within a single eightbyte, i.e., 64 bits, it gets passed and returned in a single register.