How 99% of C Tutorials Get it Wrong

August 9th, 2020

Ok, I don't have any actual statistics, but most C tutorials out there are really bad. I've been thinking for years about why that is, and I finally have a coherent way to express it.



Who Am I to Have an Opinion?

Let me just mention that I like phrasing section titles like a Twitter troll.

So, I'm a compiler researcher, a low-level programmer and a big fan of the C programming language. In general, I know C somewhat from both sides: that of the user and that of the implementor.

But this article did not arise only from my own opinion. The argument I'll present here, at least in its general form, is one that programmers I know personally and admire a lot (e.g. George Liontos) agree with, as do programmers I look up to (e.g. Casey Muratori) who have vastly more experience than me.

And all of us have used C a lot. So, there must be something there...

Let's Get High

I'll start by discussing some programming languages that are higher-level than C, and the general programming conventions that are popular today.

Programming languages like Python, Java, Haskell or Prolog are so-called high-level languages. We usually take this to mean that they're far from the hardware, but let's take a moment to appreciate the subtleties of that fact. If you program in such a language, you are thinking in a model, an abstract conceptual space, far from what happens in the hardware.

For example, if you're programming in an Object-Oriented way (e.g. in Java), you're thinking about objects that interact with one another like in the real world. Obviously, hardware doesn't work that way; people just built that model on top of it. It's like using Google Maps: you're thinking in a computer-visualized space and you actually interact with it, you move up and down, etc. But you have no idea how Google Maps manages to, say, show a mountain or tilt the camera.

The important thing is that you don't have to. If you had to, Google Maps would probably have failed miserably. Imagine if, whenever you did something a little bit odd, Google Maps freaked out and the creators said "yeah, well, you have to learn how we produce these graphics to use our product".

The same is true for these high-level languages. The reason their models were created in the first place is so that you can think in those models and avoid thinking about how the hardware actually works, because those models are thought to be easier for humans to deal with, less error-prone, etc.

For instance, in Haskell, there are lists. You can do all the usual operations with them, like merging. Imagine if, when merging two lists in Haskell, you sometimes got a segmentation fault. It would completely break our ability to think in the space of conceptual entities we call lists: now we'd have to think about how lists are actually implemented, in the hope that this would help us prevent such segmentation faults. Fortunately, high-level languages try very hard not to impose that on users.
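To make that concrete from the C side, here's a minimal sketch (all names, like Node and append, are hypothetical) of how the "list" abstraction leaks the moment you implement it yourself:

    #include <stddef.h>

    /* A hypothetical, minimal singly-linked list. */
    typedef struct Node {
        int val;
        struct Node *next;
    } Node;

    /* Append list b onto the end of list a. If the last node of a was
     * allocated but its next field was never set to NULL, the loop
     * below chases a garbage pointer: sometimes a crash, sometimes
     * silent corruption. The "list" abstraction leaks. */
    Node *append(Node *a, Node *b) {
        if (a == NULL)
            return b;
        Node *cur = a;
        while (cur->next != NULL)   /* undefined behavior if next is garbage */
            cur = cur->next;
        cur->next = b;
        return a;
    }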

But C is Like: Dude, I Don't Give a Single Shit

It honestly kind of is, and that's not a criticism. C was not made to provide "conceptual spaces" in which you can approach your problem and forget about the hardware. C was made to provide an easier way (basically, syntactic sugar) to write assembly.
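To give a feel for how thin that sugar is, here's a tiny function together with roughly what a compiler emits for it (an illustrative sketch for x86-64, not the exact output of any particular compiler):

    long add(long a, long b) {
        return a + b;
        /* Roughly, on x86-64 (System V ABI):
         *   add:
         *       lea rax, [rdi + rsi]   ; a arrives in rdi, b in rsi
         *       ret                    ; result returned in rax
         */
    }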

Let me reiterate that, because it's key! If you're programming in C, you are supposed to have a pretty good idea of what happens in hardware, i.e. roughly how you would write it in assembly, and you use C just to make your life a little easier. You're also expected to know how a compiler works and what it will roughly do with your code. You're supposed to help the compiler, not treat it like a magic wand and pretend that adding -O3 will make everything perfect.
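As one concrete example of helping the compiler (a sketch assuming C99; the function names are hypothetical): the restrict qualifier hands the compiler an aliasing fact it often cannot prove on its own, no matter the -O level.

    /* Without restrict, the compiler must assume dst and src may overlap:
     * writing dst[i] could change src[i+1], which can block vectorization. */
    void scale(float *dst, const float *src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i];
    }

    /* With C99's restrict we promise the compiler the two arrays don't
     * overlap, so it's free to load, multiply and store in wide chunks. */
    void scale_fast(float *restrict dst, const float *restrict src,
                    float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i];
    }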

In short, C never pretended that software is the platform.

It is quite obvious that if the creators of the language tell you "there's no such thing as an array, no bounds checking, no nothing, there are only pointers" and you still pretend you have arrays, you're going to have problems. Yet C tutorials do exactly that, and similar things with printf() / scanf(), the usual arithmetic conversions and integer promotions, undefined behavior, etc.
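Here's a minimal sketch of two of those traps in one place; every line of it compiles cleanly:

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 2, 3, 4};

        /* In most expressions, the array decays to a pointer to its
         * first element. There is no bounds checking: this compiles
         * cleanly and is undefined behavior. It may crash, or it may
         * silently overwrite whatever lives next to a. */
        int *p = a;
        p[10] = 42;

        /* Integer promotions: both operands of + are promoted to int,
         * the sum 300 is computed in int, and only the assignment
         * truncates it back down (300 mod 256 == 44). */
        unsigned char x = 200, y = 100;
        unsigned char z = x + y;
        printf("%d\n", z);   /* prints 44 */

        return 0;
    }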

They treat them as if they don't exist, or as if one doesn't have to think about them in every single program of even moderate complexity. And so programmers think in these abstract spaces, ignoring the low-level details, which inevitably leads to bugs. And if a bug causes your program to crash, that's actually one of the good scenarios in common use (unless you're programming an airplane or something). The bad scenarios are security holes.
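As a classic illustration of the bad scenario, consider the kind of input-reading code many tutorials actually teach (a deliberately unsafe sketch):

    #include <stdio.h>

    int main(void) {
        char name[16];

        /* The tutorial version: scanf("%s") has no idea how big name is.
         * Any input longer than 15 characters writes past the buffer,
         * which is the raw material of classic security exploits. */
        scanf("%s", name);

        /* Bounding the width at least limits the write to 15 characters
         * plus the terminating '\0' (still a sketch, not a complete
         * input-handling strategy). */
        scanf("%15s", name);

        printf("Hello, %s\n", name);
        return 0;
    }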

The ridiculous thing is that one has to learn exactly these details anyway, in order to understand and fix those bugs.

The Clowns are Still in the Car

And not knowing important implementation details of the language is only part of the problem, because a C tutorial's purpose should not just be to teach you how to write "bug-free" code. If that were the goal, why program in C in the first place? Choose a language with a more easily comprehensible and well-implemented conceptual model (instead of the actual model of the hardware) and program there.

People who still program in C do so because they can take full advantage of the hardware (usually for the sake of performance). So even if you learn enough about C's implementation to avoid bugs, if you still don't know how the hardware works and how to use it to its fullest, you won't be an effective C programmer.

So, please, let's not see more tutorials pretending that doing a bazillion allocations and frees is OK, that caches and branch prediction don't exist, that compilers are truly intelligent and not just tools, etc. Otherwise we are creating future programmers who will never be effective and who instead write C like it's Java. Get the clowns out of the car!
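To sketch the allocation difference (illustrative names and sizes, not a benchmark):

    #include <stdlib.h>

    #define N 100000

    typedef struct { float x, y, z; } Point;

    /* "C like it's Java": one heap allocation per object. Each Point
     * can end up anywhere on the heap, so iterating over them chases
     * pointers and misses the cache constantly. (Error handling
     * omitted for brevity.) */
    Point **make_points_slow(void) {
        Point **pts = malloc(N * sizeof *pts);
        for (int i = 0; i < N; i++)
            pts[i] = malloc(sizeof **pts);
        return pts;
    }

    /* One allocation, contiguous memory: iterating over the array is
     * a linear walk, which is exactly the access pattern caches and
     * prefetchers are built for. */
    Point *make_points_fast(void) {
        return malloc(N * sizeof(Point));
    }

The second version isn't just fewer malloc calls; it's a different memory layout, and layout is what the cache actually sees.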

To take that a step further: how skilled someone is as a C programmer is directly related to how well they understand the hardware. To that end, devoting a lot of time to learning C itself is not even that important. It's a very small language.

Good Learning Resources

  • Pretty much any low-level resource will help you write better C.
  • The C Programming Language (aka K&R): This is a book co-written by the creator of C. While it is dated and promotes some practices now considered unsafe or bad, it's a very pragmatic book.
  • Expert C Programming: A book covering edge cases of C and other interesting things.
  • Hacking: The Art of Exploitation: This book gave me amazing insight into the inner workings of the hardware back in the day. You can skip the hacking-specific material if you're not interested.
  • Handmade Hero: A whole professional-quality game coded from scratch by one of the best programmers I have seen. It is very hard to watch all of it, but I would say that 10-20 videos are enough to understand some very important general guidelines for programming.
  • Mike Acton's CppCon talk: Anything that has to do with Mike Acton is interesting. Always a step towards pragmatic, data-oriented programming and away from unrealistic models for low-level software.