Code Chronicles #20: From C# to Machine Code

Many developers in the community are self-taught, jumping right into application development after watching lots of how-to tutorials and going through plenty of trial and error. Busy learning other things, many never had the opportunity to study some core concepts and understand what happens behind the scenes of the code they write. So, this one is for those who haven't read much about how C# code ends up being executed by our machines, and for those who want a refresher.

Programming language generations

Generation Name	Sample Languages	Short Description
1st Generation	Machine Language	This is the lowest level of languages. Machine language is a binary language consisting of 1s and 0s that a computer's hardware can execute directly.
2nd Generation	Assembly Language	These languages use symbolic code which is easier to understand and use than raw binary. Instructions are written in a more human-readable form, but still correspond directly to machine instructions.
3rd Generation	C, C++, Java, C#, Python, etc.	These are high-level languages designed to be easier for humans to write and read. They offer higher abstraction from the hardware and are generally portable across multiple system architectures. The code needs to be compiled or interpreted to run.
4th Generation	SQL, MATLAB, SAS, etc.	These languages are even more abstracted and aim to reduce programming effort, improve readability, and maintainability. They often include built-in functions for common tasks and are used to manipulate databases and perform complex mathematical tasks.
5th Generation	Prolog, OPS5, Mercury	These languages are used in Artificial Intelligence (AI) research. They incorporate problem-solving approaches and the programmer only specifies what the outcome should be, not how to get there.

As the table above shows, C# is a third-generation, high-level language that's relatively easy for humans to understand but must be compiled or interpreted before it can run. So, how does our C# code turn into something the machine can execute?

Compiling C# to CIL

When we build our application via Build in Visual Studio or dotnet build in the terminal, the C# compiler (csc) kicks in. It translates our C# code into Common Intermediate Language (CIL), a lower-level language that still retains some high-level features. It's a kind of "assembly language for the .NET platform": it contains instructions that the .NET runtime can execute but that are not specific to any processor architecture. That roughly covers the compilation part, but our processors still can't execute CIL directly, so how is this any better than having raw C# files?

JIT Compilation

JIT stands for Just-In-Time compilation. When our .NET application runs, the JIT compiler takes the CIL code and compiles it into native machine code specific to the processor architecture of the machine it's running on. The compilation happens just in time, hence the name: each method is compiled the first time it is called.

Here's the step-by-step process of JIT compilation:

Loading: The CIL code is loaded into memory.
Verification: The runtime verifies that the CIL code is safe to run (e.g., type safety checks).
Compilation: The JIT compiler translates the CIL code into native machine code.
Optimization: While compiling, the JIT compiler can apply various optimizations to improve performance.
Execution: The native machine code is executed by the processor.
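
To make the "method by method" part a bit more tangible, here is a minimal sketch that asks the runtime how many methods it has JIT-compiled before and after the first call to a method. It assumes .NET 6 or later, where System.Runtime.JitInfo is available. AddOne is a hypothetical helper made up for this illustration, and the exact counts will vary from machine to machine.

```csharp
using System;
using System.Runtime;
using System.Runtime.CompilerServices;

// Assumption: .NET 6 or later, where System.Runtime.JitInfo exists.
// Snapshot how many methods the JIT has compiled so far.
long before = JitInfo.GetCompiledMethodCount();

// First call to AddOne: the runtime has to JIT-compile it before running it.
int result = AddOne(41);

long after = JitInfo.GetCompiledMethodCount();

Console.WriteLine($"Result: {result}");
Console.WriteLine($"Methods compiled during the call: {after - before}");

// Hypothetical helper. NoInlining keeps it as a separate method,
// so the JIT compiles it on its own instead of folding it into the caller.
[MethodImpl(MethodImplOptions.NoInlining)]
static int AddOne(int value) => value + 1;
```

On a typical JIT-based runtime the difference should be at least one, since AddOne did not exist as machine code until it was called.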

Why do we need all of this?

You're now probably wondering why we need all this stuff. Couldn't we get rid of CIL and JIT and compile our C# code straight into the machine code? Why would anyone design it this way?

Using an intermediate language like CIL together with a JIT compiler is a design choice that lets the .NET platform balance flexibility, portability, and performance.

Portability: One of the biggest advantages of using CIL and JIT compilation is portability. CIL is a platform-agnostic intermediate language, meaning .NET applications can run on any platform with a .NET runtime available, irrespective of the underlying hardware. When the application runs, the JIT compiler translates the CIL to the machine code for that specific platform.
Flexibility: The .NET platform supports multiple languages, such as C#, VB.NET, and F#, all compiling to CIL. This means you can write different parts of your application in different .NET languages, and they'll all be able to interact seamlessly since they're using the same intermediate language.
Performance: The JIT compiler doesn't just translate CIL to machine code. It also performs various optimizations during translation, such as inlining (replacing a call site with the body of the called function) and loop unrolling (repeating the loop body several times per iteration to reduce the overhead of loop control). Because it runs on the target machine, it can also tailor the generated code to the actual processor, which in some cases can lead to more efficient execution than code compiled ahead of time for a generic target.
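
The two optimizations named above can be illustrated by hand at the C# level. This is only a sketch of what the transformations amount to: the JIT applies them to machine code, decides for itself when they are worthwhile, and may well do neither for this particular loop. Twice and the sample array are made up for the illustration.

```csharp
using System;

int[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Straightforward version: one method call and one loop check per element.
int plainTotal = 0;
for (int i = 0; i < data.Length; i++)
{
    plainTotal += Twice(data[i]);
}

// Hand-written equivalent of inlining plus 2x unrolling:
// the body of Twice is pasted in, and each iteration handles two elements.
int unrolledTotal = 0;
int j = 0;
for (; j + 1 < data.Length; j += 2)
{
    unrolledTotal += data[j] * 2;
    unrolledTotal += data[j + 1] * 2;
}

// Leftover element when the array length is odd.
for (; j < data.Length; j++)
{
    unrolledTotal += data[j] * 2;
}

Console.WriteLine($"{plainTotal} == {unrolledTotal}"); // prints "110 == 110"

// Hypothetical helper used only to have something to inline.
static int Twice(int value) => value * 2;
```

Both versions compute the same total. The second one simply does less bookkeeping per element, which is the whole point of these optimizations.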

In contrast, if you compiled C# directly into machine code:

The compiled application would only run on the specific platform it was compiled for.
The flexibility of using different .NET languages would be lost.
Some of the JIT-specific runtime optimizations wouldn't be possible.
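
A small way to see the portability point in practice: the sketch below can be built once and the resulting DLL run on different machines, where it should report whichever architecture and operating system the runtime is generating machine code for. It uses System.Runtime.InteropServices.RuntimeInformation, and the variable names are just for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// The same compiled CIL reports different values depending on where it runs.
string architecture = RuntimeInformation.ProcessArchitecture.ToString();
string operatingSystem = RuntimeInformation.OSDescription;
string runtime = RuntimeInformation.FrameworkDescription;

Console.WriteLine($"Architecture: {architecture}");
Console.WriteLine($"OS:           {operatingSystem}");
Console.WriteLine($"Runtime:      {runtime}");
```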

Show me the magic!

If you're wondering what CIL looks like, you can easily check it on your machine, and I'll also walk through a sample here. We'll use ILDASM (the Intermediate Language Disassembler), a tool provided by Microsoft as part of the .NET Framework SDK, which lets us peek at the CIL that our source code was compiled into during the build.

Below is a simple application that sums up the numbers from 1 to 10, but instead of a simple loop, we've used Enumerable.Range to generate the numbers and the LINQ Sum method to add them up.

using System;
using System.Linq;

int sum = Enumerable.Range(1, 10).Sum();

Console.WriteLine($"The sum is {sum}");

Now, let's take a look under the hood and see what this magic syntax looks like. To use the ildasm command, you have to either add the directory containing ildasm.exe to your system's PATH environment variable or run it from the folder where ildasm.exe resides. As previously mentioned, ildasm is part of the .NET Framework SDK, and you can find it under the SDK's installation directory on your machine. It will be something like the path below, where X.X is the version number of the .NET Framework.

C:\Program Files (x86)\Microsoft SDKs\Windows\vX.XA\bin\NETFX X.X Tools\

After you've found it or added it to your system's PATH environment variable, you can execute the following command to open your DLL and see what the CIL code looks like. Alternatively, you can double-click ildasm.exe and browse to your application's DLL to open it in the ildasm window.

ildasm YourProject.dll

The ildasm window then shows our simple program above in CIL syntax. You don't need to bother with the details; this is exactly why there's a more human-readable layer on top of it.
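
If ildasm isn't at hand, reflection offers a rough alternative for a quick peek. The sketch below reads the raw CIL bytes of a method through MethodBody.GetILAsByteArray. Note that this prints opcode bytes in hexadecimal, not the readable listing ildasm produces. Add is a hypothetical method made up for the example, and the sketch assumes a regular JIT-based build in which method bodies are still available at run time.

```csharp
using System;
using System.Reflection;

// Hypothetical sample method; any method with a body would do.
Func<int, int, int> add = Add;

// MethodBody exposes the CIL that the C# compiler emitted for the method.
// GetMethodBody can return null (for example, for methods without IL),
// so fall back to an empty array in that case.
var body = add.Method.GetMethodBody();
byte[] il = body?.GetILAsByteArray() ?? Array.Empty<byte>();

Console.WriteLine($"{add.Method.Name}: {il.Length} bytes of CIL");
Console.WriteLine(BitConverter.ToString(il));

static int Add(int left, int right) => left + right;
```

Each byte (or short byte sequence) corresponds to one CIL instruction, which is what ildasm decodes into mnemonics for us.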

Conclusion

Turning our C# code into something a machine can execute is not so trivial after all. It involves several translation steps, starting with our high-level C# code being compiled into CIL, a lower-level language that's still independent of any specific hardware. From there, the JIT compiler takes over during runtime, translating the CIL into native machine code that's tailored to the specific platform the application is running on.

This mechanism enables the flexibility, portability, and performance optimization we've come to expect from the .NET platform. With tools like ILDASM, we can even peek under the hood and glimpse the lower-level code our high-level code gets translated into. I didn't want to bother you with the details of CIL today, but I wanted you to take a closer look at it because, in the next chapter, we'll explore a technique that takes advantage of the fact that CIL exists as a step between our C# code and the code actually executed by the machine.

Remember, curiosity is what drives learning and progress in our field. So, don't shy away from diving a bit deeper into the inner workings of the technologies you work with. As you've seen today, even a tiny peek under the hood can open up a whole new world of things.