Thursday, July 28, 2011

Good Articles on Internals of C++ and Unix

A) C as a glorified assembler:

"C as a glorified assembler" or "C as a generalized assembler"

Actually, I only really started understanding C when I understood two things about it:
1) Everything you write in C is ultimately translated into assembly language (and from there into machine code).
2) How your compiler interacts with the O.S. loading/linking mechanism.

You can easily test this.
Use gcc's '-save-temps' option to keep the intermediate files, including the assembly language output (or use 'gcc -S' to stop right after generating the assembly).

C program to assembly code translation
>>>>>>>>
C program
>>>>>>>>
extern int printf(const char *, ...);  /* normally declared by #include <stdio.h> */
int main(void)
{
 printf( "Hello, World!\n");
 return 0;
}

# Keep the intermediate files from preprocessing (.i), compiling (.s) and assembling (.o)
gcc -save-temps test.c

>>>>>>>>
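32-bit x86 assembly (hand-written GNU as sketch in AT&T syntax; actual gcc output varies, see link for a more accurate example)
# Build on Linux with e.g.: gcc -m32 -no-pie hello.s
>>>>>>>>
        .data                          # data section
msg:    .asciz "Hello, World!\n"       # NUL-terminated string labelled msg

        .text
        .globl main                    # on Win32/MinGW the symbols are _main and _printf
main:
        pushl %ebp                     # save the caller's frame pointer
        movl  %esp, %ebp               # set up main's own stack frame
        pushl $msg                     # push the address of msg as printf's argument
        call  printf                   # make the library call
        addl  $4, %esp                 # cdecl: the caller pops the argument
        movl  $0, %eax                 # the return value 0 goes in %eax
        leave                          # restore %esp and %ebp
        ret                            # back to the C runtime, which hands 0 to exit()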

>>>>>>>>

Because C is portable from one platform to another, it acts somewhat like a virtual machine (without bytecode generation, of course).
The same C program will compile on different machine/compiler pairs into the specific assembly language of each platform and link against its O.S.

This gives you portability across chip-specific assembly languages (Intel x86, Motorola, ARM, ...).
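A small sketch of this idea, assuming a GCC, Clang or Visual C++ compiler: the one C function below compiles unchanged on each machine/compiler pair, and the compiler's predefined macros (real GCC/Clang and MSVC spellings, but a deliberately incomplete list) select the branch for its target. The names target_arch and pointer_bits are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: name the CPU family this file was compiled for.
   The source is identical on every platform; each compiler's predefined
   macros (GCC/Clang and MSVC spellings shown) select one branch. */
const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "ARM (64-bit)";
#elif defined(__arm__) || defined(_M_ARM)
    return "ARM (32-bit)";
#elif defined(__m68k__)
    return "Motorola 68k";
#else
    return "unknown";
#endif
}

/* Pointer width is also decided by the target, not by the C source. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

On a 64-bit Intel/AMD Linux box built with gcc, target_arch() returns "x86-64" and pointer_bits() returns 64; the same file built for 32-bit ARM takes a different branch without any source change.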

Write Great Code:

What happens when you run your program?
The whole chain, from compilers to O.S. to assembly language to memory to CPU.
This book series covers it all.

You have two choices:
1) You can spend years gathering small nuggets of knowledge about your program's internals and how it runs on your machine. You'd need a few books each on operating systems, linkers and loaders, assembly language, compiler design, and hardware.
Next, you'd need loads of patience to connect all of this into a chimera or Frankenstein's monster of sorts (horribly put together).
OR
2) You could read it all here, integrated into a beautiful and coherent whole written by a master.

Write Great Code, Volume 1: Understanding the Machine
Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level
The Art of Assembly Language


Refer:
  1. C to Assembly Translation Article from EventHelix
  2. Inside the C++ Object Model : http://techtalkies.blogspot.in/2007/07/stub-bookreview-c-object-model-by.html
  3. X86 Calling (and stack cleanup) conventions: http://en.wikipedia.org/wiki/X86_calling_conventions
  4. X86 Disassembly Wikibook (download as pdf): http://en.wikibooks.org/wiki/X86_Disassembly
==================================================================

B) Interaction with O.S. loader/linker/virtual-memory:

Each C compiler also links in the correct O.S.-specific C runtime.

So gcc, which is a portable compiler, would link in:
a) the MinGW C runtime on Win32
b) the Linux C runtime (typically glibc) on Linux

The same C program would also compile to the correct assembly and runtime with the Watcom C++ and Visual C++ compilers, the AIX and HP-UX compilers, etc.
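A sketch of how a program can tell which C runtime it was built against, assuming GCC-style predefined macros: __MINGW32__ (defined by 32- and 64-bit MinGW), _MSC_VER (Visual C++) and __GLIBC__ (glibc) are real, and glibc's gnu_get_libc_version() reports the version actually loaded at run time. The helper names c_runtime_name and c_runtime_version are hypothetical.

```c
#include <stdlib.h>   /* any libc header; on glibc it pulls in <features.h>, which defines __GLIBC__ */
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: name the C runtime this program is linked against,
   decided at compile time from the compiler/library macros. */
const char *c_runtime_name(void)
{
#if defined(__MINGW32__)
    return "MinGW runtime (on top of Microsoft's CRT)";
#elif defined(_MSC_VER)
    return "Microsoft Visual C++ runtime";
#elif defined(__GLIBC__)
    return "GNU C Library (glibc)";
#else
    return "unknown C runtime";
#endif
}

/* glibc can also report the version of the shared library loaded at run
   time, which may differ from the headers used at compile time. */
const char *c_runtime_version(void)
{
#if defined(__GLIBC__)
    return gnu_get_libc_version();
#else
    return "unknown";
#endif
}
```

Built with gcc on a typical Linux distribution, c_runtime_name() takes the glibc branch; the same source built with MinGW on Windows takes the first branch instead.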

Note:
a) printf() comes from the C Standard Library
b) the C runtime library contains the startup code that runs before main() and the cleanup code that runs after main() returns
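To see the runtime's startup and cleanup code at work, the sketch below hooks into both ends of main(). atexit() is standard C; __attribute__((constructor)) is a GCC/Clang extension and will not compile with Visual C++. The helper names startup_code_ran and install_shutdown_hook are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

static int startup_hook_ran = 0;

/* GCC/Clang extension: the C runtime's startup code calls constructor
   functions before it calls main(). */
__attribute__((constructor))
static void before_main(void)
{
    startup_hook_ran = 1;
}

/* Standard C: functions registered with atexit() are called by the
   runtime's cleanup code after main() returns (or exit() is called). */
static void after_main(void)
{
    fputs("after_main: C runtime cleanup code is running\n", stderr);
}

/* Hypothetical helper: lets main() check that startup code already ran. */
int startup_code_ran(void)
{
    return startup_hook_ran;
}

/* Hypothetical helper: register after_main; returns 0 on success. */
int install_shutdown_hook(void)
{
    return atexit(after_main);
}
```

If main() calls install_shutdown_hook() and then returns 0, the message appears after main() has finished, printed by code that the C runtime, not your program, invokes.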

Refer:
1) C++ runtime environments on HP-UX
2) Internals of C/C++ compiler implementations
3) How to use debug C runtime to debug your application
4) How to write a minimal Kernel in C (a very simple Hello world program which also describes the libc C runtime library)
5) Understanding System Calls
Note:
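Library calls such as open(), close(), read() and write() (POSIX functions rather than ISO C standard library functions) are thin wrappers that forward the request to system calls in the Linux kernel; buffered stdio calls like putc() and getc() only reach the kernel when their buffer has to be flushed or refilled. The wrapper places the system call number (an index into the kernel's system call table) and the arguments in registers, then executes a trap instruction (the software interrupt int 0x80 on 32-bit x86 Linux, or the syscall instruction on x86-64) to switch from user mode to kernel mode and back.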
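A Linux/glibc-only sketch of the note above: the same write request is made once through the libc wrapper write() and once by system call number with glibc's syscall() function. SYS_write and SYS_getpid come from <sys/syscall.h>; the helper names are hypothetical.

```c
#define _GNU_SOURCE          /* exposes syscall() in glibc's <unistd.h> */
#include <string.h>
#include <sys/syscall.h>     /* SYS_write, SYS_getpid: system call numbers */
#include <unistd.h>

/* Through the C library wrapper: write() loads the system call number and
   arguments into registers and traps into the kernel. */
ssize_t write_via_libc(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}

/* The same request made "by number": SYS_write is write's index in the
   kernel's system call table. */
long write_via_number(int fd, const char *msg)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}

/* getpid by number; should match the libc getpid() wrapper. */
long getpid_via_number(void)
{
    return syscall(SYS_getpid);
}
```

Running such a program under strace should show both requests as identical write(1, ...) system calls, since the kernel only sees the number and the arguments, not which user-space path produced them.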

==================================================================

These articles look at what happens inside C, C++, and Unix as we compile and run our programs. The information here is invaluable for debugging compile errors, runtime errors, and memory issues. Print them out and go through them in your spare time.

  1. What Happens When You Compile and Link a Program
  2. What a Compiler Turns Your C Code Into
  3. Virtual Base Classes Implementation
  4. How the C++ compiler mangles/decorates function names
  5. Unix And C-C++ Runtime Memory Management For Programmers
  6. Under The Hood Look At Operating Systems Internals with Windows and Linux: http://techtalkies.blogspot.in/2010/08/operating-systems-and-linux.html
  7. mmap is not the territory (Part 1): http://techtalkies.blogspot.com/2010/09/mmap-is-not-territory-or-mapfail-sigbus.html
  8. mmap is not the territory (Part 2): http://techtalkies.blogspot.com/2010/09/mmap-is-not-territory-part-2.html

==================================================================