Trying to understand how my first program worked
Hello, bit sizes?#
I remember one of the assignments (in fact, literally the first assignment at my university ever - we skipped “hello world” for some reason) was to print the bit sizes of various types. So, after much pain, we went ahead and wrote
#include<stdio.h>
int main() {
printf("An int is %d bits long \n", sizeof(int));
printf("And float is %d bits long \n", sizeof(float));
printf("And char is %d bits long \n", sizeof(char));
}
and after learning the secret incantation of gcc
, we got out
An int is 4 bits long
And float is 4 bits long
And char is 1 bits long
Which, despite obviously not knowing the difference between bits and bytes, I think is a pretty impressive first program.
Afterwards, we learnt about functions. And I even wrote my own function! Of course, since I am a prodigy of programming it took my no time to create this absolute beauty:
#include<stdio.h>
int coolFunction(int number) {
return number + 3;
}
int main(){
printf("%d\n", coolFunction(3));
}
6
Ah so that’s pretty easy. I guess printf
and sizeof
are two
other functions that some old guys with beards wrote.
Something with that main
function was also important. Well, here I am, totally
on their level since I’ve also written my function.
It wasn’t until an embarassingly long time later that I realized that none of those functions are like each other. After learning what they meant I almost felt like I had finally understood a koan, unveiling immediately a bunch of facts about programming in C which I never thought of.
Breaking it down#
Let’s start with the abomination which was my function.
This is a standard function. Strict return type, strict input, strict computer science professors.
If I wrote coolFunction(4, "please");
, my compiler would throw error,
my lower palm would make an impact with my forehead, and I’d start again.
And, like, of course it would, there’s an extra argument there.
stdARG matey!#
Next, let’s look at the printf
. This is just some function defined in
stdio.h
, right? I guess the format string with it’s %d’s and %x and stuff must
serve some purpose with memory allocation?
Well actually, and I’m not sure how I didn’t notice this
despite using printf
every single time I used C, that it can take a variable amount
of arguments (i.e. it’s a Variadic function - Wikipedia)
I mean, duh, of course it can. I’ve been putting a random amount of floats, ints,
and chars all the time. Yet, for some reason, I decided “yep that’s normal, and I
will never try to create a function with variable parameters myself, because that
is impossible and has never been done before. Now I will printf
debug this code.”
So, despite nearly a full computer science degree, we never even heard of variadic arguments
in C - even though we used them every time. Well, they’re defined in the stdarg.h
, and
has a bit of syntax for declaring variadic functions, the dottybois (also known as the ellipse)
#include<stdio.h>
#include<stdarg.h>
int coolFunction(int number,...) {
return number + 3;
}
int main(){
printf("%d\n", coolFunction(3, "please", "oh", "yeah", "wassup", 1337));
}
6
Thank you C, very cool! But what if we actually wanted to access the variadic arguments?
We call this macro called va_start
provided in stdarg
to begin iterating over the
arguments. It takes a va_list
type so we’ll have to declare that as well. Finally we need
to use the va_arg
macro which expands the type that we give it. They’re all
polluting the namespace already from stdarg
already so no skin off our back
#include<stdio.h>
#include<stdarg.h>
int coolFunction(int number,...) {
int count = 3;
va_list args;
va_start(args, count);
for (int i = 0; i < count; ++i) {
printf("%d\n", va_arg(args, int));
}
return number + 3;
}
int main(){
printf("%d\n", coolFunction(3, 1, 2, 3));
}
1
2
3
6
Alright! We could create simple sum
, min
, and max
functions like this, but there
are a few things bothering me right here. First: the count
variable. I just arbitrarily
decided it would be 3. But if I have that information then why would I need variadic functions?
The answer is I won’t. It’s only useful if I don’t know the count
value from within the function.
So how do we know how many arguments we’re going to have to iterate through?
Well, and you’re going to think this is pretty lame, but we have to pass that as a variable.
For example, int coolFunction(int count, int number, ...)
.
But wait printf
doesn’t do that, right?
Actually, it just iterates through the format string and counts the amount of %’s (that are not escaped) and uses that as a count.
Here’s another problem: type information is lost. Say I wanted the same code but used coolFunction(3, "hello", 1, 3);
on the last line. I would need to have that type in my va_arg
macro call.
Is there any way I could check the type of my variadic argument and do something like
if(type(arg) == int) {
/* do this */
}
else {
/* do that */
}
Answer: no. Type information is lost. We’re just playing with pointers. That’s why the format string uses the %d and %x and %{whatever}. That’s the workaround - you put the types in.
Actually printf
is kind of genius in the way it makes you do a lot of the computer’s work:
- You tell the function how many arguments you expect
- You give the types of those arguments
And you didn’t even realize! You poor fool!
Compare this to Pascal, where you can write println("hello", 52, "this", 1337, 1337, "is valid");
and the system does the work for you. There’s nothing particularly special about the %
sign either,
we could just as easily write a int cool_printf(const char* format, ...);
that uses the $
character
as its escape character and $i
could be equivalent to %d
.
For more on this, check out stdarg.h - Wikipedia
sizeof#
Alright, now for the tricky one. First of all, sizeof
is a
compiler built-in. That means we can’t write our own version.
It’s also a macro, so it doesn’t need to abide by the simple mortal rules of the C syntax.
This makes sense since the compiler is responsible for knowing things about
my register sizes, architecture, and other low level goodness, and
since sizeof
tells me those things, I knew there would be something fundamental
about it.
But in fact sizeof
is quite powerful. When doing the blackmagicvoodoo of pointer
arithmetic it’s using sizeof under the hood. How’s that for fundamental?
Not only that, but sizeof can take a regular variable (like “number” above), it can take a type (like “int”) and it can take an expression (like “4 + 42”). Also because of its /macro/ness you can write
sizeof 5 + 5
which will be parsed as sizeof(5) + 5
which means the size of the number
5 (4 bytes on my computer) + 5 which is 9.
But sizeof(5 + 5)
will be sizeof(5+5) = sizeof(10)
which is 4 bytes.
In fact, mastery of sizeof is kind of a requirement for high-level C-fu especially for proper struct packing.
Finally, checkout this Implement sizeof Operator in C using Macro - interesting approach.
Conclusion#
So did I finally learn precisely what was going with that first program of mine, more than
three years later? Actually no. And to be honest, I don’t think I ever will unless I somehow
get on the gcc
core team. For example, I know main
and _start:
are linked somehow, but
how?
It’s actually stunning the tremendous amount of abstraction that we work under.
Today I sat down and tried to understand exactly what was going on in the tiniest of tiny C programs and found a fractal amount of depth containing macros, m4, compilers, and standard libraries.
I can’t even begin to imagine what that would look like in a language like python.
Although I’ve got to say, it’s been a fascinating dive. Maybe I’ll do another sometime. But for now, it seems I’m still whispering incantations and watching bits flip.