On my PC, a 3 GHz hyper-threaded Pentium 4 (about three years old), it counted 12.7 million times in 1/25th of a second, which is 40 milliseconds. Out of interest, I rewrote the code and timed the for loop with four different types for the loop variable a: byte, short, int and ulong. Not surprisingly, the 64-bit ulong was slowest at 70 ms, but int was actually fastest at 40 ms, with both byte and short taking about 50 ms.
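These apparently contradictory results (that 4-byte ints are faster than 1-byte bytes and 2-byte shorts) most likely arise because a 32-bit CPU works most naturally with 32-bit (4-byte) values. To work with a single byte or two bytes, the CPU typically has to handle a four-byte value and then extract the individual bytes, so it is slightly slower than just using all 4 bytes. C# adds to this: arithmetic on byte and short values is carried out as int arithmetic, and the result is then narrowed back to the smaller type.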
This wasn't a very accurate benchmark, as timing code accurately is not easy. I ran each version several times and took the average timing.
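The timing approach described above can be sketched roughly as follows. This is not the code I originally ran; it is a minimal sketch that assumes the Stopwatch class from System.Diagnostics, and the names LoopTiming, CountWithInt, CountWithULong and AverageMilliseconds are made up for illustration. The timings it prints will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;

public static class LoopTiming
{
    // Counts from 0 up to limit with an int loop variable and returns the count.
    public static long CountWithInt(int limit)
    {
        long count = 0;
        for (int a = 0; a < limit; a++)
        {
            count++;
        }
        return count;
    }

    // The same loop, but with a 64-bit ulong loop variable.
    public static long CountWithULong(ulong limit)
    {
        long count = 0;
        for (ulong a = 0; a < limit; a++)
        {
            count++;
        }
        return count;
    }

    // Runs the action several times and returns the average elapsed milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        action(); // warm-up run so JIT compilation is not part of the timings
        var watch = new Stopwatch();
        double total = 0;
        for (int i = 0; i < runs; i++)
        {
            watch.Restart();
            action();
            watch.Stop();
            total += watch.Elapsed.TotalMilliseconds;
        }
        return total / runs;
    }

    public static void Main()
    {
        const int limit = 12700000; // roughly the 12.7 million counts mentioned earlier
        double intMs = AverageMilliseconds(() => CountWithInt(limit), 5);
        double ulongMs = AverageMilliseconds(() => CountWithULong(limit), 5);
        Console.WriteLine("int:   {0:F1} ms", intMs);
        Console.WriteLine("ulong: {0:F1} ms", ulongMs);
    }
}
```

Averaging over several runs smooths out some of the noise from other programs running at the same time, but it is still only a rough measurement. An optimizing build may also simplify loops like these, so treat the numbers as indicative only.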
If you were to convert this to a single loop with a ulong loop variable and time it counting to 100 million, it takes just under half a second, which works out at about 205 million counts per second. If the last value in the for loop is changed to 18,446,744,073,709,551,615 (the largest value that a 64-bit ulong can hold), the loop would take about 89,984,117,433 seconds, which is about 1,041,483 days or 2,853 years!
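For anyone who wants to check the arithmetic, here it is worked through. It assumes the measured rate of about 205 million counts per second stays constant, and uses 86,400 seconds per day and 365 days per year.

```latex
\begin{align*}
\frac{18{,}446{,}744{,}073{,}709{,}551{,}615\ \text{counts}}{205{,}000{,}000\ \text{counts per second}}
  &\approx 89{,}984{,}117{,}433\ \text{seconds} \\
\frac{89{,}984{,}117{,}433\ \text{seconds}}{86{,}400\ \text{seconds per day}}
  &\approx 1{,}041{,}483\ \text{days} \\
\frac{1{,}041{,}483\ \text{days}}{365\ \text{days per year}}
  &\approx 2{,}853\ \text{years}
\end{align*}
```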
The moral of this is: when it comes to loop variables, use an int unless you need to count higher than about 2 billion (the limit for a signed int). An unsigned uint goes up to about 4 billion; beyond that you need a 64-bit type.
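As a quick check on those limits, each C# integer type exposes its largest value through a MaxValue constant. The class name LoopVariableRanges below is made up for this sketch.

```csharp
using System;

public static class LoopVariableRanges
{
    public static void Main()
    {
        // Largest value each loop variable type can hold.
        Console.WriteLine(int.MaxValue);   // 2147483647 (about 2 billion, signed)
        Console.WriteLine(uint.MaxValue);  // 4294967295 (about 4 billion, unsigned)
        Console.WriteLine(ulong.MaxValue); // 18446744073709551615 (64-bit unsigned)
    }
}
```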

