Intel - Itanium Chips

Richard Lightman richard at
Sun Jan 7 07:34:31 PST 2001

Misquoted from Steve Hayashi on 2001/01/ 6 at 16:32 +0000:

64-bit integers:
Available for all sorts of cpu's using gcc's "long long" and
"unsigned long long" types. If you are older than pocket
calculators, then the algorithms you learned for long addition
and multiplication apply just as well to multi-word arithmetic
as they do to multi-digit arithmetic.
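
To make the long-arithmetic analogy concrete, here is a minimal sketch in C. It assumes base 2^16 "digits" (machine words on a 16-bit CPU) stored least significant first. The function names add_digits, mul_digits, to_digits and from_digits are hypothetical, made up for this illustration. The helpers let you check the results against gcc's unsigned long long.

```c
#include <stddef.h>
#include <stdint.h>

/* Digits are base 2^16, least significant first: a 64-bit value needs 4. */

/* Long addition: sum = a + b for n-digit numbers; returns the carry out. */
unsigned add_digits(uint16_t *sum, const uint16_t *a, const uint16_t *b,
                    size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = (uint32_t)a[i] + b[i] + carry; /* at most 2^17 - 1 */
        sum[i] = (uint16_t)t; /* keep the low digit of this column */
        carry = t >> 16;      /* 0 or 1, carried into the next column */
    }
    return (unsigned)carry;
}

/* Long multiplication: r (n + m digits) = a (n digits) * b (m digits). */
void mul_digits(uint16_t *r, const uint16_t *a, size_t n,
                const uint16_t *b, size_t m)
{
    for (size_t k = 0; k < n + m; k++)
        r[k] = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t carry = 0;
        for (size_t j = 0; j < m; j++) {
            /* (2^16-1)^2 + (2^16-1) + (2^16-1) == 2^32 - 1, so no overflow */
            uint32_t t = (uint32_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint16_t)t;
            carry = t >> 16;
        }
        r[i + m] = (uint16_t)carry; /* first time this row touches r[i+m] */
    }
}

/* Split a 64-bit value into four base 2^16 digits. */
void to_digits(uint16_t d[4], unsigned long long x)
{
    for (int i = 0; i < 4; i++) {
        d[i] = (uint16_t)(x & 0xFFFFu);
        x >>= 16;
    }
}

/* Reassemble four base 2^16 digits into a 64-bit value. */
unsigned long long from_digits(const uint16_t d[4])
{
    unsigned long long x = 0;
    for (int i = 3; i >= 0; i--)
        x = (x << 16) | d[i];
    return x;
}
```

Usage: to_digits() both operands, call add_digits(s, a, b, 4), and compare from_digits(s) with the plain long long sum. The same column-by-column carry is what the compiler emits for long long on a 32-bit CPU, just with 32-bit words.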

Floating point:
Back in the days when there were separate chips for floating
point add and floating point multiply, the industry standardized
on IEEE754-1985 so the output of one chip could go to the input
of the other. The standard has not changed, so the maximum precision
remains the same. The only things that have changed are the
number of floating point calculations that can be in progress
simultaneously, and the number of clock cycles required to complete
each one. IEEE754-1985 does not specify the precision of things
like log and cosine. The 387 and 68881 did these in hardware, but
the 68040 dropped them and left them to software. If these
instructions have gone back into hardware, then the precision
might depend on the CPU again.
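
The fixed precision is easy to see from C. This is a small sketch that assumes the compiler's double is IEEE754 double precision, which has a 53-bit significand. Above 2^53, adding 1.0 gets rounded away. The names double_significand_bits and plus_one are made up for this illustration. The volatile store forces the sum to be rounded to a real 64-bit double, so x87-style extended-precision registers cannot hide the effect.

```c
#include <float.h>

/* Significand bits in a double: 53 on IEEE754 double precision. */
int double_significand_bits(void)
{
    return DBL_MANT_DIG;
}

/* Add 1.0, then force the result through a 64-bit double in memory so
   any extra precision held in registers is rounded off. */
double plus_one(double x)
{
    volatile double y = x + 1.0;
    return y;
}
```

Usage: plus_one(2^53 - 1) gives exactly 2^53, but plus_one(2^53) also gives 2^53, because 2^53 + 1 needs 54 significand bits and rounds to even. Basic operations like this are pinned down by IEEE754-1985. Functions like log and cos are not, which is where CPUs can differ.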

64-bit pointers:
Handling a 5GB data structure with 32-bit pointers is slow
and fiddly. The 64-bit pointers available with alpha and
(t)itanic architectures make that sort of task easier
and faster. The bad news comes with the address translation
required for virtual memory. Converting 32-bit virtual
addresses to 32-bit physical is much easier than converting
64-bit virtual to about 34 bits of physical address. The
386 memory management unit is slow enough already. The alpha uses
the same kind of page-table walk as the 386, but uses twice as many
steps. The 601 has a sensibly designed MMU, which I suspect was
copied into the later PPCs. I have no idea what ia64 does.

Instruction parallelism:
ia64 puts three 41-bit instructions, plus a 5-bit template, in a
128-bit bundle, and tries to issue two bundles per clock cycle. I
have had an athlon and a 686 each running about 4 instructions per
clock. It is quite possible that ia64 achieves more than ia32.
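
The bundle arithmetic works out exactly: 3 x 41 + 5 = 128. The template sits in bits 0 to 4, and the slots occupy bits 5 to 45, 46 to 86 and 87 to 127. So slot 1 straddles the two 64-bit halves. Below is a sketch that packs and unpacks that layout with two uint64_t words. It treats the slots as opaque 41-bit numbers and does not decode real opcodes or template values. The type and function names are hypothetical.

```c
#include <stdint.h>

/* A 128-bit bundle as two halves: lo = bits 0..63, hi = bits 64..127. */
typedef struct { uint64_t lo, hi; } bundle;

#define SLOT_MASK ((UINT64_C(1) << 41) - 1) /* 41 one-bits */

bundle pack_bundle(unsigned tmpl, uint64_t s0, uint64_t s1, uint64_t s2)
{
    bundle b;
    s0 &= SLOT_MASK;
    s1 &= SLOT_MASK;
    s2 &= SLOT_MASK;
    /* template: bits 0-4; slot 0: bits 5-45; low 18 bits of slot 1: 46-63 */
    b.lo = (uint64_t)(tmpl & 0x1Fu) | (s0 << 5) | (s1 << 46);
    /* high 23 bits of slot 1: bits 64-86; slot 2: bits 87-127 */
    b.hi = (s1 >> 18) | (s2 << 23);
    return b;
}

unsigned bundle_template(bundle b)
{
    return (unsigned)(b.lo & 0x1Fu);
}

uint64_t bundle_slot(bundle b, int i)
{
    switch (i) {
    case 0:
        return (b.lo >> 5) & SLOT_MASK;
    case 1: /* reassemble the piece split across the two halves */
        return ((b.lo >> 46) | (b.hi << 18)) & SLOT_MASK;
    default:
        return b.hi >> 23;
    }
}
```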

The reason for the ia64:
Intel squashing the competition. Intel first moved to a 33MHz
bus because many of their competitors couldn't. Intel want(ed)
us to use RDRAM because they had a good share option deal
with rambus, they do not need to pay royalties to rambus, and
intel gets any royalties over 1.5% that rambus charges intel's
competitors. The ia64 is little different. The idea is to
use more transistors than your competitors can. Here are the
ways to do it:

Speculative execution:
When the code has a conditional branch, execute both sides
of the branch in parallel. When the value of the condition
has been determined, discard the state generated by the
wrong branch. Now you can add extra execution units to
pump data into extra registers all for data that will
never be used.
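
ia64 calls this predication. The compiler issues both sides of the branch, and each side is guarded by one of the CPU's one-bit predicate registers, so only the side whose predicate is true keeps its results. The C below is a software analogy, not ia64 code. Both "sides" are computed unconditionally, and a mask built from the condition keeps one result and throws the other away. The function names are made up for this illustration. Unsigned arithmetic keeps the wrong side's result well defined even when it wraps around.

```c
/* Ordinary version: only one side of the branch is executed. */
unsigned abs_diff_branchy(int a, int b)
{
    if (a > b)
        return (unsigned)a - (unsigned)b;
    return (unsigned)b - (unsigned)a;
}

/* "Execute both sides" version: no branch in the source. */
unsigned abs_diff_both_sides(int a, int b)
{
    unsigned then_val = (unsigned)a - (unsigned)b; /* right when a > b  */
    unsigned else_val = (unsigned)b - (unsigned)a; /* right when a <= b */
    unsigned mask = 0u - (unsigned)(a > b);        /* all ones or zero  */
    /* keep the wanted result, discard the other */
    return (then_val & mask) | (else_val & ~mask);
}
```

Both functions return the same answer. The second one always does both subtractions, which is exactly the extra work that extra execution units are there to absorb.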

Loop unrolling:
ia64 is bad at tight loops, so include about three
interleaved copies of the loop code. You will now need
a huge instruction cache to keep all those speculative
execution units running.
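
Here is what "about three interleaved copies of the loop code" looks like by hand, as a sketch in C. The sum is used only as an illustrative loop, and the function names are hypothetical. Three separate accumulators mean the three adds in each pass do not wait on each other. A clean-up loop then handles the 0 to 2 leftover elements. The price is a bigger loop body, which is where the instruction cache pressure comes from.

```c
#include <stddef.h>

/* Plain loop: every add depends on the one before it. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Three interleaved copies of the body with independent accumulators. */
long sum_unrolled3(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0;
    size_t i = 0;
    for (; i + 3 <= n; i += 3) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 3 */
        s0 += a[i];
    return s0 + s1 + s2;
}
```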

Inline functions:
ia64 is bad at subroutines, so include a copy of each
subroutine at each place it is called. Look at the size of
those programs!

The result:
A CPU that requires so much power that you need an extra power
supply on the mother board. It generates so much heat that
you have to clock well below the limit set by the switching
speed of the transistors. I suspect ia64 will be very good
at algorithms that cannot be expressed in a parallel form,
and you benefit if you need several gigabytes of virtual
memory, but I am just guessing.

The real tests are things like which CPU makes a better
database, webserver, video compressor, LFS compiler,
or whatever you are using it for. I have had a
quick look around the web, but I have not found
any itanium benchmarks. Anyone got any?



More information about the lfs-dev mailing list