suggest 'make -j2' for SMP machines?

Miguel Bazdresch lfs-01 at thewizardstower.org
Tue Jun 5 05:46:55 PDT 2007


> Greg Schafer wrote:
>> A lot of seasoned SMP-building folks work on the basis of make -j X+1
>> ie: make -j3 if you have 2 cpus or 2 cores. As a person who has been
>> building in parallel for a long time, I strongly disagree with a
>> comment elsewhere in this thread about performance plummeting if
>> overutilizing.
>
> (Note this is all theoretical: I haven't really tested it.  I've built a
> few packages with -j2 on my dual-core machine, but I never looked into
> the optimal -j setting much.  So if you use X+1 and it works well, that
> probably trumps my guesses.)

I recently got access to a dual-core laptop (1.6 GHz Core 2 Duo, 512 MB
RAM). I measured the machine's SBU using -j1, -j2, -j3, and -j4. The SBU
includes unpacking binutils, building, installing, and removing the
sources; the configure and make install steps are not parallelized.
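
Roughly, each timed run looks like the sequence below; the tarball name
and the install prefix shown here are placeholders, not the exact
commands:

  # one timing run; substitute the real binutils version and your own
  # prefix -- these names are illustrative only
  BINUTILS=binutils-x.y.z
  time {
      tar xjf $BINUTILS.tar.bz2
      cd $BINUTILS
      ./configure --prefix=/tools
      make -j3              # -j1 / -j2 / -j4 for the other runs
      make install
      cd ..
      rm -rf $BINUTILS
  }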

-j1:

real  3m20.447s
user  2m18.153s
sys   0m36.218s

-j2:

real  1m50.967s
user  2m8.160s
sys   0m32.510s

-j3:

real  1m42.912s
user  2m7.948s
sys   0m33.666s

-j4:

real  1m46.869s
user  2m8.840s
sys   0m33.970s

-j5 returns almost the same results as -j4. I wonder how many jobs make
actually creates when compiling binutils with make -j... With no
argument, -j lets make spawn as many jobs as it can, and the results are
interesting (a rough way to watch the job count is sketched below):

real  2m28.837s
user  2m14.148s
sys   0m37.246s
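
One crude way to satisfy that curiosity is to sample, from a second
terminal, how many compiler processes are alive while the build runs.
This assumes a C-only build where the compiler proper shows up as cc1:

  # print the number of running cc1 processes once per second
  while sleep 1; do ps -C cc1 -o pid= | wc -l; done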

> If you have one CPU, and make runs two jobs, *and* both jobs are
> CPU-bound, then performance will probably only be slightly worse than
> running one job.  The overhead of switching between the two tasks will
> take some time, but not very much.

I agree with all you say, but I believe you're missing something more
important than context switches: cache and bus conflicts. These are the
main reasons performance suffers when a core is running more than one
active thread. Dual-core CPUs are very sensitive to having their data
and instructions ready the instant they need them.

Also, each application requires special tuning to extract every last
drop of performance from multicore CPUs; Valgrind or Intel's VTune (an
amazing tool) really help here. Make's -j is a blunt hammer: it throws
tasks at the cores without any consideration for how that will impact
performance.

> And here's my guess for why X+1 works well: most compiles don't seem to
> be entirely CPU-bound.  When compiling packages, I can see my (per-core)
> CPU usage, and it's not usually 100%.  I suspect the cost of going after
> the disk to load the source files in and write the object files out (not
> to mention the temp files if you don't use -pipe) is much greater than
> the cost of parsing and optimizing a small C file.  And lots of packages
> (maybe most?) seem to be made up of many small C files.

I'd say the same thing.
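
If anyone wants to test the -pipe part of that theory, a quick and
purely illustrative comparison is to time the same (already configured)
tree with and without it:

  # illustrative only: override CFLAGS on the command line
  time make -j3 CFLAGS="-O2 -pipe"
  make clean
  time make -j3 CFLAGS="-O2"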

--
Miguel Bazdresch