suggest 'make -j2' for SMP machines?
lfs-01 at thewizardstower.org
Tue Jun 5 05:46:55 PDT 2007
> Greg Schafer wrote:
>> A lot of seasoned SMP-building folks work on the basis of make -j X+1,
>> i.e. make -j3 if you have 2 CPUs or 2 cores. As a person who has been
>> building in parallel for a long time, I strongly disagree with a
>> comment elsewhere in this thread about performance plummeting if
> (Note this is all theoretical: I haven't really tested it. I've built a
> few packages with -j2 on my dual-core machine, but I never looked into
> the optimal -j setting much. So if you use X+1 and it works well, that
> probably trumps my guesses.)
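The X+1 rule of thumb quoted above can be written as a tiny helper. This is a minimal sketch, assuming Python 3.8+ and that os.cpu_count() reports the logical CPUs (cores, plus hyperthreads where present); the names jobs_for and make_command are hypothetical illustrations, not part of make or LFS.

```python
import os
import shlex


def jobs_for(cpus, extra=1):
    """Return the X+1 job count for a machine with `cpus` processors."""
    if cpus is None or cpus < 1:
        # os.cpu_count() can return None; fall back to a serial build.
        return 1
    return cpus + extra


def make_command(cpus=None, extra=1):
    """Build a make argv using the X+1 heuristic (hypothetical helper)."""
    if cpus is None:
        cpus = os.cpu_count()
    return ["make", f"-j{jobs_for(cpus, extra)}"]


if __name__ == "__main__":
    # Print the command line suggested for the current machine.
    print(shlex.join(make_command()))
```

On a dual-core machine, make_command(2) gives ["make", "-j3"], which is the "make -j3 if you have 2 cpus" case; passing extra=0 gives the one-job-per-core variant instead.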
I recently got access to a dual-core laptop (1.6GHz Core 2 Duo, 512MB
RAM). I measured the machine's SBU using -j1, -j2, -j3, and -j4. The SBU
includes unpacking binutils, building, installing, and removing the
sources. The configure and make install steps are not parallelized.
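A comparison like this can be scripted so that each -j value is timed the same way. This is a rough sketch, assuming Python 3 and a build directory that is already configured; time_jobs, speedup, and the "{j}" placeholder convention are hypothetical, and unlike the SBU measurement above it times only the make step, not unpacking, configure, install, or removal.

```python
import subprocess
import time


def time_jobs(build_cmd, job_counts, prepare_cmd=None, runner=subprocess.run):
    """Time one build per -j value and return {jobs: elapsed_seconds}.

    build_cmd is an argv list in which the literal text "{j}" is replaced
    by the job count, e.g. ["make", "-j{j}"]. If prepare_cmd is given
    (e.g. ["make", "clean"]), it runs untimed before each build so that
    every timed run starts from the same state.
    """
    results = {}
    for j in job_counts:
        if prepare_cmd is not None:
            runner(prepare_cmd, check=True)
        argv = [arg.replace("{j}", str(j)) for arg in build_cmd]
        start = time.perf_counter()
        runner(argv, check=True)
        results[j] = time.perf_counter() - start
    return results


def speedup(results, baseline=1):
    """Return each run's speedup relative to the baseline job count."""
    base = results[baseline]
    return {j: base / t for j, t in results.items()}
```

For example, time_jobs(["make", "-j{j}"], [1, 2, 3, 4], prepare_cmd=["make", "clean"]) run from inside the build directory, followed by speedup() on the result. The clean step matters: without it, every run after the first would find nothing left to rebuild.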
-j5 returns almost the same results as -j4. I wonder how many jobs Make
actually creates when compiling binutils with make -j... Make -j (with no
arguments) creates as many jobs as possible, and the results are
> If you have one CPU, and make runs two jobs, *and* both jobs are
> CPU-bound, then performance will probably only be slightly worse than
> running one job. The overhead of switching between the two tasks will
> take some time, but not very much.
I agree with all you say, but I believe you're missing something more
important than context switches: cache and bus conflicts. These are the
main reason performance suffers when a core is running more than one
active thread. Dual-core CPUs are very sensitive to having their data
and instructions ready the instant they need them.
Also, each application requires special tuning to really, really get every
drop of performance out of multicore CPUs. Valgrind or Intel's VTune (an
amazing tool) can really help here. Make's -j is a blunt hammer: it
just throws tasks at the cores without any special consideration for how
that will affect performance.
> And here's my guess for why X+1 works well: most compiles don't seem to
> be entirely CPU-bound. When compiling packages, I can see my (per-core)
> CPU usage, and it's not usually 100%. I suspect the cost of going after
> the disk to load the source files in and write the object files out (not
> to mention the temp files if you don't use -pipe) is much greater than
> the cost of parsing and optimizing a small C file. And lots of packages
> (maybe most?) seem to be made up of many small C files.
I'd say the same thing.
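The "not entirely CPU-bound" argument can be turned into a back-of-the-envelope model. This is a toy sketch, not anything make itself does: it assumes each compile job keeps a CPU busy for a fixed fraction of its lifetime (the hypothetical busy_fraction) and waits on disk the rest of the time, and it ignores the cache and bus contention raised above. The expected CPU demand of n such jobs is n * busy_fraction, so keeping all cores busy needs n >= cores / busy_fraction.

```python
import math


def jobs_to_saturate(cores, busy_fraction):
    """Estimate how many concurrent jobs keep `cores` CPUs busy.

    Toy model: each job uses a CPU for `busy_fraction` of its lifetime
    and waits on I/O otherwise; jobs are treated as independent, and
    cache/bus contention is ignored.
    """
    if not 0 < busy_fraction <= 1:
        raise ValueError("busy_fraction must be in (0, 1]")
    return math.ceil(cores / busy_fraction)
```

With an assumed busy fraction of 0.7 on a dual-core machine, the model gives ceil(2 / 0.7) = 3 jobs, i.e. X+1; a fully CPU-bound job (1.0) gives 2, the one-job-per-core view. The busy fractions here are illustrative, not measurements.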