Another update of the errors hint (the last one for this month, I promise)

Alex Kloss alex at 22-music.com
Tue Oct 28 08:08:17 PST 2003


Hi, folks.

Here it is. A few more words about the SIG11 FAQ, SIG11 bug hunting and
kernel modules (gcc, depmod -F) are in it. This is definitely the last
update... for this month :-)

LX
-------------- next part --------------
AUTHOR:	Alex Kloss <alex at 22-music.com>

DATE: 2003-10-26

LICENSE: GNU Free Documentation License Version 1.2

SYNOPSIS: What to do on errors

DESCRIPTION:
The LFS Book has a short, but nice chapter about errors. This hint is a longer
essay about how to spot where an error is, how to describe it (on IRC or the
mailing list), and possibly how to get around it.

PREREQUISITES:
Common sense, LFS, patience. Programming skills (optional).

HINT:
Almost every LFS adept has seen lines like:

- make[1]: Error
- Segmentation Fault
- ld returned signal 2: ...

The first urge is to write to the mailing list or on IRC something like:

I have an error in program <fill in whatever is appropriate>!

First of all, is it really an error? If you find the option "-Werror" in the
lines that call gcc, the "error" you're facing could just as well be a warning
(-Werror makes gcc treat all warnings as errors). You will often find warning
and error messages mixed before the classical "make[1]: Error". A warning is
something gcc complains about while it continues compiling, whereas an error
stops the build of the package. To suppress distracting warning messages, use
"export CFLAGS=-w".
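
A minimal sketch of how one might tell warnings from errors in a saved build
log. The helper name summarize_log and the log file name are made up for this
illustration, and it assumes a gcc that labels its diagnostics with "warning:"
and "error:" (older versions do not always do so):

```shell
# Hypothetical helper: count warnings, errors and -Werror uses in a build log.
# $1 = log file, e.g. one written with: make 2>&1 | tee build.log
summarize_log() {
    log=$1
    # grep -c prints the number of matching lines; "|| true" keeps a count
    # of 0 from being treated as a failure.
    warnings=$(grep -c 'warning:' "$log" || true)
    errors=$(grep -c 'error:' "$log" || true)
    # -e is needed so grep does not read the pattern as an option.
    werror=$(grep -c -e '-Werror' "$log" || true)
    echo "warnings=$warnings errors=$errors werror_lines=$werror"
}
```

If werror_lines is not zero, some of the counted errors may only be warnings
that -Werror promoted.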

Usually, further information about the error is missing, which is a nuisance
for both the one who asks and the one who tries to answer, because of the
tedious dialogue that often follows. I have to admit that the LFS mailing
list and IRC never failed to solve my problems (and in a rather cheerful
way), but I reached a point where I wanted to solve as many of my problems
as possible myself. So I had to learn a lot, which was undoubtedly fun.

WHAT KIND OF ERROR?

You have to distinguish between the different kinds of errors. The more you 
can tell about the error, the easier it is to solve.

This should be a normal hint, but I guess it is easier to draw a chart:

Question: When did it happen?   What happened?      Where did it happen?

                                                  , Compiling (gcc) ...
                              , ... not found ---<- Dependencies (depmod)
          Compile-time Error <                    ` Linking (ld)
       ,'                     `.
Error <                          Segfault
       `.                     ,'
          Run-time Error ----<          , full
                              ` Hangup <
                                        ` Program only

That looks pretty simple, eh? But that's only the beginning. We will take a
close look at each of these error types!

1. Compile-time Errors

First of all, check the package you are about to compile for files like README
and/or INSTALL. You can work around most errors by strictly following those 
instructions. 

When you are about to build your package, you sometimes get the error that 
something is missing, malformed or simply uncompilable.

1.1 ... not found

1.1.1 Compiling (gcc)

There is a lot gcc may be unable to find. Most often it is a file that should
be included but is missing. The questions here are: 1. what is missing? and
2. what can be done about it?

1.1.1.1 Missing header file

If only a header file is missing, you will experience an error message like:

foo.c:3:31: /usr/include/bar.h: No such file or directory

If there's a file missing, you may want to search your system for it:

find / -name <filename> or
locate <filename> (run updatedb, if locate demands it)

If you don't find the file, the next question would be: where should this file
come from? Is there a prerequisite you forgot? Are all tools available in the 
required versions?

If the file is somewhere other than in the common include path (/usr/include,
/usr/local/include), you may add -I<uncommon include path> to the CFLAGS,
e.g. "export CFLAGS=-I/usr/X11R6/include". If the #include statement
contains a subdirectory, while the file to be included is in the common
directory, you'll have to edit the #include statement.
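
The search and the -I flag can be combined. The following is only a sketch:
include_flag_for is a made-up helper name, it takes the first match it finds,
and it assumes the header is included by its plain file name (no subdirectory
in the #include statement):

```shell
# Hypothetical helper: print a -I flag for the directory holding a header.
# $1 = header file name (e.g. bar.h), $2 = directory tree to search
include_flag_for() {
    # Take the first regular file with that name; errors from unreadable
    # directories are discarded.
    hit=$(find "$2" -type f -name "$1" 2>/dev/null | head -n 1)
    # Nothing found: print nothing and report failure.
    [ -n "$hit" ] || return 1
    echo "-I$(dirname "$hit")"
}
```

It could be used as: export CFLAGS="$(include_flag_for bar.h /usr)". If more
than one copy of the header exists, check by hand which one is the right one.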

In most cases the file will be in a directory the developer did not expect.
The easiest way around that would be a symlink, but that is not a clean way. 
So we search the sources for occurrences of the "missing" file first:

grep -R "<missing file's path and name>" .

Now edit every file that uses the wrong path in its #include statements. The
lazy user can utilize sed (the whole sed expression must be quoted, otherwise
the shell treats the "|" characters as pipes):

for i in *.*; do 
 mv "$i" "$i.bak"
 sed 's|<"missing" file>|<found file>|g' "$i.bak" > "$i"
done

This should solve the problem; you can continue building the package.

1.1.1.2 Missing declaration

Another common error message is about a missing declaration:

foo:124:4: bla undefined

If "bla" is a function from a generic library (like glibc), it will probably be
documented in a manpage that tells you which header file(s) need to be
included:

man bla

Look in /usr/share/man/man3 for documented function calls. The manpage will
look something like this:

--snip
FUNC(3)				Linux Programmer's Manual		FUNC(3)

NAME
	func, ffunc, vfunc - Example function without any use

SYNOPSIS
	#include <stdfunc.h>
	int func(char *format, ...);
	int ffunc(FILE *stream, const char *format, ...);

	#include <vstdfunc.h>
	int vfunc(const char *format, va_list ap);

DESCRIPTION
...
--snap

In most cases the header file is not included where it's needed, so you
just write it into the file where it is missing: "#include <stdfunc.h>".

If the definition is not in any standard library, you will have to search the
codebase of the program you are about to compile for the function it's missing:

grep "<function name>" *.* | less

Now search for something like "#define bla ( const char * ...". If you don't
find anything, the function is likely to be included in other sources, so you
had better check the requirements of the package you are about to compile, in
case something is missing.

If the file where the definition is included is a header file (*.h), simply
include it, otherwise copy and paste the definition into the file gcc is
complaining about.

1.1.2 Linking (ld)

Linking mostly fails because of missing libraries. Make sure your
/etc/ld.so.conf contains all directories with libraries in them. If another
directory is needed, use LDFLAGS: "export LDFLAGS=-L/usr/X11R6/lib" to make
sure XFree86's libraries are included. "/lib" and "/usr/lib" are always
included by default and need not be listed there.

Another (occasional) error can occur if libs are not linked right. I only
saw it happen once, when some program linked to libpng but forgot about libz,
which is used by libpng and needs to be linked to as well. So in the Makefile,
where I found "LIBS=-lpng", I completed it to "LIBS=-lpng -lz". Usually the
error message names the missing function; you can try to grep for it in the
libraries (grep will report binary matches).
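
Instead of grepping the binaries, the symbol tables can be searched. This is a
sketch that assumes nm from GNU binutils is installed; libs_defining is a
made-up helper name:

```shell
# Hypothetical helper: list shared libraries in a directory that define
# a given symbol.
# $1 = symbol name (e.g. inflate), $2 = library directory (e.g. /usr/lib)
libs_defining() {
    symbol=$1
    dir=$2
    for lib in "$dir"/lib*.so*; do
        # Skip the unexpanded pattern if nothing matched.
        [ -e "$lib" ] || continue
        # nm -D lists the dynamic symbol table; --defined-only leaves out
        # symbols the library merely imports. A version suffix such as
        # "@@ZLIB_1.2" is stripped before comparing.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v s="$symbol" '{ n = $NF; sub(/@.*/, "", n)
                                  if (n == s) found = 1 }
                                END { exit !found }'; then
            echo "$lib"
        fi
    done
}
```

A call like "libs_defining inflate /usr/lib" then names the library whose -l
option is missing from LIBS.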

1.1.3 Module Dependency checking (depmod)

Another error that only happens if the running kernel differs from the one the 
sources are compiled against (which could be the case when compiling in chrooted
mode) is the "unresolved dependency in module"-error. To get around that bug, 
run depmod with the "-F /usr/src/linux/System.map"-option. And be sure that you
are compiling the modules with the same compiler as you used when compiling the
kernel.
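
A sketch of how the compiler check might be done. It assumes the kernel banner
in /proc/version contains the text "gcc version x.y.z", as it does for the
kernels this hint was written for (newer kernels format the banner
differently); kernel_gcc_version is a made-up helper name:

```shell
# Hypothetical helper: extract the gcc version from a kernel version banner
# such as:  Linux version 2.4.22 (root@lfs) (gcc version 3.3.1) #1 ...
# $1 = the banner text
kernel_gcc_version() {
    # Print only the version number following "gcc version"; print nothing
    # if the banner has a different format.
    echo "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# Typical use (commented out, it needs the real kernel and root rights):
#   kernel_gcc_version "$(cat /proc/version)"   # compiler of running kernel
#   gcc -dumpversion                            # compiler used for modules
#   depmod -ae -F /usr/src/linux/System.map <kernel version>
```

Keep in mind that /proc/version describes the running kernel, which in a
chroot is not necessarily the one the modules are built for.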

1.2 Segmentation Fault

This is most annoying. It means a program accessed memory it is not allowed
to touch, typically because it used something from a file/pipe/device/
environment variable that is not set and has no fallback for that case, so it
dumps core and stops immediately. If the following information is not
sufficient for you, you may want to have a look at the SIG11 FAQ, which can be
found at http://www.bitwizard.nl/sig11 - but look at this section first.

1.2.1 Segfault during compilation

Segmentation faults during compilation are rarely seen. One cause is that the
memory is full while building a package, which happens only on systems with
little memory; the SIG11 FAQ also describes faulty hardware (such as bad RAM)
as a frequent cause. If memory is short, you can add a loop device as swap to
expand your memory; this will make compilation much slower, but at least it
will work on machines that have insufficient memory:

dd if=/dev/zero of=/tmp/swapspace bs=1M count=128
losetup /dev/loop0 /tmp/swapspace
mkswap /dev/loop0
swapon /dev/loop0

will set up 128MB of swap space (or virtual memory). If it still fails,
increase the amount of disk space used (count=256; count=512; count=XXX). If 
you are done compiling or want to increase the size, remove the added swapspace
with:

swapoff /dev/loop0
losetup -d /dev/loop0
rm /tmp/swapspace
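
To judge whether memory is likely to be the problem at all, the totals can be
read from /proc/meminfo. A small sketch, assuming the usual "MemTotal:" and
"SwapTotal:" lines with values in kB; mem_summary is a made-up helper name:

```shell
# Hypothetical helper: report total RAM and swap in MB.
# $1 = file in /proc/meminfo format (defaults to /proc/meminfo)
mem_summary() {
    awk '/^MemTotal:/  { mem  = int($2 / 1024) }
         /^SwapTotal:/ { swap = int($2 / 1024) }
         END { printf "MemTotal=%dMB SwapTotal=%dMB\n", mem, swap }' \
        "${1:-/proc/meminfo}"
}
```

Running it before and after swapon shows whether the additional swap space
has actually been picked up.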

1.2.2 Segfault during execution

If a program segfaults, there is not much you can easily do to hunt the error
down unless you have some programming skills. Contact the developer and give
him a detailed view of your system; maybe there is something about the error
in /var/log? If you want to hunt the bug down yourself anyway, read the SIG11
FAQ and use strace, which you will find at http://www.liacs.nl/~wichert/strace/
and which is easily installed; it may help you to find out what file/pipe/
environment string/etc. the program is expecting to be available. Then grep
the sources of the segfaulting program for the file/pipe/etc. that failed and
add a fallback routine. A nice example is the gsview-4.4-patch. gsview 4.4
tried to get the environment variable LANG, but had no fallback for the case
that it was not set. The offending part of the source looked like:

   strncpy(lang, getenv("LANG"), sizeof(lang)-1);

This copies at most sizeof(lang)-1 characters of the LANG(uage) environment
variable. If LANG is not set, getenv() returns a null pointer and strncpy()
tries to read from it, which results in a segfault. The easy solution would
be to set LANG to something, but the better solution is to provide a fallback
and change the code to:

   strncpy(lang, (getenv("LANG") == 0) ? "C" : getenv("LANG"),sizeof(lang)-1);

That is a bit obfuscated for the C-illiterate, but it means 'if getenv("LANG")
returns 0 (LANG is not set), use "C" (which stands for the standard locale),
else use the LANG environment variable; either way copy at most
sizeof(lang)-1 characters'. Now it is your turn, if you still want to get that
bug by yourself!
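
For the strace step mentioned above, a trace can be written to a file with
"strace -f -o trace.log <program>" and then searched for files the program
looked for but did not find. This is only a sketch: missing_files is a made-up
helper name, and it covers nothing but system calls that failed with ENOENT:

```shell
# Hypothetical helper: list the paths a traced program could not find.
# $1 = log file written by: strace -f -o trace.log <program>
missing_files() {
    # Keep the calls that failed with "No such file or directory", print
    # the first quoted argument (the path), and drop duplicates.
    grep ' = -1 ENOENT' "$1" | awk -F'"' 'NF >= 3 { print $2 }' | sort -u
}
```

Many of the reported paths are harmless (programs probe several locations for
the same file), so look mainly at the last ones before the segfault.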

1.3 Hangup

Hangups are the most annoying errors there are. Fortunately, with Linux they
are as rare as they are annoying (unless you use bleeding edge sources only).
Hangups are mostly caused by endless loops, driver problems that lead to bus
lockups, and hardware issues (like defective capacitors in the CPU power
supply, check for burst ones). Infinite loops are sometimes hinted at by
compiler warnings; the other causes are harder to find. Try to downgrade the
driver you think is responsible for the hangup and send a report to the
relevant mailing list.

1.3.1 Full Hangup

You recognize a full hangup by pressing the [CAPS LOCK] key. If the LED
toggles, the keyboard is still hooked to the console, so it is no full
hangup; try pressing different keys then. If the keyboard is still available,
but the screen is blank, try to reboot with [ALT][CTRL][DEL]. If even that
doesn't work, you may be lucky enough to have the sysrq key feature compiled
into your kernel. For further information, read
[/usr/src/linux/Documentation/sysrq.txt]. If nothing else works, use a hard
reboot (that is always the last resort for getting back to work).

1.3.2 Program-only Hangup

If the program hangs up leaving the rest of the system intact, you can use
whichever of the kill/killall/xkill commands is appropriate to get rid of it.
Program-only hangups occur on infinite loops or, e.g., when trying to read
from a blocked pipe; with an infinite loop the load will usually go up
visibly.
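
A sketch of a polite way to use kill: ask the program to terminate first and
only force it if it does not react. stop_process is a made-up helper name and
the five second grace period is an arbitrary choice:

```shell
# Hypothetical helper: stop a hung program, gently first.
# $1 = process ID (as shown by ps or top)
stop_process() {
    pid=$1
    # SIGTERM lets the program clean up; fail if there is no such process.
    kill -TERM "$pid" 2>/dev/null || return 1
    for try in 1 2 3 4 5; do
        # kill -0 sends no signal, it only tests whether the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
    done
    # Still there after the grace period: SIGKILL cannot be ignored.
    kill -KILL "$pid" 2>/dev/null || true
}
```

A process stuck in an uninterruptible driver call will survive even SIGKILL;
that points to the driver problems described in section 1.3.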

1.4 Other errors

If you get an error message not covered by this hint, check the relevant 
mailing lists, enter the error message into Google and check 1. whether there
is a newer version and 2. whether a CVS version, if available, has the same
error. If nothing else helps, ask on IRC, mail the developers' mailing list or
submit a bug report. Remember to describe the error precisely and give enough 
information about the system you are trying to build the package on (logs,
versions, strace output, dmesg output, debug messages and so on).
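
The version part of such a report can be collected in one go. A minimal
sketch; system_report is a made-up helper name and the list of tools is just
an example that should be extended to whatever the package needs:

```shell
# Hypothetical helper: print kernel and toolchain versions for a bug report.
system_report() {
    echo "kernel: $(uname -srm)"
    for tool in gcc ld make; do
        if command -v "$tool" >/dev/null 2>&1; then
            # The first line of --version output is enough to identify it.
            echo "$tool: $("$tool" --version 2>/dev/null | head -n 1)"
        else
            echo "$tool: not found"
        fi
    done
}
```

Redirect the output to a file ("system_report > report.txt") and attach it
together with the relevant part of the build log.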

May the source be with you!

CHANGELOG:
  [2002-01-04]
    * Started to write this hint.

  [2003-10-07]
    * Initial Version, small additions.

  [2003-10-08]
    * Almost forgot to give Tushar some credits, little changes and additions.
    * Small changes and corrections suggested by Bill Maltby

  [2003-10-26]
    * Added a link to the SIG11 FAQ, some more stuff about segfaults and a
      few words about the depmod problem with different kernels.

CREDITS:
Thanks to teemu for reminding me of "-I" and "-l", as well as to Tushar for
the warning about warnings and for ringing the bell of the "-w" option, not to
forget Bill for his corrections. Thanks to Gerard for inspiring me with his
LFS section about errors! :-)

