Purity Iteration Analysis - the results are in

Greg Schafer gschafer at zip.com.au
Mon Mar 17 13:53:10 PST 2003


Hi

With all the pure lfs activity lately, it's only now that I've finally found
time to perform a proper "byte-for-byte" analysis of the build. This is
basically a follow-up to a post made by James Smaby archived here:-

http://archive.linuxfromscratch.org/mail-archives/lfs-dev/2002/12/0639.html

Thanks to James for his initial work on this issue.


Theoretically, one should be able to take a newly built LFS and use it to
rebuild itself (essentially perform Ch 6 again without the assistance of the
/static or /stage1 stuff) and have it reproduce identical bytes.

My goals were to:-

  1. Find out for myself just how many iterations it takes before code stops
     changing

  2. Find out how close the pure lfs build method is to achieving purity

The good news is that (almost) every single byte can be accounted for. The
ones that are not accounted for are inconsequential. But best of all, the
purity achieved using the new build method is excellent.

Just to confirm some of James' findings:-

  - ar "*.a" archives (static libs) contain date stamps of all the object
    files contained within

  - compressed ".gz" files also contain a date stamp (the gzip header
    records a modification time)

  - some binaries and libs contain embedded date stamps

  - some debug symbols contain random information

  - performing all the test builds on the same day will help reduce the
    number of files that differ
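To see where those date stamps actually live, here is a small POSIX sh sketch. It is not one of the attached scripts, and the function names are made up. It assumes the standard on-disk layouts: a gzip header carries a 32-bit little-endian MTIME at byte offset 4, and an ar archive starts with the 8-byte "!<arch>\n" magic followed by 60-byte member headers whose 12-character decimal mtime field sits right after the 16-character name.

```sh
#!/bin/sh
# Hypothetical helpers for locating embedded date stamps.

# Print the MTIME field of a gzip file (bytes 4..7, little-endian).
gzip_mtime() {
    # od prints the four bytes as unsigned decimals, lowest byte first
    set -- $(od -An -tu1 -j4 -N4 "$1")
    echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

# Print the mtime field of the first member of an ar archive:
# skip the 8-byte "!<arch>\n" magic and the 16-byte member name,
# then read the 12-byte, space-padded decimal timestamp.
ar_first_member_mtime() {
    dd if="$1" bs=1 skip=24 count=12 2>/dev/null | tr -d ' '
    echo
}
```

With GNU gzip, compressing with -n stores an MTIME of 0, so identical input compressed that way comes out byte-for-byte identical.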


Here are the steps I took to make the analysis:-

  - build a full pure lfs system (let's call this "iteration1")

  - umount everything associated with it (proc, mnt, dev/pts, whatever)
 
  - make sure you are running the bash shell from /stage1 (just in case of
    problems while stripping), i.e. enter the chroot using
    /stage1/bin/bash or execute "exec /stage1/bin/bash --login"

  - strip all symbols and debug information from the entire build (-p
    keeps the original access and modification times). This is the basis
    I worked on:-

    * for files ending in ".a" or ".o", only strip the debug symbols
      (strip -p -g)

    * for everything else, do a full strip (strip -p)

    Here are the commands I used to achieve the above. The explicit
    -print keeps find from also listing the pruned directories, and
    -type f keeps directories and symlinks away from strip:-

    /stage1/bin/find / -path /stage1 -prune -o -path /dev -prune -o \
       -path /mnt -prune -o -path /proc -prune -o \
       -type f -name '*.[oa]' -print | \
       /stage1/bin/xargs /stage1/bin/strip -p -g

    /stage1/bin/find / -path /stage1 -prune -o -path /dev -prune -o \
       -path /mnt -prune -o -path /proc -prune -o \
       -type f ! -name '*.[oa]' -print | \
       /stage1/bin/xargs /stage1/bin/strip -p

    The 2nd command will issue a stack of warnings of the type "strip:
    blah: File format not recognized" but these are harmless and should be
    ignored.

    To see how many unstripped files remain:-

    find / -path /stage1 -prune -o -print | xargs file | grep "not stripped"

    There should only be these files that remain:-

    /usr/lib/locale/locale-archive
    /usr/lib/crt1.o
    /usr/lib/gcrt1.o
    /usr/lib/crti.o
    /usr/lib/libieee.a
    /usr/lib/libmcheck.a
    /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.2/crtbegin.o
    /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.2/crtbeginS.o
    /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.2/crtbeginT.o
    /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.2/crtend.o
    /usr/lib/gcc-lib/i686-pc-linux-gnu/3.2.2/crtendS.o

    Interesting files in that list are "/usr/lib/locale/locale-archive",
    which seems to get misidentified by the "file" utility as a "PDP-11
    separate I&D executable", and "/usr/lib/{libieee,libmcheck}.a", which
    are actually not ar "*.a" archives despite the file name extension.

  - exit the chroot and copy the entire tree (minus stage1, dev, mnt, proc)
    to a safe area e.g. ~/cmp/iter1

  - chroot back in to the original build and use it to rebuild itself,
    overwriting all the original files as you go.

  - repeat ad infinitum, but copy to ~/cmp/iter2 and so on.
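The per-iteration bookkeeping above can be sketched in POSIX sh as two small functions with hypothetical names: one encodes the strip rule, the other copies a tree into ~/cmp/iterN while leaving out stage1, dev, mnt and proc. The copy assumes GNU cp and is meant to be run from outside the chroot, e.g. snapshot_iteration "$LFS" ~/cmp/iter1.

```sh
#!/bin/sh
# Hypothetical helpers for one iteration of the procedure above.

# Strip options by file name: objects and archives keep the symbol
# tables the linker still needs, so only debug info goes (-g);
# everything else gets a full strip. -p preserves the file times.
strip_flags_for() {
    case $1 in
        *.o|*.a) printf '%s\n' "-p -g" ;;
        *)       printf '%s\n' "-p" ;;
    esac
}

# Copy everything under ROOT into DEST except stage1, dev, mnt and proc.
# Assumes GNU cp (-a keeps permissions, times, symlinks and, within one
# top-level directory, hard links). Top-level dot entries are skipped.
snapshot_iteration() {
    root=$1 dest=$2
    mkdir -p "$dest" || return 1
    for entry in "$root"/*; do
        # an unmatched glob leaves the literal pattern behind
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        case ${entry##*/} in
            stage1|dev|mnt|proc) continue ;;
        esac
        cp -a "$entry" "$dest"/ || return 1
    done
}
```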

Now that we have all the iterations in the safe area, we need to do a couple
of things to minimise the differing files:-

  - convert all the *.gz files back into their uncompressed original state
      "find ~/cmp -name '*.gz' | xargs gunzip"

  - get rid of all the symlinks
      "find ~/cmp -type l | xargs rm"
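The two clean-up commands can be folded into one helper. This sketch (hypothetical name normalise_tree, assuming a find that supports -exec ... {} +) swaps the order so symlinks go first, and restricts gunzip to regular files, so no *.gz symlink ever reaches gunzip.

```sh
#!/bin/sh
# Hypothetical helper: normalise one iteration tree before diffing.
normalise_tree() {
    # Remove symlinks first so gunzip never meets a *.gz symlink.
    find "$1" -type l -exec rm -f {} + || return 1
    # Decompress regular .gz files in place; gunzip replaces foo.gz
    # with foo.
    find "$1" -type f -name '*.gz' -exec gunzip {} + || return 1
}
```

Usage would be along the lines of: for d in ~/cmp/iter*; do normalise_tree "$d"; done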

Now we can start diff'ing. To do the job I used a couple of quickly hacked
up scripts plus one I found on the net:-

  - DirCmp.sh -- find all differing files between 2 dirs
  - find_diffs.sh -- find all differing files (except for ar archives) and
    produce 1) list of binaries that differ and 2) actual diff of ascii
    files that differ
  - find_ar_diffs.sh -- find all differing ar archive files (by extracting
    the object files then diffing) between 2 dirs
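The scripts were attached rather than inlined, so this is only a guess at the core of a DirCmp-style check, not the attached DirCmp.sh: walk the regular files of one tree and cmp each against its twin in the other. The function name is hypothetical. Files present only in the second tree are not reported, so run it both ways.

```sh
#!/bin/sh
# Hypothetical sketch of a DirCmp-style comparison (not the attached
# DirCmp.sh). For every regular file under DIR1, print
#   "missing <path>" if DIR2 has no regular file at that path, or
#   "differs <path>" if the bytes are not identical.
list_differing_files() {
    (cd "$1" && find . -type f) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -f "$2/$rel" ]; then
            echo "missing $rel"
        elif ! cmp -s "$1/$rel" "$2/$rel"; then
            echo "differs $rel"
        fi
    done
}
```

For example: list_differing_files ~/cmp/iter2 ~/cmp/iter3 | sort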

To find out whether the identified binary files are actually different or
just have different date stamp info, do something like this:-

  strings iter1/usr/bin/perl > 1
  strings iter2/usr/bin/perl > 2
  diff -u 1 2

--- 1   2003-03-17 21:21:28.000000000 +1100
+++ 2   2003-03-17 21:21:33.000000000 +1100
@@ -3299,7 +3299,7 @@
  USE_LARGE_FILES
 \n",
 "  Built under %s\n"
-02:53:16
+05:43:49
 Mar 17 2003
 ,"  Compiled at %s %s\n"
 config_vars(qw(


Now some facts:-

  * The builds DO stop changing i.e. iteration2 is identical to iteration3
    (except for the embedded date and time info)

  * Pure LFS is quite close to achieving 100% purity, but not quite there
    yet. The diffs that remain are the same as in current LFS. All the
    important toolchain stuff is perfect. The only real problem package is
    shadow, plus some minor issues with an ncurses header file, some groff
    data files and Perl's Config.pm

  * If scripting the builds, be sure to set the "TZ" env var or be prepared
    for even more date trouble

  * the /usr/share/info/dir file is a fk'n debacle. Best policy is to use
    Jack Brown's suggestion at
    http://bugs.linuxfromscratch.org/show_bug.cgi?id=485

  * using the technique outlined here is a great way to find bugs in the
    LFS build, as evidenced by the recent gcc "mmap_test" bug, which has
    been there since at least gcc-3.1

I've attached some small tarballs. The first is 2v3, the diff of iter2 vs
iter3. This is the benchmark, as the only changes there are date and time
info with no differing code. The 2nd is 1v2, which represents the current
state of play with Pure LFS. The third contains the quick scripts.

In summary, these binary files will always contain different bytes due to
embedded timestamps:-

/usr/bin/perl
/usr/bin/perl5.8.0    <--- Hardlink
/usr/bin/vim
/usr/sbin/nscd
/usr/lib/perl5/5.8.0/i686-linux/CORE/libperl.a
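The perl diff shown earlier has the shape of C's __TIME__ and __DATE__ expansions ("Compiled at %s %s" alongside "Mar 17 2003" and a time). A rough way to list such strings in a binary is sketched here; the function name is hypothetical, tr stands in for strings(1), and the two patterns are an assumption about what the stamps look like, so expect both misses and false hits.

```sh
#!/bin/sh
# Hypothetical helper: print printable runs in a binary that look like
# __TIME__ ("02:53:16") or __DATE__ ("Mar 17 2003") expansions.
# tr turns every run of non-printable bytes into a newline, as a crude
# stand-in for strings(1).
embedded_stamps() {
    LC_ALL=C tr -cs '[:print:]' '\n' < "$1" |
        grep -E '^[0-9]{2}:[0-9]{2}:[0-9]{2}$|^[A-Z][a-z]{2} [ 0-9][0-9] [0-9]{4}$'
}
```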

If you install glibc locales, this file will contain 1 different byte per
locale for reasons I haven't yet determined:-

  cmp -l iter1/usr/lib/locale/locale-archive \
     iter2/usr/lib/locale/locale-archive

    9041   1   0
    9149   1   0
    9257   1   0
    9365   1   0
    9473   1   0
    9581   1   0
    9689   1   0
    9797   1   0
    9905   1   0
   10013   1   0
   10121   1   0

The shadow package has 2 problems. The first problem was mentioned on this
list ages ago by Kelledin and nothing has been done about it:-

  checking location of utmp... configure: WARNING: utmp file not found

This changes the bytes of the build (dunno whether it changes the
functionality, but I don't really care). The obvious solution is to bring
the creation of the utmp and related files forward. Why not do it at the
start of Ch6 when we create the passwd and group files?
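A sketch of bringing those files forward. The function name is hypothetical, the file set (utmp, wtmp, lastlog, btmp) is an assumption, and permissions are left to whatever the book prescribes. Inside the Ch6 chroot it would be called as create_login_records /

```sh
#!/bin/sh
# Hypothetical helper: create login accounting files under ROOT so that
# shadow's configure finds a utmp file. The file list is an assumption.
create_login_records() {
    root=${1%/}
    mkdir -p "$root/var/run" "$root/var/log" || return 1
    for f in var/run/utmp var/log/wtmp var/log/lastlog var/log/btmp; do
        # touch never truncates an existing log
        touch "$root/$f" || return 1
    done
}
```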

The second problem is these files:-

/lib/libmisc.so.0.0.0
/usr/lib/libmisc.a

which is essentially the same problem for each file and boils down to this:-

  -/bin/passwd
  -Can't execute /bin/passwd
  +/usr/bin/passwd
  +Can't execute /usr/bin/passwd

I'm yet to look into the best way to fix it. A symlink may do. A reinstall
will do the trick for sure :-)

The problem ncurses header file is:-

/usr/include/etip.h

  -#define ETIP_NEEDS_MATH_H 0
  +#define ETIP_NEEDS_MATH_H 1

Haven't delved into it yet.

Perl's Config.pm has diffs due to a missing hosts file and what not:-

  -hostcat=''
  +hostcat='cat /etc/hosts'

  -mydomain='.(none)'
  +mydomain='.localdomain'

  -perladmin='root at tigers-lfs.(none)'
  +perladmin='root at tigers-lfs.localdomain'
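If the missing hosts file is the culprit, creating /etc/hosts before Perl is configured in the first iteration should at least line up hostcat. A minimal sketch: the helper name and the single loopback line are assumptions, and whether it also fixes mydomain and perladmin is untested.

```sh
#!/bin/sh
# Hypothetical helper: create a minimal ROOT/etc/hosts if none exists.
# The loopback entry below is an assumed minimal content.
ensure_hosts_file() {
    hosts=${1%/}/etc/hosts
    [ -e "$hosts" ] && return 0           # never clobber an existing file
    mkdir -p "${hosts%/*}" || return 1
    printf '127.0.0.1 localhost.localdomain localhost\n' > "$hosts"
}
```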

Groff's generated *.ps files have a weird diff:-

/usr/share/doc/groff/1.18.1/examples/grnexmpl.ps
/usr/share/doc/groff/1.18.1/examples/letter.ps
/usr/share/doc/groff/1.18.1/examples/macros.ps
/usr/share/doc/groff/1.18.1/examples/typeset.ps
/usr/share/doc/groff/1.18.1/examples/typewrite.ps
/usr/share/doc/groff/1.18.1/examples/webpage.ps
/usr/share/doc/groff/1.18.1/meintro.ps
/usr/share/doc/groff/1.18.1/meref.ps

something to do with:-

-def/PL 792
+def/PL 841.89

which I haven't delved into yet.

Also, groff's DESC files:-

/usr/share/groff/1.18.1/font/devlbp/DESC
/usr/share/groff/1.18.1/font/devlj4/DESC
/usr/share/groff/1.18.1/font/devps/DESC

contain:-

-papersize letter
+papersize a4

again, not looked into yet.
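The two groff diffs look like one issue: in PostScript points (1/72 inch), US letter is 11 in tall and A4 is 297 mm tall, which gives exactly the 792 and 841.89 in the def/PL lines. The arithmetic as a quick sh/awk check (function names are just for illustration):

```sh
#!/bin/sh
# Page heights in PostScript points (72 points per inch):
# US letter is 11 in tall; A4 is 297 mm tall (25.4 mm per inch).
letter_height_pt() { awk 'BEGIN { printf "%g\n", 11 * 72 }'; }
a4_height_pt()     { awk 'BEGIN { printf "%g\n", 297 / 25.4 * 72 }'; }

letter_height_pt    # prints 792, the "-def/PL" value
a4_height_pt        # prints 841.89, the "+def/PL" value
```

If that holds, pinning the paper size at configure time should remove both diffs. groff's configure appears to honour a PAGE environment variable (e.g. PAGE=letter ./configure --prefix=/usr); this is untested against 1.18.1.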



Overall, looking good methinks.

I'm not finished with this stuff yet but I'm putting up what I have now as
I'm going to be real busy for the next few days then I'm going on vacation
and will be AFK. Will try and update the pure LFS hint soon with the latest
developments (fewer gcc bootstraps, gcc test_mmap patch etc.)

Greg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2v3.tar.bz2
Type: application/octet-stream
Size: 2167 bytes
Desc: not available
URL: <http://lists.linuxfromscratch.org/pipermail/lfs-dev/attachments/20030318/77662d9e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1v2.tar.bz2
Type: application/octet-stream
Size: 4268 bytes
Desc: not available
URL: <http://lists.linuxfromscratch.org/pipermail/lfs-dev/attachments/20030318/77662d9e/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scripts.tar.bz2
Type: application/octet-stream
Size: 2534 bytes
Desc: not available
URL: <http://lists.linuxfromscratch.org/pipermail/lfs-dev/attachments/20030318/77662d9e/attachment-0002.obj>

