RFC - bootscript error reporting

IvanK. ivan at chepati.org
Thu Jan 29 09:26:05 PST 2004


Sorry for the top-post.

All of the suggested ideas are good.

Now, I'll throw in a new one (didn't say it's a good one):  an initrd with 
enough utilities to (try to) "auto-correct" a broken lfs?

The scenario is as follows:
The system is not shut down properly, so on reboot it tries to fsck itself but 
fails.  It writes a /fatal (or something like that) file and reboots itself 
after running lilo -R "linux initrd=/boot/<initrd.image>"
(Yeah, I'm still using lilo!  This can be adapted for grub, can't it?).  Only 
problem right now is that if you have a lilo password (shame on you if you 
don't :-) ), you will be prompted for a password.  This defeats the whole 
purpose of the exercise.  I wonder if I pass 'bypass' to lilo -R if it'll 
bypass the password prompt...  gotta try that.
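
A rough sketch of that first half, for what it's worth.  The function name 
fsck_or_rescue and the FATAL_MARKER variable are made up for illustration, and 
the lilo command line is passed in as an argument because it is site-specific.  
It assumes / (for the marker) and the lilo boot map are writable at that point, 
which may well not be true after a failed fsck.

```shell
# Run fsck on a device; if errors remain uncorrected, leave a marker and
# ask lilo to use the rescue command line for the next boot only.
# $1 = device, $2 = lilo command line for the rescue boot (site-specific).
# FATAL_MARKER is a hypothetical override for the marker path.
fsck_or_rescue() {
    status=0
    fsck -a -C -T "$1" || status=$?
    # fsck's exit status is a bit mask; bit value 4 means
    # "filesystem errors left uncorrected".
    if [ $((status & 4)) -ne 0 ]; then
        : > "${FATAL_MARKER:-/fatal}"
        lilo -R "$2"
        reboot
    fi
    return "$status"
}
```

It would be called once per filesystem from the checkfs-style script, e.g. 
fsck_or_rescue /dev/hda1 "$RESCUE_CMDLINE", with RESCUE_CMDLINE set by the 
admin.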

After the reboot, the initrd does the fsck thingie.  For example something 
like this:

for disk in /dev/[hs]d[a-h]; do
    for partition in $(fdisk -l "$disk" | grep 'Linux$' | awk '{print $1}'); do
        echo -n "Checking $partition... "
        echo "fsck -a -C -T $partition"
    done
done

(Get rid of the second echo and its quotes to actually run the fsck.  As 
written, it only prints the command so you can verify it.)

Extending this idea, /fatal could be a directory with files in it with enough 
information for the rescue mini-system to correct the problems.

If we want to be clever, we can even check if all modules in /etc/modules.conf 
are present in /lib/modules/`uname -r` (especially the driver for eth0), etc.  
A whole world of possibilities?
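
For the module check, something along these lines might do as a starting 
point.  The function name missing_modules is made up, and it only looks at 
plain "alias <name> <module>" lines; it does not follow aliases that point to 
other aliases, and it assumes modules are stored uncompressed as .o or .ko 
files.

```shell
# Print every module named on an "alias" line of the given modules.conf
# that has no matching object file under the given module directory.
# $1 = path to modules.conf, $2 = module tree, e.g. /lib/modules/<version>
missing_modules() {
    conf=$1
    moddir=$2
    # "off" and "null" are special alias targets, not real modules.
    awk '$1 == "alias" && $3 != "off" && $3 != "null" { print $3 }' "$conf" |
    sort -u |
    while read -r mod; do
        found=$(find "$moddir" -name "$mod.o" -o -name "$mod.ko" 2>/dev/null)
        if [ -z "$found" ]; then
            echo "$mod"
        fi
    done
}
```

Usage would be missing_modules /etc/modules.conf "/lib/modules/$(uname -r)"; 
any output means the rescue system has something to complain about.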

Now as far as logging goes, I'm of the opinion we should be capturing as much 
as possible of every warning/corrected problem/uncorrected problem to, say, 
/var/log/init.log.  But this is tricky because if /var could not be mounted 
rw, where do we write the log?  Perhaps write to the ram disk, and if the 
rescue system can correct /var (or /), mount it rw and dump the log into place?
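
A minimal sketch of that buffering idea, assuming a ram-backed location is 
available early in boot.  The names boot_log and flush_boot_log and the 
default buffer path /dev/shm/init.log are made up for illustration; only 
/var/log/init.log comes from the suggestion above.

```shell
# Hypothetical paths; override them to suit the system.
BOOT_LOG_TMP=${BOOT_LOG_TMP:-/dev/shm/init.log}     # buffer on the ram disk
BOOT_LOG_FINAL=${BOOT_LOG_FINAL:-/var/log/init.log} # final resting place

# Append one message to the ram-disk buffer.
boot_log() {
    printf '%s\n' "$*" >> "$BOOT_LOG_TMP"
}

# Append the buffered messages to the final log once its directory is
# writable, then empty the buffer.  Returns 1 (and keeps the buffer) if the
# directory is missing or still read-only.
flush_boot_log() {
    [ -s "$BOOT_LOG_TMP" ] || return 0
    [ -w "$(dirname "$BOOT_LOG_FINAL")" ] || return 1
    cat "$BOOT_LOG_TMP" >> "$BOOT_LOG_FINAL" && : > "$BOOT_LOG_TMP"
}
```

The scripts would call boot_log wherever they now print a warning, and 
flush_boot_log would run once after /var is mounted rw (and could safely be 
retried later if it fails).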

This could be the ravings of a lunatic, but it is *possible*.

And on another note, I retract my question regarding rhgb.  Even though I've 
made some "progress" in porting it to lfs, I realize it's better dealt with 
in a hint than in lfs-book, or even blfs-book.

IvanK.

On Thursday 29 January 2004 03:47 am, Jeremy Utley wrote:
> On Wed, 2004-01-28 at 15:11, James Robertson wrote:
> > I really like this.  As others have posted, not all errors are the same.
> >   We would need to come up with some kind of system to trap errors from
> > the different programs that are called within boot scripts.  Or if that
> > is too hard, then at least some system to agree on a "criticality" of
> > certain Sxx scripts.  Kxx scripts are important, but not as critical IMO
> > as Sxx scripts when entering a certain rc level.
> >
> > Also, there needs to be a mechanism in place (in the bootscripts) to
> > notify the administrator when he/she came back to the console that a
> > reboot occurred at such-and-such a date and time and what the result of
> > any messages were.  This is especially important for the ones that get a
> > timeout value and move on.  Can we use mail for that?  I don't know, I
> > am not _that_ smart with Linux yet.
>
> My thought was actually much simpler, but it can only be done after the
> system is in a read/write mode.  Have the bootscripts, as part of the
> boot process, write to /var/log/bootscripts.log, any failures or
> warnings that come about as part of the boot process.
>
> Actually, now that I think about it...someone with more knowledge than
> me answer this - would it be possible to have a fifo device (via mkfifo)
> set up someplace where we could write messages to prior to the
> filesystem being mounted in r/w mode, then at S99log, cat the contents
> of that fifo into a log file?
>
> > On the simple pause idea, the time to pause needs to be easily
> > changeable in a /etc/sysconfig file of some kind.  Everyone will have a
> > different opinion as to how long they want the boot process to pause
> > before continuing.
>
> Good idea - will definitely file that one away for future reference!
>
> > <opinion>
> > I am also not sure that LFS needs to be concerned with headless or
> > non-administrator-local machines.  I would love it in my production
> > environment, but most of our readers reboot at the console.  The issues
> > Jeremy brought up are more about that.  The easiest thing is simply to
> > fix the scripts to support the enter key or change the text to say
> > CTRL-J.
> >
> > I am only throwing this out for thought.  I actually would love to see
> > Jeremy's idea put into the scripts.
> > </opinion>
>
> While in a lot of ways, I agree, James, there are a number of people who
> are actually using LFS-based systems in a production environment.  I
> know that I myself have gotten bit by this one on a number of occasions,
> and it's very annoying when you have to drive 45 minutes to the
> datacenter, or drag a NOC technician away from their TV long enough to
> punch the reset button.  One of the key advantages to Linux (at least
> for me) is the ability to remotely administer the machine, and making
> this change to LFS actually makes that possible.  I can see now we could
> actually integrate an option like this into your previous request, like
> so:
>
> # 0 means wait for a keypress; anything else is a timeout in seconds
> if [ "$PAUSE_DURATION" -eq 0 ]
> then
>      read i
> else
>      sleep "$PAUSE_DURATION"
> fi
>
>
> Comments?
>
> -J-



