RFC - bootscript error reporting

Bill's LFS Login lfsbill at nospam.dot
Thu Jan 29 12:03:36 PST 2004


On Thu, 29 Jan 2004, Kevin P. Fleming wrote:

> Jeremy Utley wrote:
>
> > Actually, now that I think about it...someone with more knowledge than
> > me answer this - would it be possible to have a fifo device (via mkfifo)
> > set up someplace where we could write messages to prior to the
> > filesystem being mounted in r/w mode, then at S99log, cat the contents
> > of that fifo into a log file?
>
> I don't think this is possible; a FIFO has to have a reader available
> for anyone to be able to write to it. Without an active reader, all
> writes will just block, which would not be too good in the bootscripts :-)

I'm not sure about this one. I used to think the same (it is true on
real *IX). IIRC, someone pointed out that if the open and the writes are
non-blocking, the Linux kernel will let you proceed. But you see where
my next point will go, right? BTW, I *think* I tested that non-blocking
hypothesis and it held, but I would have to test again to confirm. It
may be that the kernel lets you open but blocks you on write; I *seem*
to recall that as the result of the test.
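
The question is easy to probe. Below is a minimal sketch in Python,
assuming a Linux system; the function names and the temporary FIFO path
are made up for illustration. POSIX specifies that a write-only,
non-blocking open of a FIFO with no reader fails with ENXIO, while
fifo(7) notes that Linux lets a read-write open succeed (POSIX leaves
that case undefined).

```python
import errno
import os
import tempfile


def probe_fifo_open(path):
    """Report how opening a FIFO with no reader behaves for two open modes."""
    result = {}
    modes = {
        # Write-only, non-blocking: expected to fail with ENXIO (no reader).
        "wronly_nonblock": os.O_WRONLY | os.O_NONBLOCK,
        # Read-write, non-blocking: Linux allows this; POSIX leaves it undefined.
        "rdwr_nonblock": os.O_RDWR | os.O_NONBLOCK,
    }
    for name, flags in modes.items():
        try:
            fd = os.open(path, flags)
        except OSError as exc:
            result[name] = errno.errorcode.get(exc.errno, str(exc.errno))
        else:
            os.close(fd)
            result[name] = "opened"
    return result


def demo_probe():
    """Create a throwaway FIFO (hypothetical name), probe it, clean up."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "b_s_fifo")
    os.mkfifo(path)
    try:
        return probe_fifo_open(path)
    finally:
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(demo_probe())
```

If the read-write open is what the earlier test used, that would explain
the recollection that the open went through with no reader present.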

Anyway, if it does allow writing with no reader, the bootscripts could
eventually fill the FIFO's buffer, at which point the writing process
will block. If that process is marked "wait" in the inittab, or was
spawned by some other process that waits for it to complete, the system
will most likely freeze on the spot.
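
The buffer-filling scenario can be sketched as follows, again assuming
Linux and a read-write open so that the FIFO can be written with no
separate reader. The helper names are hypothetical. The write is
non-blocking here, so a full buffer shows up as an error instead of a
hang; a bootscript using ordinary blocking writes would simply stall at
the same point. The capacity itself depends on the kernel version and
configuration, so the sketch measures it and does not assume a figure.

```python
import os
import tempfile


def fill_fifo(path, chunk=512, cap=16 * 1024 * 1024):
    """Write to a reader-less FIFO until the kernel refuses more data.

    Returns (bytes_written, hit_full_buffer). `cap` is a safety limit so
    the loop ends even if the buffer is unexpectedly large.
    """
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    written = 0
    hit_full_buffer = False
    try:
        while written < cap:
            try:
                written += os.write(fd, b"x" * chunk)
            except BlockingIOError:
                # EAGAIN: the buffer is full; a blocking writer would hang here.
                hit_full_buffer = True
                break
    finally:
        os.close(fd)
    return written, hit_full_buffer


def demo_fill():
    """Create a throwaway FIFO, fill it, and clean up."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "b_s_fifo")
    os.mkfifo(path)
    try:
        return fill_fifo(path)
    finally:
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(demo_fill())
```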

Regardless, starting a reader would not be a major problem. The real
issues are design considerations: what is logged, and under what
conditions. My concern ATM is only with what is done if FS corruption
exists. As I stated in another post,

   never write to a corrupted file system

So, if the scripts are to log, they need to know whether it is safe to
do so. That means they need to be aware of which FS the log is on (it
may not be root), whether that is the corrupt FS, and whether it is
mounted. From there they can decide to not log, to log somewhere else,
or to drop into a maintenance mode.
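
As a rough illustration of that decision, here is a sketch that works
from a mount table in the format of /proc/mounts. Everything in it is an
assumption made for the example: the function names, the return strings,
and the idea that an earlier step (say, a filesystem check) hands over a
set of devices it considers corrupt.

```python
def find_mount(log_path, mounts_text):
    """Return (device, mountpoint, options) for the mount holding log_path.

    Picks the longest matching mount point; later entries win ties, since
    a later mount on the same directory hides the earlier one.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, _fstype, options = fields[:4]
        prefix = mountpoint.rstrip("/") + "/"
        if log_path == mountpoint or log_path.startswith(prefix):
            if best is None or len(mountpoint) >= len(best[1]):
                best = (device, mountpoint, options.split(","))
    return best


def logging_decision(log_path, mounts_text, corrupt_devices):
    """Decide whether a bootscript may write to log_path right now."""
    mount = find_mount(log_path, mounts_text)
    if mount is None:
        return "skip: filesystem not mounted"
    device, _mountpoint, options = mount
    if device in corrupt_devices:
        return "skip: filesystem flagged corrupt"
    if "rw" not in options:
        return "skip: filesystem read-only"
    return "log"


if __name__ == "__main__":
    # Hypothetical mount table: root still read-only, /var already read-write.
    mounts = "/dev/hda1 / ext2 ro 0 0\n/dev/hda3 /var ext3 rw 0 0\n"
    print(logging_decision("/var/log/boot.log", mounts, set()))
    print(logging_decision("/var/log/boot.log", mounts, {"/dev/hda3"}))
```

Note one gap: if /var is not mounted yet, the path resolves to the root
FS, so a real script would also need to know which FS the log is
*supposed* to live on.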

Last, what to do if the FIFO is on a corrupted FS? Can you trust it? Do
you *want* to? If the corruption is of a certain nature (corrupted
major/minor numbers) in the inode associated with the external name
(e.g. /dev/b_s_fifo), opening that device may cause communication with a
completely different device driver. Maybe with the driver for the root
FS? Any prediction on the results in that case?
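
One partial safeguard is to check what the name actually refers to
before opening it. A FIFO inode carries no major/minor numbers of its
own, so the scenario above presumably also involves damaged type bits
that turn the node into a device file; a type check would catch that
particular case. The sketch below is only that, a sketch: the helper
name is invented, and a check like this cannot prove the FS is sound.

```python
import os
import stat
import tempfile


def is_fifo(path):
    """Return True only if path exists and its inode type is FIFO.

    lstat() is used so a symlink pointing elsewhere is not followed.
    """
    try:
        mode = os.lstat(path).st_mode
    except OSError:
        return False
    return stat.S_ISFIFO(mode)


def demo_is_fifo():
    """Compare a real FIFO, a regular file, and a missing name."""
    tmpdir = tempfile.mkdtemp()
    fifo = os.path.join(tmpdir, "b_s_fifo")
    plain = os.path.join(tmpdir, "plain_file")
    missing = os.path.join(tmpdir, "missing")
    os.mkfifo(fifo)
    with open(plain, "w") as handle:
        handle.write("not a fifo\n")
    try:
        return is_fifo(fifo), is_fifo(plain), is_fifo(missing)
    finally:
        os.unlink(fifo)
        os.unlink(plain)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(demo_is_fifo())
```

There is still a window between the check and the open, and the check
itself reads from the FS in question, so it narrows the risk without
removing it.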

The upshot is that it may be doable, but will not be a Trivial Pursuit
(TM).

>
> This issue (among others) is why I've been considering using an actual
> program to handle running bootscripts (like the simpleinit that's in
> util-linux, only with more functionality) instead of relying just on shell
> functions to do it. In this sort of environment, the program could be
> the "reader" on the FIFO that any/all of the scripts write to as they
> find things to message about. That's a radical departure from the
> current system used in LFS, though.
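
The arrangement described in the quoted paragraph might look roughly
like the sketch below: one supervising program holds the read end of the
FIFO open, runs each script, and collects whatever the scripts wrote.
This is an illustration only, not simpleinit and not the LFS
bootscripts; the function names and the convention that each script
receives the FIFO path as $1 are assumptions.

```python
import os
import subprocess
import tempfile


def drain(rfd):
    """Read everything currently available on the FIFO's read end."""
    data = b""
    while True:
        try:
            chunk = os.read(rfd, 4096)
        except BlockingIOError:
            break  # a writer is still attached but has sent nothing more
        if not chunk:
            break  # no writers left and no data pending
        data += chunk
    return data.decode(errors="replace").splitlines()


def run_scripts_with_fifo(fifo_path, script_commands):
    """Run shell snippets in order, collecting their FIFO messages in memory."""
    os.mkfifo(fifo_path)
    # A non-blocking read-only open returns at once, even with no writer yet.
    rfd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    messages = []
    try:
        for command in script_commands:
            # The FIFO path is passed to the snippet as $1.
            subprocess.run(["sh", "-c", command, "sh", fifo_path], check=False)
            messages.extend(drain(rfd))
    finally:
        os.close(rfd)
        os.unlink(fifo_path)
    return messages


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "b_s_fifo")
    print(run_scripts_with_fifo(path, ['echo "checkfs: errors found" > "$1"']))
    os.rmdir(tmpdir)
```

Because the reader exists before any script starts, the scripts' opens
do not block, and the messages stay in memory until it is known to be
safe to write them anywhere. The sketch drains only after each script
exits, so a script writing more than the FIFO can hold would still
stall; a real supervisor would read while the script runs.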

The radical departure is not that scary, I think. What's scary is how
much has to be considered when deciding what to do when a failure
occurs. Using my example above, you can see that anything to do with FS
corruption or HD failure is fraught with design issues.

There is more leeway with things that are not FS/HD related. The best
design choice may be to *not* do anything re logging/FIFO for FS/HD
related failures; for all the other types of failure we *can* do
something.

Now, let's consider platforms that have timers, ... nah! Just kidding!

:)

-- 
NOTE: I'm on a new ISP, if I'm in your address book ...
Bill Maltby
lfsbillATearthlinkDOTnet
Fix line above & use it to mail me direct.


