On removing hotplug from LFS

Alexander E. Patrakov patrakov at ums.usu.ru
Thu Oct 13 05:10:15 PDT 2005


Matthew Burgess wrote:
> Hi folks,
> 
> Note this isn't a fully fledged proposal/RFC (yet)!  I'm trying to get 
> my head around, and document, the method of device node creation and 
> module loading preferred by upstream developers.
> 
> My work in progress can be seen at 
> http://www.linuxfromscratch.org/~matthew/udev-new-setup.txt.

This is essentially a downgrade to the LFS-6.0 state of affairs. It is a
valid option, provided it is properly documented that modular kernels
are not well supported and that there is no hardware detection at all.

>  I'd like 
> folks input regarding its clarity, usefulness and accuracy.


> Linux 2.6.14-rc2

The only thing that actually _needs_ 2.6.14 is the new rule for
/dev/bus/usb, to be used instead of mounting the obsolete /proc/bus/usb:

SUBSYSTEM=="usb_device", PROGRAM="/bin/sh -c 'X=%k X=$${X#usbdev} B=$${X%%%%.*} D=$${X#*.}; echo bus/usb/$$B/$$D'", SYMLINK+="%c", GROUP="usb"

(This is one long line. It is equivalent to the current insecure
/proc/bus/usb permissions, i.e. a single overly permissive "usb" group.)
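To see what the PROGRAM= helper in this rule hands back to udev, here is the same parameter expansion as a standalone POSIX shell function. It is a sketch that assumes udev's escaping (in a rule, "$$" stands for a literal "$" and "%%" for a literal "%") and that %k expands to a kernel name of the form "usbdevB.D" (bus number, device number); the function name usb_devpath is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the rule's PROGRAM= string with udev's
# "$$" and "%%" escapes undone. $1 is the kernel name that %k expands to.
usb_devpath() {
    X=$1              # e.g. "usbdev1.2"
    X=${X#usbdev}     # drop the "usbdev" prefix          -> "1.2"
    B=${X%%.*}        # bus number, before the first dot  -> "1"
    D=${X#*.}         # device number, after the dot      -> "2"
    echo "bus/usb/$B/$D"
}

usb_devpath usbdev1.2    # prints: bus/usb/1/2
```

With SYMLINK+="%c", that output becomes the symlink name under /dev. Note that these numbers are not zero-padded, unlike the 001/002 directory names under /proc/bus/usb.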

The rest works with 2.6.12 alone. I think it's pure luck that the BLFS
maintainers still use the "usb" group, because this "one insecure usb
group" setup works without hotplug (in fact, I introduced this group
specifically to avoid installing hotplug in BLFS-6.0). So, if BLFS is
happy with the old "/proc/bus/usb in fstab" setup, removing hotplug will
not break anything on that side, i.e. there is absolutely no need to use
2.6.14-rc kernels.

> Udev  070 (I think we want to mandate the latest version here, although from the
>           Changelog it looks like 059 can do all the funky stuff we need).

Incomplete. We also need some extra rules for loading modules for 
hot-plugged devices. See the tail of udev-070/etc/udev/redhat/udev.rules 
(i.e. rules that mention MODALIAS or /sbin/modprobe).

> Module-init-tools 3.1 (We don't need anything from the 3.2-pre series do we?)

We do need the new "blacklist" keyword in order to emulate the old
"hotplug blacklist" functionality. Whether LFS targets only
"single-machine" installations (where blacklists are never useful) or
also supports tarring up an LFS system and untarring it on a different
computer is a separate question.
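As a rough illustration of the keyword: the modprobe.conf documentation for module-init-tools 3.2 describes "blacklist MODULE" as making modprobe ignore that module's internal aliases, so loading by modalias skips it while an explicit "modprobe MODULE" still works. The hypothetical function below only answers whether a modprobe.conf-style text (read from standard input) blacklists a given module; it is a sketch that ignores include files and the fact that modprobe treats "-" and "_" in module names as equivalent.

```shell
#!/bin/sh
# Hypothetical check: does the modprobe.conf-style text on stdin contain
# "blacklist $1"? Returns 0 if so, 1 otherwise. Only lines whose first
# word is "blacklist" are considered, so comments and aliases are skipped.
is_blacklisted() {
    while read -r kw name rest; do
        if [ "$kw" = blacklist ] && [ "$name" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

printf '%s\n' '# sound' 'blacklist snd_fm801' |
    is_blacklisted snd_fm801 && echo "snd_fm801 is blacklisted"
```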

> 4. Overview
> ~~~~~~~~~~~
> 
> The diagram below (with both thanks and sincere apologies to Alexander Patrakov)
> helps to provide a general feel for how the new configuration will work.  Note
> that this version only applies to entirely non-modular kernels.

There is no difference between modular and non-modular kernels here. For
PCI drivers, for example, the driver provides the xxxdrv_pci_probe()
function in both cases, and the kernel calls it when a device is
detected. In fact, many drivers don't even contain a single
"#ifdef MODULE"!

So non-modularity guarantees only two things:

1) There is no need for anybody to load the driver.
2) Hotplug events generated by pci_register_driver() are always lost 
without initramfs.

> This diagram
> will be built upon throughout this document in order to tie together what
> happens when modular drivers enter the equation.  Imagine you've just started to
> see the kernel spewing messages to the console once it's finished decompressing,
> that's roughly where this diagram starts from.  The '*'s mean there are points
> that need clarifying; my queries are detailed underneath the diagram.
> 
>     +----------------+
>     |   Bus Driver   |
>     +----------------+
>            |
>            |
>     (probes hardware and for
>      each device loads...)

... absolutely nothing. But it creates, e.g., these sysfs entries (even 
if a proper driver for this device is not loaded):

# grep . /sys/bus/pci/devices/0000:00:0d.0/*
/sys/bus/pci/devices/0000:00:0d.0/class:0x040100
/sys/bus/pci/devices/0000:00:0d.0/device:0x0801
/sys/bus/pci/devices/0000:00:0d.0/irq:9
/sys/bus/pci/devices/0000:00:0d.0/local_cpus:1
/sys/bus/pci/devices/0000:00:0d.0/modalias:pci:v00001319d00000801sv00001319sd00001319bc04sc01i00
/sys/bus/pci/devices/0000:00:0d.0/subsystem_device:0x1319
/sys/bus/pci/devices/0000:00:0d.0/subsystem_vendor:0x1319
/sys/bus/pci/devices/0000:00:0d.0/vendor:0x1319

(thanks to Kay Sievers for the "grep . /dir/*" trick).

Also, a hotplug event is generated for each newly-created sysfs directory.

Userspace can use these entries to load the proper driver. Old hotplug
scripts do this by looking at class, device, vendor, subsystem_device
and subsystem_vendor, and consulting the
/lib/modules/`uname -r`/modules.pcimap file and the blacklist. Red Hat's
udev rules (not yet adopted by LFS) look only at the "modalias" file;
blacklisting is then done in /etc/modprobe.conf with the "blacklist"
keyword, which is new in module-init-tools 3.2.
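To make the relation between the individual sysfs files and "modalias" concrete, here is a sketch that splits a PCI modalias back into its fields. It assumes the PCI layout visible in the listing above: 8 hex digits each for vendor, device, subsystem vendor and subsystem device, then 2 each for base class, subclass and programming interface (compare class:0x040100 with bc04sc01i00). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical decoder for a PCI modalias string. Prints one line of
# name=value pairs, or nothing if the string does not have the PCI layout.
decode_pci_modalias() {
    printf '%s\n' "$1" | sed -n \
        's/^pci:v\(.\{8\}\)d\(.\{8\}\)sv\(.\{8\}\)sd\(.\{8\}\)bc\(..\)sc\(..\)i\(..\)$/vendor=\1 device=\2 subvendor=\3 subdevice=\4 class=\5 subclass=\6 progif=\7/p'
}

decode_pci_modalias 'pci:v00001319d00000801sv00001319sd00001319bc04sc01i00'
# prints: vendor=00001319 device=00000801 subvendor=00001319 subdevice=00001319 class=04 subclass=01 progif=00
```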

You can check this alias with the following command:

# grep 'pci:v00001319d00000801' /lib/modules/`uname -r`/modules.alias
alias pci:v00001319d00000801sv*sd*bc04sc01i* snd_fm801

Udev will do exactly this, minus the "-v" switch:

# modprobe -v 'pci:v00001319d00000801sv00001319sd00001319bc04sc01i00'
insmod /lib/modules/2.6.12.5-home/kernel/sound/pci/snd-fm801.ko

BTW you can try this command even if you don't have the relevant 
hardware. The module will load successfully anyway.
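The lookup behind that modprobe call is essentially wildcard matching of the device's modalias against the patterns in modules.alias. The sketch below approximates it with the shell's own "case" pattern matching: it reads alias lines from standard input and prints every module whose pattern matches. The function name is hypothetical, and real modprobe additionally applies modprobe.conf (including blacklists) and loads dependencies, none of which is modelled here.

```shell
#!/bin/sh
# Hypothetical approximation of alias resolution: print each module whose
# "alias PATTERN MODULE" line (read from stdin) matches the modalias in $1.
find_module() {
    while read -r kw pattern module; do
        [ "$kw" = alias ] || continue
        case $1 in
            $pattern) echo "$module" ;;   # unquoted, so used as a glob
        esac
    done
}

printf '%s\n' 'alias pci:v00001319d00000801sv*sd*bc04sc01i* snd_fm801' |
    find_module 'pci:v00001319d00000801sv00001319sd00001319bc04sc01i00'
# prints: snd_fm801
```

On a real system one would feed it /lib/modules/`uname -r`/modules.alias on standard input.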

OK, let's assume that the proper device driver has been loaded somehow
(either by an explicit modprobe command called from a udev rule or from
the "modules" script, or because it is built into the kernel).

>            |
>            v
>     +----------------+
>     | Device Driver  |
>     +----------------+
>            |
>            |
>     (registers respective
>      kobject(s) with...)*

Which ones? There are _two_ relevant sysfs entries for each device. One
of them is for the "bare" device (registered with the bus driver, as
shown above). The other one is the device as exposed by the driver
itself, e.g.:

# grep . /sys/class/sound/pcmC0D0p/*
/sys/class/sound/pcmC0D0p/dev:116:16
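The two entries are linked: in 2.6 kernels a class device that has a physical parent carries a "device" symlink pointing back to the bus device, next to its "dev" file with the MAJOR:MINOR pair. The hypothetical function below prints both for a given class directory; it is only a sketch, and it relies on readlink -f (available in coreutils and busybox, but not in POSIX).

```shell
#!/bin/sh
# Hypothetical helper: for a class device directory ($1), print its
# MAJOR:MINOR and, if a "device" symlink exists, the bus device name.
describe_class_dev() {
    d=$1
    echo "dev=$(cat "$d/dev")"
    if [ -L "$d/device" ]; then
        # the last path component of the link target is the bus id
        echo "bus_device=$(basename "$(readlink -f "$d/device")")"
    fi
}

if [ -d /sys/class/sound/pcmC0D0p ]; then
    describe_class_dev /sys/class/sound/pcmC0D0p
fi
```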

>            |
>            |
>            v
>     +----------------+
>     |     sysfs      |
>     +----------------+
>            |
>            |
>     (some time passes and eventually
>      we get to...)
>            |
>            |
>            v
>     +----------------+
>     | S00mountkernfs |
>     +----------------+
>            |
>            |
>     (mounts and, as a side-effect,
>      populates /sys.

Inaccurate. sysfs is always populated, but invisible if not mounted.

>  More time passes
>      and we get to...)
>            |
>            |
>            v
>     +----------------+
>     |   S10udev      |
>     +----------------+
>            |
>            |
>     (mounts /dev and kicks off...)**
>            |
>            |
>            v
>     +----------------+
>     |   udevstart    |
>     +----------------+
>            |
>            |
>     (walks the /sys filesystem and uses
>      the information there to populate /dev)

udevstart does not walk the entire /sys filesystem, only /sys/class and
/sys/block (not /sys/bus!). A replacement that walks all of /sys
(including /sys/bus), and thus mostly replaces the old hotplug
initscript, is called udevsynthesize (see below). Note: udevsynthesize
does not reconstruct the entire environment of the original hotplug
events, i.e. the reconstruction is not perfect. But it is sufficient to
recover MODALIAS and thus load modules for PCI hardware. The unhappy
party would be, e.g., a custom script that tries to chmod something in
/proc/bus/usb, because the DEVICE=/proc/bus/usb/???/??? variable is not
reconstructed.
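A dry-run sketch of the udevstart walk described above: it visits /sys/class/*/* and /sys/block/* and, for every entry that has a "dev" file, prints the mknod command that the MAJOR:MINOR pair implies (character devices for class entries, block devices for /sys/block). It is a simplification: it creates nothing, does not descend into partitions such as /sys/block/hda/hda1, and ignores udev's naming rules, which may choose different names or subdirectories. The function name and the sysfs-root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run of udevstart's walk. $1 is the sysfs root
# (default /sys); nothing is created, commands are only printed.
udevstart_dry_run() {
    sysfs=${1:-/sys}
    for d in "$sysfs"/class/*/* "$sysfs"/block/*; do
        [ -r "$d/dev" ] || continue        # no device node for this entry
        devnum=$(cat "$d/dev")              # "MAJOR:MINOR"
        case $d in
            "$sysfs"/block/*) type=b ;;     # block device
            *)                type=c ;;     # everything under /sys/class
        esac
        echo "mknod /dev/${d##*/} $type ${devnum%%:*} ${devnum#*:}"
    done
}
```

Called with no argument, it reads the live /sys; passing a directory lets it run against a mock tree.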

The old udev+coldplug setup had _two_ parties that walked sysfs:

1) udevstart, for /sys/class and /sys/block
2) the S??hotplug initscript, for /sys/bus

Note that they run in totally different environments: udevstart runs
before /usr is mounted, while the S??hotplug initscript runs with /usr
already mounted. Thus, programs called from udev rules can't rely on
/usr, but old-style /etc/hotplug.d helpers can (well, real hotplug
events may invoke them earlier, but the initscript will then call each
handler once again). Since udevsynthesize no longer keeps these two
steps separate, various bugs caused by /usr not being mounted can
happen.
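The /sys/bus half of the job, which the S??hotplug initscript used to do and which udevsynthesize would take over, boils down to reading each device's "modalias" file and asking modprobe for a matching driver. A dry-run sketch, assuming the 2.6.12 layout shown earlier (/sys/bus/pci/devices/*/modalias) and applying it to every bus that exposes such a file; it prints the commands instead of running them, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run of coldplug module loading: for every device under
# $1/bus/*/devices that exposes a "modalias" file, print the modprobe
# command that would load its driver. $1 defaults to /sys.
coldplug_dry_run() {
    sysfs=${1:-/sys}
    for f in "$sysfs"/bus/*/devices/*/modalias; do
        [ -r "$f" ] || continue            # glob matched nothing
        ma=$(cat "$f")
        [ -n "$ma" ] || continue           # empty modalias: nothing to load
        echo "modprobe -q '$ma'"
    done
}
```

Where such a step runs in the boot sequence decides whether /usr is available to anything modprobe triggers, which is exactly the difference between the two walks described above.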

> So, that's the very basic scenario covered - everything is compiled into the
> kernel and that's coldplugging completed.  One important oversight here is the
> fact that in addition to registering their kobject(s) with sysfs, device
> drivers also send out hotplug events, via a netlink socket, to udevd.

Not 100% accurate. A more accurate version: registering their
kobject(s) with sysfs is what makes bus/device drivers send out hotplug
events. These events have two receivers:

1) A program specified in /proc/sys/kernel/hotplug is exec()ed with some 
interesting environment variables.
2) A message is sent via the netlink socket and reaches udevd.
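For the first receiver, the program named in /proc/sys/kernel/hotplug is started once per event, with the subsystem as its first argument and details such as ACTION, DEVPATH (relative to /sys) and SEQNUM in its environment. The hypothetical logger below only formats those values, so it can be exercised by hand; which other variables (for example MODALIAS) are present depends on the subsystem and kernel version.

```shell
#!/bin/sh
# Hypothetical hotplug helper body: format one event as a single log line.
# $1 = subsystem; ACTION, DEVPATH and SEQNUM come from the environment.
log_event() {
    echo "seq=${SEQNUM:-?} action=${ACTION:-?} subsystem=${1:-?} devpath=${DEVPATH:-?}"
}

# Simulated event for the sound device shown earlier:
ACTION=add DEVPATH=/class/sound/pcmC0D0p SEQNUM=42 log_event sound
```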

> Obviously, at the point the compiled-in drivers are sending these 
> events out, udevd isn't available without initramfs, so one can 
> reasonably ask "what happens to these hotplug events?"

> My intuition says that just throwing them away would be the most sensible thing
> to do,

that's what happens.

> as a) udevd isn't able to handle them and b) we're going to walk the
> /sys tree to create devices anyway, so I don't think they're necessary.

We are going to walk /sys only if we are not using initramfs. Otherwise, 
there's just no difference between hotplug and coldplug events.

> QUESTION 1: Can someone confirm my intuition is correct for once?

See above.

> QUESTION 2: Note the rather subtle change in S10udev - it no longer registers
>             /sbin/udevsend as the hotplug event handler.  Reading RELEASE-NOTES
>             for udev-059 states: 'The forked events can be disabled with:
>             echo "" > /proc/sys/kernel/hotplug'.  It sounds like this is what
>             we/upstream want to do.  Confirmation also required here, please :)

Correct. As soon as the first message is received via the netlink 
socket, udevd will not listen to anything else, so calling a program for 
each hotplug event is just a waste of time. But there's a caveat: udev 
installation does a "killall udevd", and this has a very bad consequence.

Suppose that someone has already migrated their LFS system to this
new-style setup and has nothing in /proc/sys/kernel/hotplug. Now he/she
wants to build a new LFS. He/she installs udev in the chroot, and this
kills udevd on the main system! Now when a flash drive is inserted,
NOBODY outside the chroot listens to the hotplug events, so no module
will be loaded and no device node created outside the chroot. A
workaround is to modify the installation command for Udev:

make DESTDIR="/" install

(This is already done for the LiveCD, but for a different reason: the
udevd process inside the chroot prevents the Makefiles from unmounting
/mnt/lfs/dev.)

Also, it's a good idea to start udevd just before udevstart; otherwise
the first few real hotplug events can be ignored due to the race between
the two delivery mechanisms, one of which (/proc/sys/kernel/hotplug)
doesn't preserve event order.

> There's an 
> important section missing from it at the moment - that of coldplugging a 
> modular kernel - but that's simply because I don't yet understand the 
> pros and cons of the various alternative approaches we can take with 
> that type of configuration.  I'd prefer to avoid mandating an 
> initramfs if at all possible, but if it's the only way we can get 100% 
> correct results, then I guess it'll have to do!

There are four approaches:

a) mandatory initramfs with udeveventrecorder, like SuSE does. 
Guaranteed 100%-correct, as there are no coldplug events at all and thus 
there is no need to walk sysfs.

b) udevsynthesize, already described above. It replaces udevstart and
S??hotplug at once. Known to be inaccurate and racy; it looks abandoned
upstream. See
http://marc.theaimsgroup.com/?l=linux-hotplug-devel&m=112482471607099&w=2

c) a kernel patch to replay hotplug events on demand. It looks like a
promising equivalent of "b" (and is 100% accurate for "add" events,
except for $SEQNUM). Because /sys/bus and /sys/{block,class} can be
scanned at different times, it avoids some (but not all) potential
/usr-related bugs. Downside: it is too new, and it shares the same "when
to start e2fsck" race condition with "b". See
http://marc.theaimsgroup.com/?l=linux-hotplug-devel&m=112828738301128&w=2

d) no hardware detection, as in LFS-6.0

My preference is as follows. Let Matthew and/or Jim answer the simple
question below. If either of them answers correctly, go with "b" or "c"
(or maybe even "a"), at his choice. If both answers are wrong, either go
with "d" or (better) revert to a static /dev. Keeping the current setup
plus udev_run_hotplugd (i.e., not removing hotplug) is also an option in
any case, because it is never a good idea to run ahead of upstream: the
exact initramfs-free mechanism for replaying coldplug events is still
under development upstream.

The test question is:

Let's suppose that all relevant drivers (i.e., *hci-hcd and usblp) are 
loaded (or are non-modules). How many hotplug events will be generated 
by linux-2.6.12.x if one connects his USB printer to the computer? What 
is the relevant directory one level under /sys for each of them?

-- 
Alexander E. Patrakov



More information about the lfs-dev mailing list