Pushing UTF-8 support into LFS

Alexander E. Patrakov patrakov at ums.usu.ru
Sat Aug 6 21:59:06 PDT 2005


A sample LFS-like system that supports UTF-8 is available on a live CD. 
So, it may be a good idea to create an experimental branch of the LFS 
book that incorporates the same changes. LFS built according to that 
branch should work in both UTF-8 and traditional locales. So, patches 
that make things work in UTF-8 but break the non-UTF-8 case are a no-go.

Summary of changes (including those that are on the official non-UTF-8 
CD) is below. Details for each package (and screenshots that illustrate 
the problems) will be available on request.


sharutils: added to chapter 5 because the ncurses rollup shell script 
wants uudecode.

Ncurses: upgraded to 20050319 version (or at least applied the 
-altcharset-1 patch), built with --enable-widec. Compatibility linker 
scripts are created so that apps that want -lncurses are actually linked 
against -lncursesw.
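
A rough sketch of what such a compatibility linker script looks like, in 
Python for illustration only. INPUT(...) is the GNU ld linker script 
command; the helper name write_compat_script and the use of a scratch 
directory instead of the real library directory are my assumptions here, 
not what the CD build does verbatim.

```python
import tempfile
from pathlib import Path


def write_compat_script(libdir: Path, old: str, new: str) -> Path:
    """Create lib<old>.so as a GNU ld linker script redirecting to -l<new>.

    Hypothetical helper: an app linked with -l<old> then picks up lib<new>.
    """
    script = libdir / f"lib{old}.so"
    script.write_text(f"INPUT(-l{new})\n")
    return script


# A scratch directory is used so the sketch is safe to run; on a real
# system libdir would be the directory the linker searches (assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = write_compat_script(Path(tmp), "ncurses", "ncursesw")
    print(path.name, "contains:", path.read_text().strip())
```

The same idea would apply to any other ncurses sub-library that apps 
request by its non-wide name.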

LFS Bootscripts: the "console" script is rewritten.

sysklogd: the logic that treats bytes 0x80-0x9f as unprintable 
characters should be disabled, a patch is available.
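
To illustrate why that logic hurts: in UTF-8, byte values 0x80-0x9f occur 
as continuation bytes of ordinary characters, so escaping them breaks the 
text. A minimal Python sketch follows; escape_c1_bytes is a hypothetical 
stand-in for the filter, and the octal escape format is an assumption, 
not copied from the sysklogd source.

```python
def escape_c1_bytes(raw: bytes) -> bytes:
    """Replace every byte in 0x80-0x9f with a backslash-octal escape.

    Hypothetical stand-in for the "unprintable character" logic.
    """
    out = bytearray()
    for b in raw:
        if 0x80 <= b <= 0x9F:
            out += b"\\%03o" % b
        else:
            out.append(b)
    return bytes(out)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A Russian log message; several of its UTF-8 continuation bytes
# fall into the 0x80-0x9f range.
message = "Ошибка".encode("utf-8")
filtered = escape_c1_bytes(message)

print(is_valid_utf8(message))   # valid before filtering
print(is_valid_utf8(filtered))  # no longer valid after filtering
```

Pure ASCII messages pass through such a filter unchanged, which is why 
the problem only shows up with non-English logs.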

coreutils: big patch from RedHat. Unfortunately, this patch has a history of bugs.

gawk: either a big patch from RedHat or a beta version (but it fails one 
test in its testsuite). When gawk-3.1.5 is released, no patches will be 
needed. Expect more bugs to show up in dfa.c.

grep: big patch from RedHat. Expect more bugs to show up in dfa.c.

GNU Groff-1.19.1: replaced with Debian Groff

gdbm: added to LFS as a dependency of man-db

man: replaced with man-db

diffutils: patch from RedHat

linux: a patch is necessary for dead keys to work in UTF-8 mode.


glibc: the CD uses a patch that alters the list of supported locales. 
The removals of no_NO and vi_VN.TCVN are bugfixes; the rest of the patch 
is a cosmetic tweak. libidn is nice too but also optional.

kbd: a patch is available that fixes all known keymaps that have the 
backspace/delete problem, so that KEYMAP_CORRECTIONS are rarely needed.

texinfo: a minor patch exists that forces a fallback to the English 
interface in multibyte locales.

readline: almost works as-is. RedHat also applies patches for the 
wrapping problem and for a segfault in lftp.

vim: some of the upstream patches fix problems in multibyte locales. I 
applied all upstream fixes on the CD. Also it is necessary to remove 
translated non-ISO-8859-1 tutorials because they are unreadable in UTF-8.
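
A small Python sketch of the tutorial problem, assuming a tutorial stored 
in KOI8-R (my choice of example encoding; the actual tutorials use 
various legacy charsets). The helper name readable_as_utf8 is 
hypothetical.

```python
def readable_as_utf8(raw: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 text."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


line = "Добро пожаловать"

# The file as shipped: bytes in a legacy 8-bit encoding.
legacy = line.encode("koi8_r")

# The same text after recoding to UTF-8.
recoded = legacy.decode("koi8_r").encode("utf-8")

print(readable_as_utf8(legacy))   # False
print(readable_as_utf8(recoded))  # True
```

So the alternative to removing such a tutorial would be recoding it, 
provided the original charset is known.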


cdrtools: a patch for mkisofs is needed in order to create 
Windows-readable CDs. Also the name of the author is transliterated so 
that non-ISO-8859-1 users can read it.

thunderbird, firefox: a patch is available that works around the problem 
with displaying dates in the "expired certificate" dialog and with gpg 
messages in Enigmail. Needed for all non-ISO-8859-1 locales.

xfce: one Chinese message is mis-marked as Russian. A patch is available 
and accepted upstream.

Xorg: there's a patch that makes Xorg understand more glibc locale names.

nALFS: a patch is necessary in order to display line drawing 
characters on the Linux console properly.

GPM: built --without-curses. If you want mouse support on the Linux 
console, please build ncurses --with-gpm instead.


mark broken packages that should not be installed on a system that 
supports UTF-8.


Alexander E. Patrakov
