New personal experimental book [warning: lots of UTF-8 in this]

Ken Moffat zarniwhoop at
Sun Sep 14 08:47:05 PDT 2008

On Sun, Sep 14, 2008 at 07:03:03AM -0600, Alexander E. Patrakov wrote:
> On Saturday 13 September 2008 21:31:48 Ken Moffat wrote:
> <resending privately, because LFS recognized this as spam, feel free
> to forward>
 OK, I'll try to CC it.
> > On Sat, Sep 13, 2008 at 05:12:00PM +0600, Alexander E. Patrakov
> > wrote:
> > > Let's clarify the situation a bit. There are four possible
> > > outcomes for "man foo" in the ru_RU.UTF-8 locale:
> > >
> > > 1) Gibberish (unacceptable, but, unfortunately, what happens if
> > > the system is misconfigured by an English-speaking editor who
> > > doesn't know how to test the configuration)
> > > 2) "No such manual page" (well, OK if it indeed doesn't exist)
> > > 3) English manpage (acceptable, although not ideal)
> > > 4) Russian manpage.
> >
> >  From the box I'm using today (clfs amd pure64 from a few months
> > ago, with your ncursesw change, man-1.6e (straight configure without
> > specifying any lang value), groff-utf8
> OK. However, due to some doubts expressed below, I would like to see
> the complete buildscript.
> > ken@bluesbreaker ~ $ LC_ALL=ru_RU.UTF-8 man foo
> > Ничего про foo в руководстве нет
> Indeed, the message is correct. However, this is in urxvt, which is
> known to print some characters, not "invalid character" marks, when
> passed invalid UTF-8. I suspect this is what happened here, and the
> characters somehow became correct. Unless you reencoded the
> translation files yourself (as RedHat does), there is simply no code
> in Man that can lead to correct characters.
 Well spotted, Sir!  I'd forgotten that in my clfs builds I do
indeed recode the messages and pages.  So much for my memory, and my
apologies for spreading confusion or lies.  The "guilty" part of the
script is now attached.  I guess my point now becomes "if even I can
do it, any fool can get an all-UTF-8 system".  Sorry.

 That probably also explains why some quick testing in a tty on this
system (LFS-6.3) totally failed to provide anything other than the
English message 'No manual entry for foo'.
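
Since a terminal such as urxvt may print plausible characters even for invalid input, the bytes can be checked independently of any terminal. The following is only a minimal sketch, assuming an iconv that exits non-zero on an invalid input sequence (as glibc's does); `is_valid_utf8` is a hypothetical helper name, not part of Man or of the attached script.

```shell
#!/bin/sh
# is_valid_utf8 (hypothetical helper): succeeds only if stdin is
# well-formed UTF-8. Converting UTF-8 to UTF-8 changes nothing, but
# iconv returns a non-zero status when it meets an invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# A message left in KOI8-R is rejected, the recoded one is accepted.
if printf '\356\311\n' | is_valid_utf8; then
    echo "KOI8-R bytes: accepted (unexpected)"
else
    echo "KOI8-R bytes: rejected"
fi
if printf '\320\235\320\270\n' | is_valid_utf8; then
    echo "UTF-8 bytes: accepted"
fi
```

The same helper could be fed real output, for example with "LC_ALL=ru_RU.UTF-8 man foo 2>&1 | is_valid_utf8", which tests what Man emits rather than what the terminal chooses to draw.
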
> Could you please retest in any VTE-based terminal (sakura from 
> is very lightweight, as well
> as termit from ) or in
> uxterm? While your results already look well-done and convincing, I
> think that it would be even better to post screenshots showing the
> output of "locale", "yes --help" and "man foo" in one terminal window.
 I'll maybe get around to trying other terms in a few days.  No
promises, I'm afraid.  Not quite sure if I've yet picked up
everything you want to establish - the earlier tests were just run
with 'LC_ALL=xx_YY.UTF-8 man foo'?

> >  After installing groff-utf8 I make the following change to man.conf
> > to actually use it:
> >
> >   sed -i '/^NROFF/s/nroff -Tlatin1/groff-utf8 -Tutf8/' /etc/man.conf
> OK. I assume that adding "| iconv -f UTF-8 -t //TRANSLIT" to the end
> will make the line work also in non-UTF-8 locales (manual pages are
> still expected to be in UTF-8). Could you please test this?
 You suggested that, or something very like it, to me before.  On
that occasion I didn't manage to make it work.
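
For reference, the pipe stage suggested above can be tried on its own, outside man.conf. This is a sketch only: it names the target charset explicitly via `locale charmap` instead of the empty name in "-t //TRANSLIT" (a deliberate substitution), it assumes glibc's iconv and `locale` tools, and `to_locale_charset` is a hypothetical name. It says nothing about whether the NROFF line itself works with the pipe appended.

```shell
#!/bin/sh
# to_locale_charset (hypothetical helper): convert UTF-8 on stdin to the
# charset of the current locale, asking iconv to transliterate characters
# that the target charset cannot represent.
to_locale_charset() {
    iconv -f UTF-8 -t "$(locale charmap)//TRANSLIT"
}

# Plain ASCII passes through unchanged in any locale.
printf 'No manual entry for foo\n' | to_locale_charset
```

What a non-ASCII character turns into depends on the locale's transliteration tables, so results in a non-UTF-8 locale should be inspected by eye.
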

> >  Long-term, UTF-8 is the only sensible solution for text encoding,
> Long-term, due to pollution on Earth, Mars is the only sensible
> destination for mankind :) - i.e. please don't use "long term"
> arguments as a motivation to do something right now.
> <snip man pages that seem to be rendered correctly>
> >  The following locales provide an English error message:
> > hu_HU.UTF-8, id_ID.UTF-8, ja_JP.UTF-8, ko_KR.UTF-8, nb_NO.UTF-8,
> > nn_NO.UTF-8, sv_SE.UTF-8, tr_TR.UTF-8, vi_VN.UTF-8, zh_CN.UTF-8.
> preceded by spam about NLSPATH?

No, not on my modified clfs yesterday, nor on the LFS-6.3 I'm using

> -- 
> Alexander E. Patrakov

das eine Mal als Tragödie, das andere Mal als Farce
-------------- next part --------------
echo "converting bg" &&
iconv -t utf-8 -f cp1251 -o msgs/ msgs/ 2>>$LOG &&
mv -v msgs/ msgs/ 2>>$LOG &&
echo "converting ja" &&
iconv -t utf-8 -f euc-jp -o msgs/mess.ja.utf msgs/mess.ja 2>>$LOG &&
mv -v msgs/mess.ja.utf msgs/mess.ja 2>>$LOG &&
echo "converting ko" &&
iconv -t utf-8 -f euc-kr -o msgs/mess.ko.utf msgs/mess.ko 2>>$LOG &&
mv -v msgs/mess.ko.utf msgs/mess.ko 2>>$LOG &&
echo "converting ru" &&
iconv -t utf-8 -f koi8-r -o msgs/ msgs/ 2>>$LOG &&
mv -v msgs/ msgs/ 2>>$LOG &&
for M in msgs/mess.{da,de,es,fi,fr,it,nl,pt}; do
        echo "converting $M to utf-8" &&
        echo "converting $M to utf-8" >>$LOG &&
        iconv -t utf-8 -f iso-8859-1 -o ${M}.utf ${M} 2>>$LOG &&
        mv -v ${M}.utf ${M} >>$LOG 2>&1
done &&
for M in msgs/mess.{cs,hr,pl,ro,sl}; do
        echo "converting $M to utf-8" &&
        echo "converting $M to utf-8" >>$LOG &&
        iconv -t utf-8 -f iso-8859-2 -o ${M}.utf ${M} 2>>$LOG &&
        mv -v ${M}.utf ${M} >>$LOG 2>&1
done &&
echo "converting el" &&
iconv -t utf-8 -f iso-8859-7 -o msgs/mess.el.utf msgs/mess.el 2>>$LOG &&
mv -v msgs/mess.el.utf msgs/mess.el 2>>$LOG &&
# looks as if zh_TW.UTF-8 messages will be found,
# and tentatively this sed might still be needed
sed -i 's/mess.??/mess.?? mess.zh_TW.UTF-8/' msgs/ >>$LOG 2>&1 &&
echo "translating man pages" &&
echo "bg" &&
for P in man/bg/*.man; do
      iconv -t utf-8 -f cp1251 -o ${P}.utf $P 2>>$LOG &&
      mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "ja" &&
for P in man/ja/*.man; do
        iconv -t utf-8 -f euc-jp -o ${P}.utf $P 2>>$LOG &&
        mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "ko" &&
for P in man/ko/*.man; do
        iconv -t utf-8 -f euc-kr -o ${P}.utf $P 2>>$LOG &&
        mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "latin1" &&
for P in man/{da,de,en,es,fi,fr,it,nl,pt}/*.man; do
        iconv -t utf-8 -f iso-8859-1 -o ${P}.utf $P 2>>$LOG &&
        mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "latin2" &&
for P in man/{cs,hr,pl,ro,sl}/*.man; do
        iconv -t utf-8 -f iso-8859-2 -o ${P}.utf $P 2>>$LOG &&
        mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "el" &&
for P in man/el/*.man; do
        iconv -t utf-8 -f iso-8859-7 -o ${P}.utf $P 2>>$LOG &&
        mv -v ${P}.utf $P >>$LOG 2>&1
done &&
echo "seds" &&
sed -i 's@-is@&R@' configure &&
sed -i 's@MANPATH./usr/man@#&@g' src/ &&
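
The attached fragment repeats one pattern throughout: convert a file to UTF-8 into a temporary name, then move it over the original. As an illustration only, that pattern could be folded into a single function. `recode_in_place` is a hypothetical name, and the sketch uses shell redirection rather than the `-o` option so that it does not depend on GNU iconv.

```shell
#!/bin/sh
# recode_in_place FROM FILE... (hypothetical helper)
# Convert each FILE from charset FROM to UTF-8, replacing the original
# only if the conversion succeeded.
recode_in_place() {
    rip_from=$1
    shift
    for rip_file in "$@"; do
        if iconv -f "$rip_from" -t UTF-8 "$rip_file" >"$rip_file.utf"; then
            mv "$rip_file.utf" "$rip_file" || return 1
        else
            # Leave the original untouched and drop the partial output.
            rm -f "$rip_file.utf"
            return 1
        fi
    done
}
```

With such a helper, a loop like the latin2 one would reduce to something like "recode_in_place iso-8859-2 man/{cs,hr,pl,ro,sl}/*.man" in a shell with brace expansion, with logging added as needed.
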
