Please review for Man-DB changes

Alexander E. Patrakov patrakov at gmail.com
Wed Oct 22 21:35:01 PDT 2008


DJ Lucas wrote:
> Guys, I'm obviously lacking creativity tonight. ;-)  I've posted a
> local copy of the book in my home dir on quantum.  I would like
> someone else (or many somebody elses) to review the textual changes
> on the man-db page for both technical and grammatical errors.
> 
> http://www.linuxfromscratch.org/~dj/LFS-MANDB/chapter06/man-db.html
> 
> Thanks in advance.

> Some packages provide UTF-8 man pages, which previous versions of
> Man-DB were unable to display. This limitation has been overcome in
> recent versions, and Man-DB can now convert man pages from legacy
> 8-bit encodings to UTF-8 (and vice-versa) on the fly.

I don't like the wording here. We need to mention two features separately:

1) conversion TO arbitrary encoding on the fly (was present in old 
versions of Man-DB, too, but is just a distracting factor here);

2) expectations about the input encoding (changed: it was hard-coded, and 
now Man-DB also looks at the extension of the directory name).

Better, but IMHO still not acceptable for anything except the -dev book:

================
Some packages provide UTF-8 man pages, which previous versions of Man-DB 
were unable to display correctly, because the expected (8-bit) encoding 
for each language was hard-coded in the source of Man-DB. Now Man-DB 
uses the extension of the directory name in order to determine the 
encoding of the manual pages stored there, and uses the built-in table 
only if the encoding is not specified in the directory name. E.g., 
because of "UTF-8" in the directory name, it knows that all manual pages 
residing in /usr/share/man/fr.UTF-8 are UTF-8 encoded and, according to 
the built-in table, expects all manual pages residing in 
/usr/share/man/ru to be in KOI8-R.

On the other hand, the setup in Fedora Core expected all manual pages to 
be UTF-8 encoded and stored in directories without a ".UTF-8" suffix.
================

Bruce: could you please try to criticise or shorten this?
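
A rough sketch of the lookup rule described in the proposed text above, 
written as a shell function. The name expected_encoding is made up for 
illustration, only the "ru" table entry is taken from the text, and this 
is not meant to show how Man-DB implements the rule internally:

```shell
# Sketch only: mimics the rule from the proposed text, not Man-DB's code.
# expected_encoding is a hypothetical helper name.
expected_encoding() {
    dir=$(basename "$1")
    case "$dir" in
        *.*) echo "${dir#*.}" ;;   # encoding is given in the directory name
        ru)  echo "KOI8-R" ;;      # built-in table entry mentioned in the text
        *)   echo "unknown" ;;     # placeholder for the rest of the table
    esac
}

expected_encoding /usr/share/man/fr.UTF-8   # prints UTF-8
expected_encoding /usr/share/man/ru         # prints KOI8-R
```

The directory-name check comes first, so a suffix always overrides the 
table.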

> This used to be

"This" => "Disagreement about the expected encoding of manual pages".

> a rather annoying problem across different distributions, as packages
> written for one distribution would require changes to work on
> another.

> This script was written, and included in LFS to overcome
> this problem. The script will allow you to pass an in and out value
> to convert man pages to and from legacy 8-bit and UTF-8 encodings.

Technically, we don't need it. But it is still abused in BLFS to convert 
Midnight Commander hints after patching. We definitely don't need the 
script so close to the beginning of the page; I propose moving it to 
the "Non-English Manual Pages in LFS" section.

>  6.47.2. Non-English Manual Pages in LFS
> 
> Linux distributions have different policies concerning the character
> encoding in which manual pages are stored in the filesystem. E.g.,
> RedHat stores all manual pages in UTF-8, while Debian previously used

and still uses predominantly

> language-specific (mostly 8-bit) encodings. As mentioned above, this
> leads to incompatibility of packages with manual pages designed for
> different distributions.

> LFS previously used the same convention as Debian. This was chosen
> because Man-DB did not understand man pages stored in UTF-8 at the
> time of its introduction into LFS. For our purposes at that time,
> Man-DB was preferable to Man as it worked without any additional
> configuration in any locale.

OK.

> This is still true today as Man-DB with
> Debian patched Groff will now properly convert UTF-8 encoded man
> pages to the user's locale on the fly.

Only if they are placed correctly.

> Additionally, this combination
> provides support for Chinese and Japanese locales, and limited
> support for Korean, whereas Man does not.

Wrong. Man does work (if we ignore translations of error messages) with 
the same languages if used together with Debian-patched groff. The only 
difference is that Man has the pipeline constructed in the configuration 
file by the user, while Man-DB constructs the pipeline programmatically 
by applying knowledge about the expected input and output encoding of 
various programs. Obviously, a user can write the same pipeline into the 
Man configuration file, but this would take several pages to explain.

> The current offering of Man
> as used in RedHat requires major modifications to both the Man and
> Groff packages,

true

> and still falls short on Chinese, Japanese, and
> Korean encodings.

not sure.

> Finally, it should be noted that most distributions, including
> Debian, are rapidly migrating to all UTF-8 encoded man pages.

Wrong. Most distributions (including Gentoo and Arch) completely ignore 
the problem, present the user with an unreadable mix of 8-bit and UTF-8 
pages in the same directory, and are thus broken.

The leading and government-sponsored Russian distribution (Alt Linux) 
still uses 8-bit (KOI8-R) manual pages. The only distributions that 
converted fully are RedHat derivatives. Debian is only starting to get ready.

> Upstream packagers will very likely drop legacy encodings in favor of
> UTF-8, though adoption has been slow due to the hacks required to
> make the current Man and Groff packages work correctly together.

I don't know how to comment on this. Modern desktop packages come with 
DocBook documentation, not manual pages.

> The relationship between language codes and the expected encoding of
> legacy manual pages is listed below.
> 
> Table 6.1. <snip>

Up to this point, nothing is said (except in the text I proposed at the 
very top of my post) about HOW Man-DB determines the encoding of a manual 
page. Theory should be given before examples, not in examples. This 
worked before, because the whole theory was expressed in the table.

> If upstream distributes the manual pages in a legacy encoding the
> manual pages can simply be copied to /usr/share/man/<language code>.
> For example, German manual pages can be installed with the following
> commands:
> 
> mkdir -p /usr/share/man/de
> cp -rv man? /usr/share/man/de

OK

> If upstream distributes manual pages in UTF-8 (i.e., “for RedHat”)
> instead of the encoding listed in the table above, they can either be
> converted from UTF-8 to the encoding listed in the table above, or
> they can be installed directly into /usr/share/man/<language
> code>.UTF-8.

OK. Here the script would go. Also I'd like to see a comparison of both 
approaches. E.g., if the manual pages are installed with a Makefile, it 
is often easier to convert manual pages before installation than to 
patch the Makefile.
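
To make the comparison concrete, here is a minimal sketch of both 
techniques. The helper names install_converted and install_utf8 and the 
MANROOT variable are hypothetical. It also assumes that the pages are 
uncompressed, that the source tree has the usual man? subdirectories, and 
that the caller passes the legacy encoding listed in the table for the 
language:

```shell
# Hypothetical helpers, for illustration only.
# MANROOT defaults to the real location but can be overridden for testing.
MANROOT=${MANROOT:-/usr/share/man}

# Technique 1: convert each page from UTF-8 to the legacy encoding and
# install it into the directory without a suffix.
# usage: install_converted <srcdir> <lang> <legacy-encoding>
install_converted() {
    for page in "$1"/man?/*; do
        sect=$(basename "$(dirname "$page")")
        mkdir -p "$MANROOT/$2/$sect" || return 1
        iconv -f UTF-8 -t "$3" "$page" \
            > "$MANROOT/$2/$sect/$(basename "$page")" || return 1
    done
}

# Technique 2: copy the pages unchanged into <lang>.UTF-8.
# usage: install_utf8 <srcdir> <lang>
install_utf8() {
    mkdir -p "$MANROOT/$2.UTF-8" || return 1
    cp -r "$1"/man? "$MANROOT/$2.UTF-8"
}
```

For French pages, and assuming ISO-8859-1 is the table entry for "fr", 
the calls from the directory holding man1, man2, ... would be 
"install_converted . fr ISO-8859-1" or "install_utf8 . fr". The first 
keeps the destination that an unpatched Makefile would already use; the 
second needs no conversion step but does need the ".UTF-8" destination.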

> For example, to install Spanish manual pages

Let's drop this buggy package and explain both techniques with French 
manual pages.

-- 
Alexander E. Patrakov


