Please review for Man-DB changes
Alexander E. Patrakov
patrakov at gmail.com
Sat Oct 25 08:47:39 PDT 2008
DJ Lucas wrote:
> Many other distributions ignore the on disk encodings completely,
> leaving the end user with a mix of improperly encoded manual pages.
Well, the end user doesn't care how the manual pages are encoded on
disk. The only thing that matters is whether they are displayed correctly.
And I can't translate the sentence into Russian, because I don't know
how an encoding can be ignored by the distribution. Issues can be
ignored, and encodings can be mishandled.
And you lost the important bit from your previous mail, that in such
distributions some pages (that match the de-facto Man setup) are
readable, while others display as completely "illegible" lines of
And BTW, Lingvo (the leading online English<->Russian dictionary)
doesn't even list your intended meaning among the list of available
translations for "illegible". They think that this word can apply only
to handwriting or typesetting, and is a synonym for "blurry", or "too
small to read". I.e., it means something which can be characterized with
a certain degree of "illegibility", while we are talking about perfectly
displayed, but wrong characters (and one cannot talk about "more
correct" or "less correct" characters). So, please choose another word
> When man encounters an unexpected encoding, it will display the contents
> as configured, resulting in completely illegible text.
Man (original) doesn't _know_ the encoding. It just passes the manual
page through a pipeline designed (deliberately or by copying others'
setup blindly) to process text in a certain encoding. Garbage in,
garbage out. Yes, that's essentially what you said, but not all Man
implementations have enough brains to "expect" some encoding - the
original Man just pipes text through the static user-configured pipeline.
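To illustrate "garbage in, garbage out", here is a minimal Python sketch. It is not Man's code: the sample word and the choice of KOI8-R as the configured encoding are assumptions made for the example. It shows that bytes stored as UTF-8 and pushed through a pipeline set up for KOI8-R come out as perfectly valid, but wrong, characters.

```python
# Assumption for illustration: a Russian word stored on disk as UTF-8.
original = "Справка"
on_disk = original.encode("utf-8")

# A static pipeline configured for KOI8-R does not detect anything;
# it simply treats every byte as a KOI8-R character.
displayed = on_disk.decode("koi8_r")

# Every byte maps to some KOI8-R character, so no error is raised:
# the result is well-formed text that just happens to be wrong.
print(displayed)
```

Nothing fails along the way, which is why the problem shows up only on the screen.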
Sorry, it is too late here for me to try suggesting a better wording. I
will do this tomorrow if you don't do it yourself while I sleep.
>>> Man-DB uses a
>>> built-in table (see below) to find the correct search directory for
>>> manual pages based on the user's locale settings.
>> No, it doesn't look into the table in this case. See add_nls_manpath()
>> in http://www.chiark.greenend.org.uk/~cjwatson/bzr/man-db/trunk/src/manp.c
>> It iterates over all subdirectories and tests whether the subdirectory
>> is for the user's language, completely disregarding the encoding.
> ...ships with manual pages in legacy encodings. Man-DB uses a built-in
> table (see below) to determine the on disk encoding of the manual pages
> found for a user's locale. If the directories found do not contain the
> ".UTF-8" extension, Man-DB checks the table, and performs the necessary
> conversion. E.g., because of "UTF-8" in the directory name...
It doesn't work this way. Suppose that the user's locale is
ll_CC.CODESET. Man-DB looks for subdirectories of /usr/share/man that,
after removing a possible suffix, reduce to either ll_CC or ll. For each
of the directories found with a suffix, it uses the suffix as the
encoding. If the directory has no suffix, Man-DB checks the table.
"UTF-8" has no special meaning, but your text creates a false impression
that it does. E.g., if /usr/share/man/ru.CP1251 existed, Man-DB would
expect to find CP1251-encoded manual pages there. Again, please read the
source. Oh, you did.
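The lookup described above can be sketched in Python as follows. This is only a model of the behaviour described in this mail, not a port of add_nls_manpath(): the function names, the two-entry table and the ISO-8859-1 fallback are assumptions for illustration, and locale modifiers such as "@euro" are ignored.

```python
# Hypothetical, tiny stand-in for Man-DB's built-in table.
DEFAULT_ENCODINGS = {"ru": "KOI8-R", "ja": "EUC-JP"}
FALLBACK_ENCODING = "ISO-8859-1"  # assumed default for this sketch


def split_suffix(name):
    """Split 'ru.UTF-8' into ('ru', 'UTF-8'); 'ru' gives ('ru', None)."""
    base, dot, suffix = name.partition(".")
    return base, (suffix if dot else None)


def candidate_dirs(locale, subdirs, table=DEFAULT_ENCODINGS):
    """Return (directory, assumed page encoding) pairs for a locale."""
    lang_country = locale.split(".", 1)[0]  # ll_CC; CODESET is disregarded
    lang = lang_country.split("_", 1)[0]    # ll
    result = []
    for name in subdirs:
        base, suffix = split_suffix(name)
        if base not in (lang_country, lang):
            continue  # directory is for another language
        if suffix is not None:
            encoding = suffix  # any suffix is taken as the encoding
        else:
            encoding = table.get(base, table.get(lang, FALLBACK_ENCODING))
        result.append((name, encoding))
    return result


dirs = ["de", "ru", "ru.UTF-8", "ru.CP1251", "man1"]
for name, encoding in candidate_dirs("ru_RU.UTF-8", dirs):
    print(name, encoding)
```

Note that "UTF-8" gets no special treatment in the sketch: "ru.CP1251" is handled by exactly the same branch as "ru.UTF-8", and the table is consulted only for the directory without a suffix.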
> Some interesting reading in the source. Looks like at least
> unpack_locale_bits() does not care what the codeset is, but it's checked
> in encodings.c. So:
> ...If the directories found do not contain an extension, Man-DB checks
> the table, and performs the necessary conversion. E.g., because of
> "UTF-8" extension in the directory name...
It always performs the necessary conversion (e.g., in ru_RU.KOI8-R
locale, it can use manual pages from /usr/share/man/ru.UTF-8), so let's
drop or move "and performs the necessary conversion". Also, in UTF-8
locales, it does _double_ conversion: first to the encoding from the
table, then (after processing with Groff) back, because Groff doesn't
understand UTF-8. Other than that, good.
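The double conversion can be modelled with a short Python sketch. The render() and fake_groff() names are hypothetical, and fake_groff() is only a stand-in that passes 8-bit data through unchanged; the real Groff processing is not modelled here.

```python
def fake_groff(data: bytes) -> bytes:
    """Stand-in for Groff: treats input as opaque 8-bit text."""
    return data


def render(page_bytes, page_encoding, groff_encoding, locale_encoding):
    """Convert a page for Groff, then convert Groff's output for the user."""
    text = page_bytes.decode(page_encoding)
    # First conversion: into the 8-bit encoding taken from the table.
    # encode() raises UnicodeEncodeError for characters outside it.
    groff_input = text.encode(groff_encoding)
    groff_output = fake_groff(groff_input)
    # Second conversion: from the 8-bit encoding to the locale's encoding.
    return groff_output.decode(groff_encoding).encode(locale_encoding)


page = "Справка".encode("utf-8")  # as if read from /usr/share/man/ru.UTF-8

# UTF-8 locale: UTF-8 -> KOI8-R -> (Groff) -> UTF-8, i.e. two conversions.
utf8_out = render(page, "UTF-8", "KOI8-R", "UTF-8")

# ru_RU.KOI8-R locale: the same UTF-8 page is still usable.
koi8_out = render(page, "UTF-8", "KOI8-R", "KOI8-R")
```

In both cases a conversion takes place, which is why "and performs the necessary conversion" should not be tied to the table lookup.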
Alexander E. Patrakov