Please review for Man-DB changes

DJ Lucas dj at linuxfromscratch.org
Wed Oct 22 23:38:10 PDT 2008


Alexander E. Patrakov wrote:
> DJ Lucas wrote:
>   
>> Guys, I'm obviously lacking creativity tonight. ;-)  I've posted a
>> local copy of the book in my home dir on quantum.  I would like
>> someone else (or many somebody elses) to review the textual changes
>> on the man-db page for both technical and grammatical errors.
>>
>> http://www.linuxfromscratch.org/~dj/LFS-MANDB/chapter06/man-db.html
>>
>> Thanks in advance.
>>     
>
>   
>> Some packages provide UTF-8 man pages, which previous versions of
>> Man-DB were unable to display. This limitation has been overcome in
>> recent versions, and Man-DB can now convert man pages from legacy
>> 8-bit encodings to UTF-8 (and vice-versa) on the fly.
>>     
>
> I don't like the wording here. We need to mention two features separately:
>
> 1) conversion TO arbitrary encoding on the fly (was present in old 
> versions of Man-DB, too, but is just a distracting factor here);
>
> 2) expectations about the input (changed, was hard-coded, now, in 
> addition, looks into the extension of the directory).
>
> Better, but IMHO still not acceptable for anything except -dev book:
>
> ================
> Some packages provide UTF-8 man pages, which previous versions of Man-DB 
> were unable to display correctly, because the expected (8-bit) encoding 
> for each language was hard-coded in the source of Man-DB. Now Man-DB 
> uses the extension of the directory name in order to determine the 
> encoding of the manual pages stored there, and uses the built-in table 
> only if the encoding is not specified in the directory name. E.g., 
> because of "UTF-8" in the directory name, it knows that all manual pages 
> residing in /usr/share/man/fr.UTF-8 are UTF-8 encoded and, according to 
> the built-in table, expects all manual pages residing in 
> /usr/share/man/ru to be in KOI8-R.
>
> On the other hand, the setup in Fedora Core expected all manual pages to 
> be UTF-8 encoded and stored in directories without suffixes ".UTF-8".
> ================
>
> Bruce: could you please try to criticise or shorten this?
>
>   
OK.  Thank you.  Everyone feel free to have at it with that text or 
something similar.  I'm out of time for tonight and I'll be out until 
Friday evening CDT.
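
To make the lookup rule in the proposed text concrete, here is a rough 
shell sketch of it.  This is only an illustration, not Man-DB's actual 
code: the function name expected_encoding is made up, and the "table" 
holds just the one entry (ru -> KOI8-R) mentioned above.

```shell
# Hypothetical helper: print the encoding that would be expected for the
# manual pages stored in a given directory, following the rule described
# above. It is NOT taken from the Man-DB sources.
expected_encoding() {
    dir=$(basename "$1")
    case "$dir" in
        # An explicit suffix in the directory name wins, e.g. fr.UTF-8.
        *.*) printf '%s\n' "${dir#*.}" ;;
        # Otherwise fall back to the built-in table (one sample entry only).
        ru)  printf '%s\n' "KOI8-R" ;;
        # Anything else is outside this toy table.
        *)   printf '%s\n' "unknown" ;;
    esac
}

expected_encoding /usr/share/man/fr.UTF-8
expected_encoding /usr/share/man/ru
```

The real table covers many more language codes; the point is only the 
order of the two checks (directory suffix first, table second).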
>> This used to be
>>     
>
> "This" => "Disagreement about the expected encoding of manual pages".
>
>   
OK.  That sounded better.  I lost that text about 10 edits back.
>> a rather annoying problem across different distributions, as packages
>> written for one distribution would require changes to work on
>> another.
>>     
>
>   
>> This script was written, and included in LFS to overcome
>> this problem. The script will allow you to pass an in and out value
>> to convert man pages to and from legacy 8-bit and UTF-8 encodings.
>>     
>
> Technically, we don't need it. But it is still abused in BLFS to convert 
> Midnight Commander hints after patching. We definitely don't need the 
> script so close to the beginning of the page, I propose to move it to 
> the "Non-English Manual Pages in LFS" section.
>
>   
It is used in several places in BLFS right now.  Saving the repeated 
"for ... in ... do" loops justifies its existence, but you are probably 
correct in that it could move further down the page.
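
For anyone following along, the kind of loop I mean looks roughly like 
the sketch below.  It is not the script from the book; the function name 
convert_pages and its argument order are assumptions for illustration, 
and it assumes uncompressed pages and an iconv that knows both encodings.

```shell
# Hypothetical helper: convert_pages FROM TO FILE...
# Re-encodes each named manual page in place using iconv.
convert_pages() {
    from=$1
    to=$2
    shift 2
    for page in "$@"; do
        # Write to a temporary file first so that a failed conversion
        # never truncates the original page.
        if iconv -f "$from" -t "$to" "$page" > "$page.tmp"; then
            mv "$page.tmp" "$page"
        else
            rm -f "$page.tmp"
            return 1
        fi
    done
}

# Example (run from a directory containing man1, man2, ...):
#   convert_pages UTF-8 ISO-8859-1 man?/*
```

Having this in one place means each package's instructions need a single 
line rather than a copy of the loop.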
>>  6.47.2. Non-English Manual Pages in LFS
>>
>> Linux distributions have different policies concerning the character
>> encoding in which manual pages are stored in the filesystem. E.g.,
>> RedHat stores all manual pages in UTF-8, while Debian previously used
>>     
>
> and still uses predominantly
>
>   
 From what Colin Watson said back in September, they are moving to all 
UTF-8 pages.

"In the distributions I'm most directly involved with, namely Debian and 
Ubuntu, everything is set up for UTF-8 output by default, and we've 
arranged for the packaging tools to automatically convert pages to UTF-8 
on installation with the aid of some helper tools I ship with man-db;"
>> language-specific (mostly 8-bit) encodings. As mentioned above, this
>> leads to incompatibility of packages with manual pages designed for
>> different distributions.
>>     
>
>   
>> LFS previously used the same convention as Debian. This was chosen
>> because Man-DB did not understand man pages stored in UTF-8 at the
>> time of its introduction into LFS. For our purposes at that time,
>> Man-DB was preferable to Man as it worked without any additional
>> configuration in any locale.
>>     
>
> OK.
>
>   
>> This is still true today as Man-DB with
>> Debian patched Groff will now properly convert UTF-8 encoded man
>> pages to the user's locale on the fly.
>>     
>
> Only if they are placed correctly.
>
>   
Yeah, that should be explained right there.
>> Additionally, this combination
>> provides support for Chinese and Japanese locales, and limited
>> support for Korean, whereas Man does not.
>>     
>
> Wrong. Man does work (if we ignore translations of error messages) with 
> the same languages if used together with Debian-patched groff. The only 
> difference is that Man has the pipeline constructed in the configuration 
> file by the user, while Man-DB constructs the pipeline programmatically 
> by applying knowledge about the expected input and output encoding of 
> various programs. Obviously, a user can write the same pipeline into Man 
> configuration file, but this would take several pages to explain.
>
>   
OK.  So append "without considerable modifications to the default 
configuration."
>> The current offering of Man
>> as used in RedHat requires major modifications to both the Man and
>> Groff packages,
>>     
>
> true
>
>   
>> and still falls short on Chinese, Japanese, and
>> Korean encodings.
>>     
>
> not sure.
>
>   
Maybe not if used against Debian patched Groff.  Upstream Groff is still 
broken I believe because of the lack of line breaking code.  I should 
probably just eliminate that altogether because it can work.
>> Finally, it should be noted that most distributions, including
>> Debian, are rapidly migrating to all UTF-8 encoded man pages.
>>     
>
> Wrong. Most distributions (including Gentoo and Arch) completely ignore 
> the problem, present to the user the unreadable mix of 8-bit and UTF-8 
> pages in the same directory, and are thus broken.
>
> The leading and government-sponsored Russian distribution (Alt Linux) 
> still uses 8-bit (KOI8-R) manual pages. The only distributions that 
> converted fully are RedHat derivatives. Debian only starts to get ready.
>
>   
OK.  But I am still under the impression that is the expected future.  
That doesn't change the fact that it is totally incorrect as written and 
needs to be corrected to show the proper state.
>> Upstream packagers will very likely drop legacy encodings in favor of
>> UTF-8, though adoption has been slow due to the hacks required to
>> make the current Man and Groff packages work correctly together.
>>     
>
> I don't know how to comment on this. Modern desktop packages come with 
> DocBook documentation, not manual pages.
>   
:-)  The point of both of the above paragraphs is to make known that we 
will be seeing more UTF-8 encoded manual pages...especially with both 
Debian and RedHat going that route.  It still needs rewording, or removal.

>   
>> The relationship between language codes and the expected encoding of
>> legacy manual pages is listed below.
>>
>> Table 6.1. <snip>
>>     
>
> Up to this point, nothing is said (except in the text I proposed at the 
> very top of my post) HOW Man-DB determines the encoding of a manual 
> page. Theory should be given before examples, not in examples. This 
> worked before, because the whole theory was expressed in the table.
>
>   
IMO, the text you provided previously about the named directories, and 
mentioned a second time when discussing the new ability of Man-DB should 
be sufficient.  Explaining the whole process from command to display is 
kind of overkill, though it would sufficiently justify the choice of 
man-db with Debian Groff over Man/Groff.  Short of the named 
directories, it is covered thoroughly in your man-i18n hint.  Would a 
link there make sense being that it deals with Man instead of Man-DB?
>> If upstream distributes the manual pages in a legacy encoding the
>> manual pages can simply be copied to /usr/share/man/<language code>.
>> For example, German manual pages can be installed with the following
>> commands:
>>
>> mkdir -p /usr/share/man/de
>> cp -rv man? /usr/share/man/de
>>     
>
> OK
>
>   
>> If upstream distributes manual pages in UTF-8 (i.e., “for RedHat”)
>> instead of the encoding listed in the table above, they can either be
>> converted from UTF-8 to the encoding listed in the table above, or
>> they can be installed directly into /usr/share/man/<language
>> code>.UTF-8.
>>     
>
> OK. Here the script would go. Also I'd like to see comparison of both 
> approaches. E.g., if the manual pages are installed with a Makefile, it 
> is often easier to convert manual pages before installation than to 
> patch the Makefile.
>   
Sounds good.
>   
>> For example, to install Spanish manual pages
>>     
>
> Let's drop this buggy package and explain both techniques with French 
> manual pages.
>
>   
OK.  I thought about doing that too, but the French man pages include 
shell scripts to do the conversion before installation, so they are not a 
good place to show off convert-mans.  Granted, the necessity of 
convert-mans is gone, but I'd still like to keep it as opposed to 
littering all of the books with "for ... in ... do" loops.  As it is now, 
our current policy (legacy encodings) still works and should stay, but 
I'd imagine that we'll be seeing more UTF-8 pages in the future.  I'm 
pretty much done until late Friday (around this time) or Saturday.  I'll 
find a package with a Makefile to use as an example.

Thanks again for all of your help on this Alexander.

-- DJ Lucas

