Man and Groff - Author Feedbacks - UTF-8 Comments
wl at gnu.org
Fri Jan 20 12:09:09 PST 2006
> I have checked out today's Groff from Savannah CVS. The test results are
> 2) The relocation stuff segfaults, so I had to disable it by editing
Please send a backtrace, or give a recipe to reproduce it.
> groff -K KOI8-R -Tutf8 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv -f UTF-8 -t //TRANSLIT
> This, of course, scales to more (but not all) languages by changing
> KOI8-R to the character set in which the manual pages for that
An important note to add (quoting an email I recently sent to the
groff list):
Regarding our discussion about converting to Unicode, I just want to
mention that this process basically disables hyphenation for
non-ASCII characters. Unfortunately, this problem is unavoidable,
and there is no quick fix.
The reason is that all non-ASCII characters are converted to the
\[u...] form, which no longer takes part in the hyphenation process.
This problem will persist until GNU troff natively supports UTF-8 as
its input encoding: hyphenation *must* take place at the input
character level, and I won't repeat Knuth's error of making TeX apply
hyphenation at the glyph level, which causes problems for languages
that are represented in more than one font encoding (Russian, for
In other words, the only people who will be completely happy with
preconv are the Vietnamese, because they don't use hyphenation at all :-)
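To make the hyphenation problem more concrete, here is a simplified Python sketch (not preconv's actual code) of the conversion described above, in which every non-ASCII character becomes a `\[uXXXX]` escape. The function name `to_groff_escapes` is hypothetical, and the real preconv handles more cases than this.

```python
def to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character with a groff-style \\[uXXXX] escape.

    Simplified sketch only: ASCII passes through unchanged, anything
    else is written as its Unicode code point in uppercase hex.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII stays a letter
        else:
            out.append("\\[u%04X]" % cp)    # e.g. U+043F -> \[u043F]
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_escapes("passwd"))   # ASCII word: unchanged
    print(to_groff_escapes("пароль"))   # Russian word: escapes only
```

After this conversion the Russian word reaches troff as a run of escapes rather than letters, which matches the note above that such characters no longer take part in hyphenation.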
> 4) The "-k" and encoding autoguessing is a bad idea because not
> every manual page is tagged properly (e.g., the passwd(5) manual
> page is not tagged).
I don't think so. Nobody is forced to use `-k' by default.
> Everyone will end up using -K with the explicit encoding specified
> (and, in fact, that's Man's, not user's responsibility).
This might be true for man pages, but groff is used for other tasks
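Why untagged pages are a problem for `-k` can be seen in a minimal sketch of tag-based guessing. It assumes an Emacs-style `-*- coding: ... -*-` tag in the first two lines, checks for a UTF-8 byte order mark first, and falls back to an assumed latin-1 default. The helper names, the regular expression, and the default are all hypothetical and are not taken from preconv.

```python
import re
from typing import Optional

# Assumed Emacs-style coding tag, e.g.:  .\" -*- coding: koi8-r -*-
CODING_TAG = re.compile(r"-\*-.*?\bcoding:\s*([-\w.]+)", re.IGNORECASE)
UTF8_BOM = b"\xef\xbb\xbf"


def find_coding_tag(data: bytes) -> Optional[str]:
    """Return the encoding named by a coding tag in line 1 or 2, or None."""
    # Split on raw newlines first, then decode each line as latin-1: that
    # decode never fails, and the ASCII tag survives unchanged.
    for raw in data.split(b"\n")[:2]:
        match = CODING_TAG.search(raw.decode("latin-1"))
        if match:
            return match.group(1).lower()
    return None


def guess_encoding(data: bytes, default: str = "latin-1") -> str:
    """Guess an input encoding: BOM first, then a coding tag, then a default."""
    if data.startswith(UTF8_BOM):
        return "utf-8"
    tag = find_coding_tag(data)
    return tag if tag is not None else default


if __name__ == "__main__":
    tagged = b'.\\" -*- coding: koi8-r -*-\n.TH PASSWD 5\n'
    untagged = b".TH PASSWD 5\n"
    print(guess_encoding(tagged))
    print(guess_encoding(untagged))
```

For an untagged page like the passwd(5) example, a guesser of this kind can only fall back to its default, which is why an explicit `-K` is the safer choice whenever the encoding is known.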
> 5) New Groff is still not able to format Japanese manuals. Is there
> any timeline for this?
I won't apply the Debian patch for Japanese support, since it isn't a
general solution. Instead, I'm asking for volunteers to help me make
GNU troff truly Unicode-aware! On the output side, only some minor
improvements are needed to manage character classes (mainly for CJK
scripts); on the input side, character handling must be widened from
8 bits to (signed) 32 bits, which is a lot of work.
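The 8-bit versus 32-bit point can be illustrated outside groff. This Python sketch (not groff code; `bits_needed` and `fits_in_8bit` are hypothetical helpers) measures how many bits the code points in a string require. Russian text fits in one byte per character in a legacy charset such as KOI8-R but not as Unicode code points, and CJK text needs even more.

```python
import sys


def bits_needed(text: str) -> int:
    """Smallest number of bits that can hold every code point in text."""
    return max((ord(ch).bit_length() for ch in text), default=0)


def fits_in_8bit(text: str) -> bool:
    """True if every character's code point fits in a single 8-bit unit."""
    return bits_needed(text) <= 8


if __name__ == "__main__":
    for sample in ("passwd", "пароль", "日本語"):
        print(sample, bits_needed(sample), fits_in_8bit(sample))
    # A legacy 8-bit charset such as KOI8-R stores Russian one byte per
    # character, but the same letters need 11 bits as Unicode code points.
    print(len("пароль".encode("koi8-r")))
    # The largest code point, U+10FFFF, needs 21 bits: too wide for 8 or
    # 16 bits, but comfortably inside a 32-bit integer.
    print(sys.maxunicode.bit_length())
```

This is why an 8-bit character type works with a `-K` legacy encoding for one script at a time, while full Unicode input calls for a wider type.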
More information about the cross-lfs