Man and Groff - Author Feedbacks - UTF-8 Comments

Werner LEMBERG wl at gnu.org
Fri Jan 20 12:09:09 PST 2006


> I have checked out today's Groff from Savannah CVS. The test results are 
> below.
> 
> 2) The relocation stuff segfaults, so I had to disable it by editing 
> src/libs/libgroff/Makefile.sub.

A backtrace, please, or a recipe to reproduce it.

> groff -K KOI8-R -Tutf8 -mandoc /usr/share/man/ru/man5/passwd.5 | iconv 
> -f UTF-8 -t //TRANSLIT
> 
> This, of course, scales to more (but not all) languages by changing
> KOI8-R to the character set in which the manual pages for that
> language are written.
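
As a rough sketch of how a man-like wrapper could pick the encoding per
language and build the groff command quoted above: the table and the
function name below are hypothetical, and only the "ru" entry comes from
this message. Other languages would need entries of their own.

```python
# Hypothetical lookup table: only the "ru" -> KOI8-R entry is taken from
# the message above; everything else would have to be filled in.
MAN_ENCODINGS = {"ru": "KOI8-R"}


def groff_command(page_path, lang):
    """Return the argv list that formats one manual page as UTF-8."""
    # Raises KeyError for a language with no known encoding, which
    # mirrors the "more (but not all) languages" caveat.
    encoding = MAN_ENCODINGS[lang]
    return ["groff", "-K", encoding, "-Tutf8", "-mandoc", page_path]


if __name__ == "__main__":
    argv = groff_command("/usr/share/man/ru/man5/passwd.5", "ru")
    print(" ".join(argv))
```

The list can be handed to subprocess.run() as is; the iconv step from
the quoted pipeline is left out here.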

An important note to add (citing an email which I've recently sent to
the groff list):

  regarding our discussion about converting to Unicode I just want to
  mention that this process basically disables hyphenation for
  characters which aren't ASCII.  Unfortunately, this is an
  unavoidable problem without a quick fix.

  The reason is that all non-ASCII characters are converted to the
  \[uXXXX] form, which no longer takes part in the hyphenation process.
  This problem will persist until GNU troff natively supports UTF-8 as
  input encoding -- hyphenation *must* take place at the input
  character level, and I won't repeat Knuth's error of making TeX apply
  hyphenation at the glyph level, which causes problems for languages
  that are represented in more than one font encoding (Russian, for
  example).

  In other words, the only people who will be completely happy with
  preconv are the Vietnamese, because they don't use hyphenation at
  all :-)
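
To make the conversion step concrete, here is a minimal sketch of the
kind of rewriting described above: every non-ASCII character is replaced
by an escape in the form that groff writes as \[uXXXX]. This is an
illustration only, not preconv's actual code; the function name is made
up, and the four-digit uppercase hex format is an assumption.

```python
def to_groff_escapes(text):
    """Replace each non-ASCII character with a \\[uXXXX] escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        else:
            # Assumed format: at least four uppercase hex digits.
            out.append("\\[u%04X]" % cp)
    return "".join(out)


if __name__ == "__main__":
    # Decode KOI8-R input first, as `-K KOI8-R` would, then escape it.
    raw = "пароль".encode("koi8_r")
    print(to_groff_escapes(raw.decode("koi8_r")))
```

After this step a Russian word is no longer a run of letters but a run
of escapes, which is why it drops out of hyphenation.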

> 4) The "-k" and encoding autoguessing is a bad idea because not
> every manual page is tagged properly (e.g., the passwd(5) manual
> page is not tagged).

I don't think so.  Nobody is forced to use `-k' by default.

> Everyone will end up using -K with the explicit encoding specified
> (and, in fact, that's Man's responsibility, not the user's).

This might be true for man pages, but groff is used for other tasks
also.

> 5) New Groff is still not able to format Japanese manuals. Is there
> any timeline for this?

I won't apply the Debian patch for Japanese support since it lacks a
general solution.  Instead, I ask for volunteers who can help me make
GNU troff really Unicode aware!  On the output side only some minor
improvements are necessary to manage character classes (mainly for
CJK scripts); on the input side it's necessary to widen character
handling from 8bit to (signed) 32bit -- this is a lot of work.
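
A small sketch of why the input side needs widening: code points for
Cyrillic and CJK characters do not fit in 8 bits, while every Unicode
code point fits comfortably in a signed 32-bit integer. The helper names
are made up for this illustration and say nothing about how GNU troff
stores characters internally.

```python
INT32_MAX = 2**31 - 1        # largest value of a signed 32-bit integer
MAX_CODE_POINT = 0x10FFFF    # largest Unicode code point


def fits_8bit(ch):
    """True if the character's code point fits in one unsigned byte."""
    return ord(ch) <= 0xFF


def fits_signed_32bit(ch):
    """True if the code point fits in a signed 32-bit integer."""
    return ord(ch) <= INT32_MAX


if __name__ == "__main__":
    for ch in ("A", "я", "日"):
        print("U+%04X" % ord(ch), fits_8bit(ch), fits_signed_32bit(ch))
```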


    Werner


