preconv and hyphenation

Werner LEMBERG wl at gnu.org
Mon Jan 23 22:37:49 PST 2006


I wrote:

> regarding our discussion about converting to Unicode I just want to
> mention that this process basically disables hyphenation for
> characters which aren't ASCII.  Unfortunately, this is an
> unavoidable problem without a quick fix.
> 
> The reason is that all non-ASCII characters are converted to the
> \[uXXXX] form, which no longer takes part in the hyphenation process.

What nonsense!  Sorry for the confusion.  *Of course* those entities
are hyphenated!  Here are the rules for a sample entity `\[u00C4]'
(`Ä').

  1. Check whether `u00C4' complies with the AGL (Adobe Glyph List)
     and represents a Unicode character -> yes.

  2. Try to normalize it using Unicode Normalization Form D (NFD) ->
     `u0041_0308'.

  3. Map it to a groff glyph name, if possible -> `:A'.

  4. For hyphenation, use the input character associated with `:A' as
     set up with either .tr or .trin.  Using latin1.tmac, for example,
     you find

       .trin \[char196]\[:A]

     `char196' is actually *not* a glyph name but an input character
     (the naming is a historical accident) representing 0xC4.

  5. To support, say, German hyphenation, proper .hcode values must be
     set up (this corresponds to TeX's \lccode and \uccode):

       .hcode Ä ä

     This maps `Ä' to `ä' for hyphenation.
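
Steps 1 to 3 above can be tried outside of groff.  The following is
only a rough sketch in Python, not groff's actual code: the function
names and the two-entry GROFF_GLYPH_NAMES table are made up for
illustration, and the AGL compliance check of step 1 is reduced to a
simple syntax test.  Only the NFD decomposition itself is done by a
real library call (unicodedata.normalize).

```python
import unicodedata

# Hypothetical excerpt of the table that maps decomposed Unicode names
# to groff glyph names; groff's real table is much larger.
GROFF_GLYPH_NAMES = {
    "u0041_0308": ":A",
    "u0061_0308": ":a",
}


def decompose_entity(name):
    """Return the NFD form of a `uXXXX' name, e.g. `u0041_0308'."""
    if not name.startswith("u"):
        raise ValueError("not a uXXXX entity name: %r" % name)
    # Step 1 (simplified): the part after `u' must be a hex code point.
    char = chr(int(name[1:], 16))
    # Step 2: canonical decomposition (Unicode Normalization Form D).
    decomposed = unicodedata.normalize("NFD", char)
    return "u" + "_".join("%04X" % ord(c) for c in decomposed)


def glyph_name(name):
    """Step 3: map to a groff glyph name if possible, else None."""
    return GROFF_GLYPH_NAMES.get(decompose_entity(name))


if __name__ == "__main__":
    print(decompose_entity("u00C4"))  # u0041_0308
    print(glyph_name("u00C4"))        # :A
```

An entity without an entry in the table, such as `u0041', simply
yields None here, which corresponds to the `if possible' in step 3.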

In other words: to make hyphenation work, groff needs a proper input
encoding (set up with .tr or .trin), proper .hcode values, and
associated hyphenation patterns set up for this input encoding.
Unlike in TeX, .hcode values can be changed even in the middle of a
paragraph.

I'll eventually add the above explanation to groff.texinfo.


    Werner


