UTF-8

Joe Ciccone jciccone at gmail.com
Sat Jan 21 17:06:18 PST 2006


After doing some research on my own, I personally would like to build
without UTF-8 support because of the following problems that have been
mentioned.

1. Man isn't UTF-8 ready yet, but support is planned.
      Man-DB seems like overkill for this application.
2. Groff isn't UTF-8 ready yet, but support is in the works.
3. A CD burned using a UTF-8 locale will only be readable on UTF-8 systems.
4. Various incompatibilities with the packages in BLFS that were
mentioned in this thread, and probably more that no one has looked into yet.

The most interesting thing from a programmer's point of view is the way
characters are handled, and this is the reason the incompatibilities
exist. In a non-UTF-8 locale every character fits in a single char,
which is 8 bits, whereas in a UTF-8 locale one character can take one to
four bytes, so code that has to work on individual characters usually
converts the text to wchar_t, which is 32 bits on glibc. It's hard to
write code that properly supports both types of locales. Also, wchar_t
processing code is slightly slower than char processing code, so most
programmers try to avoid it, myself included.

One thing that makes me lean towards UTF-8 support is the fact that some
locales only work with UTF-8. Right now, that is not enough to swing me.
But that does not mean UTF-8 shouldn't be looked into for the future,
when UTF-8 becomes the standard, which it no doubt eventually will
because of languages like Russian. Those people shouldn't be deprived of
support for their languages, but at the same time it's hard not to break
what is already working.

It might not be a bad idea to wait until upstream (groff, man,
everything) supports UTF-8 better. I wish I had known all of this earlier,
because I could have presented it before the UTF-8 book was merged.



More information about the cross-lfs mailing list