New personal experimental book [warning: lots of UTF-8 in this]

Ken Moffat ken at linuxfromscratch.org
Sat Sep 13 08:31:48 PDT 2008


On Sat, Sep 13, 2008 at 05:12:00PM +0600, Alexander E. Patrakov wrote:
> 
> Let's clarify the situation a bit. There are four possible outcomes for "man 
> foo" in the ru_RU.UTF-8 locale:
> 
> 1) Gibberish (unacceptable, but, unfortunately, what happens if the system is 
> misconfigured by an English-speaking editor who doesn't know how to test the 
> configuration)
> 2) "No such manual page" (well, OK if it indeed doesn't exist)
> 3) English manpage (acceptable, although not ideal)
> 4) Russian manpage.
> 
 From the box I'm using today (clfs amd pure64 from a few months ago,
with your ncursesw change, man-1.6e (straight configure without
specifying any lang value), groff-utf8):

ken at bluesbreaker ~ $ LC_ALL=ru_RU.UTF-8 man foo
Ничего про foo в руководстве нет
ken at bluesbreaker ~ $

 I can't read it, but it looks plausibly translated.  For pages that
exist in Russian on my system, see the examples below.

 After installing groff-utf8 I make the following change to man.conf
to actually use it:

  sed -i '/^NROFF/s/nroff -Tlatin1/groff-utf8 -Tutf8/' /etc/man.conf
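
 A cautious way to try this substitution is to run it without -i on a
scratch file first.  The sketch below does that; the helper name
switch_nroff and the sample NROFF line are assumptions for
illustration, not copied from a real man.conf.

```shell
# Hypothetical helper: print a man.conf-style file with the NROFF line
# switched to groff-utf8, leaving the file itself untouched (no -i).
switch_nroff() {
    sed '/^NROFF/s/nroff -Tlatin1/groff-utf8 -Tutf8/' "$1"
}

# Scratch file with an assumed NROFF line; a real man.conf may differ.
sample=$(mktemp)
printf 'NROFF\t/usr/bin/nroff -Tlatin1 -mandoc\n' > "$sample"
switch_nroff "$sample"
rm -f "$sample"
```

 If the printed line looks right, the same expression with -i can be
applied to /etc/man.conf.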


 This system for my use is intended to support _only_ UTF-8 locales,
and I don't have any pressing need to convert any other text
(sometimes I get mail where e.g. '£' (pound sign) is replaced, but I
can live with that).  People who have to use files in other
encodings will need to find solutions which work for them.

 The following examples are intended to show what appears to work -
they are pasted from urxvt.  I can't read the ideograms, but they
look plausible.  For console users, all the alphabetic versions
(latin alphabets, cyrillic alphabets, greek) are expected to work
e.g. with my sigma-consolefonts font if you have nothing better.
Sorry about the long lines in a few of these.

 Long-term, UTF-8 is the only sensible solution for text encoding,
in the same way that a terminal on an X desktop is the only way to
read some languages.  In my view, packages such as man-db are
prolonging the pain of the transition by encouraging people to use
legacy encodings.  But for me, as an English speaker, the pain is
minimal.  Others may conclude that the pain of conversion to UTF-8
should be deferred.

1. Versions of apropos.1 from man

bg_BG.UTF-8
ИМЕ
       apropos - търсене на низ в базата от данни на whatis

cs_CZ.UTF-8
JMÉNO
       apropos - hledej řetězec v databázi whatis

da_DK.UTF-8
NAVN
       apropos - gennemsøg 'whatis' databasen for tekststrenge

de_DE.UTF-8
NAME
       apropos - durchsucht die whatis Datenbank nach Zeichenketten

el_GR.UTF-8
ONOMA
       apropos - ερευνά τη βάση δεδομένων whatis για συμβολοσειρές

es_ES.UTF-8
NOMBRE
       apropos - busca `cadenas' en la base de datos "whatis"

fi_FI.UTF-8
NIMI
       apropos - etsi whatis-tietokannasta merkkijonoja

fr_FR.UTF-8
NOM
       apropos - recherche de chaînes de caractères dans la base de données whatis

hr_HR.UTF-8
IME
       apropos - traži niz u whatis bazi podataka

it_IT.UTF-8
NOME
       apropos - ricerca stringhe nel database di whatis

ja_JP.UTF-8
名前
       apropos - whatis データベースより文字列を検索する。


ko_KR.UTF-8
NAME
       apropos - whatis 데이타베이스의 문자열을 검색한다

nl_NL.UTF-8
NAAM
       apropos - zoek een gegeven string in de whatis database

pl_PL.UTF-8
NAZWA
       apropos - wyszukuje łańcuchy znaków w bazie whatis

pt_PT.UTF-8
NOME
       apropos - procura `strings' na base de dados "whatis"

ro_RO.UTF-8
NUME
       apropos - caută şiruri de caractere în baza de date whatis

sl_SI.UTF-8
IME
       apropos - poišči ključno besedo v datoteki whatis


2. Various pages from shadow-4.1.0 before Debian took it over.
 Some of these are only minimally translated.

cs_CZ.UTF-8 vipw (8)
JMÉNO
       vipw, vigr - slouží k úpravě souborů password, group, shadow-password a shadow-
       group.

de_DE.UTF-8 vipw (8)
NAME
       vipw, vigr - bearbeitet die Passwort-, Gruppen-, Shadow-Passwort- oder Shadow-
       Gruppen-Datei

es_ES.UTF-8 vipw (8)
NOMBRE
     vipw, vigr — editan los ficheros de cuentas y grupos

fi_FI.UTF-8 passwd (1)
NIMI
       passwd - päivitä käyttäjän todennustunnukset

fr_FR.UTF-8 vipw (8)
NOM
       vipw, vigr - éditer les fichiers passwd, group, shadow ou gshadow

hu_HU.UTF-8 passwd (5)
NÉV
       passwd - Jelszófájl

id_ID.UTF-8 useradd (8)
NAME
       useradd - Membuat user baru atau memperbarui informasi tentang user baru

it_IT.UTF-8 vipw (8)
NOME
       vipw, vigr - edit the password, group, shadow-password or shadow-group file
(only a few of the headings have been translated)

ja_JP.UTF-8 vipw (8)
名前
       vipw, vigr - password, group とそれぞれの shadow ファイルを編集する

ko_KR.UTF-8 vipw (8)
NAME
     vipw — 패스워드 파일 편집

pl_PL.UTF-8 vipw (8)
NAZWA
       vipw, vigr - edytuj plik haseł, grup lub ich wersji chronionych

pt_BR.UTF-8 passwd (5)
NOME
       passwd - arquivo de senhas

ru_RU.UTF-8 vipw (8)
НАЗВАНИЕ
       vipw, vigr - служат для редактирования файлов паролей, групп, теневых паролей
       пользователей или групп.

sv_SE.UTF-8 vipw (8)
NAMN
       vipw, vigr - redigera lösenordet, grupp, skugglösenord eller skuggruppfil

tr_TR.UTF-8 passwd (5)
İSİM
       passwd - parola dosyası

zh_CN.UTF-8 passwd (5)
NAME 名称
       passwd - 密码文件

 This also produces some messages on stderr, such as
<standard input>:45: warning [p 1, 2.2i]: cannot adjust line
which is not unexpected with groff-utf8.
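
 To see how noisy a given page is, the warnings can be counted
separately from the page text.  This is only a sketch: the function
name count_adjust_warnings is made up here, and the pattern assumes
the warnings look like the one quoted above.

```shell
# Hypothetical helper: count groff "cannot adjust line" warnings on stdin.
count_adjust_warnings() {
    grep -c 'warning.*cannot adjust line'
}

# Demonstration using the warning line quoted in this mail.
printf '%s\n' '<standard input>:45: warning [p 1, 2.2i]: cannot adjust line' |
    count_adjust_warnings
```

 For a real page, something like
`LC_ALL=zh_CN.UTF-8 man 5 passwd 2>&1 >/dev/null | count_adjust_warnings`
feeds only stderr to the counter and discards the page itself.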

zh_TW.UTF-8 passwd (5)
NAME 名稱
       passwd - 密碼檔案
 Again, with similar messages on stderr.


3. From vim-7.1, with a pair of seds on the Makefile to put the UTF-8
pages into /{fr,it,pl,ru}/ directories without .UTF-8 or KOI8-R in
the names:

fr_FR.UTF-8
NOM
       vim - Vi IMproved, éditeur de texte pour programmeurs

it_IT.UTF-8
NOME
       vim - VI Migliorato, un editor di testi per programmatori

pl_PL.UTF-8
NAME
       vim - Vi rozbudowany, edytor tekstu dla programisty

ru_RU.UTF-8
ИМЯ
       vim - Vi IMproved (Улучшенный Vi), текстовый редактор для программистов


4. man foo
 The following locales provide an English error message:
hu_HU.UTF-8, id_ID.UTF-8, ja_JP.UTF-8, ko_KR.UTF-8, nb_NO.UTF-8,
nn_NO.UTF-8, sv_SE.UTF-8, tr_TR.UTF-8, vi_VN.UTF-8, zh_CN.UTF-8.

 The following are known to have translations:
bg_BG.UTF-8 В ръководството няма страница за foo
cs_CZ.UTF-8 Žádný záznam pro foo
da_DK.UTF-8 Intet opslag for foo
de_DE.UTF-8 Keine Handbuchseite für foo
el_GR.UTF-8 Δεν υπάρχει σελίδα εγχειριδίου για foo
es_ES.UTF-8 No hay ninguna página sobre foo
fi_FI.UTF-8 Man-sivua foo ei löydy
fr_FR.UTF-8 Il n'y a pas de page de manuel pour foo.
hr_HR.UTF-8 Stranice foo nema
it_IT.UTF-8 Non c'è una voce per foo
nl_NL.UTF-8 Ik heb niets over foo, geloof ik.
pl_PL.UTF-8 Nie ma strony podręcznika dla foo
pt_PT.UTF-8 Não existe a entrada foo
ro_RO.UTF-8 Nici o intrare în manual pentru foo
ru_RU.UTF-8 Ничего про foo в руководстве нет
sl_SI.UTF-8 Strani za foo ni
zh_TW.UTF-8 不存在 foo 的使用手冊
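
 The list above can be regenerated with a small loop.  The following
is a sketch, not the exact commands used for this mail:
first_line_in_locale is a hypothetical helper, and it assumes the
locales named are installed.

```shell
# Hypothetical helper: run a command under one locale and print
# "<locale> <first line of its combined stdout/stderr>".
first_line_in_locale() {
    locale_name=$1; shift
    printf '%s %s\n' "$locale_name" \
        "$(env LC_ALL="$locale_name" "$@" 2>&1 | head -n 1)"
}

# Only attempt the man lookups where man is actually available.
if command -v man >/dev/null 2>&1; then
    for loc in bg_BG.UTF-8 cs_CZ.UTF-8 ru_RU.UTF-8; do
        first_line_in_locale "$loc" man foo
    done
fi
```

 stderr is merged into stdout because the "no such page" message is
normally written to stderr.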

ĸen
-- 

das eine Mal als Tragödie, das andere Mal als Farce


