New personal experimental book [warning: lots of UTF-8 in this]
ken at linuxfromscratch.org
Tue Sep 30 10:10:25 PDT 2008
CC'ing to lfs-dev, if I've remembered to change to a subscribed address.
On Tue, Sep 30, 2008 at 04:54:33PM +0100, Colin Watson wrote:
> On Tue, Sep 30, 2008 at 03:48:50PM +0100, Ken Moffat wrote:
> > On Tue, Sep 30, 2008 at 12:27:09PM +0100, Colin Watson wrote:
> > > I've been looking into the adoption of man-db in various distributions
> > > lately, and ran into your post archived at
> > > http://linuxfromscratch.org/pipermail/lfs-dev/2008-September/061632.html.
> > > I'm not subscribed to lfs-dev so can't easily reply directly there, but
> > > I wanted to reply to one point as I thought it was a bit odd:
> > >
> > > > Long-term, UTF-8 is the only sensible solution for text encoding,
> > > > in the same way that a terminal on an X desktop is the only way to
> > > > read some languages. In my view, packages such as man-db are
> > > > prolonging the pain of the transition by encouraging people to use
> > > > legacy encodings. But, for me as an English speaker the pain is
> > > > minimal. Others may conclude that the pain of conversion to UTF-8
> > > > should be deferred.
> > >
> > > Modern versions of man-db default to expecting UTF-8 for manual page
> > > source (although if they realise that the page is actually encoded in a
> > > legacy encoding then they'll automatically fall back to that), and will
> > > generate whatever is appropriate for the user's locale.
> > I like the sound of that. It's not the way we've been doing things
> > since we switched to man-db in LFS, and we have text (perhaps
> > carried forward in error) saying that man-db can't display UTF-8.
> > See
> > http://www.linuxfromscratch.org/lfs/view/development/chapter06/man-db.html
> > part 6.45.2 in the middle of the page.
> Ah, that definitely used to be true but is false as of man-db 2.5.0
> (though you should really use at least 2.5.1 - 2.5.0 didn't get the
> encoding fallback logic quite right).
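
The fallback described above (expect UTF-8, and drop back to a legacy encoding when the page turns out not to be valid UTF-8) can be sketched roughly like this. It is only an illustration of the idea, not man-db's actual code: the helper name `decode_page` and the default of ISO-8859-1 as the legacy encoding are assumptions made for the example.

```python
from typing import Tuple


def decode_page(raw: bytes, legacy_encoding: str = "iso-8859-1") -> Tuple[str, str]:
    """Decode manual page source, preferring UTF-8.

    Hypothetical helper for illustration; returns (text, encoding_used).
    """
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the page is in the legacy encoding for its language.
        return raw.decode(legacy_encoding), legacy_encoding
```

A real implementation would pick the legacy encoding per language directory rather than using a single default.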
> Since I have the opportunity (and thanks, I hadn't seen that page
> before), it seems worth going through the rest of that page. If I should
> file these as bugs instead, let me know, or feel free to forward this to
> lfs-dev, or whatever.
I suppose one of _us_ ought to file it, once this hits the list.
> The first change is a sed substitution to delete the “/usr/man” and
> “/usr/local/man” lines in the man_db.conf file to prevent redundant
> results when using programs such as whatis:
> Do you make /usr/man and /usr/local/man symlinks? If so, I could detect
> that and skip them automatically.
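
For reference, the effect of that substitution can be sketched as a small filter. This is a rough equivalent under stated assumptions, not the sed command from the book: the function name `drop_redundant_manpaths` is hypothetical, and it assumes each directive in the config file names its manual directory in the last whitespace-separated field.

```python
# Directories that are redundant when they are symlinks to the real man dirs.
REDUNDANT = {"/usr/man", "/usr/local/man"}


def drop_redundant_manpaths(conf_text: str) -> str:
    """Return conf_text without directives that point at a redundant directory.

    Hypothetical helper; assumes the directory is the last field on the line.
    """
    kept = []
    for line in conf_text.splitlines(keepends=True):
        fields = line.split()
        is_directive = bool(fields) and not fields[0].startswith("#")
        if is_directive and fields[-1] in REDUNDANT:
            continue  # drop this line, as the sed 'd' command would
        kept.append(line)
    return "".join(kept)
```

Comment lines and blank lines pass through unchanged, so only active directives are removed.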
> The second change accounts for programs that Man-DB should be able to
> find at runtime, but that haven't been installed yet:
> I made configure options available for these in 2.5.0, so you could use
> '--with-browser=lynx --with-col=col --with-vgrind=vgrind
> --with-grap=grap' instead.
> Prepare Man-DB for compilation:
> I think I already suggested this to somebody else at LFS, but I'd
> recommend that you use --with-db=gdbm rather than the default of
> Berkeley DB (which is something of an awkward beast, and overkill for
> man-db). This will be the default in man-db 2.5.3.
> And, yes, I think you can get rid of the convert-mans business entirely.
> With the exception of a few hopelessly misencoded pages that are really
> lost causes, man-db can pretty much cope with any of the obvious
> candidates for encoding pages in each language now.
> I noticed a comment in there about Norwegian not working, and have fixed
> it for man-db 2.5.3.
> > > In the distributions I'm most directly involved with, namely Debian and
> > > Ubuntu, everything is set up for UTF-8 output by default, and we've
> > > arranged for the packaging tools to automatically convert pages to UTF-8
> > > on installation with the aid of some helper tools I ship with man-db;
> > > while this latter item has only been running for a few months, it won't
> > > be long until we'll be running with UTF-8 across the board. As soon as
> > > groff upstream finishes off Unicode support then we'll use that and the
> > > whole pipeline will be UTF-8, but in the meantime we recode back and
> > > forth behind the scenes and very few people have to notice or care.
> > I'll also take a look at this part, it sounds good. I hope you're not
> > holding your breath for a UTF-8-capable version of groff ;-)
> Oh, certainly not; I've put a lot of effort into not holding my breath
> for that! That said, I'd be entirely happy to make man-db able to use
> groff-utf8 as an option if that's what you guys would prefer.
I haven't yet looked at what you are doing in 2.5.2, or what
versions of groff you are using in Ubuntu and Debian, but I'm fairly
sure most LFS users won't want to use groff-utf8 if it isn't needed.
It's only a temporary hack until groff is fixed.
> > > Is there some misunderstanding here about what man-db is doing? If so,
> > > I'd be happy to explain.
> > Thanks for the offer, I might take you up on it in a few weeks. NB
> > my estimates for how long things will take me are always way out, so
> > that might be next year! Depends on how long I spend beating my
> > head against the various versions of mozilla on ppc64, plus whatever
> > goes wrong when I finally upgrade my desktop to current packages :-(
> I know how it is, don't worry. Building distributions is busy work (in
> both senses) ...
> Colin Watson [cjwatson at debian.org]
the first time as tragedy, the second time as farce