This is the mail archive of the guile@sourceware.cygnus.com mailing list for the Guile project.

I18N (was Re: Doc Tasks)

To: Jim Blandy <jimb at red-bean dot com>
Subject: I18N (was Re: Doc Tasks)
From: Gregg Reynolds <greynolds at greynolds dot com>
Date: Sun, 19 Dec 1999 13:28:36 -0600
CC: guile at sourceware dot cygnus dot com
References: <385A8345.1E25EB6F@greynolds.com> <m3ogbpti9e.fsf@savonarola.red-bean.com>
Reply-To: greynolds at enteract dot com

Jim Blandy wrote:
> 
> >       I18n stuff, since my personal interest in this area is strong.  I'm
> > going over the "mltext" and "mbapi" stuff from the docs directory, but
> > haven't looked at the code itself yet.
> 
> It would be awesome to have someone working on this!  Good luck!
> 
> However, there are some pretty demanding constraints on this work that
> should probably be explained.

Probably even more demanding than you think.  My personal interest is in
Arabic, which is not properly modeled in any encoding I know of,
including Unicode.  If you can model it, you can model anything.

> According to Stallman, the most important immediate application of
> Guile is to replace Emacs's lisp interpreter.  And if you want to

Hmmm.  My own most immediate requirement is tools of any sort that
handle Arabic properly - "properly" being the key word.  It probably
means generalizing things like case-mapping and the addition of a bunch
of metalinguistic codepoints.  In my judgement Unicode is extremely
unlikely to adopt such codepoints; hence for me one requirement is that
I be able to define sets and mappings (e.g. case mappings) explicitly,
and use idiosyncratic encodings.  I won't try to explain what all that
means in an email (it'll be on a webpage one of these days), just wanted
you to know where my head is.  This may mean my real interest is in
extending guile beyond the core, or that I need to convince people that
"core" should mean language-neutral, encoding-agnostic.  Actually emacs
already seems to fit that description pretty well.
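
To make "define sets and mappings explicitly" a little more concrete, here is a minimal sketch in Python (chosen only for brevity).  The CaseMap class and the private code values are hypothetical illustrations, not anything taken from Guile or Emacs; the only point is that the mapping is data supplied by the user and assumes nothing about the underlying encoding.

```python
class CaseMap:
    """A user-defined, explicit mapping between two sets of character codes."""

    def __init__(self, pairs):
        # pairs: iterable of (lower_code, upper_code) integers in whatever
        # encoding the user has chosen; nothing here assumes Unicode.
        self.to_upper = dict(pairs)
        self.to_lower = {upper: lower for lower, upper in self.to_upper.items()}

    def upcase(self, codes):
        # Codes outside the mapping pass through unchanged.
        return [self.to_upper.get(c, c) for c in codes]

    def downcase(self, codes):
        return [self.to_lower.get(c, c) for c in codes]


# Hypothetical private encoding: codes 1..3 are "lower", 101..103 are "upper".
private_map = CaseMap([(1, 101), (2, 102), (3, 103)])
text = [1, 2, 50, 3]
upper = private_map.upcase(text)
```

The same shape would presumably serve for other explicit mappings (for instance between positional letter forms), since the table is just data.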

> implement Emacs Lisp strings as Guile strings (which seems like the
> most reasonable approach), then you need to choose a representation
> for multi-lingual text in Guile strings which harmonizes well with
> Emacs.

I'm not familiar with Emacs internals; is there someplace I can find
some (brief!) documentation?  I know the Mule folks are working on this
sort of thing in conjunction with the Omega people.

> The way the winds are blowing, it looks like Unicode is eventually
> going to take over the world.  GTk, for example, uses UTF-8.  However,

Too bad.  I'm ambivalent about Unicode.  My strong suspicion is that it
will end up having been a Very Bad Thing for many peoples of the world. 
I think Guile (all software actually) should accept it as a transfer
encoding, but should aim considerably higher as far as text
intelligence.  Also I would caution against equating Unicode and
UTF-nnn.  Not the same.  Unfortunately the Unicoders seem disinclined to
correct some of the stupider parts of the standard, like mixing semantic
encoding (char-->int) and bit packaging (int-->seq of bits).  Your paper
gets it right.

> Emacs currently uses its own encoding for multilingual text, called
> Emacs-Mule.  Emacs-Mule is completely different from UTF-8.  It's not
> even possible to convert Emacs-Mule text to UTF-8 and back without
> losing information.  However, Emacs is going to switch to UTF-8; the
> folks doing the multilingual work expect it to take at least a year,
> and perhaps longer.
> 
> So how can Guile prepare itself for the future (UTF-8) while still
> meeting its immediate obligations (GNU Emacs)?
> 
> The key observation here is that, while UTF-8 and Emacs-Mule are very
> different, actually, for all the properties you really care about when
> writing C code, they are the same.  The differences can all be hidden
> away happily in tables and converters you'd need anyway.
> 
> So, what I'm trying to present in mbapi.texi is an interface which you
> can use to handle Guile text.  By an interface, I don't just mean
> functions and types, but also a set of *valid assumptions* you can
> make when you write your code.  It's a set of rules.  And if you
> follow these rules, your code will work, without modification, on
> either Emacs-Mule or UTF-8 text.  You just need to recompile.
> 
> I don't know whether mbapi.texi does an adequate job of explaining
> these issues, and making the strategy clear.  I hope it does.

I think so.  I use slightly different language (e.g. byte=seq of bits,
charcode=int, etc.), but the basic ideas look correct.  As a
documentation strategy, though, I think it would make sense to write a
separate article discussing encodings, definitions of chars, etc., and
then make the API reference doc more concise.  The general discussion
space for chars, encodings, etc. is now very heavily polluted, IMHO;
even the efforts to clarify and standardize (esp. Unicode and W3C stuff)
have only served to stir the mud.  I think one must work it over in
pretty fine detail.  And, in case it's not clear, I would be adamantly
opposed to adopting Unicode's metalanguage, which I believe to be
fundamentally broken.

I've got some ideas along these lines; will try to post asap.

> So my big plan was: use the interface described in mbapi.texi as both
> the external interface that Guile's clients need to conform to, and
> also internally, in Guile's code itself.  At first, Guile would use
> Emacs-Mule, and Guile could be happily integrated into Emacs.  Then,
> when Emacs switched to UTF-8, Guile would too.  And, with adequate
> attention to the interface, this switch would be painless for almost
> all Guile clients.
> 
I think something like this is definitely the way to go.  A small
component that provides character/text services, and hides the data
format.  Clients just have to agree to abide by the protocol.  The
linguistic "character" semantics of "char" go away.  (Which reminds me,
I think it is good practice to use something like #define BYTE unsigned
char where those are the intended semantics, but I guess that's a topic
for a different thread.)
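
A rough sketch of the "small component that hides the data format" idea, again in Python.  The Text class and count_code function are hypothetical names, not the mbapi interface, and UTF-8 and UTF-16 merely stand in for the two internal formats, since Python ships no Emacs-Mule codec.

```python
class Text:
    """Opaque text object; clients never touch the stored bytes directly."""

    def __init__(self, chars, codec="utf-8"):
        self._codec = codec                 # internal format, free to change
        self._data = chars.encode(codec)    # stored as a sequence of bytes

    def length(self):
        # Number of characters, not number of bytes.
        return len(self._data.decode(self._codec))

    def char_at(self, index):
        # Character code (an int) at a character index.
        return ord(self._data.decode(self._codec)[index])

    def byte_size(self):
        # Size of the internal representation; differs between codecs.
        return len(self._data)


def count_code(text, code):
    # Client code written only against the protocol above.
    return sum(1 for i in range(text.length()) if text.char_at(i) == code)


a = Text("\u0628a\u0628", codec="utf-8")
b = Text("\u0628a\u0628", codec="utf-16-be")
```

Here count_code gives the same answer for a and b even though their stored bytes differ, which is the property the protocol is meant to guarantee.  Decoding on every call is wasteful, of course; a real implementation would scan the bytes directly.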

> Of course, it should go without saying that Maciej is the one you need
> to please, not me.  He decides whether your changes go in or not.  But
> since Guile is a GNU project, Maciej is answerable to Stallman, and
> Stallman sees Emacs as the driving application for Guile.

Ok.  I guess maybe I'm not that far from such a position;  what
interests me about guile is the prospect of using it in an editor with
native support for structured editing (i.e. it understands *ML-style
languages) and for any language.  I guess that would be emacs in another
incarnation.

cheers,

-gregg

References:
- Doc Tasks
  - From: Gregg Reynolds
- Re: Doc Tasks
  - From: Jim Blandy
