This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: i18n; wide characters; Guile



>The answer can only be UCS4.  It's no surprise that all reasonable
>i18n developers (this excludes those at IBM) use a 32bit type for
>wchar_t.

*ugh*

Can you be more specific about why 32 bits are needed?  Which
character sets does Unicode not accommodate?  Or is that the wrong
question for me to ask?

>This may sound like a big waste of space but if used correctly it
>isn't.  Normally strings are not meant to contain whole text books but
>instead are rather short.  This means there is not that much
>redundancy.  If you need to store large texts you can still fall back
>on a multibyte encoding, perhaps offer several of them so that the
>most effective can be chosen.

This argument is not entirely reassuring to me.  If one thinks mostly
about processing text streams, sure, this is fine.  However, I am more
interested in interactive applications like Emacs, and related things
with wider audiences.  In such applications there are no clear
boundaries at which it is convenient to convert between a dense form,
like UTF-8, and a sparse but consistent form, like UCS2.  An Emacs
buffer must hold large amounts of text, and must also serve as the
operand to editing and searching commands.  It is terribly clumsy to
use a variable-length encoding in buffers.  Since the buffer
representation must be the foundation of all other i18n support, it's
important to get it right.  Doubling the text storage required, as
UCS2 does, isn't so unreasonable; quadrupling it, as UCS4 does, is.
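
To make the trade-off concrete, here is a rough sketch in Python (not
Guile or Emacs code; the names storage_bytes and utf8_char_offset are
made up for illustration).  It assumes mostly-ASCII text, and it uses
Python's "utf-16-le" and "utf-32-le" codecs as stand-ins for UCS2 and
UCS4, which is only accurate for text inside the 16-bit range.  The
second function shows why a variable-length buffer is clumsy: finding
the Nth character means scanning from the start.

```python
def storage_bytes(text):
    """Bytes needed to hold text in one dense and two fixed-width forms."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Stand-in for UCS2: identical for text within the 16-bit range.
        "ucs-2": len(text.encode("utf-16-le")),
        # Stand-in for UCS4: always four bytes per character.
        "ucs-4": len(text.encode("utf-32-le")),
    }


def utf8_char_offset(buf, index):
    """Byte offset of character number `index` in UTF-8 bytes (linear scan)."""
    count = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 10xxxxxx; anything else starts a character.
        if byte & 0xC0 != 0x80:
            if count == index:
                return offset
            count += 1
    raise IndexError(index)


# Hypothetical ASCII-only buffer contents, 12000 characters long.
sizes = storage_bytes("buffer text " * 1000)
offset_of_b = utf8_char_offset("aéb".encode("utf-8"), 2)
```

For ASCII-only text the fixed-width forms cost exactly two and four
times the UTF-8 size; with a fixed-width form the offset lookup is a
single multiplication instead of a scan.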