This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Re: locale differences to Li18nux.org locales

To: libc-alpha at sources dot redhat dot com
Subject: Re: locale differences to Li18nux.org locales
From: Bruno Haible <haible at ilog dot fr>
Date: Tue, 6 Feb 2001 22:58:38 +0100 (CET)
Cc: martin dot strassburger at sap dot com
References: <FAFE609CB754D311B60C0008C75D355607D03269@dbwdfx14.wdf.sap-ag.de>

Martin Strassburger wrote:
> Last week I downloaded the package universal locales that was offered at 
> www.li18nux.org.

The download location at li18nux doesn't work for me, so I downloaded
the package from IBM instead:
http://oss.software.ibm.com/developerworks/opensource/locale?open&l=linuxlst04,t=gr,p=Unicode

> <U00A0> NO-BREAK SPACE  as space,

This is wrong. The isspace/iswspace function is often used for
line-breaking purposes, and the Unicode 3.0 book says on p. 149
"U+0020 and U+00A0 behave differently for line breaking."

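To illustrate why this matters, here is a minimal sketch of a naive line wrapper that looks for break opportunities with iswspace(). The helper name last_break_at_or_before and the sample string are hypothetical and not taken from glibc or from the Li18nux locales. As long as the locale does not classify U+00A0 as "space", the helper never proposes a break between "10" and "km".

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return the index of the last character at or
   before `limit` that iswspace() accepts, or -1 if there is none.
   A naive line wrapper would break the line at that index. */
static long last_break_at_or_before(const wchar_t *s, size_t limit)
{
    long best = -1;
    for (size_t i = 0; s[i] != L'\0' && i <= limit; i++) {
        if (iswspace((wint_t)s[i]))
            best = (long)i;
    }
    return best;
}

/* "10<NBSP>km away": the NO-BREAK SPACE at index 2 is meant to keep
   "10" and "km" on the same line; the ordinary space is at index 5. */
static const wchar_t sample[] = {
    L'1', L'0', 0x00A0, L'k', L'm', L' ', L'a', L'w', L'a', L'y', L'\0'
};
```

If a locale put U+00A0 into class "space", the same helper would report index 2 as a break opportunity, which is exactly the break the character is meant to prevent.
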
> <U064B> ARABIC FATHATAN,
> <U064C> ARABIC DAMMATAN,
> <U064D> ARABIC KASRATAN,
> <U064E> ARABIC FATHA,
> <U064F> ARABIC DAMMA,
> <U0650> ARABIC KASRA,
> <U0651> ARABIC SHADDA,
> <U0652> ARABIC SUKUN as alpha

These are combining characters. What's the purpose of putting them in
category "alpha"? Reasonable programs would apply isalpha() only to
non-combining characters. Otherwise you would need to make all of
U+0300..U+030C alpha as well.
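
A minimal sketch of that approach, assuming the program filters out combining marks itself before asking iswalpha(). The helper names and the hard-coded ranges are illustrative assumptions only; a real program would consult full Unicode character data rather than two ranges.

```c
#include <wchar.h>
#include <wctype.h>

/* Assumption: hard-coded ranges, for illustration only. */
static int is_combining_mark(wint_t wc)
{
    return (wc >= 0x0300 && wc <= 0x036F)   /* Combining Diacritical Marks block */
        || (wc >= 0x064B && wc <= 0x0652);  /* ARABIC FATHATAN .. ARABIC SUKUN */
}

/* Hypothetical helper: apply iswalpha() only to non-combining characters. */
static int is_base_letter(wint_t wc)
{
    if (is_combining_mark(wc))
        return 0;
    return iswalpha(wc) != 0;
}
```

With this structure the result for a combining mark no longer depends on whether the locale lists it under "alpha".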

> <U0390> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS,
> <U03B0> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS  as lowercase
> alpha

They have no uppercase equivalent. But it might make sense to classify
them as lowercase nevertheless, as is done with U+00DF. I'll
consider a patch for this.
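
The consequence for programs can be sketched as follows: a character being lowercase does not imply that towupper() maps it to something else. The helper name has_distinct_upper is hypothetical, and whether iswlower() accepts these characters depends on the locale in use.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: nonzero if towupper() maps wc to a different
   character. For characters without an uppercase equivalent, such as
   U+00DF, towupper() returns its argument unchanged. */
static int has_distinct_upper(wint_t wc)
{
    return towupper(wc) != wc;
}
```

Code that assumes iswlower(wc) implies towupper(wc) != wc would already be wrong for U+00DF, so adding U+0390 and U+03B0 to class "lower" would not introduce a new kind of exception.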

> <U200E> LEFT-TO-RIGHT MARK,
> <U200F> RIGHT-TO-LEFT MARK as control

Control characters are automatically non-printing according to
POSIX. This means that wcswidth() of any string containing these two
characters would return -1, causing lots of problems. Also, I don't
understand why U+200C and U+200D shouldn't then be considered
"control" too: they are in category "Cf" as well.

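The effect can be sketched with a simplified stand-in for wcswidth(). This is not the glibc implementation: the function name columns_or_error is hypothetical, and it assumes one column per printable character, whereas the real wcswidth() sums per-character widths. What it shares with wcswidth() is that a single non-printing character turns the result for the whole string into -1.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Simplified stand-in for wcswidth(): assumes every printable
   character occupies one column. Returns -1 as soon as a
   non-printing character is found. */
static int columns_or_error(const wchar_t *s)
{
    int cols = 0;
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if (!iswprint((wint_t)s[i]))
            return -1;          /* one control character spoils the string */
        cols += 1;
    }
    return cols;
}
```

A TAB, which is a control character in the POSIX locale, can serve as a stand-in when trying this out: columns_or_error(L"a\tb") fails where columns_or_error(L"ab") does not. A locale that classified U+200E and U+200F as "control" would make any text containing them fail in the same way.
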
Bruno

