This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.
[Bug localedata/12051] New: CEO has confusing differences across locales
- From: "bonzini at gnu dot org" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sources dot redhat dot com
- Date: 24 Sep 2010 12:47:47 -0000
- Subject: [Bug localedata/12051] New: CEO has confusing differences across locales
- Reply-to: sourceware-bugzilla at sourceware dot org
According to POSIX 2008, older editions of POSIX required range expressions
to be interpreted in CEO (collating element order) for all locales. POSIX
mentions some disadvantages of CEO, but it omits one in particular, which glibc
exhibits: even when only ASCII characters and a single implementation are
considered, the behavior with respect to case varies across locales. In some
locales, "[a-e]" may match one of 'A' or 'E', while in others it matches neither.
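One way to observe this is to compile the same range expression under several
locales and test single characters against it. The following is only a minimal
sketch: range_matches is a hypothetical helper (not a glibc interface), and any
locale name passed to it, such as "cs_CZ.UTF-8" or "en_US.UTF-8", is an
assumption about what is installed on the test system. Results will depend on
the glibc version and locale data in use.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches `pattern` under `locale_name`, 0 if it does
   not, and -1 if the locale cannot be selected or the pattern does not
   compile.  Hypothetical helper for illustration only. */
int range_matches(const char *locale_name, const char *pattern,
                  const char *text)
{
    assert(locale_name != NULL && pattern != NULL && text != NULL);

    /* LC_ALL sets LC_COLLATE (range order) together with LC_CTYPE. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0) {
        setlocale(LC_ALL, "C");
        return -1;
    }

    int matched = (regexec(&re, text, 0, NULL, 0) == 0);

    regfree(&re);
    setlocale(LC_ALL, "C");   /* restore a known locale before returning */
    return matched;
}
```

Calling range_matches(name, "[a-e]", "A") and range_matches(name, "[a-e]", "E")
for each locale of interest, and comparing the answers, shows whether that
locale follows the interleaved model or the one from iso14651_t1_common.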
CEO in glibc is inconsistent for these locales:
ar_SA cs_CZ hr_HR hsb_DE is_IS km_KH lo_LA lt_LT lv_LV or_IN pl_PL sk_SK
sl_SI th_TH tr_CY tr_TR
which are the only ones following this model (from cs_CZ):
<U0041> <U0041>;<NONE>;<CAPITAL>;<U0041> # A
<U0061> <U0041>;<NONE>;<SMALL>;<U0041> # a
<U00AA> <U0041>;<NONE>;<U00AA>;<U0041> # ª
<U00C1> <U0041>;<ACUTE>;<CAPITAL>;<U0041> # Á
<U00E1> <U0041>;<ACUTE>;<SMALL>;<U0041> # á
...
<U005A> <U005A>;<NONE>;<CAPITAL>;<U005A> # Z
<U007A> <U005A>;<NONE>;<SMALL>;<U005A> # z
rather than the one in localedata/locales/iso14651_t1_common:
<U0061> <a>;<BAS>;<MIN>;IGNORE # 198 a start lowercase
<U00AA> <a>;<PCL>;<EMI>;IGNORE # 199 ª
<U00E1> <a>;<ACA>;<MIN>;IGNORE # 200 á
...
<U007A> <z>;<BAS>;<MIN>;IGNORE # 507 z
...
<U00FE> <th>;<BAS>;<MIN>;IGNORE # 516 þ end lowercase
<U0041> <a>;<BAS>;<CAP>;IGNORE # 517 A start uppercase
<U00C1> <a>;<ACA>;<CAP>;IGNORE # 518 Á
...
<U005A> <z>;<BAS>;<CAP>;IGNORE # 813 Z
...
<U00DE> <th>;<BAS>;<CAP>;IGNORE # 824 Þ end uppercase
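The effect of the two layouts on a range such as "[a-e]" can be sketched with a
deliberately simplified model: treat a collating sequence as a string of
characters in collating element order, and say that a range matches a character
when it lies between the two endpoints in that string. This ignores accented
letters and the multi-level weights shown above, and it is not how glibc
implements ranges; ceo_in_range, INTERLEAVED and CASE_BLOCKS are hypothetical
names chosen for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Simplified collating sequences, restricted to a few ASCII letters. */
static const char INTERLEAVED[] = "AaBbCcDdEeFf"; /* cs_CZ-style: A a B b ... */
static const char CASE_BLOCKS[] = "abcdefABCDEF"; /* iso14651_t1_common-style:
                                                     lowercase, then uppercase */

/* Returns 1 if c lies between lo and hi (inclusive) in `sequence`, else 0.
   Characters absent from the sequence never match. */
int ceo_in_range(const char *sequence, char lo, char hi, char c)
{
    assert(sequence != NULL && lo != '\0' && hi != '\0');

    if (c == '\0')
        return 0;   /* strchr would otherwise find the terminator */

    const char *plo = strchr(sequence, lo);
    const char *phi = strchr(sequence, hi);
    const char *pc  = strchr(sequence, c);

    if (plo == NULL || phi == NULL || pc == NULL)
        return 0;

    /* All three pointers point into the same array, so they are comparable. */
    return plo <= pc && pc <= phi;
}
```

Under this model, ceo_in_range(INTERLEAVED, 'a', 'e', 'E') is 1 while
ceo_in_range(INTERLEAVED, 'a', 'e', 'A') is 0, because 'A' sorts just before
'a' and 'E' just before 'e'. With CASE_BLOCKS both calls return 0, since every
uppercase letter sorts after the whole lowercase block.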
As an aside, the CEO requirement was specifically relaxed in POSIX 2001, so
glibc is insisting on CEO because of a version of POSIX from two editions ago,
and it does so without documenting it. At the same time, other glibc interfaces
no longer comply with stricter requirements of older POSIX editions that have
since been relaxed (for example, whether getopt() must include an error message
with "illegal" in the string). So, there is no reason to tie regex to the older
standard's CEO requirement.
--
Summary: CEO has confusing differences across locales
Product: glibc
Version: 2.12
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: libc-locales at sources dot redhat dot com
ReportedBy: bonzini at gnu dot org
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=12051