Bug 1354 - Problem in Turkish Locale Data (tr_TR.UTF-8 , Unicode)
Summary: Problem in Turkish Locale Data (tr_TR.UTF-8 , Unicode)
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.3.4
Importance: P2 normal
Target Milestone: ---
Assignee: GNU C Library Locale Maintainers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-19 08:25 UTC by Devrim GUNDUZ
Modified: 2018-04-19 14:35 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Devrim GUNDUZ 2005-09-19 08:25:32 UTC
Hi,

We are experiencing problems with the Turkish locale.

$ cat /etc/redhat-release
Red Hat Enterprise Linux ES release 4 (Nahant Update 1)
$ rpm -qv glibc
glibc-2.3.4-2.9

Here is a short description of our problem:

==========================================================
test=# SELECT * from unicode_test WHERE a ILIKE 'ö%';
 a
----
 ös
(1 row)

test=# SELECT * from unicode_test WHERE a ILIKE 'Ö%';
 a
---
(0 rows)

test=#
==========================================================

Now the details:

In Turkish, we have a special letter "ö" (o with two dots above it). Its capital
is "Ö" (capital O with two dots above it). The same applies to i and İ, and to ı and I.

I'm using PostgreSQL 8.0.3. PostgreSQL relies on the operating system for string
operations, so if something is wrong in glibc, PostgreSQL also fails in
that locale.

As you can see in the short description, ILIKE (case-insensitive LIKE) does not
find the correct result. The same problem occurs with i and dotless ı (whose
capitals are İ and I, respectively). Both queries should return the same result.

I hope this can be fixed in the next release of glibc.
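For comparison, the following is a minimal sketch of how an application could implement an ILIKE-style prefix match on top of the C library: decode both strings with mbstowcs and compare whole characters with towlower, instead of folding bytes. The helper name ci_prefix_match is hypothetical; this is not PostgreSQL's ILIKE code, and a one-to-one towlower comparison is simple case mapping, not full Unicode case folding. It assumes the caller has already selected a UTF-8 locale with setlocale(LC_CTYPE, ...).

```c
#include <assert.h>
#include <locale.h>  /* setlocale(), which callers must use before this helper */
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if `text` starts with `prefix` when case is
   ignored, 0 if not, and -1 if a string is invalid in the current LC_CTYPE
   encoding or memory runs out. */
int ci_prefix_match(const char *text, const char *prefix)
{
    /* Sizing pass: number of characters (not bytes) in each string. */
    size_t tn = mbstowcs(NULL, text, 0);
    size_t pn = mbstowcs(NULL, prefix, 0);
    if (tn == (size_t)-1 || pn == (size_t)-1)
        return -1;
    if (pn > tn)
        return 0;

    wchar_t *wt = malloc((tn + 1) * sizeof *wt);
    wchar_t *wp = malloc((pn + 1) * sizeof *wp);
    if (wt == NULL || wp == NULL) {
        free(wt);
        free(wp);
        return -1;
    }
    size_t got_t = mbstowcs(wt, text, tn + 1);
    size_t got_p = mbstowcs(wp, prefix, pn + 1);
    /* Same input and locale as the sizing pass, so the lengths must agree. */
    assert(got_t == tn && got_p == pn);
    (void)got_t;
    (void)got_p;

    int match = 1;
    for (size_t i = 0; i < pn; i++) {
        /* Fold whole characters, not bytes: in UTF-8, ö and Ö are two bytes
           each, so per-byte ASCII folding never maps one to the other. */
        if (towlower((wint_t)wt[i]) != towlower((wint_t)wp[i])) {
            match = 0;
            break;
        }
    }
    free(wt);
    free(wp);
    return match;
}
```

With setlocale(LC_CTYPE, "tr_TR.UTF-8") in effect, ci_prefix_match("ös", "Ö") should return 1, the same result as the lowercase pattern.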
Comment 1 Jakub Jelinek 2005-09-19 08:38:28 UTC
This is very likely an application bug (PostgreSQL in this case); you need
to sort it out there, not here. Even if it were a glibc bug
(unlikely; e.g. towlower/towupper etc. are known to work just fine
with Turkish dotless ı/I and dotted i/İ), this would still be the wrong
kind of bug report. For a bug report here, you need to provide a self-contained
testcase that uses just glibc and shows the bug, or show, say with ltrace,
which calls return incorrect values. Otherwise everybody could claim something
is a glibc bug and we'd have to debug all application bugs just in case
they might be glibc bugs.
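A glibc-only testcase of the kind requested above might look like the following minimal sketch. It assumes the tr_TR.UTF-8 locale is installed (for example generated with localedef); the helper name count_case_mismatches and the list of test pairs are illustrative, not part of glibc. It calls only setlocale, towupper and towlower, and prints any mapping that differs from the expected Turkish pair.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* One expected lowercase/uppercase pair in the Turkish locale. */
struct case_pair {
    wint_t lower;
    wint_t upper;
    const char *name;
};

static const struct case_pair turkish_pairs[] = {
    { 0x00F6, 0x00D6, "o with diaeresis" }, /* ö <-> Ö */
    { 0x0069, 0x0130, "dotted i" },         /* i <-> İ (Turkish-specific) */
    { 0x0131, 0x0049, "dotless i" },        /* ı <-> I (Turkish-specific) */
};

/* Hypothetical helper: returns -1 if the locale cannot be loaded, otherwise
   the number of pairs whose towupper/towlower results are not as expected. */
int count_case_mismatches(const char *locale_name)
{
    /* Passing NULL to setlocale would only query the locale, not set it. */
    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int bad = 0;
    for (size_t k = 0; k < sizeof turkish_pairs / sizeof turkish_pairs[0]; k++) {
        const struct case_pair *p = &turkish_pairs[k];
        wint_t up = towupper(p->lower);
        wint_t lo = towlower(p->upper);
        if (up != p->upper || lo != p->lower) {
            printf("%s: towupper(U+%04X)=U+%04X, towlower(U+%04X)=U+%04X\n",
                   p->name, (unsigned)p->lower, (unsigned)up,
                   (unsigned)p->upper, (unsigned)lo);
            bad++;
        }
    }
    return bad;
}
```

A positive return value under tr_TR.UTF-8 would be the kind of evidence a glibc bug report needs; -1 only means the locale is not available on that system.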

Many applications don't handle the Turkish i/I well in case-insensitive
comparisons, because the UTF-8 representation is one byte for one case and
two bytes for the other (one is ASCII, while the other is not).
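The byte-length asymmetry described above can be checked without any locale. The sketch below uses a small hand-written UTF-8 encoder (utf8_encode, a hypothetical helper written for this illustration, with no surrogate checks) to show that I and i take one byte while ı and İ take two, and a byte-wise ASCII lowercasing routine (naive_ascii_lower, also hypothetical) standing in for byte-oriented application code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal UTF-8 encoder for one code point (no surrogate
   checks). Writes up to 4 bytes into out and returns the byte count. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical byte-at-a-time lowercasing that only knows ASCII A-Z,
   independent of the current locale. Non-ASCII bytes are left unchanged. */
void naive_ascii_lower(char *s)
{
    for (; *s != '\0'; s++) {
        if (*s >= 'A' && *s <= 'Z')
            *s = (char)(*s - 'A' + 'a');
    }
}
```

Encoding U+0049 (I) and U+0069 (i) yields one byte each, while U+0131 (ı) yields C4 B1 and U+0130 (İ) yields C4 B0. Running naive_ascii_lower over the bytes C3 96 (Ö) leaves them unchanged instead of producing C3 B6 (ö), and it turns "I" into "i" rather than the Turkish "ı". So code that folds case byte by byte, or assumes the folded string has the same length as the original, can get both pairs wrong.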