Bug 16527 - strxfrm & strcoll broken with Hangul & en_US.UTF-8
Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.18
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-04 21:13 UTC by ju.orth+sourceware
Modified: 2017-10-21 08:26 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description ju.orth+sourceware 2014-02-04 21:13:45 UTC
Consider this program:

=============
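#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>

/* Print the strxfrm() sort key of a, byte by byte, including the trailing NUL. */
void ps(const char *a)
{
	size_t s, i;
	unsigned char *b;

	/* With a zero-length buffer, strxfrm() only reports the required key length. */
	s = strxfrm(NULL, a, 0);
	b = malloc(s + 1);
	if (b == NULL)
		return;
	strxfrm((char *)b, a, s + 1);
	for (i = 0; i <= s; i++)
		printf("%u ", (unsigned)b[i]);
	printf("\n");
	free(b);
}

int main(void)
{
	/* Default "C" locale: the key is just the UTF-8 bytes of the string. */
	ps("퍼");
	ps("흐");

	/* Switch to the collation rules of the locale set in the environment. */
	setlocale(LC_COLLATE, "");

	ps("퍼");
	ps("흐");
	return 0;
}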
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0 
237 157 144 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
=============

The output after setlocale(LC_COLLATE, "") is nonsensical: the two different syllables are transformed to identical keys, so they can no longer be told apart when sorting. The same useless output is generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and jp_JP.UTF-8. ko_KR.UTF-8 seems to be the only working locale.

This can be circumvented by adding the following code to iso14651_t1:

=============
script <HANGUL>

order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============

Right below a very similar workaround...
Comment 1 Egmont Koblinger 2015-09-06 22:22:58 UTC
Please see bug 18927 as well.