Consider this program:

=============
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>

/* Print the strxfrm() sort key of a, byte by byte, including the NUL. */
void ps(const char *a)
{
  size_t s, i;
  unsigned char *b;

  s = strxfrm(NULL, a, 0);
  b = malloc(s + 1);
  if (b == NULL)
    return;
  strxfrm((char *)b, a, s + 1);
  for (i = 0; i <= s; i++)
    printf("%u ", (unsigned)b[i]);
  printf("\n");
  free(b);
}

int main(void)
{
  /* "C" locale */
  ps("퍼");
  ps("흐");
  /* locale taken from the environment */
  setlocale(LC_COLLATE, "");
  ps("퍼");
  ps("흐");
  return 0;
}
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0
237 157 144 0
1 1 1 1 194 182 1 194 182 1 194 182 0
1 1 1 1 194 182 1 194 182 1 194 182 0
=============

The output after setlocale(LC_COLLATE, "") is completely nonsensical: two distinct Hangul syllables transform to the same sort key. Similar useless output is generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and ja_JP.UTF-8. ko_KR.UTF-8 seems to be the only working locale.

This can be circumvented by adding the following code to iso14651_t1:

=============
script <HANGUL>
order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============

Right below a very similar workaround...
Please see bug 18927 as well.
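
To check other character pairs or locales without comparing the byte dumps by eye, something like the following sketch might help. The helper names sort_key and same_sort_key are made up for this illustration and are not part of glibc; the code uses only standard strxfrm, strcmp, malloc and free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated strxfrm() sort key for s
   under the current LC_COLLATE locale, or NULL if allocation fails.
   The caller frees the result. */
char *sort_key(const char *s)
{
  size_t n = strxfrm(NULL, s, 0);   /* key length, excluding the NUL */
  char *key = malloc(n + 1);

  if (key != NULL)
    strxfrm(key, s, n + 1);
  return key;
}

/* Hypothetical helper: return 1 if a and b transform to identical keys,
   0 if the keys differ, -1 if allocation fails. */
int same_sort_key(const char *a, const char *b)
{
  char *ka = sort_key(a);
  char *kb = sort_key(b);
  int result = -1;

  if (ka != NULL && kb != NULL)
    result = (strcmp(ka, kb) == 0);
  free(ka);
  free(kb);
  return result;
}
```

In the "C" locale same_sort_key("퍼", "흐") returns 0, since the keys are just the differing UTF-8 bytes. Judging from the output above, it would presumably return 1 after setlocale(LC_COLLATE, "") on an affected system.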