http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html says:

opengroup> Collation Order
opengroup>
opengroup> [...]
opengroup>
opengroup> The symbol UNDEFINED shall be interpreted as including all
opengroup> coded character set values not specified explicitly or via
opengroup> the ellipsis symbol. Such characters shall be inserted in
opengroup> the character collation order at the point indicated by the
opengroup> symbol, and in ascending order according to their coded
opengroup> character set values. If no UNDEFINED symbol is specified,
opengroup> and the current coded character set contains characters not
opengroup> specified in this section, the utility shall issue a
opengroup> warning message and place such characters at the end of the
opengroup> character collation order.

Unfortunately, it does not work like that in glibc. For example, the
Japanese locale source file /usr/share/i18n/locales/ja_JP has this in
the LC_COLLATE section:

mfabian@ari:/usr/share/i18n/locales $ grep -A 8 ^LC_COLLATE ja_JP
LC_COLLATE
order_start forward
%
% C0
%
<U0000>
<U0001>
<U0002>
<U0003>
mfabian@ari:/usr/share/i18n/locales $ grep -B 8 '^END LC_COLLATE' ja_JP
<U9F97>
<U9F9E>
<U9FA1>
<U9FA2>
<U9FA3>
<U9FA5>
UNDEFINED
order_end
END LC_COLLATE
mfabian@ari:/usr/share/i18n/locales $

I.e. it includes the “UNDEFINED” collation symbol at the end. Now if I
choose a character which is *not* specified in the LC_COLLATE section,
neither explicitly nor via the ellipsis, for example:

⅞ U+215E VULGAR FRACTION SEVEN EIGHTHS

and check how it sorts, I find:

mfabian@ari:~/testdir $ LANG=ja_JP.UTF-8 ls
⅞ A B C D O U Z a b c d o u z Þ æ đ ı ß İ ä ö ü
mfabian@ari:~/testdir $

I.e. it sorts at the beginning, not at the end. (The other non-ASCII
characters in that sort example *are* explicitly specified in the sort
order; that’s why they appear after “z”, which is where they are
specified.)
To test this further, I created my own variant of
/usr/share/i18n/locales/POSIX by modifying its LC_COLLATE section so
that “B” is no longer specified:

LC_COLLATE
# This is the POSIX Locale definition for the LC_COLLATE category.
# The order is the same as in the ASCII code set.
order_start forward
<U0000>
<U0001>
(normal stuff here)
(modified part follows:)
<U0040>     <- @
<U0044>     <- D (moved here to make sure I am really using my modified locale)
<U0041>     <- A
<U0043>     <- C
UNDEFINED   <- B is *not* specified any more! Therefore it should go here!
<U0045>     <- E
<U0046>     <- F
(more normal stuff here)
<U007E>
<U007F>
order_end
#
END LC_COLLATE

And when testing this (I installed this modified POSIX locale using
localedef under the name "POSIXMIKE"):

mfabian@ari:~/testdir $ LANG=POSIXMIKE ls
B ?? ?? ?? ?? ?? ?? ?? ?? ?? ??? D A C O U Z a b c d o u z
mfabian@ari:~/testdir $

So the now unspecified “B” is sorted at the beginning and *not* after
“C”, where the “UNDEFINED” collation symbol is.
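To make the expected result concrete, here is a minimal Python sketch of the POSIX rule quoted above, applied to a tiny character set. It is only a model under simplifying assumptions: one collation level, single characters, no ellipsis, no weights, and it does not reflect how glibc's localedef is implemented. The names posix_collation_order and sort_by_order are hypothetical helpers made up for this illustration.

```python
import warnings

UNDEFINED = "UNDEFINED"  # marker standing in for the UNDEFINED collation symbol


def posix_collation_order(listed, charset):
    """Expand a simplified LC_COLLATE list into a full collation order.

    listed:  characters in the order they appear between order_start and
             order_end, with UNDEFINED marking the insertion point.
    charset: every character of the (here: tiny) coded character set.
    """
    explicit = {c for c in listed if c != UNDEFINED}
    # Characters not listed, in ascending coded character set value
    unlisted = sorted((c for c in charset if c not in explicit), key=ord)

    order = []
    for entry in listed:
        if entry == UNDEFINED:
            order.extend(unlisted)  # insert them at the UNDEFINED point
        else:
            order.append(entry)

    if UNDEFINED not in listed and unlisted:
        # No UNDEFINED symbol: warn and place them at the end
        warnings.warn(f"characters not specified in LC_COLLATE: {unlisted}")
        order.extend(unlisted)
    return order


def sort_by_order(names, order):
    """Sort single-character names by their position in `order`."""
    rank = {c: i for i, c in enumerate(order)}
    return sorted(names, key=lambda n: rank[n])


# Fragment of the modified POSIXMIKE section: @ D A C UNDEFINED E F
order = posix_collation_order(["@", "D", "A", "C", UNDEFINED, "E", "F"], "@ABCDEF")
print(order)                                       # ['@', 'D', 'A', 'C', 'B', 'E', 'F']
print(sort_by_order(["A", "B", "C", "D"], order))  # ['D', 'A', 'C', 'B']

# ja_JP-like case: UNDEFINED is last, so an unlisted character sorts last
print(posix_collation_order(["a", "z", UNDEFINED], "az⅞"))  # ['a', 'z', '⅞']
```

Under this reading of the standard, "B" should sort after "C" (D A C B), and "⅞" should sort after everything explicitly listed in ja_JP; glibc instead puts both at the beginning.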