This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improved check-localedef script


Luis Javier Merino <ninjalj@gmail.com> wrote:

>> This is the first abmon string:
>>
>>     abmon	"جنوری";/
>>
>> The last letter in this string, ی U+06CC ARABIC LETTER FARSI YEH
>> is not convertible to CP1256.
>>
>> But this letter really does seem to be used for writing Urdu, see:
>>
>>     https://en.wikipedia.org/wiki/Urdu_alphabet
>>     https://en.wikipedia.org/wiki/Urdu_alphabet#Ye
>>
>> So I think CP1256 is not a suitable charset to use for Urdu.
>
>
> Note that there is a transliteration rule for that letter:
>
> translit_start
> include "translit_combining";""
>
> % those two letters are not in cp1256...
>
> % Maddah above -> Alef with madda above
> <U0653> "<U0622>"
> % Farsi yeh -> yeh
> <U06CC> "<U064A>"
>
> translit_end

Yes, this transliterates it to an Arabic letter which looks identical
in most cases, and that Arabic letter is contained in CP1256.

echo -n ی | iconv -f utf-8 -t cp1256//translit

does not transliterate it, though.
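A minimal sketch of what the two quoted translit rules do, assuming Python's built-in `cp1256` codec as a stand-in for glibc iconv. The names `TRANSLIT`, `ABMON` and `to_cp1256` are hypothetical; the sketch only mimics the two rules by hand and does not model how iconv's //translit selects its tables.

```python
# Hypothetical translit table mirroring the two rules quoted from the locale
# source: <U0653> -> <U0622> and <U06CC> -> <U064A>.
TRANSLIT = str.maketrans({
    "\u0653": "\u0622",  # ARABIC MADDAH ABOVE -> ARABIC LETTER ALEF WITH MADDA ABOVE
    "\u06cc": "\u064a",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
})

# The first abmon string, "جنوری", spelled out by code point.
ABMON = "\u062c\u0646\u0648\u0631\u06cc"


def to_cp1256(text: str) -> bytes:
    """Apply TRANSLIT, then encode strictly with Python's cp1256 codec."""
    return text.translate(TRANSLIT).encode("cp1256")


try:
    ABMON.encode("cp1256")
except UnicodeEncodeError as err:
    # Reports the first character the codec cannot map (the final FARSI YEH).
    print(f"direct encode fails at U+{ord(ABMON[err.start]):04X}")

# After transliteration every character maps to a single CP1256 byte.
print(to_cp1256(ABMON).hex())
```

Encoding the raw string fails on the last character, while the transliterated string encodes cleanly, which matches the claim that U+064A is in CP1256 and U+06CC is not.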

>>     https://en.wikipedia.org/wiki/Windows-1256
>>
>> says:
>>
>> Wikipedia> Windows-1256 is a code page used to write Arabic (and possibly
>> Wikipedia> some other languages that use Arabic script, like Persian and
>> Wikipedia> Urdu) under Microsoft Windows.
>>
>> Note the “possibly”.
>>
>> Wikipedia> [...]
>> Wikipedia> Unicode and UTF-8 are preferred to Windows 1256 in modern
>> Wikipedia> applications. 0.1% of all web pages use Windows-1256 in June
>> Wikipedia> 2016.
>>
>> So CP1256 doesn’t seem to be used much anymore.
>>
>
> Still, Xorg's locale.alias aliases ur_PK to ur_PK.CP1256:
> https://cgit.freedesktop.org/xorg/lib/libX11/tree/nls/locale.alias.pre#n1121
> but that line dates back to 2004:
> https://cgit.freedesktop.org/xorg/lib/libX11/commit/nls/locale.alias.pre?id=c6349f43193b74a3c09945f3093a871b0157ba47

In glibc, we do not have a ur_PK locale using CP1256 encoding:

$ locale -a | grep ^ur
ur_IN
ur_IN.utf8
ur_PK
ur_PK.utf8
$ LC_ALL=ur_PK locale charmap
UTF-8
$

-- 
Mike FABIAN <mfabian@redhat.com>

