This is the mail archive of the mailing list for the glibc project.

[PATCHv4] Expected behaviour for a-z, A-Z, and 0-9 (Bug 23393).

On 07/25/2018 04:31 PM, Florian Weimer wrote:
> On 07/25/2018 10:25 PM, Carlos O'Donell wrote:
>> On 07/25/2018 04:18 PM, Florian Weimer wrote:
>>> On 07/25/2018 05:54 PM, Carlos O'Donell wrote:
>>>> Attaching it as swbz23393v3.tar.gz to avoid spam rejection.
>>> Quick comment.  The middle line here adds trailing whitespace:
>>> -  { "[a-z]|[^a-z]", "\xcb\xa2", REG_EXTENDED, 2,
>>> +
>>> +     The U+02DA RING ABOVE is chosen because it's not in [s-㏜].  */
>> Thanks. I'll fix this with v4.
> I have verified that localedata/locales/iso14651_t1_common is just a reordering (except for the new comments).
> localedata/locales/tr_TR is more complicated, but looks like an order-only change for me too.
>> I had to fix the following locales:
>>     modified:   localedata/locales/ar_SA
>>     modified:   localedata/locales/km_KH
>>     modified:   localedata/locales/lo_LA
>>     modified:   localedata/locales/or_IN
>>     modified:   localedata/locales/sl_SI
>>     modified:   localedata/locales/th_TH
> Do you have the actual locale names handy?  localedata/SUPPORTED contains charsets, but I'm not sure if the translation to locale names is completely regular.

It is completely regular: ar_SA => ar_SA.UTF-8, and so forth.

>> They all re-arranged ASCII character collation element ordering like tr_TR,
>> and so they needed manual fixing.
>> Could you please add these locales to your tester?
> I will try.  I already have an xtests part, and these probably need to go there as well.

- Fixed ar_SA, km_KH, lo_LA, or_IN, sl_SI, th_TH.
- Added range checking for a-z, A-Z for all supported UTF-8 locales.
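
As an illustration of the kind of range check meant here (a minimal sketch, not the test from the attached patch), the following assumes that "rational" means [a-z], [A-Z] and [0-9] match exactly the ASCII characters between their endpoints. The function names are hypothetical, and the sketch only probes single-byte ASCII input; it does not cover non-ASCII characters such as the U+02DA case quoted above.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (an anchored bracket
   expression such as "^[a-z]$") matches exactly the ASCII characters
   LO..HI and no other ASCII character, 0 if it matches anything else
   or misses one, and -1 if the pattern fails to compile.  */
static int
range_matches_only (const char *pattern, char lo, char hi)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  int ok = 1;
  for (int c = 1; c < 128; c++)
    {
      char s[2] = { (char) c, '\0' };
      int matched = regexec (&re, s, 0, NULL, 0) == 0;
      int expected = c >= lo && c <= hi;
      if (matched != expected)
        ok = 0;
    }

  regfree (&re);
  return ok;
}

/* Hypothetical driver: switch to locale NAME and check the three
   ranges.  Returns -1 if the locale is not available, otherwise 1 if
   all three ranges behave as plain ASCII ranges and 0 if not.  */
static int
ranges_are_rational (const char *name)
{
  if (setlocale (LC_ALL, name) == NULL)
    return -1;

  return range_matches_only ("^[a-z]$", 'a', 'z') == 1
         && range_matches_only ("^[A-Z]$", 'A', 'Z') == 1
         && range_matches_only ("^[0-9]$", '0', '9') == 1;
}
```

Calling ranges_are_rational ("ar_SA.UTF-8") and likewise for each fixed locale gives 1 when the ranges behave as expected and -1 when that locale is not installed on the test machine.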

All of my testers are clean.

So the question is now:

Do we commit to rational ranges for a-z, A-Z, 0-9 ... for 2.28?

Or do we just do the deinterlacing of iso14651_t1_common to fix en_US.UTF-8?


Attachment: swbz23393v4.tar.gz
Description: application/gzip