This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Locales: Use CLDR matching thousands separator
- From: Florian Weimer <fw at deneb dot enyo dot de>
- To: Marko Myllynen <myllynen at redhat dot com>
- Cc: Rafal Luzynski <digitalfreak at lingonborough dot com>, GNU C Library <libc-alpha at sourceware dot org>, Carlos O'Donell <carlos at redhat dot com>
- Date: Mon, 08 Oct 2018 20:51:56 +0200
- Subject: Re: [PATCH] Locales: Use CLDR matching thousands separator
- References: <eb1814b5-cae3-8472-ece6-44bec12d570b@redhat.com> <a2a29fbe-6872-c123-d4d0-2b8664825e72@redhat.com> <1786676151.161483.1534532463077@poczta.nazwa.pl> <9848a4de-2b6e-895b-d601-1358b79ef9f9@redhat.com> <414408229.228264.1534887501434@poczta.nazwa.pl> <349a70bd-92ba-e32f-e396-e4e595994029@redhat.com>
* Marko Myllynen:
> One perhaps related thing I noticed recently was that neither U+00A0 nor
> U+202F are classified as whitespace characters. locales/i18n_ctype has
> this definition (based on ISO/IEC 30112, see
> http://www.open-std.org/jtc1/sc35/wg5/docs/30112d10.pdf document page 30):
>
> space /
> <U0009>..<U000D>;<U0020>;<U1680>;<U2000>..<U2006>;<U2008>..<U200A>;/
> <U2028>..<U2029>;<U205F>;<U3000>
>
> Looking at pages about whitespace characters
> (https://en.wikipedia.org/wiki/Whitespace_character) and Unicode spaces
> (http://jkorpela.fi/chars/spaces.html) it seems that a couple of other
> Unicode space characters are also omitted from that list.
>
> Does anyone know whether there is a particular reason to omit U+00A0 and
> U+202F and a few others from the above classification?
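I think the omission is deliberate, to get the right behavior from
line-breaking algorithms. U+00A0 (NO-BREAK SPACE) and U+202F (NARROW
NO-BREAK SPACE) are non-breaking spaces, as is U+2007 (FIGURE SPACE),
which the list also skips. Software that treats every character in the
"space" class as a possible line-break point would otherwise break lines
at them.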
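
To see which space characters the quoted definition leaves out, here is a
rough sketch in Python. It is not glibc code. The range table is copied by
hand from the "space" definition quoted above, and the helper names
(SPACE_CLASS_RANGES, in_space_class, omitted_space_separators) are made up
for this illustration. It lists every character in Unicode general
category Zs (space separator) that is absent from the quoted class.

```python
import unicodedata

# Ranges transcribed by hand from the "space" class quoted above
# (locales/i18n_ctype). This is an illustration, not glibc's own data.
SPACE_CLASS_RANGES = [
    (0x0009, 0x000D),
    (0x0020, 0x0020),
    (0x1680, 0x1680),
    (0x2000, 0x2006),
    (0x2008, 0x200A),
    (0x2028, 0x2029),
    (0x205F, 0x205F),
    (0x3000, 0x3000),
]


def in_space_class(cp):
    """Return True if code point cp falls in one of the quoted ranges."""
    return any(lo <= cp <= hi for lo, hi in SPACE_CLASS_RANGES)


def omitted_space_separators():
    """Code points with general category Zs that the quoted class omits."""
    return [
        cp
        for cp in range(0x110000)
        if unicodedata.category(chr(cp)) == "Zs" and not in_space_class(cp)
    ]


if __name__ == "__main__":
    for cp in omitted_space_separators():
        name = unicodedata.name(chr(cp), "<unnamed>")
        print("U+%04X %s" % (cp, name))
```

With the Unicode data shipped in current Python 3 releases, I would expect
this to report U+00A0, U+2007 and U+202F, which are exactly the no-break
spaces. The result depends on the Unicode version of the interpreter's
unicodedata module, so treat it as a quick check rather than a reference.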