Bug 16777 - pl_PL: incorrect thousands separator in locale
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: unspecified
Importance: P2 normal
Target Milestone: 2.27
Assignee: Mike FABIAN
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-29 18:29 UTC by Michał Górny
Modified: 2017-10-18 13:43 UTC
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
0001-Use-U-202F-NARROW-NO-BREAK-SPACE-as-thousands-separa.patch (883 bytes, patch)
2017-10-18 13:43 UTC, Mike FABIAN

Description Michał Górny 2014-03-29 18:29:08 UTC
Currently, glibc lists the following for pl_PL locale:

- no thousands separator in LC_NUMERIC section,

- '.' as thousands separator in LC_MONETARY section.

In actual usage, however, a non-breaking space is the thousands separator in both cases. I'm not sure whether it is correct to use <U00A0> in locales or whether <U0020> should be used instead.

It should also be noted that grouping is used only for numbers with more than 4 digits: '4000' is not grouped, while '40 000' and '4 000 000' are grouped in 3-digit groups. I don't know whether this can be expressed in the current glibc localedata format.
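
The rule described above can be sketched in Python as follows. This only illustrates the convention and is not glibc code; the function name `group_pl` and its parameters are made up for this example, and U+00A0 is assumed as the separator.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE; assumed separator for this sketch


def group_pl(value: int, sep: str = NBSP, min_digits: int = 5) -> str:
    """Group digits in threes, but only when there are at least min_digits."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value))
    if len(digits) < min_digits:
        return sign + digits  # short numbers such as 4000 stay ungrouped
    # Split off groups of three digits, starting from the right.
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    return sign + sep.join(reversed(groups))


if __name__ == "__main__":
    for n in (4000, 40000, 4000000):
        # Replace the no-break space with a plain space for terminal output.
        print(group_pl(n).replace(NBSP, " "))
```

The `min_digits` threshold is the part that has no obvious counterpart in a plain list of group sizes.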

References:

1. opinion given by a linguist:
   http://poradnia.pwn.pl/lista.php?id=9842

2. EU publication guidelines:
   http://publications.europa.eu/code/pl/pl-360500.htm

3. example law act, having numbers e.g. on page 111 (PDF):
   http://isip.sejm.gov.pl/Download?id=WDU19910800350&type=3
Comment 1 Marko Myllynen 2014-03-31 07:10:18 UTC
(In reply to Michał Górny from comment #0)
> Currently, glibc lists the following for pl_PL locale:
> 
> - no thousands separator in LC_NUMERIC section,
> 
> - '.' as thousands separator in LC_MONETARY section.
> 
> While actually, a non-breaking space is used as thousands separator in both
> cases. However, I'm not sure if it is correct to use <U00A0> in locales or
> if <U0020> should be used instead.

<U00A0> should be used in this case, as many other locales do.

> It should be also noted that the grouping is used only for numbers having
> more than 4 digits, i.e. '4000' does not use grouping, '40 000' and '4 000
> 000' do on 3-digit groups. I don't know if it is possible to express this in
> current glibc localedata format.

If you want to check whether that's doable, please see the Locales wiki page, which links to a page describing LC_NUMERIC and grouping in detail. The page also contains information on how to test and submit your locale changes.

Thanks.
Comment 2 keld@keldix.com 2014-03-31 10:45:14 UTC
On Mon, Mar 31, 2014 at 07:10:18AM +0000, myllynen at redhat dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=16777
> 
> --- Comment #1 from Marko Myllynen <myllynen at redhat dot com> ---
> (In reply to Michał Górny from comment #0)
> > Currently, glibc lists the following for pl_PL locale:
> > 
> > - no thousands separator in LC_NUMERIC section,
> > 
> > - '.' as thousands separator in LC_MONETARY section.
> > 
> > While actually, a non-breaking space is used as thousands separator in both
> > cases. However, I'm not sure if it is correct to use <U00A0> in locales or
> > if <U0020> should be used instead.
> 
> <00A0> should be used in this case, as do many other locales as well.
> 
> > It should be also noted that the grouping is used only for numbers having
> > more than 4 digits, i.e. '4000' does not use grouping, '40 000' and '4 000
> > 000' do on 3-digit groups. I don't know if it is possible to express this in
> > current glibc localedata format.
> 
> Perhaps you want to see whether that's doable, please see the Locales wiki page
> which has a link to a page describing LC_NUMERIC and grouping in detail. The
> page contains also information on how to test and submit your locale changes.

I do not think the "under 10000" is doable with current glibc functionality.
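
For context, here is a rough Python model of how a POSIX-style grouping list (for example `3;3`) is applied. It is a simplification: the terminator value that stops further grouping is not modelled, and the function name `apply_grouping` is hypothetical. The list only gives group sizes, so there is nowhere to say "start grouping at five digits".

```python
from typing import Sequence


def apply_grouping(digits: str, grouping: Sequence[int], sep: str) -> str:
    """Apply a POSIX-style grouping list to a string of digits.

    Sizes are consumed from the right and the last size repeats.
    Simplified: the "no further grouping" terminator is not modelled.
    """
    if not grouping:
        return digits  # an empty list means no grouping at all
    parts = []
    index = 0
    while digits:
        size = grouping[min(index, len(grouping) - 1)]
        parts.append(digits[-size:])
        digits = digits[:-size]
        index += 1
    return sep.join(reversed(parts))


if __name__ == "__main__":
    # With 3;3 a four-digit number is grouped just like a longer one.
    print(apply_grouping("4000", [3, 3], " "))
    print(apply_grouping("4000000", [3, 3], " "))
```

Under this model '4000' comes out grouped, which is the behaviour the report would like to avoid.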

Furthermore, I think this is not desirable for LC_MONETARY.

The no-break space versus COMMA/PERIOD thousands separator issue
is a classic conflict between linguistic and computer use.
In many European languages, linguists tend to advocate the no-break space,
but in computer use a period is most common. I have yet to see a financial
application that uses a no-break space. Bank applications and ledger applications do not
use it.

Actually, we should probably provide for both styles, and also support the "no separator
under 10000" rule. I would then advise that we keep the current functionality
in the POSIX style, compatible with the POSIX/C locale, and add new keywords
in both LC_MONETARY and LC_NUMERIC for the linguistic styles.

Best regards
Keld
Comment 3 cvs-commit@gcc.gnu.org 2017-10-18 13:40:36 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  508b1e71a37355839ab91f9c09ce7e577cf69a58 (commit)
      from  2c2245b92ccf6344b324d17d8f94ccd3b8c559c6 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=508b1e71a37355839ab91f9c09ce7e577cf69a58

commit 508b1e71a37355839ab91f9c09ce7e577cf69a58
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Wed Oct 18 14:27:44 2017 +0200

    Use U+202F NARROW NO-BREAK SPACE as thousands separators in pl_PL locale [BZ #16777]
    
    	[BZ #16777]
    	* localedata/locales/pl_PL (LC_MONETARY): Use U+202F as mon_thousands_sep
    	and improve readability by using more ASCII.
    	* localedata/locales/pl_PL (LC_NUMERIC): Use U+202F as thousands_sep
    	and improve readability by using more ASCII.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                |    8 ++++++++
 localedata/locales/pl_PL |   16 ++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)
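
One way to check what a given system reports after such a change is to query `localeconv()`; a minimal Python sketch follows. The helper name `separators` is made up for this example, and it assumes the `pl_PL.UTF-8` locale is installed (it returns `None` otherwise). The result depends on the installed glibc version, so no expected output is shown.

```python
import locale


def separators(locale_name: str):
    """Return the separators reported for locale_name, or None if unavailable."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        return None  # locale is not installed on this system
    try:
        conv = locale.localeconv()
        return {
            "thousands_sep": conv["thousands_sep"],
            "mon_thousands_sep": conv["mon_thousands_sep"],
        }
    finally:
        # setlocale is process-wide, so restore the previous setting.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    result = separators("pl_PL.UTF-8")  # locale name assumed to be installed
    if result is None:
        print("pl_PL.UTF-8 is not available here")
    else:
        for key, value in result.items():
            # Print code points, since the separator may be invisible.
            print(key, [f"U+{ord(ch):04X}" for ch in value])
```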
Comment 4 Mike FABIAN 2017-10-18 13:41:19 UTC
Fixed in glibc master.
Comment 5 Mike FABIAN 2017-10-18 13:43:04 UTC
Created attachment 10537
0001-Use-U-202F-NARROW-NO-BREAK-SPACE-as-thousands-separa.patch

The patch I used to fix the problem.