Bug 20756 - [PATCH] Use Unicode wise thousands separator
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: unspecified
Importance: P2 minor
Target Milestone: 2.27
Assignee: Mike FABIAN
Reported: 2016-11-01 19:52 UTC by Stanislav Brabec
Modified: 2017-08-28 16:29 UTC
CC List: 3 users

Last reconfirmed: 2016-11-01 00:00:00
fweimer: security-


Attachments
Proposed changes (3.66 KB, patch)
2016-11-01 19:52 UTC, Stanislav Brabec

Description Stanislav Brabec 2016-11-01 19:52:26 UTC
Created attachment 9605 [details]
Proposed changes

Many languages use a small gap as the thousands separator.

The thousands separator should not be a plain space, but a narrow space. Additionally, a number must not be wrapped in the middle when a line is wrapped.

Locale data were created in the era of 8-bit encodings, so most locales use either a plain space (incorrect: it allows a line break in the middle of the number) or NBSP (better, but typographically incorrect: the gap between groups is too wide).
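
The wrapping problem can be illustrated with a minimal Python sketch. It uses Python's textwrap module, which breaks lines only at ASCII whitespace, as a stand-in for a real line-breaking engine; other renderers may behave differently. The helper name wrap_amount and the sample sentence are hypothetical, chosen only for this illustration.

```python
import textwrap

SPACE = "\u0020"  # plain space: a line may break here
NBSP = "\u00a0"   # NO-BREAK SPACE: no break allowed, but full width
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE: no break allowed, narrow


def wrap_amount(sep, width=12):
    """Wrap a sample sentence whose number uses `sep` between digit groups.

    Hypothetical helper: textwrap is only an approximation of how a
    text layout engine decides where a line may break.
    """
    text = "Total: 1" + sep + "234" + sep + "567 items"
    return textwrap.wrap(text, width=width)


if __name__ == "__main__":
    for name, sep in (("SPACE", SPACE), ("NBSP", NBSP), ("NNBSP", NNBSP)):
        print(name, wrap_amount(sep))
```

With a plain space the number is split across two lines, while with NBSP or NNBSP it stays on one line. The width difference between NBSP and NNBSP is a rendering matter and cannot be shown this way.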

Now that Unicode is widely supported, we should drop the legacy characters in favor of the correct Unicode character.

Unicode has a dedicated character for this purpose:

NNBSP
U+202F NARROW NO-BREAK SPACE: a narrow form of a no-break space, typically the width of a thin space or a mid space
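
As a rough illustration of the intended result, the following Python sketch groups digits with U+202F. It is not how glibc applies thousands_sep; the function name group_thousands is hypothetical, and the approach (format with "," and then substitute) is just a convenient way to show the output.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def group_thousands(value, sep=NNBSP):
    """Group an integer's digits in threes using `sep` (hypothetical helper)."""
    # Format with "," as a placeholder separator, then swap in `sep`.
    return format(value, ",d").replace(",", sep)


if __name__ == "__main__":
    grouped = group_thousands(1234567)
    print(grouped)
    # Show the code point and Unicode name of each separator used.
    for ch in grouped:
        if not ch.isdigit():
            print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```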
Comment 1 Carlos O'Donell 2016-11-01 22:17:49 UTC
(In reply to Stanislav Brabec from comment #0)
> Created attachment 9605 [details]
> Proposed changes
> 
> Many languages use a small gap as the thousands separator.
> 
> The thousands separator should not be a plain space, but a narrow space.
> Additionally, a number must not be wrapped in the middle when a line is
> wrapped.

Agreed.
 
> Locale data were created in the era of 8-bit encodings, so most locales use
> either a plain space (incorrect: it allows a line break in the middle of the
> number) or NBSP (better, but typographically incorrect: the gap between
> groups is too wide).
> 
> Now that Unicode is widely supported, we should drop the legacy characters
> in favor of the correct Unicode character.
> 
> Unicode has a dedicated character for this purpose:
> 
> NNBSP
> U+202F NARROW NO-BREAK SPACE: a narrow form of a no-break space, typically
> the width of a thin space or a mid space

I would support this change.

The NNBSP has been around since Unicode 3.0 so we support it across the board.

Can you please post this to libc-alpha following:
https://sourceware.org/glibc/wiki/Contribution%20checklist

Note that you don't need a copyright assignment for locale data changes such as the ones your patch proposes.
Comment 2 Stanislav Brabec 2016-11-02 15:54:34 UTC
Sent to libc-alpha: https://sourceware.org/ml/libc-alpha/2016-11/msg00062.html
Comment 3 Joseph Myers 2017-08-28 16:29:43 UTC
Restoring changes lost in the system crash and subsequent restore from backup.

https://sourceware.org/ml/glibc-bugs/2017-08/msg00358.html
https://sourceware.org/ml/glibc-bugs/2017-08/msg00359.html