This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: de_DE has been using the wrong group separator for over 18 years

From: kdex <kdex at kdex dot de>
To: libc-alpha at sourceware dot org
Date: Wed, 18 Apr 2018 10:30:48 +0200
Subject: Re: de_DE has been using the wrong group separator for over 18 years
References: <7224816.qpMlRvYOtE@punchy> <3e1607ab-e44e-9b28-5fd2-541b3313906d@redhat.com>

On Wednesday, April 18, 2018 9:14:45 AM CEST Florian Weimer wrote:
> On 04/18/2018 12:24 AM, kdex wrote:
> > To give some context: I have previously posted the following on
> > libc-locales and was asked to bring this to the attention of senior
> > developers on this list who speak German.
> > 
> > I have noticed that the locale `de_DE` has erroneously been using a full
> > stop (U+002E) for the thousands (group) separator in `mon_thousands_sep`
> > and `thousands_sep` ever since 2000. The usage of a full stop to group
> > thousands has (to my knowledge) never been standardized.
> > 
> > As per DIN 1333, DIN 5008, and DIN EN ISO 80000, the separator should have
> > been a thin space (U+2009).
> > 
> > In fact, DIN 1333 even explicitly forbids the usage of U+002E to group
> > thousands, and DIN EN ISO 80000 explicitly excludes all other characters
> > than a thin space.
> 
> These standards are simply not universally used.  They aren't exactly
> wrong, either, because some typesetters actually use a (thin) space.
> It's just that adoption is poor.
> 
> U+002E is perfectly acceptable and widely used, especially if U+2009 is
> not available (and U+0020 risks introducing a line break).  Here's a
> recent example:
> 
> »In 2017, the Finanzkontrolle Schwarzarbeit inspected more than
> 52.000 employers and initiated almost 108.000 criminal proceedings. The
> number of investigations opened for failure to pay the statutory
> minimum wage under the Mindestlohngesetz rose to 2.522
> (2016: 1.651; 2015: 705).«
> 
> <https://www.bundesfinanzministerium.de/Content/DE/Pressemitteilungen/Finanzpolitik/2018/04/2018-04-17-ZJPK.html>
While the Federal Ministry of Finance may be an interesting (or even ironic) 
source to point out, it is in no way normative, and the formatting on its 
website is largely up to its team of web developers.

Note that according to DIN 5008, amounts of money may, for security 
purposes, indeed be grouped with periods, so leaving `mon_thousands_sep` as-is 
would still be standards-compliant. Amounts of money are also covered 
by the three norms I've brought up before:

[…] For this reason, standards provide for a space as the thousands 
separator (DIN 1333, DIN 5008, and ISO 80000). A thin space is recommended 
where it is technically available. Amounts of money are an exception: for 
security reasons, they may be separated with a space that is at least as wide 
as one of the digits, or with a separator character (such as the period). [2]
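
To make the distinction concrete, here is a minimal sketch in Python of 
grouping plain numbers with a thin space and amounts of money with a period. 
The helper `group_digits` and the two constants are hypothetical names chosen 
for illustration; this does not query glibc or any installed locale.

```python
THIN_SPACE = "\u2009"  # separator the cited norms suggest for plain numbers
FULL_STOP = "\u002e"   # separator that remains acceptable for amounts of money


def group_digits(value: int, sep: str) -> str:
    """Group an integer into blocks of three digits, joined by `sep`."""
    # The "," format spec groups by thousands; swap in the wanted separator.
    return format(value, ",").replace(",", sep)


# Plain number, as `thousands_sep` would group it under the proposal.
print(repr(group_digits(108000, THIN_SPACE)))
# Amount of money, as `mon_thousands_sep` groups it today.
print(repr(group_digits(2522, FULL_STOP)))
```

Numbers below 1000 come back unchanged, since there is nothing to group.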

> 
> (Also look at the date at the top of the page—it doesn't follow DIN ISO
> 8601, either.)
That's unfortunate; but the paragraph about normativity above would apply 
here, too.

Duden, the German reference work on these matters (generally considered 
relatively normative among Germans), follows the ISO norms as well [1], which 
should speak for itself.

It's also easy to find German articles about finance that try to use spaces 
(admittedly the wrong ones) as separators; see [3].
> 
> I don't think the locales need to change.  Using characters from the
> ASCII range for printing numbers has its advantages.
I don't think this premise is correct: In de_DE, amounts of money include 
`currency_symbol` (U+20AC), which is not in the ASCII range. ps_AF uses U+066C 
for `thousands_sep`, and es_MX even uses U+2009 (the very same character that 
this thread is about); neither is in the ASCII range. (fa_IR uses U+002C, but 
that is the ASCII comma.) So why should we treat de_DE like a special case? 
It's much easier to be standards-compliant here and accept that numbers 
generally do contain non-ASCII characters, which a parser can simply skip over.
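
A minimal sketch in Python of what "skipping over" group separators could 
look like. The function `parse_grouped_int` and the set `GROUP_SEPARATORS` 
are hypothetical; treating U+002E as a group separator is an assumption that 
only holds for locales such as de_DE, where the decimal point is a comma.

```python
# Group separators mentioned in this thread (assumed set, not exhaustive).
GROUP_SEPARATORS = frozenset({"\u002e", "\u2009", "\u066c"})


def parse_grouped_int(text: str, separators=GROUP_SEPARATORS) -> int:
    """Parse an unsigned integer, ignoring any known group separators."""
    digits = "".join(ch for ch in text if ch not in separators)
    # Reject empty input and anything that is not a plain ASCII digit.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a grouped integer: {text!r}")
    return int(digits)


print(parse_grouped_int("52.000"))       # grouped with a full stop
print(parse_grouped_int("52\u2009000"))  # grouped with a thin space
```

Both calls yield the same integer, so a consumer written this way would not 
need to care which separator the locale emits.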
> 
> Thanks,
> Florian
[1] https://www.duden.de/sprachwissen/rechtschreibregeln/zahlen-und-ziffern
[2] https://de.wikipedia.org/wiki/Zifferngruppierung#Zur_Problematik_von_Punkt_und_Komma_f%C3%BCr_Tausender-_und_Dezimaltrennzeichen
[3] https://www.finanzen.ch/nachrichten/aktien/Aktien-Frankfurt-Eroeffnung-Dax-schiebt-sich-wieder-ueber-12-600-Punkte-1021483710


Follow-Ups:
- Re: de_DE has been using the wrong group separator for over 18 years
  - From: Florian Weimer
- Re: de_DE has been using the wrong group separator for over 18 years
  - From: Rafal Luzynski

References:
- de_DE has been using the wrong group separator for over 18 years
  - From: kdex
- Re: de_DE has been using the wrong group separator for over 18 years
  - From: Florian Weimer
