Bug 3156

Summary: LC_TIME for pl_PL doesn't match standard usage
Product: glibc Reporter: Piotr Engelking <inkerman42>
Component: localedata Assignee: GNU C Library Locale Maintainers <libc-locales>
Status: RESOLVED WONTFIX    
Severity: normal CC: glibc-bugs, tomek
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: LC_TIME fixes for pl_PL locale

Description Piotr Engelking 2006-08-31 01:47:16 UTC
Currently, glibc displays dates in the pl_PL locale as:

pon sie  6 01:23:45 CEST 1984

This format violates several conventions for date abbreviations in the Polish
language. I include a patch against the current CVS localedata with the
following changes:

* non-standard weekday abbreviations are replaced with standard ones
* non-standard month abbreviations are replaced with standard ones (based on
Roman numerals)
* middle-endian format (never used in Poland) is replaced with the little-endian
one (by far the most popular)
* standard padding is introduced, i.e. h:m:s are zero-padded, day of the month
is not padded
* fields are properly separated

With the patch, dates are displayed as:

Pn, 6 VIII 1984, 01:23:45 CEST

which matches the most common usage.

Please notice that the abbreviations are no longer fixed-width. Since this is
also the case in several other locales, I suppose it is not a problem.
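
For illustration, the two layouts can be sketched without relying on any installed locale. The abbreviation tables below are assumptions transcribed for this example (the old three-letter forms and the calendar-style forms proposed here), and the helper names `format_old` and `format_new` are hypothetical, not part of glibc or of the attached patch.

```python
from datetime import datetime

# Assumed abbreviation tables, indexed Monday=0 to match datetime.weekday().
OLD_DAYS = ["pon", "wto", "śro", "czw", "pią", "sob", "nie"]
NEW_DAYS = ["Pn", "Wt", "Śr", "Cz", "Pt", "So", "N"]
OLD_MONTHS = ["sty", "lut", "mar", "kwi", "maj", "cze",
              "lip", "sie", "wrz", "paź", "lis", "gru"]
ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]


def format_old(dt, tz):
    """Middle-endian layout with a space-padded day of the month."""
    return (f"{OLD_DAYS[dt.weekday()]} {OLD_MONTHS[dt.month - 1]} "
            f"{dt.day:2d} {dt:%H:%M:%S} {tz} {dt.year}")


def format_new(dt, tz):
    """Little-endian layout: unpadded day, Roman-numeral month, comma separators."""
    return (f"{NEW_DAYS[dt.weekday()]}, {dt.day} {ROMAN_MONTHS[dt.month - 1]} "
            f"{dt.year}, {dt:%H:%M:%S} {tz}")


# The time zone name is passed in as plain text to keep the sketch self-contained.
stamp = datetime(1984, 8, 6, 1, 23, 45)
old_text = format_old(stamp, "CEST")
new_text = format_new(stamp, "CEST")
```

Here `old_text` reproduces the current output quoted above and `new_text` the proposed one.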
Comment 1 Piotr Engelking 2006-08-31 01:49:16 UTC
Created attachment 1267
LC_TIME fixes for pl_PL locale
Comment 2 keld@dkuug.dk 2006-08-31 17:20:11 UTC
Subject: Re:  New: LC_TIME for pl_PL doesn't match standard usage

On Thu, Aug 31, 2006 at 01:47:16AM -0000, inkerman42 at gmail dot com wrote:
> Currently, glibc displays dates in the pl_PL locale as:
> 
> pon sie  6 01:23:45 CEST 1984
> 
> This format violates several conventions for date abbreviations in the Polish
> language. I include a patch against the current CVS localedata with the
> following changes:
> 
> * non-standard weekday abbreviations are replaced with standard ones
> * non-standard month abbreviations are replaced with standard ones (based on
> Roman numerals)
> * middle-endian format (never used in Poland) is replaced with the little-endian
> one (by far the most popular)
> * standard padding is introduced, i.e. h:m:s are zero-padded, day of the month
> is not padded
> * fields are properly separated
> 
> With the patch, dates are displayed as:
> 
> Pn, 6 VIII 1984, 01:23:45 CEST
> 
> which matches the most common usage.

well, you could use that for the long format, but it seems not
convenient for the short (abbreviated) format. Both day names and month
names are variable length.

My understanding is also that day and month names in Polish are spelled
with small initial letters.


> Please notice that the abbreviations are no longer fixed-width. Since this is
> also the case in several other locales, I suppose it is not a problem.

The recommendation is that the abbreviated format be fixed
format/length, as this is intended to be used in log messages.

best regards
Keld
Comment 3 Ulrich Drepper 2006-09-09 16:27:23 UTC
You have to provide evidence.  Provide URLs of official documents, railway
publications, newspapers.
Comment 4 Piotr Engelking 2006-09-11 18:42:47 UTC
[Sorry for delay, I have been on vacation for the last few days.]

First, some background to answer Mr. Drepper's and Mr. Simonsen's questions.


Weekdays:

Weekday abbreviations are not part of any official standard. The ones described
above are, however, used nearly universally in calendars.

Examples of use:

* http://kalendarz.pwn.pl/ [calendar of PWN (Polish Scientific Publishers),
publisher of the largest and most authoritative Polish-language encyclopedias
and dictionaries]
* http://lot.pl/ [timetable of LOT, the largest Polish airline]

Please note that these abbreviations can only appear standalone or as part of a
standalone date (and yes, while Polish weekday names are lowercase, they are
not). To abbreviate weekday names in an intertextual context (which would be
quite uncommon), one would have to use an ad hoc abbreviation following standard
rules, i.e. match the case of the word, end with a consonant, and be followed by
a dot, e.g. 'poniedziałek' could be abbreviated as 'pn.', 'pon.', or 'poniedz.'.


Date:

Modern dictionaries of the Polish language allow the following date abbreviations:

* 6 VIII 1984 (older dictionaries also allowed 6.VIII.1984)
* 6.8.1984, or 6.08.1984, or 06.08.1984

The use of other abbreviations (such as 1984.08.06) is explicitly discouraged,
unless necessitated by specialized data processing requirements.

Online reference:

* http://so.pwn.pl/zasady.php?id=629747 [orthographic dictionary of PWN]

Examples of use:

* http://www.senat.gov.pl/senatrp/noty/dzieje.pdf [history of the Senat, the upper
chamber of the Polish parliament]
* http://edukacja.sejm.gov.pl/historia_sejmu/ [history of the Sejm, the lower chamber
of the Polish parliament]
* http://rjp.pl/?mod=uchwaly&id=2 [resolutions of the Polish Language Council, the
official standardizing organization for the Polish language]
* http://intercity.pl/scripts/train/index.php?action=train_list [timetable of
PKP, the largest Polish railway company]
* http://lot.pl/ [timetable of LOT]


> The recommendation is that the abbreviated format be fixed
> format/length, as this is intended to be used in log messages.

Ah, I wasn't aware of this recommendation. Perhaps it might be a good idea to
document it somewhere? Is there some particular reason why so many locales don't
follow it?

Which formats exactly should be fixed-width? For d_fmt, there should be no
problem. Weekday abbreviations can be made fixed-width as well, by using a
variant with N replaced with Nd. And while it isn't exactly common to mix
weekday abbreviations and numeric date format, I guess it can be done, too. How
about date_fmt? It's not fixed-width in the POSIX locale, either.
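
To make the fixed-width question concrete, here is a minimal sketch that checks whether a set of abbreviations all have the same length. The helper `is_fixed_width` and the lists are assumptions written for this example; the weekday list follows the forms proposed in this report, and the second list applies the "N replaced with Nd" variant mentioned above.

```python
def is_fixed_width(names):
    """Return True when every name has the same number of characters.

    len() counts code points, which equals display columns for these
    precomposed Latin letters; it would not hold for wide or combining text.
    """
    return len({len(name) for name in names}) == 1


# Weekday abbreviations as proposed (Monday first), then with N -> Nd.
proposed_days = ["Pn", "Wt", "Śr", "Cz", "Pt", "So", "N"]
padded_days = ["Nd" if name == "N" else name for name in proposed_days]

# Roman-numeral month abbreviations range from one to four characters.
roman_months = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]

days_ok = is_fixed_width(padded_days)
months_ok = is_fixed_width(roman_months)
```

With these lists, the weekday names become fixed-width after the substitution, while the Roman-numeral months do not, so a format using them would still vary in width.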
Comment 5 Piotr Engelking 2007-02-02 13:13:04 UTC
Has there been any activity on this bug recently? There has been no comment from
any of the developers for several months. I have submitted the requested
references; is that enough? My question about which of the formats, if any,
should be changed to fixed-width hasn't been answered, either. Is there
anything else I can do to help?
Comment 6 Piotr Engelking 2007-02-17 17:13:47 UTC
The weekday abbreviations proposed by the above patch also seem to be identical
to the ones used by Windows. (This is what the gtk2 calendar widget uses on
Windows XP, at least.)
Comment 7 Ulrich Drepper 2007-02-17 19:37:59 UTC
I applied the patch.
Comment 8 Tomasz Kepczynski 2007-07-09 21:29:37 UTC
This change introduced a bug with strptime function.
D_FMT format is set to "%-d %b %Y" and strptime
function does not seem to support "-" in "%-d"
format specifier. This leads to a problem described
here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=243513
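
As a caller-side illustration only, a program that receives a format such as "%-d %b %Y" could strip the padding flag before handing the format to a parser that does not understand it. The sketch below uses Python's own `datetime.strptime` as a stand-in for the parser; the helper `strip_format_flags` is hypothetical, and the set of flag characters (`-`, `_`, `0`, `^`, `#`) is the set of glibc strftime flags assumed for this example. It is not a fix for strptime itself.

```python
import re
from datetime import datetime

# A conversion is "%", an optional flag character, then the conversion letter.
# Matching "%%" as a unit keeps a literal percent sign from being misread.
_CONVERSION = re.compile(r"%([-_0^#])?(.)")


def strip_format_flags(fmt):
    """Drop a single flag character from each conversion in fmt."""
    return _CONVERSION.sub(lambda m: "%" + m.group(2), fmt)


D_FMT = "%-d %b %Y"                      # format quoted in this comment
portable_fmt = strip_format_flags(D_FMT)

# Parsing demonstrated with a numeric month so the result does not depend
# on which month names the active locale provides.
parsed = datetime.strptime("6.08.1984", strip_format_flags("%-d.%m.%Y"))
```

The flag only affects output padding, so removing it should not change which inputs are accepted by a parser that tolerates unpadded day numbers.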
Comment 9 Piotr Engelking 2007-07-10 20:56:34 UTC
There are, indeed, two bugs in strptime() which prevent it from parsing d_fmt
correctly. Filed as:

* http://sources.redhat.com/bugzilla/show_bug.cgi?id=4772
* http://sources.redhat.com/bugzilla/show_bug.cgi?id=4773
Comment 10 Dominik 'Rathann' Mierzejewski 2007-09-18 16:25:48 UTC
There's significant uproar over the change of month abbreviations to Roman
numerals. Though theoretically correct, it is not acceptable to the community at
large, and there are standards and expert opinions in favour of three-letter
abbreviations. Weekday abbreviations and field separator changes are under
debate, too. See bug 4789 for more details.
Comment 11 Zbigniew Luszpinski 2007-09-25 19:32:45 UTC
Please revert pl_PL-LC_TIME.patch ASAP. Someone has played a joke on you. This
archaic date stamping using Roman month numbers was never officially approved and
was never in widespread use. It is only used when the author of a document wants
an artistic effect. That is why Roman numbering of months is usually found in
documents together with Gothic fonts and the Anno Domini (A.D.) prefix before the date.

I'm sure the glibc maintainers applied this patch in good faith. Next time, before
applying such a patch, please ask the Polish Linux translation teams whether the
localization patch is a hostile joke.

This patch makes the Polish Linux community suffer (broken apps, logs, errors in
data processing). It is a proof of concept of how big a negative impact a single
person can have on a whole community.

This patch breaks national standards:
PN-EN 28601:2002 which is the same as ISO 8601:2004
http://www.pkn.pl/index.php?a=show&m=katalog&id=463318&page=1
here is free access via wikipedia:
http://en.wikipedia.org/wiki/ISO_8601

The PN-EN 28601:2002 (ISO 8601:2004) format is the required format for sorting
data in data processing machines. Example: 2007-09-24

In real life (government documents, administration, commerce), the dd-MM-CCYY
format is used for comfortable reading, where:
dd - day in two-digit format
MM - month in two-digit format
CC - century in two-digit format
YY - year in two-digit format
Example: 24-09-2007
On paper documents, a dot "." can be found instead of the dash "-" separator.
However, because of the growing computerization of the country's administration
and the increasing number of personal computer users, the dash is becoming more
popular than the dot because it is more visible on screen and in print.
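
The sorting argument can be illustrated with a short sketch. The sample dates are made up for this example; the point is only that ISO 8601 strings sort chronologically as plain text, while dd-MM-CCYY strings generally do not.

```python
from datetime import date

# Hypothetical sample dates, deliberately out of chronological order.
dates = [date(2007, 9, 24), date(2008, 1, 5), date(2006, 12, 31)]

iso_strings = [d.isoformat() for d in dates]            # CCYY-MM-dd
dmy_strings = [d.strftime("%d-%m-%Y") for d in dates]   # dd-MM-CCYY

# Plain lexicographic sorting of the text representations.
iso_sorted = sorted(iso_strings)
dmy_sorted = sorted(dmy_strings)

# Reference ordering obtained by sorting the actual date values.
chronological = sorted(dates)
iso_is_chronological = iso_sorted == [d.isoformat() for d in chronological]
dmy_is_chronological = dmy_sorted == [d.strftime("%d-%m-%Y") for d in chronological]
```

Sorting the dd-MM-CCYY strings orders them by day of the month first, which is why the most significant field needs to come first for machine sorting.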

To all maintainers (no matter if it is glibc or another project): Please always
verify all localization patches by sending them (in a human-readable format)
to the official translation teams for the given language/region. This will keep
Linux safe and not compromised.
Comment 12 Julian Sikorski 2007-09-26 16:09:13 UTC
How about something like that:
Pon, 6 sie 1984 01:23:45 CEST
Looks nicer, has non-broken endianness and fixed-width names.
Comment 13 Marcin Gil 2007-10-04 11:42:11 UTC
ISO is an already established, widely used standard, adopted by PN (Polska
Norma/Polish Norm). Using anything else, like fancy Roman formatting, is useless
and troublesome.
Comment 14 Christopher Yeleighton 2008-03-19 09:43:17 UTC
(In reply to comment #11)
> PN-EN 28601:2002 which is the same as ISO 8601:2004
> http://www.pkn.pl/index.php?a=show&m=katalog&id=463318&page=1

Here is something for you to criticise:
<http://en.wikipedia.org/wiki/Date_and_time_notation_by_country#Poland>