Summary: | LC_TIME for pl_PL doesn't match standard usage | ||
---|---|---|---|
Product: | glibc | Reporter: | Piotr Engelking <inkerman42> |
Component: | localedata | Assignee: | GNU C Library Locale Maintainers <libc-locales> |
Status: | RESOLVED WONTFIX | ||
Severity: | normal | CC: | glibc-bugs, tomek |
Priority: | P2 | Flags: | fweimer: security- |
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | LC_TIME fixes for pl_PL locale |
Description
Piotr Engelking
2006-08-31 01:47:16 UTC
Created attachment 1267 [details]
LC_TIME fixes for pl_PL locale
Subject: Re: New: LC_TIME for pl_PL doesn't match standard usage

On Thu, Aug 31, 2006 at 01:47:16AM -0000, inkerman42 at gmail dot com wrote:
> Currently, glibc displays dates in the pl_PL locale as:
>
> pon sie 6 01:23:45 CEST 1984
>
> This format violates several conventions for date abbreviations in the Polish
> language. I include a patch against the current CVS localedata with the
> following changes:
>
> * non-standard weekday abbreviations are replaced with standard ones
> * non-standard month abbreviations are replaced with standard ones (based on
>   Roman numerals)
> * middle-endian format (never used in Poland) is replaced with the little-endian
>   one (by far the most popular)
> * standard padding is introduced, i.e. h:m:s are zero-padded, day of the month
>   is not padded
> * fields are properly separated
>
> With the patch, dates are displayed as:
>
> Pn, 6 VIII 1984, 01:23:45 CEST
>
> which matches the most common usage.

Well, you could use that for the long format, but it does not seem convenient for the short (abbreviated) format. Both day names and month names are variable length. My understanding is also that day and month names in Polish are spelled with small initial letters.

> Please notice that the abbreviations are no longer fixed-width. Since this is
> also the case in several other locales, I suppose it is not a problem.

The recommendation is that the abbreviated format be fixed format/length, as this is intended to be used in log messages.

best regards
Keld

You have to provide evidence. Provide URLs of official documents, railway publications, newspapers.

[Sorry for delay, I have been on vacation for the last few days.]

First, some background to answer Mr. Drepper's and Mr. Simonsen's questions.

Weekdays:

Weekday abbreviations are not part of any official standard. The ones described above are, however, used nearly universally in calendars.
Examples of use:

* http://kalendarz.pwn.pl/ [calendar of PWN (Polish Scientific Publishers), publisher of the largest and most authoritative Polish-language encyclopedias and dictionaries]
* http://lot.pl/ [timetable of LOT, the largest Polish airline]

Please note that these abbreviations can only appear standalone or as part of a standalone date (and yes, while Polish weekday names are lowercase, these abbreviations are not). To abbreviate weekday names within running text (which would be quite uncommon), one would have to use an ad hoc abbreviation following standard rules, i.e. match the case of the word, end with a consonant, and be followed by a dot, e.g. 'poniedziałek' could be abbreviated as 'pn.', 'pon.', or 'poniedz.'.

Date:

Modern dictionaries of the Polish language allow the following date abbreviations:

* 6 VIII 1984 (older dictionaries also allowed 6.VIII.1984)
* 6.8.1984, or 6.08.1984, or 06.08.1984

The use of other abbreviations (such as 1984.08.06) is explicitly discouraged, unless necessitated by specialized data processing requirements.

Online reference:

* http://so.pwn.pl/zasady.php?id=629747 [orthographic dictionary of PWN]

Examples of use:

* http://www.senat.gov.pl/senatrp/noty/dzieje.pdf [history of the Senat, the upper chamber of the Polish parliament]
* http://edukacja.sejm.gov.pl/historia_sejmu/ [history of the Sejm, the lower chamber of the Polish parliament]
* http://rjp.pl/?mod=uchwaly&id=2 [resolutions of the Polish Language Council, the official standardizing organization for the Polish language]
* http://intercity.pl/scripts/train/index.php?action=train_list [timetable of PKP, the largest Polish railway company]
* http://lot.pl/ [timetable of LOT]

> The recommendation is that the abbreviated format be fixed
> format/length, as this is intended to be used in log messages.

Ah, I wasn't aware of this recommendation. Perhaps it might be a good idea to document it somewhere? Is there some particular reason why so many locales don't follow it?
Which formats exactly should be fixed-width? For d_fmt, there should be no problem. Weekday abbreviations can be made fixed-width as well, by using a variant with N replaced with Nd. And while it isn't exactly common to mix weekday abbreviations and numeric date format, I guess it can be done, too. How about date_fmt? It's not fixed-width in the POSIX locale, either.

Has there been any activity on this bug recently? There has been no comment from any of the developers for several months. I have submitted the requested references; is that enough? My question about the formats, and which of them, if any, should be changed to fixed-width, hasn't been answered either. Is there anything else I can do to help?

The weekday abbreviations proposed by the above patch also seem to be identical to the ones used by Windows. (This is what the gtk2 calendar widget uses on Windows XP, at least.)

I applied the patch.

This change introduced a bug with the strptime function. The D_FMT format is set to "%-d %b %Y", and strptime does not seem to support the "-" in the "%-d" format specifier. This leads to the problem described here: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=243513

There are, indeed, two bugs in strptime() which prevent it from parsing d_fmt correctly. Filed as:

* http://sources.redhat.com/bugzilla/show_bug.cgi?id=4772
* http://sources.redhat.com/bugzilla/show_bug.cgi?id=4773

There's significant uproar due to the change of month abbreviations to Roman numerals. Though theoretically correct, it is not acceptable to the community at large, and there are standards and expert opinions in favour of three-letter abbreviations. The weekday abbreviation and field separator changes are under debate, too. See bug 4789 for more details. Please revert pl_PL-LC_TIME.patch ASAP.

Someone played a joke on you. This archaic date stamping using Roman month numbers was never officially approved and never in widespread use. It is only used when the author of a document wants to achieve an artistic effect.
That is why Roman numbering of months is usually found in documents together with Gothic fonts and an Anno Domini (A.D.) prefix before the date. I'm sure the glibc maintainers applied this patch in good faith. Next time, before applying such a patch, please ask the Polish Linux translation teams whether the proposed localization patch is a hostile joke. This patch makes the Polish Linux community suffer (broken apps, logs, errors in data processing). It is a proof of concept of how big a negative impact a single person can have on a whole community.

This patch breaks the national standard PN-EN 28601:2002, which is the same as ISO 8601:2004:
http://www.pkn.pl/index.php?a=show&m=katalog&id=463318&page=1
Free access via Wikipedia:
http://en.wikipedia.org/wiki/ISO_8601

PN-EN 28601:2002 (ISO 8601:2004) is the required format for data sorting in data processing machines. Example: 2007-09-24

In real life (government documents, administration, commerce), the dd-MM-CCYY format is used for comfortable viewing, where:

dd - day in two-digit format
MM - month in two-digit format
CC - century in two-digit format
YY - year in two-digit format

Example: 24-09-2007

On paper documents, a dot "." can be found instead of the dash "-" separator. However, because of the growing computerization of the country's administration and the increasing number of personal computer users, the dash is becoming more popular than the dot because it is more visible on screen and in print.

To all maintainers (no matter if it is glibc or another project): Please always verify all localization patches by sending them (in a human-readable format) to the official translation teams of the given language/region. This will keep Linux safe and not compromised.

How about something like this:

Pon, 6 sie 1984 01:23:45 CEST

It looks nicer, has non-broken endianness and fixed-width names.

ISO is an already established, widely used standard, adopted by PN (Polska Norma/Polish Norm). Using anything else, like fancy Roman formatting, is useless and troublesome.
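For reference, the numeric conventions argued over in this thread can be put side by side. This is only a minimal Python sketch: the helper names are hypothetical, and it formats by hand rather than through the pl_PL locale, so it says nothing about what glibc's locale data actually produces.

```python
from datetime import date

# Roman numerals for months 1..12, as used in the "6 VIII 1984" style.
ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]


def iso_8601(d: date) -> str:
    """Year-first form, e.g. 2007-09-24 (sorts correctly as plain text)."""
    return d.isoformat()


def day_first(d: date, sep: str = "-") -> str:
    """Zero-padded dd-MM-CCYY form, e.g. 24-09-2007; sep may also be '.'."""
    return f"{d.day:02d}{sep}{d.month:02d}{sep}{d.year:04d}"


def roman_month(d: date) -> str:
    """Unpadded day with a Roman-numeral month, e.g. 6 VIII 1984."""
    return f"{d.day} {ROMAN_MONTHS[d.month - 1]} {d.year}"


if __name__ == "__main__":
    for d in (date(1984, 8, 6), date(2007, 9, 24)):
        print(iso_8601(d), day_first(d), roman_month(d), sep=" | ")
```

Only the year-first form sorts chronologically as a plain string, which is the data-processing argument made above; the other two are display conventions.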
(In reply to comment #11)
> PN-EN 28601:2002 which is the same as ISO 8601:2004
> http://www.pkn.pl/index.php?a=show&m=katalog&id=463318&page=1

Here is something for you to criticise:
<http://en.wikipedia.org/wiki/Date_and_time_notation_by_country#Poland>

*** Bug 260998 has been marked as a duplicate of this bug. ***
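The strptime problem reported earlier in this thread (D_FMT set to "%-d %b %Y") comes down to a parser meeting a padding flag it does not recognise. As a hedged illustration only: Python's own `datetime.strptime` is a separate implementation, not glibc's, but as far as I know it also rejects `%-d`. The sketch below shows one possible workaround, stripping formatting-only flags before parsing. The helper names are hypothetical, and the flag set (`-`, `_`, `0`, `^`, `#`) is an assumption taken from the flags glibc documents for strftime.

```python
import re
from datetime import datetime

# Matches either a literal "%%" (kept as-is) or a "%" followed by one or
# more formatting-only flags directly before a conversion letter.
_FLAGS = re.compile(r"%%|%[-_0^#]+(?=[A-Za-z])")


def strip_padding_flags(fmt: str) -> str:
    """Drop formatting-only flags, e.g. "%-d" becomes "%d"; "%%" is kept."""
    return _FLAGS.sub(lambda m: "%%" if m.group() == "%%" else "%", fmt)


def parse_date(text: str, fmt: str) -> datetime:
    """Parse text with fmt after removing flags the parser may reject."""
    return datetime.strptime(text, strip_padding_flags(fmt))


if __name__ == "__main__":
    fmt = "%-d.%m.%Y"  # numeric stand-in for the d_fmt from the report
    try:
        datetime.strptime("6.08.1984", fmt)
    except ValueError as exc:
        print("parser rejected the flagged format:", exc)
    print(parse_date("6.08.1984", fmt).date())
```

A numeric format is used here instead of `%b` so that the example does not depend on which month abbreviations the active locale provides.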