----------------
HISTORY:
----------------

The hr_HR locale started out as a copy of the sl_SI locale in glibc-2.0 and was maintained by Borka Jerman-Blažič (from Slovenia). Shortly afterwards Tomislav Vujec (then at CARNet, now at Red Hat) changed it to suit hr-specific needs. After around 1998 the locale was only updated by glibc maintainer Ulrich Drepper, who added or changed portions of it as part of mass updates to many locales.

I contacted the current maintainer, Tomislav Vujec, last week and he is willing to support the changes. Also, since it's been more than a decade since he made changes to this locale, he noted that he'd be willing to pass maintainership to someone else. BTW, he is also the maintainer of bs_BA; I hope the Bosnian translation team will take over maintenance of that locale...

----------------
RATIONALE:
----------------

The point is: the hr_HR locale is now in a state of flux. It kind of works, and fails in fairly subtle ways when sorting digraphs. I have made numerous changes which I'll describe below...

Croatia doesn't have a language law or any real specification of the language rules for writing dates, monetary data and so on. Most language decisions in real life are made using commonly established conventions. I'll give the rationale for my decisions in the change descriptions below, using URLs where needed...

I really wanted to get this right, so I've read the whole archive of the libc-locale mailing list (2004-now), and also ISO/IEC TR 14652 (albeit the 2002 edition, which I found for free on the Internet). I've looked at the history of changes to the hr_HR locale through "git blame". I've also studied the sr_RS locale, which is somewhat related to hr_HR since Croatian, Bosnian and Serbian have (or had) a lot of common conventions.

Initially I only wanted to change LC_COLLATE, but it made sense to update the locale as a whole. That required far more time than I anticipated, but the changes made are worth it. I've (heavily) commented the locale, so it should be easy to maintain from now on.
(UTF-8 characters are used only in comments.)

I've also contacted all hr translation team leaders and pointed them to this bug report so they can give their opinion on these changes, since the changes will be system-wide when accepted and the translators are, by definition, at the forefront of i18n and l10n efforts.

There are some general locale system errors which are not specific to the Croatian locale, so if Ulrich Drepper (if he has some time) or someone else versed in glibc internals can look at the change descriptions for LC_COLLATE, LC_ADDRESS and LC_TELEPHONE, and help me a bit with the system errors found there while using localedef, I'd be really thankful :o)

----------------
CHANGES:
----------------

% <initial comments>

Mostly cleaned up the comments and moved repeated information into LC_IDENTIFICATION.

Added a note that the charset used in Croatia should primarily be UTF-8. Previously we used ISO-8859-2 (which should be phased out since it doesn't support the digraph characters [dž, lj and nj]).

Added my email to the authors list, just so I can be notified in the future when the locale changes.

LC_IDENTIFICATION:

I bumped the revision to 2.0 (from 1.0) since this is a major rewrite of this locale.

I have left "CARNet" and their address, although I'm not really sure why CARNet (Croatian Academic and Research Network) would have jurisdiction over the hr_HR locale. Not even the Ministry of Education of Croatia has jurisdiction over it, as they don't supply rules for writing dates or monetary strings, for example.

The category statements were updated to reflect the new changes. The standard requires the first parameter to name the standard this category complies with, but all other locales just use the locale name and a year here, so I did just that too. BTW, most locales don't list all the categories which are included in their file. For example, they usually don't include LC_MEASUREMENT. I did...

LC_CTYPE:

Although ISO/IEC TR 14652 has the controversial LC_XLITERATE category, glibc uses "translit_start" inside LC_CTYPE.
Hence I've added transliteration info (how to transliterate digraphs to ISO-8859-2 and ASCII). I'm not really sure how to test this; I hope I got it right.

There is some weird behaviour in the included "i18n"... For example, it has the same character in both the "upper" AND "lower" class, so both iswupper() and iswlower() give TRUE for <U01C5> {ǅ}. I guess this is ok. Another behaviour is that towupper() will map <U01C6> {ǆ} -> <U01C4> {Ǆ}, which can be wrong in cases where <U01C5> {ǅ} is needed. This is not ok, but it is not fixable in the current implementation anyway, so let's add it to the curiosities for now :o)

LC_COLLATE:

Major revision. I have included "iso14651_t1" like most locales do, to reap the benefits of "iso14651_t1" updates, as well as to significantly reduce the hr_HR locale size and increase readability.

collating-elements are created and linked to the right digraphs [dž, lj and nj]. BTW, "collating-element" shouldn't be used after "copy", but many locales use it since there is no other way, except putting them in "iso14651_t1".

The Croatian alphabet considers č, ć, dž, đ, lj, nj, š and ž distinct letters, and that was implemented with reorder-after statements.

localedef says I have a SYNTAX ERROR in LC_COLLATE, probably because it doesn't like the "<d><z>" digraph literal. Is this really a SYNTAX ERROR? It works though...

LC_TIME:

Names of days and months are now written with the right digraphs, and not with a combination of ASCII letters. (Digraphs can nowadays be seen in CLI apps as well. For example `cal 2009`.)

d_t_fmt was changed to a format like: "Ponedjeljak, 31. Kolovoz 2009. 16:35:05 CEST" (the best we can do in the current implementation; Croatian uses declension in month names like most Slavic languages) [ This format can be seen on the Croatian government pages http://vlada.hr/ ]

date_fmt was changed to a format like: "Pon, 31.08.2009. 16:49:36 CEST" [ Croatia in general doesn't use short versions of month or day names.
For the month we usually use a number, as seen on the pages of the Croatian president [ http://www.predsjednik.hr/ ] ]

d_fmt was changed to a format like: "01.09.2009." for the same reasons as in the date_fmt change explanation. Croatians have read and written the dd.mm.yyyy format for decades. If someone objects that it confuses people who use the mm.dd.yyyy (US) format, I agree, but this is the hr_HR locale and this form is widely used in Croatia. System software should use the YYYY-MM-DD format anyway, regardless of locale.

t_fmt was changed to a format like: "HH:MM:SS"

I've added week, first_weekday and first_workday. first_weekday and first_workday are set to Monday.

LC_NUMERIC

I've set thousands_sep to '.', so the formatting of numbers is "12.345.678,90" or "-12.345.678,90".

LC_MONETARY

I've lowercased currency_symbol to "kn" since that is the form the majority of citizens and shops use nowadays. See online shops: http://www.links.hr/ , http://www.profil.hr/ , and many others. You can see there is no rule for this on Wikipedia: http://hr.wikipedia.org/wiki/Hrvatska_kuna , where they note that the symbol is "Kn" but use "kn" a lot on the same page.

I've set mon_thousands_sep to '.' as in LC_NUMERIC.

I've changed the monetary string format to: "14.986,42 kn", "-14.986,42 kn" and for international to "HRK 14.986,42" and "-HRK 14.986,42", as was agreed upon in 2003 by Tomislav Vujec on libc-alpha [ http://sourceware.org/ml/libc-alpha/2003-04/msg00254.html ]. I'm not really sure that in the international version HRK should come before the value (as said at the top, there is no law on how to write monetary values in Croatia, just conventions). I'd leave them the same as the local versions and just use HRK instead of kn, but I've complied with the libc-alpha agreement of 2003 for now.

LC_MESSAGES:

I've removed the trailing .* in yesexpr and noexpr, as it was discussed on the libc-locales mailing list [ http://sources.redhat.com/bugzilla/show_bug.cgi?id=71 ] that it's not really necessary.
I didn't include 1 in yesexpr and 0 in noexpr, although this was discussed on the libc-locales mailing list too. Not many locales use it, so I've skipped it for now.

I've added yesstr and nostr.

LC_NAME:

Changed name_fmt to "salutation name other_name surnames".

I've added name_mr, name_mrs, and name_miss. Croatian doesn't have a gender-neutral salutation, nor a neutral female (name_ms) version of the salutation.

LC_ADDRESS:

postal_fmt is changed, so that an address now looks like:

  Company name
  Department name
  Person's name
  C/O Person or Organization
  Street name and house number
  ZIP Code and City name
  Country

localedef complains that postal_fmt has an invalid escape sequence, I don't know why!?!

I've added definitions for many missing attributes: country_post, country_car, country_isbn, lang_name, lang_ab, lang_term and lang_lib.

LC_TELEPHONE

I've changed tel_int_fmt to look like: "+<country code> <area code without leading 0> <local number> <possible ext>"

I've changed tel_dom_fmt to look like: "<possible area code with leading 0> <local number> <possible ext>"

localedef complains that tel_int_fmt and tel_dom_fmt have an invalid escape sequence, I don't know why!?!

LC_PAPER

A4 is used in Croatia.

LC_MEASUREMENT

Croatia uses metric measurements.

----------------
TESTING:
----------------

To see the file without Uxxxx literals, I made this ugly one-liner which makes an HTML version of it. Just change the file name at the start and you can use it with other locale files as well.

( FILE=hr_HR; sed -e 's/<U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)>/\&lt;\&#x\1;\&gt;/g' < $FILE > $FILE.tmp; sed -e 's/</\&lt;/g' < $FILE.tmp > $FILE.html; sed -e 's/>/\&gt;/g' < $FILE.html > $FILE.tmp; echo "<pre>" > $FILE.html; cat $FILE.tmp >> $FILE.html; rm $FILE.tmp )

Also, to test collation in the hr_HR locale I made a small dictionary which has the Croatian digraphs in all forms, as well as the letters which are considered distinct. To test collation with it I do the following: randomize it with `sort -R`, and re-sort it.
The resulting file should have the same MD5 sum as the starting one...

Testing the other locale categories is a bit harder, but small C programs work well, and most of the code templates you need are in the glibc source under localedata anyway.
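The shuffle-and-re-sort check described under TESTING can be sketched without an installed hr_HR locale. The snippet below is only an illustration, not how glibc implements LC_COLLATE: `collation_key` is a hypothetical helper that ranks letters by the 30-letter Croatian alphabet and treats "dž", "lj" and "nj" as single letters. It assumes every such letter pair is a digraph, which is wrong for a few words (for example "nadživjeti" or "injekcija"), and it places foreign letters after "ž".

```python
import random

# Croatian alphabet in dictionary order; "dž", "lj" and "nj" count as one letter.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i",
            "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t",
            "u", "v", "z", "ž"]
RANK = {letter: index for index, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")
# Single-code-point forms of the digraphs (after lowercasing) and their
# two-letter equivalents.
LIGATURES = {"\u01c6": "dž", "\u01c9": "lj", "\u01cc": "nj"}


def collation_key(word):
    """Hypothetical sort key: one rank per Croatian letter, case-insensitive."""
    text = word.lower()
    for single, pair in LIGATURES.items():
        text = text.replace(single, pair)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Letters outside the alphabet sort after "ž" (an assumption).
            key.append(RANK.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


# A tiny, already sorted word list standing in for the attached dictionary.
words = ["cesta", "čaša", "ćup", "dan", "džep", "đak", "lak", "luk", "ljeto",
         "mak", "noga", "nula", "njiva", "sat", "šuma", "zid", "žaba"]

shuffled = words[:]
random.Random(0).shuffle(shuffled)            # stands in for `sort -R`
resorted = sorted(shuffled, key=collation_key)  # stands in for `sort`
print(resorted == words)
```

With plain code-point ordering "ljeto" would sort before "luk"; with the digraph-aware key it sorts after it, which is the behaviour the reorder-after rules are meant to produce.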
Created attachment 4158 [details]
new hr_HR locale file

This is not a patch file, since it's a completely new file and a patch would be huge for no good reason. This file is just 1/7th the size of the file it replaces!
Created attachment 4159 [details]
Small Croatian dictionary for testing

A small dictionary of already sorted Croatian words which contain the digraphs, their variations, and the letters which are considered distinct in the Croatian alphabet and therefore affect sorting. Use `sort -R` to randomize it, and `sort` to check that you get the same version back.
Just a note that I fully support the changes. Furthermore, since I moved out of Croatia 10 years ago, I am unable to stay in sync with the policies and rules relevant to the language and locale. Therefore, I would like to ask that a new maintainer be selected. I don't know if there is an official process for this nowadays, but since Dragan did all this work, I would like to support him if he wants to take over that role.
Thank you, Tomislav, for your support. The KDE l10n team leader contacted me; I'm still waiting for the GNOME l10n team to have their say. As for the new maintainership, I'm willing to accept it for the hr_HR locale.
The GNOME translation team still hasn't responded to my query. Instead I have contacted the Croatian Ubuntu team, which also does translation work, as well as a Croatian Linux newsgroup. So let's wait a few more days...
Created attachment 4175 [details]
A new version of the hr_HR locale (with lowercased day and month names)
Apart from lowercasing the day and month names, nobody had any objections to this new version of the locale. I think this can be committed to libc-locales.

Thank you all for your time, N::
The locale doesn't compile correctly:

/home/drepper/gnu/libc/localedata/locales/hr_HR:143: LC_COLLATE: syntax error
LC_ADDRESS: invalid escape `%n' sequence in field `postal_fmt'
LC_TELEPHONE: invalid escape sequence in field `tel_int_fmt'
LC_TELEPHONE: invalid escape sequence in field `tel_dom_fmt'
no output file produced because warnings were issued
(In reply to comment #8)
> The locale doesn't compile correctly:
>
> /home/drepper/gnu/libc/localedata/locales/hr_HR:143: LC_COLLATE: syntax error
> LC_ADDRESS: invalid escape `%n' sequence in field `postal_fmt'
> LC_TELEPHONE: invalid escape sequence in field `tel_int_fmt'
> LC_TELEPHONE: invalid escape sequence in field `tel_dom_fmt'
> no output file produced because warnings were issued

Hi Ulrich, thanks for your time and reply... I'm aware of these errors, but they are system-wide errors rather than hr_HR ones. I wrote about them in the long explanation... I'm quite sure you didn't have time to read it all, so let me repeat the last paragraph of RATIONALE, which is of importance here:

------------------------------------------------------------------------------
There are some general locale system errors which are not specific to the Croatian locale, so if Ulrich Drepper (if he has some time) or someone else versed in glibc internals can look at the change descriptions for LC_COLLATE, LC_ADDRESS and LC_TELEPHONE, and help me a bit with the system errors found there while using localedef, I'd be really thankful :o)
------------------------------------------------------------------------------

Allow me to elaborate just a bit to make it easier for you:

"hr_HR:143: LC_COLLATE: syntax error"
:: I've used quotes to mark the digraph <d><z>. I used that designation since the same designation is used in the "iso14651_t1_common" file... Look with: `grep '<d><z>' iso14651_t1_common`

"LC_ADDRESS: invalid escape `%n' sequence in field `postal_fmt'"
:: %n is a valid escape sequence per ISO/IEC TR 14652. It states: "%n -- Person's name, possibly constructed with the LC_NAME "name_fmt" keyword"

"LC_TELEPHONE: invalid escape sequence in field `tel_int_fmt'" and "LC_TELEPHONE: invalid escape sequence in field `tel_dom_fmt'"
:: Again, as per ISO/IEC TR 14652, they contain no invalid escape sequence... %c %a %A %l %e and %t are all mentioned in the standard.

Thank you once more... N::
Dragan, thank you for your work. It is true that the locales in glibc are not fully ISO/IEC 14652 compliant, in particular some fields that should be used in fact are not. I'm not personally sure why this is the case, probably it's for purely historical reasons. However, I believe the greatest value lies in consistency, and if no current locales use %n in postal_fmt and %e and %t in tel_*_fmt, neither should hr_HR as the programs using these locales probably do not expect to find these field descriptors there. So let's not conflate the issue of unsupported field descriptors with the new hr_HR locale; could you please submit an hr_HR locale version that does not use these field descriptors? Since you got a buy-in from other Croatians active in this area, I think we can commit the new locale speedily afterwards. (Regarding the issue of unsupported field descriptors, if you are interested in pursuing that further. A simple technical fix is to simply patch locale/programs/ld-{telephone,address}.c to allow these. However, we should do this with consideration to locale consistency and current usage of these categories in programs. This needs to be researched and I think the next reasonable step is to document the currently supported field descriptors in "glibc style locales". We can then think of how to proceed further while our users will already have a valuable reference. This process can be done gradually, category by category. Does that sound sensible?)
On Sat, Feb 16, 2013 at 12:39:45AM +0000, pasky at ucw dot cz wrote:

> (Regarding the issue of unsupported field descriptors, if you are interested in
> pursuing that further. A simple technical fix is to simply patch
> locale/programs/ld-{telephone,address}.c to allow these. However, we should do
> this with consideration to locale consistency and current usage of these
> categories in programs. This needs to be researched and I think the next
> reasonable step is to document the currently supported field descriptors in
> "glibc style locales". We can then think of how to proceed further while our
> users will already have a valuable reference. This process can be done
> gradually, category by category. Does that sound sensible?)

I would rather take another approach, and that would be to further implement ISO TR 14652 or the new version thereof, ISO TR 30112. ISO TR 30112 is closer to glibc, as some things that glibc implements are now specified in 30112, including LC_PAPER.

Best regards
Keld
Thank you for your comments! So if I understand correctly, I just need to trim LC_ADDRESS and LC_TELEPHONE to comply with the current support in glibc, and you'll accept the whole patch? That would be great, since it resolves a lot of issues, shortens the file, makes it more manageable for future changes, and so on... Back then I read the whole of ISO/IEC TR 14652, tried to mimic the format of other locales as much as possible, and I think I made a good patch. In the end I thought I'd need to learn flex & bison to improve glibc's parsing of that data, but that was beyond me.
Keld, of course using the newer standard makes sense; however, I'm not sure what you mean by "further implement" and how that differs from what I wrote. If you are interested in discussing this further, I propose we move the discussion to the mailing list where more people could follow it. (Note that I myself don't have the time to pursue the issue itself, so it makes sense to talk more about it only if someone intends to do anything about it.)

Dragan, I'm sorry, I missed the LC_COLLATE syntax error. Any reason why we cannot use the Unicode entity there instead? Also, I'm wondering how testing of this locale was done if it doesn't even compile with glibc's localedef now? And which of the people that provided support for the new locale actually tested it, rather than just embracing the idea?
On Sun, Feb 17, 2013 at 12:13:28AM +0000, pasky at ucw dot cz wrote:

> http://sourceware.org/bugzilla/show_bug.cgi?id=10580
>
> --- Comment #13 from Petr Baudis <pasky at ucw dot cz> 2013-02-17 00:13:28 UTC ---
> Keld, of course using the newer standard makes sense; however, I'm not sure
> what do you mean by "further implement" and how that differs from what I wrote.
> If you are interested in discussing this further, I propose we move the
> discussion to the mailing list where more people could follow it. (Note that I
> myself don't have the time to pursue the issue itself, so it makes sense to
> talk more about it only if someone intends to do anything about it.)

So where should we have the discussion? I did think that this list was relevant. Anyway, the differences are not big. It is mostly to align with the current glibc implementation, and then introduce 2 novelties.

Best regards
Keld
Hi, let me be frank. This was made in 2009. I spent at least a week reading ISO documents, comparing with other locales similar to hr_HR, contacting the Croatian Linux User Group and writing tests. Every question concerning compile errors was answered in the huge description of the patch, and repeated in comment #9, since obviously Drepper didn't read it in the first place when he dissed the patch.

If you don't want LC_ADDRESS or LC_TELEPHONE, copy them from the C locale. If you don't want to implement "<d><z>", comment it out... Also, there is no standard test suite for these locale categories. I find it hard to believe that I (or any locale writer) should have to write custom test suites from scratch again, nor do I have the time.

I repeat, this patch was a big improvement in 2009. I don't have time to write test suites from scratch again, let alone to reread the ISO documents and patch libc itself. It's your choice whether you ever apply this patch.
I have read everything you have written in this bug report; I might have missed something, but I asked my questions because I believe they weren't answered in the previous comments.

My question was not aimed at test suites, though I appreciate your effort to test the collation rules. I was just wondering whether and how this locale (considering that it cannot be compiled by localedef as it is now) was tried out with actual commonly used software, and whether that was done just by you or by the other people supporting it too.

If you could adjust the locale into a compilable form, we can easily ask others to test it so that we can incorporate any bugfixes before the next release; this (besides a few simple sanity checks I'll do) does not need to block committing the new locale.
Heyyah Petr, thanks for reading and for the reply. Give me a few days, and I'll try to test and fix this patch so that it compiles with 2.17.

I cannot vouch for the testing done by the others who saw and approved this patch. I did it myself, as I was displeased with the state of the hr_HR locale back then. I was mainly interested in collation, but did a lot more research than intended, and in turn patched all categories of the locale. Along the way I cleaned, commented and trimmed the locale file considerably.

bye for now, N::
Hi! Yes, I fully appreciate your efforts - I just want to confirm the status of the new locale regarding how it has been tested. Glad you decided to update your version of the locale, we will be looking forward to the new version. I can't think of specific updates that would be required for 2.17 (there were no changes in stock hr_HR since 2009), so mainly making it compile would be great.
Created attachment 6876 [details]
An updated version of hr_HR which solves problems with LC_COLLATE, LC_ADDRESS and LC_TELEPHONE sections

This is the promised update to the hr_HR locale. Changes are:

 - bumped the revision to 2.1 and the date to the current date
 - removed duplicate character transliterations from LC_CTYPE which are already found in i18n
 - fixed the LC_COLLATE error, and tested with the small Croatian dictionary provided in 2009 using "sort -R dict_file > scrambled_file; sort scrambled_file > sorted_file"; the md5 sums of the original <dict_file> and <sorted_file> are the same
 - updated some comments and some spacing
 - changed thousands_sep and mon_thousands_sep to " " instead of the "." char, to comply with the suggestions in language books published since 2009
 - updated LC_ADDRESS to remove the %n (person's name) field since it's not yet available in the code. Other locales fall back to %a (care of person or organization) and that's ok for now.
 - cleaned LC_TELEPHONE by removing the %t (space or null string) and %e (extension) fields, which are currently unsupported in the code. Fell back to "+%c %a %l" and "%A %l" as seen in other locales.

The locale now compiles cleanly using localedef...
Created attachment 6877 [details]
An updated version of hr_HR which solves problems with LC_COLLATE, LC_ADDRESS and LC_TELEPHONE sections

Fixed small typos in comments...

Reset the bug status to "NEW", to signify it's ready for review by the maintainers of the library...

Thanks for your time, N::
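To make the two fallback telephone formats concrete, here is a minimal sketch of how an application might expand them. `expand_tel_fmt` is a hypothetical helper, not a glibc function; it handles only the four descriptors used above (%c country code, %a area code without the leading 0, %A area code with the leading 0, %l local number), and the sample number is made up for illustration.

```python
import re


def expand_tel_fmt(fmt, fields):
    """Replace %c, %a, %A and %l in fmt with the matching entries of fields."""
    def substitute(match):
        return fields[match.group(1)]
    return re.sub(r"%([caAl])", substitute, fmt)


# Illustrative values: 385 is Croatia's country code, 01 the Zagreb area code;
# the local number is a placeholder.
fields = {"c": "385", "a": "1", "A": "01", "l": "2345678"}

international = expand_tel_fmt("+%c %a %l", fields)  # tel_int_fmt
domestic = expand_tel_fmt("%A %l", fields)           # tel_dom_fmt
print(international)
print(domestic)
```

The unsupported %e and %t descriptors are deliberately left out here, mirroring the trimmed formats in the updated locale.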
Will you accept this patch? It also fixes #15264
Created attachment 7010 [details]
Updated version of hr_HR

Removed CARNet as the source of the locale, along with their address, since I don't have any official relation to them and the locale is completely changed.

Small fixes in the comments of the locale.

Bumped the version to 2.2 and the date to 2013-05-01.
Where did it hang for so long? First weekday still wrong in Fedora after 1 year https://sourceware.org/bugzilla/show_bug.cgi?id=14892
*** Bug 14892 has been marked as a duplicate of this bug. ***
week settings should be fixed by: https://sourceware.org/ml/libc-alpha/2016-04/msg00419.html
Created attachment 9196 [details]
Added week and first_weekday to the locale

As requested, the locale now contains the missing "week" and "first_weekday" fields...
Created attachment 9197 [details]
Small patch removing duplicated fields

A small fix removing the duplicated week and first_weekday fields...

SemiRocket and Mike, thank you for your interest in getting this moving again. If you find any mistakes, please let me know so we can finally ship this with glibc-2.24 and finally have a clean and, more importantly, correct locale.
Will that affect the sorting order of the sort command on the GNU/Linux command line? If yes, I've been waiting for that status to change to FIXED since 2014. :)

I'm sorry for not being able to contribute a constructive comment, but I'm hoping to keep this alive since the last comment was made a year ago.

Thanks
In the #1 post from 2009, look under TESTING... there you have a sample using sort command...
(In reply to Dragan Stanojevic - Nevidljivi from comment #29)
> In the #1 post from 2009, look under TESTING... there you have a sample
> using sort command...

I had no idea it could work that way. This will save me a lot of the trouble I'm going through right now when sorting Croatian text. I'll try to contact you via e-mail because I have some more questions about localization files in general, and I'm thinking about changing one, so I need some help. I don't want to spam this report as it serves a different purpose. I just hope to see hr_HR.utf8 in Debian soon. Many thanks for the help and effort.
*** Bug 22518 has been marked as a duplicate of this bug. ***
Created attachment 10651 [details] 0001-hr_HR-locale-various-updates-BZ-10580.patch
Created attachment 10652 [details] 0002-Add-test-case-for-collation-in-hr_HR-locale.patch
Created attachment 10653 [details] 0003-Fix-test-case-for-hr_HR-monetary-formatting.patch
Created attachment 10654 [details] 0004-hr_HR-locale-fix-collation-and-expand-collation-test.patch
The patches attached to comment#32, comment#33, comment#34, and comment#35 : 0001-hr_HR-locale-various-updates-BZ-10580.patch 0002-Add-test-case-for-collation-in-hr_HR-locale.patch 0003-Fix-test-case-for-hr_HR-monetary-formatting.patch 0004-hr_HR-locale-fix-collation-and-expand-collation-test.patch update Dragan Stanojevic’s patch to current glibc master.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 5e56e937c9144e70a16793d2c5aa22d1bd0b2c18 (commit) via cf4341ca90164398c05e74f72ff19dc52136731c (commit) via 9ca6b343783236fda88e9712f29b46ec875d4156 (commit) via 37075ae18d10802b9d62db3fbc910b30e01398d4 (commit) from f33632ccd1dec3217583fcfdd965afb62954203c (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5e56e937c9144e70a16793d2c5aa22d1bd0b2c18 commit 5e56e937c9144e70a16793d2c5aa22d1bd0b2c18 Author: Mike FABIAN <mfabian@redhat.com> Date: Thu Nov 30 12:13:02 2017 +0100 hr_HR locale: fix collation and expand collation test file * localedata/locales/hr_HR (LC_COLLATE): Fix collation to make test case pass. * localedata/hr_HR.UTF-8.in: Add more test strings. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cf4341ca90164398c05e74f72ff19dc52136731c commit cf4341ca90164398c05e74f72ff19dc52136731c Author: Mike FABIAN <mfabian@redhat.com> Date: Thu Nov 30 10:50:44 2017 +0100 Fix test case for hr_HR monetary formatting * stdlib/tst-strfmon_l.c: Fix testcase. Needed because of [BZ #10580] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9ca6b343783236fda88e9712f29b46ec875d4156 commit 9ca6b343783236fda88e9712f29b46ec875d4156 Author: Dragan Stanojević - Nevidljivi <invisible@hidden-city.net> Date: Thu Nov 30 10:02:55 2017 +0100 Add test case for collation in hr_HR locale * localedata/Makefile: Add hr_HR.UTF-8 to test-input and to the list of locales to built for testing. * localedata/hr_HR.UTF-8.in: New file. 
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=37075ae18d10802b9d62db3fbc910b30e01398d4 commit 37075ae18d10802b9d62db3fbc910b30e01398d4 Author: Dragan Stanojević - Nevidljivi <invisible@hidden-city.net> Date: Thu Nov 30 09:14:51 2017 +0100 hr_HR locale: various updates [BZ #10580] [BZ #10580] * localedata/locales/hr_HR (LC_COLLATE): Base collation rules on iso14651_t1. * localedata/locales/hr_HR (LC_TIME): Sync month and day names with CLDR (except use ligatures for the digraphs, CLDR does not use the ligatures), add first_workday, some fixes in the date and time formats. * localedata/locales/hr_HR (LC_CTYPE): Add transliteration rules for Đ and đ. * localedata/locales/hr_HR (LC_MONETARY): Change currency_symbol to lower case. p_cs_precedes and n_cs_precedes should be 0 instead of 1. Add int_p_cs_precedes and int_n_cs_precedes. * localedata/locales/hr_HR (LC_NUMERIC): Change thousands_sep to "<U202F>" (NARROW NO-BREAK SPACE) and grouping to 3;3 (Agrees with LC_MONETARY now). * localedata/locales/hr_HR (LC_TELEPHONE): Add tel_dom_fmt. * localedata/locales/hr_HR (LC_NAME): Add name_mr, name_mrs, and name_miss. * localedata/locales/hr_HR (LC_ADDRESS): Add country_post, country_isbn, and lang_lib. Change postal_fmt. change ----------------------------------------------------------------------- Summary of changes: ChangeLog | 39 + localedata/Makefile | 4 +- localedata/hr_HR.UTF-8.in | 70 ++ localedata/locales/hr_HR | 2324 ++++----------------------------------------- stdlib/tst-strfmon_l.c | 8 +- 5 files changed, 303 insertions(+), 2142 deletions(-) create mode 100644 localedata/hr_HR.UTF-8.in
Fixed in glibc master.
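As a rough illustration of the committed LC_NUMERIC settings (thousands_sep U+202F NARROW NO-BREAK SPACE, grouping 3;3) combined with the decimal comma used throughout this report, the sketch below groups digits by hand. `format_hr_number` is a hypothetical helper written for this example; it is not how glibc's printf family applies the locale, and passing "." as the separator reproduces the 2009 proposal instead.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, the committed thousands_sep


def format_hr_number(value, decimals=2, thousands_sep=NNBSP, decimal_point=","):
    """Group the integer part in threes and use a decimal comma."""
    sign = "-" if value < 0 else ""
    whole, _, frac = f"{abs(value):.{decimals}f}".partition(".")
    groups = []
    while len(whole) > 3:
        groups.insert(0, whole[-3:])  # peel off the last three digits
        whole = whole[:-3]
    groups.insert(0, whole)
    grouped = thousands_sep.join(groups)
    return sign + grouped + (decimal_point + frac if frac else "")


print(format_hr_number(12345678.9))
print(format_hr_number(-12345678.9, thousands_sep="."))
```

Appending " kn" to the result gives the local monetary form discussed earlier, with the currency symbol after the value.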
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 96b06a19e602557bfa668ad9c1a9f29044d3e774 (commit) via 1f6d91f328b7699610210d7d56d2cc49d60e1c27 (commit) from 2e49fed84c9ada0ad54445d197060dc28ee94103 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=96b06a19e602557bfa668ad9c1a9f29044d3e774 commit 96b06a19e602557bfa668ad9c1a9f29044d3e774 Author: Mike FABIAN <mfabian@redhat.com> Date: Mon Dec 4 17:46:28 2017 +0100 tr_TR locale: Base collation on iso14651_t1 [BZ #22527] [BZ #22527] * localedata/locales/tr_TR (LC_COLLATE): Base collation rules on iso14651_t1. A test file localedata/tr_TR.UTF-8.in is already available, this rewrite of the collation rules does reproduce the test file in the same order. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1f6d91f328b7699610210d7d56d2cc49d60e1c27 commit 1f6d91f328b7699610210d7d56d2cc49d60e1c27 Author: Mike FABIAN <mfabian@redhat.com> Date: Mon Dec 4 13:10:29 2017 +0100 hr_HR locale: Don’t use single code points for the digraphs in LC_TIME [BZ #10580] * localedata/locales/hr_HR (LC_TIME): Use two letters for the digraphs in the month and day names. Using single code points for digraphs is deprecated. While there are dedicated Unicode codepoints, for the digraphs, these are included for backwards compatibility and modern texts use a sequence of Basic Latin characters. See: https://www.unicode.org/faq/ligature_digraph.html This makes the month and day names agree exactly with CLDR now, CLDR does not use the single code points for the digraphs either. 
----------------------------------------------------------------------- Summary of changes: ChangeLog | 20 + localedata/locales/hr_HR | 18 +- localedata/locales/tr_TR | 2112 ++-------------------------------------------- 3 files changed, 82 insertions(+), 2068 deletions(-)
Big thanks to Mike FABIAN for working on resolving this, and for being thorough with the final solution: brainstorming on digraph usage, bringing the locale more in line with CLDR, and making it more practical by avoiding digraphs in LC_TIME...
(In reply to cvs-commit@gcc.gnu.org from comment #39) > [...] > commit 1f6d91f328b7699610210d7d56d2cc49d60e1c27 > Author: Mike FABIAN <mfabian@redhat.com> > Date: Mon Dec 4 13:10:29 2017 +0100 > > hr_HR locale: Don’t use single code points for the digraphs in LC_TIME > > [BZ #10580] > * localedata/locales/hr_HR (LC_TIME): Use two letters for the > digraphs in the month and day names. Using single code points for > digraphs is deprecated. While there are dedicated Unicode > codepoints, for the digraphs, these are included for backwards > compatibility and modern texts use a sequence of Basic Latin > characters. See: https://www.unicode.org/faq/ligature_digraph.html > This makes the month and day names agree exactly with CLDR now, > CLDR does not use the single code points for the digraphs either. > [...] Before this change all abmon items (abbreviated month names) were 3 letters long. Now all are 3 letters long except the second item (February, Feb) which is "velj", 4 letters long. Previously it was "velj" therefore 3 letters. Dragan, wouldn't you prefer it to be "vel", consequently 3 letters long? The page https://vlada.gov.hr/ uses "Vel". CLDR uses "velj" so if you'd like this change I suggest creating a new ticket in in CLDR first: http://unicode.org/cldr/trac/newticket
> Before this change all abmon items (abbreviated month names) were 3 letters
> long. Now all are 3 letters long except the second item (February, Feb)
> which is "velj", 4 letters long. Previously it was "veǉ" therefore 3
> letters. Dragan, wouldn't you prefer it to be "vel", consequently 3 letters
> long?

True, before this change all were 3 letters, but through discussion with Mike several arguments were made against using digraphs in LC_TIME:

- Unicode has since moved away from promoting them.
- They have had a lot of problems with digraphs and even tried to solve them with the "U+034F COMBINING GRAPHEME JOINER" fix, so that digraphs would be glued together with it but still written as two separate letters.
- Digraphs often look ugly in fonts, or are missing from them and get substituted from another font. Terminals in general don't have Unicode fonts, and in TUI apps it is better not to force digraphs; examples would be `cal`, TUI mail clients, the shell prompt, tmux, ...
- Abbreviations in many glibc locales aren't 3 letters long. There is no rule that they need to be; they just need to be shorter.
- I wrongly assumed that all abbreviations needed to be of the same length; they don't. If that were the case I'd be more stubborn about digraphs; as it is, I'm more in favor of "Velj".
- Many applications and many programmers decided to avoid the glibc locale since it was ugly. They either made their own (LibreOffice, for example), or they do something like taking the first 3 letters of a month or day name, which gives them the wrong value "Vel". "lj" is a digraph and a distinct phoneme, sounding different from a simple "l". IMO "Vel" is more wrong than "Velj".
- Glibc and CLDR were once very stern in what they'd accept. Now they've become more pragmatic. One result is this issue with digraphs, but I hope it is clear that it was done with end users in mind. There are not many Unicode digraphs in use, and people will continue to type two letters for them since entering digraphs is still awkward.
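The "first 3 letters" problem can be made concrete with a small Python sketch. It is a hypothetical helper, not code from glibc, CLDR or LibreOffice, and it assumes that every "dž", "lj" or "nj" sequence is a digraph. That holds for the month names, but not for every Croatian word, so treat it as an illustration only.

```python
# Croatian digraphs written as two letters (lowercase); "\u017e" is "ž".
DIGRAPHS = ("d\u017e", "lj", "nj")


def split_letters(word):
    """Split a word into Croatian letters, keeping digraphs together.

    Assumption: any "dž", "lj" or "nj" sequence is treated as a digraph.
    """
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def abbreviate(word, n=3):
    """Return the first n letters of a word, never cutting a digraph."""
    return "".join(split_letters(word)[:n])


print("veljača"[:3])          # naive slicing cuts the digraph: vel
print(abbreviate("veljača"))  # digraph-aware: velj
print(abbreviate("srpanj"))   # no digraph in the first three letters: srp
```

Counted in Croatian letters, "velj" is still a three-letter abbreviation; it is only four code points long.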
In the end, this patch was written more than 8 years ago. It was a complete rewrite of the old locale, and the intention was to make it correct and easy to read and maintain. During those 8+ years several bugs were filed against hr_HR, and all of them were dups of this one, since I had solved all of those issues back then. Yet so many maintainers avoided this patch for one reason or another. During the discussion with Mike, I really wasn't in favor of forcing digraphs except in LC_COLLATE, since that would be awkward for end users, and most other locales avoid digraphs anyway. Even the Unicode FAQ notes that they're troublesome in many practical ways. I'm open to the thoughts and arguments of others, especially end users, but this patch, compared in any conceivable way to the previous state, is a huge push towards a maintainable and clear hr_HR locale.
(In reply to Rafal Luzynski from comment #41)
> Before this change all abmon items (abbreviated month names) were 3 letters
> long. Now all are 3 letters long except the second item (February, Feb)
> which is "velj", 4 letters long. Previously it was "veǉ" therefore 3
> letters. Dragan, wouldn't you prefer it to be "vel", consequently 3 letters
> long?

No, I don’t think this makes sense because lj belongs together; one should not cut this digraph in the middle. Several other locales also have abbreviations for the month and day names that are longer than 3 characters. I think that is OK if it makes no sense to cut off after 3 characters.
That's OK. If "lj" is a digraph which should not be split, so that "vel" is not correct and "velj" is the correct abbreviation, then let's leave it as is.