Ukrainian locale in gLibc still out of date. From: Petter Reinholdtsen To: Volodymyr M. Lisivka Date: 23 Nov 2003 11:14:52 +0100 Sorry for my late reply. ... ;)
Created attachment 23 [details] Latest Ukrainian locale Latest version of the proposed new Ukrainian locale. I have made no changes since 2003-11-25 and have seen no bug reports, so I consider it "stable".
Here are some unusual things I noticed about the uk_UA-2.0 locale submitted 2004-03-04.
- The LC_IDENTIFICATION block uses the <U#> notation for the standard reference. No other locale uses this notation there. I'm not sure what the correct format for that section is, but I suggest converting it from <U#> to the ASCII chars.
- All 'copy "<file>"' statements use the <U#> notation as well. This is different from other locales' use of the copy notation.
- The locale is well documented, including the indented values in comments. This is good.
- Did the original locale writer (Denis V. Dmitrienko) approve the changes? It appears to be easier to get locale changes accepted if the original author agrees.
- It might help to have references to standards bodies or other official-looking entities documenting the correct values.
I'm not sure what more to suggest.
> - The LC_IDENTIFICATION block uses the <U#> notation for the
> standard reference. No other locale uses this notation there.
> I'm not sure what the correct format for that section is, but
> suggest converting it from <U#> to the ASCII chars.
>
> - All 'copy "<file>"' statements use the <U#> notation as well.
> This is different from other locales' use of the copy notation.
localedef -cv shows warning about _ALL_ parameters not encoded as <U#....>. You should fix "localedef" first.
> - Did the original locale writer (Denis V. Dmitrienko) approve the
> changes? It appears to be easier to get locale changes accepted
> if the original author agrees.
Yes, he is agry. Check your old emails - you talking with him about that.
> - It might help to have references to standards bodies or other
> official-looking entities documenting the correct values.
Very hard to do for me - documents not available online. I need special access to read them in technical library or pay money to read them online.
Sorry for my late reply. It is hard to find time to assist in improving the glibc locales.
[Volodymyr M. Lisivka]
> localedef -cv shows warning about _ALL_ parameters not encoded
> as <U#....>. You should fix "localedef" first.
That is probably a good idea. I'm sure patches are accepted by the glibc developers. :)
> Yes, he is agry. Check your old emails - you talking with
> him about that.
I hope 'is agry' means 'agrees'. That is good, as it is a lot easier to get locale changes past the glibc developers if the original author approves the changes.
> Very hard to do for me - documents not available online. I
> need special access to read them in technical library or
> pay money to read them online.
Hm, yeah. That is a common problem for several locales. But it might not be a major problem when the original author approves. Can you get Denis V. Dmitrienko to add a comment to bugzilla stating that he approves the new locale?
I had a fresh look at the attached uk_UA locale, and noticed a few new issues (in addition to the use of <U#> in content and copy statements). Can you have a look at these too, and provide an updated version of the locale?
- Both yesexpr and noexpr include '.*' at the end. The POSIX locale does not include such an ending, so I recommend removing this ending to be comparable to the POSIX locale.
- yesexpr and noexpr include '1' and '0'. The POSIX locale does not include numbers, and I recommend removing these numbers from the regexes.
There might be other problems with the locale as well. I am working on a script to check the content of locales, and it is still very sketchy. :) You might find some useful information from <URL: http://www.student.uit.no/~pere/linux/glibc/ > about writing glibc locales. There isn't much there, but it is all I have got at the moment. :)
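To illustrate the yesexpr remarks above, here is a minimal Python sketch. It assumes the POSIX locale's yesexpr is "^[yY]" and uses Python's `re` module as a stand-in for the POSIX regex matching that glibc performs. The sample answers are made up for illustration, and the pattern with the trailing '.*' is not copied from the attached locale. It suggests that a trailing '.*' does not change which answers are accepted, and that a digit such as '1' is not accepted by the POSIX pattern.

```python
import re

# Assumption: yesexpr of the POSIX locale, and the same pattern with '.*' appended.
posix_yes = re.compile(r"^[yY]")
with_tail = re.compile(r"^[yY].*")

# Illustrative answers only; not taken from the thread.
answers = ["y", "Yes", "yes please", "no", "1", ""]

# Both patterns are searched from the start of the answer, so the
# optional tail cannot turn a match into a non-match or vice versa.
same_verdict = all(
    bool(posix_yes.search(a)) == bool(with_tail.search(a)) for a in answers
)
print(same_verdict)

# A digit is not an affirmative answer under the POSIX pattern.
print(bool(posix_yes.search("1")))
```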
Did you find time to look at my comments regarding the uk_UA locale?
I will release an updated version soon.
Created attachment 433 [details] Latest Ukrainian locale The latest proposed Ukrainian locale. Changes: in LC_NUMERIC the dot was changed to a comma; in LC_TIME the beginning of the week was changed to Monday ("week"), but it has no effect; some comments were fixed, but my English is still bad.
Change the copy "" lines to not use Uxxxx. That's all I have, and I will probably apply it afterwards.
Created attachment 469 [details] Latest version of the Ukrainian locale Only one change: the <<copy "...">> directives were changed to not use Uxxxx.
Created attachment 476 [details] Latest version of Ukrainian locale The double quote was not closed in the "audience" directive in the previous version. (Thanks to Eugeniy Meshcheryakov).
Your last patch has CR/LF delimiters, which is pretty annoying. Some lines were encoded twice with the <Uxxxx> notation; a patch will follow shortly. There are other minor issues:
* Parentheses are wrongly put in yesexpr/noexpr
* Why do you define many collating-elements instead of simply ignoring the <U042C> and <U044C> characters?
* You set am_pm to non-empty values, but the date/time formats only use 24hr notation.
Created attachment 482 [details] Decode lines which were encoded twice
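For readers unfamiliar with the notation, here is a minimal Python sketch of the <Uxxxx> form discussed above and of what "encoded twice" could mean. The helper names `encode_ucs` and `decode_ucs` are hypothetical and are not part of glibc or of the attached patch. The sketch handles only four-hex-digit code points, and the exact shape of the doubly encoded lines in the attachment is an assumption, since the thread does not show them.

```python
import re

# Matches one <Uxxxx> symbol with exactly four hex digits (BMP only).
_UCS_SYMBOL = re.compile(r"<U([0-9A-Fa-f]{4})>")

def encode_ucs(text):
    """Write every character of text as a <Uxxxx> symbol."""
    return "".join("<U%04X>" % ord(ch) for ch in text)

def decode_ucs(text):
    """Replace each <Uxxxx> symbol with the character it names."""
    return _UCS_SYMBOL.sub(lambda m: chr(int(m.group(1), 16)), text)

# The soft sign pair mentioned in the thread: <U042C> and <U044C>.
once = encode_ucs("\u042c\u044c")
# Assumption: "encoded twice" means the already encoded text was encoded again.
twice = encode_ucs(once)

print(once)
print(decode_ucs(twice) == once)
```

Under that assumption, a single decoding pass over a doubly encoded line returns the normally encoded line, which is the kind of repair the attachment title describes.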
Created attachment 483 [details] Latest Ukrainian locale with patch applied Patch applied, <CR> characters removed.
> There are other minor issues:
> * Parentheses are wrongly put in yesexpr/noexpr
I see no difference in behaviour.
> * Why do you define many collating-elements instead of simply ignoring <U042C> and <U044C> characters?
Because the soft sign is part of the alphabet. Sorting the set of letters alone must produce the correct alphabet sequence. (I can send tests, if you want).
> * You set am_pm to non-empty values, but date/time formats only use 24hr notation.
We never use the AM/PM format, but the AM/PM format is common outside of Ukraine. It is much better to see DO/PO instead of nothing.
>> * Parentheses are wrongly put in yesexpr/noexpr
> I see no difference in behaviour.
What I meant is that
yesexpr "^(a)|(b)$"
matches a leading 'a' or a trailing 'b', but your comments in the locale file show that you surely wanted to write
yesexpr "^(a|b)$"
That's not a big deal in practice, but these expressions should match their descriptive comments.
Your other answers looked fine to me, thanks for your explanations.
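The difference between the two forms can be checked with a short Python sketch. It uses Python's `re` module rather than the POSIX regex functions that glibc applies to yesexpr, on the assumption that alternation and anchors behave the same way for patterns this simple. The sample strings are made up for illustration.

```python
import re

# Alternation binds more loosely than the anchors, so this reads as
# "starts with a" OR "ends with b".
misplaced = re.compile(r"^(a)|(b)$")
# Grouping the alternatives makes both anchors apply to each of them.
intended = re.compile(r"^(a|b)$")

samples = ["a", "b", "abc", "xxb", "xax"]
results = {
    s: (bool(misplaced.search(s)), bool(intended.search(s))) for s in samples
}

for s, (loose, strict) in results.items():
    print(f"{s!r}: misplaced={loose} intended={strict}")
```

Both patterns accept "a" and "b" themselves, which may be why no difference in behaviour was visible with ordinary answers. Only inputs such as "abc" or "xxb" tell them apart.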
(In reply to comment #14)
> >> * Parentheses are wrongly put in yesexpr/noexpr
> > I see no difference in behaviour.
> Your other answers looked fine to me, thanks for your explanations.
Sorry, I misunderstood your comment. :-( I have problems accessing a Linux shell from work, so I have not investigated this problem. I need to write a test for it. I will fix it and release a new version in the next few days.
Any progress?
Is there a new version coming? In the future could you provide smaller changes in the form of a patch from the top of the tree? Will you be providing a testcase to go with this?
Sorry for my long delay. I will try to provide a fixed version of the Ukrainian locale soon (in the next few days).
Gents, this bug has been open for more than 2 years and I definitely want to push things a little bit further. I offered Volodymyr Lisivka help and he graciously accepted my bugfixes. Now I have version 2.1.12 in my hands, which has been tested by Volodymyr for a few days. Since he is too busy even to upload it, I can upload the new version myself. The only question is how we can handle that. First it will need to be verified by the glibc maintainers (Ulrich, Dwayne?) and then confirmed by Volodymyr that all is OK with it. I see (I hope :) no problem with the former, but what would the options be in case Volodymyr does not respond in a reasonable time? Thanks.
Created attachment 1053 [details] New version of Ukrainian locale with encoded characters
Created attachment 1054 [details] New version of Ukrainian locale with unencoded characters in UTF-8
Created attachment 1055 [details] Script to test alphabet sorting and date formatting in Ukrainian locale
Changes in the new version of the locale:
* am_pm and t_fmt_ampm fields are cleared to force 24h time format;
* first_weekday and first_workday changed from 2 to 1;
* timezone shift time changed to 3:00am instead of 1:00am;
* yesexpr and noexpr were fixed;
* contact email changed from "libc-locales@sources.readhat.com" to email="bug-glibc-locales@gnu.org";
* a lot of fixes in comments (thanks to Max Kutny).
I've checked in the last version.