Bug 58

Summary: Outdated Ukrainian (uk_UA) locale
Product: glibc Reporter: Volodymyr M. Lisivka <lvm>
Component: localedata Assignee: Petter Reinholdtsen <pere>
Status: RESOLVED FIXED    
Severity: normal CC: carlos, drepper.fsp, glibc-bugs, mkutny
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Latest Ukrainian locale
Latest Ukrainian locale
Latest version of the Ukrainian locale
Latest version of Ukrainian locale
Decode lines which were encoded twice
Latest Ukrainian locale with patch applied
New version of Ukrainian locale with encoded characters
New version of Ukrainian locale with unencoded characters in UTF-8
Script to test alphabet sorting and date formatting in Ukrainian locale

Description Volodymyr M. Lisivka 2004-03-04 17:30:57 UTC
The Ukrainian locale in glibc is still out of date.

From: Petter Reinholdtsen
To: Volodymyr M. Lisivka
Date: 23 Nov 2003 11:14:52 +0100	

Sorry for my late reply. ...

;)
Comment 1 Volodymyr M. Lisivka 2004-03-04 17:35:19 UTC
Created attachment 23 [details]
Latest Ukrainian locale

Latest version of proposed new Ukrainian locale.
I have made no changes since 2003-11-25 and have seen no bug reports, so I
consider it "stable".
Comment 2 Petter Reinholdtsen 2004-03-21 12:45:51 UTC
Here are some unusual things I notice about the uk_UA-2.0 locale
submitted 2004-03-04.

 - The LC_IDENTIFICATION block uses the <U#> notation for the 
   standard reference.  No other locale uses this notation there.
   I'm not sure what the correct format for that section is, but
   suggest converting it from <U#> to the ASCII chars.

 - All 'copy "<file>"' statements use the <U#> notation as well.
   This differs from how other locales use the copy notation.

 - The locale is well documented, including the intended
   values in comments.  This is good.

 - Did the original locale writer (Denis V. Dmitrienko) approve the
   changes?  It appears to be easier to get locale changes accepted
   if the original author agrees.

 - It might help to have references to standard bodies or
   other official looking entities documenting the correct values.

I'm not sure what more to suggest.
Comment 3 Volodymyr M. Lisivka 2004-04-04 16:24:51 UTC
> - The LC_IDENTIFICATION block uses the <U#> notation for the 
>   standard reference.  No other locale uses this notation there.
>   I'm not sure what the correct format for that section is, but
>   suggest converting it from <U#> to the ASCII chars.
>
> - All 'copy "<file>"' statements use the <U#> notation as well.
>   This differs from how other locales use the copy notation.
localedef -cv shows a warning about _ALL_ parameters not encoded as <U#....>. You
should fix "localedef" first.



> - Did the original locale writer (Denis V. Dmitrienko) approve the
>   changes?  It appears to be easier to get locale changes accepted
>   if the original author agrees.
Yes, he is agry. Check your old emails - you talking with him about that.


> - It might help to have references to standard bodies or
>   other official looking entities documenting the correct values.
That is very hard for me to do: the documents are not freely available online.
I need special access to read them in a technical library, or I have to pay to
read them online.
Comment 4 Petter Reinholdtsen 2004-08-08 18:35:22 UTC
Sorry for my late reply.  It is hard to find time to assist in
improving the glibc locales.

[Volodymyr M. Lisivka]
> localedef -cv shows a warning about _ALL_ parameters not encoded
> as <U#....>. You should fix "localedef" first.

That is probably a good idea.  I'm sure patches are accepted by the
glibc developers. :)

> Yes, he is agry. Check your old emails - you talking with
> him about that.

I hope 'is agry' means 'agrees'.  That is good, as it is a lot
easier to get locale changes past the glibc developers if the
original author approves the changes.

> That is very hard for me to do: the documents are not freely
> available online. I need special access to read them in a
> technical library, or I have to pay to read them online.

Hm, yeah.  That is a common problem for several locales.  But it
might not be a major problem when the original author approves.
Can you get Denis V. Dmitrienko to add a comment to bugzilla
stating that he approves the new locale?

I had a fresh look at the attached uk_UA locale, and noticed a few
new issues (in addition to the use of <U#> in content and copy
statements).  Can you have a look at these too, and provide an updated
version of the locale?

 - Both yesexpr and noexpr include '.*' at the end.  The POSIX
   locale does not include such an ending, so I recommend removing
   it to be comparable to the POSIX locale.

 - yesexpr and noexpr include '1' and '0'.  The POSIX locale does
   not include digits, and I recommend removing these digits
   from the regexes.
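
A rough sketch of why these two points matter, written in Python rather
than POSIX regcomp()/regexec(). The patterns below are illustrative
assumptions, not the actual uk_UA values; "^[yY]" is the POSIX locale's
yesexpr. For patterns this simple, Python's re module behaves the same
way as a POSIX extended regular expression.

```python
import re

# POSIX-locale style expression: prefix match, no trailing ".*", no digits.
POSIX_YESEXPR = r"^[yY]"
# Hypothetical variants of the kind discussed above, for illustration only.
YES_WITH_TAIL = r"^[yY].*"
YES_WITH_DIGIT = r"^[yY1]"


def accepts(pattern: str, response: str) -> bool:
    """Return True if the pattern matches anywhere in the response."""
    return re.search(pattern, response) is not None


responses = ["y", "Yes", "yes please", "n", "1", ""]

# The trailing ".*" is redundant: both patterns accept the same responses.
same = all(
    accepts(POSIX_YESEXPR, r) == accepts(YES_WITH_TAIL, r) for r in responses
)

# Allowing "1" does change behaviour: it accepts an answer that the
# POSIX-style expression rejects.
digit_only_diff = [
    r for r in responses
    if accepts(POSIX_YESEXPR, r) != accepts(YES_WITH_DIGIT, r)
]
```

Here the ".*" ending only adds noise, while the digit changes which
answers count as "yes".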

There might be other problems with the locale as well.  I am working
on a script to check the content of locales, and it is still very
sketchy. :)

You might find some useful information from
<URL: http://www.student.uit.no/~pere/linux/glibc/ > about
writing glibc locales.  There isn't much there, but it is
all I got at the moment. :)
Comment 5 Petter Reinholdtsen 2005-01-14 15:38:52 UTC
Did you find time to look at my comments regarding the uk_UA locale?
Comment 6 Volodymyr M. Lisivka 2005-01-24 16:37:56 UTC
I will release an updated version soon.
Comment 7 Volodymyr M. Lisivka 2005-03-14 20:14:01 UTC
Created attachment 433 [details]
Latest Ukrainian locale

The latest proposed Ukrainian locale. Changes: in LC_NUMERIC the dot was changed
to a comma; in LC_TIME the beginning of the week ("week") was changed to Monday,
but it has no effect; some comments were fixed, but my English is still bad.
Comment 8 Ulrich Drepper 2005-04-29 03:07:46 UTC
Change the copy "" lines to not use Uxxxx.  That's all I have, and I will
probably apply it afterwards.
Comment 9 Volodymyr M. Lisivka 2005-04-29 10:03:07 UTC
Created attachment 469 [details]
Latest version of the Ukrainian locale

Only one change: directives <<copy "...">> changed to not use Uxxxx.
Comment 10 Volodymyr M. Lisivka 2005-05-05 12:42:41 UTC
Created attachment 476 [details]
Latest version of Ukrainian locale

The double quote was not closed in the "audience" directive in the previous version.
(Thanks to Eugeniy Meshcheryakov).
Comment 11 Denis Barbier 2005-05-10 20:09:32 UTC
Your last patch has CR/LF delimiters, which is pretty annoying. Some lines were
encoded twice with the <Uxxxx> notation, a patch will follow shortly.  There are
other minor issues:
  * Parenthesis are wrongly put in yesexpr/noexpr
  * Why do you define many collating-elements instead of simply
    ignoring <U042C> and <U044C> characters?
  * You set am_pm to non-empty values, but date/time formats only
    use 24hr notation.
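
To illustrate what "encoded twice" means here, a minimal Python sketch
follows. It is not the attached patch; the function names are
hypothetical, and it assumes only the four-digit <Uxxxx> form is used.

```python
import re

# Matches one four-hex-digit <Uxxxx> symbol.
_UCS = re.compile(r"<U([0-9A-Fa-f]{4})>")


def encode_ucs(text: str) -> str:
    """Encode every character of text as <Uxxxx>."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def decode_ucs(text: str) -> str:
    """Replace every <Uxxxx> symbol with the character it names."""
    return _UCS.sub(lambda m: chr(int(m.group(1), 16)), text)


def undo_double_encoding(line: str) -> str:
    """Decode one layer only if the line turns out to be encoded twice."""
    decoded = decode_ucs(line)
    # If one decoding pass still leaves <Uxxxx> symbols, the line was
    # encoded twice; otherwise keep it exactly as it was.
    return decoded if _UCS.search(decoded) else line


once = encode_ucs("Так")   # correctly encoded value
twice = encode_ucs(once)   # the "<", "U", digits and ">" encoded again
```

A line encoded twice starts with <U003C><U0055>, that is, the characters
"<" and "U" themselves written in <Uxxxx> notation.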
Comment 12 Denis Barbier 2005-05-10 20:13:23 UTC
Created attachment 482 [details]
Decode lines which were encoded twice
Comment 13 Volodymyr M. Lisivka 2005-05-11 12:35:37 UTC
Created attachment 483 [details]
Latest Ukrainian locale with patch applied

Patch applied, <CR> characters removed.

> There are other minor issues:

> * Parenthesis are wrongly put in yesexpr/noexpr

I see no difference in behaviour.

> * Why do you define many collating-elements instead of simply ignoring
> <U042C> and <U044C> characters?

Because the soft sign is part of the alphabet. Sorting the set of letters
alone must produce the correct alphabetical sequence. (I can send tests, if you want).
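
The property described above can be sketched in Python as follows. This
is not how glibc's LC_COLLATE works and it is not the test script
attached to this bug; it only shows the expected result, using an
explicit rank table (a hypothetical helper) in place of the locale.

```python
# The 33 letters of the Ukrainian alphabet in alphabetical order.
# The soft sign (U+044C) sits between U+0449 and U+044E.
UKRAINIAN_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"

# Position of each letter in the alphabet.
RANK = {ch: i for i, ch in enumerate(UKRAINIAN_ALPHABET)}


def alphabet_key(word: str) -> list:
    """Sort key using alphabet positions; raises KeyError on other characters."""
    return [RANK[ch] for ch in word.lower()]


# Start from the letters in reverse order and sort them two ways.
letters = list(reversed(UKRAINIAN_ALPHABET))
by_codepoint = "".join(sorted(letters))
by_alphabet = "".join(sorted(letters, key=alphabet_key))
```

Sorting by code point misplaces several letters, while the alphabet-based
key returns the full sequence with the soft sign in its own position.
If the soft sign were ignored, it would have no position of its own.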

> * You set am_pm to non-empty values, but date/time formats only use 24hr
> notation.

We never use the AM/PM format, but it is common outside of Ukraine.
It is much better to see DO/PO instead of nothing.
Comment 14 Denis Barbier 2005-05-15 19:32:41 UTC
>> * Parenthesis are wrongly put in yesexpr/noexpr
> I see no difference in behaviour.

What I meant is that
  yesexpr "^(a)|(b)$"
matches a leading 'a' or a trailing 'b', but your comments
in the locale file show that you surely wanted to write
  yesexpr "^(a|b)$"
That's not a big deal in practice, but these expressions
should match their descriptive comments.
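
A small Python sketch of that difference, using the same placeholder
patterns 'a' and 'b' (Python's re module stands in for POSIX regexes
here, which behave the same way for these patterns):

```python
import re

# Alternation binds loosest: this means "^(a)" OR "(b)$".
loose = re.compile(r"^(a)|(b)$")
# Grouping the alternation anchors both branches at both ends.
strict = re.compile(r"^(a|b)$")

samples = ["a", "b", "abc", "xyzb", "ab"]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]
```

The first form accepts every sample, including "abc" and "xyzb"; the
second accepts only "a" and "b".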

Your other answers looked fine to me, thanks for your explanations.
Comment 15 Volodymyr M. Lisivka 2005-05-16 16:50:33 UTC
(In reply to comment #14)
> >> * Parenthesis are wrongly put in yesexpr/noexpr
> > I see no difference in behaviour.

> Your other answers looked fine to me, thanks for your explanations.

Sorry, I misunderstood your comment. :-(

I have problems accessing a Linux shell from work, so I have not investigated
this problem.

I need to write a test for it. I will fix it and release a new version in the
next few days.
Comment 16 Ulrich Drepper 2005-10-15 00:51:29 UTC
Any progress?
Comment 17 Dwayne Grant McConnell 2006-02-21 16:03:44 UTC
Is there a new version coming?
In the future could you provide smaller changes in the form of a patch from the
top of the tree?
Will you be providing a testcase to go with this?
Comment 18 Volodymyr M. Lisivka 2006-02-22 10:29:35 UTC
Sorry for the long delay. I will try to provide a fixed version of the Ukrainian
locale soon (in the next few days).
Comment 19 Max Kutny 2006-05-18 18:12:02 UTC
Gents,

this bug has been open for more than 2 years and I definitely want to push
things a little bit further.

I offered Volodymyr Lisivka my help and he graciously accepted my bugfixes.

I now have version 2.1.12 in hand, which Volodymyr has tested for a few
days. Since he is too busy even to upload it, I can upload the new version myself.

The only question is how we can handle that.

First it will need to be verified by the glibc maintainers (Ulrich, Dwayne?) and
then confirmed by Volodymyr that everything is OK with it.

I see (I hope :) no problem with the former, but what are the options if
Volodymyr does not respond in a reasonable time?

Thanks.
Comment 20 Volodymyr M. Lisivka 2006-05-27 18:22:27 UTC
Created attachment 1053 [details]
New version of Ukrainian locale with encoded characters
Comment 21 Volodymyr M. Lisivka 2006-05-27 18:23:42 UTC
Created attachment 1054 [details]
New version of Ukrainian locale with unencoded characters in UTF-8
Comment 22 Volodymyr M. Lisivka 2006-05-27 18:25:12 UTC
Created attachment 1055 [details]
Script to test alphabet sorting and date formatting in Ukrainian locale
Comment 23 Volodymyr M. Lisivka 2006-05-27 18:39:05 UTC
Changes in new version of locale:

  * am_pm and t_fmt_ampm fields are cleared to force 24h time format;
  * first_weekday and first_workday changed from 2 to 1;
  * timezone shift time changed to 3:00am instead of 1:00am;
  * yesexpr and noexpr were fixed;
  * contact email changed from "libc-locales@sources.readhat.com" to
email="bug-glibc-locales@gnu.org";
  * a lot of fixes in comments (thanks to Max Kutny).


Comment 24 Ulrich Drepper 2007-02-18 03:59:59 UTC
I've checked in the last version.