Bug 71 - [PATCH] All Arabic locales should use the same yesexpr/noexpr
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: localedata
Version: 2.3.3
Importance: P2 normal
Target Milestone: ---
Assignee: Petter Reinholdtsen
Blocks: 305
Reported: 2004-03-10 19:35 UTC by Pablo Saratxaga
Modified: 2019-04-10 12:22 UTC

fweimer: security-


Attachments
Patch for ar_TN and ar_YE. (514 bytes, patch)
2004-03-19 21:44 UTC, Petter Reinholdtsen
Improve yesexpr/noexpr for all arabic locales (1.37 KB, patch)
2004-08-10 22:25 UTC, Petter Reinholdtsen

Description Pablo Saratxaga 2004-03-10 19:35:12 UTC
The locales ar_SA, ar_TN and ar_YE have incorrect yesexpr/noexpr values; as a
result it is impossible, under those locales, to properly reply to yes/no questions.

To test:
for i in ar_EG ar_SA ar_TN ar_YE
do
    echo "$i"
    LC_ALL=$i locale -c yesexpr noexpr
done

(ar_EG is correct, so it can serve as a model)

All three of ar_SA, ar_TN and ar_YE add two letters (<U0639> and <U0645>) to yesexpr
and one letter (<U0627>) to noexpr.
I don't know whether those additions should be made in all ar_* locales, but that
is not the problem; the problem is that ar_SA uses parentheses instead of
square brackets, and the other two use nothing at all!

Also, I think that *all* ar_* locales should add "Yy" to yesexpr and "Nn"
to noexpr for compatibility: a lot of tools, particularly command-line ones,
aren't arabized yet, so the question will be in English and people will tend to
reply in English. With the current versions such replies fail, because the
locale only accepts Arabic letters.
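
To illustrate why the missing brackets matter, here is a minimal sketch using
ASCII stand-ins. The letters a, b, c are placeholders and not the real locale
values, matches_expr is a hypothetical helper, and grep -E is assumed to
approximate the extended-regex matching that programs apply to yesexpr.

```shell
#!/bin/sh
# matches_expr PATTERN REPLY: succeed if REPLY matches the extended regex PATTERN.
# Hypothetical helper; grep -E approximates the matching a program would apply.
matches_expr() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

bare='abc'        # no anchor, no brackets: the literal sequence "abc"
bracket='^[abc]'  # any one of the listed letters at the start of the reply

# Try a single-letter reply against both forms.
for expr in "$bare" "$bracket"; do
    if matches_expr "$expr" "a"; then
        echo "$expr accepts a"
    else
        echo "$expr rejects a"
    fi
done
```

Under these assumptions the bare form only matches a reply that contains the
whole sequence, so a one-letter reply is rejected, while the bracketed form
accepts any reply that starts with one of the listed letters.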
Comment 1 Petter Reinholdtsen 2004-03-19 21:11:33 UTC
Looking at the ar_SA locale, it seems to be using the parentheses for
grouping, and the 'or' notation to match (pseudo-notation, converted
to ASCII where possible):

  yesexpr "^(<U0646>|<U0646><U0639><U0645>)"
  noexpr  "^(<U0644>|<U0644><U0627>)"

This seems to me like a valid regex, matching a sequence of chars at
the start of the string.  Is this legal for locales?  I'm not sure.

The ar_TN and ar_YE locales are obviously wrong.  Adding '^[' at
the start of the regex and ']' at the end is required for this
regex to match properly.  I'll make a patch for this.

I'm not sure if all locales should include 'Yy' and 'Nn' in the
regex.
Comment 2 Pablo Saratxaga 2004-03-19 21:42:26 UTC
I strongly think that Yy and Nn should be included in all locales (unless they
conflict with the native letters, which happens for 2 or 3 languages at most).

Here is why:

[ru@test ru]$ touch foo
[ru@test ru]$ LANGUAGE=ar LC_ALL=ar_SA rm -i foo
rm: remove regular empty file `foo'? y
[ru@test ru]$ ls foo
foo

As you can see, replying "y" is ignored in the Arabic locale; but the question
message is not translated, so the question is asked in English (and there are
plenty of such cases). People tend to reply in English (e.g. with the "Y" or
"N" key) when the question is not localized.
Maybe it is not, stricto sensu, a bug, but it is a very big annoyance that could
easily be avoided.
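
The same behaviour can be sketched without rm by matching a reply against the
expression directly. This is only an approximation: matches_expr is a
hypothetical helper, grep -E stands in for the regex matching done by the
program, and the script is assumed to be saved as UTF-8. The first pattern is
the ar_SA yesexpr as transcribed in Comment 1; the second, with "y" and "Y"
added, is illustrative and not a committed value.

```shell
#!/bin/sh
# matches_expr PATTERN REPLY: succeed if REPLY matches the extended regex PATTERN.
# Hypothetical helper; grep -E approximates the matching a program would apply.
matches_expr() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# ar_SA yesexpr from Comment 1: U+0646, or U+0646 U+0639 U+0645.
native_only='^(ن|نعم)'
# Same expression with the ASCII replies added (assumption, for illustration).
with_ascii='^(ن|نعم|y|Y)'

# An English "y" reply, as typed in the rm -i session above.
for expr in "$native_only" "$with_ascii"; do
    if matches_expr "$expr" "y"; then
        echo "$expr accepts y"
    else
        echo "$expr rejects y"
    fi
done
```

With these two patterns, the native-only expression should reject "y" and the
extended one should accept it, which is the difference being asked for here.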
Comment 3 Petter Reinholdtsen 2004-03-19 21:44:12 UTC
Created attachment 38 [details]
Patch for ar_TN and ar_YE.
Comment 4 Petter Reinholdtsen 2004-03-23 21:59:19 UTC
The patch for ar_TN and ar_YE is now included in CVS.

I believe the strange format of the yesexpr and noexpr in ar_SA
should be reported as a separate bug if the regexp should always
match single characters and not sequences of chars.

I also believe the request that all yesexpr/noexpr patterns should
include 'YyNn' if possible should be reported as a separate bug, as
it is unrelated to the other issues.

Should this bug be closed, and new bugs be opened?  I leave that to
the reporter to decide.
Comment 5 Pablo Saratxaga 2004-08-06 20:04:01 UTC
The patch for ar_TN and ar_YE is probably wrong; in fact if you look at the
ar_SA locale you see that what is present in ar_TN and ar_YE is the word
"n-gh-m" from the second part of the regexp in ar_SA, so it is probably not three
separate letters.

The format of the regexp (either only letters, e.g. "^[abcd]", or
with words, e.g. "^(a|b|c|d|afoo|bbar|cfoobar|dtoto)") doesn't seem to be
a problem (but I haven't tested much either);
still, the lack of Yy/Nn is a big problem (I opened bug #305 for it).

So ar_TN and ar_YE should be changed to match either the format in ar_SA or
the format in another Arabic locale (e.g. ar_EG).
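
A rough way to compare the two formats, using the ASCII stand-ins from this
comment rather than real locale data. matches_expr is a hypothetical helper
and grep -E is assumed to approximate the matching a program would apply.

```shell
#!/bin/sh
# matches_expr PATTERN REPLY: succeed if REPLY matches the extended regex PATTERN.
# Hypothetical helper; grep -E approximates the matching a program would apply.
matches_expr() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

letters='^[abcd]'             # any one of the listed letters at the start
words='^(a|b|c|d|afoo|bbar)'  # single letters or whole words at the start

# Report, for each sample reply, whether each form matches it.
for reply in a afoo bbar xfoo; do
    for expr in "$letters" "$words"; do
        if matches_expr "$expr" "$reply"; then
            echo "$reply: $expr matches"
        else
            echo "$reply: $expr does not match"
        fi
    done
done
```

Because the match is only anchored at the start, the word alternatives add
nothing in this sketch as long as each word begins with a letter that is also
listed on its own: both forms should give the same verdict for every reply.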
Comment 6 Petter Reinholdtsen 2004-08-08 18:46:46 UTC
If the LC_MESSAGES part of ar_TN and ar_YE should be identical to
another locale, they should use the 'copy' statement to fetch
that locale's content verbatim.  Patches are most welcome, from
someone who understands Arabic. :)
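
For reference, a small sketch that prints what such an LC_MESSAGES section
could look like in a locale source file. lc_messages_copy is a hypothetical
helper, and ar_EG is used as the source only because it is the locale named as
correct in the description; which locale to copy from is still an open
question here.

```shell
#!/bin/sh
# lc_messages_copy LOCALE: print an LC_MESSAGES section whose whole body is a
# 'copy' statement pulling the category from LOCALE.
# Hypothetical helper; the printed text is locale source syntax, not shell.
lc_messages_copy() {
    printf 'LC_MESSAGES\ncopy "%s"\nEND LC_MESSAGES\n' "$1"
}

# Section that ar_TN or ar_YE could use if ar_EG were chosen as the model.
lc_messages_copy ar_EG
```

The printed section would replace the existing LC_MESSAGES block of the locale
being changed, so that yesexpr and noexpr are maintained in one place.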
Comment 7 Munzir Taha 2004-08-10 20:59:40 UTC
I believe this is a duplicate of bug 71. We have only one bug that would be
solved if you set all the Arabic locales to use the same regex, which
is ^[نyY].* for the yesexpr and ^[لnN].* for the noexpr. Simple? ;)
Comment 8 Petter Reinholdtsen 2004-08-10 22:21:30 UTC
Comment on attachment 38 [details]
Patch for ar_TN and ar_YE.

This patch is now in CVS.
Comment 9 Petter Reinholdtsen 2004-08-10 22:25:28 UTC
Created attachment 161 [details]
Improve yesexpr/noexpr for all arabic locales

I suspect you meant that this bug (#71) is a duplicate of bug #305.
I don't think it is, as bug #305 is a generic problem with several
locales, and this bug talks about the Arabic locales only.

I made a patch to change the yesexpr/noexpr of ar_EG to make sure
'Yy' and 'Nn' are included, and removed the useless '.*' at the end
of the regexes.  This is the first part of this patch.

The second part changes all the other Arabic locales ar_* to
copy the corrected LC_MESSAGES from ar_EG, to make sure all Arabic
locales use the same yesexpr/noexpr.  Is this patch correct?
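
On the trailing '.*': a minimal sketch of why it should make no difference,
shown with the ASCII letters only. matches_expr and same_verdict are
hypothetical helpers, and grep -E is assumed to approximate the matching a
program would apply to yesexpr.

```shell
#!/bin/sh
# matches_expr PATTERN REPLY: succeed if REPLY matches the extended regex PATTERN.
# Hypothetical helper; grep -E approximates the matching a program would apply.
matches_expr() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# same_verdict REPLY: succeed if the forms with and without '.*' agree on REPLY.
# Leaves the two verdicts in with_tail and without_tail for reporting.
same_verdict() {
    if matches_expr '^[yY].*' "$1"; then with_tail=yes; else with_tail=no; fi
    if matches_expr '^[yY]' "$1"; then without_tail=yes; else without_tail=no; fi
    [ "$with_tail" = "$without_tail" ]
}

for reply in y Yes yep no maybe; do
    if same_verdict "$reply"; then
        echo "$reply: both forms agree ($with_tail)"
    else
        echo "$reply: forms disagree"
    fi
done
```

Since the expression is not anchored at the end, whatever follows the first
letter is ignored either way, so the two forms are expected to agree on every
reply.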
Comment 10 Munzir Taha 2004-08-11 14:30:12 UTC
Ah! 305 is for all locales, I got it now. 
 
Now, I believe everything is OK. The patch seems OK to me. Thanks a lot. 
 
Mr. Pablo, any comment? 
Comment 11 Abdulaziz Al-Arfaj 2004-08-16 08:05:16 UTC
Regarding the questions:

(1) should there be different regex-es for different Arabic locales,
(2) if not, how should the regex-es look and
(3) which locale should be the "authoritative" locale

I _think_ the answers should be:

(1) A definite no.
(2) This probably needs discussion.
(3) None of them.

Hope this is enough information from one person :)
Comment 12 Ulrich Drepper 2005-10-15 01:27:27 UTC
I fixed the locales I saw.  Open new bugs for additional locales which need to
change.