Bug 71

Summary:	[PATCH] All Arabic locales should use the same yesexpr/noexpr
Product:	glibc	Reporter:	Pablo Saratxaga <pablo>
Component:	localedata	Assignee:	Petter Reinholdtsen <pere>
Status:	RESOLVED FIXED
Severity:	normal	CC:	munzirtaha
Priority:	P2	Flags:	fweimer: security-
Version:	2.3.3
Target Milestone:	---
Host:		Target:
Build:		Last reconfirmed:
Bug Depends on:
Bug Blocks:	305
Attachments:	Patch for ar_TN and ar_YE. Improve yesexpr/noexpr for all arabic locales

Description Pablo Saratxaga 2004-03-10 19:35:12 UTC

The locales ar_SA, ar_TN and ar_YE have incorrect yesexpr/noexpr values; as a
result it is impossible, under those locales, to properly reply to yes/no questions.

To test:
for i in ar_EG ar_SA ar_TN ar_YE
do
    echo "$i"
    LC_ALL="$i" locale -c yesexpr noexpr
done

(ar_EG is correct, so it can serve as a model.)

All three of ar_SA, ar_TN and ar_YE add two letters (<U0639> and <U0645>) to yesexpr
and one letter (<U0627>) to noexpr.
I don't know whether those additions should be made in all ar_* locales, but that
is not the problem; the problem is that ar_SA uses parentheses instead of
square brackets, and the other two use nothing!

Also, I think that *all* ar_* locales should add "Yy" to yesexpr and "Nn"
to noexpr for compatibility. Many tools, particularly command-line ones,
aren't arabized yet, so the question will be in English and people will tend to
reply in English. With the current versions those replies fail, as the locale
only accepts Arabic letters.
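
A minimal sketch of the compatibility point above, written in Python. The two
patterns are illustrative assumptions, not copied from any locale file, and
Python's re module stands in for the POSIX regex matching that real tools use.

```python
import re

# Illustrative patterns only (assumptions, not taken from a locale file).
ARABIC_ONLY_YES = "^[\u0646]"      # accepts only ARABIC LETTER NOON
WITH_LATIN_YES = "^[\u0646yY]"     # also accepts y and Y


def is_yes(pattern, reply):
    """Return True if the reply matches the given yesexpr-style pattern."""
    return re.search(pattern, reply) is not None


for reply in ("y", "Y", "\u0646"):
    shown = reply.encode("unicode_escape").decode()
    print(shown, is_yes(ARABIC_ONLY_YES, reply), is_yes(WITH_LATIN_YES, reply))
```

With the Arabic-only pattern a reply of "y" is not recognised, while the
extended pattern recognises both the Arabic letter and the Latin one.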

Comment 1 Petter Reinholdtsen 2004-03-19 21:11:33 UTC

Looking at the ar_SA locale, it seems to be using parentheses for
grouping and the 'or' notation for alternatives (pseudo-notation, converted
to ASCII where possible):

  yesexpr "^(<U0646>|<U0646><U0639><U0645>)"
  noexpr  "^(<U0644>|<U0644><U0627>)"

This seems to me like a valid regex, matching a sequence of chars at
the start of the string.  Is this legal for locales?  I'm not sure.

The ar_TN and ar_YE locales are obviously wrong.  Adding '^[' at
the start of the regex and ']' at the end is required for this
regex to match properly.  I'll make a patch for this.

I'm not sure if all locales should include 'Yy' and 'Nn' in the
regex.
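
A rough Python sketch of the difference between the ar_SA form quoted above
and a bare, unanchored letter sequence. The bare form is an assumption based
on the description of ar_TN and ar_YE ("use nothing"); the exact strings in
those locale files are not quoted in this report. Python's re module is used
as an approximation of POSIX regex behaviour.

```python
import re

NOON, AIN, MEEM = "\u0646", "\u0639", "\u0645"
WORD = NOON + AIN + MEEM

BARE = WORD                                  # assumed: no anchor, no brackets
GROUPED = "^(" + NOON + "|" + WORD + ")"     # ar_SA form quoted above


def accepts(pattern, reply):
    """Return True if the pattern is found in the reply."""
    return re.search(pattern, reply) is not None


# Single-letter reply, full-word reply, and the word buried in other text.
for reply in (NOON, WORD, "x" + WORD):
    shown = reply.encode("unicode_escape").decode()
    print(shown, accepts(BARE, reply), accepts(GROUPED, reply))
```

In this sketch the bare sequence rejects the single-letter reply and, having
no '^', also matches the word when it appears later in the input. The grouped
form accepts both the single letter and the full word at the start only.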

Comment 2 Pablo Saratxaga 2004-03-19 21:42:26 UTC

I strongly think that Yy and Nn should be included in all locales (unless they
conflict with the native letters, which happens for 2 or 3 languages at most).

Here is why:

[ru@test ru]$ touch foo
[ru@test ru]$ LANGUAGE=ar LC_ALL=ar_SA rm -i foo
rm: remove regular empty file `foo'? y
[ru@test ru]$ ls foo
foo

As you can see, replying "y" is ignored in the Arabic locale, but the question
message is not translated; it is asked in English (and there are
plenty of such cases). People tend to reply in English (e.g. with the "Y" or
"N" key) when the question is not localized.
Maybe it is not stricto sensu a bug, but it is a very big annoyance that could
be easily avoided.
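
For illustration, a sketch of how a prompt might consult the locale's yesexpr,
using Python's locale.nl_langinfo(). This is not how rm itself is implemented;
reply_is_yes is a hypothetical helper, and the locale's expression is meant for
the C regex functions, so feeding it to Python's re is only an approximation.
The sketch assumes a Unix-like system and uses the "C" locale because it is
always available.

```python
import locale
import re


def reply_is_yes(reply):
    """Hypothetical helper: match a reply against the current locale's yesexpr."""
    yesexpr = locale.nl_langinfo(locale.YESEXPR)
    return re.search(yesexpr, reply) is not None


# Switch to an Arabic locale here to reproduce the report, if one is installed.
locale.setlocale(locale.LC_ALL, "C")
print(locale.nl_langinfo(locale.YESEXPR))
print(reply_is_yes("y"), reply_is_yes("n"))
```

Whatever the question's language, the reply is only ever checked against the
expression of the active locale, which is why an untranslated English prompt
plus an Arabic-only yesexpr leaves "y" unrecognised.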

Comment 3 Petter Reinholdtsen 2004-03-19 21:44:12 UTC

Created attachment 38
Patch for ar_TN and ar_YE.

Comment 4 Petter Reinholdtsen 2004-03-23 21:59:19 UTC

The patch for ar_TN and ar_YE is now included in CVS.

I believe the strange format of the yesexpr and noexpr in ar_SA
should be reported as a separate bug if the regexp always should
match single characters and not sequence of chars.

I also believe the request that all yesexpr/noexpr patterns should
include 'YyNn' if possible should be reported as a separate bug, as
it is unrelated to the other issues.

Should this bug be closed, and new bugs be opened?  I leave that to
reporter to decide.

Comment 5 Pablo Saratxaga 2004-08-06 20:04:01 UTC

The patch for the ar_TN and ar_YE is probably wrong; in fact if you look at the
ar_SA locale you see that what is present in ar_TN and ar_YE is the word
"n-gh-m" in the second part of the regexp in ar_SA, so it is probably not three
separate letters.

The format of the regexp (either only letters, e.g. "^[abcd]", or
with words, e.g. "^(a|b|c|d|afoo|bbar|cfoobar|dtoto)") doesn't seem to be
a problem (but I haven't tested much either);
still, the lack of Yy/Nn is a big problem (I opened bug #305 for it).

So ar_TN and ar_YE should be changed to match either the format in ar_SA or
the format in another Arabic locale (e.g. ar_EG).
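
A small Python sketch of why treating the word as three separate letters
changes what is accepted. The bracket pattern is an assumed reconstruction of
the patched ar_TN/ar_YE value (the attachment itself is not quoted here);
the alternation is the ar_SA form from comment 1. Python's re module
approximates POSIX regex behaviour.

```python
import re

NOON, AIN, MEEM = "\u0646", "\u0639", "\u0645"
WORD = NOON + AIN + MEEM

LETTERS = "^[" + WORD + "]"                 # assumed patched form: any one letter
WORDS = "^(" + NOON + "|" + WORD + ")"      # ar_SA form: letter or whole word


def accepts(pattern, reply):
    """Return True if the reply matches the pattern at its start."""
    return re.search(pattern, reply) is not None


for reply in (NOON, WORD, AIN, MEEM):
    shown = reply.encode("unicode_escape").decode()
    print(shown, accepts(LETTERS, reply), accepts(WORDS, reply))
```

Both forms accept the first letter and the full word, but the bracket form
also accepts a reply that starts with the second or third letter alone.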

Comment 6 Petter Reinholdtsen 2004-08-08 18:46:46 UTC

If the LC_MESSAGES part of ar_TN and ar_YE should be identical to
another locale, they should use the 'copy' statement to fetch
that locale's content verbatim.  Patches are most welcome, from
someone who understands Arabic. :)

Comment 7 Munzir Taha 2004-08-10 20:59:40 UTC

I believe this is a duplicate of bug 71. We have only one bug that would be
solved if you set all the Arabic locales to use the same regex, which
is ^[نyY].* for the yesexpr and ^[لnN].* for the noexpr. Simple? ;)

Comment 8 Petter Reinholdtsen 2004-08-10 22:21:30 UTC

Comment on attachment 38
Patch for ar_TN and ar_YE.

This patch is now in CVS.

Comment 9 Petter Reinholdtsen 2004-08-10 22:25:28 UTC

Created attachment 161
Improve yesexpr/noexpr for all arabic locales

I suspect you meant that this bug (#71) is a duplicate of bug #305.
I don't think it is, as bug #305 is a generic problem with several
locales, and this bug talks about the Arabic locales only.

I made a patch to change the yesexpr/noexpr of ar_EG to make sure
'Yy' and 'Nn' are included and removed the useless '.*' at the end
of the regexes.  This is the first part of this patch.

The second part is changing all the other Arabic locales ar_* to
copy the corrected LC_MESSAGES from ar_EG, to make sure all Arabic
locales use the same yesexpr/noexpr.  Is this patch correct?
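
A quick Python check of the claim that the trailing '.*' is redundant, using
the yesexpr proposed in comment 7 and the same expression with the tail
removed. The sample replies are made up for illustration, and Python's re
module approximates POSIX regex behaviour.

```python
import re

WITH_TAIL = "^[\u0646yY].*"     # form proposed in comment 7
WITHOUT_TAIL = "^[\u0646yY]"    # same expression without the trailing .*

# Made-up sample replies: affirmative, negative, empty and unrelated.
replies = ["y", "yes", "Y", "\u0646", "\u0646\u0639\u0645", "n", "no", "", "maybe"]

same = all(
    bool(re.search(WITH_TAIL, reply)) == bool(re.search(WITHOUT_TAIL, reply))
    for reply in replies
)
print(same)
```

Since the expression only needs to match a prefix of the reply, the '.*' never
changes whether a match is found.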

Comment 10 Munzir Taha 2004-08-11 14:30:12 UTC

Ah! 305 is for all locales, I got it now. 
 
Now, I believe everything is OK. The patch seems OK to me. Thanks a lot. 
 
Mr. Pablo, any comment?

Comment 11 Abdulaziz Al-Arfaj 2004-08-16 08:05:16 UTC

Regarding the questions:

(1) should there be different regexes for different Arabic locales,
(2) if not, how should the regexes look and
(3) which locale should be the "authoritative" locale

I _think_ the answers should be:

(1) A definite no.
(2) This probably needs discussion.
(3) None of them.

Hope this is enough information from one person :)

Comment 12 Ulrich Drepper 2005-10-15 01:27:27 UTC

I fixed the locales I saw.  Open new bugs for additional locales which need to
change.