71 – [PATCH] All Arabic locales should use the same yesexpr/noexpr

Summary: [PATCH] All Arabic locales should use the same yesexpr/noexpr

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	localedata
Version:	2.3.3

Importance:	P2 normal
Target Milestone:	---
Assignee:	Petter Reinholdtsen

URL:
Keywords:

Depends on:
Blocks:	305

Reported:	2004-03-10 19:35 UTC by Pablo Saratxaga
Modified:	2019-04-10 12:22 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Attachments
Patch for ar_TN and ar_YE. (514 bytes, patch) 2004-03-19 21:44 UTC, Petter Reinholdtsen
Improve yesexpr/noexpr for all arabic locales (1.37 KB, patch) 2004-08-10 22:25 UTC, Petter Reinholdtsen

Description Pablo Saratxaga 2004-03-10 19:35:12 UTC

The locales ar_SA, ar_TN and ar_YE have incorrect yesexpr/noexpr values; as a
result it is impossible, under those locales, to properly reply to yes/no questions.

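To test:
for i in ar_EG ar_SA ar_TN ar_YE    # space-separated; with commas the loop runs only once
do
    echo "$i"
    # -c also prints the category name (LC_MESSAGES) before the values
    LC_ALL=$i locale -c yesexpr noexpr
done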

(ar_EG is correct, so it can serve as a model.)

All three (ar_SA, ar_TN, ar_YE) add two letters (<U0639> and <U0645>) to yesexpr
and one letter (<U0627>) to noexpr.
I don't know whether those additions should be made to all ar_* locales, but that
is not the problem; the problem is that ar_SA uses parentheses instead of
square brackets, and the other two use no brackets at all!
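
To illustrate what the missing brackets change, here is a minimal sketch that uses Python's `re` module as a stand-in for the POSIX extended regular expressions that glibc's rpmatch() builds from yesexpr. The bare-sequence pattern is an assumed reconstruction of the ar_TN/ar_YE value from the description above, not copied from the locale files; <U0646>, <U0639> and <U0645> are written out as the letters ن, ع and م.

```python
import re

# <U0646> = ن, <U0639> = ع, <U0645> = م; together they spell نعم ("yes").
BARE_SEQUENCE = "^نعم"    # assumed ar_TN/ar_YE form: reply must start with the whole word
BRACKET_CLASS = "^[نعم]"  # bracketed form: reply may start with ANY ONE of the three letters

def accepts(pattern: str, reply: str) -> bool:
    """True if the pattern matches the reply ('^' anchors it at the start)."""
    return re.search(pattern, reply) is not None

for reply in ["ن", "نعم", "م"]:
    print(reply, accepts(BARE_SEQUENCE, reply), accepts(BRACKET_CLASS, reply))
```

With the bare sequence, the one-letter reply ن is rejected and only a reply starting with the full word is accepted; the bracketed form accepts ن, but it also accepts ع or م on their own.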

Also, I think that *all* ar_* locales should add "Yy" to yesexpr and "Nn"
to noexpr for compatibility: a lot of tools, particularly command-line ones,
aren't Arabized yet, so the question will be in English and people will tend
to reply in English. With the current versions such replies fail, because the
locales accept only Arabic letters.
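
A minimal sketch of the proposal, assuming rpmatch()-style semantics (1 = yes, 0 = no, -1 = unrecognised) and again using Python's `re` in place of POSIX ERE. The names `CURRENT_YES`/`CURRENT_NO` (an assumed Arabic-only pattern) and `PROPOSED_YES`/`PROPOSED_NO` (the same with the ASCII letters added) are illustrative, not glibc identifiers.

```python
import re

# Assumed Arabic-only patterns (illustrative; <U0646> = ن, <U0644> = ل).
CURRENT_YES, CURRENT_NO = "^[ن]", "^[ل]"
# The same patterns with the ASCII letters proposed in this bug added.
PROPOSED_YES, PROPOSED_NO = "^[نyY]", "^[لnN]"

def classify(reply: str, yes: str, no: str) -> int:
    """Mimic rpmatch(): 1 = affirmative, 0 = negative, -1 = unrecognised."""
    if re.search(yes, reply):
        return 1
    if re.search(no, reply):
        return 0
    return -1

for reply in ["y", "n", "ن", "ل"]:
    print(reply,
          classify(reply, CURRENT_YES, CURRENT_NO),
          classify(reply, PROPOSED_YES, PROPOSED_NO))
```

With the Arabic-only patterns an English "y" comes back as -1, so a caller that treats anything not affirmative as "no" silently declines; adding "Yy"/"Nn" makes it 1 without changing how the Arabic replies are classified.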

Comment 1 Petter Reinholdtsen 2004-03-19 21:11:33 UTC

Looking at the ar_SA locale, it seems to use parentheses for
grouping and the '|' (or) notation to match alternatives (pseudo-notation,
converted to ASCII where possible):

  yesexpr "^(<U0646>|<U0646><U0639><U0645>)"
  noexpr  "^(<U0644>|<U0644><U0627>)"

This seems to me like a valid regex, matching one of the given character
sequences at the start of the string.  Is this legal for locales?  I'm not sure.
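
A quick check of the ar_SA form, sketched with Python's `re` as a stand-in for POSIX ERE (where `(…|…)` alternation is also valid syntax). The single-letter patterns `SIMPLE_YES`/`SIMPLE_NO` are an assumed comparison, not taken from any locale. Because ن is a prefix of نعم and ل a prefix of لا, the longer alternative never changes the result of a match anchored with `^`.

```python
import re

# ar_SA-style patterns from the comment above, with <U....> written as letters.
SA_YES = "^(ن|نعم)"  # ن or نعم
SA_NO = "^(ل|لا)"    # ل or لا
# Hypothetical single-letter equivalents, for comparison only.
SIMPLE_YES, SIMPLE_NO = "^ن", "^ل"

samples = ["ن", "نعم", "ل", "لا", "ع", "y", ""]
for s in samples:
    # The shorter alternative is a prefix of the longer one, so both
    # patterns accept exactly the same replies.
    assert bool(re.search(SA_YES, s)) == bool(re.search(SIMPLE_YES, s))
    assert bool(re.search(SA_NO, s)) == bool(re.search(SIMPLE_NO, s))
print("alternation and single-letter forms agree on all samples")
```

In other words, the alternation is syntactically fine; it is simply redundant.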

The ar_TN and ar_YE locales are obviously wrong.  Adding '^[' at
the start of the regex and ']' at the end is required for this
regex to match properly.  I'll make a patch for this.

I'm not sure if all locales should include 'Yy' and 'Nn' in the
regex.

Comment 2 Pablo Saratxaga 2004-03-19 21:42:26 UTC

I strongly think that Yy and Nn should be included in all locales (unless they
conflict with the native letters, which happens for 2 or 3 languages at most).

Here is why:

[ru@test ru]$ touch foo
[ru@test ru]$ LANGUAGE=ar LC_ALL=ar_SA rm -i foo
rm: remove regular empty file `foo'? y
[ru@test ru]$ ls foo
foo

As you can see, replying "y" is ignored in the Arabic locale, even though the
question message is not translated: the question is asked in English (and there
are plenty of such cases). People tend to reply in English (e.g. with the "Y" or
"N" key) when the question is not localized.
Strictly speaking it may not be a bug, but it is a very big annoyance that could
easily be avoided.

Comment 3 Petter Reinholdtsen 2004-03-19 21:44:12 UTC

Created attachment 38 [details]
Patch for ar_TN and ar_YE.

Comment 4 Petter Reinholdtsen 2004-03-23 21:59:19 UTC

The patch for ar_TN and ar_YE is now included in CVS.

I believe the strange format of the yesexpr and noexpr in ar_SA
should be reported as a separate bug if the regexp should always
match single characters rather than sequences of characters.

I also believe the request that all yesexpr/noexpr patterns should
include 'YyNn' if possible should be reported as a separate bug, as
it is unrelated to the other issues.

Should this bug be closed, and new bugs be opened?  I leave that to
the reporter to decide.

Comment 5 Pablo Saratxaga 2004-08-06 20:04:01 UTC

The patch for ar_TN and ar_YE is probably wrong: if you look at the ar_SA
locale, you see that what is present in ar_TN and ar_YE is the word "n-gh-m",
which is the second alternative of the ar_SA regexp, so it is probably meant as
one word, not three separate letters.

The format of the regexp (either letters only, e.g. "^[abcd]", or
words, e.g. "^(a|b|c|d|afoo|bbar|cfoobar|dtoto)") doesn't seem to be
a problem (but I haven't tested much either);
still, the lack of Yy/Nn is a big problem (I opened bug #305 for it).

So ar_TN and ar_YE should be changed to match either the format in ar_SA or
the format in another Arabic locale (e.g. ar_EG).

Comment 6 Petter Reinholdtsen 2004-08-08 18:46:46 UTC

If the LC_MESSAGES part of ar_TN and ar_YE should be identical to
another locale's, they should use the 'copy' statement to fetch
that locale's content verbatim.  Patches from someone who understands
Arabic are most welcome. :)
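
For reference, a minimal sketch of such a 'copy' in a glibc locale source file, assuming ar_EG is the locale chosen as the source; this block would stand in for the whole LC_MESSAGES section of, for example, ar_TN:

```
LC_MESSAGES
copy "ar_EG"
END LC_MESSAGES
```

With this, any later fix to ar_EG's yesexpr/noexpr is picked up automatically by every locale that copies it.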

Comment 7 Munzir Taha 2004-08-10 20:59:40 UTC

I believe this is a duplicate of bug 71. We have only one bug, and it would be
solved by setting all the Arabic locales to use the same regex:
^[نyY].* for yesexpr and ^[لnN].* for noexpr. Simple? ;)

Comment 8 Petter Reinholdtsen 2004-08-10 22:21:30 UTC

Comment on attachment 38 [details]
Patch for ar_TN and ar_YE.

This patch is now in CVS.

Comment 9 Petter Reinholdtsen 2004-08-10 22:25:28 UTC

Created attachment 161 [details]
Improve yesexpr/noexpr for all arabic locales

I suspect you meant that this bug (#71) is a duplicate of bug #305.
I don't think it is, as bug #305 is a generic problem with several
locales, and this bug talks about the Arabic locales only.

I made a patch that changes the yesexpr/noexpr of ar_EG to make sure
'Yy' and 'Nn' are included, and removes the useless '.*' at the end
of the regexes.  This is the first part of the patch.

The second part changes all the other Arabic locales (ar_*) to
copy the corrected LC_MESSAGES from ar_EG, so that all Arabic
locales use the same yesexpr/noexpr.  Is this patch correct?
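
Why the trailing '.*' is useless, sketched with Python's `re` as a stand-in for POSIX ERE: `.*` can match the empty string, so appending it never changes whether a pattern matches, and the caller only needs a yes/no match result. The sample replies below are illustrative.

```python
import re

WITH_TAIL = "^[نyY].*"   # as proposed in comment 7
WITHOUT_TAIL = "^[نyY]"  # with the trailing .* removed

replies = ["y", "Yes", "ن", "نعم", "n", "ل", "", " y"]
for r in replies:
    # .* can always match zero characters, so it never decides a match.
    assert bool(re.search(WITH_TAIL, r)) == bool(re.search(WITHOUT_TAIL, r))
print("the trailing .* makes no difference for any sample reply")
```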

Comment 10 Munzir Taha 2004-08-11 14:30:12 UTC

Ah! 305 is for all locales, I got it now. 
 
Now, I believe everything is OK. The patch seems OK to me. Thanks a lot. 
 
Mr. Pablo, any comment?

Comment 11 Abdulaziz Al-Arfaj 2004-08-16 08:05:16 UTC

Regarding the questions:

(1) should there be different regexes for different Arabic locales,
(2) if not, how should the regexes look, and
(3) which locale should be the "authoritative" locale

I _think_ the answers should be:

(1) A definite no.
(2) This probably needs discussion.
(3) None of them.

Hope this is enough information from one person :)

Comment 12 Ulrich Drepper 2005-10-15 01:27:27 UTC

I fixed the locales I saw.  Open new bugs for additional locales which need to
change.