The locales ar_SA, ar_TN and ar_YE have incorrect yesexpr/noexpr values; as a result it is impossible, under those locales, to reply properly to yes/no questions.

To test:

    for i in ar_EG ar_SA ar_TN ar_YE
    do
        echo $i
        LC_ALL=$i locale -c yesexpr noexpr
    done

(ar_EG is correct, so you can use it as a model.)

All three of ar_SA, ar_TN and ar_YE add two letters (<U0639> and <U0645>) to yesexpr and one letter (<U0627>) to noexpr. I don't know whether those additions should be made in all ar_* locales or not, but that is not the problem; the problem is that ar_SA uses parentheses instead of square brackets, and the other two use nothing at all!

Also, I think that *all* ar_* locales should add "Yy" to yesexpr and "Nn" to noexpr for compatibility. Many tools, particularly command-line ones, aren't arabized yet, so the question will be asked in English and people will tend to reply in English. With the current versions those replies fail, because the locale only accepts Arabic letters.
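The same check can be scripted. What follows is only a sketch of a Python equivalent of the shell loop above; it assumes a Unix platform where `locale.YESEXPR` and `locale.NOEXPR` are available, and it reports `None` for any locale that is not installed. The helper name `query_yes_no_expr` is made up for illustration.

```python
import locale


def query_yes_no_expr(name):
    """Return (yesexpr, noexpr) for locale `name`, or None if it is unavailable."""
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return (locale.nl_langinfo(locale.YESEXPR),
                locale.nl_langinfo(locale.NOEXPR))
    except locale.Error:
        return None  # the locale is not installed on this system
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore the old setting


if __name__ == "__main__":
    for name in ("ar_EG", "ar_SA", "ar_TN", "ar_YE"):
        print(name, query_yes_no_expr(name))
```

Depending on how the locales were generated, the names may need an explicit charset suffix (for example `ar_SA.UTF-8`); that detail is an assumption about the local setup, not something stated in this report.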
Looking at the ar_SA locale, it seems to use parentheses for grouping and the 'or' notation for alternation (pseudo-notation, converted to ASCII where possible):

    yesexpr "^(<U0646>|<U0646><U0639><U0645>)"
    noexpr  "^(<U0644>|<U0644><U0627>)"

This looks to me like a valid regex that matches a sequence of characters at the start of the string. Is this legal for locales? I'm not sure.

The ar_TN and ar_YE locales are obviously wrong. Adding '^[' at the start of the regex and ']' at the end is required for it to match properly. I'll make a patch for this.

I'm not sure whether all locales should include 'Yy' and 'Nn' in the regex.
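To see how the three spellings differ in practice, here is a small sketch using Python's `re` module. Python's regex dialect is not POSIX ERE, but these patterns only use features that mean the same thing in both. The "bare" pattern assumes the unpatched ar_TN/ar_YE value was just the three letters with no anchor or brackets; the names `PATTERNS` and `accepts` are invented for this example.

```python
import re

# The three letters from the report: <U0646>, <U0639>, <U0645>
NOON, AIN, MEEM = "\u0646", "\u0639", "\u0645"
WORD = NOON + AIN + MEEM

PATTERNS = {
    # ar_SA style: first letter alone, or the whole word
    "alternation": "^(" + NOON + "|" + WORD + ")",
    # assumed unpatched ar_TN/ar_YE style: the letters with nothing around them
    "bare": WORD,
    # patched style: any one of the three letters at the start
    "bracket": "^[" + WORD + "]",
}


def accepts(pattern, reply):
    """True if `reply` matches `pattern` anywhere the pattern allows."""
    return re.search(pattern, reply) is not None


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        verdicts = [accepts(pattern, r) for r in (NOON, WORD, AIN)]
        print(label, verdicts)
```

The bare form only accepts a reply that contains the full word (anywhere in the string), so a single-letter answer is rejected. The bracket form accepts a reply starting with any of the three letters, including one that starts with <U0639>. The alternation form accepts the initial letter or the whole word at the start of the reply.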
I strongly think that Yy and Nn should be included in all locales (unless they conflict with the native letters, which happens for 2 or 3 languages at most). Here is why:

    [ru@test ru]$ touch foo
    [ru@test ru]$ LANGUAGE=ar LC_ALL=ar_SA rm -i foo
    rm: remove regular empty file `foo'? y
    [ru@test ru]$ ls foo
    foo

As you can see, replying "y" is ignored in the Arabic locale, yet the question message is not translated: it is asked in English (and there are plenty of such cases). People tend to reply in English (e.g. with the "Y" or "N" key) when the question is not localized. Maybe it is not, stricto sensu, a bug, but it is a very big annoyance that could easily be avoided.
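The effect can be reproduced without touching the installed locales. The sketch below is a yes/no matcher in the style of glibc's rpmatch(), written with Python's `re` module; it is an illustration, not the code any particular tool uses. `AR_YES`/`AR_NO` are the ar_SA expressions quoted earlier in this report, while `EXT_YES`/`EXT_NO` are a hypothetical variant with the Latin letters added.

```python
import re

# Arabic-only expressions, as quoted for ar_SA in this report
AR_YES = "^(\u0646|\u0646\u0639\u0645)"
AR_NO = "^(\u0644|\u0644\u0627)"

# Hypothetical variant that also accepts Latin y/Y and n/N
EXT_YES = "^[\u0646yY]"
EXT_NO = "^[\u0644nN]"


def classify(reply, yesexpr, noexpr):
    """Return 1 for yes, 0 for no, -1 if the reply matches neither expression."""
    if re.search(yesexpr, reply):
        return 1
    if re.search(noexpr, reply):
        return 0
    return -1


if __name__ == "__main__":
    for reply in ("y", "n", "\u0646", "\u0644\u0627"):
        print(repr(reply),
              classify(reply, AR_YES, AR_NO),
              classify(reply, EXT_YES, EXT_NO))
```

With the Arabic-only pair, "y" is classified as unrecognized, which is consistent with the file still being there after the `rm -i` session above; with the extended pair it counts as yes.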
Created attachment 38: Patch for ar_TN and ar_YE.
The patch for ar_TN and ar_YE is now included in CVS. I believe the strange format of yesexpr and noexpr in ar_SA should be reported as a separate bug, if the regexp is always supposed to match single characters rather than sequences of characters. I also believe the request that all yesexpr/noexpr patterns include 'YyNn' where possible should be reported as a separate bug, as it is unrelated to the other issues. Should this bug be closed and new bugs be opened? I leave that to the reporter to decide.
The patch for ar_TN and ar_YE is probably wrong. If you look at the ar_SA locale, you will see that what is present in ar_TN and ar_YE is the word "n-gh-m" that appears in the second part of the regexp in ar_SA, so it is probably not meant as three separate letters. The format of the regexp (either letters only, e.g. "^[abcd]", or with words, e.g. "^(a|b|c|d|afoo|bbar|cfoobar|dtoto)") doesn't seem to be a problem (but I haven't tested much either); still, the lack of Yy/Nn is a big problem (I opened bug #305 for it). So ar_TN and ar_YE should be changed to match either the format in ar_SA or the format in another Arabic locale (e.g. ar_EG).
If the LC_MESSAGES part of ar_TN and ar_YE should be identical to that of another locale, they should use the 'copy' statement to fetch that locale's content verbatim. Patches are most welcome from someone who understands Arabic. :)
I believe this is a duplicate of bug 71. We have only one bug, and it would be solved if all the Arabic locales used the same regexes: ^[نyY].* for yesexpr and ^[لnN].* for noexpr. Simple? ;)
Comment on attachment 38: Patch for ar_TN and ar_YE. This patch is now in CVS.
Created attachment 161: Improve yesexpr/noexpr for all Arabic locales.

I suspect you meant that this bug (#71) is a duplicate of bug #305. I don't think it is, as bug #305 is a generic problem with several locales, and this bug is about the Arabic locales only.

I made a patch that changes the yesexpr/noexpr of ar_EG to make sure 'Yy' and 'Nn' are included, and removes the useless '.*' at the end of the regexes. This is the first part of the patch. The second part changes all the other Arabic locales (ar_*) to copy the corrected LC_MESSAGES from ar_EG, to make sure all Arabic locales use the same yesexpr/noexpr. Is this patch correct?
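Why the trailing '.*' can be dropped: for a plain match/no-match test, '.*' can always match the empty string, so it never changes the verdict. A quick sketch with Python's `re` module, using the yesexpr proposed earlier in this thread with and without the tail; the helper `same_verdict` and the sample replies are invented for illustration.

```python
import re

WITH_TAIL = "^[\u0646yY].*"    # yesexpr as proposed, with the trailing '.*'
WITHOUT_TAIL = "^[\u0646yY]"   # the same expression with the tail removed

SAMPLE_REPLIES = ["y", "Yes", "\u0646\u0639\u0645", "n", "", " y", "maybe"]


def same_verdict(pattern_a, pattern_b, replies):
    """True if both patterns accept and reject exactly the same replies."""
    return all(
        (re.search(pattern_a, r) is None) == (re.search(pattern_b, r) is None)
        for r in replies
    )


if __name__ == "__main__":
    print(same_verdict(WITH_TAIL, WITHOUT_TAIL, SAMPLE_REPLIES))
```

The two patterns only differ in how much of the reply the match covers, which does not matter when the caller only asks whether the reply matched.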
Ah! 305 is for all locales, I got it now. Now, I believe everything is OK. The patch seems OK to me. Thanks a lot. Mr. Pablo, any comment?
Regarding the questions: (1) should there be different regexes for different Arabic locales, (2) if not, what should the regexes look like, and (3) which locale should be the "authoritative" locale? I _think_ the answers should be: (1) A definite no. (2) This probably needs discussion. (3) None of them. Hope this is enough information from one person :)
I fixed the locales I saw. Open new bugs for additional locales which need to change.