Summary: | [PATCH] All Arabic locales should use the same yesexpr/noexpr | ||
---|---|---|---|
Product: | glibc | Reporter: | Pablo Saratxaga <pablo> |
Component: | localedata | Assignee: | Petter Reinholdtsen <pere> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | munzirtaha |
Priority: | P2 | Flags: | fweimer: security- |
Version: | 2.3.3 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Bug Depends on: | |||
Bug Blocks: | 305 | ||
Attachments: |
Patch for ar_TN and ar_YE.
Improve yesexpr/noexpr for all arabic locales |
Description
Pablo Saratxaga
2004-03-10 19:35:12 UTC
Looking at the ar_SA locale, it seems to be using parentheses for grouping and the 'or' notation to match (pseudo-notation, converted to ASCII where possible):

yesexpr "^(<U0646>|<U0646><U0639><U0645>)"
noexpr "^(<U0644>|<U0644><U0627>)"

This seems to me like a valid regex, matching a sequence of chars at the start of the string. Is this legal for locales? I'm not sure.

The ar_TN and ar_YE locales are obviously wrong. Adding '^[' at the start of the regex and ']' at the end is required for this regex to match properly. I'll make a patch for this.

I'm not sure if all locales should include 'Yy' and 'Nn' in the regex.

I strongly think that Yy and Nn should be included in all locales (unless they conflict with the native letters, which happens for 2 or 3 languages at most). Here is why:

[ru@test ru]$ touch foo
[ru@test ru]$ LANGUAGE=ar LC_ALL=ar_SA rm -i foo
rm: remove regular empty file `foo'? y
[ru@test ru]$ ls foo
foo

As you can see, replying "y" is ignored in the Arabic locale; but the question message is not translated, so the question is asked in English (and there are plenty of such cases). People tend to reply in English (e.g. with the "Y" or "N" key) when the question is not localized. Maybe it is not stricto sensu a bug, but it is a very big annoyance that could be easily avoided.

Created attachment 38 [details]
Patch for ar_TN and ar_YE.
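The behaviour reported above can be reproduced outside the C library with a minimal sketch. It assumes that Python's `re` module behaves like the POSIX extended regex matching applied to yesexpr/noexpr for patterns this simple; the `classify` helper is hypothetical and only illustrates why a reply of "y" is not recognized under the ar_SA patterns quoted in the report.

```python
import re

# ar_SA patterns as quoted in the report, with <Uxxxx> written as Python escapes.
# Assumption: Python's `re` stands in for the POSIX regex matching used by the C library.
AR_SA_YESEXPR = "^(\u0646|\u0646\u0639\u0645)"  # noon, or noon-ain-meem
AR_SA_NOEXPR = "^(\u0644|\u0644\u0627)"         # lam, or lam-alef


def classify(reply, yesexpr, noexpr):
    """Hypothetical helper: return 'yes', 'no', or 'unrecognized' for a reply."""
    if re.search(yesexpr, reply):
        return "yes"
    if re.search(noexpr, reply):
        return "no"
    return "unrecognized"


# A Latin "y" matches neither pattern, so the answer is not treated as "yes".
print(classify("y", AR_SA_YESEXPR, AR_SA_NOEXPR))
# The Arabic word for "yes" is recognized.
print(classify("\u0646\u0639\u0645", AR_SA_YESEXPR, AR_SA_NOEXPR))
```

In this sketch the first call yields "unrecognized", which corresponds to `rm -i` leaving the file in place in the transcript.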
The patch for ar_TN and ar_YE is now included in CVS. I believe the strange format of the yesexpr and noexpr in ar_SA should be reported as a separate bug if the regexp should always match single characters and not sequences of chars. I also believe the request that all yesexpr/noexpr patterns should include 'YyNn' if possible should be reported as a separate bug, as it is unrelated to the other issues. Should this bug be closed, and new bugs be opened? I leave that to the reporter to decide.

The patch for ar_TN and ar_YE is probably wrong; in fact, if you look at the ar_SA locale you see that what is present in ar_TN and ar_YE is the word "n-gh-m" from the second part of the regexp in ar_SA, so it is probably not three separate letters. The format of the regexp (either only letters, e.g. "^[abcd]", or with words, e.g. "^(a|b|c|d|afoo|bbar|cfoobar|dtoto)") doesn't seem to be a problem (but I haven't tested much either); still, the lack of Yy/Nn is a big problem (I opened bug #305 for it).

So ar_TN and ar_YE should be changed to match either the format in ar_SA or the format in another Arabic locale (e.g. ar_EG).

If the LC_MESSAGES part of ar_TN and ar_YE should be identical to another locale, they should use the 'copy' statement to fetch that locale's content verbatim. Patches are most welcome, from someone understanding Arabic. :)

I believe this is a duplicate of bug 71. We have only one bug, which would be solved if you set all the Arabic locales to use the same regex: ^[نyY].* for the yesexpr and ^[لnN].* for the noexpr. Simple? ;)

Comment on attachment 38 [details]
Patch for ar_TN and ar_YE.
This patch is now in CVS.
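The unified patterns proposed earlier in this thread can be checked with a small sketch. It again assumes Python's `re` as a stand-in for the POSIX regex matching; the names `YESEXPR`, `NOEXPR`, `is_yes` and `is_no` are hypothetical and chosen only for illustration.

```python
import re

# Proposed shared patterns for all Arabic locales, as written in the thread.
YESEXPR = "^[\u0646yY].*"  # noon, y or Y at the start of the reply
NOEXPR = "^[\u0644nN].*"   # lam, n or N at the start of the reply


def is_yes(reply):
    """Hypothetical helper: True if the reply counts as affirmative."""
    return re.search(YESEXPR, reply) is not None


def is_no(reply):
    """Hypothetical helper: True if the reply counts as negative."""
    return re.search(NOEXPR, reply) is not None


# Both native and Latin replies are accepted.
for reply in ["y", "Yes", "\u0646\u0639\u0645", "n", "No", "\u0644\u0627"]:
    print(repr(reply), is_yes(reply), is_no(reply))
```

With these patterns a user who answers an untranslated English prompt with "y" or "n" gets the expected result, while native replies keep working.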
Created attachment 161 [details]
Improve yesexpr/noexpr for all arabic locales

I suspect you meant that this bug (#71) is a duplicate of bug #305. I don't think it is, as bug #305 is a generic problem with several locales, and this bug talks about the Arabic locales only. I made a patch that changes the yesexpr/noexpr of ar_EG to make sure 'Yy' and 'Nn' are included and removes the useless '.*' at the end of the regexes. This is the first part of the patch. The second part changes all the other Arabic locales ar_* to copy the corrected LC_MESSAGES from ar_EG, to make sure all Arabic locales use the same yesexpr/noexpr. Is this patch correct?

Ah! 305 is for all locales, I got it now. Now, I believe everything is OK. The patch seems OK to me. Thanks a lot. Mr. Pablo, any comment?

Regarding the questions: (1) should there be different regexes for different Arabic locales, (2) if not, how should the regexes look, and (3) which locale should be the "authoritative" locale, I _think_ the answers should be: (1) A definite no. (2) This probably needs discussion. (3) None of them. Hope this is enough information from one person :)

I fixed the locales I saw. Open new bugs for additional locales which need to change. |
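Why the trailing '.*' can be dropped is easy to see in a short sketch: the pattern is anchored only at the start, so anything after the first character is irrelevant to whether a match exists. This assumes Python's `re.search` as a stand-in for the POSIX matching; the `matches` helper and the sample replies are hypothetical.

```python
import re

WITH_TAIL = "^[\u0646yY].*"    # form with the trailing '.*'
WITHOUT_TAIL = "^[\u0646yY]"   # form after removing it


def matches(pattern, reply):
    """Hypothetical helper: True if the pattern matches somewhere in the reply."""
    return re.search(pattern, reply) is not None


replies = ["y", "Yes", "\u0646\u0639\u0645", "n", "no", "", "maybe"]

# Both forms accept and reject exactly the same replies.
same = all(matches(WITH_TAIL, r) == matches(WITHOUT_TAIL, r) for r in replies)
print(same)
```

Since '.*' may match the empty string, it never changes the outcome of an unanchored-at-end search, which is consistent with calling it useless in the comment above.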