This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Hi,

I have now implemented all the changes requested for the
translit_cyrillic file, but I have started hitting what seems like a bug:

- If the line

    <U0425> <U0048>;<U0058>

  is present in translit_cyrillic, the locale compilation fails, i.e.

    grep CYRILLIC < $testfile | LOCPATH=$workdir/compiled_locales/"$locale"/ LC_ALL="$locale".UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT

  hangs.
- If the line <U0425> <U0048>;<U0058> is absent from translit_cyrillic,
  everything works; only the transliteration of <U0425> fails, as
  expected (? is displayed).
- If translit_cyrillic contains <U0425> <U0048>;<U0058> as the _only_
  line, the transliteration of <U0425> works again (the others are
  displayed as ?).

Would you have any idea in which direction I should look?
The new translit_cyrillic is attached.

(<U0425> is % CYRILLIC CAPITAL LETTER HA)

Best regards,
Egor

On 09.10.2018 01:35, Egor Kobylkin wrote:
> On 09.10.2018 00:23, Rafal Luzynski wrote:
>> 8.10.2018 14:40 Marko Myllynen <myllynen@redhat.com> wrote:
>>> Hi,
>>>
>>> Thanks for the update. I have few mostly cosmetic comments below,
>>> hopefully we'll hear from others whether they agree with this direction.
>>>
>
> Yeah, the earlier we have feedback the more productive we are. I'd be
> happy to get much feedback on this as early as possible. So please
> everybody concerned please chime in.
>
>>
>>> - No duplicates:
>>>
>>> % CYRILLIC SMALL LETTER IE
>>> <U0435> <U0065>; <U0065>
>>>
>>> should become:
>>>
>>> % CYRILLIC SMALL LETTER IE
>>> <U0435> <U0065>
>>>
>>> - There are few issues with the definitions:
>>>
>>> % CYRILLIC CAPITAL LETTER U
>>> <U0423> <U0055>; <U0055>
>>> % CYRILLIC UNDEFINED
>>> <U0423><U0423> <U00DA>; "<U0055><U0060>"
>>>
>>> % CYRILLIC SMALL LETTER U
>>> <U0443> <U0075>; <U0075>
>>> % CYRILLIC UNDEFINED
>>> <U0443><U0443> <U00FA>; "<U0075><U0060>"
>>
>> Are the duplicates here because some Cyrillic letters may have multiple
>> Latin transliterations depending on the context, for example Cyrillic IE
>> must be transliterated sometimes as "e", sometimes as "ie", sometimes
>> as "ye" or "je"? Can we provide rules for groups of characters instead?
> No, the duplicates are just by design of my line generating logic. I
> have fixed (removed) them. The varying transcription between
> languages/locales can not be handled in one file at all as far as I
> understood.
>
>>
>>> I wonder would it be possible to automate generation of this file so
>>> that issues like the above could avoided? But perhaps that could be the
>>> next step once this initial patch lands.
>
> I am generating the content part of the translit_cyrillic from the
> LibreOffice Spreadsheet. Not sure if you had time to view it by now?
> https://sourceware.org/bugzilla/attachment.cgi?id=11299
>
> Anyway I have just fixed the issues identified by Marko above in that
> spreadsheet. I will do the changes for the below request and then upload
> the new translit_cyrillic file to the bugzilla.
>
>
>>> - Please add the standard glibc locale header (see the existing
>>> translit_* files for reference)
>>> - Consider wrapping the header lines at or around column 70-72
>>> - Consider describing which characters, character ranges, or blocks are
>>> supported (perhaps also describe why some of those are not included, see
>>> e.g. https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
>>> - Please remove trailing whitespaces and spaces after ;
>>
>> Thanks for this, Marko. While at this, in the ChangeLog and in the commit
>> message these paths:
>>
>> * locales/aa_DJ: likewise
>>
>> 1. Should be a relative path starting in the root directory of glibc source,
>> that is: "* localedata/locales/aa_DJ".
>> 2. Should be "Likewise." (starting with an uppercase and ending with a dot).
>
> will do.
>
> Bests,
> Egor
>
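
The cosmetic issues raised in the quoted review (duplicate alternatives, spaces after ;, trailing whitespace) could plausibly be caught mechanically before a file is posted. What follows is only a minimal sketch, not part of glibc or of the spreadsheet workflow described above: the function name lint_translit_lines and the regular expression are hypothetical, and it assumes each entry has the form "<Uxxxx>... alternative;alternative" with no ; inside quoted strings.

```python
import re

# Hypothetical pattern: one or more <Uxxxx> source code points, whitespace,
# then the list of alternatives. This is an assumption about the entry
# layout, not a full parser for glibc locale source files.
ENTRY = re.compile(r'^(?P<src>(?:<U[0-9A-Fa-f]{4,8}>)+)\s+(?P<alts>.+?)\s*$')


def lint_translit_lines(lines):
    """Return (line_number, message) pairs for the cosmetic issues above."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        stripped = line.strip()
        # Skip blank lines and % comments.
        if not stripped or stripped.startswith("%"):
            continue
        match = ENTRY.match(stripped)
        if match is None:
            # Keywords and anything else that is not an entry are ignored.
            continue
        alts_text = match.group("alts")
        if re.search(r";\s", alts_text):
            problems.append((number, "space after ';'"))
        # Assumes ';' never occurs inside a quoted alternative.
        alts = [alt.strip() for alt in alts_text.split(";")]
        if len(alts) != len(set(alts)):
            problems.append((number, "duplicate alternative"))
    return problems


if __name__ == "__main__":
    sample = [
        "% CYRILLIC SMALL LETTER IE\n",
        "<U0435> <U0065>; <U0065>\n",
        "<U0425> <U0048>;<U0058>\n",
    ]
    for number, message in lint_translit_lines(sample):
        print(f"line {number}: {message}")
```

Such a check only covers formatting; it says nothing about the iconv hang described at the top of this message.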
Attachment:
translit_cyrillic
Description: Text document