This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651


Joseph Myers <joseph@codesourcery.com> wrote:

> On Fri, 26 Jan 2018, Mike FABIAN wrote:
>
>> > In the various cases where collation data has been changed locally since 
>> > the previous import from ISO 14651, are those local changes all obsoleted 
>> > by subsequent changes to the ISO 14651 collation data?
>> 
>> The improvements mentioned in this comment in the old iso14651_t1_common
>> file are obsoleted by the new file:
>
> It's entirely plausible that comment is far from exhaustive; reviewing 
> changes from the git logs for that file (and iso14651_t1 from which it was 
> split out) would be a good idea as well.  But knowing those listed in the 
> comment are obsoleted is a very good start.

I went through the git log a while ago and found nothing of interest
apart from added scripts, which the new file includes anyway. But
looking again now, I found

commit b05eca0e1d96aecb25516287913c54bbb93d3d92
Author: Santhosh Thottingal <santhosh.thottingal@gmail.com>
Date:   Sun Jun 11 10:08:37 2017 -0400

    Correct collation rules for Malayalam.
    
            [BZ #19922]
            * locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.
    
            [BZ #19919]
            * locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.

which I overlooked.

I’ll make sure that Malayalam sorts correctly as well in the next
version of my patches.
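
A quick way to see how an installed locale orders the two letters from
BZ #19919 could look like the sketch below. It is only an illustration,
not part of the patches: the helper name sort_with_locale is made up
here, and it assumes a locale named ml_IN.UTF-8 is installed on the
system (it skips the check otherwise). It prints the order it observes
and does not assert which order is correct.

```python
import locale


def sort_with_locale(words, locale_name):
    """Return words sorted by the LC_COLLATE rules of locale_name.

    Hypothetical helper for illustration; the previous LC_COLLATE
    setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


if __name__ == "__main__":
    # The two code points named in the commit message for BZ #19919.
    letters = ["\u0d37", "\u0d36"]
    try:
        ordered = sort_with_locale(letters, "ml_IN.UTF-8")  # assumed locale name
        print([f"U+{ord(c):04X}" for c in ordered])
    except locale.Error:
        print("ml_IN.UTF-8 is not installed; skipping the check")
```

Running this before and after installing the updated locale data would
show whether the relative order of the two letters changes.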

> How automated is the process of getting from the ISO 14651 data to what 
> will be the new version of this file (and iso14651_t1_pinyin, I'm not sure 
> how that relates to ISO 14651 data either)?  It's desirable to have a 
> clear definition, as automated as possible, of how the glibc file relates 
> to the original source, to reduce the manual work involved in making 
> future updates.

The iso14651_t1_pinyin file is not related; I doubt that it has anything
to do with ISO 14651.

The process is not automated at the moment, but I documented the
changes in these patches:

        0002-Necessary-changes-after-updating-the-iso14651_t1_com.patch
        0003-iso14651_t1_common-U-0-9A-F-0-9A-F-0-9A-F-0-9A-F-0-9.patch
        0004-Fixing-syntax-errors-after-updating-the-iso14651_t1_.patch
        0007-Add-convenience-symbols-like-AFTER-A-BEFORE-A-to-iso.patch
        0008-iso14651_t1_common-make-the-fourth-level-the-codepoi.patch

Some of this is hard to automate, especially the strange syntax errors
which were in the file from ISO.
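
As a rough idea of how the relation between the pristine ISO file and
the glibc file could be recorded mechanically, the sketch below lists
every region where two versions of a file differ. This is not how the
patches above were produced; the function name local_changes and the
sample lines are placeholders invented for the example, and the real
files would have to be read in by the caller.

```python
import difflib


def local_changes(upstream_lines, glibc_lines):
    """List the regions where glibc_lines deviates from upstream_lines.

    Each entry is (tag, upstream_chunk, glibc_chunk), where tag is one
    of 'replace', 'delete' or 'insert' as reported by SequenceMatcher.
    """
    matcher = difflib.SequenceMatcher(
        a=upstream_lines, b=glibc_lines, autojunk=False
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged regions are not local changes
        changes.append((tag, upstream_lines[i1:i2], glibc_lines[j1:j2]))
    return changes


if __name__ == "__main__":
    # Placeholder content, not real collation data.
    upstream = ["first line", "second line", "third line"]
    local = ["first line", "second line (edited)", "third line", "extra line"]
    for tag, old, new in local_changes(upstream, local):
        print(tag, old, new)
```

Keeping such a list next to the imported file would at least make it
visible which local deviations have to be re-checked on the next update.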

In the next version of my patch I’ll update to the newer version of the
ISO file, which Carlos found. We’ll see whether that version has these
errors too.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

