This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] [BZ 17588 13064] Update UTF-8 charmap and width to Unicode 7.0.0


On Dec  8, 2014, Mike FABIAN <mfabian@redhat.com> wrote:

> I changed gen-unicode-ctype.py mostly according to your suggestions

Thanks!

> Alexandre Oliva <aoliva@redhat.com> wrote:

>> - I'm not sure it's wise for fill_attributes to load the entire file
>> into memory just to be able to index the lines in an array.  It doesn't
>> look like reading the input file line by line would make the code worse.

> https://github.com/pravins/glibc-i18n/commit/7ef5161898f54d2d66bb25f898310c68bc2d6577

In fill_attributes and fill_derived_core_properties, any reason to not
simplify:

    with open(...) as ..._file:
        ...1... # doesn't refer to ..._file
        for line in ..._file:
            ...2... # doesn't refer to ..._file

  to:

    ...1...
    for line in open(...):
        ...2...

  ?


>> - It's not obvious that is_alpha in the script, based on derived
>> properties, is equivalent to the many conditions tested in the C
>> program.  Is there any other script that checks their equivalence?

> It is *not* supposed to be equivalent.

> 	      /* Consider all the non-ASCII digits as alphabetic.
> 		 ISO C 99 forbids us to have them in category "digit",
> 		 but we want iswalnum to return true on them.  */

> which seems to make sense, therefore I kept that in is_alpha()
> in gen-unicode-ctype.py.

*nod*.  Speaking of which...  There are at least four occurrences of the
test for code_points in the '0'..'9' range.  Would it make sense to
factor them all out into a single function?
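
A minimal sketch of what that factoring could look like.  The name
is_ascii_digit is hypothetical (it is not taken from
gen-unicode-ctype.py), and code_point is assumed to be an integer:

```python
def is_ascii_digit(code_point):
    """Return True if code_point is one of the ASCII digits '0'..'9'.

    Hypothetical helper: the name is an assumption made for this
    sketch, not something that exists in gen-unicode-ctype.py.
    """
    return 0x0030 <= code_point <= 0x0039


# Each of the repeated range tests would then become a single call:
print(is_ascii_digit(ord('7')))  # an ASCII digit
print(is_ascii_digit(0x0663))    # ARABIC-INDIC DIGIT THREE, outside '0'..'9'
```

With a helper like this, the non-ASCII-digit special case in is_alpha()
and the other range tests would all share one definition.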


There are a few uses of "if 0:" that IMHO wouldn't hurt the eye as much
;-) if written "if False:".

There's at least one occurrence of '%s...1...'%...2... that might be
more efficiently written as ...2...+'...1...'.

len(a+b) is probably more efficient if written as len(a)+len(b); there
are at least two occurrences of the former.

IIRC the "verifications" function is exceeding the complexity limit set
by our pylintrc, and one of the scripts is exceeding the size limit.

verifications could be simplified by turning each test into a function
that takes a code_point as argument, performs the test and prints a
failure message if appropriate.  verifications would then iterate, for
each code_point, over the list of functions, calling each one in turn.
This would reduce the complexity, as presumably intended by the set
limit.
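
A rough sketch of that shape, under stated assumptions: the is_*
predicates below are stand-ins built on Python's str methods (the real
ones in the script are driven by the Unicode data files), the check_*
names and the CHECKS list are hypothetical, and as a small variation
each check returns its failure message so that verifications does the
printing:

```python
import sys


# Stand-in predicates, for illustration only.  The script's own is_*
# functions would be used instead.
def is_upper(code_point):
    return chr(code_point).isupper()


def is_lower(code_point):
    return chr(code_point).islower()


def is_alpha(code_point):
    return chr(code_point).isalpha()


# Each check takes a code_point and returns a failure message or None.
def check_upper_is_alpha(code_point):
    if is_upper(code_point) and not is_alpha(code_point):
        return '%04X is upper but not alpha' % code_point
    return None


def check_lower_is_alpha(code_point):
    if is_lower(code_point) and not is_alpha(code_point):
        return '%04X is lower but not alpha' % code_point
    return None


def check_not_upper_and_lower(code_point):
    if is_upper(code_point) and is_lower(code_point):
        return '%04X is both upper and lower' % code_point
    return None


# Hypothetical list of checks; adding a test means adding one entry.
CHECKS = [
    check_upper_is_alpha,
    check_lower_is_alpha,
    check_not_upper_and_lower,
]


def verifications(code_points):
    """Run every check on every code point; return the failure count."""
    failures = 0
    for code_point in code_points:
        for check in CHECKS:
            message = check(code_point)
            if message is not None:
                print(message, file=sys.stderr)
                failures += 1
    return failures
```

The branching then lives in the small check functions, and
verifications itself is reduced to two nested loops.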

As for the script size limit, the solution really is modularization.
Consider moving the parsing of UnicodeData.txt and DerivedProperties.txt
each to a separate module, that can then be reused by all scripts that
need to deal with this data.  Even the is_* functions might be turned
into a module of their own, if that makes sense.
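
As an illustration only, a shared parsing module might start out like
the sketch below.  The function name parse_unicode_data and the choice
of returned fields are assumptions, not existing code; it relies on
UnicodeData.txt having 15 semicolon-separated fields per line and does
not expand the "<..., First>"/"<..., Last>" range entries:

```python
def parse_unicode_data(lines):
    """Yield (code_point, fields) pairs from UnicodeData.txt lines.

    Hypothetical helper for a shared module.  Reads its input line by
    line, so it accepts an open file as well as a list of strings.
    """
    def hex_or_none(text):
        # Case mapping fields are empty when there is no mapping.
        return int(text, 16) if text else None

    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(';')
        if len(fields) != 15:
            raise ValueError('unexpected field count: %r' % line)
        yield int(fields[0], 16), {
            'name': fields[1],
            'category': fields[2],
            'upper': hex_or_none(fields[12]),
            'lower': hex_or_none(fields[13]),
            'title': hex_or_none(fields[14]),
        }


# Two sample lines in the UnicodeData.txt format:
SAMPLE = [
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041',
]
for code_point, fields in parse_unicode_data(SAMPLE):
    print('%04X' % code_point, fields['category'], fields['name'])
```

Each script that needs this data could then import the module instead
of carrying its own copy of the parsing code.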


Thanks!

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

