This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: locale-source validation script


Zack Weinberg <zackw@panix.com> wrote:

[...]

> It seems to me that this sort of check is not something that humans
> should have to do by eye; rather, it's a job for a linter.  So I wrote
> one. :)  It currently looks for "inappropriate" escape sequences and
> characters, using a quite strict notion of "inappropriate"; for
> strings that are not in Unicode Normalization Form C; and for strings
> that cannot be transcoded to the legacy charset for the locale (as
> defined by a "% Charset: xxx" annotation in the file - note that not
> all the files have such annotations).
>
> It is not ready for prime time; it is very slow (Python isn't really
> designed to go character-by-character through a file; it can probably
> be sped up with a cleverer lexer) and it finds a whole bunch of
> existing errors, some of which may not actually be _problems_, if you
> see what I mean.  I've attached the script and the result of running
> it over all of the files in localedata/locales/.  But it's ready for
> people to poke at.

Great!

I’m fixing the warnings your script reports.
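
For anyone who has not looked at the script yet, the two string-level
checks described in the quoted text (Normalization Form C, and
transcodability to the legacy charset) can be sketched roughly as
below.  This is not the attached script, only a minimal illustration.
The function name check_string is hypothetical, and it assumes the
name given in the "% Charset:" annotation is one that Python's codec
registry recognizes, which may not hold for every locale file.

```python
import unicodedata


def check_string(text, charset=None):
    """Return a list of complaints about one already-decoded string.

    `charset` stands in for the value of a "% Charset: xxx" annotation;
    pass None for files that carry no such annotation.
    """
    problems = []

    # A string is in NFC exactly when normalizing it changes nothing.
    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in Unicode Normalization Form C")

    if charset is not None:
        try:
            text.encode(charset)
        except LookupError:
            # Assumption: unknown names are reported rather than fatal.
            problems.append("unknown charset %r" % charset)
        except UnicodeEncodeError as exc:
            bad = text[exc.start:exc.end]
            problems.append("cannot encode %r in %s" % (bad, charset))

    return problems
```

For example, check_string("e\u0301") reports the NFC problem because
the string uses a combining accent instead of the precomposed letter,
and check_string("\u20ac", "ISO-8859-1") reports that the euro sign
cannot be encoded.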

> - The complaints about "inappropriate character '\t'" are all caused
> by _unintentional_ tabs inside strings.  If you write
>
> message "xyz/
>          abc"

It is certainly a good thing to fix these.


> is not what you want.  The linter currently only detects this when
> that indentation is done with tabs, but I think it should probably
> detect spaces as well.  If you _mean_ to put a tab in a string write
> <U0009>. :-)
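
A rough sketch of how the continuation-line check could cover spaces
as well as tabs follows.  It is not taken from the attached script.
The function name find_indented_continuations is hypothetical, and
the sketch assumes "/" as the escape character and "%" as the comment
character, as in the example above; a real checker would have to
honour the escape_char and comment_char declarations in each file.

```python
def find_indented_continuations(lines, escape_char="/", comment_char="%"):
    """Return 1-based numbers of lines that continue a quoted string
    and begin with a space or tab (which then ends up in the string)."""
    flagged = []
    in_string = False    # inside "..." at the current position
    continuing = False   # previous line ended with the escape character

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")

        # Whole-line comments are skipped unless we are mid-continuation.
        if not continuing and line.startswith(comment_char):
            continue

        if continuing and in_string and line[:1] in (" ", "\t"):
            flagged.append(number)

        continued = False
        i = 0
        while i < len(line):
            ch = line[i]
            if ch == escape_char:
                if i == len(line) - 1:
                    continued = True   # escape at end of line: continuation
                    i += 1
                else:
                    i += 2             # skip the escaped character
            else:
                if ch == '"':
                    in_string = not in_string
                i += 1

        continuing = continued
        if not continuing:
            in_string = False          # strings do not span unescaped newlines

    return flagged
```

With this, the two-line "xyz/" example above is flagged on its second
line whether it is indented with tabs or spaces, while an indented
continuation that falls between two separate quoted strings is left
alone.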

-- 
Mike FABIAN <mfabian@redhat.com>

