This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: RFC: locale-source validation script
- From: Mike FABIAN <mfabian at redhat dot com>
- To: Zack Weinberg <zackw at panix dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>, Mike FABIAN <maiku dot fabian at gmail dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Florian Weimer <fweimer at redhat dot com>
- Date: Wed, 26 Jul 2017 14:35:28 +0200
- Subject: Re: RFC: locale-source validation script
- References: <CAKCAbMj54Dq5gAvKrC1JXLU9fknoDVjbtK0iJutMvqhUOsjVhA@mail.gmail.com>
Zack Weinberg <zackw@panix.com> wrote:
[...]
> It seems to me that this sort of check is not something that humans
> should have to do by eye; rather, it's a job for a linter. So I wrote
> one. :) It currently looks for "inappropriate" escape sequences and
> characters, using a quite strict notion of "inappropriate"; for
> strings that are not in Unicode Normalization Form C; and for strings
> that cannot be transcoded to the legacy charset for the locale (as
> defined by a "% Charset: xxx" annotation in the file - note that not
> all the files have such annotations).
>
> It is not ready for prime time; it is very slow (Python isn't really
> designed to go character-by-character through a file; it can probably
> be sped up with a cleverer lexer) and it finds a whole bunch of
> existing errors, some of which may not actually be _problems_, if you
> see what I mean. I've attached the script and the result of running
> it over all of the files in localedata/locales/. But it's ready for
> people to poke at.
Great!
I’m fixing the warnings your script reports.
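For anyone following along without the attachment, two of the checks described above (Normalization Form C, and transcodability to the legacy charset) can be sketched roughly as below. This is not the attached script, only a minimal illustration: the function name check_string and the wording of the messages are made up here, and it assumes the string has already been decoded from its <Uxxxx> escapes and that the "% Charset:" name is one Python's codec registry recognises.

```python
import unicodedata


def check_string(text, charset=None):
    """Return a list of complaints about one already-decoded locale string.

    `charset` stands in for the value of a "% Charset: xxx" annotation;
    pass None when the file has no such annotation.
    """
    problems = []

    # A string is in NFC exactly when normalizing it changes nothing.
    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in NFC")

    if charset is not None:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            problems.append(f"cannot be encoded in {charset}")
        except LookupError:
            # Python does not know this charset name at all.
            problems.append(f"unknown charset {charset}")

    return problems


if __name__ == "__main__":
    # "e" followed by a combining acute accent: not NFC, and the
    # combining character has no ISO-8859-1 encoding either.
    print(check_string("e\u0301", "iso-8859-1"))
```

Files without a charset annotation simply skip the second check, which matches the note above that not all files carry one.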
> - The complaints about "inappropriate character '\t'" are all caused
> by _unintentional_ tabs inside strings. If you write
>
> message "xyz/
> abc"
It is certainly a good thing to fix these.
> is not what you want. The linter currently only detects this when
> that indentation is done with tabs, but I think it should probably
> detect spaces as well. If you _mean_ to put a tab in a string write
> <U0009>. :-)
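A rough sketch of how the continuation-line check could cover spaces as well as tabs is below. It is an illustration, not the attached linter: the name indented_continuations is hypothetical, it assumes "/" is the escape character, and it expects a single statement rather than a whole file (comment lines are not handled).

```python
def indented_continuations(source, escape_char="/"):
    """Return 1-based line numbers of continuation lines inside a
    double-quoted string that begin with a tab or a space."""
    hits = []
    in_string = False
    line = 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
        elif ch == '"':
            in_string = not in_string
        elif ch == escape_char and i + 1 < len(source):
            if source[i + 1] == "\n":
                # Escaped newline: the statement continues on the next line.
                line += 1
                starts_with_ws = i + 2 < len(source) and source[i + 2] in " \t"
                if in_string and starts_with_ws:
                    hits.append(line)
            # Skip the escaped character (newline, quote, or escape itself).
            i += 1
        i += 1
    return hits


if __name__ == "__main__":
    print(indented_continuations('message "xyz/\n\tabc"'))
```

Only continuations inside a string are reported; indentation after an escaped newline outside a string is left alone.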
--
Mike FABIAN <mfabian@redhat.com>