This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: RFC: locale-source validation script

From: Mike FABIAN <mfabian at redhat dot com>
To: Zack Weinberg <zackw at panix dot com>
Cc: GNU C Library <libc-alpha at sourceware dot org>, Mike FABIAN <maiku dot fabian at gmail dot com>, Rafal Luzynski <digitalfreak at lingonborough dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Florian Weimer <fweimer at redhat dot com>
Date: Wed, 26 Jul 2017 14:35:28 +0200
Subject: Re: RFC: locale-source validation script
References: <CAKCAbMj54Dq5gAvKrC1JXLU9fknoDVjbtK0iJutMvqhUOsjVhA@mail.gmail.com>

Zack Weinberg <zackw@panix.com> wrote:

[...]

> It seems to me that this sort of check is not something that humans
> should have to do by eye; rather, it's a job for a linter.  So I wrote
> one. :)  It currently looks for "inappropriate" escape sequences and
> characters, using a quite strict notion of "inappropriate"; for
> strings that are not in Unicode Normalization Form C; and for strings
> that cannot be transcoded to the legacy charset for the locale (as
> defined by a "% Charset: xxx" annotation in the file - note that not
> all the files have such annotations).
>
> It is not ready for prime time; it is very slow (Python isn't really
> designed to go character-by-character through a file; it can probably
> be sped up with a cleverer lexer) and it finds a whole bunch of
> existing errors, some of which may not actually be _problems_, if you
> see what I mean.  I've attached the script and the result of running
> it over all of the files in localedata/locales/.  But it's ready for
> people to poke at.
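For illustration, the NFC and legacy-charset checks described above, plus the literal-tab complaint, might look roughly like the minimal sketch below. It is not the attached script. `decode_ucs_names`, `find_charset` and `check_string` are hypothetical names, and the sketch assumes the string contents have already been pulled out of the locale file. Python codec names do not always match glibc charmap names, so an unknown charset is reported separately rather than counted as a transcoding failure.

```python
import re
import unicodedata
from typing import List, Optional

# glibc locale sources spell characters as <Uxxxx> or <Uxxxxxxxx>.
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})>")
# The optional "% Charset: xxx" annotation (not every file has one).
CHARSET_LINE = re.compile(r"^%[ \t]*Charset:[ \t]*(\S+)", re.MULTILINE)


def _ucs_char(m: "re.Match") -> str:
    cp = int(m.group(1), 16)
    # Leave out-of-range names untouched instead of raising.
    return chr(cp) if cp <= 0x10FFFF else m.group(0)


def decode_ucs_names(raw: str) -> str:
    """Replace every <Uxxxx> symbol with the character it names."""
    return UCS_NAME.sub(_ucs_char, raw)


def find_charset(source: str) -> Optional[str]:
    """Return the charset named by a '% Charset:' line, or None."""
    m = CHARSET_LINE.search(source)
    return m.group(1) if m else None


def check_string(raw: str, charset: Optional[str] = None) -> List[str]:
    """Return a list of problems found in one raw string body."""
    problems: List[str] = []
    # A literal tab almost always comes from indenting a continued string.
    if "\t" in raw:
        problems.append("literal tab; write <U0009> if a tab is intended")
    text = decode_ucs_names(raw)
    if unicodedata.normalize("NFC", text) != text:
        problems.append("not in Unicode Normalization Form C")
    if charset is not None:
        try:
            text.encode(charset)
        except LookupError:
            problems.append(f"Python has no codec named {charset!r}")
        except UnicodeEncodeError:
            problems.append(f"cannot be transcoded to {charset}")
    return problems
```

For example, `check_string("<U20AC>", "ISO-8859-1")` reports that the euro sign cannot be transcoded, while the same string passes with `"ISO-8859-15"`.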

Great!

I’m fixing the warnings your script reports.

> - The complaints about "inappropriate character '\t'" are all caused
> by _unintentional_ tabs inside strings.  If you write
>
> message "xyz/
>          abc"

It is certainly a good thing to fix these.


> is not what you want.  The linter currently only detects this when
> that indentation is done with tabs, but I think it should probably
> detect spaces as well.  If you _mean_ to put a tab in a string, write
> <U0009>. :-)
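A minimal sketch of the suggested check, flagging both tabs and spaces, is below. It assumes the file uses `escape_char /` and `comment_char %`, as is conventional in localedata/locales/, and only tracks `"`-quoted strings. `find_indented_continuations` is a hypothetical name, not taken from the attached script. Leading whitespace after a continuation outside a string (e.g. between list items) is harmless and is not reported; inside a string it becomes part of the value, so an intended leading space would be written as <U0020>.

```python
from typing import List


def find_indented_continuations(source: str, escape_char: str = "/",
                                comment_char: str = "%") -> List[int]:
    """Return 1-based numbers of lines that continue a quoted string
    (the previous line ended with escape_char) and start with a space or tab."""
    hits: List[int] = []
    in_string = False
    line_no = 1
    i, n = 0, len(source)
    while i < n:
        c = source[i]
        if c == "\n":
            # A string cannot span lines without a continuation.
            line_no += 1
            in_string = False
        elif c == escape_char and source[i + 1:i + 2] == "\n":
            # Line continuation: inside a string, the next line's leading
            # whitespace becomes part of the string value.
            line_no += 1
            i += 2
            if in_string and source[i:i + 1] in (" ", "\t"):
                hits.append(line_no)
            continue
        elif c == escape_char:
            i += 2  # skip the escaped character, e.g. an escaped quote
            continue
        elif c == comment_char and not in_string:
            # Skip the comment, leaving the newline for the loop to count.
            j = source.find("\n", i)
            i = n if j == -1 else j
            continue
        elif c == '"':
            in_string = not in_string
        i += 1
    return hits
```

Scanning character by character like this is simple but slow, consistent with the speed remark above; a regex-based lexer would likely be faster.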

-- 
Mike FABIAN <mfabian@redhat.com>
