This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH v4] Use a proper C tokenizer to implement the obsolete typedefs test.


On Thu, Mar 14, 2019 at 9:00 AM Carlos O'Donell <carlos@redhat.com> wrote:
>
> On 3/13/19 6:16 PM, Joseph Myers wrote:
> > I'm seeing failures from build-many-glibcs.py for
> > resource/check-obsolete-constructs:
> >
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3198: ordinal not in range(128)
> >
> > This is with LC_ALL=C (and bits/resource.h headers containing UTF-8 µ in a
> > comment).

This did not happen in my build-many-glibcs run, possibly because I’m
running it in a UTF-8 locale.  Should build-many-glibcs perhaps be
setting LC_ALL=C for all subprocesses?

As an immediate fix, I am going to commit a patch to
check-obsolete-constructs that specifies encoding="utf-8" since that’s
what we have in header files right now.
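
For illustration only, here is a minimal sketch of what passing an
explicit encoding changes.  This is not the committed patch; the
names read_header and demo are hypothetical.  It writes a small
header with a UTF-8 µ in a comment, reads it back as UTF-8, and shows
that decoding the same bytes as ASCII raises the UnicodeDecodeError
Joseph reported.

```python
import os
import tempfile


def read_header(path, encoding="utf-8"):
    """Read a header with an explicit encoding, independent of the locale."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Return (decoded text, whether an ASCII decode succeeded)."""
    # A tiny header whose comment contains a UTF-8 micro sign (bytes 0xC2 0xB5).
    data = "/* time in \u00b5s */\n#define X 1\n".encode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".h")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Explicit UTF-8 works no matter what LC_ALL is set to.
        text = read_header(path)
        # ASCII is what Python falls back to for open() under LC_ALL=C
        # on the versions that produced the reported failure.
        try:
            read_header(path, encoding="ascii")
            ascii_ok = True
        except UnicodeDecodeError:
            ascii_ok = False
        return text, ascii_ok
    finally:
        os.unlink(path)
```

With encoding given explicitly, the result no longer depends on the
locale the test harness happens to run in.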

> > There is also a case that the encoding specified should be
> > ASCII - that installed headers should be required to be pure ASCII so they
> > can be included in source files with any ASCII-compatible character set if
> > compiling with -finput-charset= (which affects included headers as well as
> > the main source file, so compiling "#include <sys/resource.h>" with
> > -finput-charset=ascii currently fails).
>
> Do we have a requirement that #include <sys/resource.h> be compilable with
> -finput-charset=ascii?

I think a requirement that our installed header files be compilable
with *any* valid setting of -finput-charset= by application Makefiles
is reasonable (or, in other words, all installed header files should
use only the basic source character set).  This is technically a
stronger constraint than requiring -finput-charset=ascii to work, but
in practice I think testing against -finput-charset=ascii would be
sufficient.

I think it’s a bug in GCC that -finput-charset=ascii causes an error
for non-ASCII characters inside comments, but there have been so many
releases with that bug that we have to cope.

A counterargument is that clang apparently only implements
-finput-charset=utf-8; *any other value* is rejected.  That this was
considered adequate Makefile compatibility for the feature strongly
suggests that nobody is using any other extended source character
set, so we should be OK to continue using UTF-8 in installed headers,
at least in comments.

Whatever we do should be enforced by some test or other.  It might be
more appropriate to add it to check-installed-headers.sh than
check-obsolete-constructs.py, though.
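
As a rough sketch of what such a test could look like if we went with
the pure-ASCII rule (the function names are hypothetical, and this
checks for ASCII only, which is slightly looser than the basic source
character set):

```python
import os


def find_non_ascii(path):
    """Return (line, column) pairs, 1-based, for each non-ASCII byte."""
    hits = []
    # Read as bytes so the check does not depend on any locale or codec.
    with open(path, "rb") as f:
        for lineno, line in enumerate(f, start=1):
            for col, byte in enumerate(line, start=1):
                if byte > 0x7F:
                    hits.append((lineno, col))
    return hits


def check_tree(root):
    """Map each .h file under root that has non-ASCII bytes to its hits."""
    bad = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".h"):
                path = os.path.join(dirpath, name)
                hits = find_non_ascii(path)
                if hits:
                    bad[path] = hits
    return bad
```

Pointing check_tree at the directory of installed headers and failing
when it returns a non-empty dict would catch the µ in
bits/resource.h without needing to invoke the compiler.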

zw

