This is the mail archive of the mailing list for the glibc project.

Re: RFC: IDN support in getaddrinfo().

Simon Josefsson wrote:

> That's fine.  Btw, guni*.h and rfc3454.c are generated; the Perl
> script that generates them and the upstream data files are available
> as part of libidn.  It doesn't seem necessary to add them to libc, though.

It is necessary since it is required by the GPL.

>>~ the toutf8 code now supports glibc correctly.  Note, the non-glibc
>>  code isn't really right.  You cannot use setlocale() if the program
>>  hasn't done it by itself.  And I don't think using the CHARSET envvar
>>  is wise either.  Just require the application to choose the right
>>  locale.  This is what the glibc code now does.
> Libidn used to use this approach, IIRC.  I changed it a rather long
> time ago, and haven't received negative reports about it, but I agree
> it might not be ideal.  Some discussion below, comments appreciated.

I'm strongly opposed to adding this hack.  Using exactly what the locale
says is the right way.  In glibc it even allows thread-specific encodings.
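
To illustrate the thread-specific case, here is a minimal sketch, assuming a
POSIX.1-2008 system such as glibc.  It installs a locale for the calling
thread only and reads that locale's codeset.  The helper name
codeset_in_thread_locale is made up for this example; it is not part of
glibc or libidn.

```c
#define _POSIX_C_SOURCE 200809L  /* for newlocale(), uselocale() */
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copy the codeset name of locale `name` into buf.
   The locale is installed for the calling thread only, so other threads
   and the process-wide locale are left alone.
   Returns 0 on success, -1 if the locale is unknown or buf is too small. */
int codeset_in_thread_locale(const char *name, char *buf, size_t len)
{
    locale_t loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    locale_t old = uselocale(loc);   /* per-thread switch */
    int n = snprintf(buf, len, "%s", nl_langinfo(CODESET));
    uselocale(old);                  /* restore before freeing loc */
    freelocale(loc);

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A caller would pass a locale name such as "C" or "" (the environment's
choice) and a buffer of a few dozen bytes.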

> Note that my code doesn't modify the locale.  It queries the current
> locale and saves that, then resets the locale to the system default
> (locale==NULL), then gets the charset used for the system locale, and
> finally resets the locale to the saved locale, i.e. the locale chosen
> by the application (if any).

Ehm, think about threads.  The locale is shared by all threads.  You
temporarily change it in one thread.
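
To make the problem concrete, here is a rough sketch of the save, switch,
restore sequence described in the quoted text.  It is a hypothetical
reconstruction for illustration only (env_codeset_via_setlocale is an
invented name), not libidn's actual code, and it assumes a POSIX system
for strdup() and nl_langinfo().

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the codeset named by the
   environment's locale, or NULL on failure.  NOT thread-safe: setlocale()
   changes the locale of the whole process. */
char *env_codeset_via_setlocale(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);   /* query only */
    char *saved = cur ? strdup(cur) : NULL;
    if (saved == NULL)
        return NULL;

    char *result = NULL;
    if (setlocale(LC_CTYPE, "") != NULL) {
        /* From here until the restore below, every thread in the process
           sees the environment's LC_CTYPE, not the application's. */
        result = strdup(nl_langinfo(CODESET));
    }

    setlocale(LC_CTYPE, saved);                    /* restore */
    free(saved);
    return result;
}
```

The calling thread ends up with the locale it started with, but any other
thread that runs a locale-dependent function inside the window gets the
wrong answer.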

> If both the system and application use the same locale (the normal
> case), I believe both approaches result in the same behaviour.

Well, no.  In my first test program, which still used your code in
toutf8, I didn't add a setlocale() call.  It still worked, magically
and, in retrospect, unexpectedly.  The implicit setlocale(LC_CTYPE,"")
call is not OK.
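
For reference, ISO C has every program start as if setlocale(LC_ALL, "C")
had been called, so a program that never calls setlocale() itself stays in
the "C" locale.  A library can observe this without changing anything.  The
following is a minimal sketch; ctype_locale_is_default is an invented name,
not an existing glibc or libidn function.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if LC_CTYPE is still the startup "C"
   (or "POSIX") locale, 0 otherwise.  It only queries the locale and never
   modifies it.  It cannot tell an application that never called
   setlocale() from one that explicitly selected "C". */
int ctype_locale_is_default(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);   /* NULL means query */
    return cur == NULL || strcmp(cur, "C") == 0 || strcmp(cur, "POSIX") == 0;
}
```

In a program that has not called setlocale(), this returns 1 regardless of
what LANG or LC_ALL say in the environment.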

> My approach works well in those situations where an application
> receives strings from the system (on the command line, via stdin,
> etc.) but was started in another locale.  Consider this (the terminal
> uses ISO-8859-1):
> $ LANG=sv_SE.ISO-8859-1 bash
> $ LANG=sv_SE.UTF-8 some-program-calling-idn-getaddrinfo räksmörgås

This is a user error.  And why should this be any better?  The program
calls setlocale(LC_ALL,"") at the beginning, setting the UTF-8 locale.
Then inside stringprep_locale_charset_slow() you run
setlocale(LC_CTYPE,"") which has the same result.

If, in your example, you set CHARSET=ISO-8859-1 before starting the
application, it would work, but only for the strings from the command line.

This is nothing to account for since it does not really make things
better.  The locale mustn't change between program and shell.

> The reason for it is that many systems have bad locale
> configurations, and some systems don't even have sufficient locale
> support (I'm told OpenBSD falls into the latter category, if anyone
> cares).

The problem is that things like this tend to spill over to sane systems
as well.  Somebody who for some reason developed on a system with the
limitation will have designed the program around this feature.  I'm not
saying it should be removed right away.  Maybe rename CHARSET to
something which shows it is a hack.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖