This is the mail archive of the
mailing list for the glibc project.
Re: RFC: IDN support in getaddrinfo().
- From: Simon Josefsson <jas at extundo dot com>
- To: libc-alpha at sources dot redhat dot com
- Date: Mon, 08 Mar 2004 17:14:57 +0100
- Subject: Re: RFC: IDN support in getaddrinfo().
- References: <email@example.com> <3FC41F79.firstname.lastname@example.org><email@example.com> <3FC47A3B.firstname.lastname@example.org><email@example.com> <3FC4E691.firstname.lastname@example.org><email@example.com> <3FC4F244.firstname.lastname@example.org><email@example.com><firstname.lastname@example.org> <3FC50672.email@example.com><firstname.lastname@example.org><email@example.com> <404BF3EF.firstname.lastname@example.org>
Ulrich Drepper <email@example.com> writes:
> Simon Josefsson wrote:
>> Here's a patch that implement the dlopen approach for loading libidn.
>> Not exactly the way I originally intended, but it seem rather simple.
> Believe it or not, an hour before your patch arrived I started working
> on this myself.
> Well, I took your update code but had to make a number changes:
> ~ several files were missing. I've added them from the old version.
That's fine. Btw, guni*.h and rfc3454.c are generated, the Perl
script to generate them, and the up-stream data files are available as
part of libidn. It doesn't seem necessary to add them to libc, though
(the files haven't changed in a long time, IIRC), unless you really
> ~ the toutf8 code supports glibc now correctly. Note, the non-glibc
> code isn't really right. You cannot use setlocale() if the program
> hasn't done it by itself. And I don't think using the CHARSET envvar
> is wise either. Just require the application to chose the right
> locale. This is what the glibc code now does.
Libidn used to use this approach, IIRC. I changed it rather long time
ago, and haven't received negative reports about it, but I agree it
might not be ideal. Some discussion below, comments appreciated.
Note that my code doesn't modify the locale. It queries the current
locale and save that, then reset the locale to the system default
(locale==NULL), then get the charset used for the system locale, and
finally reset the locale to the saved locale, i.e. the locale chosen
by the application (if any).
If both the system and application use the same locale (the normal
case), I believe both approaches result in the same behaviour.
My approach work well in those situations where an application receive
strings from the system (on the command line, via stdin, etc) but was
started in another locale. Consider this (the terminal uses
$ LANG=sv_SE.ISO-8859-1 bash
$ LANG=sv.SE.UTF-8 some-program-calling-idn-getaddrinfo räksmörgås
If some-program-calling-idn-getaddrinfo calls setlocale (LC_CTYPE, "")
then nl_langinfo (CODESET) will return UTF-8. But command line
parameters, or data received via stdin, will be encoded in ISO-8859-1.
I guess the alternative is to require the program to discover this
situation itself, and to convert the command line parameter to the
then-current locale. I have never seen any program do this.
I agree regarding the CHARSET variable. It should ideally not be
used. The reason for it is that many systems have bad locale
configurations, and some systems doesn't even have sufficient locale
support (I'm told OpenBSD fall into the latter category, if anyone
cares). I wanted libidn to be useful, with some minimal tweaking,
even on those platforms.
> ~ various cleanups. So far only warnings. The only other file I looked
> at more then briefly is toutf8. Note the changes I've made, e.g., to
> prevent memory leaks or handle memory allocation errors. Also, the
> memory allocation is more likely to allocate an output buffer which is
> large enough.
I've merged back the two warning fixes, thanks.
Most of the toutf8.c changes are not portable to C89, so I can't use
them. However, I'll will take a look at what you've done and try to
come up with something clean and portable. I'll send a patch for this
> ~ in the libc stub, I've added code to prevent using libcidn from being
> used if the entire name is ASCII. This is a valid assumption since
> all locale encodings must be ASCII safe.
> ~ also, libcidn gets loaded only once. There is no reason to unload the
> object except when the process terminates and complete cleanup is
> requested. Look at the file.
> ~ since libidn is an add-on the use in getaddrinfo() is conditional.
> I've added the appropriate configure magic.
All fine by me.
> Let's get the CANONIDN stuff implemented so that we're done.
I'll have to experiment some with it to become more convinced the idea
doesn't have any unexpected problems (I have not implemented it yet,
so I don't have any real experience with it). I wouldn't mind if you
or someone else add it, before I manage to send the patch, though.