This is the mail archive of the
mailing list for the glibc project.
Re: RFC: IDN support in getaddrinfo().
- From: Simon Josefsson <jas at extundo dot com>
- To: Ulrich Drepper <drepper at redhat dot com>
- Cc: libc-alpha at sources dot redhat dot com
- Date: Wed, 26 Nov 2003 10:19:20 +0100
- Subject: Re: RFC: IDN support in getaddrinfo().
- References: <firstname.lastname@example.org> <3FC41F79.email@example.com>
Thanks for your thoughts.
Ulrich Drepper <firstname.lastname@example.org> writes:
> Simon Josefsson wrote:
>> Continuing an old thread regarding support for Internationalized
>> Domain Names (IDN) in glibc, prompted by the adaption of my glibc
>> patches by developers from some Linux distributions, I'd like to
>> formalize my ideas in a proposal for extending the getaddrinfo() API.
> The problem I have with this is: we do not have the idn code in glibc.
> It is big, and changing, which makes me not wanting to add this. And
> getaddrinfo is core functionality. Requiring some external code for it
> to work is undesirable. The interface might change or whatever other
> incompatibilities can arise. This is highly unpleasant.
Right. Still, the specifications are not likely to change at this
point, and if they do it wouldn't happen overnight, but would rather
take years. They are published RFCs at Proposed Standard level, and
are currently being revised for Draft Standard level (only editorial
changes). So the code changes at this point are bug fixes, or feature
additions unrelated to IDN (which could be stripped out of libc).
I have to admit the libidn API has been changing somewhat, but mostly
the reason has been my own inexperience in designing good C APIs.
Although if libidn were part of libc, I believe it would be best to
only advertise the getaddrinfo interface of it, and wait a year or two
until all the libidn APIs are exported (if ever). Having the libidn
API available via libc has some benefits though, because there are
many programs that will need stringprep functionality in the future
(e.g., iSCSI, XMPP instant messaging, Kerberos, SASL).
Applications need to explicitly ask for IDN functionality, so it is
not something that would likely get in the way of existing code, either.
I have been thinking about a dlopen() approach, to reduce the code
size in libc. E.g., the application requests IDN, and libc then tries
to dlopen("idn"). The libc IDN code patch would only amount to, say,
less than 100 lines. Any thoughts on this? Is it feasible at all?
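
To make the dlopen() idea concrete, here is a rough sketch, not a
proposed patch. It assumes the library exports idna_to_ascii_8z() (the
libidn entry point for UTF-8 input); the helper names and the sonames
tried are illustrative assumptions only. If the library cannot be
loaded, plain ASCII names are simply copied, since they need no
conversion.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>

/* Shape of libidn's idna_to_ascii_8z(): UTF-8 in, ASCII form out,
   returns 0 on success. */
typedef int (*to_ascii_fn)(const char *input, char **output, int flags);

/* Hypothetical fallback: duplicate NAME if it is pure ASCII, else NULL. */
static char *copy_if_ascii(const char *name)
{
    size_t len = strlen(name);
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) name[i] >= 0x80)
            return NULL;
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, name, len + 1);
    return copy;
}

/* Hypothetical helper: return a malloc'd ASCII form of the UTF-8 host
   name NAME, or NULL if it cannot be converted. */
char *idn_name_to_ascii(const char *name)
{
    /* Assumed sonames; they differ between libidn releases. */
    static const char *const sonames[] = {
        "libidn.so.12", "libidn.so.11", "libidn.so"
    };
    void *handle = NULL;
    char *result = NULL;

    assert(name != NULL);
    for (size_t i = 0; handle == NULL
                       && i < sizeof sonames / sizeof sonames[0]; i++)
        handle = dlopen(sonames[i], RTLD_LAZY);

    if (handle == NULL)
        return copy_if_ascii(name);   /* no IDN library available */

    to_ascii_fn to_ascii = (to_ascii_fn) dlsym(handle, "idna_to_ascii_8z");
    if (to_ascii == NULL || to_ascii(name, &result, 0) != 0)
        result = NULL;
    dlclose(handle);                  /* RESULT lives on the heap */
    return result;
}
```

A caller would pass the host name through idn_name_to_ascii() before
the normal lookup and free() the result afterwards. On older systems
the program may need to be linked with -ldl.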
> I do see that this form of the interface is nice. So my questions are:
> ~ do you need all of the libidn interface to implement the suggested
> getaddrinfo extension?
There are some functions that aren't called, but they don't contribute
any substantial code size. The minimum number of API functions required
is 2 (for punycode encode/decode) + 1 (stringprep) + 2 (IDNA ToASCII and
ToUnicode) = 5, but some utility functions to convert between UTF-8
and UCS-4 are used internally, so make it ~10 API functions. (Perhaps
those functions already exist elsewhere in libc though?)
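
To give an idea of how small those utility functions are, here is a
minimal sketch of a UTF-8 to UCS-4 decoder. The function name and
interface are made up for illustration and are not the libidn API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the NUL-terminated UTF-8 string IN into
   at most MAX code points in OUT.  Returns the number of code points
   written, or -1 on malformed input or if OUT is too small. */
long utf8_to_ucs4(const char *in, uint32_t *out, size_t max)
{
    /* Smallest code point that legitimately needs 1, 2, 3, 4 bytes. */
    static const uint32_t min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *) in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        uint32_t cp;
        int extra;                       /* continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;                  /* invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)     /* also catches an early NUL */
                return -1;
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (n == max)
            return -1;
        out[n++] = cp;
    }
    return (long) n;
}
```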
The current libidn API consists of 27 functions. Most of the
additional functions are wrappers around the core functions that take
input in locale or UCS-4 format, and convert output to locale or UCS-4
format.
Basically, five separate functionalities are needed to implement IDN:
charset conversion (locale->UTF-8, UTF-8->UCS-4, etc.), punycode,
Unicode NFKC normalization, stringprep and nameprep.
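
Of these, punycode is the smallest piece. As an illustration, here is
a sketch of the encoder written from the RFC 3492 description; it is
not the libidn source, and the function names are made up. It encodes
one label given as UCS-4 code points, without the "xn--" prefix that
IDNA ToASCII adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700 };

/* Bias adaptation, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'. */
static char digit(uint32_t d)
{
    return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Hypothetical helper: encode LEN code points from IN into the
   NUL-terminated string OUT of capacity OUTLEN.  Returns 0 on
   success, -1 on overflow or if OUT is too small. */
int punycode_encode_label(const uint32_t *in, size_t len,
                          char *out, size_t outlen)
{
    uint32_t n = 128, delta = 0, bias = 72;
    size_t h, b, o = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    /* Copy the basic (ASCII) code points first. */
    for (size_t j = 0; j < len; j++)
        if (in[j] < 0x80) {
            if (o + 1 >= outlen) return -1;
            out[o++] = (char) in[j];
        }
    h = b = o;
    if (b > 0) {
        if (o + 1 >= outlen) return -1;
        out[o++] = '-';                  /* delimiter */
    }

    while (h < len) {
        /* Find the smallest code point not yet handled. */
        uint32_t m = UINT32_MAX;
        for (size_t j = 0; j < len; j++)
            if (in[j] >= n && in[j] < m) m = in[j];
        if (m - n > (UINT32_MAX - delta) / (uint32_t) (h + 1)) return -1;
        delta += (m - n) * (uint32_t) (h + 1);
        n = m;

        for (size_t j = 0; j < len; j++) {
            if (in[j] < n && ++delta == 0) return -1;
            if (in[j] != n) continue;
            /* Emit delta as a variable-length integer. */
            uint32_t q = delta;
            for (uint32_t k = BASE; ; k += BASE) {
                uint32_t t = k <= bias ? TMIN
                           : k >= bias + TMAX ? TMAX : k - bias;
                if (q < t) break;
                if (o + 1 >= outlen) return -1;
                out[o++] = digit(t + (q - t) % (BASE - t));
                q = (q - t) / (BASE - t);
            }
            if (o + 1 >= outlen) return -1;
            out[o++] = digit(q);
            bias = adapt(delta, (uint32_t) (h + 1), h == b);
            delta = 0;
            h++;
        }
        delta++;
        n++;
    }
    out[o] = '\0';
    return 0;
}
```

For the code points of "bücher" this produces "bcher-kva", which IDNA
then prefixes to give xn--bcher-kva.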
Libidn currently supports non-IDN related stringprep profiles as well,
but they re-use the IDN-related stringprep tables. They add only
about 20 lines of initialization in a static const table (100-200
> ~ what is the size of the absolutely minimum amount of code (source
> and object file)
Self-contained, portable C89 source code; the only external requirements
are iconv and nl_langinfo (the lib/ directory of libidn):
3544 rfc3454.c GENERATED from rfc3454.txt
9353 gunibreak.h GENERATED from Unicode standard (from Glib)
658 gunicomp.h GENERATED from Unicode standard (from Glib)
10362 gunidecomp.h GENERATED from Unicode standard (from Glib)
286 idn-int.h GENERATED by autoconf to get 'uint32_t'.
Most of the large files are generated, here are the "real" files:
jas@latte:~/src/libidn/lib$ wc -l idna.c nfkc.c profiles.c punycode.c stringprep.c toutf8.c idna.h punycode.h stringprep.h
Also note that the files are heavily commented -- the manual is (in
parts) generated from the source code.
Here are the object sizes for Libidn built on GNU/Linux with GCC 3.3.2:
-rw-r--r-- 1 jas jas 192272 Nov 26 09:47 libidn.a
-rw-r--r-- 1 jas jas 5984 Nov 26 09:47 idna.o
-rw-r--r-- 1 jas jas 91520 Nov 26 09:47 nfkc.o
-rw-r--r-- 1 jas jas 7248 Nov 26 09:47 profiles.o
-rw-r--r-- 1 jas jas 2768 Nov 26 09:47 punycode.o
-rw-r--r-- 1 jas jas 76570 Nov 26 09:47 rfc3454.o
-rw-r--r-- 1 jas jas 4540 Nov 26 09:47 stringprep.o
-rw-r--r-- 1 jas jas 2784 Nov 26 09:47 toutf8.o
-rw-r--r-- 1 jas jas 1648 Nov 26 09:47 version.o
As you can see, the only significant parts are the Unicode NFKC
normalization tables and the RFC 3454 tables. The Unicode NFKC
normalization tables come from glib, and I haven't investigated how
much they could be optimized in size. I believe the rfc3454 tables
could be optimized considerably, though.
Note that the Unicode tables must be Unicode version 3.2, so it is not
something that can be easily re-used from another library or another
part of libc, even if there were other NFKC tables on the system.
> For the encoding conversion code, in glibc you'd have to use the
> glibc-internal interfaces, and not iconv() itself.
Thanks for the pointer.
Hope this helps,