This is the mail archive of the
mailing list for the glibc project.
RFC: IDN support in getaddrinfo().
- From: Simon Josefsson <jas at extundo dot com>
- To: libc-alpha at sources dot redhat dot com
- Date: Sun, 12 Oct 2003 00:04:51 +0200
- Subject: RFC: IDN support in getaddrinfo().
Continuing an old thread regarding support for Internationalized
Domain Names (IDN) in glibc, prompted by the adoption of my glibc
patches by developers from some Linux distributions, I'd like to
formalize my ideas in a proposal for extending the getaddrinfo() API.
I appreciate comments on the following document from people familiar
with the getaddrinfo API, and people with insight into the POSIX
standardization process. Is something like this suitable for
standardization? For inclusion in glibc?
Thoughts about alternative approaches, such as extending hostname(),
or a wchar_t API, are also welcome.
PS. The document lives in
Libidn getaddrinfo-idn.txt -- Proposal for IDN support in POSIX getaddrinfo.
Copyright (C) 2003 Simon Josefsson
See the end for copying conditions.
Libidn is a package for internationalized string handling based on the
Stringprep, Punycode and Internationalized Domain Names in
Applications (IDNA) specifications. It can be used by applications
directly by linking to it, as is done by, e.g., Gnus, KDE, and Mutt.
Having each and every application link with and perform its own IDN
handling is not a good idea. It bloats the code and makes things
unnecessarily complex. Only a few applications, such as web browsers
and mail clients, will need to do this in the future, to provide good
user interfaces for internationalization.
See http://josefsson.org/libidn/ for more information.
There are implementations that modify gethostbyname() to accept Unicode
strings, and even implementations that assume gethostbyname(), on the
client host, sends Unicode strings out to the DNS server, with the
IDN conversion performed on the local DNS server.
Some doubts can be raised whether this is an approach that is likely
to be standardized. It also lacks functionality: it only provides
black-box ToASCII functionality. The application cannot extract the
output from the ToASCII operation. More importantly, there is no way to
perform a ToUnicode operation that applications may want to use.
See also the thread rooted in <firstname.lastname@example.org>
posted to email@example.com on 08 Jan 2003.
What I propose
The getaddrinfo() API should have two new flags, AI_IDN and
AI_CANONIDN. Roughly they correspond to IDNA ToASCII and IDNA
ToUnicode, but there are several details.
An application that uses AI_IDN signals to the getaddrinfo()
implementation that the input host name may be UTF-8 and that the
appropriate IDNA ToASCII steps should be carried out on the input, and
the output from the ToASCII operation (if any) should be used in the
lookup using the current resolver processing.
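As a minimal sketch of how an application might request this behaviour,
the following C fragment sets AI_IDN in the hints passed to
getaddrinfo(). The helper names idn_hints() and resolve_idn() are
hypothetical, and the fallback definition of AI_IDN is a placeholder
assumption for headers that do not provide the flag; a C library
without AI_IDN support will not honour it. Whether the input is
interpreted as UTF-8 or in the locale's encoding may also depend on the
implementation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* glibc exposes AI_IDN only as a GNU extension */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Placeholder assumption for headers that lack the flag. */
#ifndef AI_IDN
#define AI_IDN 0x0040
#endif

/* Hypothetical helper: build hints that ask getaddrinfo() to apply
   the IDNA ToASCII steps to the host name before the lookup. */
struct addrinfo idn_hints(int socktype)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = socktype;
    hints.ai_flags = AI_IDN;
    return hints;
}

/* Hypothetical helper: resolve a host name that may contain
   non-ASCII characters. Returns 0 or an EAI_* error code; on
   success the caller releases *res with freeaddrinfo(). */
int resolve_idn(const char *host, const char *service,
                struct addrinfo **res)
{
    struct addrinfo hints = idn_hints(SOCK_STREAM);

    return getaddrinfo(host, service, &hints, res);
}
```

A caller would pass the user-supplied name unchanged, for example
resolve_idn(name, "http", &res), and report failures with
gai_strerror().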
An application that uses AI_CANONIDN signals to the getaddrinfo()
implementation that the input host name should be put through the IDNA
ToUnicode steps, and the output of that placed in the 'ai_canonname'
field of the resulting structure. Normal resolver processing applies
to the input string, of course.
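The following C sketch illustrates how an application could read the
ToUnicode result under the semantics proposed here. The helpers
first_canonname() and unicode_name() are hypothetical, and the fallback
value for AI_CANONIDN is a placeholder assumption. An actual
implementation may attach different conditions to the flag, so this is
an illustration of the proposal rather than of any shipped library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* glibc exposes AI_CANONIDN as a GNU extension */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/* Placeholder assumption for headers that lack the flag. */
#ifndef AI_CANONIDN
#define AI_CANONIDN 0x0080
#endif

/* Hypothetical helper: copy the canonical name out of a result list.
   getaddrinfo() stores it in the first entry only.
   Returns 0 on success, -1 if it is absent or does not fit. */
int first_canonname(const struct addrinfo *res, char *buf, size_t buflen)
{
    if (res == NULL || res->ai_canonname == NULL)
        return -1;
    if (strlen(res->ai_canonname) >= buflen)
        return -1;
    strcpy(buf, res->ai_canonname);
    return 0;
}

/* Hypothetical helper: request the ToUnicode form of host in
   ai_canonname, as proposed. Returns 0 or an EAI_* error code. */
int unicode_name(const char *host, char *buf, size_t buflen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONIDN;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;
    rc = (first_canonname(res, buf, buflen) == 0) ? 0 : EAI_FAIL;
    freeaddrinfo(res);
    return rc;
}
```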
Consequently, an application that uses AI_IDN|AI_CANONIDN signals to
the getaddrinfo() implementation that the input host name may be in
UTF-8 and should be put through the IDNA ToASCII steps before being run
through the resolver, and that the input string should also be run
through the IDNA ToUnicode steps and the output of that placed in the
'ai_canonname' field of the resulting structure.
The semantics of AI_CANONNAME|AI_CANONIDN is that instead of running the
ToUnicode IDNA steps on the input string, the canonical host name as
returned by the resolver for the input string should be used in the
ToUnicode IDNA step.
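The flag combinations described above can be summarized as a small
decision function. This is only a model of the proposed semantics, not
code from any implementation: the enum, the function names and the
fallback flag values are all hypothetical. It assumes the standard
AI_CANONNAME flag is the one combined with AI_CANONIDN.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* glibc exposes the IDN flags as GNU extensions */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Placeholder assumptions for headers that lack the flags. */
#ifndef AI_IDN
#define AI_IDN 0x0040
#endif
#ifndef AI_CANONIDN
#define AI_CANONIDN 0x0080
#endif

/* Where the contents of ai_canonname come from. */
enum canon_source {
    CANON_NONE,               /* ai_canonname is not requested        */
    CANON_RESOLVER,           /* canonical name from the resolver     */
    CANON_TOUNICODE_INPUT,    /* ToUnicode applied to the input name  */
    CANON_TOUNICODE_RESOLVER  /* ToUnicode applied to resolver's name */
};

/* Model of the proposal: pick the source of ai_canonname. */
enum canon_source canon_source_for(int flags)
{
    int want_canon = (flags & AI_CANONNAME) != 0;
    int want_unicode = (flags & AI_CANONIDN) != 0;

    if (want_canon && want_unicode)
        return CANON_TOUNICODE_RESOLVER;
    if (want_unicode)
        return CANON_TOUNICODE_INPUT;
    if (want_canon)
        return CANON_RESOLVER;
    return CANON_NONE;
}

/* Model of the proposal: is ToASCII applied before the lookup? */
int lookup_uses_toascii(int flags)
{
    return (flags & AI_IDN) != 0;
}
```

Note that AI_IDN only affects the name sent to the resolver, while
AI_CANONIDN only affects ai_canonname, so the two can be set
independently.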
The AI_IDN flag has been implemented and shipped as a proof-of-concept
patch for GNU Libc with GNU Libidn since January 2003; binary packages
exist for at least two Linux distributions. The AI_CANONIDN flag is
not yet implemented.
This document is a work-in-progress and the details may change.
Contact me at firstname.lastname@example.org to discuss changes.
Permission is granted to anyone to make or distribute verbatim copies
of this document, in any medium, provided that the copyright notice
and permission notice are preserved, and that the distributor grants
the recipient permission for further redistribution as permitted by
this notice. Modified versions may not be made.