Sourceware Bugzilla – Bug 4671
gethostbyname() can't resolve names starting/ending with "-"
Last modified: 2012-02-21 01:22:08 UTC
gethostbyname() fails to resolve domain names with a minus sign at the beginning or end of the domain name,
for example -kol.deviantart.com, although such names can be resolved using host and nslookup.
insa@devel:~$ ping -- -kol.deviantart.com
ping: unknown host -kol.deviantart.com
insa@devel:~$ host -- -kol.deviantart.com
-kol.deviantart.com has address 18.104.22.168
A brief look at Linux iputils/ping.c shows that it uses the gethostbyname() function.
So I wrote a C test example that can be found at http://insa.pp.ru/files/bugs/gethost.c
Tested on Debian 3.1, Debian 4, FreeBSD 5.4. All i386.
Mac OS X not affected.
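
For readers who cannot reach the linked file, a minimal sketch of such a test follows. It is not the original gethost.c; the helper name try_resolve is hypothetical, and the result for any real host name depends on the local resolver configuration and network.

```c
#include <stdio.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 if gethostbyname() resolves `name`, 0 otherwise.
 * try_resolve is a hypothetical helper, not taken from gethost.c. */
int try_resolve(const char *name)
{
    struct hostent *he = gethostbyname(name);
    char buf[INET6_ADDRSTRLEN];

    if (he == NULL) {
        /* Mirror the message ping prints in the report above. */
        fprintf(stderr, "unknown host %s\n", name);
        return 0;
    }
    /* Print the first address, if the entry carries one. */
    if (he->h_addr_list[0] != NULL &&
        inet_ntop(he->h_addrtype, he->h_addr_list[0], buf, sizeof buf) != NULL)
        printf("%s has address %s\n", name, buf);
    return 1;
}
```

Calling try_resolve("-kol.deviantart.com") on an affected system would be expected to take the "unknown host" branch, going by the ping output above.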
Such hostnames are invalid, see section 2 of RFC3696.
For hostnames, a hyphen can only be in the middle, not at the start or at the end.
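
The rule stated here can be sketched as a check on a single dot-separated label. This is an illustration only, not glibc's code; the function name is_ldh_label is hypothetical, label length is not checked, and leading digits are accepted on the assumption that the RFC 1123 relaxation applies.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `label` consists only of ASCII letters, digits and
 * hyphens and neither starts nor ends with a hyphen, else 0.
 * is_ldh_label is a hypothetical name used for illustration. */
int is_ldh_label(const char *label)
{
    size_t len = strlen(label);
    size_t i;

    if (len == 0)
        return 0;
    /* A hyphen is only allowed in the middle of the label. */
    if (label[0] == '-' || label[len - 1] == '-')
        return 0;
    for (i = 0; i < len; i++) {
        char c = label[i];
        int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                 (c >= '0' && c <= '9') || c == '-';
        if (!ok)
            return 0;
    }
    return 1;
}
```

Under this check the first label of -kol.deviantart.com is rejected, while deviantart and com pass.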
(In reply to comment #1)
> Such hostnames are invalid, see section 2 of RFC3696.
> For hostnames, a hyphen can only be in the middle, not at the start or at the end.
Well, it's true, but
1) Hostnames starting with a numeric value are also not valid, but can be resolved via gethostbyname()
(e.g. ping 12345.livejournal.com);
2) We are getting odd behaviour on various systems. Even worse, on the same machine with different tools
(nslookup vs. ping).
I looked at this and saw that not even the latest bind version allows - at the
beginning. If anybody allows it this is likely a side effect of not using the
bind code base. I see no reason to diverge here.
Plus, this could have unwanted effects. If somebody makes a mistake when
specifying a host name a parameter might be mistaken for it. This might even be
So, no, this won't change.
If gethostbyname refuses the invalid name in the below request, why does it
query the DNS (as can be seen with e.g. tcpdump)?
This bug report is valid. The RFC does not prohibit labels that don't start with
a letter; it merely recommends against them. The definition of "label" mentioned
above is part of a guideline introduced with this text:
"The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET)."
Points in favor of supporting domain names that don't necessarily follow those guidelines:
1. There are actual domains out on the Internet running web sites with such
domain names (several blogs at blogspot.com spring to mind)
2. Such domains resolve on other OSes (not just Windows, but also OS X).
3. Direct DNS queries (dig, host, nslookup) on Linux work fine with such names.
4. Even gethostbyname() on Linux works for such names when they're in the local
/etc/hosts file. Possibly in NIS maps as well.
Point 4 is especially telling; I don't see any reason for gethostbyname() to
introduce a restriction between two interfaces when both operate correctly
without the restriction. Especially not a restriction that prevents access to
actual web sites. Telling users that "The owner of that site shouldn't have
named it that" is not helpful.
"The DNS itself places only one restriction on the particular labels that can be used to identify resource
records. That one restriction relates to the length of the label and the full name. [...] Those restrictions
aside, any binary string whatever can be used as the label of any resource record."
-- RFC 2181, section 11
RFC3696, section 2 verifies this: "Any characters, or combination of bits (as octets), are permitted in
DNS names." Then it describes how the old ARPANET rules worked. But we moved beyond those
rules a long time ago. Just look at the international domain names.
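
As an illustration of the one restriction quoted above, here is a minimal sketch of a length-only check that looks at nothing but octet counts. It assumes the limits of 63 octets per label and 255 octets per full name, counts the full name in wire format (one length octet per label plus the terminating root label), and does not handle escaped dots. The name dns_lengths_ok is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LABEL_OCTETS 63   /* assumed per-label limit */
#define MAX_NAME_OCTETS  255  /* assumed full-name limit, wire format */

/* Returns 1 if every label is 1..63 octets long and the whole name
 * fits in 255 octets when encoded in wire format, else 0.
 * Character content is deliberately not inspected. */
int dns_lengths_ok(const char *name)
{
    size_t wire_len = 1;          /* terminating root label */
    const char *p = name;

    if (*p == '\0')
        return 0;
    while (*p != '\0') {
        const char *dot = strchr(p, '.');
        size_t label_len = dot ? (size_t)(dot - p) : strlen(p);

        if (label_len < 1 || label_len > MAX_LABEL_OCTETS)
            return 0;
        wire_len += 1 + label_len;  /* length octet + label octets */
        if (wire_len > MAX_NAME_OCTETS)
            return 0;
        if (dot == NULL)
            break;
        p = dot + 1;                /* a single trailing dot ends the loop */
    }
    return 1;
}
```

A name such as -kol.deviantart.com passes this check, since only lengths are considered.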
(In reply to comment #6)
> "The DNS itself places only one restriction on the particular labels that can
> be used to identify resource
> records. That one restriction relates to the length of the label and the full
> name. [...] Those restrictions
> aside, any binary string whatever can be used as the label of any resource
> -- RFC 2181, section 11
> RFC3696, section 2 verifies this: "Any characters, or combination of bits (as
> octets), are permitted in
> DNS names." Then it describes how the old ARPANET rules worked. But we moved
> beyond those
> rules a long time ago. Just look at the international domain names.
Actually, while RFC 2181 states that there are no restrictions on DNS labels, it does not say anything about host names (not all records that can be stored in DNS are host names). In fact, it explicitly says that
"Note however, that the various applications that make use of DNS data can have restrictions imposed on what particular values are acceptable in their environment."
RFC 1123 still constitutes the accepted standard for valid host names, and this is what glibc's gethostbyname() implements. Actually, glibc implements a relaxation of RFC1123 that allows underscores anywhere RFC1123 permits hyphens, presumably to deal with errant Windows machines that like to put underscores in their names.
RFC 3696 is quite woolly on the subject of host names. It describes RFC 1123's restrictions on host names as "a preferred form that is required by most applications".
Also, international domain names are a different matter entirely, as they essentially work (as I understand it) by converting invalid host names to RFC 1123-compatible host names.
Arguing by RFC is clearly not going to get us anywhere, given the above. The best argument for this change is that there are domains that require gethostbyname() to accept hyphens (and, presumably, underscores) at the start and end of domain segments in order to be resolved. Glibc already relaxes RFC 1123's restrictions to allow underscores, so why not allow hyphens in any position as well?
The argument about mistaking domain names starting with hyphens for options is spurious, by the way. Given that these domains exist, it's perfectly reasonable that they might be passed to a tool regardless of whether or not gethostbyname() accepts them, and the tool will do option parsing before calling gethostbyname().