getaddrinfo chokes at hostnames containing "emoji" characters
Florian Weimer
fweimer@redhat.com
Wed May 16 14:09:00 GMT 2018
On 05/16/2018 04:03 PM, Name Surname wrote:
> Florian Weimer wrote:
>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>> Greetings everyone.
>>>
>>> I recently bought a domain name containing "emoji" characters, as a
>>> novelty and in order to do some experiments. I tried getting the IP
>>> address associated to it using getaddrinfo, however, it errs and returns
>>> "Name or service not known". The same thing happens with any program
>>> that uses glibc for name resolution. I understand that emoji domains are
>>> not valid according to IDNA2008, however, some ccTLDs sell them, they
>>> were supported according to IDNA2003, and web browsers resolve them
>>> normally according to IDNA2003 (at least firefox does).
>>>
>>> Is this a bug or a feature?
>>
>> In the near future, glibc will use the system libidn2 library to
>> implement AI_IDN getaddrinfo support. You will have to convince the
>> libidn2 maintainers to enable Emoji support (by default), but as long as
>> there is no published standard for that at all (perhaps with the
>> exception of Unicode TR46 transitional mode, which is not recommended),
>> this seems difficult.
> It seems that, according to the WHATWG URL standard, IDNs should be
> processed as per IDNA2008:
>
> > Let result be the result of running Unicode ToASCII with
> > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
> > CheckHyphens set to false,
> > CheckBidi set to true, CheckJoiners set to true,
> > *processing_option set to Nontransitional_Processing*,
> > and VerifyDnsLength set to beStrict.
>
> Source: https://url.spec.whatwg.org/#idna
>
> (Emphasis mine)
>
> If I am understanding the standard correctly, then discussion of this
> matter is moot, as this implies that emoji domains are not even
> considered valid URLs.

Yes, Firefox implements something else. Given
<http://nämchen_2.wildcard.t.enyo.de/>, it generates a DNS request for
xn--nmchen_2-0za.wildcard.t.enyo.de., which UseSTD3ASCIIRules does not
allow because the label contains an underscore. This is probably a
specification bug.

But as far as I understand, IDNA with TR46 non-transitional processing
does not actually allow emojis.

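
To make the underscore example concrete, here is a minimal Python sketch. It
is not what Firefox or libidn2 actually run: to_ace and violates_std3 are
hypothetical helper names, and the mapping and normalization steps of real
IDNA processing are skipped on purpose. It only Punycode-encodes one label
and then checks the result against the letter-digit-hyphen rule that
UseSTD3ASCIIRules enforces.

```python
import string

# Characters the STD3 host name rules permit in a label:
# ASCII letters, digits, and the hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def to_ace(label: str) -> str:
    """Punycode-encode one label and add the "xn--" prefix.

    Hypothetical helper: no mapping, normalization, or validation is done.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def violates_std3(ace_label: str) -> bool:
    """Return True if an ASCII label breaks the letter-digit-hyphen rule."""
    return (
        not ace_label
        or any(ch not in LDH for ch in ace_label)
        or ace_label.startswith("-")
        or ace_label.endswith("-")
    )


# "\u00e4" is the precomposed form of the a-umlaut in the example label.
ace = to_ace("n\u00e4mchen_2")
print(ace, violates_std3(ace))
```

The encoded label matches the one in the DNS request above, and the check
flags it because of the underscore, which is why a strict implementation
would reject the name instead of sending the query.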
Thanks,
Florian