getaddrinfo chokes on hostnames containing "emoji" characters

Florian Weimer fweimer@redhat.com
Wed May 16 14:09:00 GMT 2018


On 05/16/2018 04:03 PM, Name Surname wrote:
> Florian Weimer wrote:
>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>> Greetings everyone.
>>>
>>> I recently bought a domain name containing "emoji" characters, as a
>>> novelty and in order to do some experiments. I tried to get the IP
>>> address associated with it using getaddrinfo, but the call fails with
>>> "Name or service not known". The same thing happens with any program
>>> that uses glibc for name resolution. I understand that emoji domains are
>>> not valid according to IDNA2008; however, some ccTLDs sell them, they
>>> were permitted under IDNA2003, and web browsers resolve them
>>> normally according to IDNA2003 (at least Firefox does).
>>>
>>> Is this a bug or a feature?
>>
>> In the near future, glibc will use the system libidn2 library to
>> implement AI_IDN getaddrinfo support.  You will have to convince the
>> libidn2 maintainers to enable Emoji support (by default), but as long as
>> there is no published standard for that at all (perhaps with the
>> exception of Unicode TR46 transitional mode, which is not recommended),
>> this seems difficult.

> It seems that, according to the WHATWG URL standard, IDNs should be
> processed as per IDNA2008:
> 
>   > Let result be the result of running Unicode ToASCII with
>   > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
>   > CheckHyphens set to false,
>   > CheckBidi set to true, CheckJoiners set to true,
>   > *processing_option set to Nontransitional_Processing*,
>   > and VerifyDnsLength set to beStrict.
> 
> Source: https://url.spec.whatwg.org/#idna
> 
> (Emphasis mine)
> 
> If I am understanding the standard correctly, then discussion of this
> matter is moot, as this implies that emoji domains are not even
> considered valid URLs.

Yes, Firefox implements something else.  It generates a DNS request for
xn--nmchen_2-0za.wildcard.t.enyo.de. from
<http://nämchen_2.wildcard.t.enyo.de/>, which UseSTD3ASCIIRules does not
allow because the label contains an underscore.  This is probably a
specification bug.
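
For illustration only, here is a rough Python sketch of how that label
maps to its xn-- form and why a strict STD3 check would reject it.  The
helper names to_ace_label and passes_std3 are made up for this example;
the code performs no IDNA mapping, normalization, or TR46 validation, so
it says nothing about what Firefox or libidn2 actually do internally.

```python
import string

ACE_PREFIX = "xn--"
# Letters, digits, and hyphen: the only ASCII characters STD3 permits in a label.
LDH = set(string.ascii_letters + string.digits + "-")


def to_ace_label(label):
    """Punycode-encode a single label; no IDNA mapping or validation is done."""
    if all(ord(ch) < 128 for ch in label):
        return label  # already ASCII, nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def passes_std3(label):
    """Rough STD3 check: ASCII characters must be LDH, no leading/trailing hyphen."""
    if label.startswith("-") or label.endswith("-"):
        return False
    return all(ch in LDH for ch in label if ord(ch) < 128)


# First label of the test hostname, written with a precomposed U+00E4.
label = "n\u00e4mchen_2"
print(to_ace_label(label))   # xn--nmchen_2-0za
print(passes_std3(label))    # False, because of the underscore
```

The encoded label matches the one in the DNS request quoted above, and
the underscore is what fails the letters-digits-hyphen test.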

But based on what I understand, IDNA with TR46 non-transitional 
processing does not actually allow emojis.

Thanks,
Florian


