getaddrinfo chokes on hostnames containing "emoji" characters

Name Surname augeus@outlook.com
Wed May 16 18:20:00 GMT 2018


Florian Weimer wrote:
> On 05/16/2018 04:03 PM, Name Surname wrote:
>> Florian Weimer wrote:
>>> On 05/16/2018 10:40 AM, Name Surname wrote:
>>>> Greetings everyone.
>>>>
>>>> I recently bought a domain name containing "emoji" characters, as a
>>>> novelty and in order to do some experiments. I tried getting the IP
>>>> address associated with it using getaddrinfo; however, it fails and
>>>> returns "Name or service not known". The same thing happens with any
>>>> program that uses glibc for name resolution. I understand that emoji
>>>> domains are not valid according to IDNA2008; however, some ccTLDs sell
>>>> them, they were supported under IDNA2003, and web browsers resolve
>>>> them normally according to IDNA2003 (at least Firefox does).
>>>>
>>>> Is this a bug or a feature?
>>>
>>> In the near future, glibc will use the system libidn2 library to
>>> implement AI_IDN getaddrinfo support.  You will have to convince the
>>> libidn2 maintainers to enable Emoji support (by default), but as long as
>>> there is no published standard for that at all (perhaps with the
>>> exception of Unicode TR46 transitional mode, which is not recommended),
>>> this seems difficult.
> 
>> It seems that, according to the WHATWG URL standard, IDNs should be
>> processed as per IDNA2008:
>>
>>   > Let result be the result of running Unicode ToASCII with
>>   > domain_name set to domain, UseSTD3ASCIIRules set to beStrict,
>>   > CheckHyphens set to false,
>>   > CheckBidi set to true, CheckJoiners set to true,
>>   > *processing_option set to Nontransitional_Processing*,
>>   > and VerifyDnsLength set to beStrict.
>>
>> Source: https://url.spec.whatwg.org/#idna
>>
>> (Emphasis mine)
>>
>> If I am understanding the standard correctly, then discussion of this
>> matter is moot, as this implies that emoji domains are not even
>> considered valid URLs.
> 
> Yes, Firefox implements something else.  It generates a DNS request for 
> xn--nmchen_2-0za.wildcard.t.enyo.de. from 
> <http://nämchen_2.wildcard.t.enyo.de/>, which is not allowed according 
> to UseSTD3ASCIIRules.  This is probably a specification bug.
> 
> But based on what I understand, IDNA with TR46 non-transitional 
> processing does not actually allow emojis.
> 
> Thanks,
> Florian
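
As a rough illustration of the Firefox example quoted above, the
following Python sketch reproduces the A-label with the standard
library's raw "punycode" codec and then applies a simplified
letter-digit-hyphen check in the spirit of UseSTD3ASCIIRules. The helper
names to_a_label and is_ldh_label are made up for this example, and the
check is a simplification of the real TR46 rules, not an implementation
of them.

```python
import string

# Characters permitted in a hostname label under the letter-digit-hyphen rule.
LDH = set(string.ascii_letters + string.digits + "-")


def to_a_label(label: str) -> str:
    """Punycode-encode one label and add the ACE prefix; no IDNA validation."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def is_ldh_label(label: str) -> bool:
    """Simplified STD3-style check: LDH only, no leading or trailing hyphen."""
    return (
        bool(label)
        and set(label) <= LDH
        and not label.startswith("-")
        and not label.endswith("-")
    )


a_label = to_a_label("nämchen_2")
print(a_label)                # xn--nmchen_2-0za
print(is_ldh_label(a_label))  # False, because of the underscore
```

The Punycode step itself performs no validation, which is why the
underscore survives into the encoded label; any rejection has to come
from a separate rule check like the one sketched here.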

> But based on what I understand, IDNA with TR46 non-transitional
> processing does not actually allow emojis.

This is true.

It appears, though, that WHATWG changed their URL standard to recommend
Nontransitional_Processing quite recently (20 February 2017). Before
that date, they recommended Transitional_Processing. I suppose that,
given enough time, the confusion will clear itself up. It certainly has
cleared up for me :).
(Reference:
https://github.com/whatwg/url/commit/f4d84a52e67b154b2d11e04889fe0a35a029c833)
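
For comparison with the IDNA2003 behaviour mentioned in the original
question, Python's built-in "idna" codec follows the older IDNA2003
rules (RFC 3490), not TR46 or IDNA2008, so it gives a quick way to see
what that standard made of a symbol character. A small sketch, using
U+2603 SNOWMAN as a stand-in for an emoji and "example" as a placeholder
TLD label; the helper name idna2003_to_ascii is hypothetical:

```python
def idna2003_to_ascii(hostname: str) -> str:
    """Convert a hostname with the stdlib codec, which follows IDNA2003."""
    return hostname.encode("idna").decode("ascii")


# U+2603 SNOWMAN is a symbol character that IDNA2003 did not prohibit.
print(idna2003_to_ascii("\u2603.example"))  # xn--n3h.example
```

This only shows the IDNA2003 side; it says nothing about how libidn2 or
glibc treat the same name.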

Thanks for helping me out.

