Bug 15272 - getaddrinfo() returns EAI_SYSTEM with AF_UNSPEC and SOCK_STREAM on ARM
Summary: getaddrinfo() returns EAI_SYSTEM with AF_UNSPEC and SOCK_STREAM on ARM
Status: RESOLVED WORKSFORME
Alias: None
Product: glibc
Classification: Unclassified
Component: network
Version: 2.17
Importance: P2 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-12 17:36 UTC by Florian Fainelli
Modified: 2019-02-15 13:22 UTC (History)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Test case to illustrate the issue (403 bytes, text/x-csrc)
2013-03-12 17:36 UTC, Florian Fainelli

Description Florian Fainelli 2013-03-12 17:36:25 UTC
Created attachment 6930
Test case to illustrate the issue

The attached test case prints the following on an ARMv5 system:

./getaddrinfo www.free.fr
getaddrinfo: System error (errno: 111, Connection refused)

The exact same test case works fine on an i686 system. As far as I know this code is not architecture-specific, yet it consistently fails on ARMv5 and not on i686.

Note that I was able to reproduce the issue with glibc 2.16 and glibc 2.17, but not with glibc 2.13. I will dig further to see whether glibc 2.14, 2.15, and 2.18 are also affected.
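
For readers without access to the attachment, here is a minimal sketch of what a test of this kind might look like. It is not the code from attachment 6930: the helper name `lookup_stream` and the `extra_flags` parameter are hypothetical, and it assumes glibc on Linux with the default feature-test macros. It resolves a host with AF_UNSPEC and SOCK_STREAM and, on EAI_SYSTEM, prints errno in the same shape as the output quoted above.

```c
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, NOT the code from attachment 6930.
 * Resolves `host` with AF_UNSPEC and SOCK_STREAM and returns the value
 * returned by getaddrinfo(). `extra_flags` is an assumption added so the
 * same helper can be exercised with e.g. AI_NUMERICHOST. */
static int lookup_stream(const char *host, int extra_flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *p;
    int rc;
    int saved_errno;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = extra_flags;

    errno = 0;
    rc = getaddrinfo(host, NULL, &hints, &res);
    saved_errno = errno;              /* only meaningful for EAI_SYSTEM */

    if (rc != 0) {
        if (rc == EAI_SYSTEM)
            fprintf(stderr, "getaddrinfo: %s (errno: %d, %s)\n",
                    gai_strerror(rc), saved_errno, strerror(saved_errno));
        else
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return rc;
    }

    /* Print every returned address in numeric form. */
    for (p = res; p != NULL; p = p->ai_next) {
        char text[128];
        if (getnameinfo(p->ai_addr, p->ai_addrlen, text, sizeof(text),
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("%s\n", text);
    }

    freeaddrinfo(res);
    return 0;
}
```

Calling `lookup_stream(argv[1], 0)` from a `main` function would give a command-line tool comparable to the `./getaddrinfo www.free.fr` invocation shown in the report.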
Comment 1 Florian Fainelli 2013-03-20 09:09:02 UTC
glibc 2.15 and before are not affected and exhibit a correct behavior (resolution works fine with AF_UNSPEC and SOCK_STREAM). glibc 2.16 and onwards are affected.
Comment 2 Siddhesh Poyarekar 2013-04-05 07:13:29 UTC
Could you describe the system configuration?  Is this when the network is up and everything is functioning correctly?  What does 'dig www.free.fr' say?
Comment 3 Florian Fainelli 2013-04-05 16:58:07 UTC
(In reply to comment #2)
> Could you describe the system configuration?  Is this when the network is up
> and everything is functioning correctly?  What does 'dig www.free.fr' say?

The system is a typical Internet gateway device, with its upstream interface connecting it to the Internet. Connectivity is working fine. The upstream interface has no IPv6 address assigned.

/etc/resolv.conf:
nameserver 212.27.40.240
nameserver 212.27.40.241

And here is what dig says:


; <<>> DiG 9.9.1-P3 <<>> www.free.fr
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9636
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.free.fr.                   IN      A

;; ANSWER SECTION:
www.free.fr.            65295   IN      A       212.27.48.10

;; Query time: 10 msec
;; SERVER: 212.27.40.240#53(212.27.40.240)
;; WHEN: Fri Apr  5 18:48:28 2013
;; MSG SIZE  rcvd: 56

dig works fine, but I do not see it using getaddrinfo().
Comment 4 Siddhesh Poyarekar 2013-04-05 17:02:06 UTC
Right, dig does not use getaddrinfo.  I just wanted to know if the network/DNS is accessible otherwise.  I'm not able to reproduce this on my armv6 board, so I'm going to pass on this.  Hopefully someone with a v5 will get to it.
Comment 5 Florian Fainelli 2013-04-05 17:05:31 UTC
(In reply to comment #4)
> Right, dig does not use getaddrinfo.  I just wanted to know if the network/DNS
> is accessible otherwise.  I'm not able to reproduce this on my armv6 board, so
> I'm going to pass on this.  Hopefully someone with a v5 will get to it.

What worries me is a possible toolchain issue. I have been using various combinations of GCC and glibc versions, but the failure really is specific to glibc 2.16 and later.

Thanks for your help so far!
Comment 6 Mikael Pettersson 2013-04-06 10:42:41 UTC
The test case works for me on armv5tel-linux-gnueabi w/ glibc-2.17 and gcc-4.6.3, and on armv5teb-linux-gnueabi w/ glibc-2.17 and gcc-4.7.3.

I would expect "issues" on OABI, but I don't know if glibc-2.17 builds for that.
Comment 7 Florian Fainelli 2013-04-08 09:07:25 UTC
(In reply to comment #6)
> The test case works for me on armv5tel-linux-gnueabi w/ glibc-2.17 and
> gcc-4.6.3, and on armv5teb-linux-gnueabi w/ glibc-2.17 and gcc-4.7.3.
> 
> I would expect "issues" on OABI, but I don't know if glibc-2.17 builds for
> that.

I am also using EABI here. Can you share your glibc configuration logs/files? Thanks!
Comment 8 Mikael Pettersson 2013-04-08 09:51:38 UTC
(In reply to comment #7)
> I am also using EABI here. Can you share your glibc configuration logs/files?

I don't have any build logs left, but my glibc is based on Fedora 19's 
glibc-2.17-2.fc19.src.rpm with some tweaks for aarch64, arm, and m68k -- none that would matter code-generation wise for arm.  I see glibc-2.17-4.fc19.src.rpm currently on Fedora's mirrors.

If this doesn't work, I'd suspect a compiler issue (my gcc-4.6 and 4.7 are very heavily patched with backported bugfixes.)
Comment 9 Florian Fainelli 2013-04-08 09:53:23 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I am also using EABI here. Can you share your glibc configuration logs/files?
> 
> I don't have any build logs left, but my glibc is based on Fedora 19's 
> glibc-2.17-2.fc19.src.rpm with some tweaks for aarch64, arm, and m68k -- none
> that would matter code-generation wise for arm.  I see
> glibc-2.17-4.fc19.src.rpm currently on Fedora's mirrors.
> 
> If this doesn't work, I'd suspect a compiler issue (my gcc-4.6 and 4.7 are very
> heavily patched with backported bugfixes.)

OK, I will try with a fairly recent Linaro toolchain, for instance, and see if that changes anything.
Comment 10 Florian Fainelli 2013-04-08 16:23:09 UTC
I tried several combinations without success; the issue remains the same:

- GCC 4.7 Linaro 2013.01 with -Os
- GCC 4.7 Linaro 2013.01 without -Os

both give me consistent failures.
Comment 11 Mikael Pettersson 2013-04-08 18:18:55 UTC
Please try to extract a self-contained test case (combining your application with relevant glibc code).
Comment 12 Dan Fandrich 2014-01-05 07:39:23 UTC
I'm able to reproduce this on x86_64 consistently on a couple of machines under Linux 3.10.24 using the neon test suite. Steps to reproduce:

curl -O http://www.webdav.org/neon/neon-0.29.6.tar.gz
tar xaf neon-0.29.6.tar.gz
cd neon-0.29.6
./configure --without-ssl --without-egd --without-pakchois --without-gssapi --without-libproxy --without-libxml2 --without-expat --without-zlib --disable-webdav
make
make check

And then, as needed,

cd test
./request

Test 68 is the first one that fails, then several others.

This is using glibc 2.18 and gcc 4.8.2. I've tried creating a simple test case that reproduces the same getaddrinfo calls as done in the test suite, but it's not sufficient to cause the failure. Changing the order of tests is enough to work around the problem, so it's clearly some subtle internal state that needs to be set up for the failure to occur.

The failing getaddrinfo() calls being done in the test suite are equivalent to:

        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_ADDRCONFIG;
        hints.ai_family = AF_UNSPEC;
        errnum = getaddrinfo("localhost", NULL, &hints, &res);

Either of the following two patches is sufficient to work around the problem and allow the test suite to pass. Changing the order of getaddrinfo() calls:

--- neon-0.29.6/test/request.c.orig     2014-01-05 06:36:01.124005697 +0000
+++ neon-0.29.6/test/request.c  2014-01-05 06:37:28.859996470 +0000
@@ -2397,8 +2397,6 @@
     T(fail_long_header),
     T(fail_on_invalid),
     T(read_timeout),
-    T(fail_lookup),
-    T(fail_double_lookup),
     T(fail_connect),
     T(proxy_no_resolve),
     T(fail_chunksize),
@@ -2422,5 +2420,7 @@
     T(socks_v4_proxy),
     T(send_length),
     T(socks_fail),
+    T(fail_lookup),
+    T(fail_double_lookup),
     T(NULL)
 };

Or using AF_INET instead of AF_UNSPEC:

--- neon-0.29.6/src/ne_socket.c.orig        2014-01-04 14:42:09.665502390 +0000
+++ neon-0.29.6/src/ne_socket.c     2014-01-05 00:56:28.287272839 +0000
@@ -925,8 +925,9 @@
     {
 #ifdef USE_GAI_ADDRCONFIG /* added in the RFC3493 API */
         hints.ai_flags = AI_ADDRCONFIG;
-        hints.ai_family = AF_UNSPEC;
+        hints.ai_family = AF_INET; //AF_UNSPEC;
         addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
+        hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
 #else
         hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
        addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
Comment 13 Dan Fandrich 2014-01-11 20:50:37 UTC
Sorry, I just noticed that second patch has some cruft from debugging. This is all that's actually needed as the workaround:

--- neon-0.29.6/src/ne_socket.c.orig        2014-01-04 14:42:09.665502390 +0000
+++ neon-0.29.6/src/ne_socket.c     2014-01-05 00:56:28.287272839 +0000
@@ -925,8 +925,8 @@
     {
 #ifdef USE_GAI_ADDRCONFIG /* added in the RFC3493 API */
         hints.ai_flags = AI_ADDRCONFIG;
-        hints.ai_family = AF_UNSPEC;
+        hints.ai_family = AF_INET; //AF_UNSPEC;
         addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
 #else
         hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
        addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
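
The AF_UNSPEC versus AF_INET difference described in the two comments above could also be checked from inside a single process. The following is only a sketch under assumptions: `lookup_with_family` and `unspec_only_failure` are hypothetical diagnostic helpers that belong to neither neon nor glibc, and since the failure reportedly depends on earlier internal state, running them in isolation may well not trigger it.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical diagnostic helper (not part of neon or glibc): perform one
 * SOCK_STREAM lookup restricted to `family` and return getaddrinfo()'s
 * return value. */
static int lookup_with_family(const char *host, int family, int flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;           /* e.g. AI_ADDRCONFIG, as in neon */
    hints.ai_family = family;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Run the same lookup with AF_UNSPEC and with AF_INET.
 * Returns 1 when only the AF_UNSPEC lookup fails (the pattern described
 * in this bug), and 0 in every other case. */
static int unspec_only_failure(const char *host, int flags)
{
    int rc_unspec = lookup_with_family(host, AF_UNSPEC, flags);
    int rc_inet = lookup_with_family(host, AF_INET, flags);

    printf("AF_UNSPEC: %d (%s)\n", rc_unspec,
           rc_unspec ? gai_strerror(rc_unspec) : "ok");
    printf("AF_INET:   %d (%s)\n", rc_inet,
           rc_inet ? gai_strerror(rc_inet) : "ok");

    return rc_unspec != 0 && rc_inet == 0;
}
```

Calling `unspec_only_failure("localhost", AI_ADDRCONFIG)` right after the sequence of lookups that precedes the failing test would show whether the AF_INET workaround masks the same error at that point.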
Comment 14 Alexandre Oliva 2014-09-27 06:41:51 UTC
Florian, any chance you could get an strace from a failed execution of your program with 2.16, and one from a successful execution with 2.15?  I'm afraid if that doesn't point us towards a solution, a debugger and glibc debug info might be required to figure out what's going on.
Comment 15 Florian Weimer 2019-02-15 13:22:08 UTC
We did not receive enough information to reproduce this.  It could be an NSS module or some nscd interaction.