Created attachment 6930
Test case to illustrate the issue

The attached test case returns the following on an ARMv5 system:

    ./getaddrinfo www.free.fr
    getaddrinfo: System error (errno: 111, Connection refused)

The exact same test case works fine on an i686 system. As far as I know this code is not architecture-specific, yet it consistently fails on ARMv5 and not on i686.

Note that I was able to reproduce the issue with glibc 2.16 and glibc 2.17, but not with glibc 2.13. I will dig further to see whether glibc 2.14, 2.15 and 2.18 are also affected.
glibc 2.15 and earlier are not affected and behave correctly (resolution works fine with AF_UNSPEC and SOCK_STREAM). glibc 2.16 and later are affected.
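For readers without access to attachment 6930, a test along the following lines would exercise the same call. This is only a minimal sketch under assumptions, not the attached file: the helper names format_gai_error and resolve_stream are hypothetical, and the extra_flags parameter is added purely so the call can be tried with AI_NUMERICHOST, without touching DNS.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: render a getaddrinfo() return code as text.
   For EAI_SYSTEM the underlying cause is in errno, so it is appended,
   similar to the output quoted in the report. */
void format_gai_error(int rc, int saved_errno, char *buf, size_t len)
{
    if (rc == EAI_SYSTEM)
        snprintf(buf, len, "%s (errno: %d, %s)",
                 gai_strerror(rc), saved_errno, strerror(saved_errno));
    else
        snprintf(buf, len, "%s", gai_strerror(rc));
}

/* Hypothetical helper: resolve `host` with AF_UNSPEC and SOCK_STREAM.
   Returns the getaddrinfo() code; on failure `msg` holds a description,
   on success `msg` is set to the empty string. */
int resolve_stream(const char *host, int extra_flags, char *msg, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = extra_flags;   /* assumption: 0 in the real test */

    errno = 0;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* Capture errno right away, before any other call changes it. */
        format_gai_error(rc, errno, msg, len);
        return rc;
    }

    if (len > 0)
        msg[0] = '\0';
    freeaddrinfo(res);
    return 0;
}
```

Calling resolve_stream("www.free.fr", 0, msg, sizeof msg) from a small main() and printing msg on a non-zero return would approximate the behaviour described above; whether it matches the attachment exactly is not known.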
Could you describe the system configuration? Is this when the network is up and everything is functioning correctly? What does 'dig www.free.fr' say?
(In reply to comment #2)
> Could you describe the system configuration? Is this when the network is up
> and everything is functioning correctly? What does 'dig www.free.fr' say?

The system is a typical internet gateway device, with its upstream interface connecting it to the internet. Connectivity is working fine. The upstream interface has no IPv6 address assigned.

/etc/resolv.conf:

    nameserver 212.27.40.240
    nameserver 212.27.40.241

And here is what dig says:

    ; <<>> DiG 9.9.1-P3 <<>> www.free.fr
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9636
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;www.free.fr.            IN    A

    ;; ANSWER SECTION:
    www.free.fr.    65295    IN    A    212.27.48.10

    ;; Query time: 10 msec
    ;; SERVER: 212.27.40.240#53(212.27.40.240)
    ;; WHEN: Fri Apr 5 18:48:28 2013
    ;; MSG SIZE rcvd: 56

dig works fine, but I do not see it using getaddrinfo().
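Since the reply above mentions that the upstream interface has no IPv6 address, one way to record which address families a machine actually has configured is sketched below. This is only an illustration: the thread does not establish that interface configuration is related to the failure, and the names family_counts and count_configured_families are hypothetical. It relies on getifaddrs(), which is available on glibc but is not part of POSIX.

```c
#define _DEFAULT_SOURCE
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result type: number of addresses seen per family. */
struct family_counts {
    int ipv4;
    int ipv6;
};

/* Hypothetical helper: count IPv4 and IPv6 addresses assigned to
   non-loopback interfaces. Returns 0 on success, -1 if getifaddrs()
   fails (the counts are left at zero in that case). */
int count_configured_families(struct family_counts *out)
{
    struct ifaddrs *list = NULL;
    struct ifaddrs *ifa;

    out->ipv4 = 0;
    out->ipv6 = 0;

    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Entries without an address, and loopback interfaces, are skipped. */
        if (ifa->ifa_addr == NULL || (ifa->ifa_flags & IFF_LOOPBACK))
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET)
            out->ipv4++;
        else if (ifa->ifa_addr->sa_family == AF_INET6)
            out->ipv6++;
    }

    freeifaddrs(list);
    return 0;
}
```

Note that IPv6 link-local addresses are counted as IPv6 here, so a non-zero ipv6 count does not by itself mean the host has global IPv6 connectivity.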
Right, dig does not use getaddrinfo. I just wanted to know if the network/DNS is accessible otherwise. I'm not able to reproduce this on my armv6 board, so I'm going to pass on this. Hopefully someone with a v5 will get to it.
(In reply to comment #4)
> Right, dig does not use getaddrinfo. I just wanted to know if the network/DNS
> is accessible otherwise. I'm not able to reproduce this on my armv6 board, so
> I'm going to pass on this. Hopefully someone with a v5 will get to it.

What worries me is a possible toolchain issue. I have tried various combinations of GCC and glibc versions, and the problem really is specific to glibc 2.16 and later.

Thanks for your help so far!
The test case works for me on armv5tel-linux-gnueabi w/ glibc-2.17 and gcc-4.6.3, and on armv5teb-linux-gnueabi w/ glibc-2.17 and gcc-4.7.3. I would expect "issues" on OABI, but I don't know if glibc-2.17 builds for that.
(In reply to comment #6)
> The test case works for me on armv5tel-linux-gnueabi w/ glibc-2.17 and
> gcc-4.6.3, and on armv5teb-linux-gnueabi w/ glibc-2.17 and gcc-4.7.3.
>
> I would expect "issues" on OABI, but I don't know if glibc-2.17 builds for
> that.

I am also using EABI here. Can you share your glibc configuration logs/files?

Thanks!
(In reply to comment #7)
> I am also using EABI here. Can you share your glibc configuration logs/files?

I don't have any build logs left, but my glibc is based on Fedora 19's glibc-2.17-2.fc19.src.rpm with some tweaks for aarch64, arm, and m68k -- none that would matter code-generation wise for arm. I see glibc-2.17-4.fc19.src.rpm currently on Fedora's mirrors.

If this doesn't work, I'd suspect a compiler issue (my gcc-4.6 and 4.7 are very heavily patched with backported bugfixes.)
(In reply to comment #8)
> (In reply to comment #7)
> > I am also using EABI here. Can you share your glibc configuration logs/files?
>
> I don't have any build logs left, but my glibc is based on Fedora 19's
> glibc-2.17-2.fc19.src.rpm with some tweaks for aarch64, arm, and m68k -- none
> that would matter code-generation wise for arm. I see
> glibc-2.17-4.fc19.src.rpm currently on Fedora's mirrors.
>
> If this doesn't work, I'd suspect a compiler issue (my gcc-4.6 and 4.7 are very
> heavily patched with backported bugfixes.)

OK, I will try with a fairly recent Linaro toolchain, for instance, and see if that changes anything.
Tried several combinations without success; the issue remains the same:
- GCC 4.7 Linaro 2013.01 with -Os
- GCC 4.7 Linaro 2013.01 without -Os

Both give me consistent failures.
Please try to extract a self-contained test case (combining your application with relevant glibc code).
I'm able to reproduce this on x86_64 consistently on a couple of machines under Linux 3.10.24 using the neon test suite.

Steps to reproduce:

    curl -O http://www.webdav.org/neon/neon-0.29.6.tar.gz
    tar xaf neon-0.29.6.tar.gz
    cd neon-0.29.6
    ./configure --without-ssl --without-egd --without-pakchois --without-gssapi --without-libproxy --without-libxml2 --without-expat --without-zlib --disable-webdav
    make
    make check

And then, as needed,

    cd test
    ./request

Test 68 is the first one that fails, then several others. This is using glibc 2.18 and gcc 4.8.2.

I've tried creating a simple test case that reproduces the same getaddrinfo calls as done in the test suite, but it's not sufficient to cause the failure. Changing the order of tests is enough to work around the problem, so it's clearly some subtle internal state that needs to be set up for the failure to occur.

The failing getaddrinfo() calls being done in the test suite are equivalent to:

    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;
    hints.ai_family = AF_UNSPEC;
    errnum = getaddrinfo("localhost", NULL, &hints, &res);

Either of the following two patches is sufficient to work around the problem and allow the test suite to pass.
Changing the order of getaddrinfo() calls:

--- neon-0.29.6/test/request.c.orig 2014-01-05 06:36:01.124005697 +0000
+++ neon-0.29.6/test/request.c 2014-01-05 06:37:28.859996470 +0000
@@ -2397,8 +2397,6 @@
     T(fail_long_header),
     T(fail_on_invalid),
     T(read_timeout),
-    T(fail_lookup),
-    T(fail_double_lookup),
     T(fail_connect),
     T(proxy_no_resolve),
     T(fail_chunksize),
@@ -2422,5 +2420,7 @@
     T(socks_v4_proxy),
     T(send_length),
     T(socks_fail),
+    T(fail_lookup),
+    T(fail_double_lookup),
     T(NULL)
 };

Or using AF_INET instead of AF_UNSPEC:

--- neon-0.29.6/src/ne_socket.c.orig 2014-01-04 14:42:09.665502390 +0000
+++ neon-0.29.6/src/ne_socket.c 2014-01-05 00:56:28.287272839 +0000
@@ -925,8 +925,9 @@
 {
 #ifdef USE_GAI_ADDRCONFIG /* added in the RFC3493 API */
     hints.ai_flags = AI_ADDRCONFIG;
-    hints.ai_family = AF_UNSPEC;
+    hints.ai_family = AF_INET; //AF_UNSPEC;
     addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
+    hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
 #else
     hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
     addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
Sorry, I just noticed that second patch has some cruft from debugging. This is all that's actually needed as the workaround:

--- neon-0.29.6/src/ne_socket.c.orig 2014-01-04 14:42:09.665502390 +0000
+++ neon-0.29.6/src/ne_socket.c 2014-01-05 00:56:28.287272839 +0000
@@ -925,8 +925,8 @@
 {
 #ifdef USE_GAI_ADDRCONFIG /* added in the RFC3493 API */
     hints.ai_flags = AI_ADDRCONFIG;
-    hints.ai_family = AF_UNSPEC;
+    hints.ai_family = AF_INET; //AF_UNSPEC;
     addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
 #else
     hints.ai_family = ipv6_disabled ? AF_INET : AF_UNSPEC;
     addr->errnum = getaddrinfo(hostname, NULL, &hints, &addr->result);
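For applications that cannot simply hard-code AF_INET, a softer variant of the same idea might look like the sketch below. It is not the patch above and not a fix for the underlying problem: it keeps AF_UNSPEC as the first attempt and retries with AF_INET only when getaddrinfo() returns EAI_SYSTEM. The names lookup_family and lookup_with_fallback are hypothetical, and the flags argument is left to the caller (AI_ADDRCONFIG in the neon case).

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: a single getaddrinfo() call using SOCK_STREAM
   and the caller's choice of address family and flags. */
int lookup_family(const char *host, int family, int flags,
                  struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;
    hints.ai_family = family;

    *res = NULL;
    return getaddrinfo(host, NULL, &hints, res);
}

/* Hypothetical workaround: try AF_UNSPEC first and, only if that fails
   with EAI_SYSTEM, retry with AF_INET. `family_used` reports which
   attempt produced the returned code. On success the caller owns *res
   and must release it with freeaddrinfo(). */
int lookup_with_fallback(const char *host, int flags,
                         struct addrinfo **res, int *family_used)
{
    int rc = lookup_family(host, AF_UNSPEC, flags, res);

    *family_used = AF_UNSPEC;
    if (rc == EAI_SYSTEM) {
        *family_used = AF_INET;
        rc = lookup_family(host, AF_INET, flags, res);
    }
    return rc;
}
```

The trade-off is that the retry path returns IPv4 results only, so a host that is reachable solely over IPv6 would still fail on an affected system.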
Florian, any chance you could get an strace from a failed execution of your program with 2.16, and one from a successful execution with 2.15? I'm afraid if that doesn't point us towards a solution, a debugger and glibc debug info might be required to figure out what's going on.
We did not receive enough information to reproduce this. It could be an NSS module or some nscd interaction.