getaddrinfo() fails to use latest DNS address - v2.27

Tarun Tej K tarun4690@gmail.com
Tue Jan 7 06:01:00 GMT 2020


Hi,

Environment:
glibc version -  v2.27
platform - NXP's iMX6
cross-compiler - arm-poky-linux-gnueabi-gcc
Built using the Yocto recipes

As part of long-term testing of our system, I have a setup that
automatically switches the active network interface between Ethernet,
WLAN and PPP. Each time the active interface changes, the DNS
addresses in /etc/resolv.conf change as well.

Issue Description:
The issue might be related to
https://sourceware.org/bugzilla/show_bug.cgi?id=984
Once in a while, typically after about 5 hours of running,
getaddrinfo() stops resolving addresses and keeps returning EAI_AGAIN
('Temporary failure in name resolution'). 'strace' output of the
failing process shows that, in this state, getaddrinfo() performs
neither stat64() nor openat() on /etc/resolv.conf (the calls that
would detect the latest DNS change). That may be why the global
configuration (resolv_conf_global) is not updated with the correct
DNS values.

I do not yet have steps to reproduce this issue reliably.
I tried a simple application that just calls getaddrinfo() on user
input. That application always does a stat64() of /etc/resolv.conf,
followed by an openat() whenever the timestamp, size or inode of
/etc/resolv.conf has changed.
I am not sure what causes my actual application to reach a state
where, after running for some time, it no longer even does the
stat64() of /etc/resolv.conf.
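
For reference, a test application of this kind can be sketched as
below. This is not the exact program I ran, only a minimal
equivalent; resolve_once() and resolve_lines() are hypothetical
names. Running it under 'strace -e trace=file' should show whether
/etc/resolv.conf is being checked on each lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve HOST once and print the outcome.
   Returns the raw getaddrinfo() status: 0 on success, an EAI_* code
   (for example EAI_AGAIN) otherwise.  */
static int
resolve_once (const char *host, int extra_flags)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;

  assert (host != NULL);
  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = extra_flags;

  int rc = getaddrinfo (host, NULL, &hints, &res);
  if (rc != 0)
    {
      fprintf (stderr, "%s: %s (%d)\n", host, gai_strerror (rc), rc);
      return rc;
    }

  for (struct addrinfo *p = res; p != NULL; p = p->ai_next)
    {
      char buf[128];
      /* Print each result as a numeric address.  */
      if (getnameinfo (p->ai_addr, p->ai_addrlen, buf, sizeof buf,
                       NULL, 0, NI_NUMERICHOST) == 0)
        printf ("%s -> %s\n", host, buf);
    }
  freeaddrinfo (res);
  return 0;
}

/* Hypothetical driver: read one host name per line from IN and
   resolve each of them in turn.  */
static void
resolve_lines (FILE *in)
{
  char line[256];

  assert (in != NULL);
  while (fgets (line, sizeof line, in) != NULL)
    {
      line[strcspn (line, "\r\n")] = '\0';
      if (line[0] != '\0')
        resolve_once (line, 0);
    }
}
```

A main() that simply calls resolve_lines (stdin) is enough to drive
it by hand while /etc/resolv.conf is being changed.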

I have gone through the glibc code and have a question about the
following part of the function maybe_init() in resolv/resolv_context.c:

      if (ctx->conf != NULL && replicated_configuration_matches (ctx))
        {
          struct resolv_conf *current = __resolv_conf_get_current ();
          if (current == NULL)
            return false;
          /* Check if the configuration changed.  */
          if (current != ctx->conf)
            {
              /* This call will detach the extended resolver state.  */
              if (resp->nscount > 0)
                __res_iclose (resp, true);
              /* Reattach the current configuration.  */
              if (__resolv_conf_attach (ctx->resp, current))
                {
                  __resolv_conf_put (ctx->conf);
                  /* ctx takes ownership, so we do not release current.  */
                  ctx->conf = current;
                }
            }
          else
            /* No change.  Drop the reference count for current.  */
            __resolv_conf_put (current);
        }
      return true;

Here the return value is 'true' even when the condition
if (ctx->conf != NULL && replicated_configuration_matches (ctx))
fails. I think this is one case where neither
__resolv_conf_get_current() nor __resolv_conf_load() is called, so no
stat64() or openat() is done on /etc/resolv.conf. Why does
maybe_init() return 'true' when that condition fails?

Note:
One more detail about /etc/resolv.conf, in case it helps. Depending on
the type of the active network interface, the application changes the
file type of /etc/resolv.conf: it is sometimes a regular file and
sometimes a symlink to /var/run/resolv.conf. Could /etc/resolv.conf
being a symlink cause a problem like this?
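
In case it is useful for checking the symlink case, the sketch below
shows how the same path looks through lstat() (the link itself) and
stat() (the link target). The struct path_view and inspect_path()
names are hypothetical; only standard POSIX calls are used.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type: what lstat() and stat() report for a path.  */
struct path_view
{
  bool is_symlink;    /* lstat() says the path itself is a symlink */
  ino_t link_ino;     /* inode of the path itself (from lstat) */
  ino_t target_ino;   /* inode of what the path resolves to (from stat) */
  char target[256];   /* symlink target, or "" for a non-symlink */
};

/* Hypothetical helper: fill *out for PATH.  Returns 0 on success,
   -1 if lstat() or stat() fails (for example a dangling symlink).  */
static int
inspect_path (const char *path, struct path_view *out)
{
  struct stat l, t;

  assert (path != NULL && out != NULL);
  if (lstat (path, &l) != 0 || stat (path, &t) != 0)
    return -1;

  out->is_symlink = S_ISLNK (l.st_mode);
  out->link_ino = l.st_ino;
  out->target_ino = t.st_ino;
  out->target[0] = '\0';
  if (out->is_symlink)
    {
      /* readlink() does not NUL-terminate, so leave room and do it here.  */
      ssize_t n = readlink (path, out->target, sizeof out->target - 1);
      if (n >= 0)
        out->target[n] = '\0';
    }
  printf ("%s: %s, link inode %lu, target inode %lu\n", path,
          out->is_symlink ? "symlink" : "not a symlink",
          (unsigned long) out->link_ino, (unsigned long) out->target_ino);
  return 0;
}
```

Because stat() follows symlinks, a check based on stat() sees the
inode of /var/run/resolv.conf when /etc/resolv.conf is a symlink, and
the inode of /etc/resolv.conf itself when it is a regular file.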

Thanks
Tarun


