NIS performance patches

Jeff Bastian jmbastia@ti.com
Sat Jan 10 00:57:00 GMT 2004


Thorsten and glibc developers,

I am a systems administrator at Texas Instruments.  We have been 
deploying Linux clusters in our datacenter here in Dallas, and as more 
and more clusters go online, we've been noticing a performance problem 
with NIS.  By default, ypbind pings the NIS servers every 20 seconds 
asking "are you still alive?", and every 15 minutes, it rebinds to the 
NIS server regardless of whether or not there were problems.  With the 
number of Linux clients on our network, this is creating a high level of 
stress on our NIS servers.  (nscd helps a little, but we've been having
issues with nscd dying; we're still debugging this issue.)
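To put the default timing in numbers, here is a rough per-client count in C.  It counts each ping and each rebind as a single request, which understates the real cost of a rebind, and the 1,000-client figure below is a hypothetical round number, not our actual client count.

```c
/* Default ypbind timings described above. */
#define PING_INTERVAL_SEC   20   /* "are you still alive?" ping */
#define REBIND_INTERVAL_SEC 900  /* unconditional rebind every 15 minutes */

/* Background requests one client sends per hour from ypbind's
   periodic activity alone (no real lookups).  Simplification:
   each ping and each rebind is counted as one request. */
static long background_requests_per_hour(void)
{
    long pings   = 3600 / PING_INTERVAL_SEC;    /* 180 per hour */
    long rebinds = 3600 / REBIND_INTERVAL_SEC;  /* 4 per hour */
    return pings + rebinds;
}

/* Aggregate background load for n_clients machines. */
static long cluster_requests_per_hour(long n_clients)
{
    return n_clients * background_requests_per_hour();
}
```

With a hypothetical 1,000 clients, cluster_requests_per_hour(1000) gives 184,000 background requests per hour, roughly 51 per second, before a single real lookup is made.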

There is a -no-ping option to ypbind that disables this 20sec/15min
activity; however, it also disables a section of code that is useful
in case a NIS server goes down.  With the 20sec pinging going on, if a
NIS server crashes, ypbind will find a new server to bind to.  But if 
the -no-ping option is used, the client will remain bound to the bad 
server forever.  If you look in the test_bindings() function 
(ypbind-mt-1.12/src/serv_list.c), there are two lines near
the top:
  if (ping_interval < 1)
    pthread_exit (&success);
The -no-ping option sets ping_interval=0, so this thread exits and the
code that tries to rebind never gets run.
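A stripped-down model of that control flow, assuming test_bindings() runs as its own thread (the pthread_exit() call implies this).  The struct, the bounded "rounds" counter standing in for the while(1) loop, and checks_run are hypothetical additions so the sketch terminates and can be observed; build with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* ping_interval mirrors the variable in serv_list.c; the other
   fields are hypothetical, for observation only. */
struct ping_args {
    int ping_interval;   /* 0 when -no-ping is given */
    int rounds;          /* bounded stand-in for while (1) */
    int checks_run;      /* how many times the rebind check ran */
};

static void *test_bindings_sketch(void *arg)
{
    struct ping_args *a = arg;
    static int success = 1;

    if (a->ping_interval < 1)
        pthread_exit(&success);   /* -no-ping: the thread ends here */

    for (int i = 0; i < a->rounds; i++) {
        /* Real code sleeps, pings the bound server, and rebinds
           if it is dead; here we only count the check. */
        a->checks_run++;
    }
    return &success;
}

/* Start the thread, wait for it, report how many checks ran. */
static int run_ping_thread(int ping_interval, int rounds)
{
    struct ping_args a = { ping_interval, rounds, 0 };
    pthread_t tid;

    if (pthread_create(&tid, NULL, test_bindings_sketch, &a) != 0)
        return -1;
    pthread_join(tid, NULL);
    return a.checks_run;
}
```

run_ping_thread(0, 5) returns 0: with -no-ping the check never runs, no matter how long the client stays bound to a dead server.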

We have developed some patches that modify the behavior of both glibc 
and ypbind so that, in the case of an error, ypbind will look for a new 
server even if -no-ping was used.  We're also working with RedHat to
incorporate these patches, but RedHat does not want to create a forked
version of ypbind and glibc.  They would be more comfortable if these
patches were accepted by you.

I've attached the patches to this e-mail, and I have a more detailed 
description of what the patches do below.  These patches were generated 
against the source from RedHat Enterprise Linux 3, which means:
   ypbind-mt-1.12
   glibc-2.3.2-200309260658

You'll also notice in the patch for ypbind that I've defined 
USE_BROADCAST=0.  We have another issue where a few NIS servers are 
extremely fast at responding to pings even when heavily loaded, so of 
our 11 NIS servers, most clients bind to the fastest 3 and the other 8
sit idle.  We're experimenting with USE_BROADCAST=0 and a Perl script
that randomizes the order of the servers in /etc/yp.conf to get better 
load balancing.  If this works well, an actual 'configure' switch would 
be better than my patch that simply hacks the 'configure' script.
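For illustration, the randomizing step can be written in C as a Fisher-Yates shuffle (the actual randomizer is the Perl script mentioned above, not this code).  The server names are hypothetical, the output uses the "ypserver HOSTNAME" form of /etc/yp.conf, and rand() % (i + 1) has a slight modulo bias that is harmless for this purpose.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: with a uniform rand(), every ordering of
   the n servers is (nearly) equally likely, so different clients
   end up with different first-choice servers. */
static void shuffle_servers(const char **servers, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);   /* 0 <= j <= i */
        const char *tmp = servers[i];
        servers[i] = servers[j];
        servers[j] = tmp;
    }
}

/* Emit the shuffled list as yp.conf "ypserver" lines. */
static void print_yp_conf(FILE *out, const char **servers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "ypserver %s\n", servers[i]);
}
```

Each client would seed rand() differently (for example from its hostname or boot time) before shuffling, then write the result with print_yp_conf().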

Please take a look at our patches and let me know what you think.  I 
believe these patches will help make Linux a better product for the 
enterprise.

Thank you!

-----------------
Jeff Bastian
jmbastia@ti.com
Unix System Admin
Texas Instruments
-----------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Detailed Description

What exactly do our ypbind patches do?
  1) There is a chunk of code in the test_bindings() function
     (ypbind-mt-1.12/src/serv_list.c) that tests the current server
     for accessibility and finds a new server if it's broken.  This
     chunk of code is inside a while(1) loop that controls the
     20sec/15min actions.  We've moved this chunk of code to a
     function called test_bindings_once() and placed a call to
     this new function in test_bindings(), so the functionality
     is not changed here.
  2) In the ypbindproc_domain() function
     (ypbind-mt-1.12/src/ypbind_server.c) there is an added call to
     the test_bindings_once() function.  If everything is working,
     this added test costs little.  If the server is down, ypbind
     will look for a new one.
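The shape of that refactoring, reduced to a runnable sketch.  ping_current_server(), find_new_server() and the two globals are hypothetical stubs standing in for the real RPC and server-list code, and the argument to test_bindings_once() is ignored because its meaning is not described here.

```c
#include <stdbool.h>

/* Hypothetical stubs so the sketch runs without a network. */
static bool server_alive = true;
static int rebinds = 0;

static bool ping_current_server(void) { return server_alive; }
static void find_new_server(void) { rebinds++; server_alive = true; }

/* (1) The chunk moved out of the while (1) loop.  The text calls
   test_bindings_once(1); the parameter is unused in this sketch. */
static void test_bindings_once(int arg)
{
    (void)arg;
    if (!ping_current_server())
        find_new_server();
}

/* Background loop: same behavior as before, now via the helper. */
static void test_bindings_loop(int ping_interval, int rounds)
{
    if (ping_interval < 1)
        return;                        /* -no-ping: no background checks */
    for (int i = 0; i < rounds; i++)   /* bounded stand-in for while (1) */
        test_bindings_once(1);
}

/* (2) RPC handler: the added check runs even under -no-ping. */
static int ypbindproc_domain_sketch(void)
{
    test_bindings_once(1);
    return server_alive;   /* stand-in for find_domain (domain, result) */
}
```

With a dead server and ping_interval = 0, the loop does nothing, but the first ypbindproc_domain_sketch() call rebinds.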

So, the only change in behavior here is that ypbindproc_domain() will
test the current binding once before returning.  If necessary, a new
command line flag can be added so ypbindproc_domain() only calls
test_bindings_once() if this new flag is present, e.g.
  ypbind -no-ping --xyz
  ...
  ypbindproc_domain(...)
  {
    ...
    if (xyz)
      test_bindings_once(1);
    find_domain (domain, result);
    ...
  }
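Wiring up such a flag could look roughly like this.  "-no-ping" is ypbind's existing option; "--xyz" is only the placeholder name used above.  The hand-rolled argv loop, the default ping_interval of 20 and the stubbed test_bindings_once() are assumptions for illustration, not ypbind's actual option parser.

```c
#include <string.h>

/* Option state (hypothetical layout). */
static int ping_interval = 20;   /* seconds; 0 means -no-ping */
static int xyz = 0;              /* placeholder flag from the text */

static void parse_args(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-no-ping") == 0)
            ping_interval = 0;
        else if (strcmp(argv[i], "--xyz") == 0)
            xyz = 1;
    }
}

/* Stub: counts how often the one-shot binding test runs. */
static int checks = 0;
static void test_bindings_once(int arg) { (void)arg; checks++; }

static void ypbindproc_domain_sketch(void)
{
    if (xyz)                     /* only test when the flag is given */
        test_bindings_once(1);
    /* find_domain (domain, result); would follow here */
}
```

Gating the check behind a flag keeps the default behavior identical to today's ypbind for anyone who does not opt in.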



However, this is only half the picture.  Meanwhile, over in glibc
land, more changes are made in the NIS client code to make this whole
new system work.

What exactly do our glibc patches do?
  1) Like test_bindings() above, we take the __yp_bind() function
     (glibc-2.3.2-200309260658/nis/ypclnt.c) and move chunks of it
     into three smaller functions.
       a) __yp_bind_client_create() is a small chunk of code that
          was duplicated in __yp_bind(), once for the section of
          code that looks at /var/yp/binding, the other for the
          section that talks to the ypbind daemon
       b) the section that looks at /var/yp/binding was moved into
          the __yp_bind_file() function, with a call to
          __yp_bind_client_create() where the duplicate code used
          to reside
       c) the section that talks to the ypbind daemon was moved into
          the __yp_bind_ypbindprog() function, again, with a call to
          __yp_bind_client_create() replacing the duplicate code
     And, of course, calls to __yp_bind_file() and
     __yp_bind_ypbindprog() are inserted into __yp_bind() where the
     code used to reside.  Also, like ypbind change (1) above, this
     does not alter the functionality at all.
  2) If /var/yp/binding has bad data in it (e.g., a server that went
     offline), then calls to do_ypcall() will fail without trying to
     find a new NIS server.  So, two lines of code are added to do_ypcall()
     that, in the event of an error from clnt_call(), try calling
     __yp_bind_ypbindprog() (one of our new functions), which in turn
     does
       clnt_call (client, YPBINDPROC_DOMAIN, ...)
     which in turn calls ypbindproc_domain() in the ypbind daemon
     which calls test_bindings_once() and hopefully finds a new
     server.
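The two glibc changes fit together as in this simplified model.  Everything network-related is simulated: struct binding, the server-name globals and fake_clnt_call() are hypothetical stand-ins for glibc's binding record and clnt_call(), the leading underscores are dropped because such names are reserved for the implementation, and the single retry is an assumption rather than a copy of the patch's exact logic.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified binding record. */
struct binding {
    char server[64];
};

/* Simulated environment: the binding file still names a dead
   server, while the ypbind daemon can supply a live one. */
static const char *binding_file_server = "nis01";   /* stale entry */
static const char *ypbind_daemon_server = "nis02";
static const char *dead_server = "nis01";
static int ypbind_queries = 0;

/* (1a) Shared helper replacing the duplicated code.  In glibc this
   creates the RPC client handle; here it just records the server. */
static int yp_bind_client_create(struct binding *b, const char *server)
{
    snprintf(b->server, sizeof b->server, "%s", server);
    return 0;
}

/* (1b) Bind from the /var/yp/binding information. */
static int yp_bind_file(struct binding *b)
{
    if (binding_file_server == NULL)
        return -1;
    return yp_bind_client_create(b, binding_file_server);
}

/* (1c) Bind by asking the ypbind daemon (YPBINDPROC_DOMAIN); with
   the ypbind patch, the daemon re-tests its binding first. */
static int yp_bind_ypbindprog(struct binding *b)
{
    ypbind_queries++;
    if (ypbind_daemon_server == NULL)
        return -1;
    return yp_bind_client_create(b, ypbind_daemon_server);
}

/* __yp_bind() after the refactor: file first, then the daemon. */
static int yp_bind(struct binding *b)
{
    if (yp_bind_file(b) == 0)
        return 0;
    return yp_bind_ypbindprog(b);
}

/* Stand-in for clnt_call(): fails if the bound server is down. */
static int fake_clnt_call(const struct binding *b)
{
    return strcmp(b->server, dead_server) == 0 ? -1 : 0;
}

/* (2) do_ypcall() with the added fallback: on a call error, ask
   ypbind for a fresh binding and retry once. */
static int do_ypcall(struct binding *b)
{
    if (yp_bind(b) != 0)
        return -1;
    int status = fake_clnt_call(b);
    if (status != 0 && yp_bind_ypbindprog(b) == 0)
        status = fake_clnt_call(b);
    return status;
}
```

With the stale binding file pointing at nis01, do_ypcall() fails on the first call, asks ypbind once, and succeeds against nis02; without the added fallback it would simply return the error.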

In summary, while at first our patches appear to do lots of surgery
to the source code, there are really only two small changes:
  1) Call test_bindings_once() from ypbindproc_domain(), possibly
     controlled by a new command line flag
  2) Call __yp_bind_ypbindprog() from do_ypcall() if clnt_call()
     returns an error



-------------- next part --------------
A non-text attachment was scrubbed...
Name: nis_patches_rhel3.tar.gz
Type: application/gzip
Size: 28685 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20040110/4be4869c/attachment.gz>

