NIS performance patches
Jeff Bastian
jmbastia@ti.com
Sat Jan 10 00:57:00 GMT 2004
Thorsten and glibc developers,
I am a systems administrator at Texas Instruments. We have been
deploying Linux clusters in our datacenter here in Dallas, and as more
and more clusters go online, we've been noticing a performance problem
with NIS. By default, ypbind pings the NIS servers every 20 seconds
asking "are you still alive?", and every 15 minutes, it rebinds to the
NIS server regardless of whether or not there were problems. With the
number of Linux clients on our network, this is creating a high level of
stress on our NIS servers. (nscd helps a little, but we've been having
issues with nscd dying; we're still debugging this issue.)
There is a -no-ping option to ypbind that disables this 20sec/15min
activity; however, it also disables a section of code that is useful
in case a NIS server goes down. With the 20sec pinging going on, if a
NIS server crashes, ypbind will find a new server to bind to. But if
the -no-ping option is used, the client will remain bound to the bad
server forever. If you look in the test_bindings() function
(ypbind-mt-1.12/src/serv_list.c), there are two lines near
the top:
    if (ping_interval < 1)
      pthread_exit (&success);
The -no-ping option sets ping_interval=0, so this thread exits and the
code that tries to rebind never gets run.
We have developed some patches that modify the behavior of both glibc
and ypbind so that, in the case of an error, ypbind will look for a new
server even if -no-ping was used. We're also working with RedHat to
incorporate these patches, but RedHat does not want to create a forked
version of ypbind and glibc. They would be more comfortable if these
patches were accepted by you.
I've attached the patches to this e-mail, and I have a more detailed
description of what the patches do below. These patches were generated
against the source from RedHat Enterprise Linux 3, which means:
    ypbind-mt-1.12
    glibc-2.3.2-200309260658
You'll also notice in the patch for ypbind that I've defined
USE_BROADCAST=0. We have another issue where a few NIS servers are
extremely fast at responding to pings even when heavily loaded, so of
our 11 NIS servers, most clients bind to the fastest 3 and the other 8
sit by idle. We're experimenting with USE_BROADCAST=0 and a Perl script
that randomizes the order of the servers in /etc/yp.conf to get better
load balancing. If this works well, an actual 'configure' switch would
be better than my patch that simply hacks the 'configure' script.
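As a rough illustration of the randomizing step described above, here is a minimal sketch in C (our actual script is Perl and is not shown here). The function name shuffle_servers() and its parameters are hypothetical, and reading and rewriting /etc/yp.conf is left out; it only shows a Fisher-Yates shuffle over a list of server names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: reorder an array of server names in place using
   a Fisher-Yates shuffle.  rand() with a modulo is slightly biased,
   which is assumed to be acceptable for spreading clients across a
   handful of servers.  */
static void
shuffle_servers (const char **names, size_t n, unsigned int seed)
{
  srand (seed);
  for (size_t i = n; i > 1; i--)
    {
      /* Pick a slot in [0, i) and swap it into position i - 1.  */
      size_t j = (size_t) rand () % i;
      const char *tmp = names[i - 1];
      names[i - 1] = names[j];
      names[j] = tmp;
    }
}
```

Each client would use a different seed (for example one derived from its hostname or the time of day, which is an assumption here) so that clients end up with different server orders.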
Please take a look at our patches and let me know what you think. I
believe these patches will help make Linux a better product for the
enterprise.
Thank you!
-----------------
Jeff Bastian
jmbastia@ti.com
Unix System Admin
Texas Instruments
-----------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Detailed Description
What exactly do our ypbind patches do?
1) There is a chunk of code in the test_bindings() function
(ypbind-mt-1.12/src/serv_list.c) that tests the current server
for accessibility and finds a new server if it's broken. This
chunk of code is inside a while(1) loop that controls the
20sec/15min actions. We've moved this chunk of code to a
function called test_bindings_once() and placed a call to
this new function in test_bindings(), so the functionality
is not changed here.
2) In the ypbindproc_domain() function
(ypbind-mt-1.12/src/ypbind_server.c) there is an added call to
the test_bindings_once() function. If everything is working,
this added test is a small price. If the server is down, it
will look for a new one.
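The shape of these two changes can be sketched as a small stand-alone simulation. This is not the patch itself: the server table, the "alive" flag, the bounded loop, and the simplified signatures are all assumptions made for illustration (the real test_bindings_once() takes an argument and the real handler takes the domain and a result structure).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ypbind's server list.  */
struct server
{
  const char *name;
  bool alive;
};

static struct server servers[] = {
  { "nis1", true }, { "nis2", true }, { "nis3", true }
};
static const size_t nservers = sizeof servers / sizeof servers[0];
static size_t bound = 0;	/* index of the currently bound server */

/* One pass of the check: keep the binding if the server answers,
   otherwise bind to the first server that does.  Returns true if a
   live binding exists afterwards.  */
static bool
test_bindings_once (void)
{
  if (servers[bound].alive)
    return true;
  for (size_t i = 0; i < nservers; i++)
    if (servers[i].alive)
      {
	bound = i;
	return true;
      }
  return false;
}

/* The periodic thread body, reduced to a bounded loop.  With
   ping_interval < 1 (-no-ping) it returns before any check is made.
   Returns the number of checks performed.  */
static int
test_bindings (int ping_interval, int rounds)
{
  int checks = 0;
  if (ping_interval < 1)
    return checks;
  while (rounds-- > 0)
    {
      test_bindings_once ();
      checks++;
      /* The real loop would sleep for ping_interval seconds here.  */
    }
  return checks;
}

/* The RPC handler: verify the binding on demand, then report it.  */
static const char *
ypbindproc_domain (void)
{
  if (!test_bindings_once ())
    return NULL;
  return servers[bound].name;
}
```

In this sketch, marking servers[0] as not alive and then calling ypbindproc_domain() moves the binding to the next live server even though test_bindings() was never run, which is the behavior we want with -no-ping.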
So, the only change in behavior here is that ypbindproc_domain() will
test the current binding once before returning. If necessary, a new
command line flag can be added so ypbindproc_domain() only calls
test_bindings_once() if this new flag is present, e.g.
    ypbind -no-ping --xyz
    ...
    ypbindproc_domain(...)
    {
      ...
      if (xyz)
        test_bindings_once(1);
      find_domain (domain, result);
      ...
    }
However, this is only half the picture. Meanwhile, over in glibc
land, more changes are made in the NIS client code to make this whole
new system work.
What exactly do our glibc patches do?
1) Like test_bindings() above, we take the __yp_bind() function
(glibc-2.3.2-200309260658/nis/ypclnt.c) and move chunks of it
into three smaller functions.
a) __yp_bind_client_create() is a small chunk of code that
was duplicated in __yp_bind(), once for the section of
code that looks at /var/yp/binding, the other for the
section that talks to the ypbind daemon
b) the section that looks at /var/yp/binding was moved into
the __yp_bind_file() function, with a call to
__yp_bind_client_create() where the duplicate code used
to reside
c) the section that talks to ypbind daemon was moved into
the __yp_bind_ypbindprog() function, again, with a call to
__yp_bind_client_create() replacing the duplicate code
And, of course, calls to __yp_bind_file() and
__yp_bind_ypbindprog() are inserted into __yp_bind() where the
code used to reside. Also, like ypbind change (1) above, this
does not alter the functionality at all.
2) If /var/yp/binding has bad data in it (e.g., a server that went
offline), then calls to do_ypcall() will fail without trying to find
a new NIS server. So, two lines of code are added to do_ypcall()
that, in the event of an error from clnt_call(), try calling
__yp_bind_ypbindprog() (one of our new functions), which in turn
does
    clnt_call (client, YPBINDPROC_DOMAIN, ...)
which in turn calls ypbindproc_domain() in the ypbind daemon
which calls test_bindings_once() and hopefully finds a new
server.
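The client-side flow in (2) can be sketched as a stand-alone simulation. Everything here is a stand-in: fake_clnt_call() plays the role of clnt_call(), rebind_via_daemon() plays the role of __yp_bind_ypbindprog(), and the integer "servers" are hypothetical. The single retry after rebinding is also an assumption of this sketch, not a statement about how the patched do_ypcall() loops.

```c
/* Hypothetical status codes for this sketch.  */
enum { CALL_OK = 0, CALL_FAILED = 1 };

static int cached_server = 0;	/* what /var/yp/binding last recorded */
static int live_server = 1;	/* the server that is actually up */
static int daemon_queries = 0;	/* how often the daemon was asked */

/* Stand-in for clnt_call(): succeeds only when talking to the live
   server.  */
static int
fake_clnt_call (int server)
{
  return server == live_server ? CALL_OK : CALL_FAILED;
}

/* Stand-in for __yp_bind_ypbindprog(): ask the ypbind daemon for a
   binding.  The daemon is assumed to have re-tested its binding and to
   hand back the live server.  */
static int
rebind_via_daemon (void)
{
  daemon_queries++;
  cached_server = live_server;
  return CALL_OK;
}

/* Stand-in for the patched do_ypcall(): on an error from the call,
   ask the daemon for a fresh binding and try once more.  */
static int
do_call_with_rebind (void)
{
  int status = fake_clnt_call (cached_server);
  if (status != CALL_OK && rebind_via_daemon () == CALL_OK)
    status = fake_clnt_call (cached_server);
  return status;
}
```

The first call in this sketch fails against the stale binding, asks the daemon once, and then succeeds; later calls succeed directly without contacting the daemon again.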
In summary, while at first our patches appear to do lots of surgery
to the source code, there are really only two small changes:
1) Call test_bindings_once() from ypbindproc_domain(), possibly
controlled by a new command line flag
2) Call __yp_bind_ypbindprog() from do_ypcall() if clnt_call()
returns an error
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nis_patches_rhel3.tar.gz
Type: application/gzip
Size: 28685 bytes
Desc: not available
URL: <http://sourceware.org/pipermail/libc-alpha/attachments/20040110/4be4869c/attachment.gz>