This is the mail archive of the
libc-help@sourceware.org
mailing list for the glibc project.
Re: [RFC]setlocale() race condition
I think you have all the right pieces in place. Please feel free to
post a "final" version to libc-help, I can give it a once over, and
then you can post to libc-alpha. The point of libc-help is to start
the feedback process earlier and with better results :-)
Cheers,
Carlos.
Carlos
This is same as my earlier mail ..but with minor changes
So does it look better ?
This problem was noticed with glibc shipped with distro, with older
version of glibc 2.5-12, the problem is noticed after 8 hours of testing
and application crashed with SIGSEGV. My efforts to replicate the
problem with main line glibc was not successful, but I still feel the
problem is there even with main line glibc and wish to know your
thoughts on this issue
-----------------------------------------------------------------
Explanation:
This problem was noticed during PHP engine development, current
implementation calls setlocale() every time a page is requested. The
problem is noticed during stress test of this PHP engine
setlocale() is being called on multiple threads. The exact API
calls are as follows
....
setlocale(LC_ALL,'C');
....
setlocale(LC_TYPE,'');
....
setlocale(LC_CTYPE,'C');
....
It was observed that after ~8 hours of testing, application crashed
at strcmp() call made from setlocale(), when I analyzed the dump it
showed that _nl_global_locale.__names[category] pointer was corrupted.
Code analysis showed a window for race - when one thread calls
strcmp()(with in setlocale()) with current value of
_nl_global_locale.__names[category] passed as argument and another
thread goes ahead and frees the string pointer pointed by
_nl_global_locale.__names[category].
_nl_global_locale.__names[category] is protected through the lock
libc_setlocale_lock, but the lock is taken only while writing to the
data and not while reading from the global variable;
This lock is taken in freelocale() and setlocale() functions
Though setlocale() is not on the POSIX.1 list of async-signal safe
functions as in section 2.4.3
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04
It still needs to be thread safe according to section 2.9.1 in
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
-----------------------------------------------------------------
Testing:
Didn't notice any regression with testing. Testing was done under
x86-64 box where the application was built as 32bit.
-----------------------------------------------------------------
Fix:
Similar fix, fixed the problem with distro glibc, where
__libc_lock_lock() is used instead of __libc_rwlock_rdlock()
Thanks
Yeehaw
Index: libc/locale/setlocale.c
===================================================================
--- libc.orig/locale/setlocale.c 2008-03-31 06:07:03.000000000 +0530
+++ libc/locale/setlocale.c 2008-06-12 08:59:45.000000000 +0530
@@ -234,10 +234,14 @@
if (locale == NULL)
return (char *) _nl_global_locale.__names[category];
+ __libc_rwlock_rdlock (__libc_setlocale_lock);
if (strcmp (locale, _nl_global_locale.__names[category]) == 0)
+ {
+ __libc_rwlock_unlock (__libc_setlocale_lock);
/* Changing to the same thing. */
return (char *) _nl_global_locale.__names[category];
-
+ }
+ __libc_rwlock_unlock (__libc_setlocale_lock);
/* We perhaps really have to load some data. So we determine the
path in which to look for the data now. The environment variable
`LOCPATH' must only be used when the binary has no SUID or SGID