Fixing a scalability issue in OpenSSL error reporting
Tue Jun 16 13:22:00 GMT 2015
OpenSSL has its own implementation of thread-local variables (using a
few global locks and a hash table indexed by the address of errno), and
the error state is needed in a few places even when no error occurs
that is visible at the application level. This turns out to be a major
scaling issue if you have more than a few hundred OpenSSL connections in
one process (I don't know if it applies to servers as well).
Unlike errno, the error state is not of fixed size, so ideally, a
deallocation function would run, releasing the state if the error
information isn't collected before the thread terminates.
The OpenSSL implementation has a function to deallocate the error state
of *another* thread, but that's obviously racy and would have to be
turned into a NOP.
I have come up with several potential approaches:
(a) Use __thread (or C++11 thread_local with a POD) and do not add a
deallocation function. The downside is a potential memory leak, as
mentioned above. The existing code already has this problem, though.
The advantage is good portability to older GNU toolchain versions.
(b) Use pthread_setspecific and related functions. This should offer
even better portability, and there is a destructor function which can
deallocate memory. The downside is that it currently requires linking
against libpthread, which is something I want OpenSSL to stop doing. A
fully portable solution with pthread_once may lack performance, and
portable atomics more or less require a C++11 compiler outside of the
(c) Use C++11 thread_local. This requires linking against libstdc++. I
don't know if this could have adverse consequences, comparable to
linking against libpthread. Portability will increase over time,
something that seems unlikely for (a) and (b).
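
To make the trade-off in (b) concrete, here is a minimal sketch of a
per-thread error state with a destructor. It is an illustration under
stated assumptions, not OpenSSL code: struct err_state, err_get_state,
err_push, err_worker and the err_states_freed counter are hypothetical
names invented for this example. Only the pthread_* calls are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical variable-size error state; not OpenSSL's actual layout. */
struct err_state {
    size_t count;
    size_t capacity;
    unsigned long *codes;
};

static pthread_key_t err_key;
static pthread_once_t err_once = PTHREAD_ONCE_INIT;
static int err_key_ok;      /* set once by err_key_init() */

/* Instrumentation for this example only; read it after pthread_join(). */
int err_states_freed;

/* Called at thread exit, and only if the thread's value is non-NULL. */
static void err_state_destroy(void *p)
{
    struct err_state *s = p;

    assert(s != NULL);
    free(s->codes);
    free(s);
    err_states_freed++;
}

static void err_key_init(void)
{
    err_key_ok = (pthread_key_create(&err_key, err_state_destroy) == 0);
}

/* Return the calling thread's state, allocating it lazily; NULL on failure. */
struct err_state *err_get_state(void)
{
    struct err_state *s;

    pthread_once(&err_once, err_key_init);
    if (!err_key_ok)
        return NULL;
    s = pthread_getspecific(err_key);
    if (s == NULL) {
        s = calloc(1, sizeof(*s));
        if (s != NULL && pthread_setspecific(err_key, s) != 0) {
            free(s);
            s = NULL;
        }
    }
    return s;
}

/* Append an error code to the calling thread's state; 0 on success. */
int err_push(unsigned long code)
{
    struct err_state *s = err_get_state();

    if (s == NULL)
        return -1;
    if (s->count == s->capacity) {
        size_t ncap = s->capacity ? s->capacity * 2 : 4;
        unsigned long *p = realloc(s->codes, ncap * sizeof(*p));

        if (p == NULL)
            return -1;
        s->codes = p;
        s->capacity = ncap;
    }
    s->codes[s->count++] = code;
    return 0;
}

/* Thread start routine that records an error and never collects it. */
void *err_worker(void *arg)
{
    (void)arg;
    err_push(42);
    return NULL;    /* the key destructor releases the state */
}
```

Starting err_worker with pthread_create and joining it should leave
err_states_freed at 1. Every access pays for a pthread_once call and a
pthread_getspecific call, which is the performance concern mentioned
for the fully portable variant.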
Solutions involving C++11 might be a difficult sell for OpenSSL
upstream, but I prefer them to reimplementing TLS destructors from
scratch. The old OpenSSL TLS implementation would still stay around, so
perhaps it's acceptable to compile just one file with a C++11 compiler.
That's why I'm leaning towards (c), but I'm not sure about the impact
of the libstdc++ dependency.
I also noticed that pthread_setspecific destructors do not run for the
main thread. Is this a bug?
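
A small reproducer sketch for this observation, assuming a POSIX
system with fork and pipes. The function name
destructor_runs_in_initial_thread and the reporting scheme are made up
for this example. The child process ends its initial thread either via
exit(), which is what returning from main amounts to, or via
pthread_exit(), and the destructor reports through a pipe whether it
ran. As far as I understand POSIX, key destructors are tied to thread
exit, not to exit().

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* Destructor: signal that it ran by writing one byte to the pipe. */
static void report_destructor(void *p)
{
    (void)p;
    if (write(report_fd, "D", 1) != 1) {
        /* nothing useful to do on failure */
    }
}

/*
 * Fork a child, set a thread-specific value in its only thread, then
 * terminate that thread via exit(0) or via pthread_exit(NULL).
 * Call this from the initial thread so the child's thread is one, too.
 * Returns 1 if the destructor ran, 0 if it did not, -1 on setup failure.
 */
int destructor_runs_in_initial_thread(int use_pthread_exit)
{
    int fds[2];
    int status = 0;
    pid_t pid;
    char c;
    ssize_t n;

    assert(use_pthread_exit == 0 || use_pthread_exit == 1);
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);               /* avoid duplicated stdio buffers */
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        pthread_key_t key;

        close(fds[0]);
        report_fd = fds[1];
        if (pthread_key_create(&key, report_destructor) != 0 ||
            pthread_setspecific(key, &report_fd) != 0)
            _exit(2);
        if (use_pthread_exit)
            pthread_exit(NULL); /* thread exit: destructors may run */
        exit(0);                /* same path as returning from main */
    }
    close(fds[1]);
    n = read(fds[0], &c, 1);    /* 1 byte if the destructor ran, 0 at EOF */
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;
    return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

On a glibc system I would expect 0 for the exit() path and 1 for the
pthread_exit() path, which matches what I observed for the main
thread.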
Florian Weimer / Red Hat Product Security