Fixing a scalability issue in OpenSSL error reporting

Florian Weimer
Tue Jun 16 13:22:00 GMT 2015

OpenSSL has its own implementation of thread-local variables (using a
few global locks and a hash table indexed by the address of errno), and
the error state is needed in a few places even if no error occurs which
is visible at the application level.  This turns out to be a major
scaling issue if you have more than a few hundred OpenSSL connections in
one process (I don't know if it applies to servers as well).

Unlike errno, the error state is not of fixed size, so ideally, a
deallocation function would run, releasing the state if the error
information isn't collected before the thread terminates.

The OpenSSL implementation has a function to deallocate the error state
of *another* thread, but that's obviously racy and would have to be
turned into a NOP.

I have come up with several potential approaches:

(a) Use __thread (or C++11 thread_local with a POD) and do not add a
deallocation function.  The downside is a potential memory leak, as
mentioned above.  The existing code already has this problem, though.
Advantage is good portability to older GNU toolchain versions.

(b) Use pthread_setspecific and related functions.  This should offer
even better portability, and there is a destructor function which can
deallocate memory.  The downside is that it currently requires linking
against libpthread, which is something I want OpenSSL cease to do.  A
fully portable solution with pthread_once may lack performance, and
portable atomics more or less require a C++11 compiler outside of the
GNU ecosystem.

(c) Use C++11 thread_local.  This requires linking against libstdc++.  I
don't know if this could have adverse consequences, comparable to
linking against libpthread.  Portability will increase over the time,
something that seems unlikely for (a) and (b).

Solutions involving C++11 might be a difficult sell for OpenSSL
upstream, but I prefer it over reimplementing TLS destructors from
scratch.  The old OpenSSL TLS implementation would still stay around, so
perhaps it's acceptable to compile just one file with a C++11 compiler.
 That's why I'm leaning towards (c), but I'm not sure about the impact
of the libstdc++ dependency.

I also noticed that pther_setspecific destructors do not run for the
main thread.  Is this a bug?

Florian Weimer / Red Hat Product Security

More information about the Libc-help mailing list