Fixing a scalability issue in OpenSSL error reporting

Adhemerval Zanella
Tue Jun 16 19:44:00 GMT 2015


On 16-06-2015 10:22, Florian Weimer wrote:
> OpenSSL has its own implementation of thread-local variables (using a
> few global locks and a hash table indexed by the address of errno), and
> the error state is needed in a few places even if no error occurs which
> is visible at the application level.  This turns out to be a major
> scaling issue if you have more than a few hundred OpenSSL connections in
> one process (I don't know if it applies to servers as well).
> Unlike errno, the error state is not of fixed size, so ideally, a
> deallocation function would run, releasing the state if the error
> information isn't collected before the thread terminates.
> The OpenSSL implementation has a function to deallocate the error state
> of *another* thread, but that's obviously racy and would have to be
> turned into a NOP.
> I have come up with several potential approaches:
> (a) Use __thread (or C++11 thread_local with a POD) and do not add a
> deallocation function.  The downside is a potential memory leak, as
> mentioned above.  The existing code already has this problem, though.
> Advantage is good portability to older GNU toolchain versions.
> (b) Use pthread_setspecific and related functions.  This should offer
> even better portability, and there is a destructor function which can
> deallocate memory.  The downside is that it currently requires linking
> against libpthread, which is something I want OpenSSL cease to do.  A
> fully portable solution with pthread_once may lack performance, and
> portable atomics more or less require a C++11 compiler outside of the
> GNU ecosystem.
> (c) Use C++11 thread_local.  This requires linking against libstdc++.  I
> don't know if this could have adverse consequences, comparable to
> linking against libpthread.  Portability will increase over the time,
> something that seems unlikely for (a) and (b).
> Solutions involving C++11 might be a difficult sell for OpenSSL
> upstream, but I prefer it over reimplementing TLS destructors from
> scratch.  The old OpenSSL TLS implementation would still stay around, so
> perhaps it's acceptable to compile just one file with a C++11 compiler.
>  That's why I'm leaning towards (c), but I'm not sure about the impact
> of the libstdc++ dependency.
> I also noticed that pther_setspecific destructors do not run for the
> main thread.  Is this a bug?

I would go for a solution to use a static limited buffer for each thread,
which unfortunately would require some work in OpenSSL internals.  I would
avoid link or build using C++11 just because I do not see why adding C++11
linkage to a C-only project (besides the memory burden on lining against
C++ libraries).  

>From what I could get on default 64-bits architectures 'ERR_STATE' size
is alone 600 bytes plus the dynamic plus the dynamic 'err_data' and
'err_file' fields.

The 'ERR_set_error_data' sets as default for 'err_data' a string of
size 80, but it might resize.  The 'ERR_put_error' also uses a dynamically
field for 'err_file'.

So if you limit the 'err_data' fields for about 128 bytes and 'err_file'
for '256' you would have about a 600 + 16*(128) + 16*(256) -> 6744 bytes
per thread. It is somewhat high for some embedded environments and we can
add some tunables in configure to enable/disable this or tune the sizes.

It might give the errors codes some less flexibility, but it can be tuned
to balance memory usage and fields usage.

More information about the Libc-help mailing list