swapcontext() slow

Adhemerval Zanella adhemerval.zanella@linaro.org
Fri Jan 22 16:55:00 GMT 2016



On 22-01-2016 08:53, Stas Sergeev wrote:
> 21.01.2016 19:40, Mike Frysinger wrote:
>> On 21 Jan 2016 16:10, Stas Sergeev wrote:
>>> I am implementing the user-space cooperative
>>> threading with swapcontext(), but it is quite slow
>>> because swapcontext() calls sigprocmask().
>>>
>>> Firstly, I'd like to know the reason of this.
>>> Is this so that (1) every coroutine can have its separate
>>> signal mask, or is it to (2) allow switching in/out of a
>>> signal handler?
>>
>> because the specification requires it:
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/setcontext.html
> There should be the reason why the specification requires it.
> And the reason appears to be stated there too, which is the use
> with the signal handlers.
> 
>>> I can think of the possible work-arounds, depending on an
>>> answer to the above question.
>>> If (1) is true, then perhaps the "light" version of
>>> swapcontext()/setcontext() can be added that do not call
>>> sigprocmask(). If (2) is true, then perhaps the vDSO can
>>> be introduced to get the current signal mask. Then glibc
>>> could compare the old and new masks and not call sigprocmask()
>>> when not needed.
>>>
>>> Would some optimization be possible?
>>> It would be very good to have the user-space threads
>>> lightweight, not calling to the kernel at all when possible.
>>
>> these functions are deprecated/dead -- they no longer exist in the latest
>> POSIX specification.  the preference would be to stop using them.  i think
>> we might consider dropping them in a future glibc version.
> You are kidding perhaps.
> Many projects use them. qemu, for one, uses them directly, and
> there are the indirect users via libpcl, libpth and many coroutine libs.
> qemu is trying the mix of swapcontext+longjmp to get a reasonable
> speed, but I use libpcl which doesn't do that optimization.
> 

Because, as you have noted, swapcontext is not the best way to create
coroutines/green threads.  Besides the performance issue related to the
signal mask, the meaning of "context" differs depending on the
underlying platform. On powerpc, for instance, it saves not only
all the GPRs but also the FPRs and, if the machine has AltiVec
support, the SIMD registers as well.

And you also have interoperability issues with kernel-level threads
(pthreads). TLS is the most prominent: if you resume a context on a
different thread it is bound to misbehave (and glibc internally uses TLS
for a number of different things). You also have issues with profiling and
debugging, since the debugger knows neither the thread nor the stack
information of such a context, and with thread management (nice, tkill).
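
The TLS problem can be sketched as follows. This is only an illustration,
assuming glibc on x86-64 Linux (where swapcontext() does not switch the
thread pointer) and linking with -pthread; other platforms may behave
differently. All names (tls_id, read_tls_id, tls_changes_across_threads)
are hypothetical. A coroutine is started on one thread and resumed on
another, and it observes a different value for the same thread-local
variable.

```c
#include <assert.h>
#include <pthread.h>
#include <ucontext.h>

static __thread int tls_id;              /* one instance per kernel thread */
static ucontext_t co_ctx, main_ctx, worker_ctx;
static ucontext_t *resumer;              /* context to yield back to */
static char co_stack[64 * 1024];
static int seen[2];

/* noinline so the TLS address is recomputed on every call. */
__attribute__((noinline)) static int read_tls_id(void) { return tls_id; }

static void co_body(void)
{
    seen[0] = read_tls_id();             /* runs on the starting thread */
    swapcontext(&co_ctx, resumer);       /* yield */
    seen[1] = read_tls_id();             /* runs on whichever thread resumed us */
    swapcontext(&co_ctx, resumer);       /* yield for good; never returns */
}

static void *worker(void *arg)
{
    (void)arg;
    tls_id = 2;
    resumer = &worker_ctx;
    swapcontext(&worker_ctx, &co_ctx);   /* resume the coroutine on this thread */
    return NULL;
}

/* Hypothetical helper: returns 1 if the coroutine saw both threads' TLS. */
int tls_changes_across_threads(void)
{
    pthread_t t;
    int rc;

    tls_id = 1;
    rc = getcontext(&co_ctx);
    assert(rc == 0);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = NULL;               /* co_body never returns */
    makecontext(&co_ctx, co_body, 0);

    resumer = &main_ctx;
    swapcontext(&main_ctx, &co_ctx);     /* first half runs on this thread */

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);               /* second half ran on the worker */

    return seen[0] == 1 && seen[1] == 2;
}
```

Anything that caches per-thread state across the yield (errno, locale
data, a compiler-cached TLS address) is exposed to the same effect, which
is why migrating such contexts between pthreads is fragile.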

That's why recent projects try either to get help from the kernel (for
instance Google's proposal for user-level threads [1]) or to implement
green threads behind a restricted VM interface (Go, Erlang, etc.). And I
believe that is also why POSIX has deprecated these functions.

As I said elsewhere, you can either provide your own API using
architecture/ABI-specific code or use something better tested, like
boost::coroutines. Either way, if you restrict yourself to a C/C++
interface you will still have the interoperability issues I noted.

[1] https://www.youtube.com/watch?v=KXuZi9aeGTw#t=519


