spinlock.h timeout causing *** fatal error - add_item abort
Fri Apr 29 16:32:00 GMT 2016
We occasionally see an api_fatal abort during process startup like:
*** fatal error - add_item ("somepath", "/", ...) failed, errno 1
This happens if mountinfo.init(false) in user_info::initialize is
called twice. The error occurs on the second call, which tries to add
the root mount point when it already exists.
mountinfo.init is guarded by a "spinlock" object that is meant to let
only one process call it. But the spinlock has a timeout: after
15 seconds, it stops waiting and returns a value of 0. The fatal
error can occur if two processes are starting around the same time
and the first process takes a long time in internal_getpwsid(). We've
seen this happen in our environment due to LDAP queries taking a long
time. (Incidentally, we are using msys, but the code in spinlock.h and
shared.cc looks the same in cygwin.)
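
To make the race concrete, here is a minimal sketch in portable C++. It is a simplified model, not the actual code from spinlock.h or shared.cc: the names MountTable, acquire_with_timeout, startup and count_failed_adds are made up for illustration, threads stand in for processes, a sleep stands in for the slow internal_getpwsid() call, and the timeouts are milliseconds rather than 15 seconds. It assumes that a caller who gets 0 back goes on to initialize, which is what the behaviour described above suggests.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <set>
#include <string>
#include <thread>

using namespace std::chrono;

// Hypothetical stand-in for the mount table: adding an existing entry fails.
struct MountTable {
    std::mutex m;
    std::set<std::string> points;
    bool add_item(const std::string& posix_path) {
        std::lock_guard<std::mutex> guard(m);
        return points.insert(posix_path).second;  // false if already present
    }
};

// Simplified timed lock. state: 0 = uninitialized, -1 = init in progress,
// 1 = initialized. Returns 0 when the caller should initialize. After the
// timeout it also returns 0, even though another caller still holds the lock.
int acquire_with_timeout(std::atomic<int>& state, milliseconds timeout) {
    const auto deadline = steady_clock::now() + timeout;
    for (;;) {
        int expected = 0;
        if (state.compare_exchange_strong(expected, -1))
            return 0;                              // we won: we initialize
        if (expected == 1)
            return 1;                              // someone else finished
        if (steady_clock::now() >= deadline)
            return 0;                              // gave up waiting
        std::this_thread::sleep_for(milliseconds(1));
    }
}

// One "process" starting up. Returns false if adding the root mount failed.
bool startup(std::atomic<int>& state, MountTable& table,
             milliseconds init_cost, milliseconds timeout) {
    bool ok = true;
    if (acquire_with_timeout(state, timeout) == 0) {
        std::this_thread::sleep_for(init_cost);    // slow lookup (e.g. LDAP)
        ok = table.add_item("/");                  // fails on a second call
        state.store(1);
    }
    return ok;
}

// Start two "processes" at about the same time and count failed adds.
int count_failed_adds(milliseconds init_cost, milliseconds timeout) {
    std::atomic<int> state{0};
    std::atomic<int> failures{0};
    MountTable table;
    auto worker = [&] {
        if (!startup(state, table, init_cost, timeout)) ++failures;
    };
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return failures.load();
}
```

In this model, calling count_failed_adds(milliseconds(200), milliseconds(50)) lets the waiter time out while the first caller is still busy, so both end up adding "/" and one add fails. With a timeout much longer than the initialization, for example count_failed_adds(milliseconds(50), milliseconds(5000)), the waiter sees the initialized state and skips the add.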
To stop the aborts, it is tempting to make a local fix that removes the
spinlock timeout. I assume there was a rationale for the timeout, and
would like to understand what tradeoff we incur if we remove it.