spinlock.h timeout causing *** fatal error - add_item abort

Kevin Nomura knomura@vmware.com
Fri Apr 29 16:32:00 GMT 2016


We occasionally see an api_fatal abort during process startup like:

*** fatal error - add_item ("somepath", "/", ...) failed, errno 1

This happens if mountinfo.init(false) in user_info::initialize is
called twice.  The error occurs on the second call, which tries to
add the root mount point even though it already exists.

mountinfo.init is guarded by a "spinlock" object that should allow
only one process to call it.  But the spinlock has a timeout.  After
15 seconds, it stops waiting and returns a value of 0, and as far as
I can tell the waiting process then goes on to call mountinfo.init
itself.  The fatal error can occur if two processes start around the
same time and the first process takes a long time in
internal_getpwsid().  We've seen this happen in our environment due
to LDAP queries taking a long time.  (Incidentally, we are using
msys, but the code in spinlock.h and shared.cc looks the same in
cygwin.)
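
To make the suspected race concrete, here is a minimal sketch of it.
This is not the code from spinlock.h or shared.cc.  The names
acquire() and count_init_calls(), the three flag states, and the use
of threads in place of processes are all assumptions made for
illustration, and the timeouts are milliseconds rather than the real
15 seconds.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the guard described above, NOT the real
// spinlock.h.  Assumed flag states: 0 = uninitialized,
// -1 = another caller is initializing, 1 = initialized.
// Returns 1 if initialization is already done, 0 if the caller should
// run the initializer.  A negative timeout_ms means "wait forever".
long acquire(std::atomic<long>& flag, long timeout_ms) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (;;) {
        long expected = 0;
        if (flag.compare_exchange_strong(expected, -1))
            return 0;                 // we took the lock: initialize
        if (expected == 1)
            return 1;                 // someone else already finished
        // expected == -1: another caller is still initializing.
        const auto waited = std::chrono::duration_cast<
            std::chrono::milliseconds>(clock::now() - start).count();
        if (timeout_ms >= 0 && waited >= timeout_ms)
            return 0;                 // gave up: looks like "uninitialized"
        std::this_thread::yield();
    }
}

// Runs two "startups" against one shared flag and returns how many
// times the initializer ran.  The first startup is slow (think of a
// long LDAP lookup); the second begins while the first is in progress.
int count_init_calls(long timeout_ms, long slow_init_ms) {
    std::atomic<long> flag{0};
    std::atomic<int> init_calls{0};

    auto startup = [&](long init_ms) {
        if (acquire(flag, timeout_ms) == 0) {
            ++init_calls;             // stands in for mountinfo.init(false)
            std::this_thread::sleep_for(
                std::chrono::milliseconds(init_ms));
            flag.store(1);            // mark initialization complete
        }
    };

    std::thread first(startup, slow_init_ms);
    // Let the first startup take the lock before the second begins.
    std::this_thread::sleep_for(
        std::chrono::milliseconds(slow_init_ms / 4));
    std::thread second(startup, 0L);
    first.join();
    second.join();
    return init_calls.load();
}
```

In this sketch, count_init_calls(50, 300) lets the second startup
time out while the first is still busy, so the initializer runs
twice.  count_init_calls(-1, 300) waits without a timeout, so it runs
once.  The second result is what removing the timeout would buy, at
the cost of waiting indefinitely if the first caller never finishes.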

To solve the aborts it is tempting to make a local fix to remove the
spinlock timeout.  I assume there was a rationale for it, and would
like to understand what tradeoff is incurred if we remove it.

- Kevin
