[ECOS] Deadlock scenario on close() in io/fileio

Rutger Hofman rutger@cs.vu.nl
Thu Dec 11 11:43:00 GMT 2008

Good morning list,

I ran into a deadlock with io/fileio on invoking close(fd) in 

The problem arises because close() grabs fdlock, then calls 
cyg_fd_free(), which in turn calls the file system's fo_close(), which 
will flush any pending data; this flush invokes write(), which will again 
try to grab fdlock. One thread takes the same lock twice -> deadlock.
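
To make the re-entry concrete, here is a minimal C++ sketch of the call 
chain described above. It is a model, not the eCos code: NonRecursiveLock, 
fdlock_model, close_model(), fo_close_model() and write_model() are 
hypothetical names, and the lock reports failure on a second acquisition 
instead of blocking, so that the sketch terminates where the real code 
would hang.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a non-recursive lock. A second acquisition
// returns false instead of blocking forever, so the sketch can finish.
class NonRecursiveLock {
public:
    bool try_acquire() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true);
    }
    void release() {
        assert(held_.load());   // releasing an unheld lock is a bug
        held_.store(false);
    }
private:
    std::atomic<bool> held_{false};
};

static NonRecursiveLock fdlock_model;      // models fdlock
static std::vector<std::string> trace;     // records the call path

// Models write(): like any fd operation, it needs the fd lock.
bool write_model() {
    trace.push_back("write: lock(fdlock)");
    if (!fdlock_model.try_acquire()) {
        trace.push_back("write: lock already held, would block forever");
        return false;
    }
    fdlock_model.release();
    return true;
}

// Models a file system's fo_close() that flushes pending data via write().
bool fo_close_model() {
    trace.push_back("fo_close: flush pending data");
    return write_model();
}

// Models close(): takes the fd lock, then frees the descriptor, which
// ends up in fo_close() while the lock is still held.
bool close_model() {
    trace.push_back("close: lock(fdlock)");
    if (!fdlock_model.try_acquire()) return false;
    bool ok = fo_close_model();   // stands in for cyg_fd_free() -> fo_close()
    fdlock_model.release();
    return ok;
}
```

Calling close_model() returns false, and the last entry in trace is the 
point where the nested acquisition fails; with a real blocking mutex the 
thread would stop there.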

The stack trace on deadlock clearly shows this:

     lock(fdlock)       <======================================
                     lock(fdlock)  <===========================

The code of close() even has a comment at the call to cyg_fd_free() that 
points out that the file's fo_close may be called.

Now, this scenario can be circumvented by having close(fd) first of all 
call fsync(fd), which will enforce the flush before closing. When I 
inserted this fsync(fd) call, the deadlock disappeared.

But I think this is a patchwork solution. There is *no guarantee at all* 
about which code fo_close() will choose to call. It might try to flush, 
to open a metanode, etc.
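
A small C++ model may show why the fsync() workaround helps in one case 
but not in general. Everything here is an assumption made for 
illustration: FileModel, its touches_metadata flag, and the *_model 
functions are hypothetical, and the lock fails on re-entry instead of 
blocking so the sketch can terminate.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical non-recursive lock that fails on re-entry instead of hanging.
class NonRecursiveLock {
public:
    bool try_acquire() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true);
    }
    void release() {
        assert(held_.load());
        held_.store(false);
    }
private:
    std::atomic<bool> held_{false};
};

struct FileModel {
    int pending = 0;                // buffered data not yet written
    bool touches_metadata = false;  // hypothetical: fo_close does other locked work
};

static NonRecursiveLock fdlock_model;

// Any fd-level operation: needs the lock, fails if the caller holds it.
bool locked_fd_operation() {
    if (!fdlock_model.try_acquire()) return false;
    fdlock_model.release();
    return true;
}

// Models fsync(): flushes pending data through a locked operation.
bool fsync_model(FileModel& f) {
    if (f.pending > 0) {
        if (!locked_fd_operation()) return false;
        f.pending = 0;
    }
    return true;
}

// Models fo_close(): flushes if needed, and may do further locked work.
bool fo_close_model(FileModel& f) {
    if (f.pending > 0 && !locked_fd_operation()) return false;
    f.pending = 0;
    if (f.touches_metadata && !locked_fd_operation()) return false;
    return true;
}

// Models close() with the workaround: flush first, then lock and close.
bool close_with_fsync(FileModel& f) {
    if (!fsync_model(f)) return false;          // runs before the lock is taken
    if (!fdlock_model.try_acquire()) return false;
    bool ok = fo_close_model(f);                // flush is now a no-op
    fdlock_model.release();
    return ok;
}
```

With only pending data, close_with_fsync() succeeds because the flush has 
already happened outside the lock. As soon as fo_close_model() performs 
any other locked operation, the re-entry is back.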

I think there are a few possible solutions:
1) a full-fledged continuation mechanism where all locks are released 
when a layer is left;
2) allow fdlock to be grabbed recursively, as in Java-style synchronized 
3) check if this is the only occurrence of this deadlock scenario, and 
check if the lock can be released in fp_ucount_dec without impairing 

1) is a lot of work
2) sounds good to me; the mutex.cxx type could be subclassed
3) Specifically in io/fileio/.../fd.cxx, the call to 
fp->f_ops->fo_close(fp) in fp_ucount_dec(fp) can be done with fdlock 
released. From browsing through the code, it seems possible to 
do *all* critical operations *before* any call to fp_ucount_dec(); if 
this is true, fdlock can be unlocked/relocked around the call to 
fo_close() without impairing atomicity. But I am not sure this is the 
only place where this deadlock scenario can occur.
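
Options 2) and 3) can be compared in a short C++ sketch. It uses the 
standard library's std::recursive_mutex and std::mutex as stand-ins; this 
is an assumption for illustration, not the eCos mutex.cxx API, and all 
function names below are hypothetical.

```cpp
#include <mutex>
#include <string>
#include <vector>

static std::vector<std::string> events;   // records what happened, in order

// ---- Option 2: fdlock may be taken recursively by its owner ----
static std::recursive_mutex fdlock_recursive;

void write_recursive() {
    // Nested acquisition by the owning thread succeeds.
    std::lock_guard<std::recursive_mutex> guard(fdlock_recursive);
    events.push_back("opt2: write under nested fdlock");
}

void close_recursive() {
    std::lock_guard<std::recursive_mutex> guard(fdlock_recursive);
    events.push_back("opt2: close holds fdlock");
    write_recursive();   // stands in for fo_close() flushing via write()
}

// ---- Option 3: release fdlock around the fo_close() call ----
static std::mutex fdlock_plain;

void write_plain() {
    std::lock_guard<std::mutex> guard(fdlock_plain);
    events.push_back("opt3: write under fdlock");
}

void close_unlock_around() {
    std::unique_lock<std::mutex> lock(fdlock_plain);
    // All critical bookkeeping has to be finished at this point,
    // because other threads can take the lock once it is released.
    events.push_back("opt3: close did bookkeeping under fdlock");
    lock.unlock();
    write_plain();       // fo_close() stand-in runs without fdlock held
    lock.lock();
    events.push_back("opt3: close relocked fdlock");
}
```

Neither variant blocks when called from a single thread. The trade-off is 
that option 2 keeps the lock held across the file system callback, while 
option 3 opens a window in which other threads may run, which is why the 
critical operations must come first.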

  - what about cyg_file_lock() in fp_ucount_dec() ? Should that also be 
handled in this way?
  - what about LOCK_FILE() everywhere in io.cxx?

Rutger Hofman
VU Amsterdam

