[ECOS] Deadlock scenario on close() in io/fileio

Rutger Hofman rutger@cs.vu.nl
Thu Dec 11 11:43:00 GMT 2008


Good morning list,

I ran into a deadlock with io/fileio on invoking close(fd) in 
io/fileio/current/src/fd.cxx.

The problem arises because close() calls cyg_fd_free(), which grabs 
fdlock and then, via fp_ucount_dec(), calls the file system's fo_close(). 
fo_close() flushes any pending data; this flush invokes write(), which 
tries to grab fdlock again. The same non-recursive lock taken twice in 
one thread -> deadlock.

The stack trace on deadlock clearly shows this:

close()
   cyg_fd_free()
     lock(fdlock)       <======================================
     fd_close()
       fp_ucount_dec()
         cyg_yaffs_fo_close()
           yaffs_close()
             yaffs_FlushFile()
               yaffs_UpdateObjectHeader()
                 readwritev()
                   cyg_fp_get()
                     lock(fdlock)  <===========================
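
The double acquisition in the trace can be reproduced in miniature. This 
is only a sketch, not eCos code: PlainLock, toy_write(), toy_fo_close() 
and toy_close() are hypothetical stand-ins, and the lock uses a 
non-blocking try_lock() so that the demo reports the problem instead of 
hanging.

```cpp
#include <atomic>

// Toy stand-in for a non-recursive mutex such as fdlock (hypothetical).
// try_lock() fails instead of blocking, so the demo cannot hang.
class PlainLock {
public:
    bool try_lock() { return !held_.exchange(true); }
    void unlock() { held_.store(false); }
private:
    std::atomic<bool> held_{false};
};

static PlainLock fdlock;

// Models write(): it needs fdlock to look up the file descriptor.
// Returns false where a real blocking lock() would wait forever.
bool toy_write() {
    if (!fdlock.try_lock()) return false;
    fdlock.unlock();
    return true;
}

// Models a file system fo_close() that flushes pending data via write().
bool toy_fo_close() { return toy_write(); }

// Models the close() path: fdlock stays held across the fo_close() call.
bool toy_close() {
    if (!fdlock.try_lock()) return false;
    bool flushed = toy_fo_close();   // nested acquisition fails here
    fdlock.unlock();
    return flushed;
}
```

In this model toy_close() returns false because the nested toy_write() 
cannot take the lock its own caller already holds; with a blocking lock 
that is the point where the thread stops for good.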

The code of close() even has a comment at the call to cyg_fd_free() that 
points out that the file's fo_close() may be called.

Now, this scenario can be circumvented by having close(fd) first of all 
call fsync(fd), which will enforce the flush before closing. When I 
inserted this fsync(fd) call, the deadlock disappeared.

But I think this is a patchwork solution. There is *no guarantee at all* 
about which code fo_close() will choose to call. It might try to flush, 
to open a metanode, etc.

I think there are a few possible solutions:
1) a full-fledged continuation mechanism where all locks are released 
when a layer is left;
2) allow fdlock to be grabbed recursively, as in Java-style synchronized 
locking;
3) check if this is the only occurrence of this deadlock scenario, and 
check if the lock can be released in fp_ucount_dec without impairing 
atomicity.
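
Option 2 could look roughly like the following. It is a sketch in 
portable C++ rather than a subclass of the eCos mutex type: RecursiveLock 
and its members are hypothetical names, and std::mutex merely stands in 
for the underlying non-recursive mutex.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical recursive lock layered on a plain, non-recursive mutex.
class RecursiveLock {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        if (owner_.load() == self) {   // re-entry by the owner: just count it
            ++depth_;
            return;
        }
        inner_.lock();                 // first acquisition: take the real mutex
        owner_.store(self);
        depth_ = 1;
    }

    void unlock() {
        if (--depth_ == 0) {           // outermost unlock releases the real mutex
            owner_.store(std::thread::id());
            inner_.unlock();
        }
    }

    // Only meaningful when called by the thread that holds the lock.
    int depth() const { return depth_; }

private:
    std::mutex inner_;                 // the underlying non-recursive mutex
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int depth_ = 0;                    // touched only by the owning thread
};
```

With such a lock the nested acquisition on the close() path would bump 
the depth instead of blocking. The price is that code running under the 
lock can be re-entered, so every invariant the lock protects has to hold 
at each point where a callback such as fo_close() is invoked.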

1) is a lot of work.
2) sounds good to me; the mutex.cxx type could be subclassed.
3) Specifically in io/fileio/.../fd.cxx, the call to 
fp->f_ops->fo_close(fp) in fp_ucount_dec(fp) can be done with fdlock 
released. From browsing through the code, it seems possible to do *all* 
critical operations *before* any call to fp_ucount_dec(); if this is 
true, fdlock can be unlocked/relocked around the call to fo_close() 
without impairing atomicity. But I am not sure this is the only place 
where this deadlock scenario can occur.
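
The unlock/relock idea in 3) can be sketched as follows. This is a toy 
model under stated assumptions, not the code in fd.cxx: ToyFile, 
toy_ucount_dec(), toy_write() and toy_close() are hypothetical, and 
std::mutex stands in for fdlock.

```cpp
#include <functional>
#include <mutex>

static std::mutex fdlock;             // stand-in for the fd table lock

struct ToyFile {
    int ucount = 1;                   // use count, protected by fdlock
    std::function<void()> fo_close;   // file system close hook
};

// The caller must hold fdlock through 'held'. All bookkeeping is done
// before the lock is dropped; only fo_close() itself runs unlocked.
void toy_ucount_dec(ToyFile& fp, std::unique_lock<std::mutex>& held) {
    if (--fp.ucount > 0) return;
    held.unlock();                    // fo_close() may re-enter code that takes fdlock
    fp.fo_close();
    held.lock();                      // the caller still expects fdlock to be held
}

int toy_write_calls = 0;

// Models a write() that takes fdlock to look up the descriptor.
void toy_write() {
    std::lock_guard<std::mutex> guard(fdlock);
    ++toy_write_calls;
}

void toy_close(ToyFile& fp) {
    std::unique_lock<std::mutex> held(fdlock);
    // Critical operations on the descriptor table would go here, locked.
    toy_ucount_dec(fp, held);
}
```

If fo_close is set to a hook that calls toy_write(), toy_close() now 
completes, because the nested lock attempt happens while fdlock is free. 
Whether the real fd.cxx can guarantee that nothing critical follows 
fp_ucount_dec() is exactly the open question.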

Questions:
  - What about cyg_file_lock() in fp_ucount_dec()? Should that also be 
handled in this way?
  - What about LOCK_FILE() everywhere in io.cxx?

Rutger Hofman
VU Amsterdam
