[ECOS] Deadlock scenario on close() in io/fileio

Rutger Hofman rutger@cs.vu.nl
Fri Dec 12 09:29:00 GMT 2008


Nick Garnett wrote:
> Rutger Hofman <rutger@cs.vu.nl> writes:
> 
>> Good morning list,
>>
>> I ran into a deadlock with io/fileio on invoking close(fd) in
>> io/fileio/current/src/fd.cxx.
>>
>> The problem arises because close() grabs fdlock, then calls
>> cyg_fd_free(), which in turn calls the file system's fo_close(),
>> which flushes any pending data; the flush invokes write(), which
>> again tries to grab fdlock. Taking the same non-recursive lock
>> twice in one thread -> deadlock.
>>
>> The stack trace on deadlock clearly shows this:
>>
>> close()
>>    cyg_fd_free()
>>      lock(fdlock)       <======================================
>>      fd_close()
>>        fp_ucount_dec()
>>          cyg_yaffs_fo_close()
>>            yaffs_close()
>>              yaffs_FlushFile()
>>                yaffs_UpdateObjectHeader()
>>                  readwritev()
>>                    cyg_fp_get()
>>                      lock(fdlock)  <===========================
>>
>> The code of close() even has a comment at the call to cyg_fd_free()
>> pointing out that the file's fo_close() may be called.
> 
> Is YAFFS recursively calling back into its own entry points? Or is it
> calling a different filesystem? I hope the latter: it should be using
> its own internal interfaces for anything else.

You are absolutely right. The second I/O is caused by a trace print to 
stdout from YAFFS.
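
So the recursion is not YAFFS calling another filesystem through its
internals, but a debug trace doing ordinary I/O from inside
fo_close(). Schematically (a hypothetical trace macro, for
illustration only, not the real YAFFS one):

    #include <stdio.h>

    /* Hypothetical trace macro. printf() on stdout goes through the
     * fileio layer, so calling it from inside fo_close() re-enters
     * write() -> readwritev() -> cyg_fp_get() while fdlock is still
     * held. */
    #define TRACE(msg) printf("%s", (msg))

    static void flush_with_trace(void)
    {
        TRACE("yaffs: updating object header\n"); /* this write deadlocks */
    }
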

>> I think there are a few possible solutions:
>> 1) a full-fledged continuation mechanism where all locks are released
>> when a layer is left;
>> 2) allow fdlock to be grabbed recursively, as in Java-style
>> synchronized locking;
> 
> These both look horrible. Recursive mutexes are evil and are to be
> resisted at all costs -- they are generally a quick workaround for
> something that should be solved with a bit more thought.

You should tell the Java designers... :-)
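
For the record, the workaround I had in place was essentially the
textbook owner/count wrapper that makes a plain mutex re-entrant, a
sketch rather than my exact code (POSIX offers the same thing built
in as PTHREAD_MUTEX_RECURSIVE):

    #include <pthread.h>

    /* Sketch of a Java-style recursive lock built on a plain mutex:
     * remember the owning thread and a nesting count. Relies on the
     * owner comparison only ever succeeding in the owning thread. */
    typedef struct {
        pthread_mutex_t lock;
        pthread_t       owner;
        int             count;      /* nesting depth, 0 == unowned */
    } rec_mutex_t;

    void rec_mutex_lock(rec_mutex_t *m)
    {
        if (m->count > 0 && pthread_equal(m->owner, pthread_self())) {
            m->count++;             /* already ours: just nest deeper */
            return;
        }
        pthread_mutex_lock(&m->lock);
        m->owner = pthread_self();
        m->count = 1;
    }

    void rec_mutex_unlock(rec_mutex_t *m)
    {
        if (--m->count == 0)
            pthread_mutex_unlock(&m->lock);
    }
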

>> 3) check if this is the only occurrence of this deadlock scenario, and
>> check if the lock can be released in fp_ucount_dec without impairing
>> atomicity
> 
> As it happens I fixed this, and other similar problems, in the
> eCosCentric sources some time ago. I'll make a proper contribution
> later, but for now try this out and see if it solves your problem:

This is solution 3). It works for my test program. Thanks! I'll strip 
out my recursive mutexes again.

Rutger

> Index: ChangeLog
> ===================================================================
> RCS file: /cvs/ecos/ecos/packages/io/fileio/current/ChangeLog,v
> retrieving revision 1.68
> diff -u -5 -r1.68 ChangeLog
> --- ChangeLog	1 May 2008 09:31:09 -0000	1.68
> +++ ChangeLog	11 Dec 2008 14:25:41 -0000
> @@ -1,5 +1,12 @@
> +2008-12-11  Nick Garnett  <nickg@ecoscentric.com>
> +
> +	* src/fd.cxx (fp_ucount_dec, fd_close): Fix locking strategy so
> +	that the fdlock is free when filesystem close entry is
> +	called. This allows other threads to perform descriptor operations
> +	during the close.
> +
>  2008-04-02 Xinghua Yang <yxinghua@sunnorth.com.cn>
>  	   Andrew Lunn <andrew.lunn@ascom.ch>
>  
>  	* cdl/fileio.cdl: Use CYGPKG_FILEIO_DIRENT_DTYPE to enable/disable
>  	d_type field of struct dirent.
> Index: src/fd.cxx
> ===================================================================
> RCS file: /cvs/ecos/ecos/packages/io/fileio/current/src/fd.cxx,v
> retrieving revision 1.7
> diff -u -5 -r1.7 fd.cxx
> --- src/fd.cxx	7 Apr 2004 11:18:54 -0000	1.7
> +++ src/fd.cxx	11 Dec 2008 14:25:41 -0000
> @@ -147,24 +147,45 @@
>  // These must all be called with the fdlock already locked.
>  
>  //--------------------------------------------------------------------------
>  // Decrement the use count on a file object and if it goes to zero,
>  // close the file and deallocate the file object.
> +//
> +// A word on locking here: It is necessary for the filesystem
> +// fo_close() function to be called with the file lock claimed, but
> +// the fdlock released, to permit other threads to perform fd-related
> +// operations. The original code here took the file lock and released
> +// the fdlock before the call and then locked the fdlock and released
> +// the file lock after. The idea was that there was no point at which
> +// a lock of some sort was not held. However, if two threads are
> +// running through this code simultaneously, this could lead to
> +// deadlock, particularly if the filesystem's syncmode specifies fstab
> +// or mtab level locking. So the code now unlocks the file lock before
> +// reclaiming the fdlock. This leaves a small window where no locks
> +// are held, where in theory some other thread could jump in and mess
> +// things up. However, this is benign; if the other thread is
> +// accessing some other file object there will be no conflict and by
> +// definition no other thread can access this file object since we are
> +// executing here because no file descriptors point to this file
> +// object any longer. Additionally, the file object is only marked
> +// free, by zeroing the f_flag field, once the fdlock has been
> +// reclaimed.
>  
>  static int fp_ucount_dec( cyg_file *fp )
>  {
>      int error = 0;
>      if( (--fp->f_ucount) <= 0 )
>      {        
>          cyg_file_lock( fp, fp->f_syncmode );
> +        FILEIO_MUTEX_UNLOCK(fdlock);
>          
>          error = fp->f_ops->fo_close(fp);
>  
>          cyg_file_unlock( fp, fp->f_syncmode );
> +        FILEIO_MUTEX_LOCK(fdlock);        
>              
> -        if( error == 0 )
> -            fp->f_flag = 0;
> +        fp->f_flag = 0;
>      }
>  
>      return error;
>  }
>  
> @@ -178,21 +199,20 @@
>      cyg_file *fp;
>  
>      CYG_ASSERT(((0 <= fd) && (fd<CYGNUM_FILEIO_NFD)), "fd out of range");    
>  
>      fp = desc[fd];
> +    desc[fd] = FD_ALLOCATED;
>  
>      if( fp != FD_ALLOCATED && fp != NULL)
>      {
>          // The descriptor is occupied, decrement its usecount and
>          // close the file if it goes zero.
>  
>          error = fp_ucount_dec( fp );
>      }
>  
> -    desc[fd] = FD_ALLOCATED;
> -
>      return error;
>  }
>  
>  
>  //==========================================================================
> 
> 
> 

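For anyone following along: the two-thread deadlock that the new
comment in fp_ucount_dec() describes is a classic lock-order
inversion. Reduced to its essence (a standalone pthreads sketch, not
the eCos code):

    #include <pthread.h>

    static pthread_mutex_t fdlock    = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;
                           /* stands in for the fstab/mtab-level sync lock */

    /* Old fp_ucount_dec() tail: still holds file_lock while
     * reclaiming fdlock, i.e. order file_lock -> fdlock. */
    void *closer(void *arg)
    {
        pthread_mutex_lock(&file_lock);  /* cyg_file_lock() */
        /* ... fo_close() runs here with fdlock released ... */
        pthread_mutex_lock(&fdlock);     /* reclaim: file_lock -> fdlock */
        pthread_mutex_unlock(&file_lock);
        pthread_mutex_unlock(&fdlock);
        return 0;
    }

    /* A second thread doing a descriptor operation takes the same
     * two locks in the opposite order: fdlock -> file_lock. */
    void *other(void *arg)
    {
        pthread_mutex_lock(&fdlock);     /* cyg_fp_get() */
        pthread_mutex_lock(&file_lock);  /* fstab-level cyg_file_lock() */
        pthread_mutex_unlock(&file_lock);
        pthread_mutex_unlock(&fdlock);
        return 0;
    }

The patched code breaks the cycle by dropping the file lock before
reclaiming fdlock, so the closing thread never holds both locks at
once.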
