untarring symlinks with ../ fails randomly, slightly OT

Ryan Johnson ryan.johnson@cs.utoronto.ca
Mon Jul 4 15:05:00 GMT 2011


On 04/07/2011 8:21 AM, Ryan Johnson wrote:
> On 04/07/2011 7:33 AM, Corinna Vinschen wrote:
>> On Jul  4 06:56, Ryan Johnson wrote:
>>> On 04/07/2011 6:46 AM, Corinna Vinschen wrote:
>>>> On Jul  4 11:15, Wolf Geldmacher wrote:
>>>>> As an aside:
>>>>>     I also used to have some trouble with "rm -rf" of a directory
>>>>>     hierarchy failing more or less reproducibly (like: 80% of the
>>>>>     time) because files were presumably still "in use". Repeating
>>>>>     the command several times would succeed, though.
>>>>>
>>>>>     Downgrading from cygwin1.dll/1.7.9.1 to cygwin1.dll/1.7.8.1
>>>>>     seems to have solved that issue as well - still have to see
>>>>>     the first "retry to delete".
>>>>>
>>>>> This may or may not be related to the original report, as it also reeks
>>>>> of a race condition during file/directory operations.
>>>> I can neither reproduce the tar problem, nor can I reproduce the rm
>>>> problem.  I tried this under 2008R2 which is basically the same as your
>>>> W7-64 bit.  I used local and remote drives to test the issue but to no
>>>> avail.
>>>>
>>>> Are you sure this isn't a BLODA problem which is triggered by the
>>>> changes in 1.7.9?
>>>>
>>>> I just took a look through the changes between 1.7.8 and 1.7.9, and
>>>> the list of changes which affect filesystem access is pretty small:
>>>>
>>>> [snip]
>>>>
>>>> So, is it possible that the request for WRITE_DAC access in the call to
>>>> NtCreateFile triggers some hiccup of your virus checker?  It could easily
>>>> explain both effects.
>>> I have also seen the rm -rf problem occasionally on my w7-64
>>> machine, and I don't think anything from BLODA is installed.
>> Also with 1.7.8?  Given the minor number of FS-related changes, it's
>> so very unlikely that they would cause a difference between 1.7.8 and
>> 1.7.9.
>>
>>> However, I haven't noticed the issue since disabling the search
>>> indexer on my machine. I did this on the hunch that I often delete
>>> large directory trees which aren't very old (e.g. after
>>> untar/configure/make of some source package), and that it wouldn't
>>> be a big surprise if indexing and cygwin's rm don't mix for whatever
>>> reason.
>> Hard to imagine that setting the WRITE_DAC flag would interfere with the
>> search indexer.  On second thought, the flag is only set if a file does
>> not exist yet and NtCreateFile gets called to create the file.  That
>> makes it especially unlikely that this would affect unlinking.
>>
>> However, given that you can reproduce the issue, could you test the
>> scenario again?  If the issue occurs, can you disable the following code
>> in fhandler.cc and see if it changes anything?
>>
>>   else if (!exists () && has_acls ())
>>     /* If we are about to create the file and the filesystem supports
>>        ACLs, we will overwrite the DACL after the call to NtCreateFile.
>>        This requires a handle with additional WRITE_DAC access,
>>        otherwise set_file_sd has to open the file again. */
>>     access |= WRITE_DAC;
>>
> Sorry, I have no idea which version of the dll I had at the time. It 
> was at least a month ago, maybe more.
>
> However, I was wrong about not seeing the problem since. Choosing a 
> random source dir to blow away:
>> $ rm -rf Python-2.6.6
>> rm: cannot remove `Python-2.6.6/Lib/lib2to3/tests': Directory not empty
>> $ rm -rf Python-2.6.6
>> $
>
> This seems to happen more than half the time (different non-empty dir 
> every time). Naturally, running under strace makes the problem go away 
> (it doesn't help that strace kills stderr, where any error messages 
> might have gone).
>
> Running the following command 10x:
>
> $ tar -xaf Python-2.6.6.tar.bz2 && sleep 3 && (rm -rf Python-2.6.6 || (echo 'Retrying...' && rm -rf Python-2.6.6))
>
> I get six runs with no error, two runs with one error, and one run each 
> with two and three errors.
>
> I'm currently updating and rebuilding my cygwin sources to try out 
> your patch...
Updated, built, and reproduced, with and without the patch. If anything 
it's more common in my dev build -- it happened on the first try both times.

Any idea of how to debug this? We need some instantaneous version of 
lsof or something...
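
For what it's worth, the repeat-and-count runs quoted above could be scripted
along these lines. This is only a sketch: rm_with_retries is a hypothetical
helper name, not anything from Cygwin, and the find listing is a crude
stand-in for an instantaneous lsof (it shows which entries are still present
right after a failed rm, not which process is holding them).

```shell
#!/bin/sh
# Hypothetical helper (not part of Cygwin or coreutils): remove a tree,
# retrying on failure, and print how many attempts failed.
# Usage: rm_with_retries DIR [MAX_TRIES]
rm_with_retries() {
    dir=$1
    max=${2:-5}          # assumed default; pick whatever cap suits the test
    failures=0
    while [ "$failures" -lt "$max" ]; do
        if rm -rf -- "$dir"; then
            break
        fi
        failures=$((failures + 1))
        # Snapshot what is left immediately after the failed attempt.
        echo "attempt $failures failed; still present:" >&2
        find "$dir" -depth 2>/dev/null | head -n 20 >&2
    done
    # Number of failed attempts goes to stdout so runs can be tallied.
    echo "$failures"
}
```

Running "tar -xaf Python-2.6.6.tar.bz2 && sleep 3 && rm_with_retries Python-2.6.6"
in a loop would then print one failure count per run, with the leftover
entries going to stderr.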

Ryan
