This is the mail archive of the cygwin mailing list for the Cygwin project.


1.7: mv: Device or resource busy


I would like to report a strange behavior. It seems intermittent
by nature, but i managed to make it consistently reproducible (at
least on my PC). So please keep reading.

Under some circumstances (detailed below), the command:

mv file1 file2

fails with:

mv: cannot move `file1' to `file2': Device or resource busy

and a return code of 1 (and logically, nothing is done).

I can tell in what circumstances the problem does *not* seem to happen:
a) when `file1' is with mode 644, the problem never happens (i believe
   this is 100% true)
b) when `file1' does not begin with `MZ' (signature of an executable)
   the problem never happens (i believe this is 90% true)
c) when i use my second Cygwin box (on top of XP SP2) the problem
   never happens (i believe this is 99% true)
d) when `mv file1 file2' has failed (therefore `file1' is still there),
   a second try will almost always succeed
e) when `mv file1 file2' has succeeded (perhaps having failed the first
   time), if the filename `file1' is reused (with the same or a
   different content, but still beginning with `MZ'),
   the next `mv file1 file2' will never fail. However, if
   the `dirname' of `file1' is emptied, then removed, then mkdir'ed
   again, the problem will surely happen again (i believe this is 98%
   true)
f) when the file1 is big (eg a copy of /usr/bin/emacs-X11.exe 15Mb),
   the problem happens almost always; when the file1 is small
   (eg a copy of /usr/bin/ldh.exe, 1536bytes), the problem happens
   almost never. I never observed the problem with a `file1' less than
   1536 bytes.
g) i checked but didn't notice any influence of:
   - suffix of `file1' or `file2' (i tried .exe, .xxx, .yyy, with more
     or less the same results, even with no suffix; i discover now,
     writing this message, that i didn't test any .dll files, i'll do
     that tomorrow)
   - whether `file2' already exists or not
   - if the `mv' command is called from within /usr/bin/tcsh or
     from /bin/sh
   - whether `file1' and `file2' are in the same directory (however, i
     never tested with `file2' outside the filesystem)
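
Point d) above (a second try almost always succeeds) can be sketched as a small retry wrapper. This is only an illustration under assumptions: `mv_retry' is a hypothetical helper name, the count of 3 tries and the 1 second pause are arbitrary, and the scratch file stands in for a real executable. It hides the symptom and does not explain the cause.

```shell
#!/bin/sh
# mv_retry SRC DST: hypothetical helper, not part of coreutils or Cygwin.
# Tries the move up to $tries times, pausing 1 second between attempts.
mv_retry() {
  tries=3
  n=1
  while :; do
    mv -- "$1" "$2" && return 0
    [ "$n" -ge "$tries" ] && return 1
    n=$((n + 1))
    sleep 1
  done
}

# Demo on a scratch file that starts with `MZ' and has mode 755.
dir=$(mktemp -d)
printf 'MZ' > "$dir/file1"
chmod 755 "$dir/file1"
if mv_retry "$dir/file1" "$dir/file2"; then
  echo "moved"
else
  echo "still failing after $tries tries"
fi
```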

An important thing is that i also noticed that the access time of
`file1' is always updated in case of mode 755. Why should it be the case?
Does it conform to the standards (POSIX?)?
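
The access-time observation can be checked numerically instead of by reading `ls -lu' output. The following is a minimal sketch, assuming GNU `stat' (its `-c %X' format prints the access time in seconds since the epoch); the scratch file and its names are placeholders, not the files used in the tests above.

```shell
#!/bin/sh
# Compare the access time of file1 before the mv with the access time of
# whichever file exists afterwards (file2 on success, file1 on failure).
dir=$(mktemp -d)
printf 'MZ' > "$dir/file1"
chmod 755 "$dir/file1"

before=$(stat -c %X "$dir/file1")
sleep 2                              # make an atime change visible
mv "$dir/file1" "$dir/file2"
rc=$?

if [ "$rc" -eq 0 ]; then
  survivor="$dir/file2"
else
  survivor="$dir/file1"
fi
after=$(stat -c %X "$survivor")
echo "rc=$rc atime_before=$before atime_after=$after"
```

If `atime_after' is larger than `atime_before', something read the file between the two `stat' calls.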

To investigate, i tried all the following, with no noticeable improvement:
- i removed McAfee Virusscan(8.5i P6): no change
- several reboots: no change
- i switched back to Cygwin 1.7.0-61: same results
- i had a look in the cygwin-1.7.0-62 sources (, path.h,
  - i first suspected NtOpenFile() (line 651 in
    if a signedness inconsistency exists in NtOpenFile signature,
    the NtClose could remain uncalled
  - i tried to use the ntdll.dll from my second Cygwin box (but i
    didn't manage to make it work inside my first Cygwin box)
  - finally, i had a deeper look into the code and found that
    if _check_for_executable is set to false, the files are not
    searched into, and i poked byte 0 into cygwin1.dll at the right
    % cmp cygwin1.dll.original cygwin1.dll.poked
    1323689   1   0
    This last action brought absolutely no change. This disappointed me a
    lot because i was then absolutely sure that McAfee was the culprit.
    But since then, i removed McAfee, and the problem is still there...
    (by the way, how can we print the cwdstuff structure?)

The only improvements i got, and not very interesting ones, came from using either:
- my second Cygwin box (never any error here, with many tests performed)
- Cygwin 1.5.25-15, that i reinstalled on my first Cygwin box (no errors
  showed up, however with not so many tests performed)

How to reproduce if you want to:
Use this piece of shell and modify as needed:
------------------------------------------
#!/bin/sh

variant="x`date +%M%S`"
echo $variant

# select one of these (the last assignment wins)
origfile="/usr/bin/xpdf.exe" # 1308kb
origfile="/usr/bin/banner.exe" # 8kb
origfile="/usr/bin/ldh.exe" # 1536b
origfile="/usr/bin/xpdf.exe" # 1308kb
origfile="/usr/bin/diff.exe" # 105kb
origfile="/usr/bin/emacs-X11.exe" # 15Mb

# hypothetical names: the definitions of file1 and file2 are missing from
# this copy of the script; only the `xxx' ending of file1 is implied by
# the last comment below
file1="/tmp/${variant}xxx"
file2="/tmp/${variant}yyy"

if true; then
  rm -f "${file1}" # to be sure
  cp "${origfile}" "${file1}" # don't want to kill your Cygwin binaries
  #chmod 644 "${file1}" # uncommented and mv will *never* fail
  #date;sleep 3;date # uncommented to check atime update (see `ls')
  ls -ilu --full-time "${file1}"
fi

mv "${file1}" "${file2}"
echo "rc=$?"
#date;sleep 3;date # uncommented to check atime update (see `ls')
ls -ilu --full-time "${file1}" "${file2}" 2> /dev/null # one is there, one is missing, the rc above says which

rm -f "${file2}" # kill file2 and the remaining file1's (ie *xxx, see above) will expose the failures
------------------------------------------


Figures:
- i just tested with xpdf.exe (1.3Mb), and it failed 19 times out of 20.
- i just tested with banner.exe (8192b), and it failed 3 times out of 20.
- i just tested with diff.exe (105kb), and it failed 20 times out of 20.
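
Counts like these can be gathered with a loop rather than by hand. This is a sketch under assumptions, not the procedure actually used for the figures above: `origfile' defaults to /bin/sh only so that the loop runs anywhere (set it to one of the .exe files to mimic the tests), and each run uses a freshly created directory because of point e).

```shell
#!/bin/sh
# Count how many of $runs attempts at `mv file1 file2' fail.
runs=20
origfile=${origfile:-/bin/sh}   # placeholder default, see the text above
dir=$(mktemp -d)
failures=0
i=1
while [ "$i" -le "$runs" ]; do
  sub="$dir/run$i"              # fresh directory for every run
  mkdir "$sub"
  cp "$origfile" "$sub/file1"   # work on a copy, never on the original
  chmod 755 "$sub/file1"
  mv "$sub/file1" "$sub/file2" 2> /dev/null || failures=$((failures + 1))
  i=$((i + 1))
done
rm -rf "$dir"
echo "failed $failures times out of $runs"
```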

My environment is:
I have two Cygwin boxes, the first is a Fujitsu laptop with XP SP3, the
second is an HP tower with XP SP2. Both with an NTFS disk (150Gb), all
the tests have been performed within this NTFS disk, under (unless
otherwise mentioned) Cygwin 1.7.0-62, with all the packages installed.
I also never noticed any change in the inodes (see above `ls -i').

My interpretation of the symptoms is as follows (ie if i had to
reproduce this behavior inside a program of my own, i would do the
following):
Let's suppose we only have executable (755) files beginning with MZ,
on my first Cygwin box.
In my opinion, each slot in a directory would have some room for a boolean
variable initially set to 0, meaning "this entry is not executable
or i don't know". This boolean would be used only in case of a `mv'
command which would use this particular directory slot as the first
parameter. When the `mv' command is launched, two cases:
- if the boolean is set to 1 (meaning: "this entry has been established
  to be executable"), the command would be performed normally, the
  file is moved, the directory entry remains with the boolean set
  to 1, with no file inside (since the `mv' succeeded)
- if the boolean is set to 0, two processes would be running
  - the normal process of mv, which (if successful) finally has to
    rename() file1
  - an unknown process which reads the content of file1, updates
    the access time of file1, turns the boolean into 1, and takes
    more time if the content of file1 (or the size indicated in
    the directory slot, who knows?) is large and less time if the
    content of file1 is small;
    Also: this unknown process starts after having received the 'y' in
    case of `mv -i'.
  If the winner of this race is the normal process of mv, we get:
  Device or resource busy.
  If the winner is the unknown process, the mv is performed normally.
  In any case, the boolean is set to 1.

The above mechanism does not exactly seem 100% correct, since we can
observe that the access time is also updated when the boolean has
previously been set to 1: the unknown process would probably need
to be launched at each instance of (this kind of) mv, but must return
very quickly if the boolean is already set to 1.
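
The hypothesis above can be written down as a toy model. It touches no files and proves nothing about Cygwin or Windows; it only restates the guessed rules so that they can be checked against the observations. The names `scanned', `simulate_mv' and `result' are invented for this sketch, and the winner of the race is passed in by hand instead of being timed.

```shell
#!/bin/sh
# scanned: 0 = "this entry is not executable or i don't know"
#          1 = "this entry has been established to be executable"
scanned=0

# simulate_mv WINNER: WINNER is "mv" or "scanner", whichever of the two
# hypothetical processes finishes first. Sets $result to "ok" or "busy".
simulate_mv() {
  if [ "$scanned" -eq 1 ] || [ "$1" = scanner ]; then
    result=ok
  else
    result=busy      # mv reached rename() before the unknown process was done
  fi
  scanned=1          # in any case, the boolean is set to 1
}

simulate_mv mv
echo "first try, mv wins the race: $result"
simulate_mv mv
echo "second try, same slot:       $result"
```

With the boolean starting at 0, the first call reports busy and the second reports ok, which is the pattern of point d).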

How to solve this? How to make my first box behave like the second
one (ie never fail)? At least, did you manage to reproduce this?

Thank you for your time.

Denis Excoffier.

P.S. For your information and to be as comprehensive as possible, at
least two classical packages (`tcl8.6b1' and `openssl-1.0.0-beta3') have
their `make install' fail with exactly this error (to be exact, the
`make install' from the tcl package does not fail since the error
is not caught, but the final copy is not performed).
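
The difference between an install step that fails and one that silently skips the copy comes down to whether the exit status of `mv' is checked. The sketch below illustrates that general shell behavior only; both functions are hypothetical and are not taken from the tcl or openssl Makefiles, and a missing source file stands in for the "busy" error.

```shell
#!/bin/sh
install_unchecked() {
  mv "$1" "$2" 2> /dev/null   # failure ignored
  echo "install done"         # last command succeeds, so the status is 0
}

install_checked() {
  mv "$1" "$2" 2> /dev/null || return 1   # propagate the failure
  echo "install done"
}

dir=$(mktemp -d)
# file1 does not exist here, so mv fails in both calls.
install_unchecked "$dir/file1" "$dir/file2" && echo "unchecked: reported success"
install_checked   "$dir/file1" "$dir/file2" || echo "checked: reported failure"
```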
