Attempting to start the X server fails with "Fatal server error: Can't read lock file /tmp/.X0-lock".

http://cygwin.com/ml/cygwin-xfree/2008-11/msg00058.html
http://cygwin.com/ml/cygwin-xfree/2008-12/msg00027.html
http://cygwin.com/ml/cygwin-xfree/2008-12/msg00019.html
http://cygwin.com/ml/cygwin-xfree/2009-01/msg00004.html

Adding '-nolock' to the X server options works around this.
Comparing strace output shows CreateHardLink() is failing for some reason.

Good:

   14 1091860 [main] X 4372 fhandler_base::open: 0 = NtCreateFile (0x71C, 20100, C:\cygwin\tmp\.tX1-lock, io, NULL, 0, 7, 1, 4400, NULL, 0)
   17 1091877 [main] X 4372 fhandler_base::open: 1 = fhandler_base::open (C:\cygwin\tmp\.tX1-lock, 0x110000)
   16 1091893 [main] X 4372 fhandler_base::open_fs: 1 = fhandler_disk_file::open (C:\cygwin\tmp\.tX1-lock, 0x10000)
  191 1092084 [main] X 4372 fhandler_base::close: closing '/tmp/.tX1-lock' handle 0x71C
   44 1092128 [main] X 4372 link: 0 = link (/tmp/.tX1-lock, /tmp/.X1-lock)

Bad:

   33   21431 [main] XWin 3360 fhandler_base::open: 0 = NtCreateFile (0x7E0, 20100, C:\cygwin\tmp\.tX0-lock, io, NULL, 0, 7, 1, 4400, NULL, 0)
   24   21455 [main] XWin 3360 fhandler_base::open: 1 = fhandler_base::open (C:\cygwin\tmp\.tX0-lock, 0x110000)
   32   21487 [main] XWin 3360 fhandler_base::open_fs: 1 = fhandler_disk_file::open (C:\cygwin\tmp\.tX0-lock, 0x10000)
 4691   26178 [main] XWin 3360 fhandler_disk_file::link: CreateHardLinkA failed
   33   26211 [main] XWin 3360 seterrno_from_win_error: /ext/build/netrel/src/cygwin-1.5.25-15/winsup/cygwin/fhandler_disk_file.cc:893 windows error 3
   31   26242 [main] XWin 3360 geterrno_from_win_error: windows error 3 == errno 2
   22   26264 [main] XWin 3360 __set_errno: void seterrno_from_win_error(const char*, int, DWORD):310 val 2
   32   26296 [main] XWin 3360 fhandler_base::close: closing '/tmp/.tX0-lock' handle 0x7E0
   35   26331 [main] XWin 3360 link: -1 = link (/tmp/.tX0-lock, /tmp/.X0-lock)
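For readers unfamiliar with the sequence in the traces, here is a minimal Python sketch of the pattern they show: create a temporary file (.tXn-lock), then hard-link it to the real lock name (.Xn-lock), so that the link() call is the atomic "take the lock" step. The function name, the PID being written into the file, and its format are assumptions made for illustration; this is not the X server's actual code.

```python
import os

def acquire_display_lock(lock_dir, display):
    """Try to take the lock for a display number; return True on success.

    Hypothetical helper: mirrors the open-then-link() sequence in the traces.
    """
    tmp_path = os.path.join(lock_dir, ".tX%d-lock" % display)
    lock_path = os.path.join(lock_dir, ".X%d-lock" % display)

    # Step 1: create the temporary file exclusively and record our PID in it
    # (writing the PID, and its format, is an assumption for this sketch).
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, ("%10d\n" % os.getpid()).encode("ascii"))
    finally:
        os.close(fd)

    # Step 2: hard-link the temporary file to the real lock name. link() fails
    # if the name already exists, so only one caller can win. Any other error
    # (such as the ENOENT seen in the bad trace) propagates to the caller.
    try:
        os.link(tmp_path, lock_path)
        return True
    except FileExistsError:
        return False
    finally:
        os.unlink(tmp_path)
```

In the bad trace it is the second step that fails, which is why the server never gets as far as having a readable lock file.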
Thanks for the bug entry; this is definitely something that needs to be worked on. A few questions:

1) Do we have "cygcheck -srv" output for the failed cases? BLODA, a particular OS version, anything else to indicate a common factor?
2) Has anyone with failures on cygwin-1.5 tried this with 1.7?
Does this make any sense: http://cygwin.com/ml/cygwin-xfree/2009-02/msg00121.html
(In reply to comment #3)
> Does this make any sense:
>
> http://cygwin.com/ml/cygwin-xfree/2009-02/msg00121.html

Yes, it makes a lot of sense. Thanks for pointing it out.
Do we know any more about this so far?
The problem is that /tmp/.X0-lock is created with the NTFS permissions of the current user. If the next user on the same machine does not have administrator privileges, they are unable to read, remove or reinitialise the file. The workaround I have found is to rename the underlying /tmp dir and wait for an administrator to do the clean-up. I'm currently looking into how to move the .X0-lock file into the user's home directory instead of creating it in the general /tmp.
(In reply to comment #6)
> I'm currently seeking how to move the .X0-lock file into the user's home
> directory instead of creating it in the general /tmp.

This is not a solution. The point of the lock file is to prevent two instances of an X server with the same display number from being run at the same time (even by different users). So all users must share the same lock file name.
http://cygwin.com/ml/cygwin-xfree/2010-01/msg00117.html

This makes it clear that a problem exists with both the lock file and the log file if XWin is run by a user with administrator privileges, exited, then run by a user without administrator privileges. This may actually be an upstream issue, as X (normally) still runs as root on unices, or maybe we need to do something special to ensure these files get the correct permissions at the Windows level.
Created attachment 4582 [details]
Ensure logfile can be opened for writing even if a different user created the last one

After doing a bit of testing alternating Administrator and non-Administrator running of XWin, when XWin is shut down cleanly the only issue seems to be writing to the logfile. Patch attached.
Created attachment 4583 [details]
Move lock files from /tmp to /var/run

When the X server terminates abnormally, it doesn't tidy up the lock file and socket, which causes other problems when alternating Administrator and non-Administrator user runs. I guess this means we have other crash problems which aren't being reported. /tmp normally has the restricted deletion (sticky) bit set, so only the user who created the file can delete it.
Created attachment 4584 [details]
Patch for xtrans to not use sticky bit on /tmp/.X11-unix

Doing this opens an enormous security hole. Need to think of a better way...
The first patch makes perfect sense, so I'll put that in the queue for 1.7.4. In the second patch, is the #define USE_OPENGL32 an artifact from something else? I don't see how that's related to lock files. I need to think more about the second and third patches. I'm particularly hesitant about the third, given the security implications.
Created attachment 4585 [details] Move lock files from /tmp to /var/run
(In reply to comment #12)
> In the second patch, is the #define USE_OPENGL32 an artifact from something else?
> I don't see how that's related to lock files.

Oops. Yeah, that's a private build fix that slipped in by accident. Updated patch attached.

> I need to think more about the second and third patches. I'm particularly
> hesitant about the third, given the security implications.

Yes. The only alternative which immediately occurred to me was to move the socket to a user-specific directory owned by the user, but then that would break every old application.
(In reply to comment #14)
> > I need to think more about the second and third patches. I'm particularly
> > hesitant about the third, given the security implications.
>
> Yes. The only alternative which immediately occurred to me was to move the
> socket to a user-specific directory owned by the user, but then that would break
> every old application.

Perhaps the alternative to the third patch is just to document that you need to use '-nolisten unix' if you need to switch between root and non-root users.
What happens if one user has an XWin :0 *running*, then allows a "Switch User", upon which the second also tries to start XWin (again, :0 by default)? It seems that it still tries to delete the existing XWin.0.log *before* it fails with a dup error, but it shouldn't (and doesn't fully succeed) because the first user is still using it. So I think it isn't quite so simple.

It's clear that we're not properly dealing with multiuser setups OOTB. Thinking outside the box for a moment (forgive the pun), when no DISPLAY number is specified when launching, instead of that meaning :0, should that instead mean "find me the next available DISPLAY"?
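As a rough illustration of the "find me the next available DISPLAY" idea, the following Python sketch scans for the first display number that has no lock file. The function name, the default directory, and the upper bound are assumptions made for this example. The check is also racy, so a real implementation would have to try to take the lock for each candidate rather than only test for the file.

```python
import os

def next_free_display(lock_dir="/tmp", max_display=64):
    """Return the lowest display number with no .X<n>-lock file, else None.

    Hypothetical helper: existence checks like this are inherently racy.
    """
    for n in range(max_display):
        lock_path = os.path.join(lock_dir, ".X%d-lock" % n)
        if not os.path.exists(lock_path):
            return n
    return None
```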
(In reply to comment #16)
> What happens if one user has an XWin :0 *running*, then allows a "Switch User",
> upon which the second also tries to start XWin (again, :0 by default)? It seems
> that it still tries to delete the existing XWin.0.log *before* it fails with a
> dup error, but it shouldn't (and doesn't fully succeed) because the first user
> is still using it. So I think it isn't quite so simple.

Hmmm.. maybe the point in time when we open the log file is wrong (as stuff is buffered until then). In any case, the other user could always just do a rm /var/log/XWin.0.log.

> It's clear that we're not properly dealing with multiuser setups OOTB. Thinking
> outside the box for a moment (forgive the pun), when no DISPLAY number is
> specified when launching, instead of that meaning :0, should that instead mean
> "find me the next available DISPLAY"?

There's a patch which adds the -displayfd option, which allocates a display number and writes it to the specified fd. But to be useful to us, xinit needs some code to read that display number and use it for the clients it creates. I thought it had been merged upstream, but now I look I don't see it there.

http://cvs.fedoraproject.org/viewvc/devel/xorg-x11-server/xserver-1.6.0-displayfd.patch
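The launcher side of that idea could look something like the Python sketch below: read the allocated display number back from the read end of a pipe whose write end was handed to the server. The function name is hypothetical, and the wire format (decimal digits terminated by a newline) is an assumption for this example rather than something taken from the patch.

```python
import os

def read_display_number(fd):
    """Read a newline-terminated decimal display number from a file descriptor.

    Hypothetical helper; the decimal-plus-newline format is an assumption.
    """
    data = b""
    while not data.endswith(b"\n"):
        chunk = os.read(fd, 1)
        if not chunk:  # writer closed the descriptor
            break
        data += chunk
    text = data.strip()
    if not text.isdigit():
        raise ValueError("no display number received: %r" % data)
    return int(text)
```

The caller would then set DISPLAY to ":<n>" in the environment of the clients it starts.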
With the fix for #11774, this should only be happening now:

1) if /tmp is on FAT
2) if the X server crashed, leaving behind a stale lock file we can't remove.

I have a patch to turn on -nolock automatically if /tmp is on FAT, which should address 1).

2) would be fixed by having the lock file have delete-on-close behaviour. I guess upstream doesn't have this simply because X usually runs as root.
(In reply to comment #15)
> (In reply to comment #14)
> Perhaps the alternative to the third patch is just to document that you need to
> use '-nolisten unix' if you need to switch between root and non-root users.

Of course, this is in itself a possible security hole, since it allows someone else to start a server with '-nolock -nolisten tcp', which would then receive connections intended for the original server if the client was using a DISPLAY of just :0.0 with no explicit protocol specified.
(In reply to comment #19)
> Of course, this is in itself a possible security hole, since it allows someone else to
> start a server with '-nolock -nolisten tcp', which would then receive connections
> intended for the original server if the client was using a DISPLAY of just :0.0
> with no explicit protocol specified.

Not AFAICS; if you use -nolisten unix, then you *must* use DISPLAY=127.0.0.1:0, just DISPLAY=:0 won't work.
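To make the distinction concrete, here is a simplified Python sketch of how a DISPLAY string selects a transport, following the description above: an empty host means the local Unix-domain socket, while an explicit host means TCP on port 6000 plus the display number. The function name is hypothetical, and real client libraries handle more forms (protocol prefixes, IPv6 literals) than this sketch does.

```python
def display_transport(display):
    """Map a DISPLAY string to the endpoint a client would try to connect to.

    Simplified, hypothetical helper: handles only "[host]:n[.screen]".
    """
    host, _, rest = display.rpartition(":")
    number = int(rest.split(".")[0])  # drop the optional ".screen" suffix
    if host in ("", "unix"):
        # No host given: local Unix-domain socket under /tmp/.X11-unix
        return ("unix", "/tmp/.X11-unix/X%d" % number)
    # Explicit host: TCP, conventionally port 6000 + display number
    return ("tcp", host, 6000 + number)
```

Under this model, a server started with '-nolisten unix' has no socket for DISPLAY=:0 to reach, so clients need the explicit 127.0.0.1:0 form.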
http://cygwin.com/ml/cygwin-xfree/2010-08/msg00090.html

Another multiuser problem: if /var/log/ has 1777 permissions (which is what setup seems to set), a non-Admin user can't delete the previous logfile.
(In reply to comment #18)
> with the fix for #11774, this should only be happening now:
>
> 1) if /tmp is on FAT
> 2) if the X server crashed leaving behind a stale lock file we can't remove.

A long-standing problem with XWin not exiting cleanly on WM_ENDSESSION, causing it to leave a stale lock file behind, was fixed in 1.10.1-1.

> I have a patch to turn on -nolock automatically if /tmp is on FAT, which should
> address 1).

This patch was added in 1.10.2-1.

So, the only situation where we might have this problem now is when XWin run by another user crashed, leaving a stale lock file we are prevented from removing by permissions.

The real fix for that is to arrange for the lock file to be deleted on close, or change to use some locking resource which is released on process exit (e.g. POSIX named semaphores).

There's also probably more we can do to make multiple X servers on TS work OOTB, or at least better document the needed configuration, but that should be a separate defect.
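One way to picture "a locking resource which is released on process exit" is an advisory lock held on an open file descriptor. The Python sketch below uses flock() for this, which is a swapped-in technique and not the named semaphores or delete-on-close handling mentioned above. The function name is hypothetical, and whether this behaves as needed on Cygwin has not been verified here.

```python
import fcntl
import os

def try_lock_display(lock_path):
    """Take an exclusive advisory lock on lock_path.

    Returns the open descriptor on success (keep it open for as long as the
    lock is needed) or None if another holder exists. Hypothetical helper.
    """
    # Opening read-only is enough for flock(), so a file created by another
    # user only has to be readable, not writable or removable.
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

The design point is that the lock lives with the open descriptor and not with the file: when the holder exits or crashes, the kernel closes the descriptor and the lock disappears, so a leftover file from another user no longer blocks the next server.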