Bug 9778 - Fatal server error: Can't read lock file /tmp/.X0-lock
Summary: Fatal server error: Can't read lock file /tmp/.X0-lock
Status: RESOLVED FIXED
Alias: None
Product: cygwin
Classification: Unclassified
Component: Cygwin/X
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Yaakov Selkowitz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-23 15:23 UTC by Jon Turney
Modified: 2011-07-01 22:47 UTC
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Ensure logfile can be opened for writing even if a different user created the last one (659 bytes, patch)
2010-02-09 17:46 UTC, Jon Turney
Move lock files from /tmp to /var/run (921 bytes, patch)
2010-02-09 17:49 UTC, Jon Turney
Patch for xtrans to not use sticky bit on /tmp/.X11-unix (1.04 KB, patch)
2010-02-09 17:54 UTC, Jon Turney
Move lock files from /tmp to /var/run (690 bytes, patch)
2010-02-09 19:50 UTC, Jon Turney

Description Jon Turney 2009-01-23 15:23:16 UTC
Attempting to start the X server fails "Fatal server error: Can't read lock file
/tmp/.X0-lock"

http://cygwin.com/ml/cygwin-xfree/2008-11/msg00058.html
http://cygwin.com/ml/cygwin-xfree/2008-12/msg00027.html
http://cygwin.com/ml/cygwin-xfree/2008-12/msg00019.html
http://cygwin.com/ml/cygwin-xfree/2009-01/msg00004.html

Adding '-nolock' to the X server options works around this.
Comment 1 Jon Turney 2009-01-23 16:36:33 UTC
Comparing strace output shows CreateHardLink() is failing for some reason

Good:

   14 1091860 [main] X 4372 fhandler_base::open: 0 = NtCreateFile (0x71C, 20100,
C:\cygwin\tmp\.tX1-lock, io, NULL, 0, 7, 1, 4400, NULL, 0)
   17 1091877 [main] X 4372 fhandler_base::open: 1 = fhandler_base::open
(C:\cygwin\tmp\.tX1-lock, 0x110000)
   16 1091893 [main] X 4372 fhandler_base::open_fs: 1 = fhandler_disk_file::open
(C:\cygwin\tmp\.tX1-lock, 0x10000)
  191 1092084 [main] X 4372 fhandler_base::close: closing '/tmp/.tX1-lock'
handle 0x71C
   44 1092128 [main] X 4372 link: 0 = link (/tmp/.tX1-lock, /tmp/.X1-lock)

Bad:

   33   21431 [main] XWin 3360 fhandler_base::open: 0 = NtCreateFile (0x7E0,
20100, C:\cygwin\tmp\.tX0-lock, io, NULL, 0, 7, 1, 4400, NULL, 0)
   24   21455 [main] XWin 3360 fhandler_base::open: 1 = fhandler_base::open
(C:\cygwin\tmp\.tX0-lock, 0x110000)
   32   21487 [main] XWin 3360 fhandler_base::open_fs: 1 =
fhandler_disk_file::open (C:\cygwin\tmp\.tX0-lock, 0x10000)
 4691   26178 [main] XWin 3360 fhandler_disk_file::link: CreateHardLinkA failed
   33   26211 [main] XWin 3360 seterrno_from_win_error:
/ext/build/netrel/src/cygwin-1.5.25-15/winsup/cygwin/fhandler_disk_file.cc:893
windows error 3
   31   26242 [main] XWin 3360 geterrno_from_win_error: windows error 3 == errno 2
   22   26264 [main] XWin 3360 __set_errno: void seterrno_from_win_error(const
char*, int, DWORD):310 val 2
   32   26296 [main] XWin 3360 fhandler_base::close: closing '/tmp/.tX0-lock'
handle 0x7E0
   35   26331 [main] XWin 3360 link: -1 = link (/tmp/.tX0-lock, /tmp/.X0-lock)
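
For readers unfamiliar with the pattern visible in the traces above, here is a minimal Python sketch of a link()-based lock file: write the PID to a temporary file, then hard-link it to the final name. The function name `acquire_display_lock` and the `lock_dir` parameter are hypothetical, and this only illustrates the general technique; it is not the X server's actual code.

```python
import os


def acquire_display_lock(display: int, lock_dir: str = "/tmp") -> bool:
    """Try to take the lock for a display number; return True on success.

    Illustrative sketch only: names and structure are assumptions,
    not the X server's implementation.
    """
    tmp_path = os.path.join(lock_dir, f".tX{display}-lock")
    lock_path = os.path.join(lock_dir, f".X{display}-lock")

    # Write our PID to a temporary file first.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, f"{os.getpid():>10}\n".encode("ascii"))
    finally:
        os.close(fd)

    try:
        # link() is atomic: it fails with EEXIST if the lock already exists.
        os.link(tmp_path, lock_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)
```

Any other OSError from the link step (such as the errno 2 seen in the bad trace) propagates to the caller rather than being treated as "lock already held".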
Comment 2 Yaakov Selkowitz 2009-01-23 18:29:35 UTC
Thanks for the bug entry; this is definitely something that needs to be worked on.

A few questions:

1) Do we have "cygcheck -srv" output for the failed cases?  BLODA, a particular
OS version, anything else to indicate a common factor?

2) Has anyone with failures on cygwin-1.5 tried this with 1.7?
Comment 3 Yaakov Selkowitz 2009-02-13 00:52:58 UTC
Does this make any sense:

http://cygwin.com/ml/cygwin-xfree/2009-02/msg00121.html
Comment 4 Jon Turney 2009-02-13 18:15:20 UTC
(In reply to comment #3)
> Does this make any sense:
> 
> http://cygwin.com/ml/cygwin-xfree/2009-02/msg00121.html

Yes, it makes a lot of sense.  Thanks for pointing it out.

Comment 5 Yaakov Selkowitz 2009-06-25 02:03:46 UTC
Do we know any more about this so far?
Comment 6 Jan K 2009-11-23 10:06:16 UTC
The problem is that /tmp/.X0-lock is created with the NTFS permissions of the
current user. If the next user on the same machine does not have administrator
privileges, he is unable to read/remove/reinitialise the file.

The workaround I have found is to rename the underlying /tmp dir and wait for
an administrator to do the cleanup.

I'm currently seeking how to move the .X0-lock file into the user's home
directory instead of creating it in the general /tmp.
Comment 7 Jon Turney 2009-11-23 10:34:26 UTC
(In reply to comment #6)
> I'm currently seeking how to move the .X0-lock file into the user's home
> directory instead of creating it in the general /tmp.

This is not a solution.  The point of the lock file is to prevent two instances
of an X server with the same display number from being run at the same time
(even by different users).  So all users must share the same lock file name.
Comment 8 Jon Turney 2010-02-03 19:37:06 UTC
http://cygwin.com/ml/cygwin-xfree/2010-01/msg00117.html

This makes it clear that a problem exists with both lock file and log file if
XWin is run by a user with administrator privileges, exited, then run by a user
without administrator privileges.

This may actually be an upstream issue, as X still (normally) runs as root on
Unices, or maybe we need to do something special to ensure these files get
the correct permissions at the Windows level.
Comment 9 Jon Turney 2010-02-09 17:46:07 UTC
Created attachment 4582 [details]
Ensure logfile can be opened for writing even if a different user created the last one

After a bit of testing, alternating between Administrator and non-Administrator
runs of XWin: when XWin is shut down cleanly, the only issue seems to be
writing to the logfile.  Patch attached.
Comment 10 Jon Turney 2010-02-09 17:49:46 UTC
Created attachment 4583 [details]
Move lock files from /tmp to /var/run

When the X server terminates abnormally, it doesn't tidy up the lock file and
socket, which causes other problems when alternating between Administrator and
non-Administrator runs.

I guess this means we have other crash problems which aren't being reported.

/tmp normally has restricted deletion (sticky) bit set, so only the user which
created the file can delete it.
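
As a quick way to confirm whether a directory has the restricted-deletion bit set, something like the following Python sketch could be used. The helper name `has_sticky_bit` is hypothetical. On POSIX systems the bit means a file in the directory can be removed only by the file's owner, the directory's owner, or root.

```python
import os
import stat


def has_sticky_bit(path: str) -> bool:
    """Return True if the restricted-deletion (sticky) bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)
```

For example, `has_sticky_bit("/tmp")` reports whether /tmp is configured this way on the machine in question.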
Comment 11 Jon Turney 2010-02-09 17:54:55 UTC
Created attachment 4584 [details]
Patch for xtrans to not use sticky bit on /tmp/.X11-unix

Doing this opens an enormous security hole.  Need to think of a better way...
Comment 12 Yaakov Selkowitz 2010-02-09 19:45:10 UTC
The first patch makes perfect sense, so I'll put that in the queue for 1.7.4.

In the second patch, is the #define USE_OPENGL32 an artifact from something else?  
I don't see how that's related to lock files.

I need to think more about the second and third patches.  I'm particularly 
hesitant about the third, given the security implications.
Comment 13 Jon Turney 2010-02-09 19:50:16 UTC
Created attachment 4585 [details]
Move lock files from /tmp to /var/run
Comment 14 Jon Turney 2010-02-09 19:55:45 UTC
(In reply to comment #12)
> In the second patch, is the #define USE_OPENGL32 an artifact from something else?
> I don't see how that's related to lock files.

Oops.  Yeah, that's a private build fix that slipped in by accident. Update
patch attached.
 
> I need to think more about the second and third patches.  I'm particularly 
> hesitant about the third, given the security implications.

Yes.  The only alternative which immediately occurred to me was to move the
socket to a user-specific directory owned by the user, but then that would break
every old application.
Comment 15 Jon Turney 2010-02-10 14:32:24 UTC
(In reply to comment #14)
> > I need to think more about the second and third patches.  I'm particularly 
> > hesitant about the third, given the security implications.
> 
> Yes.  The only alternative which immediately occurred to me was to move the
> socket to a user-specific directory owned by the user, but then that would break
> every old application.

Perhaps the alternative to the third patch is just to document that you need to
use '-nolisten unix' if you need to switch between root and non-root users.

 

Comment 16 Yaakov Selkowitz 2010-02-19 19:18:13 UTC
What happens if one user has an XWin :0 *running*, then allows a "Switch User", 
upon which the second also tries to start XWin (again, :0 by default)?  It seems 
that it still tries to delete the existing XWin.0.log *before* it fails with a 
dup error, but it shouldn't (and doesn't fully succeed) because the first user 
is still using it.  So I think it isn't quite so simple.

It's clear that we're not properly dealing with multiuser setups OOTB.  Thinking 
outside the box for a moment (forgive the pun), when no DISPLAY number is 
specified when launching, instead of that meaning :0, should that instead mean 
"find me the next available DISPLAY"?
Comment 17 Jon Turney 2010-02-19 20:50:22 UTC
(In reply to comment #16)
> What happens if one user has an XWin :0 *running*, then allows a "Switch User", 
> upon which the second also tries to start XWin (again, :0 by default)?  It seems 
> that it still tries to delete the existing XWin.0.log *before* it fails with a 
> dup error, but it shouldn't (and doesn't fully succeed) because the first user 
> is still using it.  So I think it isn't quite so simple.

Hmmm.. maybe the point in time when we open the log file is wrong (as stuff is
buffered until then).

In any case, the other user could always just do a rm /var/log/XWin.0.log.

> It's clear that we're not properly dealing with multiuser setups OOTB.  Thinking 
> outside the box for a moment (forgive the pun), when no DISPLAY number is 
> specified when launching, instead of that meaning :0, should that instead mean 
> "find me the next available DISPLAY"?

There's a patch which adds the -displayfd option, which allocates a display
number and writes it to the specified fd.  But to be useful to us, xinit needs
some code to read that display number and use it for the clients it creates.
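
To make the idea concrete, here is a rough Python sketch of both ends of such a handshake. It assumes the server reports the chosen display as a decimal number followed by a newline on the given fd; that format, and all three function names, are assumptions for illustration and are not taken from the patch.

```python
import os
from typing import Optional


def next_free_display(lock_dir: str = "/tmp", limit: int = 64) -> Optional[int]:
    """Return the lowest display number with no lock file, or None."""
    for n in range(limit):
        if not os.path.lexists(os.path.join(lock_dir, f".X{n}-lock")):
            return n
    return None


def announce_display(fd: int, display: int) -> None:
    """Server side: report the chosen display as a decimal line on fd."""
    os.write(fd, f"{display}\n".encode("ascii"))


def read_display_number(fd: int) -> int:
    """Launcher side: read the newline-terminated display number from fd."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = os.read(fd, 16)
        if not chunk:  # writer closed the fd before finishing the line
            break
        data += chunk
    return int(data.decode("ascii").strip())
```

Probing for a free number like this is inherently racy, so a real server would still have to take the lock atomically and retry on failure. The launcher would then set DISPLAY to `:N` for the clients it starts.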

I thought it had been merged upstream, but now I look I don't see it there.

http://cvs.fedoraproject.org/viewvc/devel/xorg-x11-server/xserver-1.6.0-displayfd.patch

Comment 18 Jon Turney 2010-08-06 12:15:46 UTC
with the fix for #11774, this should only be happening now:

1) if /tmp is on FAT
2) if the X server crashed leaving behind a stale lock file we can't remove.

I have a patch to turn on -nolock automatically if /tmp is on FAT, which should
address 1).

2) would be fixed by having the lock file have delete-on-close behaviour.  I
guess upstream doesn't have this simply because X usually runs as root.
Comment 19 Jon Turney 2010-08-12 16:04:01 UTC
(In reply to comment #15)
> (In reply to comment #14)
> Perhaps the alternative to the third patch is just to document that you need to
> use '-nolisten unix' if you need to switch between root and non-root users.

Ofc, this is in itself a possible security hole, since it allows someone else to
start a server with '-nolock -nolisten tcp', which would get connections intended for
that server if the client was using DISPLAY of just :0.0 with no explicit
protocol specified.
Comment 20 Yaakov Selkowitz 2010-08-12 18:39:29 UTC
(In reply to comment #19)
> Ofc, this is in itself a possible security hole, since it allows someone else to
> start a server with '-nolock -nolisten tcp', which would get connections intended for
> that server if the client was using DISPLAY of just :0.0 with no explicit
> protocol specified.

Not AFAICS; if you use -nolisten unix, then you *must* use DISPLAY=127.0.0.1:0, 
just DISPLAY=:0 won't work.
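
The two DISPLAY forms being compared differ only in whether a host is given. A simplified Python sketch of splitting a display name into its parts follows; `parse_display`, `DisplayName` and `tcp_port` are hypothetical names, and protocol prefixes and IPv6 literals are deliberately not handled. How a client library picks a transport for the host-less form is exactly the point under discussion and is not modelled here.

```python
import re
from typing import NamedTuple


class DisplayName(NamedTuple):
    host: str      # empty string means no explicit host was given
    display: int
    screen: int


def parse_display(name: str) -> DisplayName:
    """Parse 'host:display.screen'; host and screen are optional."""
    m = re.fullmatch(r"([^:]*):(\d+)(?:\.(\d+))?", name)
    if m is None:
        raise ValueError(f"not a display name: {name!r}")
    host, display, screen = m.groups()
    return DisplayName(host, int(display), int(screen or 0))


def tcp_port(d: DisplayName) -> int:
    """Conventional TCP port for an X display: 6000 + display number."""
    return 6000 + d.display
```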
Comment 21 Jon Turney 2010-08-13 15:17:32 UTC
http://cygwin.com/ml/cygwin-xfree/2010-08/msg00090.html

Another multiuser problem: if /var/log/ has 1777 permissions (which is what
setup seems to set), a non-Admin user can't delete the previous logfile.
Comment 22 Jon Turney 2011-07-01 22:46:44 UTC
(In reply to comment #18)
> with the fix for #11774, this should only be happening now:
> 
> 1) if /tmp is on FAT
> 2) if the X server crashed leaving behind a stale lock file we can't remove.

A long-standing problem with XWin not exiting cleanly on WM_ENDSESSION causing it to leave a stale lock file behind was fixed in 1.10.1-1
 
> I have a patch to turn on -nolock automatically if /tmp is on FAT, which should
> address 1).

This patch was added in 1.10.2-1

So the only situation where we might have this problem now is when an XWin run by another user has crashed, leaving a stale lock file that permissions prevent us from removing.

The real fix for that is to arrange for the lock file to be deleted-on-close, or change to use some locking resource which is released on process exit (e.g. posix named semaphores).
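
As an illustration of the "released on process exit" idea, here is a minimal Python sketch using an advisory flock() lock as a stand-in. This is not the named-semaphore approach mentioned above, and `try_lock` is a hypothetical helper. The kernel drops the lock when the descriptor is closed or the holding process dies, so a crash cannot leave a lock that is still held, even though the file itself remains.

```python
import fcntl
import os
from typing import Optional


def try_lock(path: str) -> Optional[int]:
    """Return an open fd holding an exclusive lock on path, or None if busy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        # Non-blocking: fail immediately if another holder has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

The caller keeps the returned descriptor open for as long as it wants to hold the lock. Whether this behaves usefully across different Windows users under Cygwin is not something this sketch establishes.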

There's also probably more we can do to make multiple X servers on TS work OOTB, or at least better document the needed configuration, but that should be a separate defect.