This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Crash in g_file_monitor on 32-bit Cygwin

On 10/14/2014 12:26 PM, Ken Brown wrote:
On 6/28/2014 7:08 AM, Ken Brown wrote:
On 6/27/2014 1:52 PM, Yaakov Selkowitz wrote:
On 2014-06-27 12:11, Ken Brown wrote:
On 6/25/2014 10:17 PM, Ken Brown wrote:
This is a followup to, from which I
extracted the following test case:

$ cat gfile-test.c
#include <stdio.h>
#include <gio/gio.h>

void
gfile_add_watch (const char *file)
{
   GFile *gfile = g_file_new_for_path (file);
   GFileMonitor *monitor;
   GFileMonitorFlags gflags = G_FILE_MONITOR_NONE;
   monitor = g_file_monitor (gfile, gflags, NULL, NULL);
   if (! monitor)
     printf ("Can't watch file %s\n", file);
   else
     printf ("Watching file %s\n", file);
}

int
main ()
{
   const char *file = "gfile-test.c";
   gfile_add_watch (file);
}

$ gcc -g -O0 -o gfile-test $(pkg-config --cflags gio-2.0) gfile-test.c
$(pkg-config --libs gio-2.0)

In the 64-bit case, this behaves as expected:

$ ./gfile-test.exe
Watching file gfile-test.c

In the 32-bit case, however, it crashes.  Running it under gdb shows
that the call to g_file_monitor leads to a SEGV, but I can't tell
exactly where; when I try to single step through the Glib code, I
eventually hit an assertion violation in gdb.  strace shows lots of
exceptions, but I can't make much sense out of it otherwise.

I rebuilt glib and gamin without optimization so that I could step
through the code in gdb.  But stepping through the code turned out to be
unnecessary, because the bug was gone after the rebuilds.  I don't know
if optimization was really the issue or whether just rebuilding with the
latest tools is what fixed it.

My builds can be obtained from

if anyone else wants to try to reproduce this without rebuilding the
packages themselves.

Yaakov, could you take a look?

Sure.  Can you narrow this down to only one of glib or gamin?

The culprit is gamin, and optimization *is* relevant.  What's strange, though,
is that when I rebuild it with optimization, my test case hangs instead of
crashing.  Summary:

- With gamin-0.1.10-14 (and its subpackages), my test case crashes.  The outward
symptom is that there's no output, but running the test case under gdb shows the
SEGV.

- If I rebuild gamin without optimization, I don't see any bug.  More precisely,
I build it using your gamin.cygport with the following line added:

   CFLAGS+=" -O0 -g3"

- If I rebuild gamin with optimization (i.e., just using your gamin.cygport with
no changes), my test case hangs.

I made another attempt to debug this, and I found the problem, but I don't know
how to fix it.  First, I have to correct the last assertion I made above about
my test case hanging; I just didn't wait long enough for it to finish.  What
happens is that there is a retry loop in
libgamin/gam_api.c:gamin_connect_unix_socket that gives up after 25 seconds. And
the reason it fails is that /usr/libexec/gam_server.exe has crashed.  In fact,
the latter always crashes on 32-bit Cygwin if it's built with optimization and
if the directory /tmp/fam-<username> exists before it is run.  [And this
directory will always exist after one run of gam_server.exe.]

The crash occurs in a call to g_free at server/gam_channel.c:525 because the
pointer 'dir' that is being freed has been clobbered by a call to
gam_check_not_fat on line 497.  Here are some details, based on a build using
Yaakov's gamin.cygport file with the added line

   CFLAGS+=" -O1 -g3"

I've appended at the end of this message a transcript of a gdb session that
illustrates some of the assertions I'll be making.

At line 447 of server/gam_channel.c, g_strconcat is called to get a pointer to
the directory name "/tmp/fam-<username>".  The value of this pointer is assigned
to the variable 'dir' at line 473, and in my run it is 0x8005c068.  Although
'dir' is optimized out, I can see from a disassembly that the pointer is stored
on the stack at -0x510(%ebp):

    0x004058fc <+266>:    call   0x408bf8 <g_strconcat>
    0x00405901 <+271>:    mov    %eax,-0x510(%ebp)

And I verified in my gdb session that this stack location does indeed contain
0x8005c068.  After the call to gam_check_not_fat a little later, that stack
location contains the value 0x00000104.  Then when g_free attempts to free the
bogus pointer 0x00000104, we get a crash.

I can't tell from the disassembly why the call to gam_check_not_fat clobbers the
stack.  My best guess is that it happens as a result of calls to some Windows
functions.  I hope someone more knowledgeable can take this further and fix it.

I stepped into gam_check_not_fat (which I should have done to begin with) and narrowed this down further. The stack location in question gets clobbered by the call to GetVolumeInformation:

(gdb) s
gam_check_not_fat (path=0x8005c068 "/tmp/fam-kbrown")
    at /usr/src/debug/gamin-0.1.10-16/server/gam_channel.c:35
35        cygwin_conv_path(CCP_POSIX_TO_WIN_A, path, winpath, MAX_PATH);
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
37        pGVPN = GetProcAddress(LoadLibrary("kernel32"), "GetVolumePathNameA");
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
38        if (!pGVPN || !(pGVPN)(winpath, root, MAX_PATH))
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
52        if (!GetVolumeInformation (root, volname, MAX_PATH, NULL,
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
58        if (!strncmp(fsname, "FAT", 3))       /* FAT, FAT32 */
(gdb) x/x $ebp-0x510
0x28a6a8:       0x00000104

Here's the code near the call to GetVolumeInformation, followed by what I think is the relevant disassembly:

  if (!GetVolumeInformation (root, volname, MAX_PATH, NULL,
                             NULL, NULL, fsname, MAX_PATH))
    {
      fprintf (stderr, "GetVolumeInformation: %d\n", GetLastError ());
      return 0;
    }

   0x00405b3a <+840>:	movl   $0x104,0x1c(%esp) <<<<<<<<<<<<<<<<
   0x00405b42 <+848>:	lea    -0x120(%ebp),%eax
   0x00405b48 <+854>:	mov    %eax,0x18(%esp)
   0x00405b4c <+858>:	movl   $0x0,0x14(%esp)
   0x00405b54 <+866>:	movl   $0x0,0x10(%esp)
   0x00405b5c <+874>:	movl   $0x0,0xc(%esp)
   0x00405b64 <+882>:	movl   $0x104,0x8(%esp)  <<<<<<<<<<<<<<<<
   0x00405b6c <+890>:	lea    -0x224(%ebp),%eax
   0x00405b72 <+896>:	mov    %eax,0x4(%esp)
   0x00405b76 <+900>:	lea    -0x328(%ebp),%eax
   0x00405b7c <+906>:	mov    %eax,(%esp)
   0x00405b7f <+909>:	call   *0x41248c    <----- GetVolumeInformation?
   0x00405b85 <+915>:	sub    $0x20,%esp
   0x00405b88 <+918>:	test   %eax,%eax
   0x00405b8a <+920>:	jne    0x405bb5 <gam_server_create+963>
   0x00405b8c <+922>:	call   *0x412480    <----- GetLastError?
   0x00405b92 <+928>:	mov    %eax,%esi
   0x00405b94 <+930>:	call   0x408df0 <__getreent>
   0x00405b99 <+935>:	mov    %esi,0x8(%esp)
   0x00405b9d <+939>:	movl   $0x40c70f,0x4(%esp)
   0x00405ba5 <+947>:	mov    0xc(%eax),%eax
   0x00405ba8 <+950>:	mov    %eax,(%esp)
   0x00405bab <+953>:	call   0x408df8 <fprintf>
   0x00405bb0 <+958>:	jmp    0x406073 <gam_server_create+2177>

Note the two marked movl instructions involving 0x104; I guess one of these is the culprit, but I don't really know what's going on.

