This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

FYI: GDB no longer works on older ia64-linux kernels...

From: Joel Brobecker <brobecker at adacore dot com>
To: gdb-patches at sourceware dot org
Date: Wed, 23 Sep 2009 11:39:24 -0700
Subject: FYI: GDB no longer works on older ia64-linux kernels...

Hello,

This is more of an FYI for engineers who are still working on ia64-linux
with an older version of the Linux kernel.  I reproduced the problem
on RHAS 4.x (Linux kernel 2.6.9-67.0.20.EL) and on SuSE 9.x
(2.6.5-7.97-default).  Jan kindly tested on RedHat-5 and I did some
testing on SuSE-10, and the problem seems to be fixed there.  These
newer versions of the distros use more recent kernels. I will not
be checking in any patch, since I wasn't able to find a way to detect
the failure condition, but anyone stuck with an older version of the
kernel might want to apply the attached patch.

We noticed this when GDB stopped being able to unwind the call stack
from the GNAT exception hook that gets called when an Ada exception is
raised.  But you can observe that failure as unexpected FAILs within
the testsuite as well. For instance "gdb.base/scope.exp: args in correct
order" is failing for me.

To reproduce, build the following program:

    procedure A is
    begin
       raise Constraint_Error;
    end A;

Using the following command:

    % gnatmake -g a

And then run the program until an exception is raised:

    (gdb) catch exception
    Catchpoint 1: all Ada exceptions
    (gdb) run
    Starting program: /[...]/a

    Catchpoint 1, CONSTRAINT_ERROR at <__gnat_debug_raise_exception> (
        e=0x6000000000002af8) at s-except.adb:46

The debugger is unable to unwind past the second frame.  The first
clue is that the debugger has not switched to the first user-code
frame.  But a full backtrace confirms the source of the problem:

    (gdb) bt
    #0  <__gnat_debug_raise_exception> (e=0x6000000000002af8) at s-except.adb:46
    #1  0x4000000000006940 in <__gnat_raise_nodefer_with_msg> (
        e=0x6000000000002af8) at a-except.adb:833
    #2  0x0000000000000000 in ?? ()

The problem was introduced by the use of a "data cache" which uncovered
a latent problem: On Linux, we have an optimization where large reads or
writes of the inferior memory are performed through procfs, instead of
the usual ptrace (ptrace only operates on 4/8 bytes at a time). Small
memory reads/writes are still performed through ptrace.

In our case, we're trying to read the memory where the IP register
has been saved.  The read is performed through the data cache. As
this memory address hasn't been read yet, the data cache decides
to do the actual memory read.  But instead of just reading the 8 bytes
where our register was saved, it reads a larger chunk, just in case
we might need to read some memory in the same area.  This is what
triggers the change of behavior, because the memory chunk that the
data cache reads is large enough that we now attempt to read it through
procfs instead of through ptrace. Only if procfs fails or decides to return
zero do we revert to using the normal ptrace method.
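
To make the routing concrete, here is a minimal sketch of the size-based
choice described above.  It is not GDB's actual code: the names are
hypothetical, and the cutoff of three machine words is an assumption
made for illustration only.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only.  */
enum xfer_method { XFER_PTRACE, XFER_PROCFS };

/* Assumed cutoff: requests shorter than this are not worth opening
   /proc/PID/mem for, so they go word by word through ptrace.  */
#define PROCFS_MIN_LEN (3 * sizeof (long))

/* Pick the transfer method for a request of LEN bytes.  */
static enum xfer_method
choose_xfer_method (size_t len)
{
  return len < PROCFS_MIN_LEN ? XFER_PTRACE : XFER_PROCFS;
}
```

Under that assumption, the 8-byte read of the saved IP alone would go
through ptrace, while the 64-byte chunk requested by the data cache
crosses the cutoff and goes through procfs.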

The code that reads inferior memory from procfs looks like this:

  sprintf (filename, "/proc/%d/mem", PIDGET (inferior_ptid));
  fd = open (filename, O_RDONLY | O_LARGEFILE);
  if (fd == -1)
    return 0;
  if (pread64 (fd, readbuf, len, offset) != len)
    ret = 0;
  else
    ret = len;

We try to read len=64 bytes at offset=0x60000fff7fffc148. pread64
returns 64, signifying that the read was successful, but the contents
are junk. Using the "x/2x 0x60000fff7fffc148" command, whose memory
read is performed through ptrace, shows the correct data.

To prevent the procfs interface from being used, we limited the size
of the memory transfers to 8 bytes, as follows:

2009-09-23  Joel Brobecker  <brobecker@adacore.com>

        * ia64-linux-nat.c (ia64_linux_xfer_partial): Limit the transfer
        size to 8 bytes to prevent the use of the procfs interface.

Tested on an "old" ia64-linux kernel, fixes about 500 FAILs or so...
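
Limiting each transfer to 8 bytes does not truncate larger reads,
because a partial-transfer routine is expected to be called again
until the whole request is satisfied.  The sketch below illustrates
that with a stand-in buffer instead of a real inferior.  All names in
it are hypothetical, and it is not the GDB implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the inferior's memory.  */
static unsigned char fake_memory[256];

/* Number of low-level transfers made, to show the cost of clamping.  */
static int xfer_calls;

/* Hypothetical partial-transfer routine: like the patched function,
   it never moves more than 8 bytes per call.  Returns the number of
   bytes copied, or 0 if OFFSET is out of range.  */
static long
clamped_xfer_partial (unsigned char *readbuf, size_t offset, size_t len)
{
  if (offset >= sizeof fake_memory)
    return 0;
  if (len > sizeof fake_memory - offset)
    len = sizeof fake_memory - offset;
  if (len > 8)
    len = 8;
  memcpy (readbuf, fake_memory + offset, len);
  xfer_calls++;
  return (long) len;
}

/* Caller-side loop: keep asking until LEN bytes have arrived or no
   progress is made.  Returns the total number of bytes read.  */
static size_t
read_fully (unsigned char *readbuf, size_t offset, size_t len)
{
  size_t done = 0;

  while (done < len)
    {
      long n = clamped_xfer_partial (readbuf + done, offset + done,
                                     len - done);
      if (n <= 0)
        break;
      done += (size_t) n;
    }
  return done;
}
```

In this sketch a 64-byte request still completes, at the price of
eight small transfers instead of one large one.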

-- 
Joel

diff --git a/gdb/ia64-linux-nat.c b/gdb/ia64-linux-nat.c
index e8ffc89..5f95590 100644
--- a/gdb/ia64-linux-nat.c
+++ b/gdb/ia64-linux-nat.c
@@ -803,6 +803,19 @@ ia64_linux_xfer_partial (struct target_ops *ops,
   if (object == TARGET_OBJECT_UNWIND_TABLE && writebuf == NULL && offset == 0)
     return syscall (__NR_getunwind, readbuf, len);
 
+  /* With Linux kernels, we normally perform large memory reads/writes
+     through procfs, as we can perform it through one single operation.
+     However, some older versions of the distribution we support (such
+     as RedHat-4 and SuSE-9) ship a kernel that seems to have a bug
+     causing this to fail: The call to pread64 seems to have successfully
+     read all LEN bytes, but the data read is junk.  The problem appears
+     to be fixed in newer versions of these distributions (RedHat-5 and
+     SuSE-10), but until we stop supporting these older distributions,
+     we need to prevent the procfs interface from being used.  We do this 
+     by limiting the transfer to 8 bytes.  */
+  if (len > 8)
+    len = 8;
+
   return super_xfer_partial (ops, object, annex, readbuf, writebuf,
 			     offset, len);
 }
