This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Debugging return.exp on ARM


Hi everyone,

In an attempt to fix flaky tests on ARM, I started looking at gdb.base/return.exp.

The last test, which tests the "return" command on a function that returns a double,
fails randomly on our ODroid XU-4 board.  We have another board, a Firefly RK3288,
which fails the same way (and even more frequently).  I have the feeling that there's
a race somewhere in the kernel/cache/memory/something.

I isolated a minimal reproducer from the test case, that goes like this:

  double
  func3 ()
  {
    return -5.0;
  }

  double tmp3;

  int main ()
  {
    tmp3 = func3 ();
    return 0;
  }

Built with:

  $ arm-linux-gnueabihf-gcc -g3 -O0 return.c -o return

And here is the gdb script to run:

  file ~/return
  b func3
  run
  return 2.0
  n
  print tmp3
  quit tmp3 != 2

I simply run gdb like this:

  $ ./gdb -nx -batch -x run.gdb

The test runs to the beginning of func3, then issues the command
"return 2.0", which makes the function artificially return with the value 2.0.
It then does a "next" to complete the assignment to tmp3, and then prints the
value of tmp3.  Most of the time, we see the expected value, 2.0.  Once in a
while, we get 0.

When doing the return, GDB writes 2.0 to the d0 register, which is where
a return value of type "double" should be (it also writes other registers, including pc
and sp, to actually pop the stack frame).  I added debug traces to confirm that the
right value is written to d0 through ptrace by GDB (even in failure cases).  So when we
resume the thread (when doing the "next" command), it should have the right value in
its d0 register.  When doing the next, these are the exact instructions it executes (also
confirmed by infrun debug):

    83e4:  eeb0 7b40  vmov.f64  d7, d0
    83e8:  f241 0330  movw      r3, #4144	; 0x1030
    83ec:  f2c0 0301  movt      r3, #1
    83f0:  ed83 7b00  vstr      d7, [r3]

In other words, move d0 to d7 and then store it to tmp3's address (0x11030).  I
don't see anything that can go wrong with these instructions... if d0 contains
the right value at the time the thread is resumed, then tmp3 should contain the
right value at the end.  However, as I said earlier, we get the wrong value once
in a while.  So it sounds like somehow the value didn't make it in time to the d0
register when the thread was resumed, or GDB reads the value of tmp3 before
the effect of the vstr is visible...

Given that we give the right input to the kernel, even in the cases that
fail, I assume that the problem must be something like wrong cache invalidation
or memory barrier/sequencing.

I ran this test in a loop and got these results:

ODroid XU-4:
  263 fails
  737 successes

Firefly RK3288:
  336 fails
  163 successes

First, is anybody able to reproduce the problem on other boards?  Then, does anybody
have an idea what could cause this?

Thanks!

Simon