This is the mail archive of the
gdb@sourceware.org
mailing list for the GDB project.
Re: Debugging return.exp on ARM
- From: Simon Marchi <simon dot marchi at ericsson dot com>
- To: Pedro Alves <palves at redhat dot com>, <gdb at sourceware dot org>
- Cc: Yao Qi <qiyaoltc at gmail dot com>
- Date: Fri, 27 May 2016 09:35:37 -0400
- Subject: Re: Debugging return.exp on ARM
- References: <574712FC dot 5090409 at ericsson dot com> <a8700f42-7139-bf7e-7ad9-48a7e47d5863 at redhat dot com>
On 16-05-26 03:11 PM, Pedro Alves wrote:
Thanks for the suggestions.
> - I'd suspect something odd with caches / barriers too.
> Did you try sprinkling in memory barrier instructions, and
> see whether it makes a difference?
I tried putting dmb instructions pretty much everywhere; it didn't help.
> - I'd also try "si" + "info regs" instead of "next" after the return,
> and see if a register with a bad value pops up always at some
> specific instruction.
Good point.
If I replace next with si, only the vmov.f64 d7, d0 gets executed. So if everything
goes well, I should have the "right" value in both d0 and d7. I made a more
focused reproducer, see below.
> - I'd try to see if pinning the thread to a core makes a difference.
Indeed, pinning GDB to a single CPU makes it work (as in, the result is right) every time.
As far as I can tell, pinning the inferior has no effect (I am not sure it worked, but I
used "set exec-wrapper taskset 0xffffffff" to reset the affinity).
> - Might help to show the kernel version.
ODroid: Linux odroid 3.10.96+ #5 SMP PREEMPT Thu May 26 15:03:58 EDT 2016 armv7l armv7l armv7l GNU/Linux
Firefly: Linux firefly 3.10.0 #40 SMP PREEMPT Tue Jan 27 16:12:04 CST 2015 armv7l armv7l armv7l GNU/Linux
I also reproduced it on my Rasp Pi 2, which has:
Linux alarmpi 4.4.8-2-ARCH #1 SMP Tue Apr 26 19:14:58 MDT 2016 armv7l GNU/Linux
So here's another case that reproduces the problem, but without a memory read, so
it isolates the problem a bit more. It verifies whether the thread sees our register
write or not.
test.S:
.global _start
_start:
    vldr.64 d0, constante
    vldr.64 d1, constante
break_here:
    vcmp.f64 d0, d1
    vmrs APSR_nzcv, fpscr

# Exit code
    moveq r0, #1
    movne r0, #0

# Exit syscall
    mov r7, #1
    svc 0

.align 8
constante:
    .word 0xc8b43958
    .word 0x40594676
Built with:
$ gcc -g3 -O0 -o test test.S -nostdlib
And the gdb script test.gdb:
file test
b break_here
run
p $d0 = 4.0
c
The test is run with:
$ ./gdb -nx -x test.gdb -batch
The test loads the same constant in d0 and d1. It then does a comparison between
them and exits with 1 (failure) if they are the same, 0 (success) if they are different.
The GDB script breaks at "break_here", tries to change the value of d0 to some other
constant (4.0) and lets the program continue and exit. If our register write succeeded,
the program should exit with 0 (values are different). If our register write failed, the
program will exit with 1 (values are still the same).
The result is that I randomly see both cases, hinting that the race is really between the
register write through ptrace and the kernel restoring the thread's vfp registers. Again,
pinning GDB to a single core seems to hide/bypass the bug.
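
Since the failure is random, a single run says little. Here is a minimal sketch
of how the test could be repeated and tallied, with or without pinning GDB to one
CPU. The function names (classify_line, tally_runs) are made up for illustration,
and the sketch assumes GDB reports the inferior's exit as "exited normally" for
code 0 and "exited with code 01" for code 1.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the matched strings assume
# GDB's usual "[Inferior 1 (process N) exited ...]" messages.

# Map one line of GDB output to an outcome.
#   exit code 0 -> d0 != d1 -> the thread saw our register write
#   exit code 1 -> d0 == d1 -> the register write was lost
classify_line() {
    case "$1" in
        *"exited normally"*)     echo seen ;;
        *"exited with code 01"*) echo lost ;;
        *)                       echo other ;;
    esac
}

# Run the test $1 times. If $2 is given, pin GDB to that CPU with taskset.
tally_runs() {
    runs=$1
    cpu=${2:-}
    seen=0
    lost=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if [ -n "$cpu" ]; then
            out=$(taskset -c "$cpu" ./gdb -nx -x test.gdb -batch 2>&1)
        else
            out=$(./gdb -nx -x test.gdb -batch 2>&1)
        fi
        # Keep only the line reporting how the inferior exited.
        line=$(printf '%s\n' "$out" | grep 'exited' | tail -n 1)
        case "$(classify_line "$line")" in
            seen) seen=$((seen + 1)) ;;
            lost) lost=$((lost + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "seen=$seen lost=$lost"
}
```

After sourcing this, "tally_runs 100" would give the unpinned count and
"tally_runs 100 0" the count with GDB pinned to CPU 0. Note that the inferior
inherits GDB's affinity unless it is reset, for example with the exec-wrapper
trick mentioned above.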
Simon