Regression for gdb.threads/fork-plus-threads.exp [Re: [PATCH 3/6] List inferiors/threads/pspaces in ascending order]

Tue Jan 12 11:22:00 GMT 2016

On 01/11/2016 02:39 PM, Pedro Alves wrote:
> On 01/08/2016 08:39 PM, Jan Kratochvil wrote:
>> On Thu, 22 Oct 2015 11:59:01 +0200, Pedro Alves wrote:
>>
>> 7e0aa6aa9983c745aedc203db0cc360a0ad47cac is the first bad commit
>> commit 7e0aa6aa9983c745aedc203db0cc360a0ad47cac
>> Author: Pedro Alves <palves@redhat.com>
>> Date:   Tue Nov 24 18:11:21 2015 +0000
>>     List inferiors/threads/pspaces in ascending order
>>
>> PASS->FAIL:
>> FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited
>> FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: no threads left (timeout)
>> FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left (the program exited)
>>
>> -PASS: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited
>> +warning: Error removing breakpoint 1^M
>> +Error in re-setting breakpoint 1: Warning:^M
>> +Cannot insert breakpoint 1.^M
>> +Cannot access memory at address 0x8048700^M
>> +^M
>> +Warning:^M
>> +Cannot insert breakpoint 1.^M
>> +Cannot access memory at address 0x8048700^M
>> +^M
>> +(gdb) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited
>>
>> I haven't tried to debug it.
>>
>> It happens on Fedora 23 x86_64 with -m32 for the testsuite.
> 
> I see it too on F20, and indeed only with -m32.  I don't know why yet.

This looks like it just exposed a preexisting problem.

When the second fork happens, we do a breakpoint reset, which wipes all locations and tries
to recreate locations for inferior 2.  Because that inferior is either running or has exited
and is now a zombie, prologue skipping fails to read memory, and thus a breakpoint that used
to be at line 64 (after the prologue) ends up re-set to line 60 (before the prologue).  We
then try to remove the old breakpoint at line 64 from inferior 2, which fails, because the
inferior is either running or a zombie.
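
To make the sequence concrete, here is a toy model of it.  This is only a sketch: the
class and function names are made up for illustration and are not GDB internals, and
"memory access" is reduced to a single flag.  Only the line numbers 60 and 64 come from
the test.

```python
# Toy model only: none of these names exist in GDB.
FUNC_LINE = 60   # where the breakpoint lands if the prologue cannot be analyzed
BODY_LINE = 64   # where it lands after successful prologue skipping


class Inferior:
    def __init__(self, num, readable):
        self.num = num
        self.readable = readable   # False while running, or once a zombie
        self.inserted = set()      # lines that currently have a breakpoint inserted

    def access_memory(self):
        if not self.readable:
            raise OSError("Cannot access memory")


def skip_prologue(inf):
    """Return the line a breakpoint on the function resolves to."""
    try:
        inf.access_memory()
    except OSError:
        return FUNC_LINE           # assumed fallback: stay before the prologue
    return BODY_LINE


def reset_breakpoint(inf):
    """Recompute the location, then try to remove any stale inserted location."""
    warnings = []
    new_line = skip_prologue(inf)
    for old_line in sorted(inf.inserted - {new_line}):
        try:
            inf.access_memory()    # removal has to touch memory as well
            inf.inserted.discard(old_line)
        except OSError:
            warnings.append(f"Error removing breakpoint at line {old_line}")
    return new_line, warnings


# Inferior 2 was stopped when the breakpoint was first inserted...
inf2 = Inferior(2, readable=True)
inf2.inserted.add(skip_prologue(inf2))
# ...but is running (or a zombie) by the time the second fork triggers a re-set.
inf2.readable = False
new_line, warnings = reset_breakpoint(inf2)
print(new_line, warnings)
```

In this model the re-set moves the location to line 60 and then fails to remove the old
one at line 64, which is the same shape as the failure in the log above.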

The reason we didn't see this before is that the code cache masked it.  Before, we
resumed the threads in the opposite order (newest first), and "luckily" the sequence
of events in this particular test was such that the code cache hadn't been flushed
yet when we got to prologue skipping.
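
The masking effect can be sketched as follows.  Again this is a toy model under
assumptions, not GDB's cache: `Target` and `CodeCache` are hypothetical names, and the
byte value is arbitrary.  The address is the one from the log above.

```python
# Toy model only: hypothetical names, not GDB's actual cache implementation.
ADDR = 0x8048700


class Target:
    def __init__(self):
        self.readable = True       # False once the inferior is running or a zombie

    def read(self, addr):
        if not self.readable:
            raise OSError(f"Cannot access memory at address {addr:#x}")
        return 0x55                # arbitrary byte standing in for code


class CodeCache:
    def __init__(self, target, enabled=True):
        self.target = target
        self.enabled = enabled
        self.lines = {}

    def read(self, addr):
        if self.enabled and addr in self.lines:
            return self.lines[addr]        # served without touching the target
        value = self.target.read(addr)
        if self.enabled:
            self.lines[addr] = value
        return value

    def flush(self):
        self.lines.clear()


def can_read(cache, addr):
    try:
        cache.read(addr)
        return True
    except OSError:
        return False


target = Target()
cache = CodeCache(target)
cache.read(ADDR)                   # populated while the inferior was stopped
target.readable = False            # inferior resumes

masked = can_read(cache, ADDR)     # stale cache contents hide the problem
cache.flush()
after_flush = can_read(cache, ADDR)

uncached = can_read(CodeCache(target, enabled=False), ADDR)
print(masked, after_flush, uncached)
```

Whether the read succeeds depends purely on whether the flush happens before or after
it, which is why changing the resume order was enough to expose the failure.  A disabled
cache behaves like one that is always flushed.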

"set code-cache off" makes the problem visible even before the patch.

This doesn't trigger with native-extended-gdbserver/-m32 because gdbserver can
read memory even when the inferior is running.

This is also yet another instance of breakpoint re-setting being too coarse [1]...
If inferior 1 forked inferior 3, why would we need to re-set breakpoint locations
of inferior 2?  I think we can avoid revamping breakpoint re-set completely, by
instead limiting re-sets to the program space that triggered it.
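
Roughly, the difference between the two approaches looks like this.  It is a sketch
only: `Location`, `reset_all` and `reset_one_pspace` are invented for illustration, each
program space is identified by its inferior number, and the new child is assumed to be
stopped (and hence readable) right after the fork.

```python
# Toy model only: hypothetical names, with one program space per inferior.
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    pspace: int
    line: int


def resolve(pspace, readable):
    # Line 64 if prologue skipping can read memory, line 60 otherwise.
    return 64 if readable[pspace] else 60


def reset_all(readable):
    """Coarse re-set: wipe everything and recreate it for every program space."""
    return [Location(p, resolve(p, readable)) for p in sorted(readable)]


def reset_one_pspace(locations, readable, pspace):
    """Scoped re-set: only touch locations of the program space that triggered it."""
    kept = [loc for loc in locations if loc.pspace != pspace]
    return kept + [Location(pspace, resolve(pspace, readable))]


# Inferior 1 has just forked inferior 3; inferior 2 is running, so unreadable.
readable = {1: True, 2: False, 3: True}
before = [Location(1, 64), Location(2, 64)]

coarse = reset_all(readable)
scoped = reset_one_pspace(before, readable, pspace=3)
print(coarse)
print(scoped)
```

With the coarse re-set, inferior 2's location is recomputed while its memory is
unreadable and moves to line 60.  With the scoped one it is never touched and stays at
line 64.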

[1] - https://sourceware.org/gdb/wiki/BreakpointReset

Thanks,
Pedro Alves