This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH] gdbserver-support: Handle gdbserver start failures
- From: Pedro Alves <palves at redhat dot com>
- To: "Maciej W. Rozycki" <macro at codesourcery dot com>
- Cc: gdb-patches at sourceware dot org
- Date: Thu, 04 Sep 2014 10:30:59 +0100
- References: <alpine dot DEB dot 1 dot 10 dot 1407282028050 dot 16254 at tp dot orcam dot me dot uk> <53D79950 dot 505 at redhat dot com> <alpine dot DEB dot 1 dot 10 dot 1408212053570 dot 2958 at tp dot orcam dot me dot uk>
Hi Maciej,
On 08/26/2014 06:14 PM, Maciej W. Rozycki wrote:
>>> Index: gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp
>>> ===================================================================
>>> --- gdb-fsf-trunk-quilt.orig/gdb/testsuite/lib/gdbserver-support.exp 2014-05-13 02:52:11.347706187 +0100
>>> +++ gdb-fsf-trunk-quilt/gdb/testsuite/lib/gdbserver-support.exp 2014-05-30 01:45:51.658977074 +0100
>>> @@ -275,6 +275,7 @@ proc gdbserver_start { options arguments
>>> # Wait for the server to open its TCP socket, so that GDB can connect.
>>> expect {
>>> -i $server_spawn_id
>>> + -timeout 120
>>> -notransfer
>>> -re "Listening on" { }
>>> -re "Can't bind address: Address already in use\\.\r\n" {
>>>
>>
>> OK.
>>
>> Wouldn't it be good to add a 'timeout {...}' case that emits a
>> warning or some such?
>
> Good point. I actually went further than that.
>
> As it happens we have a board that fails a gdb.base/gcore-relro.exp test
> case reproducibly and moreover the case appears to trigger a kernel bug
> making it less than usable. Specifically the board remains responsive
> to some extent, however processes do not appear to be able to successfully
> complete termination anymore and perhaps more importantly further
> gdbserver processes can be started, but they never reach the stage of
> listening on the RSP socket.
>
> This change handles timeouts in gdbserver start properly, by throwing a
> TCL error exception when gdbserver does not report listening on the RSP
> socket in time. This is then caught at the outer level and reported, and
> 2 rather than 1 is returned so that the caller may tell the failure to
> start gdbserver and other issues apart and act accordingly (or do
> nothing).
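
The control flow described in the paragraph above can be sketched in plain Tcl, as a rough illustration only and not the patch itself. The procedure names wait_for_listening and start_server_checked, the server_ok flag and the message text are hypothetical; the real code waits in an Expect block and reports through perror, which is replaced by puts here so the sketch runs under a bare tclsh.

```tcl
# Hypothetical stand-in for the step that waits for "Listening on".
# The real code uses an Expect block; a flag simulates the outcome here.
proc wait_for_listening { server_ok } {
    if { !$server_ok } {
        # Raise a Tcl error, as the description says happens on timeout.
        error "Timeout waiting for gdbserver response."
    }
    return "Listening on"
}

# Hypothetical outer wrapper: returns 0 on success and 2 when the server
# never reported listening, leaving 1 for other failures.
proc start_server_checked { server_ok } {
    if { [catch { wait_for_listening $server_ok } msg] } {
        # Report the error (perror in the real test suite) and map it
        # to the distinct return value.
        puts "ERROR: $msg"
        return 2
    }
    return 0
}

puts [start_server_checked 1]
puts [start_server_checked 0]
```

A caller that only checks for "non-zero" keeps working unchanged, while one that cares can compare the result against 2.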
>
> I thought letting the exception unwind further on might be a good idea
> for any test harnesses out there to break outright where a gdbserver start
> error is silently ignored right now, however I figured out the calls to
> gdbserver-support.exp are buried down too deep in the GDB test suite for
> such a change to be made easily. I think returning a distinct return
> value is good enough (the API says "non-zero", so 2 is as good as 1) and
> we can always make the error harder in a later step if required.
>
> With config/gdbserver.exp being used this change remains transparent to
> the target board, the return value is passed up by gdb_reload and the
> error exception unwinds through gdbserver_gdb_load and is caught and
> handled by mi_gdb_target_load. A call to perror is still made, reporting
> the timeout, and in the case of mi_gdb_target_load the procedure returns a
> value denoting unsuccessful completion. An unsuccessful completion of
> gdb_reload is already handled elsewhere.
>
> An alternative gdbserver board configuration can interpret the return
> value in its gdb_reload implementation and catch the error in
> gdbserver_gdb_load in an attempt to recover a target board that has gone
> astray, for example by rebooting the board somehow. This has proved
> effective with our failing board, that now completes the remaining test
> cases with no further hiccups.
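
A board file along the lines described above might retry once after recovering the board. The following is only a sketch under stated assumptions: generic_reload, reboot_board and board_reload are hypothetical names, the reload is a stub that fails on its first call, and the recovery hook merely counts invocations where a real board file would reboot or power-cycle the target.

```tcl
# Hypothetical stub for the generic reload: returns 2 on the first call
# (gdbserver failed to start) and 0 afterwards.
set attempts 0
proc generic_reload {} {
    global attempts
    incr attempts
    if { $attempts == 1 } {
        return 2
    }
    return 0
}

# Hypothetical recovery hook; here it only counts how often it ran.
set reboots 0
proc reboot_board {} {
    global reboots
    incr reboots
}

# Board-specific reload: on a gdbserver start failure (2), recover the
# board and try exactly once more; pass any other result through.
proc board_reload {} {
    set result [generic_reload]
    if { $result == 2 } {
        reboot_board
        set result [generic_reload]
    }
    return $result
}

puts [board_reload]
```

Retrying only once keeps a permanently dead board from stalling the run; the second failure is returned to the caller as usual.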
>
> I pushed it through regression testing with the powerpc-linux-gnu target
> and some half a dozen multilibs (including ones that trip over the
> faulty kernel on the problematic board) and there were no issues. The
> full list of the offending test cases is as follows:
>
> FAIL: gdb.base/gcore-relro.exp: save a corefile
> FAIL: gdb.base/print-symbol-loading.exp: save a corefile
> FAIL: gdb.threads/gcore-thread.exp: save a corefile
>
> so essentially the same stuff in different places.
>
> OK to apply?
OK.
Thanks,
Pedro Alves