Summary: | Incorrect evaluation of systemtap probes due to register being signed and probe expression assuming unsigned | ||
---|---|---|---|
Product: | gdb | Reporter: | Andrew Burgess <aburgess> |
Component: | breakpoints | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | brobecker, fche, fweimer, jakub, sergiodj |
Priority: | P2 | ||
Version: | HEAD | ||
Target Milestone: | 8.3.1 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Test case. |
After some additional investigation I was slightly wrong last night. The probe does claim that the argument is in the $di register, this is the system tap entry from readelf on /lib/libstdc++.so.6: stapsdt 0x00000028 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: throw Location: 0x00072c96, Base: 0x00133d64, Semaphore: 0x00000000 Arguments: 4@%si 4@%di However, I suspect the issue is that the probe information might be wrong. The second argument is described as an unsigned 32-bit value in register $di, however, register $di is only 16-bits, so GDB currently extracts its value and casts it up to uint32_t. If I look at the code around the trace point however, I see this: 00072c80 <__cxa_throw@@CXXABI_1.3>: 72c80: 57 push %edi 72c81: 56 push %esi 72c82: 53 push %ebx 72c83: 8b 74 24 10 mov 0x10(%esp),%esi 72c87: e8 74 bf ff ff call 6ec00 <__cxa_finalize@plt+0x980> 72c8c: 81 c3 74 e3 10 00 add $0x10e374,%ebx 72c92: 8b 7c 24 14 mov 0x14(%esp),%edi 72c96: 90 nop <----------------- PROBE IS HERE 72c97: e8 d4 a2 ff ff call 6cf70 <__cxa_get_globals@plt> 72c9c: 83 40 04 01 addl $0x1,0x4(%eax) 72ca0: 83 ec 04 sub $0x4,%esp 72ca3: ff 74 24 1c pushl 0x1c(%esp) 72ca7: 57 push %edi 72ca8: 56 push %esi 72ca9: e8 62 a3 ff ff call 6d010 <__cxa_init_primary_exception@plt> 72cae: 8d 70 40 lea 0x40(%eax),%esi 72cb1: c7 00 01 00 00 00 movl $0x1,(%eax) 72cb7: 89 34 24 mov %esi,(%esp) 72cba: e8 61 96 ff ff call 6c320 <_Unwind_RaiseException@plt> 72cbf: 89 34 24 mov %esi,(%esp) 72cc2: e8 c9 84 ff ff call 6b190 <__cxa_begin_catch@plt> 72cc7: e8 d4 b3 ff ff call 6e0a0 <_ZSt9terminatev@plt> 72ccc: 66 90 xchg %ax,%ax 72cce: 66 90 xchg %ax,%ax Notice that $di is never used, only the 32-bit $edi. I suspect that the issue is incorrect generation of the probe arguments. I'm taking a look at GCC to see if I can confirm this, but I'm posting this update in case anyone is interested. So this issue seems to be relevant: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80115 which links to this change in SystemTap: https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=commitdiff;h=272146660f54786bb61d388f6d3a4eb20e7d9369 Which is where SystemTap changed to emit the narrowed register names by adding the 'w' prefix. To see how GCC interprets the 'w' prefix, see: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/i386/i386.c;h=ab1524c3524846468185d01302d86f1ddbbc268c;hb=refs/heads/master#l12247 The implication seems to be that when SystemTap says the argument is this `4@%di` then we should interpret this to mean `((uint32_t) $edi)`. Thanks a lot for the throrough investigation, Andrew! I can confirm that replacing "di" by "edi" works, i.e., the inferior finishes (with return code 1). It's not clear to me whether we should special-case this scenario (doesn't seem like a very elegant solution); I'll look deeper into this and see if I can come up with a solution. Thanks. We should work out why the sys/sdt.h machinery marks the %si operand as though it was unsigned. It's not a property of the register, but of the gcc expression type at that point. (In reply to Frank Ch. Eigler from comment #4) > We should work out why the sys/sdt.h machinery marks the %si operand as > though it was unsigned. It's not a property of the register, but of the gcc > expression type at that point. Thanks for the reply, Frank. This is an important question, but the real problem here is the use of %di (or %si) instead of their extended versions (%edi and %esi), as far as I know. Even if we have "-4@%di" (which is still wrong; should be "-2@%di"), GDB still can't proceed and errors out. (In reply to Sergio Durigan Junior from comment #5) > (In reply to Frank Ch. Eigler from comment #4) > > We should work out why the sys/sdt.h machinery marks the %si operand as > > though it was unsigned. It's not a property of the register, but of the gcc > > expression type at that point. > > Thanks for the reply, Frank. > > This is an important question, but the real problem here is the use of %di > (or %si) instead of their extended versions (%edi and %esi), as far as I > know. Even if we have "-4@%di" (which is still wrong; should be "-2@%di"), > GDB still can't proceed and errors out. In this case changing to "4@edi" would be what GDB needs, the particular argument is a 32-bit address. I'm trying to be careful not to blame either SystemTap or GDB here, it's not clear which needs to change. What isn't clear to me from reading the original GCC issue, is why SystemTap changed to force the use of the 16-bit register name instead of the name that matches the size of the operand. The "%w[id]" syntax is meant to do nothing but permit gcc to offer a more specific (subregister) name for values that fit. If the operand is 32 bits long, it seems to me that gcc shouldn't be describing it as %di. No, GCC does the documented thing. %w[id] says take the argument of id and print it as if it was 16-bit. It is like all the other size modifiers and is heavily used that way by both GCC itself and various projects in inline-asm. If you want to auto-detect the size of the operand rather than override it, don't use any size modifiers. Note, in http://PR80115 you said systemtap doesn't care how the register is spelled and that the width is encoded differently, which is why I've suggested that w modifier, it could have been k modifier too and then everything would be printed as %edi etc.; the reason for w was that it is shorter. (In reply to Jakub Jelinek from comment #8) > No, GCC does the documented thing. > [...] If you want to auto-detect > the size of the operand rather than override it, don't use any size > modifiers. OK, in that case it seems like we were wrong to take the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80115 patch. We don't want gcc to give information known not to match the actual data. I wonder why we opted for this instead of # define STAP_SDT_ARG_CONSTRAINT norq or such, so that the basic constraint includes the 1-byte subregister type. Would we need another code for 2-byte subregisters, if those are also excluded from "r" class? (In reply to Frank Ch. Eigler from comment #10) > (In reply to Jakub Jelinek from comment #8) > > No, GCC does the documented thing. > > [...] If you want to auto-detect > > the size of the operand rather than override it, don't use any size > > modifiers. > > OK, in that case it seems like we were wrong to take the > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80115 patch. We don't want gcc > to give information known not to match the actual data. I wonder why we > opted for this instead of > > # define STAP_SDT_ARG_CONSTRAINT norq Not really sure what you want to achieve with that. That means the argument can be either integer, or offsettable memory, or any GPR register, or any GPR register ok for 8-bit values, the union of the last two sets is the third set, so norq is the same thing as nor. If you don't have the constraint strings dependent on the size of the argument (not really sure how you could do it except for requiring to use a different macro for 8-bit arguments, perhaps with some compile time verification), then you either use nor and then need to use some address override (w and all registers will be printed as %ax, %bx, %cx, %dx, %si, %di, %bp, %sp or k and all registers will be printed as %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp, regardless of the size), or use noq on ia32 and then that will penalize code quite a lot, because will force all those arguments into just %al, %bl, %cl or %dl. From what I see, the size is encoded in the string too, so for 8-bit arguments you have there 1%ax, for 16-bit arguments 2%ax, for 32-bit arguments 4%ax and for 64-bit arguments 8%ax, all that can be interpreted by stap as %al, %ax, %eax and %eax + (%edx<<32) respectively. So, I believe you have quite compact representation that doesn't omit any details and is easy to interpret. OK, trying to figure out where the mismatches are. One one hand, one may interpret the register names as -a- name for the container that holds the value, and use the operand-size value to identify the N-bit bitstring payload within it. In this case, the decoder must settle which names are deemed equivalent. (Note that there is not a need to shorten register name strings, so this is not worth doing on that account only.) On another hand, one may interpret the register names as a complete site for the value in question, so %al would refer to an at-most-8-bit value in AL, definitely not a 64-bit value in %rax. In this case, the names don't have to be reprocessed by a smart decoder. Register names would carry an implicit maximum size. So on hand, 4@%si would mean the 4-byte value %esi, with the decoder put in charge of figuring this pairing out. On the other hand, 4@%si would be an error because %si is only 2 bytes wide in the first place. To my knowledge, we have not specified which of these interpretations we intend. The sys/sdt.h patch from gcc bug# 80115 enshrined the first interpretation, but we did not fully grok/document this. Does anyone have concerns about moving to case 2 - which in the i386 case would mean using _SDT_ARGTMPL(id) = %k[id], so gcc would emit 4@%edi etc. for this case? (In reply to Frank Ch. Eigler from comment #12) > Does anyone have concerns about moving to case 2 - which in the i386 case > would mean using _SDT_ARGTMPL(id) = %k[id], so gcc would emit 4@%edi etc. > for this case? Then the decoder still needs to know what does it mean when 8@%eax appears (how to find what is the next register and take endianity into account etc.). But true, for 8/16/32 bit sizes it could just take the named register, take the lowest N*8 bits out of that and use that. Transforming %di to %edi isn't that hard though and will mean you don't have to rebuild all the software which has the notes already created. Good point about 8@eax ... assuming that's how long-longs would show up. At least if we could state the policy that if a stated width larger than the register's own width, then this is a signal that a decoder must be intelligent and match gcc's notion of register-pairings etc. This assumes that this is a fixed / context-free / stateless algorithm that we could document as a table over time. In the mean time, we could switch to the %k[] variant to make this case less likely to trigger, requiring that documentation & logic. (In reply to Frank Ch. Eigler from comment #14) > Good point about 8@eax ... assuming that's how long-longs would show up. At > least if we could state the policy that if a stated width larger than the > register's own width, then this is a signal that a decoder must be > intelligent and match gcc's notion of register-pairings etc. This assumes > that this is a fixed / context-free / stateless algorithm that we could > document as a table over time. > > In the mean time, we could switch to the %k[] variant to make this case less > likely to trigger, requiring that documentation & logic. Thanks for taking a look into this, Frank and Jakub. If I am understanding correctly, the proposal here is to make both modifications: use %k[] on sdt.h, and also make the decoder interpret when the informed size is actually larger than the actual register size. I am more than fine with using %k[], of course, if it means that we will have the right pair of size-string@register there. As for the second part, I think I can accomodate and expand the current parser to take that into account, even though my first impression is that the parser shouldn't need to have this knowledge built into it. > If I am understanding correctly, the proposal here is to make both > modifications: use %k[] on sdt.h, and also make the decoder interpret when > the informed size is actually larger than the actual register size. I wish we didn't have to deal with the latter, but we may not have a choice on the latter, for larger-than-any-register types. > I am more than fine with using %k[], of course, if it means that we will > have the right pair of size-string@register there. Just to confirm, there is no sign/size problem in the sdt.h machinery? > As for the second part, I think I can accomodate and expand the current > parser to take that into account, even though my first impression is that > the parser shouldn't need to have this knowledge built into it. Indeed. And there are multiple projects that consume sdt.h macro notes (gdb, stap, perf, several other tracing tools), so this code would to be replicated. We can start assembling a table in one of the wikis, or the header file itself. (In reply to Frank Ch. Eigler from comment #16) > > I am more than fine with using %k[], of course, if it means that we will > > have the right pair of size-string@register there. > > Just to confirm, there is no sign/size problem in the sdt.h machinery? Hm, not sure I understood the question. The current arguments do have sign and size problems; "4@%di" should actually be "-2@di". But maybe the question wasn't for me; sorry if that's the case. > > As for the second part, I think I can accomodate and expand the current > > parser to take that into account, even though my first impression is that > > the parser shouldn't need to have this knowledge built into it. > > Indeed. And there are multiple projects that consume sdt.h macro notes > (gdb, stap, perf, several other tracing tools), so this code would to be > replicated. We can start assembling a table in one of the wikis, or the > header file itself. Sounds like a good idea. > > Just to confirm, there is no sign/size problem in the sdt.h machinery?
>
> Hm, not sure I understood the question.
>
> The current arguments do have sign and size problems; "4@%di" should
> actually be "-2@di". But maybe the question wasn't for me; sorry if that's
> the case.
That's almost exactly the question.
Here, by "%di", gcc meant "%edi". So the question is whether the 4@ part is correct, if we understand that it's referring to the entire hypothetical register. And I believe it is, being a 32-bit pointer.
(In reply to Frank Ch. Eigler from comment #18) > > > Just to confirm, there is no sign/size problem in the sdt.h machinery? > > > > Hm, not sure I understood the question. > > > > The current arguments do have sign and size problems; "4@%di" should > > actually be "-2@di". But maybe the question wasn't for me; sorry if that's > > the case. > > That's almost exactly the question. > Here, by "%di", gcc meant "%edi". So the question is whether the 4@ part is > correct, if we understand that it's referring to the entire hypothetical > register. And I believe it is, being a 32-bit pointer. Thanks for clarifying. According to GDB: (gdb) ptype $edi type = int32_t So it should be "-4@%edi", to account for the signedness. > (gdb) ptype $edi
> type = int32_t
>
> So it should be "-4@%edi", to account for the signedness.
But the signedness is that of the value (the pointer), not that of the register holding the value (which in this case is an invention of gdb anyway, and not something the CPU considers).
We're spinning a new stap rawhide build which includes the %k[] modification. https://koji.fedoraproject.org/koji/buildinfo?buildID=1267378 would appreciate a check that, at least for this test, it's good enough (In reply to Frank Ch. Eigler from comment #22) > https://koji.fedoraproject.org/koji/buildinfo?buildID=1267378 > > would appreciate a check that, at least for this test, it's good enough I built a scratch GCC package on Fedora's Koji, which used the new systemtap package (systemtap-sdt-devel 4.2-0.20190513git8b868f3dd030.fc31), and then I installed the libstdc++.i686 RPM on my system: https://koji.fedoraproject.org/koji/taskinfo?taskID=34985489 Then, I ran the test again, and unfortunately I still see the same failure. The new /usr/lib/libstdc++.so.6.0.26 still refers to %di: $ readelf -n /usr/lib/libstdc++.so.6.0.26 [...] Displaying notes found in: .note.stapsdt Owner Data size Description stapsdt 0x0000002e NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: catch Location: 0x0008abf3, Base: 0x0019648b, Semaphore: 0x00000000 Arguments: 4@%dx 4@-48(%esi) stapsdt 0x00000028 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: throw Location: 0x0008c13a, Base: 0x0019648b, Semaphore: 0x00000000 Arguments: 4@%si 4@%di stapsdt 0x0000002a NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: rethrow Location: 0x0008c1fd, Base: 0x0019648b, Semaphore: 0x00000000 Arguments: 4@%bx 4@%ax The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=93cb9841d68263174a600dc70af742a8e2eabfc6 commit 93cb9841d68263174a600dc70af742a8e2eabfc6 Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Sun Jun 16 16:17:59 2019 +0100 gdb/testsuite: Improve detection of bug gdb/24541 In bug gdb/24686 a testsuite failure was reported, this failure was actually just another instance of bug gdb/24541, however, due to the non-deterministic nature of bug gdb/24541 the testsuite pattern that was intended to catch this bug failed. This commit adds a second pattern to help detect gdb/24541, which should change the FAIL reported in gdb/24686 into a KFAIL. gdb/testsuite/ChangeLog: PR gdb/24686 * gdb.mi/mi-catch-cpp-exceptions.exp: Add an extra pattern to improve detection of bug gdb/24541. Thanks, Andrew. I haven't forgotten about this bug. If I have the time, I'll work on it this next week. The master branch has been updated by Sergio Durigan Junior <sergiodj@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7d7571f0c14b4673ca95f6dc31d6f07d429e6697 commit 7d7571f0c14b4673ca95f6dc31d6f07d429e6697 Author: Sergio Durigan Junior <sergiodj@redhat.com> Date: Wed Jun 26 17:34:50 2019 -0400 Adjust i386 registers on SystemTap probes' arguments (PR breakpoints/24541) This bug has been reported on PR breakpoints/24541, but it is possible to reproduce it easily by running: make check-gdb TESTS=gdb.base/stap-probe.exp RUNTESTFLAGS='--target_board unix/-m32' The underlying cause is kind of complex, and involves decisions made by GCC and the sys/sdt.h header file about how to represent a probe argument that lives in a register in 32-bit programs. I'll use Andrew's example on the bug to illustrate the problem. libstdc++ has a probe named "throw" with two arguments. On i386, the probe is: stapsdt 0x00000028 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: throw Location: 0x00072c96, Base: 0x00133d64, Semaphore: 0x00000000 Arguments: 4@%si 4@%di I.e., the first argument is an unsigned 32-bit value (represented by the "4@") that lives on %si, and the second argument is an unsigned 32-bit value that lives on %di. Note the discrepancy between the argument size reported by the probe (32-bit) and the register size being used to store the value (16-bit). However, if you take a look at the disassemble of a program that uses this probe, you will see: 00072c80 <__cxa_throw@@CXXABI_1.3>: 72c80: 57 push %edi 72c81: 56 push %esi 72c82: 53 push %ebx 72c83: 8b 74 24 10 mov 0x10(%esp),%esi 72c87: e8 74 bf ff ff call 6ec00 <__cxa_finalize@plt+0x980> 72c8c: 81 c3 74 e3 10 00 add $0x10e374,%ebx 72c92: 8b 7c 24 14 mov 0x14(%esp),%edi 72c96: 90 nop <----------------- PROBE IS HERE 72c97: e8 d4 a2 ff ff call 6cf70 <__cxa_get_globals@plt> 72c9c: 83 40 04 01 addl $0x1,0x4(%eax) 72ca0: 83 ec 04 sub $0x4,%esp 72ca3: ff 74 24 1c pushl 0x1c(%esp) 72ca7: 57 push %edi 72ca8: 56 push %esi 72ca9: e8 62 a3 ff ff call 6d010 <__cxa_init_primary_exception@plt> 72cae: 8d 70 40 lea 0x40(%eax),%esi 72cb1: c7 00 01 00 00 00 movl $0x1,(%eax) 72cb7: 89 34 24 mov %esi,(%esp) 72cba: e8 61 96 ff ff call 6c320 <_Unwind_RaiseException@plt> 72cbf: 89 34 24 mov %esi,(%esp) 72cc2: e8 c9 84 ff ff call 6b190 <__cxa_begin_catch@plt> 72cc7: e8 d4 b3 ff ff call 6e0a0 <_ZSt9terminatev@plt> 72ccc: 66 90 xchg %ax,%ax 72cce: 66 90 xchg %ax,%ax Note how the program is actually using %edi, and not %di, to store the second argument. This is the problem here. GDB will basically read the probe argument, then read the contents of %di, and then cast this value to uint32_t, which causes the wrong value to be obtained. In the gdb.base/stap-probe.exp case, this makes GDB read the wrong memory location, and not be able to display a test string. In Andrew's example, this causes GDB to actually stop at a "catch throw" when it should actually have *not* stopped. After some discussion with Frank Eigler and Jakub Jelinek, it was decided that this bug should be fixed on the client side (i.e., the program that actually reads the probes), and this is why I'm proposing this patch. The idea is simple: we will have a gdbarch method, which, for now, is only used by i386. The generic code that deals with register operands on gdb/stap-probe.c will call this method if it exists, passing the current parse information, the register name and its number. The i386 method will then verify if the register size is greater or equal than the size reported by the stap probe (the "4@" part). If it is, we're fine. Otherwise, it will check if we're dealing with any of the "extendable" registers (like ax, bx, si, di, sp, etc.). If we are, it will change the register name to include the "e" prefix. I have tested the patch here in many scenarios, and it fixes Andrew's bug and also the regressions I mentioned before, on gdb.base/stap-probe.exp. No regressions where found on other tests. Comments? gdb/ChangeLog: 2019-06-27 Sergio Durigan Junior <sergiodj@redhat.com> PR breakpoints/24541 * gdbarch.c: Regenerate. * gdbarch.h: Regenerate. * gdbarch.sh: Add 'stap_adjust_register'. * i386-tdep.c: Include '<unordered_set>'. (i386_stap_adjust_register): New function. (i386_elf_init_abi): Register 'i386_stap_adjust_register'. * stap-probe.c (stap_parse_register_operand): Call 'gdbarch_stap_adjust_register'. Fixed. The master branch has been updated by Andrew Burgess <aburgess@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6b78c3f83c8bcbfa714aab7627ece9673b2d602a commit 6b78c3f83c8bcbfa714aab7627ece9673b2d602a Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Tue Jul 2 12:06:06 2019 +0100 gdb: Remove a non-const reference parameter Non-const reference parameter should be avoided according to the GDB coding standard: https://sourceware.org/gdb/wiki/Internals%20GDB-C-Coding-Standards#Avoid_non-const_reference_parameters.2C_use_pointers_instead This commit updates the gdbarch method gdbarch_stap_adjust_register, and the one implementation i386_stap_adjust_register to avoid using a non-const reference parameter. I've also removed the kfail from the testsuite for bug 24541, as this issue is now resolved. gdb/ChangeLog: PR breakpoints/24541 * gdbarch.c: Regenerate. * gdbarch.h: Regenerate. * gdbarch.sh: Adjust return type and parameter types for 'stap_adjust_register'. (i386_stap_adjust_register): Adjust signature and return new register name. * stap-probe.c (stap_parse_register_operand): Adjust use of 'gdbarch_stap_adjust_register'. gdb/testsuite/ChangeLog: PR breakpoints/24541 * gdb.mi/mi-catch-cpp-exceptions.exp: Remove kfail due to 24541. The gdb-8.3-branch branch has been updated by Tom de Vries <vries@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=3b0e664e2fa479d66e7b39ba28739f6d6315f27c commit 3b0e664e2fa479d66e7b39ba28739f6d6315f27c Author: Sergio Durigan Junior <sergiodj@redhat.com> Date: Wed Sep 11 22:03:40 2019 +0200 Adjust i386 registers on SystemTap probes' arguments (PR breakpoints/24541) [ Backport of master commit 7d7571f0c1. ] This bug has been reported on PR breakpoints/24541, but it is possible to reproduce it easily by running: make check-gdb TESTS=gdb.base/stap-probe.exp RUNTESTFLAGS='--target_board unix/-m32' The underlying cause is kind of complex, and involves decisions made by GCC and the sys/sdt.h header file about how to represent a probe argument that lives in a register in 32-bit programs. I'll use Andrew's example on the bug to illustrate the problem. libstdc++ has a probe named "throw" with two arguments. On i386, the probe is: stapsdt 0x00000028 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx Name: throw Location: 0x00072c96, Base: 0x00133d64, Semaphore: 0x00000000 Arguments: 4@%si 4@%di I.e., the first argument is an unsigned 32-bit value (represented by the "4@") that lives on %si, and the second argument is an unsigned 32-bit value that lives on %di. Note the discrepancy between the argument size reported by the probe (32-bit) and the register size being used to store the value (16-bit). However, if you take a look at the disassemble of a program that uses this probe, you will see: 00072c80 <__cxa_throw@@CXXABI_1.3>: 72c80: 57 push %edi 72c81: 56 push %esi 72c82: 53 push %ebx 72c83: 8b 74 24 10 mov 0x10(%esp),%esi 72c87: e8 74 bf ff ff call 6ec00 <__cxa_finalize@plt+0x980> 72c8c: 81 c3 74 e3 10 00 add $0x10e374,%ebx 72c92: 8b 7c 24 14 mov 0x14(%esp),%edi 72c96: 90 nop <----------------- PROBE IS HERE 72c97: e8 d4 a2 ff ff call 6cf70 <__cxa_get_globals@plt> 72c9c: 83 40 04 01 addl $0x1,0x4(%eax) 72ca0: 83 ec 04 sub $0x4,%esp 72ca3: ff 74 24 1c pushl 0x1c(%esp) 72ca7: 57 push %edi 72ca8: 56 push %esi 72ca9: e8 62 a3 ff ff call 6d010 <__cxa_init_primary_exception@plt> 72cae: 8d 70 40 lea 0x40(%eax),%esi 72cb1: c7 00 01 00 00 00 movl $0x1,(%eax) 72cb7: 89 34 24 mov %esi,(%esp) 72cba: e8 61 96 ff ff call 6c320 <_Unwind_RaiseException@plt> 72cbf: 89 34 24 mov %esi,(%esp) 72cc2: e8 c9 84 ff ff call 6b190 <__cxa_begin_catch@plt> 72cc7: e8 d4 b3 ff ff call 6e0a0 <_ZSt9terminatev@plt> 72ccc: 66 90 xchg %ax,%ax 72cce: 66 90 xchg %ax,%ax Note how the program is actually using %edi, and not %di, to store the second argument. This is the problem here. GDB will basically read the probe argument, then read the contents of %di, and then cast this value to uint32_t, which causes the wrong value to be obtained. In the gdb.base/stap-probe.exp case, this makes GDB read the wrong memory location, and not be able to display a test string. In Andrew's example, this causes GDB to actually stop at a "catch throw" when it should actually have *not* stopped. After some discussion with Frank Eigler and Jakub Jelinek, it was decided that this bug should be fixed on the client side (i.e., the program that actually reads the probes), and this is why I'm proposing this patch. The idea is simple: we will have a gdbarch method, which, for now, is only used by i386. The generic code that deals with register operands on gdb/stap-probe.c will call this method if it exists, passing the current parse information, the register name and its number. The i386 method will then verify if the register size is greater or equal than the size reported by the stap probe (the "4@" part). If it is, we're fine. Otherwise, it will check if we're dealing with any of the "extendable" registers (like ax, bx, si, di, sp, etc.). If we are, it will change the register name to include the "e" prefix. I have tested the patch here in many scenarios, and it fixes Andrew's bug and also the regressions I mentioned before, on gdb.base/stap-probe.exp. No regressions where found on other tests. Comments? gdb/ChangeLog: 2019-06-27 Sergio Durigan Junior <sergiodj@redhat.com> PR breakpoints/24541 * gdbarch.c: Regenerate. * gdbarch.h: Regenerate. * gdbarch.sh: Add 'stap_adjust_register'. * i386-tdep.c: Include '<unordered_set>'. (i386_stap_adjust_register): New function. (i386_elf_init_abi): Register 'i386_stap_adjust_register'. * stap-probe.c (stap_parse_register_operand): Call 'gdbarch_stap_adjust_register'. Tentatively setting the milestone to 8.3.1 in preparation for that release. I say "tentatively" because not all patches have been backported, but IIUC, the one that really matters has? The gdb-8.3-branch branch has been updated by Sergio Durigan Junior <sergiodj@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=6f4f8f476a4e41cc7117a8e85087963c0ac3e95b commit 6f4f8f476a4e41cc7117a8e85087963c0ac3e95b Author: Sergio Durigan Junior <sergiodj@redhat.com> Date: Tue Sep 17 00:01:13 2019 -0400 Update ChangeLog entry of commit e9224f6d80ea35e90a61d44575f12b26742eaaf3 and mention PR breakpoints/24541 The master branch has been updated by Joel Brobecker <brobecke@sourceware.org>: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=a1726c38a9e8ac07b3fd6bbfac04da27c32c0d70 commit a1726c38a9e8ac07b3fd6bbfac04da27c32c0d70 Author: Joel Brobecker <brobecker@adacore.com> Date: Wed Sep 18 14:42:24 2019 -0700 Update ChangeLog entry of commit 677052f2a5c67f1d9b2e6d1b2a2149b5f0c20cd0... ... to include PR breakpoints/24541 (for documentation purposes). For the reader's convenience, the commit in question was the following: commit 677052f2a5c67f1d9b2e6d1b2a2149b5f0c20cd0 Author: Sergio Durigan Junior <sergiodj@redhat.com> Date: Thu May 16 16:23:24 2019 -0400 Subject: Make stap-probe.c:stap_parse_register_operand's "regname" an std::string |
Created attachment 11768 [details] Test case. Grab the attached tar file, then: $ tar -xf probe-bug.tar.xz cd probe-bug make make check The included test program does throw and catch some exceptions, and the gdb script sets up a possible catch with 'catch throw blahblah'. However, 'blahblah' doesn't match any of the exception types we throw or catch, so the expectation is that GDB will not stop at any of the throws, and instead the program will run to completion. What I actually observe is the program stops at the 'throw' and GDB prints this error: could not find minimal symbol for typeinfo address 0xffff8804 Unfortunately this bug relies on the specific address that happens to be in a register, if the address doesn't have the "correct" bit set then the bug will not trigger. What has happened is that GDB stops at the SystemTap probe point for "throw", it then tries to extract the arguments for this probe point. For the particular problem argument the systemtap expression as understood by gdb is: 0 UNOP_CAST Type @0x55ade9b8e430 (uint32_t) 3 OP_REGISTER Register $di The value of $di register when we stop is this: (gdb) p/x $edi $1 = 0x8048804 (gdb) p/x $di $2 = 0x8804 And, critically, the type of $di is: (gdb) ptype $di type = int16_t What's happening then is when GDB evaluates the SystemTap expression it is doing this: (gdb) p/x (unsigned int) $di $3 = 0xffff8804 Which gives us the '0xffff8804' problem value we see in the error. However what SystemTap is expecting is something more like this: (gdb) p/x (unsigned int) ((unsigned short) $di) $4 = 0x8804 Or to put it another way, SystemTap is assuming the register is unsigned. It's not clear to me if the problem is (a) SystemTaps expressions are wrong, and should not assume the register is signed / unsigned, (b) GDB's register type is wrong, and we should change the registers to be unsigned, or (c) GDB's expression evaluation is wrong, and we should somehow be special casing SystemTap expressions to force registers unsigned in this case... One final note, this bug is present on older and more recent versions of GDB (current HEAD 9-May-2019) however, in current HEAD a completely different bug will hit you first. When GDB tries to print the error "could not find minimal symbol for typeinfo address 0xffff8804" a terminal ownership bug will cause GDB to receive SIGSTOP and go into the background. I've included a GDB patch to work around this issue inside the probe-bug.tar.xz tar file (see probe-bug/sigstop-bug.patch).