[PATCH 4/7] btrace: Add support for interrupt events.

Thu Sep 12 06:23:44 GMT 2024

Hello Felix,

>> >+	case BTRACE_INSN_AUX:
>> >+	  /* An aux insn couldn't have switched the function.  But the
>> >+	     segment might not have had a symbol name resolved yet, as events
>> >+	     might not have an IP.  Use the current IP in that case and update
>> >+	     the name.  */
>> >+	  if (bfun->sym == nullptr && bfun->msym == nullptr)
>> >+	    {
>> >+	      bfun->sym = fun;
>> >+	      bfun->msym = mfun;
>> >+	    }
>> >+	  return bfun;
>>
>> We shouldn't normally switch functions just because we turn on event
>tracing.
>> Since an event may or may not provide an IP, however, there is the
>possibility
>> that two adjacent events belong to different functions.
>
>I don't really understand this scenario. Are you talking about:
>Last instruction Function 1 - event (function 1) - event (function 2) - first
>instruction function 2

Yes.

>How can that actually happen?
>
>Aren't events always bound to the instruction they arrive at (if they have no
>IP)?
>Can events really arrive between two instructions? And even then, isn't the
>event
>then automatically bound to the previous instruction, not the future
>instruction?
>The future instruction didn't start executing yet.

I don't think it can happen today without filtering or a bug somewhere.

The libipt interface allows this, however, and with remote debugging and a
custom gdbserver that does support filtering, we could run into this.

I believe it is easy enough to handle in gdb so I would support it for additional
robustness, even if our standard gdbserver implementation does not support
filtering.

>> It is OK to update the function symbol in case a previous event did not
>provide
>> an IP, but I believe we should also handle the case where we do switch
>functions.
>
>I don't see how we could do that without an IP. Even if GDB would look
>forward
>in the recording, how could it choose which function to put the second
>event in without an IP? It could just as well have been two events ending the
>first function.

Without an IP, we just add the AUX to the current BFUN.  All I'm asking is
if we do have an IP and it belongs to a different function, that we start a new
BFUN.

And for determining whether there is a preceding gap, we not only look at
the last BFUN but iterate backwards until we either find a non-AUX or a gap.

>> > static int
>> >@@ -1236,6 +1280,7 @@ handle_pt_insn_events (struct
>btrace_thread_info
>> >*btinfo,
>> >     {
>> >       struct pt_event event;
>> >       uint64_t offset;
>> >+      CORE_ADDR pc = 0;
>>
>> Wasn't there some feedback on the ptwrite series to go with local
>declarations?
>> IIRC Simon asked for that.
>
>I guess you are talking about this discussion?
>https://sourceware.org/pipermail/gdb-patches/2024-August/211183.html
>I think it was mostly about the unnecessary passing of bfun.
>
>I just felt like having a separate declaration for all of these cases is making
>the code a bit bloated (11 times the same line instead of 1).
>And we already have offset as an example of a variable that is not local
>and not used by all cases. Now that I read the code again, we could even
>remove offset and have the two cases that use it use pc instead.
>Then every single cases would use the variable and we would only use
>one instead of two. What do you think?

I would rather not mix offset and pc.  Those are different things and we
should use appropriate names.

Personally, I'm fine with a single declaration.

>> >@@ -1251,8 +1296,16 @@ handle_pt_insn_events (struct
>btrace_thread_info
>> >*btinfo,
>> > 	    if (event.status_update != 0)
>> > 	      break;
>> >
>> >+	    /* Only create a new gap if the last function segment contains
>> >+	       non-aux instructions.  We could be at the beginning of the
>> >+	       recording and could already have handled one or more events,
>> >+	       like ptev_iret, that created aux insns.  In that case we don't
>> >+	       want to create a gap or print a warning.  Note that if the last
>> >+	       function segment contains only aux insns, we are guaranteed
>> >+	       to be at the beginning of a recording or after a gap.  See
>> >+	       handle_pt_aux_insn ().  */
>>
>> If we allow switching functions based on event IPs, the last part of the
>comment
>> no longer holds. It may no longer be sufficient to look at the last function
>> segment.
>
>Note that we are in the ptev_enabled case here, and the comment is only
>talking about that case. It isn't a generic statement for all events.
>Please elaborate if you still see a problem here.

See above.  It isn't sufficient anymore to look at the last function segment.

>> >+# Test printing of at least one INTERRUPT event.
>> >+gdb_test "record function-call-history" [multi_line \
>> >+  "$decimal\tmain" \
>> >+  "\t  \\\[interrupt: vector = 0x1 \\\(#db\\\), ip = $hex\\\]" \
>>
>> The #db comes from stepping over the breakpoint at main, correct?
>
>Right.
>
>> It is not really part of the test and if we ever want to change runto_main
>> to use a temporary breakpoint, we'd need to adjust the test.
>
>We already rely on this to not be a temporary breakpoint when we test
>IRET. I get from where you are coming from, but I think this would also be
>totally fine to change once someone changes runto_main. The test will fail.
>If you think that is not acceptable, we could make this more robust by
>setting an explicit breakpoint e.g. in call1, and rely on that.
>Though that makes the output more complicated and the test longer.

The IRET comes from resuming the thread.  We'd also get this for a temporary
breakpoint.

Doesn't gdb_test_sequence allow arbitrary stuff between the patterns we
want to match?  I'd just remove the #db interrupt event from the list.

If that doesn't work, I agree that turning this into a complicated
gdb_test_multiple does not make sense as it would make the test
much harder to understand.

>> >+  "$decimal\tcall1" \
>>
>> Also, it is not clear to me that we are guaranteed to not have another
>> entry in main between stepping over the breakpoint and calling call1.
>
>You are right. I could make this a test sequence, or add some ".*"
>I have opted for that latter for now. Neither is perfect.

If gdb_test_sequence can ignore unrelated stuff in-between, I'd prefer
that over .*.

>> >+  "$decimal\tmain" \
>> >+  "\t  \\\[interrupt: vector = 0xe \\\(#pf\\\), cr2 = 0x0, ip = $hex\\\]"
>> >+  ]
>>
>> Both CR2 and IP are optional in the libipt interface.
>
>I had them as optional in the regex at some point, but you asked me to
>remove it in an internal discussion.

Hmmmm.  I remember that we accepted a random mnemonic.

>You said "We know that there is cr2 for '#pf'." and " There is no cr2 for '#db'"
>And you mentioned that you preferred it more readable.
>I am fine with it either way.

We shouldn't allow CR2 for #db, that's clear.  Also, if we get CR2, we do know
its value from the test.  And we know the mnemonic from the vector.

Let's make both CR2 and IP optional.  Sorry if I'm inconsistent in my feedback.

Regards,
Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928