[PATCH] Add basic event handling in the NetBSD target

Sun Apr 26 21:14:34 GMT 2020

On 2020-04-24 6:09 a.m., Kamil Rytarowski wrote:
>> I know that you'll say it's because FreeBSD does it this way, but still it surprised
>> me.  If the core of GDB asks to resume one specific thread, it doesn't mean you should
>> suspend the other threads, they should just be left in whatever state they are.  Is there
>> a reason to stop the other threads here?
>>
> 
> In Linux there can be non-stop mode and threads are managed (stopped,
> started, etc) individually. On NetBSD we can start and stop the whole
> process, regardless of the number of threads in it.
> 
> We need to set the state of all threads before PT_CONTINUE (or assume
> unchanged state of them).
> 
> If a thread is not expected to be suspended, we shall set its flag to
> resume. The same for single-step (PT_STEP). This approach simplifies the
> implementation as I don't need to track internally which thread has what
> status and merely save a few syscalls.

Ok.

>>> +    }
>>> +   else
>>> +    {
>>> +      /* If ptid is a wildcard, resume all matching threads (they won't run
>>> +         until the process is continued however).  */
>>> +      for (thread_info *tp : all_non_exited_threads (this, ptid))
>>> +        if (ptrace (PT_RESUME, tp->ptid.pid (), NULL, tp->ptid.lwp ()) == -1)
>>> +          perror_with_name (("ptrace"));
>>> +      ptid = inferior_ptid;
>>
>> Can you explain this?  Ideally the resume method should not rely on inferior_ptid.  We
>> are working on reducing the dependencies on this global, in favor of passing the context
>> by parameters.  Here, the `ptid` parameter should be enough.
>>
> 
> The caller can pass minus_one_ptid, which makes no sense for NetBSD and
> certainly in the multi-target support.
> 
> Scenarios, as I can see them on NetBSD:
> 
>  - passing ptid_t (pid, 0, 0) -> resume the whole process with all the
> threads
>  - passing ptid_t (pid, lwp, 0) -> resume pid::lwp with all the other
> threads suspended in pid (`set scheduler-lock')
>  - passing ptid_t (-1, ?, ?) -> confusion to me so I try to find any
> fallback
> 
> If we can abandon the -1 case, I can drop the inferior_ptid usage. If
> the core no longer passes -1 (it could be), we shall cleanup existing
> code that handles this as it could happen. If a refactoring can happen,
> this shall be followed by addition of gdb_assert(ptid != minus_one_ptid).

Or a 4th one: resume all threads of all processes that this target manages.

I don't know off-hand if the core of GDB today can pass minus_one_ptid.  You could
add such an assert, run all your tests, and if everything works, conclude it's not
needed.  If the assert is hit, you'll know why.

>>> +      /* The common code passes WNOHANG that leads to crashes, overwrite it.  */
>>> +      pid = waitpid (ptid.pid (), &status, 0);
>>
>> What is crashing exactly?
>>
> 
> The core code asks to waitpid() with WNOHANG, receives pid=0 (nobody)
> and crashes.
> 
> It's a generic bug that shall be fixed... but it looks like nobody
> really obeys the core and the 3rd argument is overwritten (compare:
> rs6000-nat.c, obsd-nat.c, inf-ptrace.c or Linux...).
> 
> I've landed into more similar generic nuances during my effort on GDB,
> that handling events promptly can crash the core and I need to change
> the behavior to emulate other kernels.
> 
> It would be better to fix the core, but it is not NetBSD's fault and not
> a blocker for NetBSD support.
> 
> If there is interest in fixing it, I can file a bug report
> independently. I don't pledge to work on it myself (at least as long as
> I can merely focus on one kernel support).

Ok, I'm fine with doing this and documenting it somewhere.

Simon