Bug 25111 - [Zephyr/thread aware debugging] remote: write_ptid returns negative tid.
Summary: [Zephyr/thread aware debugging] remote: write_ptid returns negative tid.
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: remote (show other bugs)
Version: 8.3.1
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 14314 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-10-17 07:24 UTC by Evgeniy
Modified: 2024-02-15 16:59 UTC (History)
4 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed: 2022-01-07 00:00:00


Attachments
write_ptid_long_type.patch (681 bytes, patch)
2021-11-07 08:06 UTC, Fredrik Hederstierna
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Evgeniy 2019-10-17 07:24:37 UTC
In Zephyr the k_thread_create function returns thread ID which is actually pointer to k_thread structure. See:
https://docs.zephyrproject.org/2.0.0/reference/kernel/threads/index.html?highlight=k_thread_create#_CPPv315k_thread_createP8k_threadP16k_thread_stack_t6size_t16k_thread_entry_tPvPvPvi5u32_t5s32_t
https://elixir.bootlin.com/zephyr/latest/source/include/kernel.h#L592

In ARC case the address space starts from 0x80000000, and passing such big values(>2147483648) to write_ptid function leads to overflow of "int tid" variable. Actually process/thread id's are positive and rarely exceeds the maximum of Int. 

Do we really need this negative tid/pid handling? Maybe moving to uint32_t type
would be better there?
Comment 1 Fredrik Hederstierna 2021-11-07 08:04:16 UTC
When looking into Zephyr and OpenOCD I just saw this ticket. Isn't there a discrepancy in <gdb/remote.c> for ptid_t write/read when it comes to types. The 'lwp' is used as 'tid', but has a wider type:

  int 	ptid_get_pid (ptid_t ptid)
  long 	ptid_get_lwp (ptid_t ptid)
  long 	ptid_get_tid (ptid_t ptid)

so in the <read_ptid> it reads into ULONGEST, which I guess is uint64_t, but the <write_tpid> instead implicit cast long to int when using 'lwp as 'tid' (which I believe is often int32_t on a normal host target machine):

  int tid;
  tid = ptid.lwp ();

Wouldn't it be better to keep the wider long type, and keep calling 'tid' for 'lwp' which it actually is? And also keeping the long-type in the write function? In this case there won't be any overflow if an 32bit int representing 'tid' is becoming >0x80000000. The read function I guess will still handle it as it is using ULONGEST to my understanding.

Attaching idea of a possible patch.
Comment 2 Fredrik Hederstierna 2021-11-07 08:06:39 UTC
Created attachment 13765 [details]
write_ptid_long_type.patch

Extends type of 'tid' when writing ptid.
Comment 3 Simon Marchi 2021-11-07 12:44:20 UTC
At first glance, it seems you're right.
Comment 4 Tom Tromey 2022-01-07 21:11:04 UTC
*** Bug 14314 has been marked as a duplicate of this bug. ***
Comment 5 Tom Tromey 2022-01-07 21:11:48 UTC
Can you send the patch to gdb-patches?
See the contribution checklist for all the details:
https://sourceware.org/gdb/wiki/ContributionChecklist

I think this is a longstanding bug, see the dup I just closed...
Comment 6 Tom Tromey 2023-08-03 17:07:30 UTC
I guess that never happened, the bug is still there.
It seems weird to me that we used signed types in ptid_t.
Comment 7 Fredrik Hederstierna 2023-08-04 07:32:03 UTC
Unfortunately I am currently not using Zephyr myself, so I would probably need some help to get a better formatting and testing the patch. Anyone is welcome to assist in improving the packaging/testing the patch, if interested in getting this patch merged. Unfortunately I haven't any good Zephyr test setup myself.
Comment 8 Tom Tromey 2024-02-15 16:59:11 UTC
I looked a little.

Changing ptid_t::pid_type to be unsigned is probably a colossal
change, bordering on the impossible.

However, it does seem that negative values aren't really intended --
except for the special case of -1.  The manual says:

The PID (process) and TID
(thread) components each have the format described above: a positive
number with target-specific interpretation formatted as a big-endian hex
string, literal ‘-1’ to indicate all processes or threads
(respectively), or ‘0’ to indicate an arbitrary process or thread.

So in remote_target::write_ptid, I wonder if it's sufficient to
change the "pid < 0" check to "pid == -1" (and likewise for tid).