[PATCH] gdb: include address in names of objfiles created by jit reader API

Fri Feb 4 13:15:49 GMT 2022

On 2022-02-04 07:39, Jan Vrany wrote:
> On Wed, 2022-02-02 at 11:17 -0500, Simon Marchi wrote:
>> On 2022-02-02 7:03 a.m., Jan Vrany via Gdb-patches wrote:
>>> This commit includes jited object address in the names of objfiles
>>> created by jit reader API (e.g., << JIT compiled code at 0x7ffd8a0c77a0 >>).
>>> This allows one to at least differentiate one from another.
>>> ---
>>>  gdb/jit.c                             | 8 ++++++--
>>>  gdb/testsuite/gdb.base/jit-reader.exp | 4 ++--
>>>  2 files changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gdb/jit.c b/gdb/jit.c
>>> index 42776b95683..371cb8b1d48 100644
>>> --- a/gdb/jit.c
>>> +++ b/gdb/jit.c
>>> @@ -624,12 +624,16 @@ jit_object_close_impl (struct gdb_symbol_callbacks *cb,
>>>  		       struct gdb_object *obj)
>>>  {
>>>    struct objfile *objfile;
>>> +  char objfile_name[64];
>>>    jit_dbg_reader_data *priv_data;
>>>  
>>>    priv_data = (jit_dbg_reader_data *) cb->priv_data;
>>>  
>>> -  objfile = objfile::make (nullptr, "<< JIT compiled code >>",
>>> -			   OBJF_NOT_FILENAME);
>>> +  snprintf (objfile_name, sizeof (objfile_name) - 1,
>>> +            "<< JIT compiled code at 0x%" PRIxPTR " >>",
>>> +            reinterpret_cast<uintptr_t> (priv_data));
>>> +  objfile_name[sizeof (objfile_name) - 1] = '\0';
>>> +  objfile = objfile::make (nullptr, objfile_name, OBJF_NOT_FILENAME);
>>
>> I think this is printing a random stack address in GDB itself, isn't it?
>> priv_data is initialized to point to a stack variable in
>> jit_reader_try_read_symtab.
> 
> Yes, I misread the code, sorry. 
> 
>>
>> I think what would speak more to the user would be the address in the
>> inferior where the JIT-ed code is.  The JIT engine is likely to have some
>> logging that says "I am JIT-ing some code and placing it at address 0x1234".
>> So having the objfile name say 0x1234 allows them to correlate the code they
>> generated with GDB's objfiles.  Maybe that was your intention, not sure.
> 
> This seems tricky to me. IIUC, using the JIT reader API, the JIT (inferior) creates some
> debug info somewhere in its address space and then tells GDB to read it from there,
> right? This address is the symfile_addr your patch below is putting into the objfile's name.
> 
> But this address may differ from the location where the actual executable code is. I'd even
> think in most cases it will be different.

Yes, that's my understanding.  If the program was not using a custom JIT
reader, it would put something readable by BFD (e.g. an ELF) at that
address.  But if using a custom JIT reader, it puts a data structure of
its choice, as long as the JIT (the inferior) and the custom JIT reader
agree on the format.

So you are right, we can't easily put the address of the actual code.  I
probably confused the two.

My only little concern is whether that symfile_addr is always going to
be unique.  Could you have a JIT and custom JIT reader where the JIT
always re-uses the same buffer to hold the symfile?  After you
registered one JIT object and the custom JIT reader has created the
objfile, I don't really see a requirement for keeping the symfile data
around.  I could imagine that the JIT could fill the buffer with data
for a second object and register it.  So if we use the symfile_addr in
the objfile's name, we would have two objfiles with the same name.

Code entries are more likely to be unique, as the inferior is supposed
to keep code entries for existing JIT objects in a linked list; it can't
get rid of them after registering them.
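
To make the concern concrete, here is a minimal sketch, not GDB code.  It
assumes the jit_code_entry layout from the JIT interface chapter of the
GDB manual (copied locally so it compiles on its own); the scratch buffer
and the helpers link_entry and jit_name_for are hypothetical.  Two objects
registered from the same buffer share a symfile_addr, while their code
entries stay distinct:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Local copy of the entry layout described in the GDB manual's JIT
// interface chapter, so the sketch compiles on its own.
struct jit_code_entry
{
  jit_code_entry *next_entry;
  jit_code_entry *prev_entry;
  const char *symfile_addr;
  uint64_t symfile_size;
};

// Hypothetical: one scratch buffer the JIT reuses for every symfile.
static char scratch_symfile[256];

// Hypothetical helper: link ENTRY at the head of the list and point it
// at the shared scratch buffer.
static void
link_entry (jit_code_entry *&head, jit_code_entry &entry, uint64_t size)
{
  entry.next_entry = head;
  entry.prev_entry = nullptr;
  entry.symfile_addr = scratch_symfile;
  entry.symfile_size = size;
  if (head != nullptr)
    head->prev_entry = &entry;
  head = &entry;
}

// Hypothetical helper: build an objfile-style name from an address.
static std::string
jit_name_for (const void *addr)
{
  char buf[64];
  snprintf (buf, sizeof (buf), "<< JIT compiled code at 0x%" PRIxPTR " >>",
            reinterpret_cast<uintptr_t> (addr));
  return buf;
}
```

Linking two entries this way and calling jit_name_for on each
symfile_addr gives the same string twice; calling it on the entries'
own addresses gives two different strings.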

> Also, the jitted language may support nested functions / lambdas so it may produce multiple 
> "machine" functions for a single "language" function. 
> 
> Even jit-reader.exp "JITs" two functions and registers them at once.

Yes, it's two functions in a single object file, as if you had two
functions in one ELF.

> 
>>
>> Also, let's use string_printf instead of playing with char buffers.
>>
>> What about the patch below?
> 
> Yes, this is better than my original attempt. But still, the value it
> prints can be a bit confusing in the context of the "maintenance info jit"
> command. When I applied your patch and ran jit-reader.exp, then:
> 
> (gdb) maintenance info jit
> Base address of known JIT-ed objfiles:
>   0x5555555580a0
> (gdb) print(gdb.objfiles()[-1])
> No symbol "gdb" in current context.
> (gdb) python print(gdb.objfiles()[-1])
> <gdb.Objfile filename=<< JIT compiled code at 0x5555555592a0 >>>
> 
> As you can see, the addresses differ. IIUC that's because "maint info jit" shows
> the address of struct jit_code_entry (in inferior address space) while the name
> of the objfile contains the value of jit_code_entry->symfile_addr.
> 
> What about changing "maintenance info jit" to print the symfile_addr instead? 
> Or the other way round. See the patch below (to be applied after yours):

I agree, it would make sense for "maintenance info jit" and the objfile
name to display the same addresses.  Given the concern I shared above,
maybe the code entry address is better in the end?

In any case, maybe we can change "maint info jit" to display both values in
a way that makes it clear which is which.  If I were debugging my JIT
stuff and was confused by all those addresses, I think it would help
me if it said, for each jit object:

  - Code entry address: 0x1234
  - Symfile address: 0x2345

That's with the understanding that the symfile address could be re-used
for multiple objects, but at least it shows the address that was given
at the time of registration, which can still be helpful to debug
problems.
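
As a rough sketch of the per-object output I have in mind (not a patch:
describe_jit_object is a hypothetical helper, and uint64_t stands in for
GDB's CORE_ADDR):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not GDB code: format both addresses of one JIT
// object so that it is clear which is which.  uint64_t stands in for
// GDB's CORE_ADDR.
static std::string
describe_jit_object (uint64_t code_entry_addr, uint64_t symfile_addr)
{
  char buf[128];
  snprintf (buf, sizeof (buf),
            "  - Code entry address: 0x%" PRIx64 "\n"
            "  - Symfile address: 0x%" PRIx64 "\n",
            code_entry_addr, symfile_addr);
  return buf;
}
```

Calling describe_jit_object (0x1234, 0x2345) yields the two lines shown
above.  In GDB itself, the formatting would presumably go through the
existing address-printing helpers rather than snprintf.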

Simon