[PATCH] (revised) read inlining info in an NVIDIA extended line map (was: Extension ...)

Wed Sep 15 18:25:22 GMT 2021

Mark,

Thanks for your feedback. I am working on a new version of the patch that changes
the interface for dwarf_linecontext and dwarf_linefunctionname to return a line pointer and a character pointer so it will be future proof.

See my other comments below, e.g. about an idea for reworking the Dwarf_line_s data structure.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

> On Sep 10, 2021, at 12:11 PM, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi John,
> 
> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
> devel wrote:
>> My previous patch submission seems to have been overlooked as
>> buildbot issues consumed several days this week. However, discussion
>> in the mailing list now seems to have moved on beyond my submission
>> and I would like my patch considered. Here, I echo my previous
>> submission, except I improved my submission by including the prefix
>> [PATCH] in the subject and I included the patch in the message body
>> rather than as an attachment.
> 
> Sorry for the buildbot noise, that was indeed a little painful. But a
> buildbot that randomly fails is not much fun, so it took highest
> priority to get it back to green.
> 
> Your submission is really nice, having extensive documentation, all
> features implemented, a testcase. Well done.
> 
> There are however some concerns outside your control. It is somewhat
> disappointing NVIDIA didn't document this themselves, or tried to
> standardize this. You seems to have a good grasp of what was intended
> though. We have to be careful not to extend the api in a way that makes
> a better standard/implementation impossible. And the way we implemented
> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
> it to be more efficient/compact. But maybe we can do that after adding
> the extension, we should however have a plan.
> 
>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>> I now believe that NVIDIA’s implementation is consistent with that
>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>> calls whereas Cary’s proposal uses DW_LNS_inlined_call (a standard
>> opcode), NVIDIA’s implementation uses DW_LNE_inlined_call (an
>> extended opcode).
> 
> The naming is one of the concerns. It would be better to use a name
> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
> show they are vendor extensions and don't clash with possible future
> standard opcode names.

I renamed the two new DWARF extended opcodes as you suggested.

> That it mimics the two-level linemaps proposal is a good thing. But
> lets make sure that the new accessor functions, dwarf_linecontext and
> dwarf_linefunctionname, are generic enough that they can hopefully be
> reused when two-level linemaps or a similar proposal becomes a
> standard.

>> A note about the source code for the test case reading the extended
>> linemap entries using libdw: this code was copied from another test
>> that used dwarf_next_lines and extended with code that reads the new
>> context and functionname fields of a line table entry.
> 
> Thanks for the test case, it makes clear how the new functionality can
> be used. How was the test binary, testfile_nvidia_linemap, generated?
> That should probably be documented inside the testcase.

I documented how the NVIDIA binary used in the two test cases was created
by adding comments to the two test cases.

> I won't be able to review all the code right now, but here are some
> high-level comments, so you know what I am thinking.
> 
> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu>
>>> wrote:
>>> 
>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>> of CUDA binaries to represent inlined functions. These extensions
>>> include
>>> 
>>> - two new fields in a line table row to represent inline 
>>>  information: context, and functionname,
> 
> We didn't design the Dwarf_Line_s very well/compact. We already have
> the op_index, isa and discriminator fields which are almost never used.
> This adds two more. I wonder if we can split the struct up so that the
> extra fields are only used when actually used.

> Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
> the array contains only the basic line attributes or also some extended
> values. Of course this makes dwarf_getsrclines even more complex
> because it then has to expand the line state struct whenever it first
> sees an extended attribute. But I really like to see if we can use less
> memory here. If we agree on some way to make that possible we can
> implement it afterwards.

Here are my thoughts about that. 

I think the type for Dwarf_Lines should be opaque rather than simply
a pointer to a Dwarf_Line_s record. The dwarf_onesrcline or routine
requires constant time random access to line records for efficiency.
Here is how I think that could be preserved.

Let the Dwarf_Line_s structure be a  union type: a small record with a bit or two that indicates the record kind, e.g.
enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, dwarf_line_extended2, …}. 
All the fields for the compact record would be in one structure in the union. Other structures in the union
would just contain a pointer to an out-of-band extended record that is stored elsewhere. The storage could be managed as follows:
Have a buffer to store the lines. A sequence of compact Dwarf_Line_s records begin at the front of the buffer. Any time an extended record is needed, allocate it at the end of the buffer before the previous extended record. An extended record will have both a This could accommodate extended records of various lengths. Allocations in the buffer would manage the fact that compact line records are allocated at the front and the smaller number of larger extended line records are allocated at the back. When the the next allocation would cause the cursors to cross in the middle,  the buffer is full and another buffer is needed. The Dwarf_Lines type could have a pointer to a next Dwarf_Lines structure. If there is concern that finding a line would not really be constant time if there was a huge number of lines for a file line lookup could require following an unbounded chain of pointers to a next buffer, then the buffers could be managed in a splay tree or a skip list, which would give O(log n) time lookup of a line buffer followed by a constant time indexing into the buffer.

> 
>>> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>>>  DW_LNE_set_function_name,
> 
> Like I said above I think these names should contain the "vendor" name
> DW_LNE_NVIDIA_...

Done.

> 
>>> - an additional word in the line table header that indicates 
>>>  the offset in the .debug_str function where the function 
>>>  names for this line table begin, and
> 
> This is the only part I think is somewhat tricky. I believe you
> implement it cleverly by checking the header_length.

That strategy came from NVIDIA’s implementation in cuda-gdb . They check when there
is a gap between the end of the standard header and the header length.

> And your
> implementation should work, but it is also the part that no other tool
> understands, which means any such binary cannot really be handled by
> any other tool. And I don't really understand how this works when
> linking objects with such a offset into .debug_str. Normally the
> .debug_str section contains strings that can be merged, but that would
> disrupt the order, so I don't fully understand how the function_name
> indexes are kept correct. This would have been nicer as a DWARF5
> extension, where there is already a .debug_str_offsets section (but
> also confusing because there is also a special .debug_line_str section
> where these strings should probably point at).
> 
>>> - two new functions in the libdw API: dwarf_linecontext and 
>>>  dwarf_linefunctionname, which return the new line table fields.
> 
> I think it would be nice if we could make these functions return a
> Dwarf_Line * and a const char * to make them future proof in case a
> standard extension encodes these differently.

I will work on a patch that does just this as soon as I have time.