[PATCH] (v2) read inlining info in an NVIDIA extended line map

John Mellor-Crummey johnmc@rice.edu
Thu Nov 4 21:41:58 GMT 2021

[We would really like this patch in the forthcoming release]

Attached is a new version of the patch for reading inlining information encoded in an enhanced line map format used in NVIDIA GPU binaries for CUDA 11.2+.

This is an updated version of a patch first submitted on Sept. 5. A copy of the original submission email is quoted below this note. 

Here I describe just the improvements to that patch that address Mark’s concerns:

(1) all of the code for handling NVIDIA DWARF extensions is always available; there is no special configuration switch needed.
(2) all changes are bracketed by comments that mark them NVIDIA extensions
(3) the DWARF extended opcodes have been renamed with names that include NVIDIA in them
(4) the two new API functions to surface the new information have been improved to separate the interface result from the internal representation (at Mark’s request)
	(4a) the API for extracting the name of an inlined function in a DWARF line now returns a const char * instead of a string table index
	(4b) the API for extracting an inline “context” now returns a pointer to a DWARF line where the code is inlined rather than returning an unsigned int (an index into the line table that one could use to compute the pointer)
(5) there are test cases for readelf and libdw that use a binary generated by NVIDIA’s compiler. the test cases include information about how the binary was generated
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

Description of the first version of the patch: 

> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu <mailto:johnmc@rice.edu>> wrote:
> As of CUDA 11.2, NVIDIA added extensions to the line map section
> of CUDA binaries to represent inlined functions. These extensions
> include
> - two new fields in a line table row to represent inline 
>   information: context, and functionname,
> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>   DW_LNE_set_function_name,
> - an additional word in the line table header that indicates 
>   the offset in the .debug_str function where the function 
>   names for this line table begin, and
> - two new functions in the libdw API: dwarf_linecontext and 
>   dwarf_linefunctionname, which return the new line table fields.
> A line table row for an inlined function contains a non-zero
> "context" value. The “context” field indicates the index of the
> line table row that serves as the call site for an inlined
> context.
> The "functionname" field in a line table row is only meaningful
> if the "context" field of the row is non-zero. A meaningful
> "functionname" field contains an index into the .debug_str
> section relative to the base offset established in the line table
> header; the position in the .debug_str section indicates the name
> of the inlined function.
> These extensions resemble the proposed DWARF extensions
> (http://dwarfstd.org/ShowIssue.php?issue=140906.1 <http://dwarfstd.org/ShowIssue.php?issue=140906.1>) by Cary
> Coutant, but are not identical.
> This patch adds integrates support for handling NVIDIA's extended
> line maps into elfutil's libdw library and the readelf command
> line utility.
> Since this support is a non-standard extension to DWARF, all code
> that implements the extensions is implemented between markers  
> The definition below
> is added to elfutils/version.h, which enables a client of elfutils 
> to test whether the NVIDIA line map extensions are present. 
> Note: The support for NVIDIA extended line maps adds two integer
> fields (context and functionname) to struct Dwarf_Line_s, which
> makes the structure about 30% larger.
> The patch includes a binary testfile_nvidia_linemap.bz2 that
> contains an NVIDIA extended linemap along with two tests that
> read the line map.
> - A test script run-nvidia-extended-linemap-readelf.sh 
>   checks the output of readelf on the new test binary to 
>   validate its dump of the line op codes containing context 
>   and functionname entries.
> - A test program tests/nvidia_extended_linemap_libdw.c reads 
>   the extended line map with dwarf_next_lines and dumps the 
>   context and functionname fields of the line map where they 
>   are relevant, i.e. the value of context is non-zero. A test 
>   script run-nvidia-extended-linemap-libdw.sh runs this test 
>   and validates its output.
> A patch with the new functionality described above is attached.

Discussion about the first version of the patch:

> On Sep 15, 2021, at 1:25 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
> Mark,
> Thanks for your feedback. I am working on a new version of the patch that changes
> the interface for dwarf_linecontext and dwarf_linefunctionname to return a line pointer and a character pointer so it will be future proof.
> See my other comments below, e.g. about an idea for reworking the Dwarf_line_s data structure.
> --
> John Mellor-Crummey		Professor
> Dept of Computer Science	Rice University
> email: johnmc@rice.edu <mailto:johnmc@rice.edu>		phone: 713-348-5179
>> On Sep 10, 2021, at 12:11 PM, Mark Wielaard <mark@klomp.org <mailto:mark@klomp.org>> wrote:
>> Hi John,
>> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
>> devel wrote:
>>> My previous patch submission seems to have been overlooked as
>>> buildbot issues consumed several days this week. However, discussion
>>> in the mailing list now seems to have moved on beyond my submission
>>> and I would like my patch considered. Here, I echo my previous
>>> submission, except I improved my submission by including the prefix
>>> [PATCH] in the subject and I included the patch in the message body
>>> rather than as an attachment.
>> Sorry for the buildbot noise, that was indeed a little painful. But a
>> buildbot that randomly fails is not much fun, so it took highest
>> priority to get it back to green.
>> Your submission is really nice, having extensive documentation, all
>> features implemented, a testcase. Well done.
>> There are however some concerns outside your control. It is somewhat
>> disappointing NVIDIA didn't document this themselves, or tried to
>> standardize this. You seems to have a good grasp of what was intended
>> though. We have to be careful not to extend the api in a way that makes
>> a better standard/implementation impossible. And the way we implemented
>> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
>> it to be more efficient/compact. But maybe we can do that after adding
>> the extension, we should however have a plan.
>>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>>> I now believe that NVIDIA’s implementation is consistent with that
>>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>>> calls whereas Cary’s proposal uses DW_LNS_inlined_call (a standard
>>> opcode), NVIDIA’s implementation uses DW_LNE_inlined_call (an
>>> extended opcode).
>> The naming is one of the concerns. It would be better to use a name
>> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
>> show they are vendor extensions and don't clash with possible future
>> standard opcode names.
> I renamed the two new DWARF extended opcodes as you suggested.
>> That it mimics the two-level linemaps proposal is a good thing. But
>> lets make sure that the new accessor functions, dwarf_linecontext and
>> dwarf_linefunctionname, are generic enough that they can hopefully be
>> reused when two-level linemaps or a similar proposal becomes a
>> standard.
>>> A note about the source code for the test case reading the extended
>>> linemap entries using libdw: this code was copied from another test
>>> that used dwarf_next_lines and extended with code that reads the new
>>> context and functionname fields of a line table entry.
>> Thanks for the test case, it makes clear how the new functionality can
>> be used. How was the test binary, testfile_nvidia_linemap, generated?
>> That should probably be documented inside the testcase.
> I documented how the NVIDIA binary used in the two test cases was created
> by adding comments to the two test cases.
>> I won't be able to review all the code right now, but here are some
>> high-level comments, so you know what I am thinking.
>> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu <mailto:johnmc@rice.edu>>
>>>> wrote:
>>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>>> of CUDA binaries to represent inlined functions. These extensions
>>>> include
>>>> - two new fields in a line table row to represent inline 
>>>>  information: context, and functionname,
>> We didn't design the Dwarf_Line_s very well/compact. We already have
>> the op_index, isa and discriminator fields which are almost never used.
>> This adds two more. I wonder if we can split the struct up so that the
>> extra fields are only used when actually used.
>> Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
>> the array contains only the basic line attributes or also some extended
>> values. Of course this makes dwarf_getsrclines even more complex
>> because it then has to expand the line state struct whenever it first
>> sees an extended attribute. But I really like to see if we can use less
>> memory here. If we agree on some way to make that possible we can
>> implement it afterwards.
> Here are my thoughts about that. 
> I think the type for Dwarf_Lines should be opaque rather than simply
> a pointer to a Dwarf_Line_s record. The dwarf_onesrcline or routine
> requires constant time random access to line records for efficiency.
> Here is how I think that could be preserved.
> Let the Dwarf_Line_s structure be a  union type: a small record with a bit or two that indicates the record kind, e.g.
> enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, dwarf_line_extended2, …}. 
> All the fields for the compact record would be in one structure in the union. Other structures in the union
> would just contain a pointer to an out-of-band extended record that is stored elsewhere. The storage could be managed as follows:
> Have a buffer to store the lines. A sequence of compact Dwarf_Line_s records begin at the front of the buffer. Any time an extended record is needed, allocate it at the end of the buffer before the previous extended record. An extended record will have both a This could accommodate extended records of various lengths. Allocations in the buffer would manage the fact that compact line records are allocated at the front and the smaller number of larger extended line records are allocated at the back. When the the next allocation would cause the cursors to cross in the middle,  the buffer is full and another buffer is needed. The Dwarf_Lines type could have a pointer to a next Dwarf_Lines structure. If there is concern that finding a line would not really be constant time if there was a huge number of lines for a file line lookup could require following an unbounded chain of pointers to a next buffer, then the buffers could be managed in a splay tree or a skip list, which would give O(log n) time lookup of a line buffer followed by a constant time indexing into the buffer.
>>>> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>>>>  DW_LNE_set_function_name,
>> Like I said above I think these names should contain the "vendor" name
> Done.
>> ...
>>>> - two new functions in the libdw API: dwarf_linecontext and 
>>>>  dwarf_linefunctionname, which return the new line table fields.
>> I think it would be nice if we could make these functions return a
>> Dwarf_Line * and a const char * to make them future proof in case a standard extension encodes these differently.

More information about the Elfutils-devel mailing list