Hello! I've got a binary whose debug_line information is unsupported by libdw, because the "discriminator" attribute exceeds the 24 bits allocated for it in Dwarf_Line_s.

```
$ llvm-dwarfdump --debug-line=4453 liblog.so | grep 2013278720
0x0000000000006167   1427     22      4   0  2013278720  is_stmt
```

Problematic binary attached. I don't have a small test to reproduce the issue, but I'll try to create one.
Created attachment 15166 [details] liblog.so
Note that eu-readelf --debug-dump=info also has trouble with this (probably also because it cannot correctly parse the large discriminator):

```
$ eu-readelf --debug-dump=info ./liblog.so
[...]
 Version: 5, Abbreviation section offset: 3225, Address size: 4, Offset size: 4
 Unit type: compile (1)
 [  51c9]  compile_unit         abbrev: 1
           producer             (strx1) "Android (10742551, +pgo, +bolt, +lto, +mlgo, based on r498229b) clang version 17.0.4 (https://android.googlesource.com/toolchain/llvm-project e34ed7d63863b45858e74126edaa738e75887800)"
           language             (data2) C_plus_plus_14 (33)
           name                 (strx1) "system/logging/liblog/logger_write.cpp"
           str_offsets_base     (sec_offset) str offsets base [  1198]
           stmt_list            (sec_offset) 4453
           low_pc               (addr) +0000000000
           ranges               (rnglistx) range index [    24]
           addr_base            (sec_offset) address base [   1d8]
           rnglists_base        (sec_offset) range list [   10a]
           loclists_base        (sec_offset) location list [   777]
 [  51e7]  base_type            abbrev: 2
           name                 (strx2) "DW_ATE_signed_32"
           encoding             (data1) signed (5)
           byte_size            (data1) 4
 [  51ec]  base_type            abbrev: 2
           name                 (strx2) "DW_ATE_signed_64"
           encoding             (data1) signed (5)
           byte_size            (data1) 8
 [  51f1]  subprogram           abbrev: 3
           low_pc               (addrx) [1a] +0x00005880 <_Z13GetDefaultTagv>
           high_pc              (data4) 114 (+0x000058f2)
           frame_base           (exprloc) [ 0] reg5
           call_all_calls       (flag_present) yes
           abstract_origin      (ref4) [  9e3d]
 [  51fd]  variable             abbrev: 4
           name                 (strx1) "default_tag"
           type                 (ref4) [  5215]
eu-readelf: no srcfiles for CU [51c9]
           decl_file            (data1) ??? (0)
           decl_line            (data1) 152
           location             (exprloc) [ 0] addrx [0] +0xfe58 <_ZZ13GetDefaultTagvE11default_tag>
[...]
```

Note that the CU DIE says:

    stmt_list            (sec_offset) 4453

And there definitely is a table at offset 4453, as can be seen with eu-readelf --debug-dump=line ./liblog.so
Is there some reason not to just bump up that bitfield width from :24 to :32 or more for now, until a deeper analysis of llvm informs us as to how wide these discriminator codes can really be?
(In reply to Frank Ch. Eigler from comment #3)
> Is there some reason not to just bump up that bitfield width from :24 to :32
> or more for now, until a deeper analysis of llvm informs us as to how wide
> these discriminator codes can really be?

For me it is ok to bump that bitfield, but there is a warning in the code:

> All the flags and other bit fields should add up to 48 bits
> to give the whole struct a nice round size.

So this question should be directed to the author of this code: Roland McGrath <roland@redhat.com>. I think there may be slight performance/memory issues, but as a temporary solution it looks good.
(In reply to Aleksei Vetrov from comment #4)
> (In reply to Frank Ch. Eigler from comment #3)
> > Is there some reason not to just bump up that bitfield width from :24 to :32
> > or more for now, until a deeper analysis of llvm informs us as to how wide
> > these discriminator codes can really be?
>
> For me it is ok to bump that bitfield, but there is a warning in the code:
>
> > All the flags and other bit fields should add up to 48 bits
> > to give the whole struct a nice round size.
>
> So this question should be directed to the author of this code: Roland
> McGrath <roland@redhat.com>. I think there may be slight performance/memory
> issues, but as a temporary solution it looks good.

Added Roland (now with another email address) to the CC.

Note that we have already blown up the size of this struct to support the NVIDIA extensions :{

The issue here is that we create (very) large arrays of struct Dwarf_Line_s, and we do that eagerly, see bug #27405. So we would like to keep that struct as small as possible.

Another "solution"/workaround would be to just ignore such crazy big discriminator values and either set them to zero or store them modulo 24 bits (they are probably still unique then).
So my preferred workaround:

```
diff --git a/libdw/dwarf_getsrclines.c b/libdw/dwarf_getsrclines.c
index df003c5f..69e10c7b 100644
--- a/libdw/dwarf_getsrclines.c
+++ b/libdw/dwarf_getsrclines.c
@@ -129,6 +129,12 @@ add_new_line (struct line_state *state, struct linelist *new_line)
       return true;						      \
     } while (0)
 
+  /* Same as above, but don't flag as "invalid" just use truncated
+     value.  Used for discriminator for which llvm might use a value
+     that won't fit 24 bits.  */
+#define SETX(field)						      \
+  new_line->line.field = state->field;				      \
+
   SET (addr);
   SET (op_index);
   SET (file);
@@ -140,7 +146,7 @@ add_new_line (struct line_state *state, struct linelist *new_line)
   SET (prologue_end);
   SET (epilogue_begin);
   SET (isa);
-  SET (discriminator);
+  SETX (discriminator);
   SET (context);
   SET (function_name);
```

https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc
> So my preferred workaround:

appears to be based on the assumption that truncated bitfields will not collide. Has this assumption been tested?
> The issue here is that we create (very) large arrays of struct Dwarf_Line_s
> and we do that eagerly, see bug #27405
> So we would like to keep that struct as small as possible.

Do we have an estimate about the numbers we're talking about here? The current Dwarf_Line_s object is 40 bytes long (on x86-64). Even if packing becomes less efficient, it could cost us, say, 2 bytes more per record. These records are packed together in allocation via realloc() et al. How many records do we see in programs of interest? readelf -wL /bin/emacs indicates about 800 thousand; libxul.so about 8 million. Is this about single-digit megabytes more RAM in grand total?

Note that bug #27405 was a request for optimization, not in order to save a few percent of memory for used data, but to save ALL the related memory & time for totally unused data.

```
struct Dwarf_Line_s {
	Dwarf_Files *              files;                /*     0     8 */
	Dwarf_Addr                 addr;                 /*     8     8 */
	unsigned int               file;                 /*    16     4 */
	int                        line;                 /*    20     4 */
	short unsigned int         column;               /*    24     2 */

	/* Bitfield combined with previous fields */
	unsigned int               is_stmt:1;            /*    24:16  4 */
	unsigned int               basic_block:1;        /*    24:17  4 */
	unsigned int               end_sequence:1;       /*    24:18  4 */
	unsigned int               prologue_end:1;       /*    24:19  4 */
	unsigned int               epilogue_begin:1;     /*    24:20  4 */
	unsigned int               op_index:8;           /*    24:21  4 */

	/* XXX 3 bits hole, try to pack */
	/* Force alignment to the next boundary: */
	unsigned int               :0;

	unsigned int               isa:8;                /*    28: 0  4 */
	unsigned int               discriminator:24;     /*    28: 8  4 */
	unsigned int               context;              /*    32     4 */
	unsigned int               function_name;        /*    36     4 */

	/* size: 40, cachelines: 1, members: 15 */
	/* sum members: 34 */
	/* sum bitfield members: 45 bits, bit holes: 1, sum bit holes: 3 bits */
	/* last cacheline: 40 bytes */
};
```
(In reply to Frank Ch. Eigler from comment #7)
> > So my preferred workaround:

I like this workaround and it works in our use case.

> appears to be based on the assumption that truncated bitfields will not
> collide. Has this assumption been tested?

No such assumption can be made: there are no guarantees from the DWARF standard, and it is possible to manually generate DWARF with discriminators exceeding even "unsigned int": just use any library and shift all values by 32 bits. That would be well within the standard; however, compilers are not expected to behave like this. In the "liblog.so" attachment the truncated values don't collide.

I'm more interested in this: what is the worst that can happen on a discriminator collision? We are not using the discriminator at all in ABI monitoring, so I'm not familiar with its use cases.
(In reply to Frank Ch. Eigler from comment #7)
> > So my preferred workaround:
>
> appears to be based on the assumption that truncated bitfields will not
> collide. Has this assumption been tested?

Most of the values used as discriminators in the test file are unique when truncated to their 24-bit value. Note that the values only need to be unique per source line; I haven't tested that they are. But this is already much better than rejecting the whole line table. And all this only really matters for something actually using the discriminator (worst case, some instructions are grouped together).
> I'm more interested in this: what is the worst that can happen on a
> discriminator collision? We are not using the discriminator at all in ABI
> monitoring, so I'm not familiar with its use cases.

That's a good point. There is an API call, dwarf_linediscriminator(), to fetch it for a client. In a bit of searching here and there (github etc.), I have not found any user so far, other than readelf itself printing it. In GNU binutils the field is "unsigned int" wide, and it is used in bfd & gdb.
(In reply to Mark Wielaard from comment #6)
> So my preferred workaround:
> https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc

Do you have a plan to merge it to master?
(In reply to Aleksei Vetrov from comment #12)
> (In reply to Mark Wielaard from comment #6)
> > So my preferred workaround:
> > https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc
>
> Do you have a plan to merge it to master?

I pushed it, so we have at least a workaround in for the release. But I'll keep this bug open so we can do a proper fix.
Here are the investigation results on why this happens in LLVM. It is a new algorithm for assigning a "hierarchical discriminator for FSAFDO". From https://reviews.llvm.org/D102246:

```
#define BASE_DIS_BIT_BEG 0
#define BASE_DIS_BIT_END 7

#define PASS_1_DIS_BIT_BEG 8
#define PASS_1_DIS_BIT_END 13

#define PASS_2_DIS_BIT_BEG 14
#define PASS_2_DIS_BIT_END 19

#define PASS_3_DIS_BIT_BEG 20
#define PASS_3_DIS_BIT_END 25

#define PASS_LAST_DIS_BIT_BEG 26
#define PASS_LAST_DIS_BIT_END 31
```

```
unsigned DiscriminatorCurrPass;
DiscriminatorCurrPass = R.second ? ++LDCM[LD] : LDCM[LD];
DiscriminatorCurrPass = DiscriminatorCurrPass << LowBit;
// LowBit is 26, this guarantees a big discriminator ^
DiscriminatorCurrPass += getCallStackHash(BB, I, DIL);
DiscriminatorCurrPass &= BitMaskThisPass;
unsigned NewD = Discriminator | DiscriminatorCurrPass;
const auto *const NewDIL = DIL->cloneWithDiscriminator(NewD);
```

The FS discriminator seems to deliberately use bits 26-31 to add fields for its pass. So this big discriminator is caused by `-enable-fs-discriminator=true`. The good news is that this will not increase discriminators beyond UINT_MAX.
Created attachment 15213 [details] logger_write.cpp

Attaching "logger_write.cpp" and a way to reproduce a binary with a big discriminator with clang-16: ``` clang++-16 -c -std=gnu++17 -O2 -g -nostdlibinc -m32 -march=prescott -fPIC \ -fdebug-info-for-profiling -funique-internal-linkage-names -mllvm \ -enable-fs-discriminator=true -o logger_write.o logger_write.cpp ``` And the output of `llvm-dwarfdump --debug-line logger_write.o` should show something like this: ``` 0x00000000000000b3 5 127 0 0 1207968256 0x00000000000000bb 5 127 0 0 1207968257 0x00000000000000bf 5 144 0 0 1207968258 ```