Hello! I've got a binary whose debug_line information is unsupported by libdw, because the "discriminator" attribute exceeds the 24 bits allocated for it in Dwarf_Line_s.

```
$ llvm-dwarfdump --debug-line=4453 liblog.so | grep 2013278720
0x0000000000006167   1427     22      4   0  2013278720  is_stmt
```

Problematic binary attached. I don't have a small test to reproduce the issue, but I'll try to create one.
Created attachment 15166 [details] liblog.so
Note that eu-readelf --debug-dump=info also has trouble with this (probably also because it cannot correctly parse the large discriminator):

```
$ eu-readelf --debug-dump=info ./liblog.so
[...]
 Version: 5, Abbreviation section offset: 3225, Address size: 4, Offset size: 4
 Unit type: compile (1)
 [  51c9]  compile_unit         abbrev: 1
           producer             (strx1) "Android (10742551, +pgo, +bolt, +lto, +mlgo, based on r498229b) clang version 17.0.4 (https://android.googlesource.com/toolchain/llvm-project e34ed7d63863b45858e74126edaa738e75887800)"
           language             (data2) C_plus_plus_14 (33)
           name                 (strx1) "system/logging/liblog/logger_write.cpp"
           str_offsets_base     (sec_offset) str offsets base [  1198]
           stmt_list            (sec_offset) 4453
           low_pc               (addr) +0000000000
           ranges               (rnglistx) range index [    24]
           addr_base            (sec_offset) address base [   1d8]
           rnglists_base        (sec_offset) range list [   10a]
           loclists_base        (sec_offset) location list [   777]
 [  51e7]  base_type            abbrev: 2
           name                 (strx2) "DW_ATE_signed_32"
           encoding             (data1) signed (5)
           byte_size            (data1) 4
 [  51ec]  base_type            abbrev: 2
           name                 (strx2) "DW_ATE_signed_64"
           encoding             (data1) signed (5)
           byte_size            (data1) 8
 [  51f1]  subprogram           abbrev: 3
           low_pc               (addrx) [1a] +0x00005880 <_Z13GetDefaultTagv>
           high_pc              (data4) 114 (+0x000058f2)
           frame_base           (exprloc) [ 0] reg5
           call_all_calls       (flag_present) yes
           abstract_origin      (ref4) [  9e3d]
 [  51fd]  variable             abbrev: 4
           name                 (strx1) "default_tag"
           type                 (ref4) [  5215]
eu-readelf: no srcfiles for CU [51c9]
           decl_file            (data1) ??? (0)
           decl_line            (data1) 152
           location             (exprloc) [ 0] addrx [0] +0xfe58 <_ZZ13GetDefaultTagvE11default_tag>
[...]
```

Note that the CU DIE says:

    stmt_list            (sec_offset) 4453

And there definitely is a table at offset 4453, as can be seen with eu-readelf --debug-dump=line ./liblog.so
Is there some reason not to just bump up that bitfield width from :24 to :32 or more for now, until a deeper analysis of llvm informs us as to how wide these discriminator codes can really be?
(In reply to Frank Ch. Eigler from comment #3)
> Is there some reason not to just bump up that bitfield width from :24 to :32
> or more for now, until a deeper analysis of llvm informs us as to how wide
> these discriminator codes can really be?

For me it is ok to bump that bitfield, but there is a warning in the code:

> All the flags and other bit fields should add up to 48 bits
> to give the whole struct a nice round size.

So this question should be directed to the author of this code: Roland McGrath <roland@redhat.com>. I think there may be slight performance/memory issues, but as a temporary solution it looks good.
(In reply to Aleksei Vetrov from comment #4)
> (In reply to Frank Ch. Eigler from comment #3)
> > Is there some reason not to just bump up that bitfield width from :24 to :32
> > or more for now, until a deeper analysis of llvm informs us as to how wide
> > these discriminator codes can really be?
>
> For me it is ok to bump that bitfield, but there is a warning in the code:
>
> > All the flags and other bit fields should add up to 48 bits
> > to give the whole struct a nice round size.
>
> So this question should be directed to the author of this code: Roland
> McGrath <roland@redhat.com>. I think there may be slight performance/memory
> issues, but as a temporary solution it looks good.

Added Roland (now with another email address) to the CC.

Note that we have already blown up the size of this struct to support the NVIDIA extensions :{

The issue here is that we create (very) large arrays of struct Dwarf_Line_s, and we do that eagerly, see bug #27405. So we would like to keep that struct as small as possible.

Another "solution"/workaround would be to just ignore such crazy big discriminator values and either set them to zero or store them modulo 24 bits (they are probably still unique then).
So my preferred workaround:

```
diff --git a/libdw/dwarf_getsrclines.c b/libdw/dwarf_getsrclines.c
index df003c5f..69e10c7b 100644
--- a/libdw/dwarf_getsrclines.c
+++ b/libdw/dwarf_getsrclines.c
@@ -129,6 +129,12 @@ add_new_line (struct line_state *state, struct linelist *new_line)
       return true;						      \
     } while (0)
 
+  /* Same as above, but don't flag as "invalid" just use truncated
+     value.  Used for discriminator for which llvm might use a value
+     that won't fit 24 bits.  */
+#define SETX(field)						      \
+  new_line->line.field = state->field;				      \
+
   SET (addr);
   SET (op_index);
   SET (file);
@@ -140,7 +146,7 @@ add_new_line (struct line_state *state, struct linelist *new_line)
   SET (prologue_end);
   SET (epilogue_begin);
   SET (isa);
-  SET (discriminator);
+  SETX (discriminator);
   SET (context);
   SET (function_name);
```

https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc
> So my preferred workaround:

appears to be based on the assumption that truncated bitfields will not collide. Has this assumption been tested?
> The issue here is that we create (very) large arrays of struct Dwarf_Line_s
> and we do that eagerly, see bug #27405
> So we would like to keep that struct as small as possible.

Do we have an estimate about the numbers we're talking about here? The current Dwarf_Line_s object is 40 bytes long (on x86-64). Even if packing becomes less efficient, it could cost us, say, 2 bytes more per record. These records are packed together in allocation via realloc() et al. How many records do we see in programs of interest? readelf -wL /bin/emacs indicates about 800 thousand; libxul.so about 8 million. Is this about single-digit megabytes more RAM in grand total?

Note that bug #27405 was a request for optimization, not in order to save a few percent of memory for used data, but to save ALL the related memory & time for totally unused data.

```
struct Dwarf_Line_s {
	Dwarf_Files *              files;                /*     0     8 */
	Dwarf_Addr                 addr;                 /*     8     8 */
	unsigned int               file;                 /*    16     4 */
	int                        line;                 /*    20     4 */
	short unsigned int         column;               /*    24     2 */

	/* Bitfield combined with previous fields */
	unsigned int               is_stmt:1;            /*    24:16  4 */
	unsigned int               basic_block:1;        /*    24:17  4 */
	unsigned int               end_sequence:1;       /*    24:18  4 */
	unsigned int               prologue_end:1;       /*    24:19  4 */
	unsigned int               epilogue_begin:1;     /*    24:20  4 */
	unsigned int               op_index:8;           /*    24:21  4 */

	/* XXX 3 bits hole, try to pack */
	/* Force alignment to the next boundary: */
	unsigned int               :0;

	unsigned int               isa:8;                /*    28: 0  4 */
	unsigned int               discriminator:24;     /*    28: 8  4 */
	unsigned int               context;              /*    32     4 */
	unsigned int               function_name;        /*    36     4 */

	/* size: 40, cachelines: 1, members: 15 */
	/* sum members: 34 */
	/* sum bitfield members: 45 bits, bit holes: 1, sum bit holes: 3 bits */
	/* last cacheline: 40 bytes */
};
```
(In reply to Frank Ch. Eigler from comment #7)
> > So my preferred workaround:

I like this workaround and it works in our use case.

> appears to be based on the assumption that truncated bitfields will not
> collide. Has this assumption been tested?

No such assumption can be made: there are no guarantees from the DWARF standard, and it is possible to manually generate DWARF with discriminators exceeding even "unsigned int": just use any library and shift all values by 32 bits. That would be well within the standard; however, compilers are not expected to behave like this. In the "liblog.so" attachment the truncated values don't collide.

I'm more interested in this: what is the worst that can happen on a discriminator collision? We are not using the discriminator at all in ABI monitoring, so I'm not familiar with its use cases.
(In reply to Frank Ch. Eigler from comment #7)
> > So my preferred workaround:
>
> appears to be based on the assumption that truncated bitfields will not
> collide. Has this assumption been tested?

Most of the values used as discriminators in the test file are unique when truncated to their 24-bit value. Note that the values only need to be unique per source line; I haven't tested that they are. But this is already much better than rejecting the whole line table. And all this only really matters for something actually using the discriminator (worst case, some instructions are grouped together).
> I'm more interested in this: what is the worst that can happen on a
> discriminator collision? We are not using the discriminator at all in ABI
> monitoring, so I'm not familiar with its use cases.

That's a good point. There is an API call, dwarf_linediscriminator(), to fetch it for a client. In a bit of searching here and there (github etc.), I have not found any user so far, other than readelf itself printing it. In GNU binutils the field is "unsigned int" wide, and it is used in bfd & gdb.
(In reply to Mark Wielaard from comment #6)
> So my preferred workaround:
> https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc

Do you have a plan to merge it to master?
(In reply to Aleksei Vetrov from comment #12)
> (In reply to Mark Wielaard from comment #6)
> > So my preferred workaround:
> > https://code.wildebeest.org/git/user/mjw/elfutils/commit/?h=really-large-disc
>
> Do you have a plan to merge it to master?

I pushed it, so we have at least a workaround in for the release. But I'll keep this bug open so we can do a proper fix.
Here are the investigation results on why this happens in LLVM. It is a new algorithm for assigning a "hierarchical discriminator for FSAFDO". From https://reviews.llvm.org/D102246:

```
#define BASE_DIS_BIT_BEG 0
#define BASE_DIS_BIT_END 7

#define PASS_1_DIS_BIT_BEG 8
#define PASS_1_DIS_BIT_END 13

#define PASS_2_DIS_BIT_BEG 14
#define PASS_2_DIS_BIT_END 19

#define PASS_3_DIS_BIT_BEG 20
#define PASS_3_DIS_BIT_END 25

#define PASS_LAST_DIS_BIT_BEG 26
#define PASS_LAST_DIS_BIT_END 31
```

```
unsigned DiscriminatorCurrPass;
DiscriminatorCurrPass = R.second ? ++LDCM[LD] : LDCM[LD];
DiscriminatorCurrPass = DiscriminatorCurrPass << LowBit;
// LowBit is 26, this guarantees a big discriminator ^
DiscriminatorCurrPass += getCallStackHash(BB, I, DIL);
DiscriminatorCurrPass &= BitMaskThisPass;
unsigned NewD = Discriminator | DiscriminatorCurrPass;
const auto *const NewDIL = DIL->cloneWithDiscriminator(NewD);
```

The FS discriminator seems to deliberately use bits 26-31 to add fields for its pass. So this big discriminator is caused by `-enable-fs-discriminator=true`. The good news is that this will not increase discriminators beyond UINT_MAX.
Created attachment 15213 [details] logger_write.cpp

Attaching "logger_write.cpp" and a way to reproduce a binary with a big discriminator with clang-16: ``` clang++-16 -c -std=gnu++17 -O2 -g -nostdlibinc -m32 -march=prescott -fPIC \ -fdebug-info-for-profiling -funique-internal-linkage-names -mllvm \ -enable-fs-discriminator=true -o logger_write.o logger_write.cpp ``` And the output of `llvm-dwarfdump --debug-line logger_write.o` should show something like this: ``` 0x00000000000000b3 5 127 0 0 1207968256 0x00000000000000bb 5 127 0 0 1207968257 0x00000000000000bf 5 144 0 0 1207968258 ```