Bug 29037 - Systemtap unable to find struct bitfield members for gcc11 compiled code
Summary: Systemtap unable to find struct bitfield members for gcc11 compiled code
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on: 28334
Blocks: 30123
Reported: 2022-04-07 14:10 UTC by William Cohen
Modified: 2023-03-09 22:09 UTC
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
Tarball of the code for gcc11 bitfield reproducer (8.28 KB, application/gzip)
2022-04-07 14:10 UTC, William Cohen

Description William Cohen 2022-04-07 14:10:32 UTC
Created attachment 14053
Tarball of the code for gcc11 bitfield reproducer

When reviewing the testsuite results for the systemtap examples, I noticed that several examples (tcp_trace, tcpdumplike, and schedtimes) failed because systemtap was not able to access various struct bitfield members in code generated by the gcc11 compiler.  GCC 11 is in Fedora 34 and later, so those examples do not work there.  However, they work in Fedora 33 and RHEL8.

Made a small reproducer to demonstrate the issue: fields.c and an associated fields_probes.stp.  On RHEL8, systemtap finds the bitfield and prints out the information:

[wcohen@rhel8 ~]$ gcc -O2 -g -o fields fields.c
[wcohen@rhel8 ~]$ sudo stap -v fields_probes.stp -c "./fields"
Pass 1: parsed user script and 501 library scripts using 352200virt/152564res/18004shr/136660data kb, in 260usr/30sys/282real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 356688virt/158504res/19128shr/141148data kb, in 20usr/0sys/20real ms.
Pass 3: translated to C into "/tmp/staplw3Mn1/stap_728fe8eaae91c669f6cd3df4b02f5bf7_1361_src.c" using 356952virt/158964res/19320shr/141412data kb, in 10usr/60sys/69real ms.
Pass 4: compiled C into "stap_728fe8eaae91c669f6cd3df4b02f5bf7_1361.ko" in 2190usr/470sys/2763real ms.
Pass 5: starting run.
a.icsk_ca_state: false
a.icsk_ca_initialized: false
a.icsk_ca_setsockopt: false
a.icsk_ca_dst_locked: false
$a->icsk_ca_state = 0
Pass 5: run completed in 30usr/40sys/326real ms.


However, on Fedora Rawhide systemtap is unable to find the bitfield:


[wcohen@rawhide ~]$ gcc -O2 -g -o fields fields.c 
[wcohen@rawhide ~]$ sudo stap -v fields_probes.stp -c "./fields"
Pass 1: parsed user script and 506 library scripts using 454360virt/212868res/17668shr/195016data kb, in 390usr/70sys/463real ms.
Pass 2: analyzed script: 1 probe, 1 function, 0 embeds, 0 globals using 462060virt/222736res/19484shr/202716data kb, in 30usr/0sys/39real ms.
Pass 3: translated to C into "/tmp/stapUdlYqF/stap_d24e675b2dbd63c03baff5e35cb87ef3_1219_src.c" using 463116virt/223908res/19612shr/203772data kb, in 10usr/60sys/63real ms.
Pass 4: compiled C into "stap_d24e675b2dbd63c03baff5e35cb87ef3_1219.ko" in 1760usr/270sys/2064real ms.
Pass 5: starting run.
a.icsk_ca_state: false
a.icsk_ca_initialized: false
a.icsk_ca_setsockopt: false
a.icsk_ca_dst_locked: false
no field icsk_ca_state
Pass 5: run completed in 10usr/40sys/342real ms.

The fields.c, fields_probes.stp, and the dumps of the object code and DWARF are in gcc11_bitfield.tar.gz.
Comment 1 William Cohen 2022-04-07 14:30:16 UTC
There are some differences in the member information for the field.  The DWARF for RHEL8 has:

0x000002f8:   DW_TAG_structure_type
                DW_AT_name	("fields")
                DW_AT_byte_size	(0x01)
                DW_AT_decl_file	("/home/wcohen/fields.c")
                DW_AT_decl_line	(6)
                DW_AT_decl_column	(0x08)
                DW_AT_sibling	(0x00000346)

0x00000305:     DW_TAG_member
                  DW_AT_name	("icsk_ca_state")
                  DW_AT_decl_file	("/home/wcohen/fields.c")
                  DW_AT_decl_line	(7)
                  DW_AT_decl_column	(0x0a)
                  DW_AT_type	(0x000002ec "__u8")
                  DW_AT_byte_size	(0x01)
                  DW_AT_bit_size	(0x05)
                  DW_AT_bit_offset	(0x03)
                  DW_AT_data_member_location	(0x00)


For Rawhide there is no DW_AT_data_member_location or DW_AT_byte_size, and DW_AT_data_bit_offset appears in place of DW_AT_bit_offset:

0x00000088:   DW_TAG_structure_type
                DW_AT_name	("fields")
                DW_AT_byte_size	(0x01)
                DW_AT_decl_file	("/home/wcohen/fields.c")
                DW_AT_decl_line	(6)
                DW_AT_decl_column	(0x08)
                DW_AT_sibling	(0x000000ca)

0x00000095:     DW_TAG_member
                  DW_AT_name	("icsk_ca_state")
                  DW_AT_decl_file	("/home/wcohen/fields.c")
                  DW_AT_decl_line	(7)
                  DW_AT_decl_column	(0x0a)
                  DW_AT_type	(0x0000007c "__u8")
                  DW_AT_bit_size	(0x05)
                  DW_AT_data_bit_offset	(0x00)

dwflpp.cxx:3470 has a check on DW_AT_data_member_location and only adds the member if that attribute is present.  According to the DWARF 5 specification (DWARF5.pdf), DW_AT_data_bit_offset would also be valid:

The member entry corresponding to a data member that is defined in a structure,
union or class may have either a DW_AT_data_member_location attribute or a
DW_AT_data_bit_offset attribute. If the beginning of the data member is the
same as the beginning of the containing entity then neither attribute is required.
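
The rule quoted above could be modelled roughly as follows. This is only a sketch under stated assumptions, not systemtap's code: `struct member_attrs` and `member_start_bit` are hypothetical names, and real code would read the attributes through libdw instead of a hand-filled struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the location attributes on one DW_TAG_member. */
struct member_attrs {
    bool has_data_member_location;  /* DW_AT_data_member_location present */
    uint64_t data_member_location;  /* byte offset from start of struct */
    bool has_data_bit_offset;       /* DW_AT_data_bit_offset present */
    uint64_t data_bit_offset;       /* bit offset from start of struct */
};

/* Return the bit position, counted from the start of the containing
   struct, that the member's location attribute refers to.  When neither
   attribute is present the member begins where the struct begins. */
static uint64_t member_start_bit(const struct member_attrs *m)
{
    assert(m != NULL);
    if (m->has_data_bit_offset)
        return m->data_bit_offset;
    if (m->has_data_member_location)
        return m->data_member_location * 8;
    return 0;
}
```

A check that accepts only the second branch, as the one at dwflpp.cxx:3470 is described as doing, would skip members that carry just DW_AT_data_bit_offset.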
Comment 2 William Cohen 2022-04-21 22:03:34 UTC
The check at dwflpp.cxx:3470, which only adds a member if it has a DW_AT_data_member_location, seems to be part of the problem.  However, just adding an additional check for DW_AT_data_bit_offset there is not sufficient.  Before adding the DW_AT_data_bit_offset check, the -vvv output shows:

finding location for local 'a' near address 0x401170, module bias 0
get_cfa_ops @0x401170, module_start @0x400000
dwfl_module_dwarf_cfi failed: no error
got eh cfi bias: 0x0
found cfa, info: -1 [start: 0x7f2cd31f669d, end: 0x561b5f61cff0, nops: 1
chaining to identifier '$a' at fields_probes_f35.stp:2:16
semantic error: no location for field 'icsk_ca_state':no error: operator '->' at fields_probes_f35.stp:2:18
   thrown from: dwflpp.cxx:3481
        source:   if (@defined($a->icsk_ca_state))

After adding the DW_AT_data_bit_offset check:

finding location for local 'a' near address 0x401170, module bias 0
get_cfa_ops @0x401170, module_start @0x400000
dwfl_module_dwarf_cfi failed: no error
got eh cfi bias: 0x0
found cfa, info: -1 [start: 0x770000007c, end: 0x5b0000006e, nops: 1
chaining to identifier '$a' at fields_probes_f35.stp:2:16
semantic error: dwarf_getlocation_addr failed at this address (pc: 0x401170) [man error::dwarf]: identifier '$a' at fields_probes_f35.stp:2:16
        dwarf_error: not a location list value
        dieoffset: 0x95 from /home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/fields_f35
        function: f at /home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/fields.c:14:1
        <error getting alternative locations: not a location list value>
   thrown from: dwflpp.cxx:3266
        source:   if (@defined($a->icsk_ca_state))
                               ^
I did a quick check with gdb to see if it was able to extract the bitfield member information from the fields_f35 example, which does not have the DW_AT_data_member_location.  gdb was able to get the info, so this looks to be limited to systemtap rather than a problem in the elfutils libraries.

The Fedora debuginfo is also missing the DW_AT_byte_size, so should check how and where that is being used.
Comment 3 William Cohen 2022-06-15 14:00:59 UTC
Did some comparisons of rr-recorded runs of systemtap processing the rhel8 and f35 binaries, using a patch that recognizes DW_AT_data_bit_offset.  The dwarf_getlocation_addr() function doesn't handle a DW_AT_data_bit_offset being passed in as a Dwarf_Attribute, and that later triggers the message seen in comment #2.

Took a look at the DWARF generated for a couple of different data structures to see what the compiler generates for structures with bit fields.  The bit fields don't have DW_AT_data_member_location regardless of where they are in the data structure.  Thus, in something like the following, the later bit fields have a DW_AT_data_bit_offset measured from the beginning of the struct:

typedef unsigned char __u8;

struct fields {
	int     junk;
	__u8    icsk_ca_setsockopt:1,
		icsk_ca_dst_locked:1;
};

0x0000007c:   DW_TAG_typedef
                DW_AT_name	("__u8")
                DW_AT_decl_file	("/home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/loctest2.c")
                DW_AT_decl_line	(4)
                DW_AT_decl_column	(0x17)
                DW_AT_type	(0x00000046 "unsigned char")

0x00000088:   DW_TAG_structure_type
                DW_AT_name	("fields")
                DW_AT_byte_size	(0x08)
                DW_AT_decl_file	("/home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/loctest2.c")
                DW_AT_decl_line	(6)
                DW_AT_decl_column	(0x08)
                DW_AT_sibling	(0x000000bb)

0x00000095:     DW_TAG_member
                  DW_AT_name	("junk")
                  DW_AT_decl_file	("/home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/loctest2.c")
                  DW_AT_decl_line	(7)
                  DW_AT_decl_column	(0x0a)
                  DW_AT_type	(0x00000031 "int")
                  DW_AT_data_member_location	(0x00)

0x000000a2:     DW_TAG_member
                  DW_AT_name	("icsk_ca_setsockopt")
                  DW_AT_decl_file	("/home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/loctest2.c")
                  DW_AT_decl_line	(8)
                  DW_AT_decl_column	(0x0a)
                  DW_AT_type	(0x0000007c "__u8")
                  DW_AT_bit_size	(1)
                  DW_AT_data_bit_offset	(0x20)

0x000000ae:     DW_TAG_member
                  DW_AT_name	("icsk_ca_dst_locked")
                  DW_AT_decl_file	("/home/wcohen/research/profiling/systemtap_write/gcc11_bitfield/loctest2.c")
                  DW_AT_decl_line	(9)
                  DW_AT_decl_column	(0x03)
                  DW_AT_type	(0x0000007c "__u8")
                  DW_AT_bit_size	(1)
                  DW_AT_data_bit_offset	(0x21)
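
One way to cross-check offsets like these against the actual memory layout is to set a single bit field in a zeroed struct and look for the bit that changed. The sketch below assumes a little-endian target such as x86-64 with GCC; the helper names (`first_set_bit`, `setsockopt_bit`, `dst_locked_bit`) are made up for illustration, and plain `unsigned char` stands in for `__u8`.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the loctest2.c struct shown above. */
struct fields {
    int junk;
    unsigned char icsk_ca_setsockopt:1,
                  icsk_ca_dst_locked:1;
};

/* Index of the first set bit in an object, counting bit 0 as the least
   significant bit of byte 0.  Returns -1 if no bit is set. */
static int first_set_bit(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        for (int b = 0; b < 8; b++)
            if (p[i] & (1u << b))
                return (int)(i * 8 + b);
    return -1;
}

/* Bit position occupied by icsk_ca_setsockopt. */
static int setsockopt_bit(void)
{
    struct fields f;
    memset(&f, 0, sizeof f);
    f.icsk_ca_setsockopt = 1;
    return first_set_bit(&f, sizeof f);
}

/* Bit position occupied by icsk_ca_dst_locked. */
static int dst_locked_bit(void)
{
    struct fields f;
    memset(&f, 0, sizeof f);
    f.icsk_ca_dst_locked = 1;
    return first_set_bit(&f, sizeof f);
}
```

Under those assumptions the two helpers return 32 and 33, which lines up with the DW_AT_data_bit_offset values 0x20 and 0x21 in the dump.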


gdb appears to handle bitfields properly with the existing elfutils.  There look to be patches in gdb that address the handling of DW_AT_data_bit_offset:

commit 6ad036d703099508c388038b57c77a8f7aaffb1d
Author: Tom Tromey <tromey@adacore.com>
Date:   Fri Aug 20 10:05:10 2021 -0600

    Fix handling of DW_AT_data_bit_offset
    
    A newer version of GCC will now emit member locations using just
    DW_AT_data_bit_offset, like:


commit 20a5fcbd5b28cca88511ac5a9ad5e54251e8fa6d
Author: Tom Tromey <tom@tromey.com>
Date:   Wed Sep 23 09:39:24 2020 -0600

    Handle bit offset and bit size in base types
Comment 4 William Cohen 2022-06-24 21:31:09 UTC
Thinking through plan for DW_AT_data_bit_offset handling

One of the restrictions is that dwarf_getlocation_addr expects a
DW_AT_data_member_location and doesn't understand the
DW_AT_data_bit_offset attribute.  dwarf_getlocation_addr returns
DWARF_E_NO_LOC_VALUE if there is no DW_AT_data_member_location.

Maybe synthesize a DW_AT_data_member_location from
DW_AT_data_bit_offset and the data type size in dwflpp.cxx
dwflpp::find_struct_member.  It is a bit more complicated than
DW_AT_data_bit_offset/8, as that might not be the start of the data
type holding the bit field.

There are a number of places where DW_AT_bit_offset is checked in
dwflpp.cxx.  Those places will need to be expanded to handle
DW_AT_data_bit_offset.  Need to handle the differences between
DW_AT_data_bit_offset and DW_AT_bit_offset:
  -DW_AT_data_bit_offset counts bits in little-endian order (on a little-endian target), but DW_AT_bit_offset counts in big-endian order, from the most significant bit
  -DW_AT_data_bit_offset is from start of struct, DW_AT_bit_offset is from start of DW_AT_data_member_location
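
The two differences listed above can be illustrated by converting an old-style description into the new one. This is a sketch for a little-endian target only; `struct legacy_bitfield` and `legacy_to_data_bit_offset` are hypothetical names and are not part of systemtap or elfutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old-style (DW_AT_bit_offset) description of a bit field. */
struct legacy_bitfield {
    uint64_t data_member_location; /* byte offset of the storage unit */
    uint64_t byte_size;            /* size of the storage unit in bytes */
    uint64_t bit_size;             /* width of the field in bits */
    uint64_t bit_offset;           /* bits between the storage unit's most
                                      significant bit and the field */
};

/* Convert to a DW_AT_data_bit_offset style value: bits from the start of
   the struct, counted from the least significant end (little-endian). */
static uint64_t legacy_to_data_bit_offset(const struct legacy_bitfield *f)
{
    assert(f != NULL);
    uint64_t unit_bits = f->byte_size * 8;
    assert(f->bit_offset + f->bit_size <= unit_bits);
    return f->data_member_location * 8
           + (unit_bits - f->bit_offset - f->bit_size);
}
```

Feeding in the RHEL8 values for icsk_ca_state from comment 1 (location 0, byte size 1, bit size 5, bit offset 3) gives 0, which matches the DW_AT_data_bit_offset that Rawhide emits for the same member.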

Places where DW_AT_bit_offset is used in dwflpp.cxx functions:
get_bitfield and dwflpp::translate_final_fetch_or_store.
DW_AT_bit_offset is also used in tapsets.cxx
dwarf_pretty_print::recurse_struct_members.

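Generating an equivalent to DW_AT_data_member_location (a byte
offset; data_bit_offset is in bits and byte_size is in bytes):


   member_location = (data_bit_offset / (byte_size * 8)) * byte_size;


Generating the bit offset within that storage unit from the
synthesized DW_AT_data_member_location:


   bit_offset = data_bit_offset - member_location * 8;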
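
The two computations above can be written out in C with the bit and byte unit conversions made explicit. This is a sketch, not the code on the wcohen/pr29037 branch: `synthesize_location` and its result struct are hypothetical, the `legacy_bit_offset` step assumes a little-endian target, and the field is assumed not to straddle storage units.

```c
#include <assert.h>
#include <stdint.h>

struct synthesized_location {
    uint64_t member_location;   /* byte offset of the storage unit */
    uint64_t bit_in_unit;       /* bits from the start of that unit */
    uint64_t legacy_bit_offset; /* DW_AT_bit_offset style value
                                   (little-endian target assumed) */
};

/* data_bit_offset and bit_size are in bits; byte_size is the size in
   bytes of the type holding the bit field. */
static struct synthesized_location
synthesize_location(uint64_t data_bit_offset, uint64_t byte_size,
                    uint64_t bit_size)
{
    struct synthesized_location r;
    assert(byte_size > 0);
    uint64_t unit_bits = byte_size * 8;

    /* Round down to the start of the containing storage unit. */
    r.member_location = (data_bit_offset / unit_bits) * byte_size;
    r.bit_in_unit = data_bit_offset - r.member_location * 8;

    /* The field must fit inside a single storage unit. */
    assert(r.bit_in_unit + bit_size <= unit_bits);
    r.legacy_bit_offset = unit_bits - r.bit_in_unit - bit_size;
    return r;
}
```

With the Rawhide values for icsk_ca_state (data bit offset 0, byte size 1, bit size 5) this yields member location 0 and a legacy bit offset of 3, the same numbers the RHEL8 DWARF carries in DW_AT_data_member_location and DW_AT_bit_offset.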


At this point have a partial patch on the wcohen/pr29037 branch to
address pr29037:
https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/wcohen/pr29037
One sticking point is creating a new DW_AT_data_member_location
attribute.  Copying the DW_AT_data_bit_offset attribute and
overwriting the DW_AT_data_bit_offset with
DW_AT_data_member_location is easy.  However, modifying the valp
field with the correct value for formudata() to read is not so
obvious.

Another issue that looks likely to come up is PR28334, if the
structure is small enough to be passed in a register.  Systemtap
doesn't handle extracting an item from a register unless it is in the
lsb bits of the register.
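
For a field that does not sit in the lsb bits, the register value has to be shifted before it is masked. A minimal sketch of that step, assuming a little-endian layout; `extract_bits` is a hypothetical helper and is not taken from systemtap:

```c
#include <assert.h>
#include <stdint.h>

/* Extract bit_size bits from reg, starting lsb_shift bits above the
   least significant bit. */
static uint64_t extract_bits(uint64_t reg, unsigned lsb_shift,
                             unsigned bit_size)
{
    assert(bit_size >= 1 && bit_size <= 64);
    assert(lsb_shift + bit_size <= 64);
    uint64_t mask = (bit_size == 64) ? UINT64_MAX
                                     : ((UINT64_C(1) << bit_size) - 1);
    return (reg >> lsb_shift) & mask;
}
```

As an illustration, suppose the one-byte struct is packed as icsk_ca_state (5 bits) followed by the three one-bit flags in the order they are printed, which is an assumption since fields.c is not shown here. A register holding 0x61 would then give `extract_bits(0x61, 0, 5) == 1` for icsk_ca_state and `extract_bits(0x61, 7, 1) == 0` for icsk_ca_dst_locked.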
Comment 5 William Cohen 2022-07-01 17:39:22 UTC
Added code to https://sourceware.org/git/?p=systemtap.git;a=shortlog;h=refs/heads/wcohen/pr29037 to translate the DW_AT_data_bit_offset attributes into equivalent DW_AT_data_member_location attributes.  With the patches the reproducer now works:

$ sudo ../systemtap_write/install/bin/stap -v fields_probes.stp -c "./fields a a a"
Pass 1: parsed user script and 483 library scripts using 336212virt/96308res/17160shr/78904data kb, in 120usr/30sys/151real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 337796virt/99060res/18240shr/80488data kb, in 10usr/0sys/5real ms.
Pass 3: using cached /root/.systemtap/cache/82/stap_82e99d289011359ab6eb47f65e883ee3_1612.c
Pass 4: using cached /root/.systemtap/cache/82/stap_82e99d289011359ab6eb47f65e883ee3_1612.ko
Pass 5: starting run.
a.icsk_ca_state: true
a.icsk_ca_initialized: true
a.icsk_ca_setsockopt: true
a.icsk_ca_dst_locked: false
$a->icsk_ca_state = 1
Pass 5: run completed in 10usr/30sys/365real ms.


The code to handle the newer bitfield DWARF encoding needs to be cleaned up as it never frees the memory from the mallocs.
Comment 6 William Cohen 2022-07-08 17:36:41 UTC
Addressed by commit 68dfc446f117f074bae05035f05522b7c6055f25