Bug 28153

Summary: [test] gmon/tst-gmon-gprof* may have a f3 line when built with ld.lld
Product: glibc Reporter: Fangrui Song <i>
Component: libc Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: drepper.fsp, hjl.tools
Priority: P2    
Version: 2.34   
Target Milestone: 2.35   
Host: Target:
Build: Last reconfirmed: 2021-07-29 00:00:00

Description Fangrui Song 2021-07-28 21:10:32 UTC
Extracted from https://sourceware.org/pipermail/libc-alpha/2021-July/129450.html

To build LLD 13.0.0:

# https://github.com/llvm/llvm-project/ origin/release/13.x
cmake -H. -Bout/release -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang;lld'
ninja -C out/release lld

I use
sudo ln -sf out/release/bin/lld /usr/local/bin/ld
to ensure the glibc build system definitely picks ld => lld
(messing around with LDFLAGS=-fuse-ld=lld may work as well)


For gmon/tst-gmon-gprof*, the ld.lld-linked tst-gmon-gprof output has an f3 line, which the test treats as a failure. However, having the f3 line appears more correct to me. I am unclear why the ld.bfd-linked tst-gmon-gprof doesn't have f3.

% cat gmon/tst-gmon-gprof.out
--- expected
+++ actual
@@ -1,2 +1,3 @@
 f1 2000
 f2 1000
+f3 1
FAIL
Comment 1 H.J. Lu 2021-07-29 16:50:16 UTC
On ld.bfd generated gmon/tst-gmon, gprof shows

index % time    self  children    called     name
                0.00    0.00    1000/2000        f2 [2]
                0.00    0.00    1000/2000        f3 [7]
[1]      0.0    0.00    0.00    2000         f1 [1]
-----------------------------------------------
                0.00    0.00    1000/1000        f3 [7]
[2]      0.0    0.00    0.00    1000         f2 [2]
                0.00    0.00    1000/2000        f1 [1]
-----------------------------------------------
...
Index by function name

   [1] f1                      [2] f2

Why do you think the output is incorrect?
Comment 2 Fangrui Song 2021-07-30 04:35:35 UTC
lld linked gmon/tst-gmon has f3.

% cd ~/Dev/glibc/out/lld
% ./testrun.sh gmon/tst-gmon
% gprof gmon/tst-gmon
...
                                                                                                                                                                                                   
                     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) no time propagated
                                                                                                                                                                                                    
index % time    self  children    called     name
                0.00    0.00    1000/2000        f2 [2]
                0.00    0.00    1000/2000        f3 [3]
[1]      0.0    0.00    0.00    2000         f1 [1]
-----------------------------------------------
                0.00    0.00    1000/1000        f3 [3]
[2]      0.0    0.00    0.00    1000         f2 [2]
                0.00    0.00    1000/2000        f1 [1]
-----------------------------------------------
                0.00    0.00       1/1           main [9]
[3]      0.0    0.00    0.00       1         f3 [3]
                0.00    0.00    1000/1000        f2 [2]
                0.00    0.00    1000/2000        f1 [1]
-----------------------------------------------
...
Index by function name

   [1] f1                      [2] f2                      [3] f3
Comment 3 H.J. Lu 2021-07-30 22:48:20 UTC
ld.bfd generates:

0000000000401000 T _init
0000000000401090 T main
00000000004010b0 T _start
00000000004010e0 T __gmon_start__
0000000000401130 t deregister_tm_clones
0000000000401160 t register_tm_clones
00000000004011a0 t __do_global_dtors_aux
00000000004011d0 t frame_dummy
0000000000401250 T atexit
0000000000401264 T _fini
0000000000401271 T etext

ld.lld generates:

0000000000201ba0 T _start
0000000000201bd0 T __gmon_start__
0000000000201c20 t deregister_tm_clones
0000000000201c50 t register_tm_clones
0000000000201c90 t __do_global_dtors_aux
0000000000201cc0 t frame_dummy
0000000000201d40 T main
0000000000201d60 t atexit
0000000000201d74 t _init
0000000000201d90 t _fini
0000000000201e10 T etext

glibc assumes that _start is the lowest address for which we need to keep profiling records.  Newer GCC places main in the .text.startup section.
Since _start is in the .text section, ld.bfd places main below _start.  Putting
_start in the .text.startup section makes f3 show up with ld.bfd.
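
To make the address comparison concrete, here is a minimal sketch in C. It is a simplified model, not glibc's gmon code: it assumes a call site is recorded only when it falls in the half-open range [lowpc, highpc), and it uses the start address of main as a stand-in for the call site of f3 inside main. The function name is_profiled and the enum constants are hypothetical; the addresses are copied from the two nm listings above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (an assumption, not glibc's actual code): a call
   site is recorded only if it lies in [lowpc, highpc).  */
bool is_profiled(uintptr_t pc, uintptr_t lowpc, uintptr_t highpc)
{
    return pc >= lowpc && pc < highpc;
}

/* Addresses copied from the nm listings above.  */
enum {
    BFD_MAIN = 0x401090, BFD_START = 0x4010b0, BFD_ETEXT = 0x401271,
    LLD_MAIN = 0x201d40, LLD_START = 0x201ba0, LLD_ETEXT = 0x201e10
};
```

With lowpc set to _start, is_profiled(BFD_MAIN, BFD_START, BFD_ETEXT) is false because main sits below _start in the ld.bfd layout, so under this model the main-to-f3 call is dropped. is_profiled(LLD_MAIN, LLD_START, LLD_ETEXT) is true, which matches the extra f3 line seen with ld.lld.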
Comment 4 H.J. Lu 2021-07-31 04:23:22 UTC
A patch is posted at

https://sourceware.org/pipermail/libc-alpha/2021-July/129685.html
Comment 5 H.J. Lu 2021-08-24 14:18:26 UTC
Fixed for glibc 2.35 by

commit 84a7eb1f87c1d01b58ad887a0ab5d87abbc1c772
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Jul 30 19:07:30 2021 -0700

    Use __executable_start as the lowest address for profiling [BZ #28153]
    
    Glibc assumes that ENTRY_POINT is the lowest address for which we need
    to keep profiling records and BFD linker uses a linker script to place
    the input sections.
    
    Starting from GCC 4.6, the main function is placed in .text.startup
    section and starting from binutils 2.22, BFD linker with
    
    commit add44f8d5c5c05e08b11e033127a744d61c26aee
    Author: Alan Modra <amodra@gmail.com>
    Date:   Thu Nov 25 03:03:02 2010 +0000
    
                * scripttempl/elf.sc: Group .text.exit, text.startup and .text.hot
                sections.
    
    places .text.startup section before .text section, which leave the main
    function out of profiling records.
    
    Starting from binutils 2.15, linker provides __executable_start to mark
    the lowest address of the executable.  Use __executable_start as the
    lowest address to keep the main function in profiling records. This fixes
    [BZ #28153].
    
    Tested on Linux/x86-64, Linux/x32 and Linux/i686 as well as with
    build-many-glibcs.py.
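
The idea in the commit message can be illustrated with a small sketch. This is not the glibc patch itself; it only shows, assuming an ELF executable linked with a linker that provides both __executable_start and etext (ld.bfd and ld.lld both do), that a range starting at __executable_start covers code regardless of where the linker places it relative to _start. The function name covered_by_text_range is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linker-provided symbols; no header declares them.  */
extern const char __executable_start[]; /* lowest address of the executable */
extern const char etext[];              /* first address past the text */

/* Returns true if pc lies in [__executable_start, etext).  */
bool covered_by_text_range(uintptr_t pc)
{
    return pc >= (uintptr_t) __executable_start
           && pc < (uintptr_t) etext;
}
```

Passing the address of any function defined in the executable, for example (uintptr_t) &covered_by_text_range, should report that it is covered, whether the function ends up in .text or in .text.startup.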