Slow handling of C++ symbol names

Will Cohen wcohen@redhat.com
Wed Nov 19 21:10:00 GMT 2003


When debugging some C++ program with gdb (both versions 5.3 and 6.0) it 
takes a long time to load the debugging information and get a 
command-line prompt. The particular example that demonstrates this 
problem is monotone (http://www.venge.net/monotone/) which makes use of 
the boost libraries (http://www.boost.org/). gdb loading monotone with 
the associated debugging information takes over 30 seconds to get the 
initial command-line prompt on a 1.7GHz Athlon with 512MB or DRAM. 
monotone is not a small program, over 3MB of debugging information. I 
tooks some measurements with OProfile to find out where gdb 5.3 was 
spending its time. 3/4 of the time is spent in dyn_string_substring and 
__GI_strncpy, the column below the '%'. I recompiled gdb with '-pg' to 
find that this is due to the C++ name demangling. The highlights of 
gprof output are shown below. I looked at the symbols for monotone/boost 
in the executable, some of them are VERY long and for monotone/boost 
they average 147 characters in length. An example of ONE symbol over 2KB 
is at the end of this mail (there are others like that). I suspect that 
some of the algorithms used in gdb may not expect such long symbol names 
and be part of the reason for the slow startup, e.g. work 
O(length_of_string ^2). Has anyone else noticed this type of problem before?

[wcohen@slingshot monotone-0.7]$ oprofpp -l -r -k /usr/bin/gdb|more
Cpu type: Athlon
Cpu speed was (MHz estimation) : 1687.55
Counter 0 counted CPU_CLK_UNHALTED events (Cycles outside of halt state) 
with a
unit mask of 0x00 (No unit mask) count 843500
vma      samples  %           symbol name             image name
081e20c0 36781    55.9424     dyn_string_substring 
/usr/lib/debug/usr/bin/gdb.debug
00077290 14743    22.4235     __GI_strncpy            /lib/tls/libc-2.3.2.so
080f5e40 3434     5.22297     msymbol_hash_iw 
/usr/lib/debug/usr/bin/gdb.debug
00021dc0 1748     2.65864     __ctype_b_loc           /lib/tls/libc-2.3.2.so
00076990 1603     2.4381      __GI_strcpy             /lib/tls/libc-2.3.2.so
081e1e10 1417     2.1552      dyn_string_insert 
/usr/lib/debug/usr/bin/gdb.debug


The profile information from gprof.

Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls   s/call   s/call  name
  88.41     18.46    18.46   354132     0.00     0.00  dyn_string_substring
   5.84     19.68     1.22    19299     0.00     0.00  msymbol_hash_iw
...


                 0.00    0.00  220617 demangle_class_enum_type <cycle 2> 
[423]
                 0.00    0.00      95             demangle_function_type 
<cycle 2> [482]
-----------------------------------------------
                 0.00    0.74   14108/354132      demangle_name <cycle 
2> [29]
                 0.00    1.04   19908/354132      demangle_type_ptr 
<cycle 2> [28]
                 0.00    4.55   87228/354132      demangle_prefix <cycle 
2> [24]
                 0.01   12.16  232888/354132      demangle_type <cycle 
2> [23]
[21]    88.6    0.01   18.49  354132         substitution_add [21]
                18.46    0.01  354132/354132      dyn_string_substring [22]
                 0.00    0.01  354132/979183      dyn_string_new [40]
-----------------------------------------------
                18.46    0.01  354132/354132      substitution_add [21]
[22]    88.5   18.46    0.01  354132         dyn_string_substring [22]
                 0.01    0.01  354132/4003799     dyn_string_resize [35]
-----------------------------------------------
                                13904             demangle_type <cycle 
2> [23]
                                   55             demangle_operator_name 
<cycle 2>


Find out how many symbols monotone has
$ nm monotone|wc
   23623   70853 3657239



Some of symbol names from the boost library are really, really, really 
long. Here is an example that is over 2K in length:

08145572 W 
_ZN5boost6spirit12subrule_listINS0_14subrule_parserILi0ENS0_8sequenceINS3_INS3_INS3_INS0_6actionINS0_7subruleILi1ENS0_15closure_contextI17rcs_admin_closureEEEEN7phoenix5actorINSA_9compositeINSA_9assign_opENSB_INSC_INSA_21member_var_ptr_actionI9rcs_admin8rcs_fileEENSB_INSA_14closure_memberILi0ENSA_7closureISG_NSA_5nil_tESK_EEEEEESK_SK_SK_EEEENSB_INSA_8argumentILi0EEEEESK_SK_EEEEEENS0_11alternativeINS0_11kleene_starINS4_INS5_ILi3ENS6_I17rcs_delta_closureEEEENSB_INSC_INSA_26member_function_ptr_actionIvSG_RK9rcs_deltaSK_SK_SK_EESN_SS_SK_SK_EEEEEEEENS3_INSX_INS5_ILi12ENS0_14parser_contextEEEEES19_EEEEEENSW_INS5_ILi2ENS6_I16rcs_desc_closureEEEENS3_IS1C_S1I_EEEEEENSX_INS4_INS5_ILi4ENS6_I21rcs_deltatext_closureEEEENSB_INSC_INS11_IvSG_RK13rcs_deltatextSK_SK_SK_EESN_SS_SK_SK_EEEEEEEEEENS0_10contiguousINSX_INS0_5chlitIcEEEEEEEENS6_I16rcs_file_closureEEEENS1_INS2_ILi10ES20_S1A_EENS1_INS2_ILi14ENS0_8positiveINSW_INS0_12digit_parserES20_EEEES1A_EENS1_INS2_ILi13ENS0_10differenceINS0_12graph

_parserENS0_5chsetIcEEEES1A_EENS1_INS2_ILi12ENS3_INS3_INS5_ILi8ENS6_I14string_closureEEEENSX_INS5_ILi9ES2K_EEEEEENS5_ILi10ES1A_EEEES1A_EENS1_INS2_ILi9ENSW_INSW_INSW_IS2L_NS5_ILi7ES2K_EEEENS5_ILi5ES2K_EEEES2P_EES2K_EENS1_INS2_ILi4ENS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS1P_EENSB_INSI_ILi0ENSJ_IS1P_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEENS0_6strlitIPKcEEEENS4_IS2U_S35_EEEENSW_IS3A_NS3_IS1C_S3A_EEEEEES3C_EES1N_EENS1_INS2_ILi2ENS3_IS3A_NS4_IS2U_NSB_INSC_ISD_NSB_INSI_ILi0ENSJ_ISsSK_SK_EEEEEESS_SK_SK_EEEEEEEES1H_EENS1_INS2_ILi3ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsS12_EENSB_INSI_ILi0ENSJ_IS12_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEES3A_EES3Z_EES2P_EES3A_EENS4_IS2L_S3Y_EEEES2P_EES3A_EENS0_8optionalIS44_EEEES2P_EES3A_EENSX_INS4_IS2S_NSB_INSC_INS11_IvS12_RKSsSK_SK_SK_EES3U_SS_SK_SK_EEEEEEEEEES2P_EES3A_EENS48_IS3Z_EEEES2P_EESZ_EENS1_INS2_ILi1ENS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS3_INS

3_INS3_INS3_INS3_INS3_INS3_IS3A_NS4_IS2S_NSB_INSC_ISD_NSB_INSC_INSE_ISsSF_EENSB_INSI_ILi0ENSJ_ISF_SK_SK_EEEEEESK_SK_SK_EEEESS_SK_SK_EEEEEEEES2P_EENS48_IS51_EEEES3A_EENSX_INS4_IS2L_S4Y_EEEEEES2P_EES3A_EENSX_INS3_INS3_INS5_ILi6ES2K_EENS5_ILi11ES1A_EEEES2S_EEEEEES2P_EES3A_EENSX_INS3_INS3_IS2L_S5B_EES2S_EEEEEES2P_EENS48_INS3_IS3A_S2P_EEEEEENS48_INS3_INS3_IS3A_NS4_IS2U_S4Y_EEEES2P_EEEEEES5T_EES8_EENS0_5nil_tEEEEEEEEEEEEEEEEEEEEC1ERKS26_RKS66_

Doing a little computation on the wc of the nm output to figure out the 
average length of each symbol. Account for the 8 character address and 1 
character symbol type information:

(3657239 char - (8 char address + 1 char type) * 23623)/23623= 146 char


-Will



More information about the Gdb-patches mailing list