[PATCH v3] Add name_of_main and language_of_main to the DWARF index
Tom de Vries
tdevries@suse.de
Mon Aug 14 07:31:31 GMT 2023
On 8/11/23 20:21, Matheus Branco Borella wrote:
> This should hopefully be the final version. Has a more descriptive entry in the
> NEWS file, but is otherwise the same as v2. I believe the misunderstanding with
> my contributor status should have also been sorted out?
>
Hi,
As per Eli's last comment in this thread, that's indeed the case.
> ---
> This patch adds a new section to the DWARF index containing the name
> and the language of the main function symbol, gathered from
> `cooked_index::get_main`, if available. Currently, for lack of a better name,
> this section is called the "shortcut table". The way this name is both saved and
> applied upon an index being loaded in mirrors how it is done in
> `cooked_index_functions`, more specifically, the full name of the main function
> symbol is saved and `set_objfile_main_name` is used to apply it after it is
> loaded.
>
> The main use case for this patch is in improving startup times when dealing with
> large binaries. Currently, when an index is used, GDB has to expand symtabs
> until it finds out what the language of the main function symbol is. For some
> large executables, this may take a considerable amount of time to complete,
> slowing down startup. This patch bypasses that operation by having both the name
> and language of the main function symbol be provided ahead of time by the index.
>
> In my testing (a binary with about 1.8GB worth of DWARF data) this change brings
> startup time down from about 34 seconds to about 1.5 seconds.
>
I've reviewed the patch, and found a few nits.
There are two spots that look unintentional to me, one adding and one
removing an empty line. You might want to remove those.
I found this bit:
...
+ lang = dwarf_lang_to_enum_language (attr->constant_value (0));
+ dw_lang = (dwarf_source_language)attr->constant_value (0);
...
I would reformulate as:
...
+ dw_lang = (dwarf_source_language)attr->constant_value (0);
+ lang = dwarf_lang_to_enum_language (dw_lang);
...
but I don't feel strongly about that.
I've looked over Tom Tromey's comments earlier in the thread, and I
think they all have been addresses. Furthermore, Eli already approved
the documentation part, so I'd say: approved, but wait for one week in
case somebody else has other comments.
Approved-By: Tom de Vries <tdevries@suse.de>
Thanks,
- Tom
> PR symtab/24549
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=24549
> ---
> gdb/NEWS | 3 ++
> gdb/doc/gdb.texinfo | 23 +++++++++++++--
> gdb/dwarf2/index-write.c | 47 +++++++++++++++++++++++++++----
> gdb/dwarf2/read-gdb-index.c | 56 +++++++++++++++++++++++++++++++++++--
> gdb/dwarf2/read.c | 13 +++++++--
> gdb/dwarf2/read.h | 12 ++++++++
> 6 files changed, 143 insertions(+), 11 deletions(-)
>
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d97e3c15a8..ac455f39f2 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -3,6 +3,9 @@
>
> *** Changes since GDB 13
>
> +* GDB index now contains information about the main function. This speeds up
> + startup when it is being used for some large binaries.
> +
> * The AArch64 'org.gnu.gdb.aarch64.pauth' Pointer Authentication feature string
> has been deprecated in favor of the 'org.gnu.gdb.aarch64.pauth_v2' feature
> string.
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index d1059e0cb7..3b2fdcd19e 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -49093,13 +49093,14 @@ unless otherwise noted:
>
> @enumerate
> @item
> -The version number, currently 8. Versions 1, 2 and 3 are obsolete.
> +The version number, currently 9. Versions 1, 2 and 3 are obsolete.
> Version 4 uses a different hashing function from versions 5 and 6.
> Version 6 includes symbols for inlined functions, whereas versions 4
> and 5 do not. Version 7 adds attributes to the CU indices in the
> symbol table. Version 8 specifies that symbols from DWARF type units
> (@samp{DW_TAG_type_unit}) refer to the type unit's symbol table and not the
> -compilation unit (@samp{DW_TAG_comp_unit}) using the type.
> +compilation unit (@samp{DW_TAG_comp_unit}) using the type. Version 9 adds
> +the name and the language of the main function to the index.
>
> @value{GDBN} will only read version 4, 5, or 6 indices
> by specifying @code{set use-deprecated-index-sections on}.
> @@ -49120,6 +49121,9 @@ The offset, from the start of the file, of the address area.
> @item
> The offset, from the start of the file, of the symbol table.
>
> +@item
> +The offset, from the start of the file, of the shortcut table.
> +
> @item
> The offset, from the start of the file, of the constant pool.
> @end enumerate
> @@ -49196,6 +49200,21 @@ don't currently have a simple description of the canonicalization
> algorithm; if you intend to create new index sections, you must read
> the code.
>
> +@item The shortcut table
> +This is a data structure with the following fields:
> +
> +@table @asis
> +@item Language of main
> +A 32-bit little-endian value indicating the language of the main function as a
> +@code{DW_LANG_} constant. This value will be zero if main function information
> +is not present.
> +
> +@item Name of main
> +An @code{offset_type} value indicating the offset of the main function's name
> +in the constant pool. This value must be ignored if the value for the language
> +of main is zero.
> +@end table
> +
> @item
> The constant pool. This is simply a bunch of bytes. It is organized
> so that alignment is correct: CU vectors are stored first, followed by
> diff --git a/gdb/dwarf2/index-write.c b/gdb/dwarf2/index-write.c
> index 62c2cc6ac7..7117a5184b 100644
> --- a/gdb/dwarf2/index-write.c
> +++ b/gdb/dwarf2/index-write.c
> @@ -1080,14 +1080,15 @@ write_gdbindex_1 (FILE *out_file,
> const data_buf &types_cu_list,
> const data_buf &addr_vec,
> const data_buf &symtab_vec,
> - const data_buf &constant_pool)
> + const data_buf &constant_pool,
> + const data_buf &shortcuts)
> {
> data_buf contents;
> - const offset_type size_of_header = 6 * sizeof (offset_type);
> + const offset_type size_of_header = 7 * sizeof (offset_type);
> offset_type total_len = size_of_header;
>
> /* The version number. */
> - contents.append_offset (8);
> + contents.append_offset (9);
>
> /* The offset of the CU list from the start of the file. */
> contents.append_offset (total_len);
> @@ -1105,6 +1106,10 @@ write_gdbindex_1 (FILE *out_file,
> contents.append_offset (total_len);
> total_len += symtab_vec.size ();
>
> + /* The offset of the shortcut table from the start of the file. */
> + contents.append_offset (total_len);
> + total_len += shortcuts.size ();
> +
> /* The offset of the constant pool from the start of the file. */
> contents.append_offset (total_len);
> total_len += constant_pool.size ();
> @@ -1116,6 +1121,7 @@ write_gdbindex_1 (FILE *out_file,
> types_cu_list.file_write (out_file);
> addr_vec.file_write (out_file);
> symtab_vec.file_write (out_file);
> + shortcuts.file_write (out_file);
> constant_pool.file_write (out_file);
>
> assert_file_size (out_file, total_len);
> @@ -1193,6 +1199,34 @@ write_cooked_index (cooked_index *table,
> }
> }
>
> +/* Write shortcut information. */
> +
> +static void
> +write_shortcuts_table (cooked_index *table, data_buf& shortcuts,
> + data_buf& cpool)
> +{
> + const auto main_info = table->get_main ();
> + size_t main_name_offset = 0;
> + dwarf_source_language dw_lang = (dwarf_source_language)0;
> +
> + if (main_info != nullptr)
> + {
> + dw_lang = main_info->per_cu->dw_lang;
> +
> + if (dw_lang != 0)
> + {
> + auto_obstack obstack;
> + const auto main_name = main_info->full_name (&obstack, true);
> +
> + main_name_offset = cpool.size ();
> + cpool.append_cstr0 (main_name);
> + }
> + }
> +
> + shortcuts.append_uint (4, BFD_ENDIAN_LITTLE, dw_lang);
> + shortcuts.append_offset (main_name_offset);
> +}
> +
> /* Write contents of a .gdb_index section for OBJFILE into OUT_FILE.
> If OBJFILE has an associated dwz file, write contents of a .gdb_index
> section for that dwz file into DWZ_OUT_FILE. If OBJFILE does not have an
> @@ -1270,11 +1304,14 @@ write_gdbindex (dwarf2_per_bfd *per_bfd, cooked_index *table,
>
> write_hash_table (&symtab, symtab_vec, constant_pool);
>
> + data_buf shortcuts;
> + write_shortcuts_table (table, shortcuts, constant_pool);
> +
> write_gdbindex_1(out_file, objfile_cu_list, types_cu_list, addr_vec,
> - symtab_vec, constant_pool);
> + symtab_vec, constant_pool, shortcuts);
>
> if (dwz_out_file != NULL)
> - write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {});
> + write_gdbindex_1 (dwz_out_file, dwz_cu_list, {}, {}, {}, {}, {});
> else
> gdb_assert (dwz_cu_list.empty ());
> }
> diff --git a/gdb/dwarf2/read-gdb-index.c b/gdb/dwarf2/read-gdb-index.c
> index 1006386cb2..f09c5ba234 100644
> --- a/gdb/dwarf2/read-gdb-index.c
> +++ b/gdb/dwarf2/read-gdb-index.c
> @@ -88,6 +88,9 @@ struct mapped_gdb_index final : public mapped_index_base
> /* A pointer to the constant pool. */
> gdb::array_view<const gdb_byte> constant_pool;
>
> + /* The shortcut table data. */
> + gdb::array_view<const gdb_byte> shortcut_table;
> +
> /* Return the index into the constant pool of the name of the IDXth
> symbol in the symbol table. */
> offset_type symbol_name_index (offset_type idx) const
> @@ -166,6 +169,7 @@ dwarf2_gdb_index::dump (struct objfile *objfile)
>
> mapped_gdb_index *index = (gdb::checked_static_cast<mapped_gdb_index *>
> (per_objfile->per_bfd->index_table.get ()));
> +
> gdb_printf (".gdb_index: version %d\n", index->version);
> gdb_printf ("\n");
> }
> @@ -583,7 +587,7 @@ to use the section anyway."),
>
> /* Indexes with higher version than the one supported by GDB may be no
> longer backward compatible. */
> - if (version > 8)
> + if (version > 9)
> return 0;
>
> map->version = version;
> @@ -608,8 +612,17 @@ to use the section anyway."),
> map->symbol_table
> = offset_view (gdb::array_view<const gdb_byte> (symbol_table,
> symbol_table_end));
> -
> ++i;
> +
> + if (version >= 9)
> + {
> + const gdb_byte *shortcut_table = addr + metadata[i];
> + const gdb_byte *shortcut_table_end = addr + metadata[i + 1];
> + map->shortcut_table
> + = gdb::array_view<const gdb_byte> (shortcut_table, shortcut_table_end);
> + ++i;
> + }
> +
> map->constant_pool = buffer.slice (metadata[i]);
>
> if (map->constant_pool.empty () && !map->symbol_table.empty ())
> @@ -763,6 +776,43 @@ create_addrmap_from_gdb_index (dwarf2_per_objfile *per_objfile,
> = new (&per_bfd->obstack) addrmap_fixed (&per_bfd->obstack, &mutable_map);
> }
>
> +/* Sets the name and language of the main function from the shortcut table. */
> +
> +static void
> +set_main_name_from_gdb_index (dwarf2_per_objfile *per_objfile,
> + mapped_gdb_index *index)
> +{
> + const auto expected_size = 4 + sizeof (offset_type);
> + if (index->shortcut_table.size () < expected_size)
> + /* The data in the section is not present, is corrupted or is in a version
> + * we don't know about. Regardless, we can't make use of it. */
> + return;
> +
> + auto ptr = index->shortcut_table.data ();
> + const auto dw_lang = extract_unsigned_integer (ptr, 4, BFD_ENDIAN_LITTLE);
> + if (dw_lang >= DW_LANG_hi_user)
> + {
> + complaint (_(".gdb_index shortcut table has invalid main language %u"),
> + (unsigned) dw_lang);
> + return;
> + }
> + if (dw_lang == 0)
> + {
> + /* Don't bother if the language for the main symbol was not known or if
> + * there was no main symbol at all when the index was built. */
> + return;
> + }
> + ptr += 4;
> +
> + const auto lang = dwarf_lang_to_enum_language (dw_lang);
> + const auto name_offset = extract_unsigned_integer (ptr,
> + sizeof (offset_type),
> + BFD_ENDIAN_LITTLE);
> + const auto name = (const char*) (index->constant_pool.data () + name_offset);
> +
> + set_objfile_main_name (per_objfile->objfile, name, (enum language) lang);
> +}
> +
> /* See read-gdb-index.h. */
>
> int
> @@ -848,6 +898,8 @@ dwarf2_read_gdb_index
>
> create_addrmap_from_gdb_index (per_objfile, map.get ());
>
> + set_main_name_from_gdb_index (per_objfile, map.get ());
> +
> per_bfd->index_table = std::move (map);
> per_bfd->quick_file_names_table =
> create_quick_file_names_table (per_bfd->all_units.size ());
> diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
> index 4828409222..89acd94c05 100644
> --- a/gdb/dwarf2/read.c
> +++ b/gdb/dwarf2/read.c
> @@ -17745,7 +17745,9 @@ leb128_size (const gdb_byte *buf)
> }
> }
>
> -static enum language
> +/* Converts DWARF language names to GDB language names. */
> +
> +enum language
> dwarf_lang_to_enum_language (unsigned int lang)
> {
> enum language language;
> @@ -21661,6 +21663,7 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
> /* Set the language we're debugging. */
> attr = dwarf2_attr (comp_unit_die, DW_AT_language, cu);
> enum language lang;
> + dwarf_source_language dw_lang = (dwarf_source_language)0;
> if (cu->producer != nullptr
> && strstr (cu->producer, "IBM XL C for OpenCL") != NULL)
> {
> @@ -21669,18 +21672,24 @@ prepare_one_comp_unit (struct dwarf2_cu *cu, struct die_info *comp_unit_die,
> language detection we fall back to the DW_AT_producer
> string. */
> lang = language_opencl;
> + dw_lang = DW_LANG_OpenCL;
> }
> else if (cu->producer != nullptr
> && strstr (cu->producer, "GNU Go ") != NULL)
> {
> /* Similar hack for Go. */
> lang = language_go;
> + dw_lang = DW_LANG_Go;
> }
> else if (attr != nullptr)
> - lang = dwarf_lang_to_enum_language (attr->constant_value (0));
> + {
> + lang = dwarf_lang_to_enum_language (attr->constant_value (0));
> + dw_lang = (dwarf_source_language)attr->constant_value (0);
> + }
> else
> lang = pretend_language;
>
> + cu->per_cu->dw_lang = dw_lang;
> cu->language_defn = language_def (lang);
>
> switch (comp_unit_die->tag)
> diff --git a/gdb/dwarf2/read.h b/gdb/dwarf2/read.h
> index 37023a2070..6707c400cf 100644
> --- a/gdb/dwarf2/read.h
> +++ b/gdb/dwarf2/read.h
> @@ -245,6 +245,14 @@ struct dwarf2_per_cu_data
> functions above. */
> std::vector <dwarf2_per_cu_data *> *imported_symtabs = nullptr;
>
> + /* The original DW_LANG_* value of the CU, as provided to us by
> + * DW_AT_language. It is interesting to keep this value around in cases where
> + * we can't use the values from the language enum, as the mapping to them is
> + * lossy, and, while that is usually fine, things like the index have an
> + * understandable bias towards not exposing internal GDB structures to the
> + * outside world, and so prefer to use DWARF constants in their stead. */
> + dwarf_source_language dw_lang;
> +
> /* Return true of IMPORTED_SYMTABS is empty or not yet allocated. */
> bool imported_symtabs_empty () const
> {
> @@ -755,6 +763,10 @@ struct dwarf2_per_objfile
> std::unique_ptr<dwarf2_cu>> m_dwarf2_cus;
> };
>
> +/* Converts DWARF language names to GDB language names. */
> +
> +enum language dwarf_lang_to_enum_language (unsigned int lang);
> +
> /* Get the dwarf2_per_objfile associated to OBJFILE. */
>
> dwarf2_per_objfile *get_dwarf2_per_objfile (struct objfile *objfile);
More information about the Gdb-patches
mailing list