[RFA, doc RFA] Avoid calling gdb_realpath if basenames are different

Doug Evans dje@google.com
Fri Nov 11 00:57:00 GMT 2011


On Sat, Nov 5, 2011 at 11:30 PM, Doug Evans <dje@google.com> wrote:
> Hi.
> This patch has been brought up before (by others).
> E.g., http://sourceware.org/ml/gdb-patches/2010-04/msg00466.html
> I'm hoping we can get this in now.
> We're paying a real and significant cost for what is mostly a
> theoretical concern.
> [E.g., How often is one source file referred to by the user using a basename
> that is different than what's recorded in the debug info?]
>
> If people are concerned about breaking someone's usage,
> we could default basenames-may-differ to true in 7.4,
> with a warning that it will be set to false in 7.5 (or some such).
> [We could leave the default set to true, especially if someone knew
> of at least some minimally common usage this would break.
> I'd hate to otherwise penalize the vast majority of users if not.]

Hi.
Ok to check in?

Note: I set the default to be the common case (speed up gdb by
assuming basenames never differ).
Let me know if you want the default changed.
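
The case that the new default gives up on can be stated precisely. Below is a minimal sketch, assuming a POSIX host where realpath (path, NULL) allocates its result. The helpers base_name_of and same_file_different_basename are hypothetical and are not part of gdb. The predicate returns 1 exactly when two paths have different base names but canonicalize to the same file. In that situation, for example a symlink bar.c -> foo.c, a basename-only pre-check would wrongly report no match.

```c
#define _XOPEN_SOURCE 700 /* for realpath () */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not gdb code: return a pointer to the part of
   PATH after the last '/' (POSIX separators only).  */
static const char *
base_name_of (const char *path)
{
  const char *slash = strrchr (path, '/');
  return slash != NULL ? slash + 1 : path;
}

/* Return 1 if PATH_A and PATH_B have different base names but resolve
   to the same canonical file, else 0.  Paths that cannot be resolved
   (e.g. nonexistent files) are treated as not matching.  */
static int
same_file_different_basename (const char *path_a, const char *path_b)
{
  char *real_a, *real_b;
  int result;

  assert (path_a != NULL && path_b != NULL);

  /* Same base name: not the case basenames-may-differ exists for.  */
  if (strcmp (base_name_of (path_a), base_name_of (path_b)) == 0)
    return 0;

  /* realpath with a NULL buffer mallocs the result (POSIX.1-2008).  */
  real_a = realpath (path_a, NULL);
  real_b = realpath (path_b, NULL);
  result = (real_a != NULL && real_b != NULL
            && strcmp (real_a, real_b) == 0);
  free (real_a);
  free (real_b);
  return result;
}
```

If /home/user/bar.c is a symbolic link to /home/user/foo.c, the predicate returns 1 for that pair. With basenames-may-differ off, a lookup of bar.c therefore skips a symtab recorded as foo.c, because "bar.c" and "foo.c" differ before realpath is ever consulted.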

Tom: I'll look at the bugs you mentioned separately.

2011-11-10  Doug Evans  <dje@google.com>

        * NEWS: Mention new parameter basenames-may-differ.
        * dwarf2read.c (dw2_lookup_symtab): Avoid calling gdb_realpath if
        ! basenames_may_differ.
        * psymtab.c (lookup_partial_symtab): Ditto.
        * symtab.c (lookup_symtab): Ditto.
        (basenames_may_differ): New global.
        (_initialize_symtab): New parameter basenames-may-differ.
        * symtab.h (basenames_may_differ): Declare.

        doc/
        * gdb.texinfo (Files): Document basenames-may-differ.
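
dw2_lookup_symtab, lookup_partial_symtab and lookup_symtab all apply the same pattern: compare base names before paying for realpath. The self-contained sketch below illustrates that pattern under stated assumptions. POSIX realpath stands in for gdb_realpath, and strcmp stands in for FILENAME_CMP. The names sketch_lbasename, sketch_realpath, sketch_lookup and the realpath_calls counter are hypothetical and exist only for illustration. This is not the code in the patch.

```c
#define _XOPEN_SOURCE 700 /* for realpath () */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of canonicalizations performed; lets callers observe how
   many expensive realpath calls the fast path avoided.  */
static int realpath_calls = 0;

/* Hypothetical stand-in for lbasename: the part of PATH after the
   last '/' (the real lbasename also handles other directory
   separators on some hosts).  */
static const char *
sketch_lbasename (const char *path)
{
  const char *slash = strrchr (path, '/');
  return slash != NULL ? slash + 1 : path;
}

/* Hypothetical stand-in for gdb_realpath: returns a malloc'd canonical
   path, or NULL if PATH cannot be resolved.  */
static char *
sketch_realpath (const char *path)
{
  realpath_calls++;
  return realpath (path, NULL);
}

/* Return the index of the first entry of FILES naming the same file as
   NAME, or -1.  Unless BASENAMES_MAY_DIFFER is set, entries whose base
   name differs from NAME's are skipped without calling realpath.  */
static int
sketch_lookup (const char *name, const char *const *files, int nfiles,
               int basenames_may_differ)
{
  const char *name_basename = sketch_lbasename (name);
  char *name_real;
  int i, found = -1;

  assert (name != NULL && (files != NULL || nfiles == 0));
  name_real = sketch_realpath (name);
  if (name_real == NULL)
    return -1;

  for (i = 0; i < nfiles && found < 0; i++)
    {
      char *file_real;

      /* Cheap check first: with the default setting, a different
         base name is taken to mean a different file.  */
      if (!basenames_may_differ
          && strcmp (sketch_lbasename (files[i]), name_basename) != 0)
        continue;

      file_real = sketch_realpath (files[i]);
      if (file_real != NULL && strcmp (file_real, name_real) == 0)
        found = i;
      free (file_real);
    }

  free (name_real);
  return found;
}
```

Assume that both /usr/include/stdio.h and /home/user/hello.c exist. A lookup of /home/user/hello.c against those two entries calls realpath twice with the default setting: once for the name and once for hello.c, while stdio.h is rejected by a single strcmp. With basenames_may_differ set, the same lookup calls realpath three times. Across thousands of symtabs, that difference is the cost the patch avoids.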

Index: NEWS
===================================================================
RCS file: /cvs/src/src/gdb/NEWS,v
retrieving revision 1.464
diff -u -p -r1.464 NEWS
--- NEWS	2 Nov 2011 23:44:19 -0000	1.464
+++ NEWS	10 Nov 2011 23:49:26 -0000
@@ -150,6 +150,20 @@ show debug entry-values
   Control display of debugging info for determining frame argument values at
   function entry and virtual tail call frames.
 
+set basenames-may-differ
+show basenames-may-differ
+  Set whether a source file may have multiple base names.
+  A "base name" is the name of a file with the directory part removed.
+  Example: The base name of "/home/user/hello.c" is "hello.c".
+  When doing file name based lookups, gdb will canonicalize file names
+  (e.g., expand symlinks) before comparing them, which is an expensive
+  operation.
+  If set, gdb will not assume a file is known by one base name, and thus
+  it cannot optimize file name comparisons by skipping the canonicalization
+  step if the base names are different.
+  If not set, all source files must be known by one base name,
+  and gdb will do file name comparisons more efficiently.
+
 * New remote packets
 
 QTEnable
Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.579
diff -u -p -r1.579 dwarf2read.c
--- dwarf2read.c	10 Nov 2011 20:21:27 -0000	1.579
+++ dwarf2read.c	10 Nov 2011 23:49:26 -0000
@@ -2445,7 +2445,8 @@ dw2_lookup_symtab (struct objfile *objfi
 		   struct symtab **result)
 {
   int i;
-  int check_basename = lbasename (name) == name;
+  const char *name_basename = lbasename (name);
+  int check_basename = name_basename == name;
   struct dwarf2_per_cu_data *base_cu = NULL;
 
   dw2_setup (objfile);
@@ -2478,6 +2479,12 @@ dw2_lookup_symtab (struct objfile *objfi
 	      && FILENAME_CMP (lbasename (this_name), name) == 0)
 	    base_cu = per_cu;
 
+	  /* Before we invoke realpath, which can get expensive when many
+	     files are involved, do a quick comparison of the basenames.  */
+	  if (! basenames_may_differ
+	      && FILENAME_CMP (lbasename (this_name), name_basename) != 0)
+	    continue;
+
 	  if (full_path != NULL)
 	    {
 	      const char *this_real_name = dw2_get_real_path (objfile,
Index: psymtab.c
===================================================================
RCS file: /cvs/src/src/gdb/psymtab.c,v
retrieving revision 1.31
diff -u -p -r1.31 psymtab.c
--- psymtab.c	28 Oct 2011 17:29:37 -0000	1.31
+++ psymtab.c	10 Nov 2011 23:49:26 -0000
@@ -134,6 +134,7 @@ lookup_partial_symtab (struct objfile *o
 		       const char *full_path, const char *real_path)
 {
   struct partial_symtab *pst;
+  const char *name_basename = lbasename (name);
 
   ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
   {
@@ -142,6 +143,12 @@ lookup_partial_symtab (struct objfile *o
 	return (pst);
       }
 
+    /* Before we invoke realpath, which can get expensive when many
+       files are involved, do a quick comparison of the basenames.  */
+    if (! basenames_may_differ
+	&& FILENAME_CMP (name_basename, lbasename (pst->filename)) != 0)
+      continue;
+
     /* If the user gave us an absolute path, try to find the file in
        this symtab and use its absolute path.  */
     if (full_path != NULL)
@@ -172,7 +179,7 @@ lookup_partial_symtab (struct objfile *o
 
   /* Now, search for a matching tail (only if name doesn't have any dirs).  */
 
-  if (lbasename (name) == name)
+  if (name_basename == name)
     ALL_OBJFILE_PSYMTABS_REQUIRED (objfile, pst)
     {
       if (FILENAME_CMP (lbasename (pst->filename), name) == 0)
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.285
diff -u -p -r1.285 symtab.c
--- symtab.c	29 Oct 2011 07:26:07 -0000	1.285
+++ symtab.c	10 Nov 2011 23:49:26 -0000
@@ -112,6 +112,11 @@ void _initialize_symtab (void);
 
 /* */
 
+/* Non-zero if a file may be known by two different basenames.
+   This is the uncommon case, and supporting it significantly slows
+   down gdb.  The default is "off" so the common case stays fast.  */
+int basenames_may_differ = 0;
+
 /* Allow the user to configure the debugger behavior with respect
    to multiple-choice menus when more than one symbol matches during
    a symbol lookup.  */
@@ -155,6 +160,7 @@ lookup_symtab (const char *name)
   char *real_path = NULL;
   char *full_path = NULL;
   struct cleanup *cleanup;
+  const char *base_name = lbasename (name);
 
   cleanup = make_cleanup (null_cleanup, NULL);
 
@@ -180,6 +186,12 @@ got_symtab:
 	return s;
       }
 
+    /* Before we invoke realpath, which can get expensive when many
+       files are involved, do a quick comparison of the basenames.  */
+    if (! basenames_may_differ
+	&& FILENAME_CMP (base_name, lbasename (s->filename)) != 0)
+      continue;
+
     /* If the user gave us an absolute path, try to find the file in
        this symtab and use its absolute path.  */
 
@@ -4883,5 +4897,22 @@ Show how the debugger handles ambiguitie
 Valid values are \"ask\", \"all\", \"cancel\", and the default is \"all\"."),
                         NULL, NULL, &setlist, &showlist);
 
+  add_setshow_boolean_cmd ("basenames-may-differ", class_obscure,
+			   &basenames_may_differ, _("\
+Set whether a source file may have multiple base names."), _("\
+Show whether a source file may have multiple base names."), _("\
+A \"base name\" is the name of a file with the directory part removed.\n\
+Example: The base name of \"/home/user/hello.c\" is \"hello.c\".\n\
+When doing file name based lookups, gdb will canonicalize file names\n\
+(e.g., expand symlinks) before comparing them, which is an expensive\n\
+operation.\n\
+If set, gdb will not assume a file is known by one base name, and thus\n\
+it cannot optimize file name comparisons by skipping the canonicalization\n\
+step if the base names are different.\n\
+If not set, all source files must be known by one base name,\n\
+and gdb will do file name comparisons much more efficiently."),
+			   NULL, NULL,
+			   &setlist, &showlist);
+
   observer_attach_executable_changed (symtab_observer_executable_changed);
 }
Index: symtab.h
===================================================================
RCS file: /cvs/src/src/gdb/symtab.h,v
retrieving revision 1.191
diff -u -p -r1.191 symtab.h
--- symtab.h	10 Nov 2011 20:21:28 -0000	1.191
+++ symtab.h	10 Nov 2011 23:49:26 -0000
@@ -1306,4 +1306,6 @@ void fixup_section (struct general_symbo
 
 struct objfile *lookup_objfile_from_block (const struct block *block);
 
+extern int basenames_may_differ;
+
 #endif /* !defined(SYMTAB_H) */
Index: doc/gdb.texinfo
===================================================================
RCS file: /cvs/src/src/gdb/doc/gdb.texinfo,v
retrieving revision 1.890
diff -u -p -r1.890 gdb.texinfo
--- doc/gdb.texinfo	8 Nov 2011 21:34:18 -0000	1.890
+++ doc/gdb.texinfo	10 Nov 2011 23:49:28 -0000
@@ -15680,6 +15680,47 @@ This is the default.
 @end table
 @end table
 
+@cindex file name canonicalization
+@cindex base name differences
+When processing file names provided by the user,
+@value{GDBN} canonicalizes them, expanding any symbolic links.
+This ensures that @value{GDBN} finds the right file,
+even if the debug information specifies an alternate path.
+However, with large programs this canonicalization can noticeably slow
+down @value{GDBN}.  To compensate, @value{GDBN} tries to avoid
+canonicalizing file names wherever possible.  One way it does so
+is by first comparing the @dfn{base names} of the files.
+The base name of a file is simply the file's name without
+any directory information.  For example, the base name of
+@file{/home/user/hello.c} is @file{hello.c}.
+When looking up @file{hello.c}, for example, @value{GDBN} can
+skip @file{/usr/include/stdio.h} without having to first canonicalize
+it and then compare the directory names.
+This works well, except when one file can be known by multiple
+base names because of symbolic links.
+For example, if @file{/home/user/bar.c} is a symbolic link to
+@file{/home/user/foo.c}, then @value{GDBN} cannot just compare
+the base names of the two files; it must canonicalize them, expand
+all symbolic links, and @emph{then} compare the file names
+to see if they match.
+
+Fortunately, having one file known by two different base names
+rarely occurs in practice.
+Should it occur, however, @value{GDBN} provides an escape hatch.
+By setting @code{basenames-may-differ} to @code{on},
+@value{GDBN} always canonicalizes file names before
+comparing them, thus ensuring that a file known by multiple
+base names is treated as a single file.
+
+@table @code
+@item set basenames-may-differ
+@kindex set basenames-may-differ
+Set whether a source file may have multiple base names.  The default is @code{off}.
+
+@item show basenames-may-differ
+@kindex show basenames-may-differ
+Show whether a source file may have multiple base names.
+@end table
 
 @node Separate Debug Files
 @section Debugging Information in Separate Files