[PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior)

Pedro Alves palves@redhat.com
Thu Apr 13 03:42:00 GMT 2017


On 04/12/2017 11:26 PM, Sergio Durigan Junior wrote:
> On Wednesday, April 12 2017, Pedro Alves wrote:
> 
>> On 04/12/2017 06:04 AM, Sergio Durigan Junior wrote:
>>
>>>>>>  static void
>>>>>> -breakup_args (char *scratch, char **argv)
>>>>>> +breakup_args (const std::string &scratch, std::vector<char *> &argv)
>>>>>>  {
>>>>>
>>>>> ...
>>>>>
>>>>>> +
>>>>>> +      std::string arg = scratch.substr (cur_pos, next_sep - cur_pos);
>>>>>> +
>>>>>
>>>>> This creates a temporary string (heap allocates) ...
>>>>>
>>>>>> +      argv.push_back (xstrdup (arg.c_str ()));
>>>>>
>>>>> ... and here you create yet another copy.
>>>>>
>>>>> You should be able to avoid it by using e.g., savestring:
>>>>>
>>>>>     char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
>>>>>     argv.push_back (arg);
>>>>
>>>> Fair enough.  I had my mind on "C++-only mode" when writing this code.
>>
>> Yup, in C++, it's good to keep unnecessary temporaries
>> and hidden heap allocations in mind.  Actually, now that I look a bit
>> deeper, I think we can avoid a "premature pessimization" here, keeping
>> the same level of clarity.
>>
>> I think it'd be good to push this patch below.  WDYT?
> 
> Hm, I don't know.  I mean, I understand where you're coming from and
> what you're trying to achieve, but somehow I thought the old code
> (especially the part that modified the argument string in place, using
> \0's as separators) was somewhat hacky.  Even though the new code uses
> more memory, it it also more readable, at least IMNSHO.  And also more
> const-correct, since we can make breakup_args_for_exec receive a const
> as the first argument.

There's nothing wrong with writing to a string in place.  A std::string
is just a _container_ of chars with convenience methods.  It's
been designed to support embedded \0s.  And it's not a problem for a
function to take a copy of a value if it needs to modify it!
const-correct refers to when functions that _don't need to
modify an argument nevertheless declare non-const reference/pointer.

Effectively you can't avoid _some_ creating _some_ new string(s). 
But instead of taking one single copy (O(1)), the current code is making
many (N) tiny copies.  Trips to the generic allocator / malloc should
be one of the first things to avoid if easy to avoid, as a principle.

The patch I had posted is exactly equivalent to the one below, except
the state/storage in this one is moved to a class.  Maybe you'd
find this clearer?  (It'd need some cleaning up for comments
further at least, but it's getting quite late here...)

>From 689a881839f9d4aeb36695676856db2eaf8e97ab Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 13 Apr 2017 04:13:22 +0100
Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string
 copying

The previous patch converted the argv building from an
alloca-allocated array of non-owning arg pointers, to a std::vector of
owning pointers, which results in N string dups, with N being the
number of arguments in the vector, and then requires manually
releasing the pointers owned by the vector.

This patch makes the vector hold non-owning pointers, and avoids the
string dups, by doing one single string copy of the arguments upfront,
and replacing separators with NULL terminators in place, like we used
to.

With this, there's no need to remember to call free_vector_argv either.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <palves@redhat.com>

	* fork-child.c (breakup_args): Make 'scratch' non-const.
	Replace separators with NULL terminators in place.  Change
	type of vector.  (fork_inferior): The argument vector now
	holds non-owning pointers.  Don't strdup strings into the
	vector.  Remove free_vector_argv call.
---
 gdb/fork-child.c | 278 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 172 insertions(+), 106 deletions(-)

diff --git a/gdb/fork-child.c b/gdb/fork-child.c
index 6b7386e..91986f7 100644
--- a/gdb/fork-child.c
+++ b/gdb/fork-child.c
@@ -43,38 +43,96 @@ extern char **environ;
 
 static char *exec_wrapper;
 
-/* Break up SCRATCH into an argument vector suitable for passing to
-   execvp and store it in ARGV.  E.g., on "run a b c d" this routine
+/* Build the argument vector for execv.  */
+
+class execv_argv_builder
+{
+public:
+  /* EXEC_FILE is the file to run.  ALLARGS is a string containing the
+     arguments to the program.  If starting with a shell, SHELL_FILE
+     is the shell to run.  Otherwise, SHELL_FILE is NULL.  */
+  execv_argv_builder (const char *exec_file,
+		      const std::string &allargs,
+		      const char *shell_file);
+
+  /* Return a pointer to the built argv, in the type expected by
+     execv.  */
+  char **argv ()
+  {
+    /* It is guaranteed that the exec functions do not modify the
+       arguments, but they nevertheless expect "char **", so that's
+       the type we return.  */
+    return const_cast<char **> (&m_argv[0]);
+  }
+
+private:
+  void build_no_shell (const char *exec_file,
+		       const std::string &allargs);
+
+  void build_shell (const char *exec_file,
+		    const std::string &allargs,
+		    const char *shell_file);
+
+  /* The argument vector built.  Holds non-owning pointers.  */
+  std::vector<const char *> m_argv;
+
+  /* Storage.  Arguments in C_ARGV point inside these.  */
+
+  /* For the !shell case.  */
+  std::string m_broke_up_args;
+
+  /* For the shell case.  */
+  std::string m_shell_command;
+};
+
+/* Break up allargs into an argument vector suitable for passing to
+   execvp and store it in M_ARGV.  E.g., on "run a b c d" this routine
    would get as input the string "a b c d", and as output it would
-   fill in ARGV with the four arguments "a", "b", "c", "d".  */
+   fill in M_ARGV with the four arguments "a", "b", "c", "d".  */
 
-static void
-breakup_args (const std::string &scratch, std::vector<char *> &argv)
+void
+execv_argv_builder::build_no_shell (const char *exec_file,
+				    const std::string &allargs)
 {
-  for (size_t cur_pos = 0; cur_pos < scratch.size ();)
+  /* We're going to call execvp.  Create argument vector.  */
+
+  /* Save/work with a copy.  The pointers pushed to M_ARGV point
+     directly into M_BROKE_UP_ARGS, which is modified in place with
+     the necessary NULL terminators.  */
+  m_broke_up_args = allargs;
+
+  m_argv.push_back (exec_file);
+
+  for (size_t cur_pos = 0; cur_pos < m_broke_up_args.size ();)
     {
       /* Skip whitespace-like chars.  */
-      std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos);
+      std::size_t pos = m_broke_up_args.find_first_not_of (" \t\n", cur_pos);
 
       if (pos != std::string::npos)
 	cur_pos = pos;
 
       /* Find the position of the next separator.  */
-      std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos);
+      std::size_t next_sep = m_broke_up_args.find_first_of (" \t\n", cur_pos);
 
-      /* No separator found, which means this is the last
-	 argument.  */
       if (next_sep == std::string::npos)
-	next_sep = scratch.size ();
+	{
+	  /* No separator found, which means this is the last
+	     argument.  */
+	  next_sep = m_broke_up_args.size ();
+	}
+      else
+	{
+	  /* Replace the separator with a terminator.  */
+	  m_broke_up_args[next_sep++] = '\0';
+	}
 
-      char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos);
-      argv.push_back (arg);
+      m_argv.push_back (&m_broke_up_args[cur_pos]);
 
       cur_pos = next_sep;
     }
 
   /* NULL-terminate the vector.  */
-  argv.push_back (NULL);
+  m_argv.push_back (NULL);
 }
 
 /* When executing a command under the given shell, return non-zero if
@@ -101,6 +159,101 @@ escape_bang_in_quoted_argument (const char *shell_file)
   return 0;
 }
 
+execv_argv_builder::execv_argv_builder (const char *exec_file,
+					const std::string &allargs,
+					const char *shell_file)
+{
+  if (shell_file == NULL)
+    build_no_shell (exec_file, allargs);
+  else
+    build_shell (exec_file, allargs, shell_file);
+}
+
+void
+execv_argv_builder::build_shell (const char *exec_file,
+				 const std::string &allargs,
+				 const char *shell_file)
+{
+  /* We're going to call a shell.  */
+  const char *p;
+  int need_to_quote;
+  const int escape_bang = escape_bang_in_quoted_argument (shell_file);
+
+  m_shell_command = "exec ";
+
+  /* Add any exec wrapper.  That may be a program name with arguments,
+     so the user must handle quoting.  */
+  if (exec_wrapper)
+    {
+      m_shell_command += exec_wrapper;
+      m_shell_command += ' ';
+    }
+
+  /* Now add exec_file, quoting as necessary.  */
+
+  /* Quoting in this style is said to work with all shells.  But csh
+     on IRIX 4.0.1 can't deal with it.  So we only quote it if we need
+     to.  */
+  p = exec_file;
+  while (1)
+    {
+      switch (*p)
+	{
+	case '\'':
+	case '!':
+	case '"':
+	case '(':
+	case ')':
+	case '$':
+	case '&':
+	case ';':
+	case '<':
+	case '>':
+	case ' ':
+	case '\n':
+	case '\t':
+	  need_to_quote = 1;
+	  goto end_scan;
+
+	case '\0':
+	  need_to_quote = 0;
+	  goto end_scan;
+
+	default:
+	  break;
+	}
+      ++p;
+    }
+ end_scan:
+  if (need_to_quote)
+    {
+      m_shell_command += '\'';
+      for (p = exec_file; *p != '\0'; ++p)
+	{
+	  if (*p == '\'')
+	    m_shell_command += "'\\''";
+	  else if (*p == '!' && escape_bang)
+	    m_shell_command += "\\!";
+	  else
+	    m_shell_command += *p;
+	}
+      m_shell_command += '\'';
+    }
+  else
+    m_shell_command += exec_file;
+
+  m_shell_command += " " + allargs;
+
+  /* If we decided above to start up with a shell, we exec the shell,
+     "-c" says to interpret the next arg as a shell command to
+     execute, and this command is "exec <target-program> <args>".  */
+  m_argv.reserve (4);
+  m_argv.push_back (shell_file);
+  m_argv.push_back ("-c");
+  m_argv.push_back (m_shell_command.c_str ());
+  m_argv.push_back (NULL);
+}
+
 /* See inferior.h.  */
 
 void
@@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
   static const char *exec_file;
   char **save_our_env;
   int shell = 0;
-  std::vector<char *> argv;
   const char *inferior_io_terminal = get_inferior_io_terminal ();
   struct inferior *inf;
   int i;
@@ -183,95 +335,9 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
       shell = 1;
     }
 
-  if (!shell)
-    {
-      /* We're going to call execvp.  Create argument vector.  */
-      argv.push_back (xstrdup (exec_file));
-      breakup_args (allargs, argv);
-    }
-  else
-    {
-      /* We're going to call a shell.  */
-      std::string shell_command;
-      const char *p;
-      int need_to_quote;
-      const int escape_bang = escape_bang_in_quoted_argument (shell_file);
-
-      shell_command = std::string ("exec ");
-
-      /* Add any exec wrapper.  That may be a program name with arguments, so
-	 the user must handle quoting.  */
-      if (exec_wrapper)
-	{
-	  shell_command += exec_wrapper;
-	  shell_command += ' ';
-	}
-
-      /* Now add exec_file, quoting as necessary.  */
-
-      /* Quoting in this style is said to work with all shells.  But
-         csh on IRIX 4.0.1 can't deal with it.  So we only quote it if
-         we need to.  */
-      p = exec_file;
-      while (1)
-	{
-	  switch (*p)
-	    {
-	    case '\'':
-	    case '!':
-	    case '"':
-	    case '(':
-	    case ')':
-	    case '$':
-	    case '&':
-	    case ';':
-	    case '<':
-	    case '>':
-	    case ' ':
-	    case '\n':
-	    case '\t':
-	      need_to_quote = 1;
-	      goto end_scan;
-
-	    case '\0':
-	      need_to_quote = 0;
-	      goto end_scan;
-
-	    default:
-	      break;
-	    }
-	  ++p;
-	}
-    end_scan:
-      if (need_to_quote)
-	{
-	  shell_command += '\'';
-	  for (p = exec_file; *p != '\0'; ++p)
-	    {
-	      if (*p == '\'')
-		shell_command += "'\\''";
-	      else if (*p == '!' && escape_bang)
-		shell_command += "\\!";
-	      else
-		shell_command += *p;
-	    }
-	  shell_command += '\'';
-	}
-      else
-	shell_command += exec_file;
-
-      shell_command += " " + allargs;
-
-      /* If we decided above to start up with a shell, we exec the
-	 shell, "-c" says to interpret the next arg as a shell command
-	 to execute, and this command is "exec <target-program>
-	 <args>".  We xstrdup all the strings here because they will
-	 be free'd later in the code.  */
-      argv.push_back (xstrdup (shell_file));
-      argv.push_back (xstrdup ("-c"));
-      argv.push_back (xstrdup (shell_command.c_str ()));
-      argv.push_back (NULL);
-    }
+  /* Build the argument vector.  */
+  execv_argv_builder argv_builder (exec_file, allargs,
+				   shell ? shell_file : NULL);
 
   /* Retain a copy of our environment variables, since the child will
      replace the value of environ and if we're vforked, we have to
@@ -376,6 +442,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
          path to find $SHELL.  Rich Pixley says so, and I agree.  */
       environ = env;
 
+      char **argv = argv_builder.argv ();
+
       if (exec_fun != NULL)
         (*exec_fun) (argv[0], &argv[0], env);
       else
@@ -393,8 +461,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs,
       _exit (0177);
     }
 
-  free_vector_argv (argv);
-
   /* Restore our environment in case a vforked child clob'd it.  */
   environ = save_our_env;
 
-- 
2.5.5




More information about the Gdb-patches mailing list