[PATCHv3] gdb: building inferior strings from within GDB

Andrew Burgess aburgess@redhat.com
Mon Jun 5 12:26:35 GMT 2023


Simon Marchi <simark@simark.ca> writes:

> On 5/24/23 10:10, Andrew Burgess wrote:
>> History Of This Patch
>> =====================
>> 
>> This commit aims to address PR gdb/21699.  There have now been a
>> couple of attempts to fix this issue.  Simon originally posted two
>> patches back in 2021:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html
>> 
>> Before Pedro then posted a version of his own:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html
>> 
>> After this the conversation halted.  Then in 2023 I (Andrew) also took
>> a look at this bug and posted two versions:
>> 
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
>>   https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html
>> 
>> The approach taken in my first patch was pretty similar to what Simon
>> originally posted back in 2021.  My second attempt was only a slight
>> variation on the first.
>> 
>> Pedro then pointed out his older patch, and so we arrive at this
>> patch.  The GDB changes here are mostly Pedro's work, but updated by
>> me (Andrew), any mistakes are mine.
>> 
>> The tests here are a combinations of everyone's work, and the commit
>> message is new, but copies bits from everyone's earlier work.
>> 
>> Problem Description
>> ===================
>> 
>> Bug PR gdb/21699 makes the observation that using $_as_string with
>> GDB's printf can cause GDB to print unexpected data from the
>> inferior.  The reproducer is pretty simple:
>> 
>>   #include <stddef.h>
>>   static char arena[100];
>> 
>>   /* Override malloc() so value_coerce_to_target() gets a known
>>      pointer, and we know we"ll see an error if $_as_string() gives
>>      a string that isn't null terminated. */
>>   void
>>   *malloc (size_t size)
>>   {
>>       memset (arena, 'x', sizeof (arena));
>>       if (size > sizeof (arena))
>>           return NULL;
>>       return arena;
>>   }
>> 
>>   int
>>   main ()
>>   {
>>     return 0;
>>   }
>> 
>> And then in a GDB session:
>> 
>>   $ gdb -q test
>>   Reading symbols from /tmp/test...
>>   (gdb) start
>>   Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>>   Starting program: /tmp/test
>> 
>>   Temporary breakpoint 1, main () at test.c:17
>>   17        return 0;
>>   (gdb) printf "%s\n", $_as_string("hello")
>>   "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>   (gdb) quit
>> 
>> The problem above is caused by how value_cstring is used within
>> py-value.c, but once we understand the issue then it turns out that
>> value_cstring is used in an unexpected way in many places within GDB.
>> 
>> Within py-value.c we have a null-terminated C-style string.  We then
>> pass a pointer to this string, along with the length of this
>> string (so not including the null-character) to value_cstring.
>> 
>> In value_cstring GDB allocates an array value of the given character
>> type, and copies in requested number of characters.  However
>> value_cstring does not add a null-character of its own.  This means
>> that the value created by calling value_cstring is only
>> null-terminated if the null-character is included in the passed in
>> length.  In py-value.c this is not the case, and indeed, in most uses
>> of value_cstring, this is not the case.
>> 
>> When GDB tries to print one of these strings the value contents are
>> pushed to the inferior, and then read back as a C-style string, that
>> is, GDB reads inferior memory until it finds a null-terminator.  For
>> the py-value.c case, no null-terminator is pushed into the inferior,
>> so GDB will continue reading inferior memory until a null-terminator
>> is found, with unpredictable results.
>> 
>> Patch Description
>> =================
>> 
>> The first thing this patch does is better define what the arguments
>> for the two function value_cstring and value_string should represent.
>> The comments in the header file are updated to describe whether the
>> length argument should, or should not, include a null-character.
>> Also, the data argument is changed to type gdb_byte.  The functions as
>> they currently exist will handle wide-characters, in which case more
>> than one 'char' would be needed for each character.  As such using
>> gdb_byte seems to make more sense.
>> 
>> To avoid adding casts throughout GDB, I've also added an overload that
>> still takes a 'char *', but asserts that the character type being used
>> is of size '1'.
>> 
>> The value_cstring function is now responsible for adding a null
>> character at the end of the string value it creates.
>> 
>> However, once we start looking at how value_cstring is used, we
>> realise there's another, related, problem.  Not every language's
>> strings are null terminated.  Fortran and Ada strings, for example,
>> are just an array of characters, GDB already has the function
>> value_string which can be used to create such values.
>> 
>> Consider this example using current GDB:
>> 
>>   (gdb) set language ada
>>   (gdb) p $_gdb_setting("arch")
>>   $1 = (97, 117, 116, 111)
>>   (gdb) ptype $
>>   type = array (1 .. 4) of char
>>   (gdb) p $_gdb_maint_setting("test-settings string")
>>   $2 = (0)
>>   (gdb) ptype $
>>   type = array (1 .. 1) of char
>> 
>> This shows two problems, first, the $_gdb_setting and
>> $_gdb_maint_setting functions are calling value_cstring using the
>> builtin_char character, rather than a language appropriate type.  In
>> the first call, the 'arch' case, the value_cstring call doesn't
>> include the null character, so the returned array only contains the
>> expected characters.  But, in the $_gdb_maint_setting example we do
>> end up including the null-character, even though this is not expected
>> for Ada strings.
>> 
>> This commit adds a new language method language_defn::value_string,
>> this function takes a pointer and length and creates a language
>> appropriate value that represents the string.  For C, C++, etc this
>> will be a null-terminated string (by calling value_cstring), and for
>> Fortran and Ada this can be a bounded array of characters with no null
>> terminator.  Additionally, this new language_defn::value_string
>> function is responsible for selecting a language appropriate character
>> type.
>> 
>> After this commit the only calls to value_cstring are from the C
>> expression evaluator and from the default language_defn::value_string.
>> 
>> And the only calls to value_string are from Fortan, Ada, and ObjectC
>> related code.
>> 
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
>> 
>> Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
>> Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
>> Co-Authored-By: Pedro Alves <pedro@palves.net>
>
> This LGTM:
>
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>
> Just one small question:
>
>> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
>> index b7b65303a0b..3fa833d596c 100644
>> --- a/gdb/cli/cli-cmds.c
>> +++ b/gdb/cli/cli-cmds.c
>> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
>>  	  }
>>  
>>  	if (len > 0)
>> -	  return value_cstring (value, len,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, value, len);
>>  	else
>> -	  return value_cstring ("", 1,
>> -				builtin_type (gdbarch)->builtin_char);
>> +	  return current_language->value_string (gdbarch, "", 0);
>
> Do we still need the two calls, or can we just do with the first (even
> when len is 0)?  Same idea for str_value_from_setting.

You are right.  I merged these two calls, and the other two in
str_value_from_setting, and pushed this patch.

Thanks,
Andrew



More information about the Gdb-patches mailing list