[PATCHv3] gdb: building inferior strings from within GDB
Andrew Burgess
aburgess@redhat.com
Mon Jun 5 12:26:35 GMT 2023
Simon Marchi <simark@simark.ca> writes:
> On 5/24/23 10:10, Andrew Burgess wrote:
>> History Of This Patch
>> =====================
>>
>> This commit aims to address PR gdb/21699. There have now been a
>> couple of attempts to fix this issue. Simon originally posted two
>> patches back in 2021:
>>
>> https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html
>> https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html
>>
>> Before Pedro then posted a version of his own:
>>
>> https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html
>>
>> After this the conversation halted. Then in 2023 I (Andrew) also took
>> a look at this bug and posted two versions:
>>
>> https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html
>> https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html
>>
>> The approach taken in my first patch was pretty similar to what Simon
>> originally posted back in 2021. My second attempt was only a slight
>> variation on the first.
>>
>> Pedro then pointed out his older patch, and so we arrive at this
>> patch. The GDB changes here are mostly Pedro's work, but updated by
>> me (Andrew), any mistakes are mine.
>>
>> The tests here are a combinations of everyone's work, and the commit
>> message is new, but copies bits from everyone's earlier work.
>>
>> Problem Description
>> ===================
>>
>> Bug PR gdb/21699 makes the observation that using $_as_string with
>> GDB's printf can cause GDB to print unexpected data from the
>> inferior. The reproducer is pretty simple:
>>
>> #include <stddef.h>
>> static char arena[100];
>>
>> /* Override malloc() so value_coerce_to_target() gets a known
>> pointer, and we know we"ll see an error if $_as_string() gives
>> a string that isn't null terminated. */
>> void
>> *malloc (size_t size)
>> {
>> memset (arena, 'x', sizeof (arena));
>> if (size > sizeof (arena))
>> return NULL;
>> return arena;
>> }
>>
>> int
>> main ()
>> {
>> return 0;
>> }
>>
>> And then in a GDB session:
>>
>> $ gdb -q test
>> Reading symbols from /tmp/test...
>> (gdb) start
>> Temporary breakpoint 1 at 0x4004c8: file test.c, line 17.
>> Starting program: /tmp/test
>>
>> Temporary breakpoint 1, main () at test.c:17
>> 17 return 0;
>> (gdb) printf "%s\n", $_as_string("hello")
>> "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> (gdb) quit
>>
>> The problem above is caused by how value_cstring is used within
>> py-value.c, but once we understand the issue then it turns out that
>> value_cstring is used in an unexpected way in many places within GDB.
>>
>> Within py-value.c we have a null-terminated C-style string. We then
>> pass a pointer to this string, along with the length of this
>> string (so not including the null-character) to value_cstring.
>>
>> In value_cstring GDB allocates an array value of the given character
>> type, and copies in requested number of characters. However
>> value_cstring does not add a null-character of its own. This means
>> that the value created by calling value_cstring is only
>> null-terminated if the null-character is included in the passed in
>> length. In py-value.c this is not the case, and indeed, in most uses
>> of value_cstring, this is not the case.
>>
>> When GDB tries to print one of these strings the value contents are
>> pushed to the inferior, and then read back as a C-style string, that
>> is, GDB reads inferior memory until it finds a null-terminator. For
>> the py-value.c case, no null-terminator is pushed into the inferior,
>> so GDB will continue reading inferior memory until a null-terminator
>> is found, with unpredictable results.
>>
>> Patch Description
>> =================
>>
>> The first thing this patch does is better define what the arguments
>> for the two function value_cstring and value_string should represent.
>> The comments in the header file are updated to describe whether the
>> length argument should, or should not, include a null-character.
>> Also, the data argument is changed to type gdb_byte. The functions as
>> they currently exist will handle wide-characters, in which case more
>> than one 'char' would be needed for each character. As such using
>> gdb_byte seems to make more sense.
>>
>> To avoid adding casts throughout GDB, I've also added an overload that
>> still takes a 'char *', but asserts that the character type being used
>> is of size '1'.
>>
>> The value_cstring function is now responsible for adding a null
>> character at the end of the string value it creates.
>>
>> However, once we start looking at how value_cstring is used, we
>> realise there's another, related, problem. Not every language's
>> strings are null terminated. Fortran and Ada strings, for example,
>> are just an array of characters, GDB already has the function
>> value_string which can be used to create such values.
>>
>> Consider this example using current GDB:
>>
>> (gdb) set language ada
>> (gdb) p $_gdb_setting("arch")
>> $1 = (97, 117, 116, 111)
>> (gdb) ptype $
>> type = array (1 .. 4) of char
>> (gdb) p $_gdb_maint_setting("test-settings string")
>> $2 = (0)
>> (gdb) ptype $
>> type = array (1 .. 1) of char
>>
>> This shows two problems, first, the $_gdb_setting and
>> $_gdb_maint_setting functions are calling value_cstring using the
>> builtin_char character, rather than a language appropriate type. In
>> the first call, the 'arch' case, the value_cstring call doesn't
>> include the null character, so the returned array only contains the
>> expected characters. But, in the $_gdb_maint_setting example we do
>> end up including the null-character, even though this is not expected
>> for Ada strings.
>>
>> This commit adds a new language method language_defn::value_string,
>> this function takes a pointer and length and creates a language
>> appropriate value that represents the string. For C, C++, etc this
>> will be a null-terminated string (by calling value_cstring), and for
>> Fortran and Ada this can be a bounded array of characters with no null
>> terminator. Additionally, this new language_defn::value_string
>> function is responsible for selecting a language appropriate character
>> type.
>>
>> After this commit the only calls to value_cstring are from the C
>> expression evaluator and from the default language_defn::value_string.
>>
>> And the only calls to value_string are from Fortan, Ada, and ObjectC
>> related code.
>>
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699
>>
>> Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
>> Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
>> Co-Authored-By: Pedro Alves <pedro@palves.net>
>
> This LGTM:
>
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
>
> Just one small question:
>
>> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c
>> index b7b65303a0b..3fa833d596c 100644
>> --- a/gdb/cli/cli-cmds.c
>> +++ b/gdb/cli/cli-cmds.c
>> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch)
>> }
>>
>> if (len > 0)
>> - return value_cstring (value, len,
>> - builtin_type (gdbarch)->builtin_char);
>> + return current_language->value_string (gdbarch, value, len);
>> else
>> - return value_cstring ("", 1,
>> - builtin_type (gdbarch)->builtin_char);
>> + return current_language->value_string (gdbarch, "", 0);
>
> Do we still need the two calls, or can we just do with the first (even
> when len is 0)? Same idea for str_value_from_setting.
You are right. I merged these two calls, and the other two in
str_value_from_setting, and pushed this patch.
Thanks,
Andrew
More information about the Gdb-patches
mailing list