[RFC/RFA] Cleaner handling of character entities?

Fri May 5 18:24:00 GMT 2006

Hello,

We are currently working on transitioning from GCC 3.4 to GCC 4.1,
and we found an issue with character types. For a program like this:

        procedure P is
           A : Character := 'A';
        begin
           A := 'B'; --  START
        end;

The debugging info produced for the character is:

        .uleb128 0x2    # (DIE (0x62) DW_TAG_base_type)
        .long   .LASF0  # DW_AT_name: "__unknown__"
        .byte   0x1     # DW_AT_byte_size
        .byte   0x7     # DW_AT_encoding

The DW_AT_name used to be "character", and ada-lang.c was using
the name to identify character types.

After a short discussion, the engineer at our company who handles
GCC debug info generation agreed that the name is wrong and should
be changed back.

However, he also suggested that the debugger should avoid relying
on the type name, and use the encoding if available. In the case
above, 0x7 is DW_ATE_unsigned but it should be DW_ATE_unsigned_char
(0x8), so he will change that.
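To illustrate the suggestion, here is a minimal, self-contained sketch of classifying a base type from its encoding alone. It is not GDB code: the enc_* names are hypothetical stand-ins for GDB's type codes and flags, and only the numeric DW_ATE values come from the DWARF standard.

```c
#include <assert.h>

/* DW_ATE_* values as assigned by the DWARF standard.  */
enum enc_dwarf_encoding
{
  ENC_DW_ATE_signed = 0x5,
  ENC_DW_ATE_signed_char = 0x6,
  ENC_DW_ATE_unsigned = 0x7,
  ENC_DW_ATE_unsigned_char = 0x8
};

/* Hypothetical stand-ins for TYPE_CODE_INT and TYPE_CODE_CHAR.  */
enum enc_type_code { ENC_CODE_INT, ENC_CODE_CHAR };

/* Hypothetical result record: a type code plus an unsigned flag.  */
struct enc_base_type
{
  enum enc_type_code code;
  int is_unsigned;
};

/* Derive the type code and signedness from the encoding only,
   without looking at the type name.  */
struct enc_base_type
enc_classify (unsigned int encoding)
{
  struct enc_base_type result = { ENC_CODE_INT, 0 };

  switch (encoding)
    {
    case ENC_DW_ATE_signed:
      break;
    case ENC_DW_ATE_signed_char:
      result.code = ENC_CODE_CHAR;
      break;
    case ENC_DW_ATE_unsigned:
      result.is_unsigned = 1;
      break;
    case ENC_DW_ATE_unsigned_char:
      result.code = ENC_CODE_CHAR;
      result.is_unsigned = 1;
      break;
    default:
      /* Other encodings are outside the scope of this sketch.  */
      break;
    }
  return result;
}

/* The DIE above carries 0x7, so it is classified as a plain unsigned
   integer; with 0x8 it would be classified as a character.  */
void
enc_demo (void)
{
  assert (enc_classify (0x7).code == ENC_CODE_INT);
  assert (enc_classify (0x8).code == ENC_CODE_CHAR);
}
```

With the encoding currently emitted (0x7), such a scheme cannot tell the Ada Character apart from an 8-bit unsigned integer, which is why the compiler side needs fixing as well.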

I looked at the GDB side and came up with a few changes here and
there that implement his suggestion. I discovered that we rely
quite a bit on the type name to identify characters. I suspect this
is historical, a workaround for shortcomings in older debugging
formats (stabs, for instance).

I think the attached patch improves the situation by making things
cleaner in the case of dwarf2, without impacting targets that still
use older debugging formats such as stabs.
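As a rough sketch of how the two paths can coexist, the check below trusts the type code first and only falls back to the name when the code is an integer or a range. The chr_* types are hypothetical simplifications of GDB's struct type, and only the two names visible in the attached diff context are compared here; the real function may accept more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for GDB type codes.  */
enum chr_type_code
{
  CHR_CODE_INT,
  CHR_CODE_RANGE,
  CHR_CODE_CHAR,
  CHR_CODE_OTHER
};

/* Hypothetical, much simplified type record.  */
struct chr_type
{
  enum chr_type_code code;
  const char *name;   /* May be NULL.  */
};

/* Return nonzero if TYPE looks like a character type.  */
int
chr_is_character_type (const struct chr_type *type)
{
  /* Debug info that provides a character encoding (dwarf2): the
     type code is enough, no name check needed.  */
  if (type->code == CHR_CODE_CHAR)
    return 1;

  /* Formats without such an encoding (stabs, for instance): fall
     back to matching on the type name.  */
  return (type->name != NULL
          && (type->code == CHR_CODE_INT
              || type->code == CHR_CODE_RANGE)
          && (strcmp (type->name, "character") == 0
              || strcmp (type->name, "wide_character") == 0));
}
```

An integer type named "__unknown__" is rejected by this check, which matches the problem described earlier when neither the name nor the encoding identifies the type as a character.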

2006-05-05  Joel Brobecker  <brobecker@adacore.com>

        * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
        for char and unsigned char types.
        * ada-lang.c (ada_is_character_type): Always return true if
        the type code is TYPE_CODE_CHAR.
        * c-valprint.c (c_val_print): Print arrays whose element type
        code is TYPE_CODE_CHAR as strings.

Tested on x86-linux, with GCC 3.4 (dwarf2, stabs+), GCC 4.1 (dwarf2).
No regression.
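
For the c-valprint.c entry, the condition deciding whether an array is printed with string syntax can be summarized by the small predicate below. This is only a sketch: the str_* names are hypothetical, and the real code works on GDB's struct type rather than on plain arguments.

```c
#include <assert.h>

/* Hypothetical stand-ins for the element type codes of interest.  */
enum str_elt_code { STR_ELT_INT, STR_ELT_CHAR, STR_ELT_FLT };

/* Return nonzero if an array whose elements are ELTLEN bytes long
   and have type code CODE should be printed as a string.  FORMAT is
   0 for the default format or 's' for an explicit string request.  */
int
str_print_as_string (unsigned int eltlen, enum str_elt_code code,
                     int format)
{
  return (eltlen == 1
          && (code == STR_ELT_INT || code == STR_ELT_CHAR)
          && (format == 0 || format == 's'));
}
```

The difference from the previous behavior is that a character element type code is accepted regardless of the current language, instead of only for Modula-2.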

What do you guys think? Wouldn't that be a step forward?

Thanks,
-- 
Joel
-------------- next part --------------
Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.194
diff -u -p -r1.194 dwarf2read.c
--- dwarf2read.c	21 Apr 2006 20:26:07 -0000	1.194
+++ dwarf2read.c	5 May 2006 17:31:53 -0000
@@ -4728,10 +4728,15 @@ read_base_type (struct die_info *die, st
 	  code = TYPE_CODE_FLT;
 	  break;
 	case DW_ATE_signed:
+	  break;
 	case DW_ATE_signed_char:
+          code = TYPE_CODE_CHAR;
 	  break;
 	case DW_ATE_unsigned:
+	  type_flags |= TYPE_FLAG_UNSIGNED;
+	  break;
 	case DW_ATE_unsigned_char:
+          code = TYPE_CODE_CHAR;
 	  type_flags |= TYPE_FLAG_UNSIGNED;
 	  break;
 	default:
Index: ada-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/ada-lang.c,v
retrieving revision 1.84
diff -u -p -r1.84 ada-lang.c
--- ada-lang.c	12 Jan 2006 08:36:29 -0000	1.84
+++ ada-lang.c	5 May 2006 17:30:55 -0000
@@ -7145,10 +7145,15 @@ int
 ada_is_character_type (struct type *type)
 {
   const char *name = ada_type_name (type);
+
+  /* If the type code says it's a character, then assume it really is,
+     and don't check any further.  */
+  if (TYPE_CODE (type) == TYPE_CODE_CHAR)
+    return 1;
+  
   return
     name != NULL
-    && (TYPE_CODE (type) == TYPE_CODE_CHAR
-        || TYPE_CODE (type) == TYPE_CODE_INT
+    && (TYPE_CODE (type) == TYPE_CODE_INT
         || TYPE_CODE (type) == TYPE_CODE_RANGE)
     && (strcmp (name, "character") == 0
         || strcmp (name, "wide_character") == 0
Index: c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.39
diff -u -p -r1.39 c-valprint.c
--- c-valprint.c	18 Jan 2006 21:24:19 -0000	1.39
+++ c-valprint.c	5 May 2006 17:31:16 -0000
@@ -96,9 +96,8 @@ c_val_print (struct type *type, const gd
 	    }
 	  /* For an array of chars, print with string syntax.  */
 	  if (eltlen == 1 &&
-	      ((TYPE_CODE (elttype) == TYPE_CODE_INT)
-	       || ((current_language->la_language == language_m2)
-		   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+	      (TYPE_CODE (elttype) == TYPE_CODE_INT
+	       || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
 	      && (format == 0 || format == 's'))
 	    {
 	      /* If requested, look for the first null char and only print
@@ -192,7 +191,8 @@ c_val_print (struct type *type, const gd
 	  /* FIXME: need to handle wchar_t here... */
 
 	  if (TYPE_LENGTH (elttype) == 1
-	      && TYPE_CODE (elttype) == TYPE_CODE_INT
+	      && (TYPE_CODE (elttype) == TYPE_CODE_INT
+                  || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
 	      && (format == 0 || format == 's')
 	      && addr != 0)
 	    {