Bug 31282 - [gdb/fortran] logical type doesn't match documentation of TYPE_CODE_BOOL
Status: NEW
Alias: None
Product: gdb
Classification: Unclassified
Component: fortran
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2024-01-23 20:27 UTC by Tom de Vries
Modified: 2024-01-25 14:57 UTC

Description Tom de Vries 2024-01-23 20:27:01 UTC
Consider the fortran type logical.

Here ( https://gcc.gnu.org/onlinedocs/gfortran/Internal-representation-of-LOGICAL-variables.html ) we read:
...
The Fortran standard does not specify how variables of LOGICAL type are represented, beyond requiring that LOGICAL variables of default kind have the same storage size as default INTEGER and REAL variables. The GNU Fortran internal representation is as follows.

A LOGICAL(KIND=N) variable is represented as an INTEGER(KIND=N) variable, however, with only two permissible values: 1 for .TRUE. and 0 for .FALSE.. Any other integer value results in undefined behavior. 
...

OK, that's clear.  So how does this look in dwarf? Like so:
...
 <1><106>: Abbrev Number: 1 (DW_TAG_base_type)
    <107>   DW_AT_byte_size   : 4
    <108>   DW_AT_encoding    : 2       (boolean)
    <109>   DW_AT_name        : (indirect string, offset: 0x6): logical(kind=4)
...

So, is there a clue to tell what the values of true/false are, other than looking at the producer string and searching for "GNU Fortran"?

The DW_AT_encoding value DW_ATE_boolean is described only as "true or false", so that doesn't help much.

However, internally in gdb, this type is mapped onto:
...
/* * Boolean type.  0 is false, 1 is true, and other values are
   non-boolean (e.g. FORTRAN "logical" used as unsigned int).  */
OP (TYPE_CODE_BOOL)
...

That doesn't seem correct.  If .TRUE. is, say, 2, then using TYPE_CODE_BOOL tells us to view it as non-boolean.

AFAICT this doesn't cause any trouble because there's special handling to deal with this discrepancy, for instance here in f-valprint.c:
...
          /* The Fortran standard doesn't specify how logical types are
             represented.  Different compilers use different non zero
             values to represent logical true.  */
          if (longval == 0)
            gdb_puts (f_decorations.false_name, stream);
          else
            gdb_puts (f_decorations.true_name, stream);
...
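
To make the discrepancy concrete, here is a minimal C sketch contrasting the two readings of the same stored value: the one documented for TYPE_CODE_BOOL and the one used in the f-valprint.c snippet.  This is not gdb code; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of these exist in gdb.  */
enum bool_reading { READ_FALSE, READ_TRUE, READ_NON_BOOLEAN };

/* Reading per the TYPE_CODE_BOOL comment: 0 is false, 1 is true,
   anything else is non-boolean.  */
enum bool_reading
read_strict (int64_t val)
{
  if (val == 0)
    return READ_FALSE;
  if (val == 1)
    return READ_TRUE;
  return READ_NON_BOOLEAN;
}

/* Reading per the f-valprint.c snippet: 0 is false, any other
   value is true.  */
enum bool_reading
read_fortran (int64_t val)
{
  return val == 0 ? READ_FALSE : READ_TRUE;
}

/* The two readings agree on 0 and 1 and disagree on everything else.  */
void
demo_readings (void)
{
  assert (read_strict (1) == read_fortran (1));
  assert (read_strict (2) == READ_NON_BOOLEAN);
  assert (read_fortran (2) == READ_TRUE);
}
```

The sketch only restates the two rules side by side; which of them a given logical value should follow is exactly what the DWARF above does not say.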
Comment 1 Tom Tromey 2024-01-24 01:10:43 UTC
Updating the comment seems fine.
It's also worth considering how hardbool should work.
Comment 2 Tom de Vries 2024-01-24 11:11:47 UTC
Hmm, here:
...
/* The set of Fortran booleans.  These are matched case insensitively.  */
static const struct f77_boolean_val boolean_values[]  =
{
  { ".true.", 1 },
  { ".false.", 0 }
};
...
we assume 0/1.

Not great when debugging an ifort-compiled program (FWIW, this is gdb.fortran/logical.f90):
...
(gdb) p l
$9 = .TRUE.
(gdb) p l == .TRUE.
$10 = .FALSE.
(gdb) p /x l
$11 = 0xffffffff
(gdb) p /x .TRUE.
$12 = 0x1
...
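
The mismatch in this session can be reproduced outside gdb.  A minimal C sketch, assuming that ifort stores .TRUE. as the 32-bit pattern 0xffffffff (as printed above) and that the comparison behaves like a plain integer comparison against the literal value 1 from boolean_values; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value used for the literal .TRUE. in the boolean_values table.  */
#define GDB_LITERAL_TRUE 1

/* Raw integer comparison, which is what `p l == .TRUE.` appears to
   amount to in the session above.  */
bool
equal_raw (int32_t lhs, int32_t rhs)
{
  return lhs == rhs;
}

/* Hypothetical alternative: compare truth values, treating any
   non-zero value as true, as the value printer does.  */
bool
equal_as_logical (int32_t lhs, int32_t rhs)
{
  return (lhs != 0) == (rhs != 0);
}

void
demo_ifort_true (void)
{
  int32_t l = -1;   /* 0xffffffff, the value of l in the session.  */

  /* Matches the session: l prints as .TRUE. but l == .TRUE. is false.  */
  assert (!equal_raw (l, GDB_LITERAL_TRUE));
  /* Comparing truth values instead would give the expected answer.  */
  assert (equal_as_logical (l, GDB_LITERAL_TRUE));
}
```

Whether comparing truth values is the right fix is a separate question; the sketch only shows where the .FALSE. result comes from.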

OK, so is there any indication in the dwarf to show that .TRUE. == -1?  Let's see:
...
 <1><153>: Abbrev Number: 5 (DW_TAG_base_type)
    <154>   DW_AT_byte_size   : 4
    <155>   DW_AT_encoding    : 2       (boolean)
    <156>   DW_AT_name        : (indirect string, offset: 0x2a5): LOGICAL(4)
...

Nope, that's the same as gnu fortran:
...
 <1><1a3>: Abbrev Number: 4 (DW_TAG_base_type)
    <1a4>   DW_AT_byte_size   : 4
    <1a5>   DW_AT_encoding    : 2       (boolean)
    <1a6>   DW_AT_name        : (indirect string, offset: 0x1d3): logical(kind=4)
...
Comment 3 Tom de Vries 2024-01-24 12:16:30 UTC
(In reply to Tom Tromey from comment #1)
> It's also worth considering how hardbool should work.

Ah, interesting, thanks.  I found some explanation here ( https://gcc.gnu.org/onlinedocs/gnat_rm/Hardened-Booleans.html ).

So let's try with this program:
...
with Text_IO; use Text_IO;

procedure hello is
   type HBool is new Boolean;
   for HBool use (0, 1);
   for HBool'Size use 8;
   A : HBool := True;
begin
   Put_Line("Hello world!");
end hello;
...

The following DWARF is generated; hello__hbool is a subrange_type:
...
 <2><168e>: Abbrev Number: 3 (DW_TAG_subrange_type)
    <168f>   DW_AT_lower_bound : 0
    <1690>   DW_AT_upper_bound : 1
    <1691>   DW_AT_name        : hello__hbool
    <1695>   DW_AT_type        : <0x16a7>
 <1><16a7>: Abbrev Number: 5 (DW_TAG_base_type)
    <16a8>   DW_AT_byte_size   : 1
    <16a9>   DW_AT_encoding    : 2      (boolean)
    <16aa>   DW_AT_name        : hello__hboolB
    <16ae>   DW_AT_artificial  : 1
...

Now let's try this change:
...
-   for HBool use (0, 1);
+   for HBool use (0, 2);
...

Also a subrange_type, but with a different base type: an enumeration_type instead of a DW_ATE_boolean base_type.
...
 <2><168a>: Abbrev Number: 3 (DW_TAG_enumeration_type)
    <168b>   DW_AT_name        : hello__hboolB
    <168f>   DW_AT_encoding    : 7      (unsigned)
    <1690>   DW_AT_byte_size   : 1
    <1694>   DW_AT_artificial  : 1
 <3><1698>: Abbrev Number: 4 (DW_TAG_enumerator)
    <1699>   DW_AT_name        : false
    <169d>   DW_AT_const_value : 0
 <3><169e>: Abbrev Number: 4 (DW_TAG_enumerator)
    <169f>   DW_AT_name        : true
    <16a3>   DW_AT_const_value : 2
 <2><16a5>: Abbrev Number: 5 (DW_TAG_subrange_type)
    <16a6>   DW_AT_lower_bound : 0
    <16a7>   DW_AT_upper_bound : 2
    <16a8>   DW_AT_name        : hello__hbool
    <16ac>   DW_AT_type        : <0x168a>
...

Likewise for:
...
-   for HBool use (0, 1);
+   for HBool use (1, 2);
...
which sort of suggests that DW_ATE_boolean means 0/1, but that is contradicted by the use of DW_ATE_boolean in ifort.
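
When the compiler emits an enumeration_type, as in the (0, 2) case, a consumer can recover the true/false values from the enumerators alone.  A minimal C sketch of such a lookup follows; the struct and function are hypothetical and do not reflect gdb's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a DW_TAG_enumerator: a name and its
   DW_AT_const_value.  */
struct enumerator
{
  const char *name;
  int64_t value;
};

/* Enumerators from the `for HBool use (0, 2)` dump above.  */
const struct enumerator hbool_enumerators[] =
{
  { "false", 0 },
  { "true", 2 },
};

/* Return the enumerator name for VAL, or NULL when VAL matches no
   enumerator (an invalid representation for the type).  */
const char *
enumerator_name (const struct enumerator *table, size_t count, int64_t val)
{
  assert (table != NULL);
  for (size_t i = 0; i < count; i++)
    if (table[i].value == val)
      return table[i].name;
  return NULL;
}
```

With this table, a stored 2 maps to "true" and a stored 1 maps to nothing, which is the information that a bare DW_ATE_boolean base type does not carry.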
Comment 4 Tom de Vries 2024-01-24 13:49:37 UTC
(In reply to Tom Tromey from comment #1)
> Updating the comment seems fine.

In gdbtypes.c I found:
...
std::optional<LONGEST>
get_discrete_low_bound (struct type *type)
{
  ...
    case TYPE_CODE_BOOL:
      return 0;
...
and:
...
std::optional<LONGEST>
get_discrete_high_bound (struct type *type)
{
    case TYPE_CODE_BOOL:
      return 1;
  ...
...

So, the notion the comment describes seems to be hardcoded in the code.
Comment 5 Tom Tromey 2024-01-24 16:58:08 UTC
(In reply to Tom de Vries from comment #3)
> (In reply to Tom Tromey from comment #1)
> > It's also worth considering how hardbool should work.
> 
> Ah, interesting, thanks.  I found some explanation here (
> https://gcc.gnu.org/onlinedocs/gnat_rm/Hardened-Booleans.html ).

For Ada it probably isn't such a big issue, due to how Ada's
type system works -- the compiler is already equipped to
think of things as enums and ranges and whatnot.

However hardbool is also a C extension feature now.
Search for "hardbool" here:
https://gcc.gnu.org/gcc-14/changes.html

Anyway I think gdb is at the mercy of the compiler(s) here.
It's not unreasonable to expect them to tell gdb what
the bool values are; and if this can't be expressed in DWARF,
then a DWARF update is also required.
Comment 6 Tom de Vries 2024-01-25 12:29:30 UTC
(In reply to Tom Tromey from comment #5)
> (In reply to Tom de Vries from comment #3)
> > (In reply to Tom Tromey from comment #1)
> > > It's also worth considering how hardbool should work.
> > 
> > Ah, interesting, thanks.  I found some explanation here (
> > https://gcc.gnu.org/onlinedocs/gnat_rm/Hardened-Booleans.html ).
> 
> For Ada it probably isn't such a big issue, due to how Ada's
> type system works -- the compiler is already equipped to
> think of things as enums and ranges and whatnot.
> 
> However hardbool is also a C extension feature now.
> Search for "hardbool" here:
> https://gcc.gnu.org/gcc-14/changes.html
> 

Ah I see.  That's documented here ( https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-hardbool-type-attribute ).

I tried:
...
$ cat test.c
#define true 0x1
#define false 0x0

typedef char __attribute__ ((__hardbool__ (false, true))) hbool;

static hbool a = true;
static hbool b = false;
static hbool c = 0;
static hbool d = 1;

int
main (void)
{
  hbool x = 0;
  *(unsigned char *)&x = 0;
  
  return 0;
}
...
and got:
...
 <1><2e>: Abbrev Number: 4 (DW_TAG_typedef)
    <2f>   DW_AT_name        : hbool
    <36>   DW_AT_type        : <0x3a>
 <1><3a>: Abbrev Number: 5 (DW_TAG_enumeration_type)
    <3b>   DW_AT_encoding    : 5        (signed)
    <3c>   DW_AT_byte_size   : 1
    <3d>   DW_AT_type        : <0x52>
 <2><45>: Abbrev Number: 2 (DW_TAG_enumerator)
    <46>   DW_AT_name        : false
    <4a>   DW_AT_const_value : 0
 <2><4b>: Abbrev Number: 2 (DW_TAG_enumerator)
    <4c>   DW_AT_name        : true
    <50>   DW_AT_const_value : 1
...

> Anyway I think gdb is at the mercy of the compiler(s) here.
> It's not unreasonable to expect them to tell gdb what
> the bool values are; and if this can't be expressed in DWARF,
> then a DWARF update is also required.

I think it's possible to handle the current situation better by looking at producer strings.  Once that's done, we can present a case to dwarf-discuss showing the hoops we jump through, and ask how things should be handled without producer string magic.  After that, we can file PRs for compilers to generate proper info.
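
A minimal C sketch of what such a producer-string lookup could look like.  Everything here is hypothetical: the table, the function, and in particular the "Intel(R) Fortran" prefix, which is an assumed spelling that has not been checked against real ifort output.  The values 1 and -1 come from the gfortran documentation and the ifort session in comment 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: a DW_AT_producer prefix and the value
   that producer stores for .TRUE..  */
struct producer_true_value
{
  const char *prefix;
  int64_t true_value;
};

const struct producer_true_value known_producers[] =
{
  { "GNU Fortran", 1 },        /* 1 per the gfortran documentation.  */
  { "Intel(R) Fortran", -1 },  /* Assumed prefix; -1 as seen in comment 2.  */
};

/* Store the .TRUE. value for PRODUCER in *RESULT and return 1, or
   return 0 when the producer is not in the table.  */
int
true_value_for_producer (const char *producer, int64_t *result)
{
  assert (producer != NULL && result != NULL);
  size_t n = sizeof known_producers / sizeof known_producers[0];
  for (size_t i = 0; i < n; i++)
    {
      const char *prefix = known_producers[i].prefix;
      if (strncmp (producer, prefix, strlen (prefix)) == 0)
        {
          *result = known_producers[i].true_value;
          return 1;
        }
    }
  return 0;
}
```

An unknown producer returns 0, so the caller would have to fall back to some default, which is the kind of hoop the dwarf-discuss question would be about.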
Comment 7 Tom Tromey 2024-01-25 14:57:42 UTC
Emitting an enum here actually seems kind of fine, however
c-exp.y will need some special handling for true/false now.
I wonder what the rules are in the C compiler for deciding
which 'true' to use.