This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH V3] Add negative repeat count to 'x' command

From: Toshihito Kikuchi <k dot toshihito at yahoo dot de>
To: Pedro Alves <palves at redhat dot com>, gdb-patches at sourceware dot org
Date: Mon, 23 May 2016 10:22:03 -0700
Subject: Re: [PATCH V3] Add negative repeat count to 'x' command
Authentication-results: sourceware.org; auth=none
References: <1461866539-24965-1-git-send-email-k dot toshihito at yahoo dot de> <c2d74e3a-d818-2af2-613e-57ce6bec97f9 at redhat dot com>
Hi Pedro,

Thank you for your kind comments.  Most of them are addressed in next
iteration.  Please find my comments inlined.

>> This change adds support for specifying a negative repeat count to
>> all the formats of the 'x' command to examine memory backward.
>>
>> A new testsuite 'examine-backward' is added to cover this new feature.
>> It mainly focuses on the format 'x/-#s' and 'x/-#i'.  For disassemble
>> testing, only i386 and amd64 are covered.  Other platforms are skipped.
> 
> This comment is no longer up to date, right?
> 
> I see you still left the x86/x64-64-specific tests in place.  Now that
> we have the generic disassembly test, is there value in keeping them?
> I'm thinking that they're probably making the test fail on e.g., x32.

I kept them for two reasons: 1) disassemble more than one instructions
backward and 2) disassemble backward from an address off the procedure
boundary.  I added coverage for 1) to the generic test.  It's difficult
to add 2) to the generic test, but I think it's not a big deal.  I
remove the x86/x64-64-specific tests.

>> @@ -785,6 +790,208 @@ print_address_demangle (const struct value_print_options *opts,
>>  }
>>  
>>  
>> +/* Rewind assembly from ADDR and return the start address after COUNT lines
>> +   are disassembled. 
> 
> FWIW, I found this sentence incredibly confusing.  What's "rewind"?  
> And, the START address AFTER lines are disassembled?  What's that?
> If you've already disassembled, how can it be the start?
> 
> I'd find this clearer:
> 
> /* Find the address of the instruction that is COUNT instructions
>    before the instruction at ADDR.  */ 
> 
> 
>>  We use line number information to accurately locate
>> +   procedure boundaries.
> 
> I'd say:
> 
>   Since some architectures have variable-length instructions, we 
>   can't just simply subtract COUNT*INSN_LEN from ADDR.  Instead, we
>   use line number information to locate the nearest known instruction
>   boundary, and disassemble forward from there.
> 
>> +   If we go out of the symbol range during disassembling, we return the lowest
>> +   address we've got so far and set the number of lines read to LINES_READ.  */
> 
> Is it really _lines_ read though?  It took me a while to grok this all, but I
> think it's actually number of instructions we were able to count/read backwards?

Totally agreed with your comments.  I update the leading comment and the
parameter names accordingly.

> 
>> +
>> +static CORE_ADDR
>> +find_instruction_backward (struct gdbarch *gdbarch, CORE_ADDR addr,
>> +                           int count, int *lines_read)
> 
> A variable named "count" in this function is confusing, since it's
> it's not immediately clear whether it counts instructions, or lines.
> insn_count / line_count ?  But, see above.  Actually, having now
> read the code and come back here, I think it's "lines_read" that is
> the most misleading parameter name.
> 
> 
>> +{
>> +  /* The vector PCS is used to store procedure boundary addressese within
>> +     a pc range.  */
> 
> typo "addressese".

Updated in next iteration.

> 
>> +  CORE_ADDR loop_start, loop_end, p;
>> +  VEC (CORE_ADDR) *pcs = NULL;
>> +  struct symtab_and_line sal;
>> +  struct cleanup *cleanup = make_cleanup (VEC_cleanup (CORE_ADDR), &pcs);
>> +
>> +  *lines_read = 0;
>> +  loop_start = loop_end = addr;
>> +
>> +  /* In each iteration of the outer loop, we get a pc range which ends before
>> +     LOOP_START, then we count and store each procedure boundary from the start
>> +     address of the range to the end.
> 
> I don't think we store "store each procedure boundary".  Seems like what
> we store in the VEC is every instruction address of the range currently
> iterated?

Updated in next iteration.

> 
>> +     If the number of instructions counted reaches COUNT, return one of stored
>> +     addresses which is located COUNT instructions back from ADDR.
>> +     If COUNT is not reached, we subtract the number of counted instructions
>> +     from COUNT, and go to the next iteration.  */
> 
>> +  do
>> +    {
>> +      VEC_truncate (CORE_ADDR, pcs, 0);
>> +      sal = find_pc_sect_line (loop_start, NULL, 1);
>> +      if (sal.line <= 0)
>> +        {
>> +          /* We reach here when line info is not available.  In this case,
>> +             we print a message and just exit the loop.  The return value
>> +             is calculated after the loop.  */
>> +          printf_filtered (_("No line number information available "
>> +                             "for address "));
>> +          wrap_here ("  ");
>> +          print_address (gdbarch, loop_start - 1, gdb_stdout);
>> +          printf_filtered ("\n");
>> +          break;
>> +        }
>> +
>> +      loop_end = loop_start;
>> +      loop_start = sal.pc;
>> +
>> +      /* This loop pushes procedure boundaries in a range from
>> +         LOOP_START to LOOP_END.  */
>> +      for (p = loop_start; p < loop_end;)
>> +        {
>> +          VEC_safe_push (CORE_ADDR, pcs, p);
>> +          p += gdb_insn_length (gdbarch, p);
>> +        }
>> +
>> +      count -= VEC_length (CORE_ADDR, pcs);
>> +      *lines_read += VEC_length (CORE_ADDR, pcs);
> 
> How come the VEC's length, which holds instruction addresses,
> is added to lines_read?  Shouldn't this be just incrementing
> lines_read by one, or some such?  OK, I think the problem
> is actually that you're calling this "lines", maybe thinking
> of disassembly-lines-displayed, which is confusing with
> "source lines" / sals.   I think you should rename this to
> something else.  
> 
> /me goes update the comments above.

You're 100% right.  When I chose 'lines_read', I thought lines of
disassembly code.  Updated the parameter names.

> 
> I wonder if this algoritm will still work with optimized code
> and inlined functions, and PCs / lines jumping around.

This algorithm does not depend on source line numbers.  Line info is
used to get only a start address (symtab_and_line::pc).  It's not
impacted by optimization.

> 
>> +    }
>> +  while (count > 0);
>> +
>> +  /* After the loop, the vector PCS has procedure boundaries in the last
>> +     range we processed, and COUNT has a negative value.  We return an address
>> +     at the index of -COUNT in the vector for the reason below.
>> +     Assuming that the processed ranges are {4000, 4001, 4005},
> 
> So IIRC, "{4000, 4001, 4005}" is the addresses covered by a line, not really
> a range, right?  A range with 3 elements would be odd.

Added a more readable example here.

> 
>>  {4009, 400c},
>> +     and the original COUNT is 4, we want to return 4001.  When we reach here,
>> +     COUNT is set to -1 because it is subtracted by 2 and 3.  The value 4001
>> +     is located at the index 1 of the last range, which can be calculated by
>> +     VEC_length - (COUNT + VEC_length) = -COUNT.
>> +     The case when the length of PCS is 0 means that we reach an area where
>> +     line info is not available.  In such a case, we return LOOP_START which
>> +     is the lowest procedure boundary with line info.  */
>> +  p = VEC_length (CORE_ADDR, pcs) > 0
>> +    ? VEC_index (CORE_ADDR, pcs, -count)
>> +    : loop_start;
>> +
>> +  /* LINES_READ includes entire ranges.  Need to exclude a beginning part,
>> +     that is {4000} in the example above.  */
>> +  if (count < 0)
>> +    *lines_read += count;
>> +
>> +  do_cleanups (cleanup);
>> +  return p;
>> +}
>> +
>> +/* This is a backward version of partial_memory_read in valprint.c.
> 
> Should we put this next to partial_memory_read ?
> 
> Thought OTOH, both should probably be moved to target.c, next to
> read_memory_robust or some such.  (partial_memory_read could probably
> be eliminated).

I remove the first line of the comment as it's not so helpful.

I want to keep read_memory_backward defined with 'static' as long as
it's not referred from other files.  Moving read_memory_backward and
partial_memory_read would be better to be made by a different
refactoring change.

> 
>> +   Read LEN bytes of target memory at address MEMADDR, placing the
>> +   results in GDB's memory at MYADDR.  Returns a count of the bytes
>> +   actually read.  */
>> +
>> +static int
>> +read_memory_backward (struct gdbarch *gdbarch,
>> +                      CORE_ADDR memaddr, gdb_byte *myaddr, int len)
>> +{
>> +  int errcode;
>> +  int nread;      /* Number of bytes actually read.  */
>> +
>> +  /* First try a complete read.  */
>> +  errcode = target_read_memory (memaddr, myaddr, len);
>> +  if (errcode == 0)
>> +    {
>> +      /* Got it all.  */
>> +      nread = len;
>> +    }
>> +  else
>> +    {
>> +      /* Loop, reading one byte at a time until we get as much as we can.  */
>> +      for (errcode = 0, nread = 0, memaddr += len, myaddr += len;
>> +           len > 0 && errcode == 0;
>> +           ++nread, --len)
>> +        {
>> +          errcode = target_read_memory (--memaddr, --myaddr, 1);
>> +        }
>> +      if (errcode != 0)
>> +        {
>> +          --nread;
>> +          printf_filtered (_("Cannot access memory at address %s\n"),
>> +                           paddress (gdbarch, memaddr));
>> +        }
> 
> I find the putting initialization of non-iterator variables 
> in the 'for' initializer statement to be a lot more confusing than
> necessary.  Likewise for the errcode condition and the need
> to rewind nread.  I'd rather rewrite it like this:
> 
>       memaddr += len;
>       myaddr += len;
>       for (nread = 0; nread < len; nread++)
>         {
>           errcode = target_read_memory (--memaddr, --myaddr, 1);
>           if (errcode != 0)
>             {
>               /* The read was unsuccessful, so adjust count.  */
>               printf_filtered (_("Cannot access memory at address %s\n"),
>                                paddress (gdbarch, memaddr));
>               break;
>             }
>         }
> 
> Seems simpler and much clearer to me.

Updated.

> 
> Also, we should probably be taking into account the addressable
> unit size here.

Do you mean adding a new parameter like unit_size to pass it to
target_read_memory?

> 
>> +    }
>> +  return nread;
>> +}
>> +
>> +/* This is an integer version of decimal_is_zero in dfp.c.
> 
> The implementation is completely different.  I'd just drop this
> comment.

Updated.

>> @@ -868,6 +1077,38 @@ do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
>>  
>>    get_formatted_print_options (&opts, format);
>>  
>> +  if (count < 0)
>> +    {
>> +      /* This is the negative repeat count case.
>> +         We rewind the address based on the given repeat count and format,
>> +         then examine memory from there in forward direction.  */
>> +
>> +      count = -count;
>> +      if (format == 'i')
>> +        {
>> +          next_address = find_instruction_backward (gdbarch, addr, count,
>> +                                                    &count);
>> +        }
>> +      else if (format == 's')
>> +        {
>> +          next_address = find_string_backward (gdbarch, addr, count,
>> +                                               val_type->length,
>> +                                               &opts, &count);
>> +        }
>> +      else
>> +        {
>> +          next_address = addr - count * val_type->length;
>> +        }
>> +
>> +      /* The following call to print_formatted updates next_address in every
>> +         iteration. In backward case, we store the start address here
> 
> Double-space after period.

Updated.

>> +
>> +static __attribute__((noinline)) void
>> +zero_mapping_test ()\
>> +{
>> +}
>> +
>> +int main ()
> 
> int 
> main (void)

Updated.

> 
> 
>> +{
>> +    /* To make the address zero valid, we call mmap for addr = 0 with
>> +       MAP_FIXED.  This syscall is expected to fail under normal configuration
>> +       because that space is protected by CONFIG_DEFAULT_MMAP_MIN_ADDR.
>> +       This is for non-MMU platforms.   */
> 
> What's the high-level idea behind doing this?  If it's a non-MMU platform,
> then I don't think you need this?
> 
> It's be nice to have the test not depend on mmap, or make it still work
> when mmap is not available, as then it'd expose the test to a bunch of
> bare metal ports / architectures gdb supports.

I added mmap to see the behavior around the address zero because I was
not sure about the expected result.  Given that non-MMU does not require
mmap to access the address zero, I remove mmap.

> 
>> +    const int page_size = 4096;
>> +    void *p = mmap (0, page_size,
>> +        PROT_READ|PROT_WRITE,
>> +        MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
>> +        -1, 0);
> 
> Formatting.  PROT_READ should align under the 0.
> 
>> +
>> +    /* Call zero_mapping_test regardless of the result from mmap to make sure
>> +       a breakpoint at zero_mapping_test is always hit in a test scenario.  */
>> +    zero_mapping_test();
> 
> Space before parens.
> 
>> +
>> +    if (p != MAP_FAILED)
>> +        munmap (p, page_size);
> 
> Indentation doesn't look right.

These are removed.

>> +standard_testfile
>> +if { [prepare_for_testing "failed to prepare for examine-backward" \
>> +        ${testfile} ${srcfile}] } {
>> +    return -1
>> +}
>> +
>> +if ![runto_main] {
>> +    untested "could not run to main"
>> +    return -1
>> +}
>> +
>> +proc get_first_mapped_address {} {
>> +    global gdb_prompt
>> +
>> +    set addr "0"
>> +    gdb_test_multiple "info proc mappings" "info proc mappings" {
>> +        -re "objfile\[\r\n\t \]+(0x\[0-9a-fA-F\]+).*\[\r\n\]*$gdb_prompt $" {
>> +            set addr $expect_out(1,string)
>> +        }
> 
> Should probably add:
> 
>         -re "$gdb_prompt $" {
>         }
> 
> to avoid a FAIL on targets that don't support "info proc mappings".
> Call UNSUPPORTED here or in the caller, so there's something logged
> in gdb.sum.

Updated.

> 
>> +    }
>> +    return ${addr}
>> +}
>> +
>> +with_test_prefix "invalid format" {
>> +    gdb_test "x/- 10xb main" "Invalid number \"10xb\"\." \
>> +        "a whitespace after a leading hyphen"
>> +    gdb_test "x/--10xb main" "Invalid number \"10xb\"\." \
>> +        "double hyphen"
>> +    gdb_test "x/-a10xb main" "Invalid number \"10xb\"\." \
>> +        "an alphabet after a leading hyphen"
>> +    gdb_test_no_output "x/-0i main" "zero with backward disassemble"
>> +    gdb_test_no_output "x/-0sh main" "zero with backward examine string"
>> +}
>> +
>> +with_test_prefix "memory page boundary" {
>> +    set boundary [get_first_mapped_address]
>> +    if {![is_address_zero_readable] && $boundary != 0} {
>> +        gdb_test_no_output "set print elements 0"
>> +        gdb_test_sequence "x/3s ${boundary}" "take 3 strings forward" {
>> +            "0x"
>> +            "0x"
>> +            "0x"
>> +        }
>> +        gdb_test_sequence "x/-4s" "take 4 strings backward" {
>> +            "Cannot access memory at address 0x"
>> +            "0x"
>> +            "0x"
>> +            "0x"
>> +        }
>> +        gdb_test_sequence "x/3s ${boundary}" "take 3 strings forward again" {
>> +            "0x"
>> +            "0x"
>> +            "0x"
>> +        }
>> +        gdb_test_sequence "x/-3s" "take 3 strings backward" {
>> +            "Cannot access memory at address 0x"
>> +            "0x"
>> +            "0x"
>> +            "0x"
>> +        }
>> +    }
>> +}
>> +
>> +with_test_prefix "address zero boundary" {
>> +    gdb_test "b zero_mapping_test" "Breakpoint.*" \
>> +        "set breakpoint at zero_mapping_test"
>> +    gdb_test "c" "zero_mapping_test.*" "run until zero_mapping_test"
>> +    if {[is_address_zero_readable]} {
>> +        gdb_test "x/3x 0" \
>> +            "0x\[0-9a-f\]+:\t0x\[0-9a-f\]+\t0x\[0-9a-f\]+\t0x\[0-9a-f\]+" \
>> +            "examine 4 bytes forward"
>> +        gdb_test "x/-6x" \
>> +            "0x\[0-9a-f\]+:\tCannot access memory at address 0x\[0-9a-f\]+" \
>> +            "examine 6 bytes backward"
>> +    }
>> +    gdb_test "x/-10x 0" \
>> +        "0x\[0-9a-f\]+:\tCannot access memory at address 0x\[0-9a-f\]+" \
>> +        "examine 10 bytes backward from NULL"
> 
> So in the last test, the address wraps around.  On a non-MMU this will
> actually read memory from a high address.  Is this what we want?  Or do
> we want to have GDB to never try to go before 0?  Likewise end of addr space.

This was my misunderstanding about non-MMU.  I change the expected
result, expecting a line starts with '0x\[0-9a-f\]+fd'.


Thanks,
Toshihito
Follow-Ups:
- Re: [PATCH V3] Add negative repeat count to 'x' command
  - From: Pedro Alves
References:
- Re: [PATCH V3] Add negative repeat count to 'x' command
  - From: Pedro Alves
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]