This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH V3] Add negative repeat count to 'x' command
- From: Toshihito Kikuchi <k dot toshihito at yahoo dot de>
- To: Pedro Alves <palves at redhat dot com>, gdb-patches at sourceware dot org
- Date: Mon, 23 May 2016 10:22:03 -0700
- Subject: Re: [PATCH V3] Add negative repeat count to 'x' command
- Authentication-results: sourceware.org; auth=none
- References: <1461866539-24965-1-git-send-email-k dot toshihito at yahoo dot de> <c2d74e3a-d818-2af2-613e-57ce6bec97f9 at redhat dot com>
Hi Pedro,
Thank you for your kind comments. Most of them are addressed in next
iteration. Please find my comments inlined.
>> This change adds support for specifying a negative repeat count to
>> all the formats of the 'x' command to examine memory backward.
>>
>> A new testsuite 'examine-backward' is added to cover this new feature.
>> It mainly focuses on the format 'x/-#s' and 'x/-#i'. For disassemble
>> testing, only i386 and amd64 are covered. Other platforms are skipped.
>
> This comment is no longer up to date, right?
>
> I see you still left the x86/x64-64-specific tests in place. Now that
> we have the generic disassembly test, is there value in keeping them?
> I'm thinking that they're probably making the test fail on e.g., x32.
I kept them for two reasons: 1) disassemble more than one instructions
backward and 2) disassemble backward from an address off the procedure
boundary. I added coverage for 1) to the generic test. It's difficult
to add 2) to the generic test, but I think it's not a big deal. I
remove the x86/x64-64-specific tests.
>> @@ -785,6 +790,208 @@ print_address_demangle (const struct value_print_options *opts,
>> }
>>
>>
>> +/* Rewind assembly from ADDR and return the start address after COUNT lines
>> + are disassembled.
>
> FWIW, I found this sentence incredibly confusing. What's "rewind"?
> And, the START address AFTER lines are disassembled? What's that?
> If you've already disassembled, how can it be the start?
>
> I'd find this clearer:
>
> /* Find the address of the instruction that is COUNT instructions
> before the instruction at ADDR. */
>
>
>> We use line number information to accurately locate
>> + procedure boundaries.
>
> I'd say:
>
> Since some architectures have variable-length instructions, we
> can't just simply subtract COUNT*INSN_LEN from ADDR. Instead, we
> use line number information to locate the nearest known instruction
> boundary, and disassemble forward from there.
>
>> + If we go out of the symbol range during disassembling, we return the lowest
>> + address we've got so far and set the number of lines read to LINES_READ. */
>
> Is it really _lines_ read though? It took me a while to grok this all, but I
> think it's actually number of instructions we were able to count/read backwards?
Totally agreed with your comments. I update the leading comment and the
parameter names accordingly.
>
>> +
>> +static CORE_ADDR
>> +find_instruction_backward (struct gdbarch *gdbarch, CORE_ADDR addr,
>> + int count, int *lines_read)
>
> A variable named "count" in this function is confusing, since it's
> it's not immediately clear whether it counts instructions, or lines.
> insn_count / line_count ? But, see above. Actually, having now
> read the code and come back here, I think it's "lines_read" that is
> the most misleading parameter name.
>
>
>> +{
>> + /* The vector PCS is used to store procedure boundary addressese within
>> + a pc range. */
>
> typo "addressese".
Updated in next iteration.
>
>> + CORE_ADDR loop_start, loop_end, p;
>> + VEC (CORE_ADDR) *pcs = NULL;
>> + struct symtab_and_line sal;
>> + struct cleanup *cleanup = make_cleanup (VEC_cleanup (CORE_ADDR), &pcs);
>> +
>> + *lines_read = 0;
>> + loop_start = loop_end = addr;
>> +
>> + /* In each iteration of the outer loop, we get a pc range which ends before
>> + LOOP_START, then we count and store each procedure boundary from the start
>> + address of the range to the end.
>
> I don't think we store "store each procedure boundary". Seems like what
> we store in the VEC is every instruction address of the range currently
> iterated?
Updated in next iteration.
>
>> + If the number of instructions counted reaches COUNT, return one of stored
>> + addresses which is located COUNT instructions back from ADDR.
>> + If COUNT is not reached, we subtract the number of counted instructions
>> + from COUNT, and go to the next iteration. */
>
>> + do
>> + {
>> + VEC_truncate (CORE_ADDR, pcs, 0);
>> + sal = find_pc_sect_line (loop_start, NULL, 1);
>> + if (sal.line <= 0)
>> + {
>> + /* We reach here when line info is not available. In this case,
>> + we print a message and just exit the loop. The return value
>> + is calculated after the loop. */
>> + printf_filtered (_("No line number information available "
>> + "for address "));
>> + wrap_here (" ");
>> + print_address (gdbarch, loop_start - 1, gdb_stdout);
>> + printf_filtered ("\n");
>> + break;
>> + }
>> +
>> + loop_end = loop_start;
>> + loop_start = sal.pc;
>> +
>> + /* This loop pushes procedure boundaries in a range from
>> + LOOP_START to LOOP_END. */
>> + for (p = loop_start; p < loop_end;)
>> + {
>> + VEC_safe_push (CORE_ADDR, pcs, p);
>> + p += gdb_insn_length (gdbarch, p);
>> + }
>> +
>> + count -= VEC_length (CORE_ADDR, pcs);
>> + *lines_read += VEC_length (CORE_ADDR, pcs);
>
> How come the VEC's length, which holds instruction addresses,
> is added to lines_read? Shouldn't this be just incrementing
> lines_read by one, or some such? OK, I think the problem
> is actually that you're calling this "lines", maybe thinking
> of disassembly-lines-displayed, which is confusing with
> "source lines" / sals. I think you should rename this to
> something else.
>
> /me goes update the comments above.
You're 100% right. When I chose 'lines_read', I thought lines of
disassembly code. Updated the parameter names.
>
> I wonder if this algoritm will still work with optimized code
> and inlined functions, and PCs / lines jumping around.
This algorithm does not depend on source line numbers. Line info is
used to get only a start address (symtab_and_line::pc). It's not
impacted by optimization.
>
>> + }
>> + while (count > 0);
>> +
>> + /* After the loop, the vector PCS has procedure boundaries in the last
>> + range we processed, and COUNT has a negative value. We return an address
>> + at the index of -COUNT in the vector for the reason below.
>> + Assuming that the processed ranges are {4000, 4001, 4005},
>
> So IIRC, "{4000, 4001, 4005}" is the addresses covered by a line, not really
> a range, right? A range with 3 elements would be odd.
Added a more readable example here.
>
>> {4009, 400c},
>> + and the original COUNT is 4, we want to return 4001. When we reach here,
>> + COUNT is set to -1 because it is subtracted by 2 and 3. The value 4001
>> + is located at the index 1 of the last range, which can be calculated by
>> + VEC_length - (COUNT + VEC_length) = -COUNT.
>> + The case when the length of PCS is 0 means that we reach an area where
>> + line info is not available. In such a case, we return LOOP_START which
>> + is the lowest procedure boundary with line info. */
>> + p = VEC_length (CORE_ADDR, pcs) > 0
>> + ? VEC_index (CORE_ADDR, pcs, -count)
>> + : loop_start;
>> +
>> + /* LINES_READ includes entire ranges. Need to exclude a beginning part,
>> + that is {4000} in the example above. */
>> + if (count < 0)
>> + *lines_read += count;
>> +
>> + do_cleanups (cleanup);
>> + return p;
>> +}
>> +
>> +/* This is a backward version of partial_memory_read in valprint.c.
>
> Should we put this next to partial_memory_read ?
>
> Thought OTOH, both should probably be moved to target.c, next to
> read_memory_robust or some such. (partial_memory_read could probably
> be eliminated).
I remove the first line of the comment as it's not so helpful.
I want to keep read_memory_backward defined with 'static' as long as
it's not referred from other files. Moving read_memory_backward and
partial_memory_read would be better to be made by a different
refactoring change.
>
>> + Read LEN bytes of target memory at address MEMADDR, placing the
>> + results in GDB's memory at MYADDR. Returns a count of the bytes
>> + actually read. */
>> +
>> +static int
>> +read_memory_backward (struct gdbarch *gdbarch,
>> + CORE_ADDR memaddr, gdb_byte *myaddr, int len)
>> +{
>> + int errcode;
>> + int nread; /* Number of bytes actually read. */
>> +
>> + /* First try a complete read. */
>> + errcode = target_read_memory (memaddr, myaddr, len);
>> + if (errcode == 0)
>> + {
>> + /* Got it all. */
>> + nread = len;
>> + }
>> + else
>> + {
>> + /* Loop, reading one byte at a time until we get as much as we can. */
>> + for (errcode = 0, nread = 0, memaddr += len, myaddr += len;
>> + len > 0 && errcode == 0;
>> + ++nread, --len)
>> + {
>> + errcode = target_read_memory (--memaddr, --myaddr, 1);
>> + }
>> + if (errcode != 0)
>> + {
>> + --nread;
>> + printf_filtered (_("Cannot access memory at address %s\n"),
>> + paddress (gdbarch, memaddr));
>> + }
>
> I find the putting initialization of non-iterator variables
> in the 'for' initializer statement to be a lot more confusing than
> necessary. Likewise for the errcode condition and the need
> to rewind nread. I'd rather rewrite it like this:
>
> memaddr += len;
> myaddr += len;
> for (nread = 0; nread < len; nread++)
> {
> errcode = target_read_memory (--memaddr, --myaddr, 1);
> if (errcode != 0)
> {
> /* The read was unsuccessful, so adjust count. */
> printf_filtered (_("Cannot access memory at address %s\n"),
> paddress (gdbarch, memaddr));
> break;
> }
> }
>
> Seems simpler and much clearer to me.
Updated.
>
> Also, we should probably be taking into account the addressable
> unit size here.
Do you mean adding a new parameter like unit_size to pass it to
target_read_memory?
>
>> + }
>> + return nread;
>> +}
>> +
>> +/* This is an integer version of decimal_is_zero in dfp.c.
>
> The implementation is completely different. I'd just drop this
> comment.
Updated.
>> @@ -868,6 +1077,38 @@ do_examine (struct format_data fmt, struct gdbarch *gdbarch, CORE_ADDR addr)
>>
>> get_formatted_print_options (&opts, format);
>>
>> + if (count < 0)
>> + {
>> + /* This is the negative repeat count case.
>> + We rewind the address based on the given repeat count and format,
>> + then examine memory from there in forward direction. */
>> +
>> + count = -count;
>> + if (format == 'i')
>> + {
>> + next_address = find_instruction_backward (gdbarch, addr, count,
>> + &count);
>> + }
>> + else if (format == 's')
>> + {
>> + next_address = find_string_backward (gdbarch, addr, count,
>> + val_type->length,
>> + &opts, &count);
>> + }
>> + else
>> + {
>> + next_address = addr - count * val_type->length;
>> + }
>> +
>> + /* The following call to print_formatted updates next_address in every
>> + iteration. In backward case, we store the start address here
>
> Double-space after period.
Updated.
>> +
>> +static __attribute__((noinline)) void
>> +zero_mapping_test ()\
>> +{
>> +}
>> +
>> +int main ()
>
> int
> main (void)
Updated.
>
>
>> +{
>> + /* To make the address zero valid, we call mmap for addr = 0 with
>> + MAP_FIXED. This syscall is expected to fail under normal configuration
>> + because that space is protected by CONFIG_DEFAULT_MMAP_MIN_ADDR.
>> + This is for non-MMU platforms. */
>
> What's the high-level idea behind doing this? If it's a non-MMU platform,
> then I don't think you need this?
>
> It's be nice to have the test not depend on mmap, or make it still work
> when mmap is not available, as then it'd expose the test to a bunch of
> bare metal ports / architectures gdb supports.
I added mmap to see the behavior around the address zero because I was
not sure about the expected result. Given that non-MMU does not require
mmap to access the address zero, I remove mmap.
>
>> + const int page_size = 4096;
>> + void *p = mmap (0, page_size,
>> + PROT_READ|PROT_WRITE,
>> + MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
>> + -1, 0);
>
> Formatting. PROT_READ should align under the 0.
>
>> +
>> + /* Call zero_mapping_test regardless of the result from mmap to make sure
>> + a breakpoint at zero_mapping_test is always hit in a test scenario. */
>> + zero_mapping_test();
>
> Space before parens.
>
>> +
>> + if (p != MAP_FAILED)
>> + munmap (p, page_size);
>
> Indentation doesn't look right.
These are removed.
>> +standard_testfile
>> +if { [prepare_for_testing "failed to prepare for examine-backward" \
>> + ${testfile} ${srcfile}] } {
>> + return -1
>> +}
>> +
>> +if ![runto_main] {
>> + untested "could not run to main"
>> + return -1
>> +}
>> +
>> +proc get_first_mapped_address {} {
>> + global gdb_prompt
>> +
>> + set addr "0"
>> + gdb_test_multiple "info proc mappings" "info proc mappings" {
>> + -re "objfile\[\r\n\t \]+(0x\[0-9a-fA-F\]+).*\[\r\n\]*$gdb_prompt $" {
>> + set addr $expect_out(1,string)
>> + }
>
> Should probably add:
>
> -re "$gdb_prompt $" {
> }
>
> to avoid a FAIL on targets that don't support "info proc mappings".
> Call UNSUPPORTED here or in the caller, so there's something logged
> in gdb.sum.
Updated.
>
>> + }
>> + return ${addr}
>> +}
>> +
>> +with_test_prefix "invalid format" {
>> + gdb_test "x/- 10xb main" "Invalid number \"10xb\"\." \
>> + "a whitespace after a leading hyphen"
>> + gdb_test "x/--10xb main" "Invalid number \"10xb\"\." \
>> + "double hyphen"
>> + gdb_test "x/-a10xb main" "Invalid number \"10xb\"\." \
>> + "an alphabet after a leading hyphen"
>> + gdb_test_no_output "x/-0i main" "zero with backward disassemble"
>> + gdb_test_no_output "x/-0sh main" "zero with backward examine string"
>> +}
>> +
>> +with_test_prefix "memory page boundary" {
>> + set boundary [get_first_mapped_address]
>> + if {![is_address_zero_readable] && $boundary != 0} {
>> + gdb_test_no_output "set print elements 0"
>> + gdb_test_sequence "x/3s ${boundary}" "take 3 strings forward" {
>> + "0x"
>> + "0x"
>> + "0x"
>> + }
>> + gdb_test_sequence "x/-4s" "take 4 strings backward" {
>> + "Cannot access memory at address 0x"
>> + "0x"
>> + "0x"
>> + "0x"
>> + }
>> + gdb_test_sequence "x/3s ${boundary}" "take 3 strings forward again" {
>> + "0x"
>> + "0x"
>> + "0x"
>> + }
>> + gdb_test_sequence "x/-3s" "take 3 strings backward" {
>> + "Cannot access memory at address 0x"
>> + "0x"
>> + "0x"
>> + "0x"
>> + }
>> + }
>> +}
>> +
>> +with_test_prefix "address zero boundary" {
>> + gdb_test "b zero_mapping_test" "Breakpoint.*" \
>> + "set breakpoint at zero_mapping_test"
>> + gdb_test "c" "zero_mapping_test.*" "run until zero_mapping_test"
>> + if {[is_address_zero_readable]} {
>> + gdb_test "x/3x 0" \
>> + "0x\[0-9a-f\]+:\t0x\[0-9a-f\]+\t0x\[0-9a-f\]+\t0x\[0-9a-f\]+" \
>> + "examine 4 bytes forward"
>> + gdb_test "x/-6x" \
>> + "0x\[0-9a-f\]+:\tCannot access memory at address 0x\[0-9a-f\]+" \
>> + "examine 6 bytes backward"
>> + }
>> + gdb_test "x/-10x 0" \
>> + "0x\[0-9a-f\]+:\tCannot access memory at address 0x\[0-9a-f\]+" \
>> + "examine 10 bytes backward from NULL"
>
> So in the last test, the address wraps around. On a non-MMU this will
> actually read memory from a high address. Is this what we want? Or do
> we want to have GDB to never try to go before 0? Likewise end of addr space.
This was my misunderstanding about non-MMU. I change the expected
result, expecting a line starts with '0x\[0-9a-f\]+fd'.
Thanks,
Toshihito