[PATCH v3] gdb/manual: Introduce location specs

Thu May 26 13:52:45 GMT 2022

> Date: Thu, 26 May 2022 13:26:07 +0100
> Cc: gdb-patches@sourceware.org
> From: Pedro Alves <pedro@palves.net>
> 
> I like many of the suggestions you made in this direction for, e.g.,
> the "list" command and others.  But not for breakpoints.  (Now, the following
> few replies to your comments all happen to be in that area, but if you look further
> below, I agree with your suggestions a lot more...)
> 
> Because what you suggest above is not equivalent: what really happens is that we do set
> a breakpoint at each location {address, function name, filename, line} in the
> program that matches the spec.  Recall my inline functions example in the other thread.
> If we think only in terms of addresses, GDB would behave differently in that inlines example.
> It's not just the address that matters.  Like in geography, you can think of locations
> having coordinates (e.g., {x,y,z}), and address is just one of them.

I already asked what a "location" entails in your eyes, in
addition to the program address to which it eventually resolves.  I
don't think we will arrive at a full agreement before we see this
spelled out and documented.

And I'm not saying that address is the only thing that matters, I'm
just saying that thinking about addresses is useful when describing
how GDB uses location specs for setting breakpoints or for other
related features.

> For example, below you start with a locspec like "main", and the breakpoint is set at
> ..../gdb.c:25.
> 
>  (top-gdb) b main
>  (top-gdb) info breakpoints
>  Num     Type           Disp Enb Address            What
>  3       breakpoint     keep y   <MULTIPLE>         
>  3.1                         y   0x00000000000ed06c in main(int, char**) at /home/pedro/gdb/binutils-gdb/src/gdb/gdb.c:25
> 
> All the info in the "Address" and "What" columns above define the coordinates of the location
> in the program, not just the address.  The function name is important.  The line number is important.

I'm talking about terminology, not about the aspects that are
important.  You seem to be interpreting "address" too literally, and
"location" too generally.

> The set breakpoint is then implemented by placing a breakpoint instruction at the address
> of each of the breakpoint's locations.

I would say "the breakpoint is implemented by arranging for the
program to stop at every address that matches the location
specification".  ("Placing a breakpoint instruction" is inaccurate,
because we have hardware-assisted breakpoints.)

> This is what we need to convey.  Just talking about addresses is only talking about
> the implementation detail, not what the users see, and not all that matters about each
> location.

I think users of GDB have a clear understanding about the equivalence
between source-level locations and breakpoint addresses.  If this is
an implementation detail, then it leaked to the GDB user level
long ago.

> >> +@var{locspec} can specify a function name, a line number, an address
> >> +of an instruction, and more.  @xref{Location Specifications}, for the
> >> +various forms of @var{locspec}.  The breakpoint will stop your program
> >> +just before it executes any of the code at any of the breakpoint's
> >> +locations.
> >    ^^^^^^^^^
> > "addresses", not "locations".
> 
> I think it should be both: "breakpoint's locations' addresses".
> I went with that.

From the English POV, there should be only one "'s", the second one.
We could also make it less awkward (stacked possessives are best
avoided):

  ...at any of the location addresses of the breakpoint.

> >> +It is possible that a breakpoint corresponds to several locations in
> >> +your program.  @xref{Location Specifications}, for examples.
> > 
> > I would rephrase:
> > 
> >   It is possible that a breakpoint's location spec corresponds to
> >   several places in your program.
> 
> IMO, it just adds to the confusion.  The cindex (just above) is called "multiple locations".
> There is nothing wrong with saying "locations".  We have been saying "location" all
> these years.  Only the xref needs to change, which is what I was doing.

The main problem with "location" is that it is too general a notion,
and can mean many similar but different things.  As long as we use it
only in one sense, that is somewhat tolerable, but once we start using
it for more than one thing, and related things at that, it becomes a
source of confusion.

> >>  @value{GDBN} provides some additional commands for controlling what
> >> -happens when the @samp{break} command cannot resolve breakpoint
> >> -address specification to an address:
> >> +happens when the @samp{break} command cannot find any location that
> >> +matches the location spec (@pxref{Location Specifications}):
> > 
> > This should say "...cannot resolve the breakpoint's location spec to
> > an address".  IOW, the only problem in the original text was with
> > using "address specification", where we now want to use "location
> > specification" instead.
> 
> Yes, but it's not as correct.  If "break" didn't find any location (line number,
> function name, etc.) that matches whatever was specified in the location
> spec, then the breakpoint ends up with no breakpoint locations, and in
> that particular case, the breakpoint is called a pending breakpoint.
> 
> If GDB manages to create a breakpoint location for the breakpoint later, when
> new symbols are loaded, and _afterwards_ the code at that location goes away (due to
> shared library unload, for example), the breakpoint doesn't go back to being a pending
> breakpoint -- GDB will remember the location where the breakpoint location was set,
> with only the _address_ of the location being unresolved, not the breakpoint itself.

I understand, but I don't see how this invalidates my comment and the
rewording suggestion.  What is described in that text refers to
something done when defining the breakpoint, so what happens
afterwards (and is not described there) cannot affect the clarity of
the text or its interpretation by the reader, who at that point wants
only to understand what happens with these specifications.

> >> -@item clear @var{location}
> >> -Delete any breakpoints set at the specified @var{location}.
> >> -@xref{Specify Location}, for the various forms of @var{location}; the
> >> -most useful ones are listed below:
> >> +@item clear @var{locspec}
> >> +Delete any breakpoints set at the locations that match @var{locspec}.
> > 
> > "Delete any breakpoints set at addresses that match the location spec
> > @var{locspec}."
> 
> No, that is ambiguous, it kind of suggests that you can only pass
> address location specs here.

It does?  It explicitly says "addresses that match", so it doesn't imply
that addresses are passed.

> > "If either @var{first} or @var{last} match more than one source line
> > in the program, the @code{list} command will show the list of
> > ambiguous source lines, and will not print any source lines."
> 
> I like the first part about matching lines, but I think "show the list of ambiguous source lines"
> is worse, because it's ambiguous that way -- it ends up with "source lines" used twice to mean different
> things.  The first refers to the location in the program, the second refers to the contents
> of source code at the lines.  And, GDB prints more location coordinates than lines when ambiguous:
> 
>  file: "/home/pedro/gdb/binutils-gdb/src/gdb/gdb.c", line number: 25, symbol: "main(int, char**)"
>  file: "/home/pedro/gdb/binutils-gdb/src/gdb/unittests/basic_string_view/cons/char/1.cc", line number: 61, symbol: "selftests::string_view::cons_1::main()"
>  file: "/home/pedro/gdb/binutils-gdb/src/gdb/unittests/basic_string_view/cons/char/2.cc", line number: 40, symbol: "selftests::string_view::cons_2::main()"
>  ...

You are saying that the above are locations?  That's again different
from what we show in "info breakpoints", even under your latest patch.

> >> +A location spec serves as a blueprint, and it may match more than one
> >> +actual location in your program.  Examples of this situation are:
> >           ^^^^^^^^
> > "address".
> > 
> 
> We're defining a location spec here, so that would be an overcorrection.  There's nothing
> wrong with referring to "a location in the program".  It's even exposed to C++ users in
> the language itself: https://en.cppreference.com/w/cpp/utility/source_location
> 
> This should really say that specifications match actual locations.  The "spec"
> qualifier in "location spec" makes this unambiguous, and the point is really to
> distinguish the "spec" from the actual "thing".
> 
> It is no different from saying:
> 
>   "a cake specification serves as a blueprint, and it may match more than one
>    actual cake in the cake shop".
> 
> There is nothing ambiguous in this sentence using cakes.  And I am saying the
> exact same thing, but for locations.

This analogy doesn't really work.  "Cake" is a real concrete object:
you can eat it and report its taste and nutritional values; "cake
specification" (people actually use "recipe") is something entirely
different: it's a text recorded on some media.

By contrast, "location" is not a tangible object, it's an abstraction.
So its difference from "location specification" is more subtle, and
thus harder to grasp.  That makes confusion more likely.

The C++ URL you pointed to doesn't talk about "location" (which, as I
said above is too general, and thus problematic), it talks about
"source location", and clearly documents its attributes.  If you are
okay with using "source location" instead, I could go with it,
provided that:

  . we always use these two words, never just "location"
  . we consider "source location" as the result of fully resolving
    a "location specification", and describe it as such
  . we clearly document what a "source location" entails, i.e. what
    are its attributes
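
To make the last point concrete, here is a minimal sketch, in Python,
of what "a source location as the fully resolved result of a location
spec" could mean.  Everything in it is hypothetical: SourceLocation,
unqualified_name and resolve_locspec are made-up names, not GDB's
internal types, and the matching rule (compare the spec against the
unqualified function name only) is a deliberate simplification of what
GDB really does.  The attributes are the ones named in this thread
(file name, line, function, address), and the candidates come from the
"main" ambiguity listing quoted above, with paths shortened.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical record for a fully resolved "source location".

    Attributes are those named in this thread; this is NOT GDB's
    internal representation.
    """
    filename: str
    line: int
    function: str
    address: int  # 0 is a placeholder where the thread shows no address


def unqualified_name(function: str) -> str:
    """Strip the parameter list and any "ns::" qualifiers."""
    name = function.split("(", 1)[0]
    return name.rsplit("::", 1)[-1]


def resolve_locspec(spec: str, program: List[SourceLocation]) -> List[SourceLocation]:
    """Return every source location whose function matches SPEC.

    Simplified, hypothetical rule: only the unqualified function name is
    compared.  Real location specs have many more forms (FILE:LINE,
    *ADDRESS, explicit locations, ...).
    """
    return [loc for loc in program if unqualified_name(loc.function) == spec]


# Candidates from the ambiguity listing quoted earlier (paths shortened).
PROGRAM = [
    SourceLocation("gdb/gdb.c", 25, "main(int, char**)", 0xED06C),
    SourceLocation("gdb/unittests/basic_string_view/cons/char/1.cc", 61,
                   "selftests::string_view::cons_1::main()", 0),
    SourceLocation("gdb/unittests/basic_string_view/cons/char/2.cc", 40,
                   "selftests::string_view::cons_2::main()", 0),
]

if __name__ == "__main__":
    # One spec ("main"), several resolved source locations.
    for loc in resolve_locspec("main", PROGRAM):
        print(f"{loc.filename}:{loc.line} {loc.function} {loc.address:#x}")
```

The point of the sketch is only the shape of the relation: one spec in,
zero or more fully attributed source locations out, each of which
carries an address among its other attributes.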

> > "You can also inquire (using @code{*@var{addr}} as the form for
> > @var{locspec}) what source line covers a particular address
> > @var{addr}:"
> 
> AFAICS, you're suggesting to add "@var{addr}".

Yes.

> I don't think that would be correct without other changes.  Try reading the
> sentence without the parenthesis, it wouldn't make sense then:
> 
>  "You can also inquire what source line covers a particular address
>  @var{addr}:"
> 
> because "addr" is not referred to in the example that follows; it only refers
> to the addr in the parenthesized part.  So I think that if you want to
> add "addr" here, the sentence should be tweaked further.

If we don't reference "addr", the text does not explain clearly
enough what is alluded to as "particular address".

> >>  @smallexample
> >> - -exec-until [ @var{location} ]
> >> + -exec-until [ @var{locspec} ]
> >>  @end smallexample
> >>  
> >> -Executes the inferior until the @var{location} specified in the
> >> -argument is reached.  If there is no argument, the inferior executes
> >> -until a source line greater than the current one is reached.  The
> >> -reason for stopping in this case will be @samp{location-reached}.
> >> +Executes the inferior until a location that matches @var{locspec} is
> >> +reached.
> > 
> > "Executes the inferior until it reaches an address that matches
> > @var{locspec}."
> 
> I think that reads worse than before.  It was good to say "location" before
> my change, so it should still be good after.  Please let's not overcorrect here.

Why is it overcorrection?  This is about program execution, so talking
about addresses is very natural.

But if we can agree about the "source location" variant, maybe most or
all of the remaining disagreements will go away.

Thanks.