This is the mail archive of the
archer@sourceware.org
mailing list for the Archer project.
Stop the Insanity! Linespec Rewrite
- From: Keith Seitz <keiths at redhat dot com>
- To: archer at sourceware dot org
- Date: Thu, 01 Mar 2012 17:13:37 -0800
- Subject: Stop the Insanity! Linespec Rewrite
Hi,
I think that most of us know the pain that is linespec.c. Well, Tom gave
me the green light on this a while back, and I played with an initial
design/implementation in September.
Earlier this month, I dug this out of mothballs, and it is now time to
submit the work-in-progress for comment. This is *not* a finished design
and/or implementation! It is simply a place to stop along the road and
ask for advice, for a fresh set of eyes. Consider this akin to an RFC.
Let me recap what some of the more important requirements for this
rewrite were:
- More robust parser. As several of us can attest to, adding just about
anything to linespec.c is PAINFUL. Really painful. We need a more robust
linespec-to-sal infrastructure.
- Consolidate similar functionality. There are several functions which
pretty much do the same thing (e.g., decode_variable vs decode_compound
vs the remaining bits of decode_internal).
- Rain on the quoting parade. This is actually pretty tightly coupled to
the first requirement, but it is such a nightmare as it is now, that
it's just worth calling out separately. I doubt I need say more.
- Enable a way for "explicit" linespecs (as I call them, for lack of a
better term). I would like to add the ability to do, e.g., "break
-sourcefile foo.c -function baz::doit(char*) -offset 3" and the like.
Probably most useful to MI (and python), but not altogether useless to
CLI users.
- Increase the maintainability and reduce the fragility of this code. If
you haven't had the pleasure of hacking on linespecs, consider yourself
lucky. You've never lost sleep over the prospect of digging into this code.
I'm sure there are some other requirements/wish-list items which I've
forgotten. These are just basically mine.
If you'll pardon my rambling, here are some notes about what the
design/implementation does/does not do:
o We now have a (trivial) parser and a lexer. The lexer "word" breaks
the input on ':', but it does know about "::" as a scope operator for
C++ (but nothing on ObjC). The whole thing attempts to integrate all
languages into one design/implementation. You will not see any
references to current_language anywhere, except one tiny place where
canonicalization is done, and that's cut-n-paste from current sources.
o Speaking of canonicalization: that is now done in only one place in
the parser.
o Quoting is greatly minimized. Got spaces in a filename? Doesn't matter
anymore. The only thing that matters is ':'. Everything else is simply
text. When a quote character (either " or ') is seen, everything is
skipped to the next quote character. [No, you cannot mix quote
characters in the same lexeme, e.g., "foo'bar'" will return the lexeme
foo'bar'.] If your filename is "main::foo.cc", then you must quote that:
"break 'main::foo.cc':3".
o As previously, Objective C is short-circuited. I have not changed that
at all. Selector names would need quoting to avoid being lexed
incorrectly. I haven't pursued this at all (and probably won't).
o All current functionality is maintained AFAICT. I did not add any new
functionality that didn't just drop out from doing it this way. [One
example: you can now do "break myclass::mymethod::a_label". It "just
works."]
o decode_* are almost all gone. All symbol lookups are essentially done
by one function (although there is one helper function for it). [Notable
exceptions: decode_dollar and decode_objc.]
o Canonicalization (of the linespec) is now pretty trivial and isolated
to one function. It is no longer scattered all over the place.
o Error reporting could be greatly expanded. The version I have
committed simply maintains the status quo.
o Some linespecs don't work anymore. IMO, these existed because of the
nightmare of the existing code. The only group of linespecs that won't
work involve "goofy" quoting. For example, you cannot do: "break
klass::'operator +'" anymore. Nor can you do "break
'foo.c:static_function'". IMO a small price to pay for sanity.
o I have not written tests for this yet.
o I have not checked memory allocation or other such common problems. My
goal was to get something that works.
o There is probably some dead code that can/could now be removed.
[cp_validate_operator comes to mind] I have not even begun to look for this.
o Probably lots of cleanup to do. Probably goofed/thinko'd a few things,
too, especially in the area of ambiguous linespecs. This is part of the
reason for exposing this less-than-beta quality code to other developers
and maintainers. I still have comments to myself all over the place
(comments with "!!" and #if WHATS_THIS_FOR). Who needs a stinkin'
notebook!? I take notes in the code.
Ok, so enough of that. You probably are all anxious to see what I've
actually done...
If you will permit me a few more paragraphs of your time, please allow
me very briefly describe the underlying idea. Basically, I've split
linespec decoding into several main areas: symtabs, symbols, labels,
line offsets, SALs, and canonicalization The first four are pretty
obvious. We have to find those things. The last two are a rather large
departure from the current code. [I won't go so far as to call it a design.]
Previously we constructed SALs as we found symbols. At the same time, we
also constructed canonical linespec representations. We now take in a
list of parameters and convert them to SALs in one swoop. Then we
compute any necessary canonicalizations.
The main routine dealing with SALs is convert_linespec_to_sals. This
converts a structure into a list of SALs, which is then returned to the
caller. Canonicalization is done by canonicalize_linespec.
The structure which must be "filled-in" for convert_linespec_to_sals is
"typedef struct linespec *linespec_t". Start with that. Right now, the
only way to fill-in this structure is via the parser, but another
function for "explicit" linespecs could fill it in, too, by using some
of the new functions that I've introduced.
Ok, so enough chatter. You want to see the actual code.
Here you go: Archer branch "archer-keiths-linespec-rewrite" [duh!]
Please share your comments, concerns, and suggestions. Please help me
stop the [linespec] insanity!
Keith