This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens
- From: Keith Seitz <keiths at redhat dot com>
- To: Pedro Alves <palves at redhat dot com>, gdb-patches at sourceware dot org
- Date: Wed, 9 Aug 2017 08:48:09 -0700
- Subject: Re: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens
- Authentication-results: sourceware.org; auth=none
- Authentication-results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
- Authentication-results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=keiths at redhat dot com
- Dmarc-filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8B597357E44
- References: <1496406158-12663-1-git-send-email-palves@redhat.com> <1496406158-12663-35-git-send-email-palves@redhat.com>
On 06/02/2017 05:22 AM, Pedro Alves wrote:
> currently "b func tion" manages to set a breakpoint at "function" !
>
> All this years I had never noticed this, but now that the linespec
> completer actually works, this easily happens by accident, with:
That makes two of us!
> The operator_stoken changes are necessary due to a latent bug --
> currently "operator char" becomes "operatorchar", and later look ups
> only find it because strcmp_iw ignores the whitespace...
I have a similar fix on the compile branch. :-)
> gdb/ChangeLog:
> yyyy-mm-dd Pedro Alves <palves@redhat.com>
>
> * c-exp.y (oper): Add space to operator names.
> * cp-support.c (cp_symbol_name_matches_1)
> (cp_fq_symbol_name_matches): Pass language to
> strncmp_iw_with_mode.
> (test_cp_symbol_name_cmp): Add unit tests.
> * language.c (default_symbol_name_matcher): Pass language to
> strncmp_iw_with_mode.
> * utils.c: Include "cp-support.h" and <algorithm>.
> (valid_identifier_name_char, cp_skip_operator_token, skip_ws)
> (cp_is_operator): New functions.
> (strncmp_iw_with_mode): Use them. Add language parameter. Don't
> skip whitespace in the symbol name when the lookup name doesn't
> have spaces, and vice versa.
> (strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
^^^^^^^^
Not just language, but language_minimal.
> * utils.h (strncmp_iw_with_mode): Add language parameter.
> diff --git a/gdb/c-exp.y b/gdb/c-exp.y
> index 24a2fbd..0a182cc 100644
> --- a/gdb/c-exp.y
> +++ b/gdb/c-exp.y
> @@ -1487,7 +1487,7 @@ oper: OPERATOR NEW
> | OPERATOR '>'
> { $$ = operator_stoken (">"); }
> | OPERATOR ASSIGN_MODIFY
> - { const char *op = "unknown";
> + { const char *op = " unknown";
Good catch. I missed that.
> switch ($2)
> {
> case BINOP_RSH:
> @@ -1563,7 +1563,8 @@ oper: OPERATOR NEW
>
> c_print_type ($2, NULL, &buf, -1, 0,
> &type_print_raw_options);
> - $$ = operator_stoken (buf.c_str ());
> + std::string name = " " + buf.string ();
> + $$ = operator_stoken (name.c_str ());
> }
> ;
>
The only additional change that I have in my compile branch is
that since this type's name could come from a user, it needs to be canonicalized.
But I can hit that when(ever?!) I start submitting some of the precursor
patches that I have. [I have a test that demonstrates the need for
canonicalization in the c++compile branch.]
> diff --git a/gdb/cp-support.c b/gdb/cp-support.c
> index 84d8a6b..4c353c5 100644
> --- a/gdb/cp-support.c
> +++ b/gdb/cp-support.c
> @@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp ()
> CHECK_MATCH_C ("function(int)", "function(int)");
[snip a WHOLE LOTTA TESTS]
AWESOME!
> diff --git a/gdb/utils.c b/gdb/utils.c
> index 9798edc..484c1ef 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -65,6 +65,8 @@
> #include "gdb_usleep.h"
> #include "interps.h"
> #include "gdb_regex.h"
> +#include "cp-support.h"
> +#include <algorithm>
>
> #if !HAVE_DECL_MALLOC
> extern PTR malloc (); /* ARI: PTR */
> @@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
> }
> }
>
> +/* True if CH is a character that can be part of a symbol name. I.e.,
> + either a number, a letter, or a '_'. */
> +
> +static bool
> +valid_identifier_name_char (int ch)
> +{
> + return (isalnum (ch) || ch == '_');
> +}
Couldn't this be language-dependent? [Yikes!]
Also note that there are a handful of places where this could be used
[follow-up patch?] in linespec.c, location.c, symtab.c. Maybe more.
We have logic for c++ operators all over the place. Note to self: this needs
to be cleaned up/consolidated.
> +
> +/* Skip to end of token, or to END, whatever comes first. */
> +
I think a(n explicit) mention that the input is assumed to be an operator name. It's
mentioned in the name, but please consider repeating that in the comment. It's important.
> +static const char *
> +cp_skip_operator_token (const char *token, const char *end)
> +{
> + const char *p = token;
> + while (p != end && !isspace (*p) && *p != '(')
> + {
> + if (valid_identifier_name_char (*p))
> + {
> + while (p != end && valid_identifier_name_char (*p))
> + p++;
> + return p;
> + }
> + else
> + {
> + /* Note, ordered such that among ops that share a prefix,
> + longer comes first. This is so that the loop below can
> + bail on first match. */
> + static const char *ops[] =
> + {
> + "[",
> + "]",
> + "~",
> + ",",
> + "-=", "--", "->", "-",
> + "+=", "++", "+",
> + "*=", "*",
> + "/=", "/",
> + "%=", "%",
> + "|=", "||", "|",
> + "&=", "&&", "&",
> + "^=", "^",
> + "!=", "!",
> + "<<=", "<=", "<<", "<",
> + ">>=", ">=", ">>", ">",
> + "==", "=",
> + };
> +
Man, I'd *swear* that we have this code repeated in numerous places, but
we actually don't. [Not that I looked that hard...]
> + for (const char *op : ops)
> + {
> + size_t oplen = strlen (op);
> + size_t lencmp = std::min<size_t> (oplen, end - p);
> +
> + if (strncmp (p, op, lencmp) == 0)
> + return p + lencmp;
> + }
> + /* Some unidentified character. Return it. */
> + return p + 1;
> + }
> + }
> +
> + return p;
> +}
> +
> +/* Advance string1/string2 past whitespace. */
> +
> +static void
> +skip_ws (const char *&string1, const char *&string2, const char *end_str2)
> +{
> + while (isspace (*string1))
> + string1++;
> + while (string2 < end_str2 && isspace (*string2))
> + string2++;
> +}
> +
> +static bool
> +cp_is_operator (const char *string, const char *start)
Missing comment?
> +{
> + return ((string == start
> + || !valid_identifier_name_char (string[-1]))
> + && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0
> + && !valid_identifier_name_char (string[CP_OPERATOR_LEN]));
> +}
> +
> /* See utils.h. */
>
> int
> strncmp_iw_with_mode (const char *string1, const char *string2,
> - size_t string2_len, strncmp_iw_mode mode)
> + size_t string2_len, strncmp_iw_mode mode,
> + enum language language)
> {
> + const char *string1_start = string1;
> const char *end_str2 = string2 + string2_len;
> + bool skip_spaces = true;
> + bool have_colon_op = (language == language_cplus
> + || language == language_rust
> + || language == language_fortran);
>
Just a passing comment: I'm kinda torn on this. When new languages are added,
this is going to be yet another place that language implementers are going to
have to modify. While a language method would probably be better (for some
definition of "better"), I don't want to see the language vector bloat beyond
control either. So IMO there's no clear better path.
> while (1)
> {
> - while (isspace (*string1))
> - string1++;
> - while (string2 < end_str2 && isspace (*string2))
> - string2++;
> + if (skip_spaces
> + || ((isspace (*string1) && !valid_identifier_name_char (*string2))
> + || (isspace (*string2) && !valid_identifier_name_char (*string1))))
> + {
> + skip_ws (string1, string2, end_str2);
> + skip_spaces = false;
> + }
> +
> if (*string1 == '\0' || string2 == end_str2)
> break;
> +
> + /* Handle the :: operator. */
> + if (have_colon_op && string1[0] == ':' && string1[1] == ':')
At least this part is language-agnostic. That's a plus.
> + {
> + if (*string2 != ':')
> + return 1;
> +
> + string1++;
> + string2++;
> +
> + if (string2 == end_str2)
> + break;
> +
> + if (*string2 != ':')
> + return 1;
> +
> + string1++;
> + string2++;
> +
> + while (isspace (*string1))
> + string1++;
> + while (string2 < end_str2 && isspace (*string2))
> + string2++;
> + continue;
> + }
> +
> + /* Handle C++ user-defined operators. */
> + else if (language == language_cplus
> + && *string1 == 'o')
> + {
> + if (cp_is_operator (string1, string1_start))
> + {
> + /* An operator name in STRING1. Check STRING2. */
> + size_t cmplen = std::min<size_t> (CP_OPERATOR_LEN, end_str2 - string2);
line length == 85
> + if (strncmp (string1, string2, cmplen) != 0)
> + return 1;
[snip]
> @@ -2462,7 +2675,7 @@ int
> strncmp_iw (const char *string1, const char *string2, size_t string2_len)
> {
> return strncmp_iw_with_mode (string1, string2, string2_len,
> - strncmp_iw_mode::NORMAL);
> + strncmp_iw_mode::NORMAL, language_minimal);
> }
>
> /* See utils.h. */
> @@ -2471,7 +2684,7 @@ int
> strcmp_iw (const char *string1, const char *string2)
> {
> return strncmp_iw_with_mode (string1, string2, strlen (string2),
> - strncmp_iw_mode::MATCH_PARAMS);
> + strncmp_iw_mode::MATCH_PARAMS, language_minimal);
> }
I think the comments for both of these functions should be updated, since
they pass language_minimal to strncmp_iw_with_mode. Therefore,
strncmp_iw_with_mode (string1, string2, len, MATCH_PARAMS, a_language)
may not necessarily equal
strncmp_iw (string1, string2)
That may not be obvious to the casual user. Some sort of caveat seems prudent.
> /* This is like strcmp except that it ignores whitespace and treats
> diff --git a/gdb/utils.h b/gdb/utils.h
> index 9e531e0..4ce263e 100644
> --- a/gdb/utils.h
> +++ b/gdb/utils.h
> @@ -56,7 +56,8 @@ enum class strncmp_iw_mode
> extern int strncmp_iw_with_mode (const char *string1,
> const char *string2,
> size_t string2_len,
> - strncmp_iw_mode mode);
> + strncmp_iw_mode mode,
> + enum language language);
While most of these parameters are rather obvious usage, it is not obvious to
me why a strncmp-like function needs a language definition. [Of course,
I understand why after reading the code, but a brief mention of how LANGUAGE
affects the operation might be useful IMO. YMMV.]
>
> /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any
> differences in whitespace. STRING2_LEN is STRING2's length.
>
Keith