This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v2] Rewrite iconv option parsing [BZ #19519]

From: Arjun Shankar <arjun dot is at lostca dot se>
To: Siddhesh Poyarekar <siddhesh at gotplt dot org>
Cc: libc-alpha at sourceware dot org, Florian Weimer <fweimer at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>
Date: Mon, 13 Jan 2020 16:12:51 +0000
Subject: Re: [PATCH v2] Rewrite iconv option parsing [BZ #19519]
References: <20191220164437.GA95222@aloka.lostca.se> <694f24af-182e-375c-11b3-b2a36d48b052@gotplt.org>

Hi Siddhesh,

> > --- /dev/null
> > +++ b/iconv/gconv_charset.c
> > @@ -0,0 +1,202 @@
> > +/* Charset name normalization.
> > +   Copyright (C) 2019 Free Software Foundation, Inc.
> 
> 2020, for all other new files too.
> 

Fixed!

> > +find_suffix (char *s)

> > +  for (int i = 0; s[i] != '\0'; i++)
> > +    if (s[i] == '/')
> > +      slash_count++;
> > +
> > +  if (slash_count < 2)
> > +    return NULL;
> > +
> > +  char *last_slash = strrchr (s, '/');
> > +  char *last_comma = strrchr (s, ',');
> 
> You could do this in the parent loop and avoid the strrchr calls.

Yes. I've done this now.

> > +  /* Which of the two comes later?  */
> > +  if ((last_slash != NULL)
> > +      && (last_comma != NULL))
> > +    return (last_slash > last_comma) ? last_slash : last_comma;
> > +
> > +  return (last_slash != NULL) ? last_slash : last_comma;
> 
> Why would last_slash ever be NULL?  You already exit if number of
> slashes are less than 2.

Right. The check was unnecessary.
 
> In fact, all of the above could be squished up into the loop like so
> (untested, so you may need to clean it up):
> 
>   char *suffix_term = NULL;
> 
>   for (int i = 0; s[i] != '\0'; i++)
>     switch (s[i])
>       {
>       case '/':
>         slash_count++;
>         /* Fallthrough */
>       case ',':
>         suffix_term = &s[i];
>       }
> 
>   if (slash_count >= 2)
>     return suffix_term;
> 
>   return NULL;

Yes. This replacement looks right and is more readable. I've incorporated it.

> > +/* This function parses an iconv_open encoding (tocode or fromcode).  It strips
> > +   any suffixes (such as TRANSLIT or IGNORE) from the code, and if it passed a
> > +   non-NULL conv_spec, it turns on the corresponding flag(s) in it.
> > +   The function returns 0 on success and non-zero on allocation failure.  */
> > +static int
> > +gconv_parse_code (struct gconv_parsed_code *pc, const char *code)
> > +{
> > +  pc->code = __strdup (code);
> > +  if (pc->code == NULL)
> > +    return 1;

Based on comments you made further down, I've stop passing `const char *code'
to this function, and instead let it work directly on a pre-allocated string
in `pc'; and I've removed the strdup. I've also changed the return type to void
and the function description to match all this.

> > +  while (1)
> > +    {
> > +      /* First drop any trailing whitespaces and separators.  */
> > +      size_t len = strlen (pc->code);
> > +      while ((len > 0)
> > +             && (isspace (pc->code[len - 1])
> > +                 || pc->code[len - 1] == ','
> > +                 || pc->code[len - 1] == '/'))
> > +        {
> > +          pc->code[len - 1] = '\0';
> > +          len--;
> > +        }
> 
> You don't need to update the end every single time, just decrement len
> in the loop and at the end set code[len-1] to '\0' provided len remains
> greater than 0.  It may also be clearer to exit right here if the code
> ended up being all whitespaces and separators instead of doing so after
> calling find_suffix below.

Right. One difference: I now unconditionally set code[len] to '\0' at the end
since `len' will already be decremented in the loop, and never be negative.

> > +          /* The SUFFIX is an index into the CODE character array and
> > +             points to one past the end of the code and any unprocessed
> > +             suffixes and to the beginning of the suffix being processed
> > +             during this iteration.  We must process SUFFIX and then
> > +             terminate the preceding text so that it isn't encountered
> 
> Suggest:
> 
> "drop it from CODE by terminating the preceding text with NULL."

Yes. I've changed this.

> > +struct gconv_spec *
> > +__gconv_create_spec (struct gconv_spec *conv_spec, const char *fromcode,
> > +                   const char *tocode)
> > +{
> > +  struct gconv_parsed_code pfc, ptc;
> > +
> > +  int ret = gconv_parse_code (&pfc, fromcode);
> > +  if (ret != 0)
> > +    return NULL;
> > +
> > +  ret = gconv_parse_code (&ptc, tocode);
> > +  if (ret != 0)
> > +    {
> > +      free (pfc.code);
> > +      return NULL;
> > +    }
> 
> The frees look a bit suspicious; why isn't pfc.code freed in the first
> one, and ptc.code in the second one?  Also, it might be easier to read
> if you do the strdup here instead of inside gconv_parse_code.  It may
> also avoid the struct altogether since you can just pass &ptc_translit,
> &ptc_ignore, etc. instead of making the struct.  In fact AFAICT, you
> don't even need pfc_* since you're not setting it anywhere.

The code is right, because in the version of the patch you've seen,
gconv_parse_code would only fail upon allocation failure, thus the
.code corresponding to a failed call wouldn't need to be freed.

That being said, this block is not an easy read. I like your version, and
I've worked it into the next iteration of the patch with minor changes.

> So overall I'd write the function something like this (untested again!):
> 
>   struct gconv_spec ret = NULL;
>   bool translit, ignore;
> 
>   char *pfc_code = strdup (fromcode);
>   if (pfc_code == NULL)
>     goto out;
> 
>   char *ptc_code = strdup (tocode);
>   if (ptc_code == NULL)
>     goto out;

I decided to go for two consequtive strdups followed by a conditional goto.
This avoids the free in goto from reading an uninitialized ptc/pfc.code.

>   /* Adjust GCONV_PARSE_CODE to set translit and ignore flags only if
>      the pointers are non-NULL.  */
>   if (gconv_parse_code (&pfc, pfc_code, NULL, NULL) != 0)
>     goto out;
> 
>   if (gconv_parse_code (&pfc, pfc_code, &translit, &ignore) != 0)
>     goto out;

gconv_parse_code now always succeeds. So this got changed a bit.

>   conv_spec->fromcode = malloc (strlen (fromcode) + 3);
>   if (conv_spec->fromcode == NULL)
>     goto out;
> 
>   conv_spec->tocode = malloc (strlen (tocode) + 3);
>   if (conv_spec->tocode == NULL)
>     {
>       free (conv_spec->fromcode);
>       conv_spec->fromcode = NULL;
>       goto out;
>     }
> 
>   conv_spec->translit = ptc.translit;
>   conv_spec->ignore = ptc.ignore;
> 
>   /* Strip unrecognized characters and ensure that the code has two '/'
>      characters as per conversion code triplet specification.  */
>   strip (conv_spec->fromcode, pfc_code);
>   strip (conv_spec->tocode, ptc_code);
>   ret = conv_spec;
> 
> out:
>   free (pfc_code);
>   free (ptc_code);
> 
>   return ret;

Looks good to me. I've incorporated this.

I'm about to post the next iteration of the patch but wanted the explain the
differences inline here first.

Thanks for the review!

References:
- Re: [PATCH v2] Rewrite iconv option parsing [BZ #19519]
  - From: Siddhesh Poyarekar

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]