This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] glob: Avoid copying the d_name field of struct dirent [BZ #19779]


On 03/31/2016 01:27 AM, Roland McGrath wrote:
>>> uint8_t is big enough for d_type, and will make the struct smaller on
>>> ILP32.
>>
>> I assume the struct is turned into scalars anyway, but I added the
>> __typeof__.
> 
> I like that fine, but we need to check if it's OK in gnulib.

I'm going to use unsigned char instead.

>>>> -		      len = NAMLEN (d);
>>>> -		      names->name[cur] = (char *) malloc (len + 1);
>>>> +		      names->name[cur] = strdup (e.d_name);
>>>>  		      if (names->name[cur] == NULL)
>>>>  			goto memory_error;
>>>> -		      *((char *) mempcpy (names->name[cur++], name, len))
>>>> -			= '\0';
>>>> +		      ++cur;
>>>
>>> In the _DIRENT_HAVE_D_NAMLEN case, this assumes that strdup is as efficient
>>> as malloc+mempcpy (with no strlen required).  Perhaps it's close enough,
>>> but that is a subtle change you didn't mention as intended.
>>
>> I want to avoid conditionalized code because these things tend to break
>> after a while (or never work in the first place).
> 
> That might be a sufficient reason to avoid writing such code in the first
> place.  But when you are changing something from how it already is, you
> need to be clear about what you are changing and why.  My point was not
> that I had no idea what your motivation for the specific change might have
> been.  My point was that you billed this whole larger change as doing one
> thing, and then slipped this in without comment.

Two more things:

For the non-glob64 case, I removed a copy of the name, which amounts to
one traversal less.  The internal strlen operation in strdup adds back a
string traversal.  So there is no net difference of the work done
(except that there are fewer writes).  CONVERT_DIRENT_DIRENT64 is used
in the glob64 code, so there is an unnecessary strlen operation there.

But the glibc tests for GLOB_ALTDIRFUNC do not set the d_namlen field.
Neither does the code in OpenSSH.  (GNU make sets d_namlen, though.)
Not looking at d_namlen avoids introducing application bugs.

d_ino/d_fileno has a similar problem, and even make doesn't set it. We
should probably ignore it altogether in GLOB_ALTDIRFUNC mode.

> In this case, we already have supported configurations of both cases.
> Hurd is _DIRENT_HAVE_D_NAMLEN (it uses the generic bits/dirent.h), while
> others are not.  So just the configurations that work today not being
> broken in the future (and being adequately tested) prevents "tending to
> break after a while".

I'm surprised there aren't any test failures on Hurd.  d_namlen is an
unsigned char, probably there is a sufficiently large value in it by
accident so that we end copying entire names, including the null
terminator, plus some extra bytes.

> I am making a big deal out of this little case, and it's not because I
> think this particular little performance-relevant change is so likely to
> be important or even that I personally necessarily object to it.  What I
> want to make a big deal about is the project's standards for attention
> to detail and for clarity and transparency about intent and effect of
> changes.  Sometimes "and I slipped this other little change in while I
> was at it" is fine, though the starting place is always that every
> separately-motivated change should be a separate change.

I can remove the use of d_namlen as a separate patch, before the copy
avoidance.  I did not realize it has implications on application
behavior at the time, for which I apologize.  But now, I still think
it's a desirable improvement.

> What's never
> fine is slipping such a change in without comment or discussion, which
> is what happened here.  Now we're discussing it rather than it going
> unnoticed, which is what matters most.  But I am also concerned by the
> way you responded to my review, which to me read as an off-hand remark
> about a personal preference as if that were an adequate response.  Such
> preferences are normal in the reasoning behind writing new code in a
> particular way, and in such cases will likely go unremarked.  But here
> you are changing existing code in a way that changes the compiled code
> and its performance characteristics, and the only motivation you cited
> was your preferences for how code should be all else being equal.  That
> completely ignores the history behind the code being how it is today,
> and hence ignores the project's key principle of conservatism.

Sorry, I don't follow.  I expressed a preference for a coding strategy
that reduces risk by aligning behavior across architectures, so that we
can reuse the results of testing on one set of architectures for others.

I think this approach is better than to check in code that wasn't tested
at all, or demand that every developer tests changes in generic code on
Hurd, x32, alpha, and other architectures which have interesting
differences not found anywhere else.

>> +/* Extract name and type from directory entry.  No copy of the name is
>> +   made.  If SOURCE is NULL, result name is NULL.  */
>> +static struct readdir_result
>> +convert_dirent (const struct dirent *source)
>> +{
>> +  if (source == NULL)
>> +    return (struct readdir_result) {};
>> +  else
>> +    return (struct readdir_result)
>> +      {
>> +	.name = source->d_name,
>> +	D_INO_TO_RESULT (source),
>> +	D_TYPE_TO_RESULT (source)
>> +      };
>> +}
>> +
>> +#ifndef COMPILE_GLOB64
>> +/* Like convert_dirent, but works on struct dirent64 instead.  */
>> +static struct readdir_result
>> +convert_dirent64 (const struct dirent64 *source)
>> +{
>> +  if (source == NULL)
>> +    return (struct readdir_result) {};
>> +  else
>> +    return (struct readdir_result)
>> +      {
>> +	.name = source->d_name,
>> +	D_INO_TO_RESULT (source),
>> +	D_TYPE_TO_RESULT (source)
>> +      };
>> +}
>> +#endif
> 
> While inlines are better than macros when all else is equal, this
> duplication of something that lends itself to a single-expression form
> (?:) is a good reason to make it a macro instead.

The call site relies on the function nature because the source argument
is evaluated multiple times.  I could change that by using a temporary
at the call site, but ...

> Also, compound literals are a GNU extension and I'm not sure if they
> meet gnulib's portability requirements or not.

... avoiding the compound literal requires writing a function.  I could
put the shared body into a macro, though.

Florian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]