This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Implement strlcat [BZ#178]

From: Florian Weimer <fweimer at redhat dot com>
To: Paul Eggert <eggert at cs dot ucla dot edu>
Cc: libc-alpha at sourceware dot org
Date: Fri, 4 Dec 2015 14:41:17 +0100
Subject: Re: [PATCH] Implement strlcat [BZ#178]
Authentication-results: sourceware.org; auth=none
References: <56547472 dot 3010302 at redhat dot com> <5654B1FE dot 5020100 at cs dot ucla dot edu> <5654B796 dot 7070302 at redhat dot com> <5656E018 dot 5020608 at cs dot ucla dot edu> <565F211A dot 2030909 at redhat dot com> <56607CD1 dot 3050209 at cs dot ucla dot edu>

On 12/03/2015 06:33 PM, Paul Eggert wrote:
> Thanks for the review.  Revised patchset attached.  At this point I'm
> inclined to install the first two patches (which don't change semantics)
> to make it easier to review and maintain the third one (which adds
> strlcpy+strlcat), but I'll hold off a bit longer on that to get more
> feedback.

> Replying to your comments:
>> I don't understand the âand as for strings usually pointers â are usedâ
>> part.
> 
> Changed to "A wide-string variable is usually declared to be a pointer
> of type @code{wchar_t *}, by analogy with string variables and
> @code{char *}." Hope this makes it clear.

Oh, I see now.  A few commas in the original version would have helped,
but this much better.

>> Standard uses âwide stringâ by the way, not âwide character stringâ.
> 
> C11 says "wide string", POSIX says "wide-character string". True, the
> more-concise form is better here, so I've changed it to that. Similarly,
> I changed "multibyte character string" (POSIX wording) to "multibyte
> string" (C Standard wording). I wish the standards could standardize the
> wording...

Oh, how unfortunate.

>> +literal.  Strings can also be formed by @dfn{string concatenation}:
>> +@code{"a" "b"} is the
>>
>> The original had âstring literalâ.  As this only works for string
>> literals, it's best to keep it.
> 
> The original was incorrect; it said "string literals can also be formed
> by @dfn{string concatenation}" but string literals are the input to
> string concatenation, not the output from it. I changed the wording to
> "String literals can also contribute to @dfn{string concatenation}:...".

Okay, fine.

>> @@ -309,7 +318,8 @@ returns @var{maxlen}.  Therefore this function is
>> equivalent to
>>   @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) :
>> @var{maxlen})}
>>   but it
>>   is more efficient and works even if the string @var{s} is not
>> -null-terminated.
>> +null-terminated so long as @var{maxlen} does not exceed the
>> +size of @var{s}'s array.
>>
>> This doesn't make much sense anymore because strings are defined to be
>> always null-terminated.
> 
> Reworded to "If the array @var{s} of size @var{maxlen} contains a null
> byte, the @code{strnlen} function returns the length of the string
> @var{s} in bytes.  Otherwise it returns @var{maxlen}. Therefore this
> function is equivalent to @code{(strlen (@var{s}) < @var{maxlen} ?
> strlen (@var{s}) : @var{maxlen})} but it is more efficient and works
> even if @var{s} is not null-terminated so long as @var{maxlen} does not
> exceed the size of @var{s}'s array."

Looks good.

I'm reading the second patch now.

This claim about _FORTIFY_SOURCE is misleading:

+Although these string-truncation functions have been used to fend off
+some buffer overruns, nowadays more-systematic techniques are often
+available, such as defining the @code{_FORTIFY_SOURCE} macro or using
+GCC's @option{-fsanitize=address} option.  Ironically, use of these
+string-truncation functions can mask application bugs that would
+otherwise be caught by the more-systematic techniques.

Unfortunately, source fortification catches few buffer overflows in this
area.  For example, this code:

char *
concat (const char *a, const char *b)
{
  size_t alen = strlen (a);
  size_t blen = strlen (b);
  size_t len = alen + blen;
  char *result = malloc (len + 1);
  if (result != NULL)
    {
      strcpy (result, a);
      strcat (result, b);
    }
  return result;
}

Compiles to (using GCC 5.1.1):

concat:
	pushl	%ebp
	pushl	%edi
	pushl	%esi
	pushl	%ebx
	subl	$24, %esp
	movl	44(%esp), %ebp
	pushl	%ebp
	call	strlen
	popl	%edx
	pushl	48(%esp)
	movl	%eax, %ebx
	call	strlen
	movl	%eax, %esi
	leal	1(%ebx,%eax), %eax
	movl	%eax, (%esp)
	call	malloc
	addl	$16, %esp
	testl	%eax, %eax
	movl	%eax, %edi
	je	.L2
	subl	$4, %esp
	addl	$1, %esi
	pushl	%ebx
	pushl	%ebp
	addl	%edi, %ebx
	pushl	%eax
	call	memcpy
	addl	$12, %esp
	pushl	%esi
	pushl	44(%esp)
	pushl	%ebx
	call	memcpy
	addl	$16, %esp
.L2:
	addl	$12, %esp
	movl	%edi, %eax
	popl	%ebx
	popl	%esi
	popl	%edi
	popl	%ebp
	ret

strcpy and strcat have been replaced by memcpy, but this only eliminates
only one source of buffer overflows here.

In my experience, _FORTIFY_SOURCE mostly covers the
write-to-statically-sized-buffer case.  The manual should not make
promises the current GCC/glibc combinations cannot deliver.
_FORTIFY_SOURCE will always be brittle for the more complex cases
because of the dependency on GCC optimization behavior.

It's also highly application-specific whether a crash (induced by
_FORTIFY_SOURCE) or truncation (from strncpy or strlcpy) is better.

Florian

Follow-Ups:
- Re: [PATCH] Implement strlcat [BZ#178]
  - From: Paul Eggert

References:
- Re: [PATCH] Implement strlcat [BZ#178]
  - From: Florian Weimer
- Re: [PATCH] Implement strlcat [BZ#178]
  - From: Paul Eggert

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]