This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Implement strlcat [BZ#178]
- From: Florian Weimer <fweimer at redhat dot com>
- To: Paul Eggert <eggert at cs dot ucla dot edu>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 4 Dec 2015 14:41:17 +0100
- Subject: Re: [PATCH] Implement strlcat [BZ#178]
- Authentication-results: sourceware.org; auth=none
- References: <56547472 dot 3010302 at redhat dot com> <5654B1FE dot 5020100 at cs dot ucla dot edu> <5654B796 dot 7070302 at redhat dot com> <5656E018 dot 5020608 at cs dot ucla dot edu> <565F211A dot 2030909 at redhat dot com> <56607CD1 dot 3050209 at cs dot ucla dot edu>
On 12/03/2015 06:33 PM, Paul Eggert wrote:
> Thanks for the review. Revised patchset attached. At this point I'm
> inclined to install the first two patches (which don't change semantics)
> to make it easier to review and maintain the third one (which adds
> strlcpy+strlcat), but I'll hold off a bit longer on that to get more
> feedback.
> Replying to your comments:
>> I don't understand the âand as for strings usually pointers â are usedâ
>> part.
>
> Changed to "A wide-string variable is usually declared to be a pointer
> of type @code{wchar_t *}, by analogy with string variables and
> @code{char *}." Hope this makes it clear.
Oh, I see now. A few commas in the original version would have helped,
but this much better.
>> Standard uses âwide stringâ by the way, not âwide character stringâ.
>
> C11 says "wide string", POSIX says "wide-character string". True, the
> more-concise form is better here, so I've changed it to that. Similarly,
> I changed "multibyte character string" (POSIX wording) to "multibyte
> string" (C Standard wording). I wish the standards could standardize the
> wording...
Oh, how unfortunate.
>> +literal. Strings can also be formed by @dfn{string concatenation}:
>> +@code{"a" "b"} is the
>>
>> The original had âstring literalâ. As this only works for string
>> literals, it's best to keep it.
>
> The original was incorrect; it said "string literals can also be formed
> by @dfn{string concatenation}" but string literals are the input to
> string concatenation, not the output from it. I changed the wording to
> "String literals can also contribute to @dfn{string concatenation}:...".
Okay, fine.
>> @@ -309,7 +318,8 @@ returns @var{maxlen}. Therefore this function is
>> equivalent to
>> @code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) :
>> @var{maxlen})}
>> but it
>> is more efficient and works even if the string @var{s} is not
>> -null-terminated.
>> +null-terminated so long as @var{maxlen} does not exceed the
>> +size of @var{s}'s array.
>>
>> This doesn't make much sense anymore because strings are defined to be
>> always null-terminated.
>
> Reworded to "If the array @var{s} of size @var{maxlen} contains a null
> byte, the @code{strnlen} function returns the length of the string
> @var{s} in bytes. Otherwise it returns @var{maxlen}. Therefore this
> function is equivalent to @code{(strlen (@var{s}) < @var{maxlen} ?
> strlen (@var{s}) : @var{maxlen})} but it is more efficient and works
> even if @var{s} is not null-terminated so long as @var{maxlen} does not
> exceed the size of @var{s}'s array."
Looks good.
I'm reading the second patch now.
This claim about _FORTIFY_SOURCE is misleading:
+Although these string-truncation functions have been used to fend off
+some buffer overruns, nowadays more-systematic techniques are often
+available, such as defining the @code{_FORTIFY_SOURCE} macro or using
+GCC's @option{-fsanitize=address} option. Ironically, use of these
+string-truncation functions can mask application bugs that would
+otherwise be caught by the more-systematic techniques.
Unfortunately, source fortification catches few buffer overflows in this
area. For example, this code:
char *
concat (const char *a, const char *b)
{
size_t alen = strlen (a);
size_t blen = strlen (b);
size_t len = alen + blen;
char *result = malloc (len + 1);
if (result != NULL)
{
strcpy (result, a);
strcat (result, b);
}
return result;
}
Compiles to (using GCC 5.1.1):
concat:
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $24, %esp
movl 44(%esp), %ebp
pushl %ebp
call strlen
popl %edx
pushl 48(%esp)
movl %eax, %ebx
call strlen
movl %eax, %esi
leal 1(%ebx,%eax), %eax
movl %eax, (%esp)
call malloc
addl $16, %esp
testl %eax, %eax
movl %eax, %edi
je .L2
subl $4, %esp
addl $1, %esi
pushl %ebx
pushl %ebp
addl %edi, %ebx
pushl %eax
call memcpy
addl $12, %esp
pushl %esi
pushl 44(%esp)
pushl %ebx
call memcpy
addl $16, %esp
.L2:
addl $12, %esp
movl %edi, %eax
popl %ebx
popl %esi
popl %edi
popl %ebp
ret
strcpy and strcat have been replaced by memcpy, but this only eliminates
only one source of buffer overflows here.
In my experience, _FORTIFY_SOURCE mostly covers the
write-to-statically-sized-buffer case. The manual should not make
promises the current GCC/glibc combinations cannot deliver.
_FORTIFY_SOURCE will always be brittle for the more complex cases
because of the dependency on GCC optimization behavior.
It's also highly application-specific whether a crash (induced by
_FORTIFY_SOURCE) or truncation (from strncpy or strlcpy) is better.
Florian