This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

The semantics of strlcpy and strlcat

From: Zack Weinberg <zackw at panix dot com>
To: GNU C Library <libc-alpha at sourceware dot org>
Date: Thu, 21 Jan 2016 18:37:50 -0500
Subject: The semantics of strlcpy and strlcat

I've thought some more about my concerns with the existing patches
posted for strlcpy/strlcat (and the position taken by some of the
people arguing about how they should behave), and I've decided that
"bug-for-bug compatibility with OpenBSD" is the wrong way to express
what I want.

Abstractly, I have two concerns.  I will start with the more
philosophical one: I think all new functions added to the C library
should be implemented *and documented* to have as little undefined
behavior as possible.  I have come to believe that the large number of
cases triggering undefined behavior in the C standard, especially for
corner cases of library functions, is a Bad Thing.  In the case of
strlcpy and strlcat, that means I think it is *more important* for
them to have well-defined behavior in as many corner cases as
possible, than for them to guarantee nul-termination as a
postcondition.  Keep in mind that "undefined behavior" *does not* mean
"if the preconditions are not satisfied, the function never returns".
It means "the compiler may assume that the preconditions are always
satisfied, and make arbitrarily aggressive optimizations based on that
assumption."
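
As an illustration of that last point, here is a minimal sketch. The helper name `copy_checked` is hypothetical and not taken from any library. It assumes the C11 rule (7.24.1) that pointer arguments to the <string.h> functions must be valid even when the length is zero, so a compiler is permitted to infer that `dst` is non-null after the memcpy call and to drop the later check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Intended contract: return 0 on success, -1 if dst is a null pointer. */
int copy_checked(char *dst, const char *src, size_t n)
{
    /* Passing a null dst to memcpy is undefined behavior under C11,
       even when n == 0. */
    memcpy(dst, src, n);

    /* Because of the call above, an optimizing compiler may assume
       dst != NULL here and remove this test entirely.  The function
       still returns; it just no longer does what its author meant. */
    if (dst == NULL)
        return -1;
    return 0;
}
```

Whether a given compiler actually removes the check depends on its version and optimization flags; the point is only that it is allowed to.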

The more practical concern is that these functions are already
provided by some C libraries, and a number of applications are
carrying around their own copies.  If we do not implement semantics
that are closely compatible, then application authors will declare our
implementation broken and continue using their own copies, and our
work will be for nothing.  This is especially important since, if we
add these functions, we will be doing so *only* for compatibility's
sake -- I don't think I've seen anyone in the discussion here argue
that they actually *like* this API.  In this context, the most
important thing to compare with is *not* the OpenBSD original, but the
embedded copies already in use by applications.

I inspected five existing embedded copies of strlcpy

http://inn.eyrie.org/svn/trunk/lib/strlcpy.c
http://doxygen.postgresql.org/strlcpy_8c_source.html
http://www.opensource.apple.com/source/OpenSSH/OpenSSH-7.1/openssh/bsd-strlcpy.c
http://www.net-snmp.org/dev/agent/strlcpy_8c_source.html
https://code.google.com/p/honeyd/source/browse/trunk/honeyd/strlcpy.c

and four existing embedded copies of strlcat

http://inn.eyrie.org/svn/trunk/lib/strlcat.c
http://doxygen.postgresql.org/strlcat_8c.html
http://www.opensource.apple.com/source/OpenSSH/OpenSSH-7.1/openssh/bsd-strlcat.c
https://code.google.com/p/honeyd/source/browse/trunk/honeyd/strlcat.c

(net-snmp does not appear to contain strlcat)

There are basically two variations on each: either they use strlen and
memcpy (not memmove), or they use open-coded byte-by-byte copy loops.
This is how they behave:

* If the buffers overlap, the behavior is undefined.  That is, whether
memcpy or an overlap-blind copy loop is used, the "correct" behavior
(= "what memmove would have done") is not guaranteed to happen.

* If the destination-size argument is zero, the destination-buffer
pointer is not dereferenced. This is not by accident - all
implementations take special care to do this.

* Three out of four strlcat implementations have well-defined behavior
(they return `destsize+strlen(src)` and leave the destination buffer
unchanged) when the destination buffer does not contain a terminating
nul.  The other one blindly calls strlen() on the destination buffer,
and I very strongly suspect that this is an oversight.

Given this, I will accept undefined behavior in the case of
overlapping buffers, but I insist that the implemented *and
documented* semantics must be that the destination buffer pointer will
not be dereferenced by either function if the size argument is zero,
and that strlcat will return `destsize+strlen(src)` and leave the
destination buffer unchanged if it fails to find a nul byte within the
first destsize bytes of the destination buffer.
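
To make the requested semantics concrete, here is a rough sketch in the strlen-plus-memcpy style. The names `sketch_strlcpy` and `sketch_strlcat` are made up for this message, and this is not the code from any of the posted patches or from OpenBSD; it only shows one way the two guarantees could be met.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: copy src into dest, truncating to destsize - 1
   bytes.  Returns strlen(src).  dest is not touched if destsize == 0. */
size_t sketch_strlcpy(char *dest, const char *src, size_t destsize)
{
    size_t srclen = strlen(src);

    if (destsize != 0) {
        size_t n = srclen < destsize ? srclen : destsize - 1;
        memcpy(dest, src, n);       /* overlap remains undefined */
        dest[n] = '\0';
    }
    return srclen;
}

/* Illustrative only: append src to the string in dest.  If no nul is
   found in the first destsize bytes of dest, dest is left unchanged
   and destsize + strlen(src) is returned. */
size_t sketch_strlcat(char *dest, const char *src, size_t destsize)
{
    size_t srclen = strlen(src);

    /* Explicit guard: even memchr(dest, 0, 0) requires a valid
       pointer, so dest must not be passed anywhere in this case. */
    if (destsize == 0)
        return srclen;

    const char *nul = memchr(dest, '\0', destsize);
    if (nul == NULL)
        return destsize + srclen;   /* unterminated: no write at all */

    size_t destlen = (size_t)(nul - dest);
    size_t room = destsize - destlen - 1;
    size_t n = srclen < room ? srclen : room;
    memcpy(dest + destlen, src, n);
    dest[destlen + n] = '\0';
    return destlen + srclen;
}
```

In both functions the return value is at least destsize exactly when the result was truncated or the destination could not be used, so a caller can detect every corner case from the return value alone.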

Yes, this means the functions will not nul-terminate the destination
buffer in certain corner cases.  That should be clearly stated in the
documentation as a consequence of these semantics.

Note that "the destination buffer pointer is not dereferenced when the
size argument is zero" is *stronger* than "the destination buffer
pointer may be NULL when the size argument is zero".  It is the former
semantic that I am insisting on.
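
One case where the difference matters, sketched under assumptions: a caller appending at an offset into a buffer that is already full.  `zero_safe_strlcpy` and `append_at` are hypothetical names for this example.  When `used == cap`, the pointer `buf + used` is a valid one-past-the-end pointer: it is not NULL, yet it must not be dereferenced.  A guarantee phrased only in terms of NULL would not cover it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the stronger guarantee: dest is never
   read or written when destsize == 0, whatever its value. */
static size_t zero_safe_strlcpy(char *dest, const char *src, size_t destsize)
{
    size_t srclen = strlen(src);

    if (destsize != 0) {
        size_t n = srclen < destsize ? srclen : destsize - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return srclen;
}

/* Hypothetical caller: write src at offset `used` (used <= cap) in a
   buffer of `cap` bytes.  Returns the total length the result would
   need, so a value >= cap signals truncation. */
size_t append_at(char *buf, size_t cap, size_t used, const char *src)
{
    /* If used == cap, this passes a non-null, non-dereferenceable
       pointer together with a size of zero. */
    return used + zero_safe_strlcpy(buf + used, src, cap - used);
}
```

With the stronger semantic this call is well defined and simply reports the length needed; the caller does not have to special-case a full buffer.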

zw

Follow-Ups:
- Re: The semantics of strlcpy and strlcat
  - From: Paul Eggert
- Re: The semantics of strlcpy and strlcat
  - From: Russ Allbery
