1890 – strerror() unnecessarily non thread-safe

Bug 1890 - strerror() unnecessarily non thread-safe

Summary: strerror() unnecessarily non thread-safe

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc (show other bugs)
Version:	2.3.6

Importance:	P2 normal
Target Milestone:	2.32
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (1):	19694 (view as bug list)
Depends on:
Blocks:

Reported:	2005-11-19 15:15 UTC by Stefan Puiu
Modified:	2023-12-13 09:39 UTC (History)
CC List:	5 users (show)

See Also:	30555
Host:
Target:
Build:
Last reconfirmed:	2020-01-17 00:00:00

Flags:	fweimer: security-

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Stefan Puiu 2005-11-19 15:15:17 UTC

Because of some confusing information in the strerror() manual page, I decided
to check the glibc-2.3.6 source and see for myself, and was dissapointed to see
how strerror_r() (sysdeps/generic/_strerror.c) and strerror()
(string/strerror.c) are implemented. The only reason strerror() is not thread
safe is because it insists keeping a global buffer around, which it mallocs when
needed and then fills it in when it gets called with an invalid errno, with a
string like "Unknown error: <errno>". I would say this is a useless
complication, buys you nothing (you can examine the value without strerror()
supplying it back to you in a buffer), and forces people to use the non-POSIX
strerror_r() (which returns a char*), which only uses the supplied buffer when
the aforementioned situation occurs, further confusing people. 

I suggest changing strerror so that it simply returns "Unknown error" when an
invalid errno is supplied, and deprecating strerror_r. I can provide a patch if
needed. 

Note: both Solaris and HP-UX have thread-safe strerror() functions, so this will
ease the work of people developing applications for multiple Unices.

Comment 1 Ulrich Drepper 2005-11-22 18:13:11 UTC

strerror_r is the POSIX function to use.  Everything else is incompatible in
multi-threaded environment.  Additionally, the extra info provided by strerror
for unknown errors is crucial in some situations, it is completely unacceptable
to return a generic string.  There will be no change.

Comment 2 Stefan Puiu 2005-11-23 07:35:51 UTC

Well, the glibc info file says: 

"This function `strerror_r' is a GNU extension"

So either the documentation is wrong, or the function isn't conforming to POSIX
- I don't have a copy of the standard to check. This document: 
http://www.opengroup.org/rtforum/uploads/40/7319/POSIX_and_Linux_Application_Compatibility_v0.92_released_22_April_05.pdf

says that POSIX strerror_r() returns an int, the GNU one returns a char*; OTOH,
as I said before, other platforms have thread-safe strerror() (Solaris 8 has it,
HP-UX 11i has it), and some (Solaris 8, for example) don't even have
strerror_r() *at all*. Thus, using strerror_r() breaks portability anyway. Not
to mention that if you google for "strerror_r on linux" you'll find posts by
users that were confused by the fact that the function didn't even use the
supplied buffer (check out
http://lists.gnu.org/archive/html/autoconf/2004-12/msg00079.html or
http://www.openldap.org/lists/openldap-bugs/200404/msg00191.html). In my
opinion, requiring a buffer argument that you *might* use in some weird
circumstances is bad design. 

The "extra info" you talk about can be provided by simply printing out the errno
value. If any applications rely on errnos outside the normal range, they should
do this anyway.

Comment 3 Ulrich Drepper 2005-11-23 08:31:09 UTC

Dammit, don't reopen bugs, especially if  you are clueless.

glibc provides two strerror_r definitions.  Just pick the stupid POSIX
definition if you must.

There will be no change.  Period.

Comment 4 Stefan Puiu 2005-11-23 09:15:46 UTC

Funny thing you insist keeping a broken design, breaking compatibility (check
out the autoconf wizardry required to be portable about strerror right now), all
that for some stupid "extra info", and still call *me* clueless. 

Have a nice day.

Comment 5 Stefan Puiu 2005-11-23 09:25:59 UTC

Oh, and about "picking the stupid definition", I specifically pointed you to a
post on the autoconf mailing list. Here's a quote: 


"You would be best served by using configure to learn how the default strerror_r
behaves and adapting your code to suit.

You don't want to force -D_XOPEN_SOURCE=600 on all systems because behavior when
the system does not support this level is undefined. In my experience, headers
on some systems fail miserably if you specify an _XOPEN_SOURCE value greater
than what they were designed to expect. Using -D_XOPEN_SOURCE=500 is reasonably
safe on most (but not all) systems.

Trying to force the headers to behave a particular way seems to be a lost cause.
After trying this approach for a number of months, I finally realized that
relying on default behavior worked best."

Comment 6 Florian Weimer 2016-05-17 18:23:06 UTC

The strerror_r mess with inconsistent definitions suggests we probably should bite the bullet and make strerror fully thread-safe instead (like other systems do).

Comment 7 Florian Weimer 2016-05-17 18:24:12 UTC

*** Bug 19694 has been marked as a duplicate of this bug. ***

Comment 8 Florian Weimer 2023-06-15 09:48:14 UTC

Fixed for 2.32 via:

commit 28aff047818eb1726394296d27b9c7885340bead
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu May 14 17:44:15 2020 -0300

    string: Implement strerror in terms of strerror_l
    
    If the thread is terminated then __libc_thread_freeres will free the
    storage via __glibc_tls_internal_free.
    
    It is only within the calling thread that this matters.  It makes
    strerror MT-safe.
    
    Checked on x86-64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
    and s390x-linux-gnu.
    
    Tested-by: Carlos O'Donell <carlos@redhat.com>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>

Comment 9 Bruno Haible 2023-11-01 01:59:48 UTC

Note that this return convention (returning a pointer to a per-thread buffer) has a pitfall:
If the returned pointer ever gets passed to a different thread, value corruption will occur, that is hard to detect and to debug. (Because while the second thread is storing, printing, or logging the value, the first thread may write different contents into it.)

Comment 10 Florian Weimer 2023-12-12 11:40:40 UTC

(In reply to Bruno Haible from comment #9)
> Note that this return convention (returning a pointer to a per-thread
> buffer) has a pitfall:
> If the returned pointer ever gets passed to a different thread, value
> corruption will occur, that is hard to detect and to debug. (Because while
> the second thread is storing, printing, or logging the value, the first
> thread may write different contents into it.)

This has been clarified in POSIX, and I believe C23. However, it only applies to the case where an unknown error code is used, so that's why I think a separate symbol version wasn't necessary.

Comment 11 Bruno Haible 2023-12-12 12:21:18 UTC

(In reply to Florian Weimer from comment #10)
> This has been clarified in POSIX, and I believe C23.

True, both POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror.html> and ISO C 23 § 7.26.6.3 contain wording that allows glibc's behaviour and should alert the programmer.

What I meant to state is that I would find it undesirable if glibc were to use this return convention (returning a pointer to a per-thread buffer) in more and more functions. Such value corruption cannot be detected by ASAN or valgrind (in the case of long-living threads); therefore the only possible help the programmer could get here is from static analysis tools.

> However, it only applies to the case where an unknown error code is used

Is a value corruption less severe because it appears less frequently? I would argue the opposite way: If it appears less frequently, there are less chances that it gets caught through a test suite and thus gets eliminated from an application.

Comment 12 Florian Weimer 2023-12-13 09:39:28 UTC

(In reply to Bruno Haible from comment #11)
> Is a value corruption less severe because it appears less frequently? I
> would argue the opposite way: If it appears less frequently, there are less
> chances that it gets caught through a test suite and thus gets eliminated
> from an application.

It's a philosophical question. It's also not something we can fix with symbol versions anymore because the release went out with an unversioned change.