Because of some confusing information in the strerror() manual page, I decided to check the glibc-2.3.6 source and see for myself, and was disappointed to see how strerror_r() (sysdeps/generic/_strerror.c) and strerror() (string/strerror.c) are implemented. The only reason strerror() is not thread-safe is that it insists on keeping a global buffer around, which it mallocs when needed and then fills in, when called with an invalid errno, with a string like "Unknown error: <errno>". I would say this is a useless complication that buys you nothing (you can examine the value without strerror() supplying it back to you in a buffer) and forces people to use the non-POSIX strerror_r() (which returns a char*). That function only uses the supplied buffer when the aforementioned situation occurs, further confusing people. I suggest changing strerror() so that it simply returns "Unknown error" when an invalid errno is supplied, and deprecating strerror_r(). I can provide a patch if needed. Note: both Solaris and HP-UX have thread-safe strerror() functions, so this will ease the work of people developing applications for multiple Unices.
strerror_r is the POSIX function to use. Everything else is incompatible in a multi-threaded environment. Additionally, the extra info provided by strerror for unknown errors is crucial in some situations; it is completely unacceptable to return a generic string. There will be no change.
Well, the glibc info file says: "This function `strerror_r' is a GNU extension" So either the documentation is wrong, or the function isn't conforming to POSIX - I don't have a copy of the standard to check. This document: http://www.opengroup.org/rtforum/uploads/40/7319/POSIX_and_Linux_Application_Compatibility_v0.92_released_22_April_05.pdf says that POSIX strerror_r() returns an int, the GNU one returns a char*; OTOH, as I said before, other platforms have thread-safe strerror() (Solaris 8 has it, HP-UX 11i has it), and some (Solaris 8, for example) don't even have strerror_r() *at all*. Thus, using strerror_r() breaks portability anyway. Not to mention that if you google for "strerror_r on linux" you'll find posts by users that were confused by the fact that the function didn't even use the supplied buffer (check out http://lists.gnu.org/archive/html/autoconf/2004-12/msg00079.html or http://www.openldap.org/lists/openldap-bugs/200404/msg00191.html). In my opinion, requiring a buffer argument that you *might* use in some weird circumstances is bad design. The "extra info" you talk about can be provided by simply printing out the errno value. If any applications rely on errnos outside the normal range, they should do this anyway.
Dammit, don't reopen bugs, especially if you are clueless. glibc provides two strerror_r definitions. Just pick the stupid POSIX definition if you must. There will be no change. Period.
Funny thing: you insist on keeping a broken design, breaking compatibility (check out the autoconf wizardry required to be portable about strerror right now), all that for some stupid "extra info", and still call *me* clueless. Have a nice day.
Oh, and about "picking the stupid definition", I specifically pointed you to a post on the autoconf mailing list. Here's a quote: "You would be best served by using configure to learn how the default strerror_r behaves and adapting your code to suit. You don't want to force -D_XOPEN_SOURCE=600 on all systems because behavior when the system does not support this level is undefined. In my experience, headers on some systems fail miserably if you specify an _XOPEN_SOURCE value greater than what they were designed to expect. Using -D_XOPEN_SOURCE=500 is reasonably safe on most (but not all) systems. Trying to force the headers to behave a particular way seems to be a lost cause. After trying this approach for a number of months, I finally realized that relying on default behavior worked best."
The strerror_r mess with inconsistent definitions suggests we probably should bite the bullet and make strerror fully thread-safe instead (like other systems do).
*** Bug 19694 has been marked as a duplicate of this bug. ***
Fixed for 2.32 via:

commit 28aff047818eb1726394296d27b9c7885340bead
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu May 14 17:44:15 2020 -0300

    string: Implement strerror in terms of strerror_l

    If the thread is terminated then __libc_thread_freeres will free the
    storage via __glibc_tls_internal_free.

    It is only within the calling thread that this matters. It makes
    strerror MT-safe.

    Checked on x86-64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu,
    and s390x-linux-gnu.

    Tested-by: Carlos O'Donell <carlos@redhat.com>
    Reviewed-by: Carlos O'Donell <carlos@redhat.com>
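For reference, strerror_l() is also available to applications. The following is a rough sketch of calling it directly and copying the result into a caller-owned buffer, assuming a POSIX.1-2008 system; describe_error_in_c_locale is a hypothetical helper name, and this illustrates the public API only, not the code in the commit.

```c
#define _POSIX_C_SOURCE 200809L  /* for newlocale, freelocale, strerror_l */
#include <assert.h>
#include <errno.h>   /* error constants such as ENOENT, for callers */
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Writes the "C" locale message for errnum into buf.
   Returns 0 on success, -1 if the locale could not be created or the
   message did not fit into buf. */
int describe_error_in_c_locale(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0)
        return -1;

    /* Copy the message before freeing the locale: the pointer returned by
       strerror_l is not ours to keep. */
    int n = snprintf(buf, buflen, "%s", strerror_l(errnum, c_locale));
    freelocale(c_locale);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Because the message ends up in storage owned by the caller, its lifetime does not depend on later strerror() calls in the same thread.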
Note that this return convention (returning a pointer to a per-thread buffer) has a pitfall: If the returned pointer ever gets passed to a different thread, value corruption will occur, that is hard to detect and to debug. (Because while the second thread is storing, printing, or logging the value, the first thread may write different contents into it.)
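One way for an application to sidestep this pitfall is to copy the message before it leaves the calling thread. A minimal sketch, assuming the receiving thread takes ownership of the copy and frees it; strerror_copy is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>   /* error constants such as ENOENT, for callers */
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the message for errnum, or NULL if the
   allocation fails. The caller owns the result and must free() it. */
char *strerror_copy(int errnum)
{
    /* The returned pointer may refer to per-thread storage that a later
       strerror() call in this thread overwrites. */
    const char *msg = strerror(errnum);
    assert(msg != NULL);

    size_t len = strlen(msg) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, msg, len);
    return copy;
}
```

The copy can then be queued, logged, or stored by another thread without racing against the originating thread's next strerror() call.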
(In reply to Bruno Haible from comment #9)
> Note that this return convention (returning a pointer to a per-thread
> buffer) has a pitfall:
> If the returned pointer ever gets passed to a different thread, value
> corruption will occur, that is hard to detect and to debug. (Because while
> the second thread is storing, printing, or logging the value, the first
> thread may write different contents into it.)

This has been clarified in POSIX, and I believe C23. However, it only applies to the case where an unknown error code is used, so that's why I think a separate symbol version wasn't necessary.
(In reply to Florian Weimer from comment #10)
> This has been clarified in POSIX, and I believe C23.

True, both POSIX <https://pubs.opengroup.org/onlinepubs/9699919799/functions/strerror.html> and ISO C 23 § 7.26.6.3 contain wording that allows glibc's behaviour and should alert the programmer.

What I meant to state is that I would find it undesirable if glibc were to use this return convention (returning a pointer to a per-thread buffer) in more and more functions. Such value corruption cannot be detected by ASAN or valgrind (in the case of long-living threads); therefore the only possible help the programmer could get here is from static analysis tools.

> However, it only applies to the case where an unknown error code is used

Is a value corruption less severe because it appears less frequently? I would argue the opposite way: If it appears less frequently, there are less chances that it gets caught through a test suite and thus gets eliminated from an application.
(In reply to Bruno Haible from comment #11)
> Is a value corruption less severe because it appears less frequently? I
> would argue the opposite way: If it appears less frequently, there are less
> chances that it gets caught through a test suite and thus gets eliminated
> from an application.

It's a philosophical question. It's also not something we can fix with symbol versions anymore because the release went out with an unversioned change.