Bug 31394 - clone on sparc might fail with -EFAULT for no valid reason
Summary: clone on sparc might fail with -EFAULT for no valid reason
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.38
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-16 18:46 UTC by Michael Karcher
Modified: 2024-02-23 12:03 UTC (History)
CC List: 6 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments
issue reproducing program. (1.31 KB, text/plain)
2024-02-16 18:46 UTC, Michael Karcher
patch to mitigate the kernel issue (229 bytes, patch)
2024-02-16 18:52 UTC, Michael Karcher

Description Michael Karcher 2024-02-16 18:46:39 UTC
Created attachment 15373 [details]
issue reproducing program.

There seems to be a bug in the sparc and sparc64 Linux kernel that results in spurious -EFAULT errors from clone if %sp refers to a stack address that is not (yet) part of the program stack, because the stack is allocated lazily.

I wrote a reproducer program to demonstrate the presence of the issue; see the attached file "more_clone_attack.c". As the reproducer assumes the sparc64 stack bias of 0x7ff, it likely won't work as is on sparc32.
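
For readers unfamiliar with the stack bias mentioned above: on 64-bit SPARC the value held in %sp is offset from the real stack address by 0x7ff (2047) bytes, so the memory a frame actually occupies starts at %sp + 0x7ff. The following is only a minimal sketch of that arithmetic; the names SPARC64_STACK_BIAS and unbias_sp are made up for illustration and do not come from the attached reproducer.

```c
#include <stdint.h>

/* Assumed constant: the 64-bit SPARC stack bias (0x7ff = 2047).
   The name is hypothetical; the reproducer may spell it differently. */
#define SPARC64_STACK_BIAS ((uintptr_t)0x7ff)

/* Hypothetical helper: turn a biased %sp value into the real address
   of the register window save area on the stack. */
static uintptr_t unbias_sp(uintptr_t biased_sp)
{
    return biased_sp + SPARC64_STACK_BIAS;
}

/* Hypothetical helper: the inverse, i.e. the value %sp would hold
   for a frame whose save area lives at real_addr. */
static uintptr_t bias_sp(uintptr_t real_addr)
{
    return real_addr - SPARC64_STACK_BIAS;
}
```

On sparc32 there is no such bias, which is presumably why the reproducer is not expected to work there unchanged.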
Comment 1 Michael Karcher 2024-02-16 18:52:01 UTC
Created attachment 15374 [details]
patch to mitigate the kernel issue

The issue has been observed on a very wide range of kernel and glibc versions, e.g. on Debian wheezy (Linux 3.2.0 / glibc 2.13), but also on current Gentoo machines (Linux 6.1 / glibc 2.38-r9).

I developed a patch that puts the stack into a state in which the clone system call works reliably, by invoking flushw from userspace before entering the kernel. I believe the underlying issue is the way the kernel handles page faults during flushw ("The user stack is bolixed"), so I chose this way of pre-faulting the required pages.
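
To illustrate the idea of flushing the register windows from userspace, here is a rough C sketch. It is not the attached patch (which changes glibc itself); the function name flush_register_windows is hypothetical, and the sketch assumes GCC-style inline assembly and the usual GCC predefined macros (__sparc__, __arch64__, __sparc_v9__). On non-SPARC targets it compiles to a no-op so the file still builds.

```c
/* Hypothetical helper, for illustration only.
   Returns 1 if a window flush was issued, 0 on non-SPARC targets. */
static inline int flush_register_windows(void)
{
#if defined(__sparc__) && (defined(__arch64__) || defined(__sparc_v9__))
    /* SPARC V9: spill all register windows except the current one
       to the stack while still in user mode. */
    __asm__ __volatile__("flushw" : : : "memory");
    return 1;
#elif defined(__sparc__)
    /* 32-bit SPARC V8 has no flushw; the flush-windows software
       trap (ta 3, ST_FLUSH_WINDOWS) is the conventional equivalent. */
    __asm__ __volatile__("ta 3" : : : "memory");
    return 1;
#else
    /* Not SPARC: nothing to flush. */
    return 0;
#endif
}
```

A caller would invoke flush_register_windows() immediately before the clone system call. The assumption, following the reasoning in this comment, is that any page fault caused by the spill is then handled like an ordinary userspace fault, so the stack pages are already present when the kernel later needs them.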
Comment 2 Adhemerval Zanella 2024-02-23 11:54:20 UTC
The explanation sounds reasonable, but I would like to get some confirmation from kernel developers that this is the issue before applying this workaround on glibc.
Comment 3 John Paul Adrian Glaubitz 2024-02-23 12:03:56 UTC
(In reply to Adhemerval Zanella from comment #2)
> The explanation sounds reasonable, but I would like to get some confirmation
> from kernel developers that this is the issue before applying this
> workaround on glibc.

I have tried to get feedback from David Miller and developers from Oracle, but so far without any success.