Created attachment 8454 [details] Sample program illustrating the issue Hi, there is a problem in generic syscall implementation, which breaks futexes, which break pthread_* functions using them. The problem is in sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h file in the macro #define ARGIFY(X) ((long long) (__typeof__ ((X) - (X))) (X)) which is then used to load syscall arguments into registers. I assume its purpose is to avoid sign extension of an argument if it's of unsigned type, which seems to be wrong. The CPU always does sign extensions while it's loading 32bit integers and this is expected by Linux and GCC. The problem arises, if an argument passed this way and a loaded value are compared. Even if they are the same, comparison fails because one of them is sign extended and the second one isn't. This can be fixed by replacing ARGIFY with a simple cast to long, however it wouldn't work for a syscall, which takes 64bit argument, but I do not know about a syscall, which would take such an argument on 32bit platform (e.g. llseek splits its arguments into two 32bit ones). Why this macro was introduced? See the attached code, which illustrates the problem. The issue doesn't arise if futex is called trough syscall().
On Mon, 27 Jul 2015, oss at malat dot biz wrote: > Hi, > there is a problem in generic syscall implementation, which breaks futexes, > which break pthread_* functions using them. What glibc and Linux kernel versions did you test with? > The problem is in > sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h > file in the macro > #define ARGIFY(X) ((long long) (__typeof__ ((X) - (X))) (X)) > which is then used to load syscall arguments into registers. I assume its > purpose is to avoid sign extension of an argument if it's of unsigned type, No, its purpose is to avoid warnings for converting 32-bit pointers to 64-bit integers - which means that 32-bit pointers have to be converted to 32-bit integers before being converted to 64-bit integers. See <https://sourceware.org/ml/libc-ports/2007-06/msg00003.html>. > which seems to be wrong. The CPU always does sign extensions while it's loading > 32bit integers and this is expected by Linux and GCC. The problem arises, if an Linux (versions since SYSCALL_WRAPPERS, which I think includes all 2.6.29 and later versions) should always do that sign extension on syscall entry to avoid a security issue (CVE-2009-0029, see bug 4459). The minimum kernel supported by current glibc is 2.6.32 (likely to move to 3.2 in the not too distant future when the 2.6.32 kernel series ceases to be maintained). > This can be fixed by replacing ARGIFY with a simple cast to long, however it > wouldn't work for a syscall, which takes 64bit argument, but I do not know > about a syscall, which would take such an argument on 32bit platform (e.g. n32 syscalls take arguments in the same way as n32 userspace functions, in general (that is, 64-bit arguments go in single registers). Look at the various *.c files in sysdeps/unix/sysv/linux/mips/mips64/n32/ for code using the syscall macros in such a way that requires them to work with 64-bit integer arguments.
(In reply to joseph@codesourcery.com from comment #1) > On Mon, 27 Jul 2015, oss at malat dot biz wrote: > > > Hi, > > there is a problem in generic syscall implementation, which breaks > > futexes, which break pthread_* functions using them. > > What glibc and Linux kernel versions did you test with? Linux 2.6.32 and glibc-2.9 (with glibc-ports-2.9), then with 2.16 and then I've checked the code trough gitweb and it's the same. > > The problem is in > > sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h > > file in the macro > > #define ARGIFY(X) ((long long) (__typeof__ ((X) - (X))) (X)) > > which is then used to load syscall arguments into registers. I assume its > > purpose is to avoid sign extension of an argument if it's of unsigned > > type, > > No, its purpose is to avoid warnings for converting 32-bit pointers to > 64-bit integers - which means that 32-bit pointers have to be converted to > 32-bit integers before being converted to 64-bit integers. See > <https://sourceware.org/ml/libc-ports/2007-06/msg00003.html>. That's what I was looking for. The old code seems wrong too, it prevents sign extension. > > which seems to be wrong. The CPU always does sign extensions while it's > > loading 32bit integers and this is expected by Linux and GCC. The problem > > arises, if an > > Linux (versions since SYSCALL_WRAPPERS, which I think includes all 2.6.29 > and later versions) should always do that sign extension on syscall entry > to avoid a security issue (CVE-2009-0029, see bug 4459). The minimum > kernel supported by current glibc is 2.6.32 (likely to move to 3.2 in the > not too distant future when the 2.6.32 kernel series ceases to be > maintained). > > > This can be fixed by replacing ARGIFY with a simple cast to long, however > > it wouldn't work for a syscall, which takes 64bit argument, but I do not > > know about a syscall, which would take such an argument on 32bit platform > n32 syscalls take arguments in the same way as n32 userspace functions, in > general (that is, 64-bit arguments go in single registers). Look at the > various *.c files in sysdeps/unix/sysv/linux/mips/mips64/n32/ for code > using the syscall macros in such a way that requires them to work with > 64-bit integer arguments. In that case the fix is not going to be so simple (I will check the code), but I still think ARGIFY is wrong: - Assume the futex value is 0x8000 0000. - Because it's unsigned, ARGIFY expands to ((long long) (unsigned) (0x8000 0000)) so the content of the register is: 0x0000 0000 8000 0000 - Then the futex syscall is made and the kernel loads the value for comparison from the provided memory address into an unsigned variable (using load word instruction) - The content of register is 0xFFFF FFFF 8000 0000 because sign extension has been done - Comparison fails (because whole registers are compared even when unsigned integers are used) and the process doesn't go to sleep. If this happens in e.g. pthread_cond_wail, the process ends up livelocked.
On Mon, 27 Jul 2015, oss at malat dot biz wrote: > In that case the fix is not going to be so simple (I will check the code), but > I still think ARGIFY is wrong: > - Assume the futex value is 0x8000 0000. > - Because it's unsigned, ARGIFY expands to > ((long long) (unsigned) (0x8000 0000)) > so the content of the register is: > 0x0000 0000 8000 0000 > - Then the futex syscall is made and the kernel loads the value for > comparison from the provided memory address into an unsigned variable > (using load word instruction) But the code in the kernel that handles syscall arguments should have sign-extended the register value before it ever reached the C implementation of futex in the kernel with a 32-bit argument type, so it shouldn't matter what the high word of the register passed to the kernel was. If the syscall wrapper code in the kernel doesn't do so, then hostile userspace code can cause security issues by passing values to the kernel that result in C code having register values for arguments that don't conform to the ABI and so undefined behavior in that C code. And because of the privilege boundary involved, the kernel can never rely on userspace code sign-extending arguments correctly. So why isn't the kernel ensuring that the "int" value conforms to the ABI for int values (by sign-extending) before it gets to a C function in the kernel with an int argument?
(In reply to joseph@codesourcery.com from comment #3) > On Mon, 27 Jul 2015, oss at malat dot biz wrote: > > > In that case the fix is not going to be so simple (I will check the code), but > > I still think ARGIFY is wrong: > > - Assume the futex value is 0x8000 0000. > > - Because it's unsigned, ARGIFY expands to > > ((long long) (unsigned) (0x8000 0000)) > > so the content of the register is: > > 0x0000 0000 8000 0000 > > - Then the futex syscall is made and the kernel loads the value for > > comparison from the provided memory address into an unsigned variable > > (using load word instruction) > > But the code in the kernel that handles syscall arguments should have > sign-extended the register value before it ever reached the C > implementation of futex in the kernel with a 32-bit argument type, so it > shouldn't matter what the high word of the register passed to the kernel > was. If the syscall wrapper code in the kernel doesn't do so, then > hostile userspace code can cause security issues by passing values to the > kernel that result in C code having register values for arguments that > don't conform to the ABI and so undefined behavior in that C code. And > because of the privilege boundary involved, the kernel can never rely on > userspace code sign-extending arguments correctly. > > So why isn't the kernel ensuring that the "int" value conforms to the ABI > for int values (by sign-extending) before it gets to a C function in the > kernel with an int argument? In this case it can't cause any security issues as the value is used only to decide if the caller shall sleep on the futex. As the value is not stored according to the ABI, it's not equal to any valid value and the process is not put to sleep. I agree returning EINVAL would be better in such a case, but the bug would have to be fixed in glibc anyway. After checking implementation of these new syscalls, which were not present in my old version, I think the proper fix would be to change futex value to signed int type.
On Mon, 27 Jul 2015, oss at malat dot biz wrote: > > So why isn't the kernel ensuring that the "int" value conforms to the ABI > > for int values (by sign-extending) before it gets to a C function in the > > kernel with an int argument? > In this case it can't cause any security issues as the value is used only to The MIPS architecture specification says there is UNPREDICTABLE behavior if a non-sign-extended value is used with a 32-bit instruction. The compiler may generate such instructions to work with the int value. While such architecturally UNPREDICTABLE behavior can't itself cross privilege boundaries, that only means it doesn't cause a security issue if such an instruction is executed in userspace - and the problem code is executing in the kernel. So you're relying on particular combinations of (compiler code generation, behavior of particular processors on such inputs) to avoid security issues, which does not seem a good idea. > After checking implementation of these new syscalls, which were not present > in my old version, I think the proper fix would be to change futex value to > signed int type. Are you saying that in fact the bug was fixed in the kernel, but later than 2.6.32? Because the conclusion in the kernel community was that the kernel was the right place to fix this (by always sign-extending affected syscall arguments on kernel entry). And I think the conclusion from more recent discussions in the glibc context of a generic API for internal futex uses in glibc is that unsigned int is the preferred type for futexes within glibc.
(In reply to joseph@codesourcery.com from comment #5) > On Mon, 27 Jul 2015, oss at malat dot biz wrote: > > After checking implementation of these new syscalls, which were not present > > in my old version, I think the proper fix would be to change futex value to > > signed int type. > > Are you saying that in fact the bug was fixed in the kernel, but later > than 2.6.32? No, I meant my original proposal of changing ARGIFY is not good, because it will not work with ftruncate64, lockf64 etc., which are not present in my old version. > Because the conclusion in the kernel community was that the > kernel was the right place to fix this (by always sign-extending affected > syscall arguments on kernel entry). OK, I will try with a never kernel. Nevertheless I still think this is also a bug on glibc side as it violates the ABI. > And I think the conclusion from more recent discussions in the glibc context > of a generic API for internal futex uses in glibc is that unsigned int is > the preferred type for futexes within glibc. I've got this idea from the prototype specified in futex(2), which says val argument is of the type int, but on the other hand using unsigned is aligned with the kernel, as the value is unsigned there too.
I've tracked down a change, which prevents the problem in the kernel: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/kernel/futex_compat.c?id=90caf58dad77a4021e9dade5d051756cea2a2cd7 When COMPAT_SYSCALL_DEFINE is used a wrapper is created, which casts the futex value to unsigned long and then to unsigned int, which forces gcc to emit sll instruction, which, according to the specification, can be used even when a register is not properly sign extended. I will report this to the kernel mailing list, as from current long term releases 2.6.32.67, 3.2.69 and 3.4.108 are affected. From my POV glibc shall be fixed too, as it violates the ABI.
On Tue, 28 Jul 2015, oss at malat dot biz wrote: > I will report this to the kernel mailing list, as from current long term > releases 2.6.32.67, 3.2.69 and 3.4.108 are affected. Thanks. As a potential security issue ("incomplete fix for CVE-2009-0029", I suppose it would be described) this clearly ought to be fixed in stable kernels. > From my POV glibc shall be fixed too, as it violates the ABI. Well, the kernel discussion in 2007 said the ABI should be for the kernel to do the sign extension, and the approach taken with syscall wrappers in the kernel in 2009 confirmed that as the intended ABI. That doesn't rule out working around the kernel bug in glibc while some affected kernel versions are still supported, but does make clear that it's the kernel that's buggy.
What is the status of this bug? Can it be closed?
I don't know the status in the kernel, but presumably we'd keep it open until the buggy kernel versions are no longer supported or we've decided not to work around the kernel bug.
Marking this as resolved, as all affected kernels are after their end of life.