This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/17980] New: sem_* interoperability between 32bit and 64bit binaries


https://sourceware.org/bugzilla/show_bug.cgi?id=17980

            Bug ID: 17980
           Summary: sem_* interoperability between 32bit and 64bit
                    binaries
           Product: glibc
           Version: 2.21
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
          Assignee: unassigned at sourceware dot org
          Reporter: glibc at schuster dot re
                CC: drepper.fsp at gmail dot com

Created attachment 8128
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8128&action=edit
A minimal working example illustrating the issue

Description:
Since my distribution's update from glibc 2.20 to 2.21, a mixed 32bit/64bit
application using POSIX semaphores (sem_open, sem_post, ...) has started
aborting inside sem_post.
"Mixed" in this case means that a 32bit application sets up a named
semaphore using sem_open and waits for input from the 64bit application, which
calls sem_post upon completion of a certain task.
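
The handshake described above can be sketched roughly as follows. This is a
minimal illustration, not the code of the affected application: the semaphore
name and both function names are hypothetical, and in the real setup the two
functions would live in separately built 32bit and 64bit binaries.

```c
#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>

/* Hypothetical name; the report does not say which name the real
   application uses. */
#define DEMO_SEM_NAME "/demo_task_done"

/* Waiter side (the 32bit process in the report): create or open the
   named semaphore with an initial value of 0, then block until some
   other process posts it. Returns 0 on success, -1 on error. */
int wait_for_task(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT, 0600, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }
    int rc = sem_wait(sem);
    if (rc != 0)
        perror("sem_wait");
    sem_close(sem);
    return rc;
}

/* Poster side (the 64bit process in the report): open the same name
   and signal that the task has completed. Returns 0 on success. */
int signal_task_done(const char *name)
{
    sem_t *sem = sem_open(name, O_CREAT, 0600, 0);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }
    int rc = sem_post(sem);
    if (rc != 0)
        perror("sem_post");
    sem_close(sem);
    return rc;
}
```

When both sides are built for the same word size, a call to
signal_task_done(DEMO_SEM_NAME) in one process releases a pending
wait_for_task(DEMO_SEM_NAME) in the other; sem_unlink(DEMO_SEM_NAME) removes
the semaphore afterwards.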

Running the executable under strace reveals the following futex call as the
cause (glibc calls abort() after the failed syscall):
     futex(0x7f5281f83000, 0xf7784081 /* FUTEX_??? */, 1) = -1 ENOSYS (Function not implemented)
Unfortunately the code for this binary (as well as its build chain) is a bit
convoluted and ill-suited for debugging (if necessary, I can provide it
though, along with build instructions).

A bit of debugging reveals that the issue also occurs with other 32bit/64bit
program combinations, so I produced the attached minimal working example.
Unlike the original binary it does not crash, but simply fails to work as
expected:

Steps to Reproduce:
* Unpack the tarball (tar xf minimal-working-example.tar.gz)
* Build all binaries (make all)
* Execute ./mwe and ./mwec in two terminals for the reference run
* The result is an interleaved execution: mwe sleeps/waits once for each
call of mwec-unlock
* Execute ./mwe_cleanup to reset/delete the named semaphore
* Execute ./mwe-32 in one terminal and ./mwec in another one
* The semaphore unlock is not recognized by ./mwe-32

Looking into the sources, my best guess is that the performance optimization
for "new_sem" in 2.21 changed the layout of struct new_sem in
"sysdeps/nptl/internaltypes.h" for 64bit binaries (as AMD64 provides
"__HAVE_64B_ATOMICS"), whereas the 32bit version does not provide those, so
the struct layouts differ.
In the case of named semaphores these structs seem to be written to a file,
which sem_open then mmaps into the processes' address spaces. As the
struct layouts now differ, 32bit and 64bit binaries are no longer
interoperable.
I've produced a patch that seems to fix the issue for me by reordering
struct members (see file 0001-Amortize-layout-of-struct-new_sem.patch
attached). Please note that this patch is more of a "proof of concept", as it
only takes AMD64 into account and most likely will not play nicely with
big-endian architectures...
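
The kind of mismatch suspected above can be illustrated with two stand-in
structs. This is only a sketch: the struct names, field names and field order
are hypothetical and not copied from internaltypes.h. It merely shows how one
variant that packs its state into a single 64-bit word and another that uses
separate 32-bit fields can end up with the same total size but different
member offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout for a build with 64-bit atomics: the semaphore
   state is packed into one 64-bit word. */
struct demo_sem_64b {
    uint64_t data;
    int32_t  is_private;
    int32_t  pad;
};

/* Hypothetical layout for a build without 64-bit atomics: the state is
   split into separate 32-bit fields. */
struct demo_sem_32b {
    uint32_t value;
    int32_t  is_private;
    int32_t  pad;
    uint32_t nwaiters;
};

/* Byte offset of the flag field in each variant. */
size_t private_offset_64b(void)
{
    return offsetof(struct demo_sem_64b, is_private);
}

size_t private_offset_32b(void)
{
    return offsetof(struct demo_sem_32b, is_private);
}

/* Print size and flag offset of both variants side by side. */
void print_layouts(void)
{
    printf("64b variant: size=%zu is_private at %zu\n",
           sizeof(struct demo_sem_64b), private_offset_64b());
    printf("32b variant: size=%zu is_private at %zu\n",
           sizeof(struct demo_sem_32b), private_offset_32b());
}
```

Both stand-ins occupy 16 bytes, so they fit the same mapped file, yet the flag
field sits at offset 8 in one and at offset 4 in the other. Two processes that
share the mapping but disagree on the layout would therefore read each other's
fields at the wrong place.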

The problem persists with:
* glibc-2.21
* git-master (latest commit is 3f293d614c9e641a0d96d347df5c1c5ee687762f)

Note:
Bugzilla seems to allow only one attachment per bug report, so I will attach
the patch in a comment below.


