23960 – [2.28 Regression]: New getdents{64} implementation breaks qemu-user

Status:	UNCONFIRMED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc
Version:	2.28

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Duplicates (1):	24014
Depends on:
Blocks:

Reported:	2018-12-07 13:40 UTC by John Paul Adrian Glaubitz
Modified:	2024-01-05 08:33 UTC
CC List:	17 users

See Also:	31186 31212
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Attachments
Proposed patch (817 bytes, patch) 2018-12-08 14:32 UTC, Jessica Clarke
Patch to bypass the return EOVERFLOW condition (356 bytes, patch) 2019-05-28 13:42 UTC, dflo
getdents emulation for glibc (233 bytes, patch) 2020-01-15 05:49 UTC, Aladjev Andrew
getdents emulation for qemu (3.81 KB, patch) 2020-01-15 05:49 UTC, Aladjev Andrew

Description John Paul Adrian Glaubitz 2018-12-07 13:40:14 UTC

On m68k-*-*, upgrading glibc from 2.27 to 2.28 breaks dash in a very obscure way:

(sid-m68k-sbuild)root@epyc:/build/hp2xx-j13jUj/hp2xx-3.4.4# dash
\u@\h:\w$ cp -a /build/hp2xx-j13jUj/hp2xx-3.4.4/hp-tests/* /build/hp2xx-j13jUj/hp2xx-3.4.4/debian/hp2xx/usr/share/doc/hp2xx/hp-tests/
/bin/cp: cannot stat '/build/hp2xx-j13jUj/hp2xx-3.4.4/hp-tests/*': No such file or directory
\u@\h:\w$
(sid-m68k-sbuild)root@epyc:/build/hp2xx-j13jUj/hp2xx-3.4.4# dpkg -i /tmp/libc6_2.27-8_m68k.deb /tmp/libc-bin_2.27-8_m68k.deb /tmp/libc-dev-bin_2.27-8_m68k.deb /tmp/multiarch-support_2.27-8_m68k.deb /tmp/libc6-dev_2.27-8_m68k.deb
dpkg: warning: downgrading libc6:m68k from 2.28-2 to 2.27-8
(Reading database ... 47706 files and directories currently installed.)
Preparing to unpack /tmp/libc6_2.27-8_m68k.deb ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Unpacking libc6:m68k (2.27-8) over (2.28-2) ...
dpkg: warning: downgrading libc-bin from 2.28-2 to 2.27-8
Preparing to unpack /tmp/libc-bin_2.27-8_m68k.deb ...
Unpacking libc-bin (2.27-8) over (2.28-2) ...
dpkg: warning: downgrading libc-dev-bin from 2.28-2 to 2.27-8
Preparing to unpack .../libc-dev-bin_2.27-8_m68k.deb ...
Unpacking libc-dev-bin (2.27-8) over (2.28-2) ...
dpkg: warning: downgrading multiarch-support from 2.28-2 to 2.27-8
Preparing to unpack .../multiarch-support_2.27-8_m68k.deb ...
Unpacking multiarch-support (2.27-8) over (2.28-2) ...
dpkg: warning: downgrading libc6-dev:m68k from 2.28-2 to 2.27-8
Preparing to unpack /tmp/libc6-dev_2.27-8_m68k.deb ...
Unpacking libc6-dev:m68k (2.27-8) over (2.28-2) ...
Setting up libc6:m68k (2.27-8) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Setting up libc-bin (2.27-8) ...
Setting up libc-dev-bin (2.27-8) ...
Setting up multiarch-support (2.27-8) ...
Setting up libc6-dev:m68k (2.27-8) ...
Processing triggers for man-db (2.8.4-3) ...
Not building database; man-db/auto-update is not 'true'.
(sid-m68k-sbuild)root@epyc:/build/hp2xx-j13jUj/hp2xx-3.4.4# dash
\u@\h:\w$ cp -a /build/hp2xx-j13jUj/hp2xx-3.4.4/hp-tests/* /build/hp2xx-j13jUj/hp2xx-3.4.4/debian/hp2xx/usr/share/doc/hp2xx/hp-tests/
\u@\h:\w$

Since dash is the standard shell in Debian for /bin/sh, this makes packages fail to build among other issues.

Comment 1 John Paul Adrian Glaubitz 2018-12-07 13:49:38 UTC

It looks like shell expansion is broken:

\u@\h:\w$ ls -l old/
total 148
-rw-r--r-- 1 glaubitz fbedv 16188 Jun 21  2003 chardraw.c
-rw-r--r-- 1 glaubitz fbedv  7922 Jun 21  2003 fillpoly.c
-rw-r--r-- 1 glaubitz fbedv   948 Jun 21  2003 readme
-rw-r--r-- 1 glaubitz fbedv 31860 Jun 21  2003 to_atari.c
-rw-r--r-- 1 glaubitz fbedv 13093 Jun 21  2003 to_eps.c
-rw-r--r-- 1 glaubitz fbedv  6631 Jun 21  2003 to_mf.c
-rw-r--r-- 1 glaubitz fbedv  3261 Jun 21  2003 to_pbm.c
-rw-r--r-- 1 glaubitz fbedv 12854 Jun 21  2003 to_pcx.c
-rw-r--r-- 1 glaubitz fbedv 10657 Jun 21  2003 to_pdf.c
-rw-r--r-- 1 glaubitz fbedv  6697 Jun 21  2003 to_pm.c
-rw-r--r-- 1 glaubitz fbedv  9463 Jun 21  2003 to_x11a.c
-rw-r--r-- 1 glaubitz fbedv 10405 Jun 21  2003 to_x11.c
\u@\h:\w$ ls -l old/*
/bin/ls: cannot access 'old/*': No such file or directory
\u@\h:\w$
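
One way to narrow this down without involving the shell is to enumerate the directory directly. The following is a minimal sketch, assuming the failing expansion ultimately goes through opendir/readdir; count_dir_entries is a hypothetical helper name, not part of dash or glibc. It resets errno before each readdir call, because readdir returns NULL both at the end of the directory and on error, and a caller that does not check errno cannot tell an error apart from an empty directory.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical diagnostic helper: count the entries of PATH with readdir.
   Returns the number of entries seen, or -1 if opendir fails.  *ERR
   receives the errno value left by the last readdir call (0 on a clean
   end of directory) or by the failed opendir call.  */
long
count_dir_entries (const char *path, int *err)
{
  DIR *dir = opendir (path);
  if (dir == NULL)
    {
      *err = errno;
      return -1;
    }

  long count = 0;
  for (;;)
    {
      /* readdir returns NULL for both end-of-directory and failure;
         only errno distinguishes the two.  */
      errno = 0;
      struct dirent *entry = readdir (dir);
      if (entry == NULL)
        break;
      count++;
    }

  /* Save errno before closedir has a chance to change it.  */
  *err = errno;
  closedir (dir);
  return count;
}
```

If such a helper reports fewer entries than "ls -l" shows and a nonzero error value, the problem lies in directory enumeration rather than in the shell's pattern matching.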

Comment 2 John Paul Adrian Glaubitz 2018-12-07 14:27:25 UTC

Not sure whether it's related, but with 2.28, "man" also regressed:

root@pacman:~# man man
man: waitpid failed: No child processes
root@pacman:~#

This can be reproduced on qemu-system-m68k but not on qemu-user-m68k where "man man" will show the manpage.

Comment 3 John Paul Adrian Glaubitz 2018-12-08 12:00:42 UTC

This is the commit which causes the problem:

b4a5d26d8835d972995f0a0a2f805a8845bafa0b is the first bad commit
commit b4a5d26d8835d972995f0a0a2f805a8845bafa0b
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Nov 2 11:04:18 2017 -0200

    linux: Consolidate sigaction implementation
    
    This patch consolidates all Linux sigaction implementations on the default
    sysdeps/unix/sysv/linux/sigaction.c.  The idea is remove redundant code
    and simplify new ports addition by following the current generic
    Linux User API (UAPI).
    
    The UAPI for new ports defines a generic extensible sigaction struct as:
    
      struct sigaction
      {
        __sighandler_t sa_handler;
        unsigned long sa_flags;
      #ifdef SA_RESTORER
        void (*sa_restorer) (void);
      #endif
        sigset_t sa_mask;
      };
    
    Where SA_RESTORER is just placed for compatibility reasons (news ports
    should not add it).  A similar definition is used on generic
    kernel_sigaction.h.
    
    The user exported sigaction definition is not changed, so for most
    architectures it requires an adjustment to kernel expected one for the
    syscall.
    
    The main changes are:
    
      - All architectures now define and use a kernel_sigaction struct meant
        for the syscall, even for the architectures where the user sigaction
        has the same layout of the kernel expected one (s390-64 and ia64).
        Although it requires more work for these architectures, it simplifies
        the generic implementation. Also, sigaction is hardly a hotspot where
        micro optimization would play an important role.
    
      - The generic kernel_sigaction definition is now aligned with expected
        UAPI one for newer ports, where SA_RESTORER and sa_restorer are not
        expected to be defined.  This means adding kernel_sigaction for
        current architectures that does define it (m68k, nios2, powerpc, s390,
        sh, sparc, and tile) and which rely on previous generic definition.
    
      - Remove old MIPS usage of sa_restorer.  This was removed since 2.6.27
        (2957c9e61ee9c - "[MIPS] IRIX: Goodbye and thanks for all the fish").
    
      - The remaining arch-specific sigaction.c are to handle ABI idiosyncrasies
        (like SPARC kernel ABI for rt_sigaction that requires an additional
        stub argument).
    
    So for new ports the generic implementation should work if its uses
    Linux UAPI.  If SA_RESTORER is still required (due some architecture
    limitation), it should define its own kernel_sigaction.h, define it and
    include generic header (assuming it still uses the default generic kernel
    layout).
    
    Checked on x86_64-linux-gnu, i686-linux-gnu, arm-linux-gnueabihf,
    aarch64-linux-gnu, sparc64-linux-gnu, sparcv9-linux-gnu, powerpc-linux-gnu,
    powerpc64-linux-gnu, ia64-linux-gnu and alpha-linux-gnu.  I also checked the
    build on all remaining affected ABIs.
    
            * sysdeps/unix/sysv/linux/aarch64/sigaction.c: Use default Linux version
            as base implementation.
            * sysdeps/unix/sysv/linux/arm/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/i386/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/sparc/sparc32/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/sparc/sparc64/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/x86_64/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/alpha/kernel_sigaction.h: Add include guards,
            remove unrequired definitions and update comments.
            * sysdeps/unix/sysv/linux/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/mips/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/ia64/kernel_sigaction.h: New file.
            * sysdeps/unix/sysv/linux/m68k/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/nios2/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/powerpc/kernel_sigaction: Likewise.
            * sysdeps/unix/sysv/linux/s390/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/sh/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/sparc/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/tile/kernel_sigaction.h: Likewise.
            * sysdeps/unix/sysv/linux/ia64/sigaction.c: Remove file.
            * sysdeps/unix/sysv/linux/mips/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/s390/s390-64/sigaction.c: Likewise.
            * sysdeps/unix/sysv/linux/sigaction.c: Add STUB, SET_SA_RESTORER,
            and RESET_SA_RESTORER hooks.

:100644 100644 17506be6a9897f7dd3111752a9de0e0b060cec75 e7abee53e98c0b2e1362cb4913e2f4211b3747fa M      ChangeLog
:040000 040000 fa5f55cc57a1b1ec9a9bd3c5dc32986834e0af78 29aefea5f3bf96c2a78552305a9ee8b862619c92 M      sysdeps

Comment 4 John Paul Adrian Glaubitz 2018-12-08 13:09:15 UTC

One observation is that sigset_t is an "unsigned long" type on m68k:

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/m68k/include/uapi/asm/signal.h

While on the glibc side, it is defined through a struct:

> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/bits/types/__sigset_t.h;h=e2f18acf30f43496567b1511456089dcd1798425;hb=HEAD

Comment 5 Jessica Clarke 2018-12-08 14:07:47 UTC

(In reply to John Paul Adrian Glaubitz from comment #4)
> One observation is that sigset_t is an "unsigned long" type on m68k:
> 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/m68k/include/uapi/asm/signal.h

That's the old definition (see e.g. ARM, which also has it), and is only user-facing (surrounded by `#ifndef __KERNEL__`).

Comment 6 Jessica Clarke 2018-12-08 14:32:46 UTC

Created attachment 11441 [details]
Proposed patch

Please try this completely untested patch.

Comment 7 John Paul Adrian Glaubitz 2018-12-08 15:25:53 UTC

(In reply to James Clarke from comment #6)
> Created attachment 11441 [details]
> Proposed patch
> 
> Please try this completely untested patch.

Yes, this patch fixes the issue with "man":

root@pacman:~# LD_PRELOAD=/root/libc.so /root/ld.so /usr/bin/man man
MAN(1)                        Manual pager utils                        MAN(1)

NAME
       man - an interface to the on-line reference manuals

SYNOPSIS
(...)
root@pacman:~#

It works normally again.

The issue with dash still persists:

\u@\h:\w$ ls -l /bin/*
/bin/ls: cannot access '/bin/*': No such file or directory
\u@\h:\w$

But that affects qemu-user only.

Comment 8 John Paul Adrian Glaubitz 2018-12-09 10:07:12 UTC

Splitting the bug reports now since those are two different issues.

The shell expansion issue with dash affects all of qemu-user, not just m68k and was introduced by:

298d0e3129c0b5137f4989275b13fe30d0733c4d is the first bad commit
commit 298d0e3129c0b5137f4989275b13fe30d0733c4d
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Wed Feb 28 15:37:17 2018 -0300

    Consolidate Linux getdents{64} implementation
    
    This patch consolidates Linux getdents{64} implementation on just
    the default sysdeps/unix/sysv/linux/getdents{64}{_r}.c ones.
    
    Although this symbol is used only internally, the non-LFS version
    still need to be build due the non-LFS getdirentries which requires
    its semantic.
    
    The non-LFS default implementation now uses the wordsize-32 as base
    which uses getdents64 syscall plus adjustment for overflow (it allows
    to use the same code for architectures that does not support non-LFS
    getdents syscall).  It has two main differences to wordsize-32 one:
    
      - DIRENT_SET_DP_INO is added to handle alpha requirement to zero
        the padding.
    
      - alloca is removed by allocating a bounded temporary buffer (it
        increases stack usage by roughly 276 bytes).
    
    The default implementation handle the Linux requirements:
    
      * getdents is only built for _DIRENT_MATCHES_DIRENT64 being 0.
    
      * getdents64 is always built and aliased to getdents for ABIs
        that define _DIRENT_MATCHES_DIRENT64 to 1.
    
      * A compat symbol is added for getdents64 for ABI that used to
        export the old non-LFS version.
    
    Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu,
    sparcv9-linux-gnu, sparc64-linux-gnu, powerpc-linux-gnu, and
    powerpc64le-linux-gnu.
    
            * sysdeps/unix/sysv/linux/alpha/getdents.c: Add comments with alpha
            requirements.
             (_DIRENT_MATCHES_DIRENT64): Undef
            * sysdeps/unix/sysv/linux/alpha/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/arm/getdents64.c: Remove file.
            * sysdeps/unix/sysv/linux/generic/getdents.c: Likewise.
            * sysdeps/unix/sysv/linux/generic/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/generic/wordsize-32/getdents.c: Likewise.
            * sysdeps/unix/sysv/linux/getdents.c: Simplify implementation by
            use getdents64 syscalls as base.
            * sysdeps/unix/sysv/linux/getdents64.c: Likewise and add compatibility
            symbol if required.
            * sysdeps/unix/sysv/linux/hppa/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/i386/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/m68k/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/powerpc/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/s390/s390-32/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/sparc/sparc32/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/wordsize-64/getdents.c: Likewise.
            * sysdeps/unix/sysv/linux/wordsize-64/getdents64.c: Likewise.
            * sysdeps/unix/sysv/linux/sparc/sparc64/get_clockfreq.c
            (__get_clockfreq_via_proc_openprom): Use __getdents64.
            * sysdeps/unix/sysv/linux/mips/mips64/getdents64.c: New file.

:100644 100644 5cc4dc9f635c36c136fbe1751c7f0d96e8b78dcb 93c82fef994b4a4c255ef91ec5016e79e4410f31 M      ChangeLog
:040000 040000 49467271d85e92ba455bf6b387fbfa0ffc58ca6c 5d449b9064aaa7f1c20c516a34af6b7b1084403f M      sysdeps

Comment 9 Florian Weimer 2018-12-09 12:27:24 UTC

(In reply to John Paul Adrian Glaubitz from comment #8)
> Splitting the bug reports now since those are two different issues.
> 
> The shell expansion issue with dash affects all of qemu-user, not just m68k
> and was introduced by […] commit 298d0e3129c0b5137f4989275b13fe30d0733c4d […]

We already have bug 23497 and this commit:

commit 690652882b499defb3d950dfeff8fe421d13cab5 (bug23497)
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Aug 10 10:20:13 2018 +0200

    Linux: Rewrite __old_getdents64 [BZ #23497]
    
    Commit 298d0e3129c0b5137f4989275b13fe30d0733c4d ("Consolidate Linux
    getdents{64} implementation") broke the implementation because it does
    not take into account struct offset differences.
    
    The new implementation is close to the old one, before the
    consolidation, but has been cleaned up slightly.

Does it fix the qemu-user issue?

Comment 10 John Paul Adrian Glaubitz 2018-12-09 12:57:45 UTC

(In reply to Florian Weimer from comment #9)
> commit 690652882b499defb3d950dfeff8fe421d13cab5 (bug23497)
> Author: Florian Weimer <fweimer@redhat.com>
> Date:   Fri Aug 10 10:20:13 2018 +0200
> 
>     Linux: Rewrite __old_getdents64 [BZ #23497]
>     
>     Commit 298d0e3129c0b5137f4989275b13fe30d0733c4d ("Consolidate Linux
>     getdents{64} implementation") broke the implementation because it does
>     not take into account struct offset differences.
>     
>     The new implementation is close to the old one, before the
>     consolidation, but has been cleaned up slightly.
> 
> Does it fix the qemu-user issue?

The Debian glibc package in unstable is already shipping this patch [1] and it is not enough to fix the problem:

(sid-powerpc-sbuild)root@nofan:/# dpkg -l libc6 |grep 2.28
ii  libc6:powerpc  2.28-2       powerpc      GNU C Library: Shared libraries
(sid-powerpc-sbuild)root@nofan:/# dash
\u@\h:\w$ echo *
*
\u@\h:\w$

> [1] https://sources.debian.org/src/glibc/2.28-2/debian/patches/git-updates.diff/#L364

Comment 11 Jessica Clarke 2018-12-09 19:44:26 UTC

(In reply to Florian Weimer from comment #9)
> (In reply to John Paul Adrian Glaubitz from comment #8)
> > Splitting the bug reports now since those are two different issues.
> > 
> > The shell expansion issue with dash affects all of qemu-user, not just m68k
> > and was introduced by […] commit 298d0e3129c0b5137f4989275b13fe30d0733c4d […]
> 
> We already have bug 23497 and this commit:
> 
> commit 690652882b499defb3d950dfeff8fe421d13cab5 (bug23497)
> Author: Florian Weimer <fweimer@redhat.com>
> Date:   Fri Aug 10 10:20:13 2018 +0200
> 
>     Linux: Rewrite __old_getdents64 [BZ #23497]
>     
>     Commit 298d0e3129c0b5137f4989275b13fe30d0733c4d ("Consolidate Linux
>     getdents{64} implementation") broke the implementation because it does
>     not take into account struct offset differences.
>     
>     The new implementation is close to the old one, before the
>     consolidation, but has been cleaned up slightly.
> 
> Does it fix the qemu-user issue?

That's about __old_getdents64, but this problem is about __getdents switching from the new-style getdents system call to the new-style getdents64 system call. The latter clearly can overflow d_off on 32-bit systems (in practice, if you're using a 32-bit compat syscall layer, ext4 will give 32-bit values back[1], but that doesn't help you when using qemu-user), and whilst it doesn't with ext4 on i386, I imagine that with large enough directories it can still in theory overflow on i386 on other filesystems.

[1] ext4's d_off is not an offset, but an encoded "hash" (not really a hash, just packing bits; it uses the upper half on 64-bit ABI syscalls, but throws it away on 32-bit ABI syscalls - that's ABI, so getdents64 still gets a 32-bit hash that's then zero-extended)
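
The overflow described above can be sketched as a narrowing check. This is a minimal illustration, assuming a 32-bit d_off field on the non-LFS side; narrow_d_off is a hypothetical name and this is not the glibc implementation.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper, not glibc code: narrow a 64-bit d_off as returned
   by getdents64 to a 32-bit field.  Returns 0 and stores the value on
   success; returns -1 with errno set to EOVERFLOW if the value does not
   fit.  */
int
narrow_d_off (int64_t d_off64, int32_t *d_off32)
{
  if (d_off64 < INT32_MIN || d_off64 > INT32_MAX)
    {
      errno = EOVERFLOW;
      return -1;
    }
  *d_off32 = (int32_t) d_off64;
  return 0;
}
```

A value that uses the upper half of the 64-bit range fails this check, which is the situation the comment above describes for qemu-user on a 64-bit host.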

Comment 12 Florian Weimer 2018-12-11 11:25:41 UTC

The 32-bit getdents64 system call on an x86-64 kernel returns a truncated d_off value to userspace on ext4, while the 64-bit version of the system call uses more bits for d_off, on the same directory.

As a result, the i386 glibc getdents implementation (which is based on the 32-bit getdents64 system call) does not observe any d_off values which would have to be truncated, and readdir is able to enumerate the entire directory.

This looks like an emulation (QEMU?) problem to me, and not like a glibc bug.
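
To check which d_off values a given process actually observes, the raw system call can be issued directly. The following is a rough sketch for Linux; dir_has_wide_d_off is a hypothetical helper name, and struct linux_dirent64 is declared locally to match the record layout documented for getdents64. Built once as a 32-bit binary and once as a 64-bit binary and run on the same ext4 directory, it should let the difference described above be observed, although no particular result is claimed here.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of the records returned by the getdents64 system call.  */
struct linux_dirent64
{
  uint64_t d_ino;
  int64_t d_off;
  unsigned short d_reclen;
  unsigned char d_type;
  char d_name[];
};

/* Hypothetical probe: returns 1 if any entry of PATH has a d_off that
   does not fit in 32 bits, 0 if all of them fit, and -1 on error.  */
int
dir_has_wide_d_off (const char *path)
{
  int fd = open (path, O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -1;

  char buf[4096];
  int wide = 0;
  for (;;)
    {
      long nread = syscall (SYS_getdents64, fd, buf, sizeof buf);
      if (nread < 0)
        {
          close (fd);
          return -1;
        }
      if (nread == 0)
        break;                  /* End of directory.  */

      for (long pos = 0; pos < nread;)
        {
          /* Copy the fixed-size header out of the byte buffer to avoid
             alignment and aliasing problems.  */
          struct linux_dirent64 hdr;
          memcpy (&hdr, buf + pos, offsetof (struct linux_dirent64, d_name));
          if (hdr.d_off < INT32_MIN || hdr.d_off > INT32_MAX)
            wide = 1;
          pos += hdr.d_reclen;
        }
    }

  close (fd);
  return wide;
}
```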

Comment 13 Jessica Clarke 2018-12-11 11:30:22 UTC

(In reply to Florian Weimer from comment #12)
> The 32-bit getdents64 system call on an x86-64 kernel returns a truncated
> d_off value to userspace on ext4, while the 64-bit version of the system
> call uses more bits for d_off, on the same directory.
> 
> As a result, the i386 glibc getdents implementation (which is based on the
> 32-bit getdents64 system call) does not observe any d_off values which would
> have to be truncated, and readdir is able to enumerate the entire directory.
> 
> This looks like an emulation (QEMU?) problem to me, and not like a glibc bug.

Well, in some sense yes, but this is a new regression in glibc-on-qemu-user, and given there's no reason why glibc needs to do it this way (it actually is simpler *not* to even for glibc it seems; working on a patch at the moment) it seems unhelpful to knowingly break this use-case.

Comment 14 Adhemerval Zanella 2018-12-11 12:34:57 UTC

(In reply to James Clarke from comment #13)
> (In reply to Florian Weimer from comment #12)
> > The 32-bit getdents64 system call on an x86-64 kernel returns a truncated
> > d_off value to userspace on ext4, while the 64-bit version of the system
> > call uses more bits for d_off, on the same directory.
> > 
> > As a result, the i386 glibc getdents implementation (which is based on the
> > 32-bit getdents64 system call) does not observe any d_off values which would
> > have to be truncated, and readdir is able to enumerate the entire directory.
> > 
> > This looks like an emulation (QEMU?) problem to me, and not like a glibc bug.
> 
> Well, in some sense yes, but this is a new regression in glibc-on-qemu-user,
> and given there's no reason why glibc needs to do it this way (it actually
> is simpler *not* to even for glibc it seems; working on a patch at the
> moment) it seems unhelpful to knowingly break this use-case.

I don't see this behavior either on an ARAnyM emulated system with vmlinuz-3.16.
Also, it seems that for the generic kernel UABI and newer ports the kernel only provides __NR_getdents64 (nios2 is a 32-bit-only port that only supports __NR_getdents64). How does qemu-user handle such ABIs?

I am not opposed to handling it in glibc by stepping back a bit and restoring the old getdents implementation, which was rather complex since it had to handle three cases (__NR_getdents with a user layout similar to the kernel one, __NR_getdents with a kernel layout different from the user one, and __NR_getdents64 only). However, it seems qemu's semantics are intrinsically broken for newer ABIs that do not provide getdents64, and I really think it should be fixed in qemu to provide semantics similar to the kernel's.

Comment 15 Jessica Clarke 2018-12-11 12:42:46 UTC

(In reply to Adhemerval Zanella from comment #14)
> (In reply to James Clarke from comment #13)
> > (In reply to Florian Weimer from comment #12)
> > > The 32-bit getdents64 system call on an x86-64 kernel returns a truncated
> > > d_off value to userspace on ext4, while the 64-bit version of the system
> > > call uses more bits for d_off, on the same directory.
> > > 
> > > As a result, the i386 glibc getdents implementation (which is based on the
> > > 32-bit getdents64 system call) does not observe any d_off values which would
> > > have to be truncated, and readdir is able to enumerate the entire directory.
> > > 
> > > This looks like an emulation (QEMU?) problem to me, and not like a glibc bug.
> > 
> > Well, in some sense yes, but this is a new regression in glibc-on-qemu-user,
> > and given there's no reason why glibc needs to do it this way (it actually
> > is simpler *not* to even for glibc it seems; working on a patch at the
> > moment) it seems unhelpful to knowingly break this use-case.
> 
> I don't see this behavior either on a ARAnyM emulated system with
> vmlinuz-3.16.
> Also it seems that for generic kernel UABI and newer ports kernel only
> provides __NR_getdents64 (nios2 is a 32-bit only that only support
> __NR_getdents64). How qemu-user handle such ABIs?
> 
> I do not oppose to handle it glibc by stepping a bit back and restore the
> old getdents implementation, which was rather complex since it required to
> handle three cases (__NR_getdents with user layout similar to kernel one,
> __NR_getdents with kernel layout different from user, and __NR_getdents64
> only). However it seems qemu semantic is intrinsically broken for newer ABIs
> that does not provide getdents64 and I really think it should be fixed on
> qemu to provide a similar semantic as the kernel.

You're right that this isn't a proper fix; QEMU currently just truncates d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO always) done. At least with __NR_getdents on 32-bit targets it knows that userspace wants a 32-bit d_off, though. With __NR_getdents64, what should QEMU do? It could pass up the full 64-bit d_off, as one might reasonably expect (even though this differs from what the kernel actually does on ext4 specifically), or it could zero out the high half to work around this specific use of __NR_getdents64. At least by using __NR_getdents, glibc is communicating to the system call provider (be it QEMU or the kernel) whether it can handle a full 64-bit d_off.

The only real way glibc can use __NR_getdents64 and QEMU can reliably emulate it is if the kernel is taught a way to force it to use the 32-bit ABI semantics for d_off, i.e. somehow pass FMODE_32BITHASH down to the kernel. But that doesn't currently exist, so in my opinion we should try to make the best of what we have and therefore glibc should use the "right" system call for the function.
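
The two choices mentioned above for __NR_getdents64 on a 32-bit target can be written down as a small sketch. The enum and function names are hypothetical and this is not QEMU code; it only illustrates what each choice does to a host-provided d_off.

```c
#include <stdint.h>

/* Hypothetical policies for forwarding a host d_off to a 32-bit guest
   that issued getdents64.  */
enum d_off_policy
{
  D_OFF_PASS_THROUGH,    /* Hand the full 64-bit host value to the guest.  */
  D_OFF_ZERO_HIGH_HALF   /* Keep only the low 32 bits of the host value.  */
};

/* Return the d_off the guest would see under POLICY.  */
uint64_t
emulate_getdents64_d_off (uint64_t host_d_off, enum d_off_policy policy)
{
  if (policy == D_OFF_ZERO_HIGH_HALF)
    return host_d_off & UINT64_C (0xffffffff);
  return host_d_off;
}
```

Passing the value through preserves it but can exceed what a 32-bit consumer is able to store. Zeroing the high half always fits, but two host values that differ only in their upper 32 bits become indistinguishable to the guest.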

Comment 16 Florian Weimer 2018-12-11 12:47:38 UTC

(In reply to James Clarke from comment #15)
> You're right that this isn't a proper fix; QEMU currently just truncates
> d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO
> always) done. At least with __NR_getdents on 32-bit targets it knows that
> userspace wants a 32-bit d_off, though. With __NR_getdents64, what should
> QEMU do?

Check if the emulated userspace runs in long mode.  If it does, use the kernel-supplied 64-bit value.  If it does not, do whatever the kernel does to produce the d_off value.

There must be many system calls where something like this is necessary.

Comment 17 Jessica Clarke 2018-12-11 12:52:21 UTC

(In reply to Florian Weimer from comment #16)
> (In reply to James Clarke from comment #15)
> > You're right that this isn't a proper fix; QEMU currently just truncates
> > d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO
> > always) done. At least with __NR_getdents on 32-bit targets it knows that
> > userspace wants a 32-bit d_off, though. With __NR_getdents64, what should
> > QEMU do?
> 
> Check if the emulated userspace runs in long mode.  If it does, use the
> kernel-supplied 64-bit value.  If it does not, do whatever the kernel does
> to produce the d_off value.
> 
> There must be many system calls where something like this is necessary.

Except this behaviour is filesystem-specific and *only* occurs on ext4. If you have a big enough directory in BTRFS you will get back a value bigger than 2^32 even on i386. For ext4 we'd need to convert it, but for BTRFS we'd need to report EOVERFLOW. We can't support both simultaneously without horrendous probing and hard-coding the encoding of d_off for ext4 which I would assume is meant to be an implementation detail that could change.

Comment 18 Adhemerval Zanella 2018-12-11 13:32:41 UTC

(In reply to James Clarke from comment #17)
> (In reply to Florian Weimer from comment #16)
> > (In reply to James Clarke from comment #15)
> > > You're right that this isn't a proper fix; QEMU currently just truncates
> > > d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO
> > > always) done. At least with __NR_getdents on 32-bit targets it knows that
> > > userspace wants a 32-bit d_off, though. With __NR_getdents64, what should
> > > QEMU do?
> > 
> > Check if the emulated userspace runs in long mode.  If it does, use the
> > kernel-supplied 64-bit value.  If it does not, do whatever the kernel does
> > to produce the d_off value.
> > 
> > There must be many system calls where something like this is necessary.
> 
> Except this behaviour is filesystem-specific and *only* occurs on ext4. If
> you have a big enough directory in BTRFS you will get back a value bigger
> than 2^32 even on i386. For ext4 we'd need to convert it, but for BTRFS we'd
> need to report EOVERFLOW. We can't support both simultaneously without
> horrendous probing and hard-coding the encoding of d_off for ext4 which I
> would assume is meant to be an implementation detail that could change.

Sigh, alright. I would really like to avoid this workaround in glibc, but it looks like the most straightforward solution.

However, new 32-bit ABIs which only provide getdents64, such as nios2 and possibly newer ones such as sky, arc, and riscv32, will continue to be broken: glibc will still provide non-LFS interfaces for such cases (since off_t will be defined as long int regardless). It will be fixed only when we really phase out non-LFS support for good, especially for newer ports.

Comment 19 Jessica Clarke 2018-12-11 13:37:07 UTC

(In reply to Adhemerval Zanella from comment #18)
> (In reply to James Clarke from comment #17)
> > (In reply to Florian Weimer from comment #16)
> > > (In reply to James Clarke from comment #15)
> > > > You're right that this isn't a proper fix; QEMU currently just truncates
> > > > d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO
> > > > always) done. At least with __NR_getdents on 32-bit targets it knows that
> > > > userspace wants a 32-bit d_off, though. With __NR_getdents64, what should
> > > > QEMU do?
> > > 
> > > Check if the emulated userspace runs in long mode.  If it does, use the
> > > kernel-supplied 64-bit value.  If it does not, do whatever the kernel does
> > > to produce the d_off value.
> > > 
> > > There must be many system calls where something like this is necessary.
> > 
> > Except this behaviour is filesystem-specific and *only* occurs on ext4. If
> > you have a big enough directory in BTRFS you will get back a value bigger
> > than 2^32 even on i386. For ext4 we'd need to convert it, but for BTRFS we'd
> > need to report EOVERFLOW. We can't support both simultaneously without
> > horrendous probing and hard-coding the encoding of d_off for ext4 which I
> > would assume is meant to be an implementation detail that could change.
> 
> Sigh, alright I would really like to avoid this workaround on glibc but it
> looks like it seems the most straightforward issue.  
> 
> However, new 32-bits ABI which only provides getdents64 such as nios2 and
> possible newer one as sky, arc, and riscv32; will continue to be broken:
> glibc will still provide non-LFS interfaces for such cases (since off_t will
> defined as long int regardless). It will be fixed only when we really phase
> out non-LFS support for good, especially for newer ports.

Sounds to me like any port without a getdents system call should have only LFS (much like how new ports should have a 64-bit time_t forced upon them), using the kernel's lack of support as motivation. Are there any current 32-bit ports that only have getdents64?

Comment 20 Adhemerval Zanella 2018-12-11 13:44:06 UTC

(In reply to James Clarke from comment #19)
> (In reply to Adhemerval Zanella from comment #18)
> > (In reply to James Clarke from comment #17)
> > > (In reply to Florian Weimer from comment #16)
> > > > (In reply to James Clarke from comment #15)
> > > > > You're right that this isn't a proper fix; QEMU currently just truncates
> > > > > d_off for __NR_getdents on 32-bit targets, but that's what it's always (FSVO
> > > > > always) done. At least with __NR_getdents on 32-bit targets it knows that
> > > > > userspace wants a 32-bit d_off, though. With __NR_getdents64, what should
> > > > > QEMU do?
> > > > 
> > > > Check if the emulated userspace runs in long mode.  If it does, use the
> > > > kernel-supplied 64-bit value.  If it does not, do whatever the kernel does
> > > > to produce the d_off value.
> > > > 
> > > > There must be many system calls where something like this is necessary.
> > > 
> > > Except this behaviour is filesystem-specific and *only* occurs on ext4. If
> > > you have a big enough directory in BTRFS you will get back a value bigger
> > > than 2^32 even on i386. For ext4 we'd need to convert it, but for BTRFS we'd
> > > need to report EOVERFLOW. We can't support both simultaneously without
> > > horrendous probing and hard-coding the encoding of d_off for ext4 which I
> > > would assume is meant to be an implementation detail that could change.
> > 
> > Sigh, alright I would really like to avoid this workaround on glibc but it
> > looks like it seems the most straightforward issue.  
> > 
> > However, new 32-bits ABI which only provides getdents64 such as nios2 and
> > possible newer one as sky, arc, and riscv32; will continue to be broken:
> > glibc will still provide non-LFS interfaces for such cases (since off_t will
> > defined as long int regardless). It will be fixed only when we really phase
> > out non-LFS support for good, especially for newer ports.
> 
> Sounds to me like any port without a getdents system call should have only
> LFS (much like how new ports should have a 64-bit time_t forced upon them),
> using the kernel's lack of support as motivation. Are there any current
> 32-bit ports that only have getdents64?

From a kernel standpoint, I think that is the case: new ports will only provide the LFS variants (I think the idea is essentially the same for 64-bit time_t). The issue is that glibc only defines off_t as off64_t for LP64 architectures. I am not sure how hard it would be for newer 32-bit ports to not provide non-LFS variants, although I think it is doable (it would require an arch-specific typesizes.h plus some adjustments, I think).

Unfortunately, we have nios2 which still provides non-LFS variants through glibc.

Comment 21 Florian Weimer 2018-12-11 13:56:30 UTC

I'm not sure if the lack of getdents for new architectures is relevant here.  After all, the i386 compat behavior affects both getdents and getdents64.  What seems to be missing is an interface that QEMU can use to tell the kernel that it wants compat behavior from the kernel.  Something like O_LARGEFILE, but which is not enabled by default.

The question is whether the kernel wants to carry around this baggage just for the sake of QEMU.

Comment 22 jsm-csl@polyomino.org.uk 2018-12-11 17:38:17 UTC

As d_off is an opaque value, can't __getdents just truncate without 
producing an EOVERFLOW error?  As __getdents is a purely internal 
function, its interface could be changed to return a truncation indication 
or indeed to provide the full d_off value somehow - such truncation 
indication only being relevant if telldir is used (which has a return type 
of long int and no corresponding LFS version).  Though if you want telldir 
/ seekdir to work in this case, maybe you do need to get the properly 
truncated value from the kernel (and also to add LFS versions of telldir / 
seekdir that work with directories too large for long on 32-bit systems).

Comment 23 jsm-csl@polyomino.org.uk 2018-12-11 17:43:32 UTC

On existing 32-bit architectures, we're saying _TIME_BITS=64 will only be 
allowed with _FILE_OFFSET_BITS=64, to avoid needing an ABI variant (and 
thus struct stat variant) with 32-bit offsets but 64-bit times.  It would 
seem reasonable enough to say that any new 32-bit architectures that 
choose to have only 64-bit time from the start (like 32-bit RISC-V) should 
also have only 64-bit file offsets etc. from the start as well (like x32).

Comment 24 Adhemerval Zanella 2018-12-11 20:54:38 UTC

(In reply to joseph@codesourcery.com from comment #22)
> As d_off is an opaque value, can't __getdents just truncate without 
> producing an EOVERFLOW error?  As __getdents is a purely internal 
> function, its interface could be changed to return a truncation indication 
> or indeed to provide the full d_off value somehow - such truncation 
> indication only being relevant if telldir is used (which has a return type 
> of long int and no corresponding LFS version).  Though if you want telldir 
> / seekdir to work in this case, maybe you do need to get the properly 
> truncated value from the kernel (and also to add LFS versions of telldir / 
> seekdir that work with directories too large for long on 32-bit systems).

One issue I can think of, if I am reading the ext4 code correctly, is that the hash is encoded in the high bits of d_off:

fs/ext4/dir.c:

/*
 * These functions convert from the major/minor hash to an f_pos
 * value for dx directories
 *
 * Upper layer (for example NFS) should specify FMODE_32BITHASH or
 * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
 * directly on both 32-bit and 64-bit nodes, under such case, neither
 * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
 */
static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
{
        if ((filp->f_mode & FMODE_32BITHASH) ||
            (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
                return major >> 1;
        else
                return ((__u64)(major >> 1) << 32) | (__u64)minor;
}

It means that to mimic the FMODE_32BITHASH / 32-bit API behaviour we would need to return the high 32 bits, which might not be correct for other filesystems.
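To make the relation concrete, here is a userspace model of hash2pos. It is only a sketch: the use_32bit flag stands in for the kernel's FMODE_32BITHASH / is_32bit_api() checks, and pos64_to_pos32 is a hypothetical conversion that would only be meaningful for ext4 dx directories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of ext4's hash2pos, for illustration only.  */
static int64_t
model_hash2pos (bool use_32bit, uint32_t major, uint32_t minor)
{
  if (use_32bit)
    return major >> 1;
  return (int64_t) (((uint64_t) (major >> 1) << 32) | (uint64_t) minor);
}

/* Hypothetical conversion: recover the 32-bit-API position from the
   64-bit one by taking the high 32 bits.  */
static int64_t
pos64_to_pos32 (int64_t pos64)
{
  assert (pos64 >= 0);
  return pos64 >> 32;
}
```

In this model, plain truncation to the low 32 bits yields the minor hash rather than the value the 32-bit API would have returned, which is why truncating cannot stand in for the kernel's compat behaviour.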

Comment 25 Florian Weimer 2018-12-11 21:16:09 UTC

(In reply to joseph@codesourcery.com from comment #22)
> As d_off is an opaque value, can't __getdents just truncate without 
> producing an EOVERFLOW error?  As __getdents is a purely internal 
> function, its interface could be changed to return a truncation indication 
> or indeed to provide the full d_off value somehow - such truncation 
> indication only being relevant if telldir is used (which has a return type 
> of long int and no corresponding LFS version).

That pretty much breaks seekdir/telldir on top of getdents64 for 32-bit systems unless the compat kludge described in comment 24 is active.  It's rather annoying that LFS does not solve this.

If we want to get truly ambitious we could allocate our own file offsets in telldir, so that they will fit into the 31 bits we have, keep the lookup tables in the directory stream, and deallocate them on a call to closedir.  But that would still be a waste on file systems which do the right thing internally (such as XFS).  Maybe we could avoid doing this if the offsets in the result buffer of getdents64 are monotonically increasing and sufficiently small, but it's a lot of work for a fringe use case of a fringe feature.
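A rough sketch of what such a per-stream lookup table could look like. All names here (struct cookie_table, cookie_alloc, cookie_lookup, cookie_free) are hypothetical; glibc's DIR has no such member, and a real implementation would also need locking and a smarter lookup than a linear scan.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table kept in the directory stream.  */
struct cookie_table
{
  int64_t *offsets;   /* offsets[i] is the kernel d_off for cookie i.  */
  size_t used;
  size_t allocated;
};

/* telldir side: return a small cookie for OFF, or -1 on failure.  */
static long int
cookie_alloc (struct cookie_table *t, int64_t off)
{
  assert (t != NULL);
  /* Reuse an existing cookie if this offset was already handed out.  */
  for (size_t i = 0; i < t->used; i++)
    if (t->offsets[i] == off)
      return (long int) i;
  /* Cookies must fit into the 31 bits available in a 32-bit long.  */
  if (t->used >= (size_t) INT32_MAX)
    return -1;
  if (t->used == t->allocated)
    {
      size_t n = t->allocated != 0 ? t->allocated * 2 : 16;
      int64_t *p = realloc (t->offsets, n * sizeof *p);
      if (p == NULL)
        return -1;
      t->offsets = p;
      t->allocated = n;
    }
  t->offsets[t->used] = off;
  t->used++;
  return (long int) (t->used - 1);
}

/* seekdir side: translate COOKIE back to the kernel offset.  */
static int
cookie_lookup (const struct cookie_table *t, long int cookie, int64_t *off)
{
  assert (t != NULL && off != NULL);
  if (cookie < 0 || (size_t) cookie >= t->used)
    return -1;
  *off = t->offsets[cookie];
  return 0;
}

/* closedir side: release the table.  */
static void
cookie_free (struct cookie_table *t)
{
  assert (t != NULL);
  free (t->offsets);
  t->offsets = NULL;
  t->used = t->allocated = 0;
}
```

The table only grows when telldir is actually called, so streams that never use telldir would pay nothing beyond the empty struct.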

It looks to me that qemu-user papers over this by truncating the values.  I'm pretty sure that this will break seekdir on ext4, but apparently that's an improvement over proper error checking.

Comment 26 Sourceware Commits 2018-12-18 21:54:02 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  64dd7a16305441a7d6ed752c192c68b6c2a54ca5 (commit)
       via  8b1d5da56601ba7e59340dda235a6f3dbaa98ec9 (commit)
       via  f9eabb197fce8bab43376758fcb281bf2e4e88e0 (commit)
       via  56b98bf1fb819b357318f39fccf2901d3c6b41ec (commit)
       via  43a45c2d829f164c1fb94d5f44afe326fae946e1 (commit)
      from  646ce7e0be9218f644ab50681b4d5a13d1050dd4 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=64dd7a16305441a7d6ed752c192c68b6c2a54ca5

commit 64dd7a16305441a7d6ed752c192c68b6c2a54ca5
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Dec 11 17:16:36 2018 -0200

    s390: Use generic kernel_sigaction.h
    
    S390 kernel sigaction is the same as the Linux generic one.
    
    Checked with a s390-linux-gnu and s390x-linux-gnu build.
    
    	* sysdeps/unix/sysv/linux/s390/kernel_sigaction.h: Use Linux generic
    	kernel_sigaction definition.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8b1d5da56601ba7e59340dda235a6f3dbaa98ec9

commit 8b1d5da56601ba7e59340dda235a6f3dbaa98ec9
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Dec 11 17:15:07 2018 -0200

    ia64: Remove kernel_sigaction.h
    
    IA64 kernel_sigaction.h definition is the same as the Linux generic
    one.
    
    Checked on ia64-linux-gnu.
    
    	* sysdeps/unix/sysv/linux/ia64/kernel_sigaction.h: Remove file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f9eabb197fce8bab43376758fcb281bf2e4e88e0

commit f9eabb197fce8bab43376758fcb281bf2e4e88e0
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Dec 11 17:14:09 2018 -0200

    hppa: Remove kernel_sigaction.h
    
    HPPA kernel_sigaction.h definition is the same as the Linux generic
    one and old_kernel_sigaction is not used.
    
    Checked on hppa-linux-gnu.
    
    	* sysdeps/unix/sysv/linux/hppa/kernel_sigaction.h: Remove file.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=56b98bf1fb819b357318f39fccf2901d3c6b41ec

commit 56b98bf1fb819b357318f39fccf2901d3c6b41ec
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Dec 11 16:57:49 2018 -0200

    alpha: Use Linux generic sigaction implementation
    
    Alpha rt_sigaction syscall uses a slightly different kernel ABI than
    the generic one:
    
    arch/alpha/kernel/signal.c
    
    SYSCALL_DEFINE5(rt_sigaction, int, sig, const struct sigaction __user *, act,
                    struct sigaction __user *, oact,
                    size_t, sigsetsize, void __user *, restorer)
    
    Similar as sparc, the syscall expects a restorer function.  However
    different than sparc, alpha defines the restorer as the 5th argument
    (sparc defines as the 4th).
    
    This patch removes the arch-specific alpha sigaction implementation,
    adapt the Linux generic one to different restore placements (through
    STUB macro), and make alpha use the Linux generic kernel_sigaction
    definition.
    
    Checked on alpha-linux-gnu and x86_64-linux-gnu (for sanity).
    
    	* sysdeps/unix/sysv/linux/alpha/Makefile: Update comment about
    	__syscall_rt_sigaction.
    	* sysdeps/unix/sysv/linux/alpha/kernel_sigaction.h
    	(kernel_sigaction): Use Linux generic definition.
    	(STUB): Define.
    	(__syscall_rt_sigreturn, __syscall_sigreturn): Add prototype.
    	* sysdeps/unix/sysv/linux/alpha/rt_sigaction.S
    	(__syscall_rt_sigaction): Remove implementation.
    	(__syscall_sigreturn, __syscall_rt_sigreturn): Define as global and
    	hidden.
    	* sysdeps/unix/sysv/linux/alpha/sigaction.c: Remove file.
    	* sysdeps/unix/sysv/linux/alpha/sysdep.h (INLINE_SYSCALL,
    	INTERNAL_SYSCALL): Remove definitions.
    	* sysdeps/unix/sysv/linux/sigaction.c: Define STUB to accept both the
    	action and signal set size.
    	* sysdeps/unix/sysv/linux/sparc/sparc32/sigaction.c (STUB): Redefine.
    	* sysdeps/unix/sysv/linux/sparc/sparc64/sigaction.c (STUB): Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=43a45c2d829f164c1fb94d5f44afe326fae946e1

commit 43a45c2d829f164c1fb94d5f44afe326fae946e1
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Dec 11 16:52:47 2018 -0200

    m68k: Fix sigaction kernel definition (BZ #23960)
    
    Commit b4a5d26d883 (linux: Consolidate sigaction implementation) added
    a wrong kernel_sigaction definition for m68k, meant for __NR_sigaction
    instead of __NR_rt_sigaction as used on generic Linux sigaction
    implementation.  This patch fixes it by using the Linux generic
    definition meant for the RT kernel ABI.
    
    Checked the signal tests on emulated m68k-linux-gnu (Aranym).  It fixes
    the faulty signal/tst-sigaction and man works as expected.
    
    	Adhemerval Zanella  <adhemerval.zanella@linaro.org>
    	James Clarke  <jrtc27@jrtc27.com>
    
    	[BZ #23960]
    	* sysdeps/unix/sysv/linux/kernel_sigaction.h (HAS_SA_RESTORER):
    	Define if SA_RESTORER is defined.
    	(kernel_sigaction): Define sa_restorer if HAS_SA_RESTORER is defined.
    	(SET_SA_RESTORER, RESET_SA_RESTORER): Define iff the macro are not
    	already defined.
    	* sysdeps/unix/sysv/linux/m68k/kernel_sigaction.h (SA_RESTORER,
    	kernel_sigaction, SET_SA_RESTORER, RESET_SA_RESTORER): Remove
    	definitions.
    	(HAS_SA_RESTORER): Define.
    	* sysdeps/unix/sysv/linux/sparc/kernel_sigaction.h (SA_RESTORER,
    	SET_SA_RESTORER, RESET_SA_RESTORER): Remove definition.
    	(HAS_SA_RESTORER): Define.
    	* sysdeps/unix/sysv/linux/nios2/kernel_sigaction.h: Include generic
    	kernel_sigaction after define SET_SA_RESTORER and RESET_SA_RESTORER.
    	* sysdeps/unix/sysv/linux/powerpc/kernel_sigaction.h: Likewise.
    	* sysdeps/unix/sysv/linux/x86_64/sigaction.c: Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   49 ++++++++++++++++++++
 sysdeps/unix/sysv/linux/alpha/Makefile             |    2 +-
 sysdeps/unix/sysv/linux/alpha/kernel_sigaction.h   |   19 ++++----
 sysdeps/unix/sysv/linux/alpha/rt_sigaction.S       |   41 ++---------------
 sysdeps/unix/sysv/linux/alpha/sigaction.c          |   38 ---------------
 sysdeps/unix/sysv/linux/alpha/sysdep.h             |   23 ---------
 sysdeps/unix/sysv/linux/hppa/kernel_sigaction.h    |   18 -------
 sysdeps/unix/sysv/linux/ia64/kernel_sigaction.h    |    7 ---
 sysdeps/unix/sysv/linux/kernel_sigaction.h         |   12 ++++-
 sysdeps/unix/sysv/linux/m68k/kernel_sigaction.h    |   26 ++---------
 sysdeps/unix/sysv/linux/nios2/kernel_sigaction.h   |    3 +-
 sysdeps/unix/sysv/linux/powerpc/kernel_sigaction.h |    3 +-
 sysdeps/unix/sysv/linux/s390/kernel_sigaction.h    |   29 +----------
 sysdeps/unix/sysv/linux/sh/kernel_sigaction.h      |    3 +-
 sysdeps/unix/sysv/linux/sigaction.c                |    4 +-
 sysdeps/unix/sysv/linux/sparc/kernel_sigaction.h   |    7 +--
 sysdeps/unix/sysv/linux/sparc/sparc32/sigaction.c  |    5 +-
 sysdeps/unix/sysv/linux/sparc/sparc64/sigaction.c  |    5 +-
 sysdeps/unix/sysv/linux/x86_64/sigaction.c         |    3 +-
 19 files changed, 97 insertions(+), 200 deletions(-)
 delete mode 100644 sysdeps/unix/sysv/linux/alpha/sigaction.c
 delete mode 100644 sysdeps/unix/sysv/linux/hppa/kernel_sigaction.h
 delete mode 100644 sysdeps/unix/sysv/linux/ia64/kernel_sigaction.h

Comment 27 Adhemerval Zanella 2018-12-27 12:01:36 UTC

*** Bug 24014 has been marked as a duplicate of this bug. ***

Comment 28 Thomas De Schampheleire 2018-12-27 13:51:02 UTC

In my view, the summary should say 'qemu' rather than 'qemu-user', as this problem is also seen with qemu-system-i386 (see bug 24014).

Comment 29 Dmitry V. Levin 2018-12-28 04:48:50 UTC

(In reply to Florian Weimer from comment #12)
> The 32-bit getdents64 system call on an x86-64 kernel returns a truncated
> d_off value to userspace on ext4, while the 64-bit version of the system
> call uses more bits for d_off, on the same directory.

What if the 32-bit stat64 syscall truncated st_ino and st_size?
Sorry, but this is clearly a bug in the getdents64 syscall that should be fixed in the kernel instead.

Comment 30 Thomas De Schampheleire 2019-01-15 10:53:59 UTC

How can we proceed with this problem?

Even though the actual error may be in the kernel, the resulting behavior changed due to a change in glibc. As a user of CentOS 7 on the host machine, I cannot update the kernel even if a fix lands in later kernel releases.
But I _can_ update glibc in my target toolchain (which is what will be used inside QEMU).

So, from my perspective as a glibc user, I would prefer the implementation in glibc to change back to something that also works on (possibly broken) kernels.

As it stands, glibc 2.28 and higher will not be usable inside Qemu on any host kernel, when using ext4 mounts.

Comment 31 Florian Weimer 2019-01-15 11:13:12 UTC

(In reply to Thomas De Schampheleire from comment #30)
> How can we proceed with this problem?

I raised this on various kernel lists:

https://lore.kernel.org/lkml/20181229015453.GA6310@bombadil.infradead.org/T/

I'm not a kernel developer.  I expect that someone needs to propose a kernel patch.

> As it stands, glibc 2.28 and higher will not be usable inside Qemu on any
> host kernel, when using ext4 mounts.

Most QEMU variants are just fine; this issue affects only the variants that use file system pass-through.

Comment 32 Adhemerval Zanella 2019-01-15 11:27:08 UTC

I am not against reverting to using SYS_getdents for getdents64, although it is a subpar resolution for a kernel issue.  Newer architectures with mixed 32-bit and 64-bit support will continue to be broken without a proper kernel fix, since they use SYS_getdents64 for getdents.

What I think we should do is:

  1. *Deprecate* non-LFS usage in a multi-step way as discussed in libc-alpha [1]. We will need to take care of the issue brought by Joseph, but it will mean eventually the non-LFS interfaces will be just provided as compatibility symbols.

  2. Push distros on 32-bit architectures to *stop* building packages in non-LFS mode by default. Some distros already get this right, but some still seem to lack support.

  3. Continue to push kernel developers to provide a correct fix for this issue. 

[1] https://sourceware.org/ml/libc-alpha/2019-01/msg00114.html
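As an illustration of item 2, a package can opt into LFS and check the result at compile time. This is a sketch assuming a Linux libc that exposes d_off in struct dirent; count_dir_entries is a hypothetical helper, not part of any library.

```c
/* Must be defined before any system header is included.  */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <sys/types.h>

/* With LFS enabled, off_t and d_off are 64 bits wide even on 32-bit
   ABIs, so a 64-bit d_off from the kernel fits without EOVERFLOW.  */
_Static_assert (sizeof (off_t) == 8, "off_t is not 64-bit; build with LFS");
_Static_assert (sizeof (((struct dirent *) 0)->d_off) == 8,
                "d_off is not 64-bit; build with LFS");

/* Hypothetical helper: count the entries of PATH, or return -1 if the
   directory cannot be opened.  A readdir error simply ends the loop in
   this sketch.  */
static long
count_dir_entries (const char *path)
{
  assert (path != NULL);
  DIR *dir = opendir (path);
  if (dir == NULL)
    return -1;
  long count = 0;
  while (readdir (dir) != NULL)
    count++;
  closedir (dir);
  return count;
}
```

Passing -D_FILE_OFFSET_BITS=64 on the compiler command line has the same effect as the #define and avoids ordering problems with headers.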

Comment 33 Sourceware Commits 2019-01-31 17:12:36 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The annotated tag, glibc-2.29 has been created
at e7c9e41bb2407b0150997b382b49a5f3bb579bf9 (tag)
tagging 56c86f5dd516284558e106d04b92875d5b623b7a (commit)
replaces glibc-2.28.9000
tagged by Siddhesh Poyarekar
on Thu Jan 31 22:24:07 2019 +0530

- Log -----------------------------------------------------------------
The GNU C Library
=================

The GNU C Library version 2.29 is now available.

The GNU C Library is used as *the* C library in the GNU system and
in GNU/Linux systems, as well as many other systems that use Linux
as the kernel.

The GNU C Library is primarily designed to be a portable
and high performance C library. It follows all relevant
standards including ISO C11 and POSIX.1-2008. It is also
internationalized and has one of the most complete
internationalization interfaces known.

The GNU C Library webpage is at http://www.gnu.org/software/libc/

Packages for the 2.29 release may be downloaded from:
http://ftpmirror.gnu.org/libc/
http://ftp.gnu.org/gnu/libc/

The mirror list is at http://www.gnu.org/order/ftp.html

NEWS for version 2.29
====================

* The getcpu wrapper function has been added, which returns the currently
used CPU and NUMA node. This function is Linux-specific.

* A new convenience target has been added for distribution maintainers
to build and install all locales as directories with files. The new
target is run by issuing the following command in your build tree:
'make localedata/install-locale-files', with an optional DESTDIR
to set the install root if you wish to install into a non-default
configured location.

* Optimized generic exp, exp2, log, log2, pow, sinf, cosf, sincosf and tanf.

* The reallocarray function is now declared under _DEFAULT_SOURCE, not just
for _GNU_SOURCE, to match BSD environments.

* For powerpc64le ABI, Transactional Lock Elision is now enabled iff kernel
indicates that it will abort the transaction prior to entering the kernel
(PPC_FEATURE2_HTM_NOSC on hwcap2). On older kernels the transaction is
suspended, and this caused some undefined side-effects issues by aborting
transactions manually. Glibc avoided it by aborting transactions manually on
each syscall, but it led to performance issues on newer kernels where the
HTM state is saved and restored lazily (the state being saved even when the
process actually does not use HTM).

* The functions posix_spawn_file_actions_addchdir_np and
posix_spawn_file_actions_addfchdir_np have been added, enabling
posix_spawn and posix_spawnp to run the new process in a different
directory. These functions are GNU extensions. The function
posix_spawn_file_actions_addchdir_np is similar to the Solaris function
of the same name.

* The popen and system do not run atfork handlers anymore (BZ#17490).
Although it is a possible POSIX violation, the POSIX rationale in
pthread_atfork documentation regarding atfork handlers is to handle
inconsistent mutex state after a fork call in a multi-threaded process.
In both popen and system there is no direct access to user-defined mutexes.

* Support for the C-SKY ABIV2 running on Linux has been added. This port
requires at least binutils-2.32, gcc-9.0, and linux-4.20. Two ABIs are
supported:
- C-SKY ABIV2 soft-float little-endian
- C-SKY ABIV2 hard-float little-endian

* strftime's default formatting of a locale's alternative year (%Ey)
has been changed to zero-pad the year to a minimum of two digits,
like "%y". This improves the display of Japanese era years during
the first nine years of a new era, and is expected to be harmless
for all other locales (only Japanese locales regularly have
alternative year numbers less than 10). Zero-padding can be
overridden with the '_' or '-' flags (which are GNU extensions).

* As a GNU extension, the '_' and '-' flags can now be applied to
"%EY" to control how the year number is formatted; they have the
same effect that they would on "%Ey".
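For the reallocarray item in the list above, a short sketch of why the function is useful. grow_ints is a hypothetical helper; defining _DEFAULT_SOURCE explicitly is only needed when compiling in a strict standard mode.

```c
/* Must be defined before any system header is included.  */
#define _DEFAULT_SOURCE

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize an int array to COUNT elements.
   reallocarray fails with ENOMEM if COUNT * sizeof (int) would
   overflow, instead of silently allocating a short buffer.  */
static int *
grow_ints (int *old, size_t count)
{
  assert (count != 0);
  return reallocarray (old, count, sizeof (int));
}
```

On failure the old block is left untouched, as with realloc, so the caller still owns it.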

Deprecated and removed features, and other changes affecting compatibility:

* The glibc.tune tunable namespace has been renamed to glibc.cpu and the
tunable glibc.tune.cpu has been renamed to glibc.cpu.name.

* The type of the pr_uid and pr_gid members of struct elf_prpsinfo, defined
in <sys/procfs.h>, has been corrected to match the type actually used by
the Linux kernel. This affects the size and layout of that structure on
MicroBlaze, MIPS (n64 ABI only), Nios II and RISC-V.

* For the MIPS n32 ABI, the type of the pr_sigpend and pr_sighold members of
struct elf_prstatus, and the pr_flag member of struct elf_prpsinfo,
defined in <sys/procfs.h>, has been corrected to match the type actually
used by the Linux kernel. This affects the size and layout of those
structures.

* An archaic GNU extension to scanf, under which '%as', '%aS', and '%a[...]'
meant to scan a string and allocate space for it with malloc, is now
restricted to programs compiled in C89 or C++98 mode with _GNU_SOURCE
defined. This extension conflicts with C99's use of '%a' to scan a
hexadecimal floating-point number, which is now available to programs
compiled as C99 or C++11 or higher, regardless of _GNU_SOURCE.

POSIX.1-2008 includes the feature of allocating a buffer for string input
with malloc, using the modifier letter 'm' instead. Programs using
'%as', '%aS', or '%a[...]' with the old GNU meaning should change to
'%ms', '%mS', or '%m[...]' respectively. Programs that wish to use the
C99 '%a' no longer need to avoid _GNU_SOURCE.

GCC's -Wformat warnings can detect most uses of this extension, as long
as all functions that call vscanf, vfscanf, or vsscanf are annotated with
__attribute__ ((format (scanf, ...))).
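A small sketch of the POSIX.1-2008 'm' modifier that replaces the old '%as' extension. first_word is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the first
   whitespace-delimited word of INPUT, or NULL if there is none.
   The caller must free the result.  */
static char *
first_word (const char *input)
{
  assert (input != NULL);
  char *word = NULL;
  /* With 'm', sscanf allocates the buffer and stores its address in
     the char * that the argument points to.  */
  if (sscanf (input, "%ms", &word) != 1)
    return NULL;
  return word;
}
```

Code that previously used '%as' for this would pass the same char ** argument; only the conversion specifier changes.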

Changes to build and runtime requirements:

* Python 3.4 or later is required to build the GNU C Library.

* On most architectures, GCC 5 or later is required to build the GNU C
Library. (On powerpc64le, GCC 6.2 or later is still required, as before.)

Older GCC versions and non-GNU compilers are still supported when
compiling programs that use the GNU C Library.

Security related changes:

CVE-2018-19591: A file descriptor leak in if_nametoindex can lead to a
denial of service due to resource exhaustion when processing getaddrinfo
calls with crafted host names. Reported by Guido Vranken.

CVE-2019-6488: On x32, the size_t parameter may be passed in the lower
32 bits of a 64-bit register with non-zero upper 32 bits. When it
happened, accessing the 32-bit size_t value as the full 64-bit register
in the assembly string/memory functions would cause a buffer overflow.
Reported by H.J. Lu.

CVE-2016-10739: The getaddrinfo function could successfully parse IPv4
addresses with arbitrary trailing characters, potentially leading to data
or command injection issues in applications.

Release Notes
=============

https://sourceware.org/glibc/wiki/Release/2.29

Contributors
============

This release was made possible by the contributions of many people.
The maintainers are grateful to everyone who has contributed
changes or bug reports. These include:

Adhemerval Zanella
Albert ARIBAUD (3ADEV)
Alexandra Hájková
Andreas K. Hüttel
Andreas Schwab
Anton Youdkevitch
Arjun Shankar
Assaf Gordon
Aurelien Jarno
Carlos O'Donell
Charles-Antoine Couret
DJ Delorie
Darius Rad
David S. Miller
Dmitry V. Levin
Florian Weimer
Fredrik Noring
Gabriel F. T. Gomes
H.J. Lu
Ilya Leoshkevich
Ilya Yu. Malakhov
Istvan Kurucsai
Jim Wilson
Joseph Myers
Justus Winter
Kemi Wang
Leonardo Sandoval
Mao Han
Martin Jansa
Martin Kuchta
Martin Sebor
Mingli Yu
Moritz Eckert
PanderMusubi
Paul Clarke
Paul Eggert
Paul Pluzhnikov
Pochang Chen
Rafael Avila de Espindola
Rafael Ávila de Espíndola
Rafal Luzynski
Rajalakshmi Srinivasaraghavan
Rogerio Alves
Samuel Thibault
Sergi Almacellas Abellana
Siddhesh Poyarekar
Stefan Liebler
Steve Ellcey
Szabolcs Nagy
TAMUKI Shoichi
Tobias Klauser
Tulio Magno Quites Machado Filho
Uroš Bizjak
Wilco Dijkstra
Zack Weinberg
Zong Li
-----BEGIN PGP SIGNATURE-----

iQEcBAABAgAGBQJcUyg2AAoJEHnEPfvxzyGHauAIAJmbTi6IHhY18D0NwFH002a/
Z/4L4jTd9/I8kaR+qYMGDi1tO+cTWtxO3jdlIU7/1VRdnL1h+HnlYTJlc64DVP9t
3W4lhSJRbK8HWlV0emmNHnBCgV6SxOMaMPN286WKLDTYI3OrOs16qkKneDqhWEoG
BS1rvxdkd27hOds3CY4xsgCFgeyo/aS+sqV2nMNdcpGBb1ZLNET3O3AkP155BwOF
utMl2xbQ5Ue17mOrw1TiOUJqvvf6FhNHFLT1dgBmgAVP+sXhjgL00co4sHh5xu5x
vJ1ju3KgzIYtxbiAIUTppia/nRFX4K8z+VL7f4aDeUm6cxuikECcpCVgH7if4gc=
=Fcnu
-----END PGP SIGNATURE-----

Adhemerval Zanella (43):
powerpc: Only enable TLE with PPC_FEATURE2_HTM_NOSC
Use libsupport for tst-spawn.c
Fix ifunc support with DT_TEXTREL segments (BZ#20480)
Fix misreported errno on preadv2/pwritev2 (BZ#23579)
libio: Flush stream at freopen (BZ#21037)
Fix build from commit 0b727ed
x86: Fix Haswell strong flags (BZ#23709)
Fix tst-preadvwritev2 build failure on HURD
posix: Add internal symbols for posix_spawn interface
support: Fix printf format for TEST_COMPARE_STRING
posix: Use posix_spawn on popen
posix: Use posix_spawn on system
Fix ChangeLog date from previous commit
posix: Fix segfault in maybe_script_execute
m68k: Fix sigaction kernel definition (BZ #23960)
alpha: Use Linux generic sigaction implementation
hppa: Remove kernel_sigaction.h
ia64: Remove kernel_sigaction.h
s390: Use generic kernel_sigaction.h
Fix BZ number for 43a45c2d82
Replace check_mul_overflow_size_t with __builtin_mul_overflow
termios: Define TIOCSER_TEMT with __USE_MISC (BZ#17783)
termios: Consolidate struct termios
termios: Consolidate termios c_cc symbolic constants
termios: Consolidate Input Modes definitions.
termios: Consolidate Output Modes definitions
termios: Consolidate Baud Rate Selection definitions (BZ#23783)
termios: Consolidate control mode definitions
termios: Consolidate local mode definitions
termios: Consolidate tcflow symbolic constants
termios: Remove Linux _IOT_termios
termios: Add powerpc termios-misc
termios: Consolidate termios.h
posix: Clear close-on-exec for posix_spawn adddup2 (BZ#23640)
nptl: Remove tst-cancel-wrappers test and related macros
nptl: Fix testcases for new pthread cancellation mechanism
x86_64: Remove wrong THREAD_ATOMIC_* macros
i386: Remove bogus THREAD_ATOMIC_* macros
nptl: Cleanup cancellation macros
posix: Fix tst-spawn.c issue from commit 805334b26c
elf: Fix LD_AUDIT for modules with invalid version (BZ#24122)
hurd: Fix libsupport xsigstack build
[elf] Revert 8e889c5da3 (BZ#24122)

Albert ARIBAUD (3ADEV) (12):
Y2038: provide size of default time_t for target architecture
Fix date typo in ChangeLog
Y2038: Add 64-bit time for all architectures
Y2038: make __tz_convert compatible with 64-bit-time
Y2038: add function __localtime64
Fix __TIMERSIZE and @theglibcadj typos
Y2038: add function __localtime64_r
Y2038: add function __gmtime64
Y2038: add function __gmtime64_r
Y2038: add function __ctime64
Y2038: add function __ctime64_r
Y2038: make __difftime compatible with 64-bit time

Alexandra Hájková (1):
Add an additional test to resolv/tst-resolv-network.c

Andreas K. Hüttel (1):
resolv: IDNA tests: AAAA (28) is valid, no fallthrough to default

Andreas Schwab (16):
RISC-V: Don't use ps_get_thread_area in libthread_db (bug 23402)
Don't build libnsl for new ABIs
Remove leading space from testrun.sh
Add missing unwind information to ld.so on powerpc32 (bug 23707)
Fix stack overflow in tst-setcontext9 (bug 23717)
Don't reduce test timeout to less than default
Don't use PSEUDO_END for non-PSEUDO function
Add more checks for valid ld.so.cache file (bug 18093)
RISC-V: properly terminate call chain (bug 23125)
libanl: properly cleanup if first helper thread creation failed (bug 22927)
RISC-V: don't assume PI mutexes and robust futexes before 4.20 (bug 23864)
Move *-le.abilist to le/*.abilist
Remove support for abilist-pattern
Reindent nptl/pthread_rwlock_common.c
Fix rwlock stall with PREFER_WRITER_NONRECURSIVE_NP (bug 23861)
nscd: avoid assertion failure during persistent db check

Anton Youdkevitch (1):
aarch64: optimized memcpy implementation for thunderx2

Arjun Shankar (3):
Clean up iconv/gconv_int.h for unnecessary declarations
Remove unnecessary locking when reading iconv configuration [BZ #22062]
Unconditionally call __gconv_get_path when reading iconv configuration

Assaf Gordon (1):
regex: fix heap-use-after-free error

Aurelien Jarno (4):
Update Alpha libm-test-ulps
ARM: fix kernel assisted atomics with GCC 8 (bug 24034)
en_US: define date_fmt (bug 24046)
Only build libm with -fno-math-errno (bug 24024)

Carlos O'Donell (11):
Add version.h, and NEWS update to ChangeLog.
Add convenience target 'install-locale-files'.
Fix ChangeLog date.
Update be translations.
Update be translations.
Update translations for be.
Fix test failure with -DNDEBUG.
Fix tst-setcontext9 for optimized small stacks.
abilist.awk: Treat .tdata like .tbss and reject unknown combinations.
Add --no-hard-links option to localedef (bug 23923)
x86: Add Hygon Dhyana support.

Charles-Antoine Couret (1):
argp: do not call _IO_fwide() if _LIBC is not defined

DJ Delorie (10):
RISC-V: Fix rounding save/restore bug.
Regen RISC-V rvd ULPs
Improve ChangeLog message.
Add test-in-container infrastructure.
Fix IA64 links-dso-program link.
links-dso-program: Fix build-programs=no build case.
malloc: tcache double free check
test-container: add "su" command to run test as root, add unshare hints
malloc: Add another test for tcache double free check.
test-container: move postclean outside of namespace changes

Darius Rad (1):
RISC-V: Update nofpu ULPs

David S. Miller (2):
Regenerate sparc ulps.
Add VDSO support to sparc.

Dmitry V. Levin (1):
Fix a few typos in comments

Florian Weimer (61):
Linux: Rewrite __old_getdents64 [BZ #23497]
mbstowcs: Remove outdated comment
error, error_at_line: Add missing va_end calls
nscd: Deallocate existing user names in file parser
nss_files: Fix file stream leak in aliases lookup [BZ #23521]
error, warn, warnx: Use __fxprintf for wide printing [BZ #23519]
Fix attribution of previous change in ChangeLog
Makeconfig (ASFLAGS): Always append required assembler flags
Add --with-nonshared-cflags option to configure
math: Regenerate s390 ulps
malloc: Add ChangeLog for accidentally committed change
__readlink_chk: Assume HAVE_INLINED_SYSCALLS
__readlink_chk: Remove micro-optimization
Makeconfig: Do not sort and deduplicate +cflags [BZ # 17248]
Avoid running some tests if the file system does not support holes
nscd: Fix use-after-free in addgetnetgrentX [BZ #23520]
test-container: EPERM from unshare is UNSUPPORTED
regex: Add test tst-regcomp-truncated [BZ #23578]
reallocarray: Declare under _DEFAULT_SOURCE
misc: New test misc/tst-gethostid
resource: Update struct rusage comments [BZ #23689]
time/tst-mktime2: Improve test error reporting
conform: XFAIL siginfo_t si_band test on sparc64
stdlib/test-bz22786: Avoid spurious test failures using alias mappings
stdlib/tst-strtod-overflow: Switch to support_blob_repeat
support_blob_repeat: Call mkstemp directory for the backing file
stdlib/test-bz22786: Avoid memory leaks in the test itself
support/test-container.c: Include <libc-pointer-arith.h>
support/shell-container.c: Use support_copy_file_range
posix: New function posix_spawn_file_actions_addchdir_np [BZ #17405]
support: Implement TEST_COMPARE_STRING
malloc: Convert the unlink macro to the unlink_chunk function
malloc: Use current (C11-style) atomics for fastbin access
support: Print timestamps in timeout handler
malloc: tcache: Validate tc_idx before checking for double-frees [BZ #23907]
CVE-2018-19591: if_nametoindex: Fix descriptor for overlong name [BZ #23927]
support: Implement support_quote_string
support_quote_string: Do not use str parameter name
support: Add signal support to support_capture_subprocess_check
posix: Do not include testcases.h, ptestcases.h in source tree
scripts/abilist.awk: Handle special _end symbol for Hurd
support: Close original descriptors in support_capture_subprocess
support: Implement <support/descriptors.h> to track file descriptors
inet/tst-if_index-long: New test case for CVE-2018-19591 [BZ #23927]
posix: New function posix_spawn_file_actions_addfchdir_np [BZ #17405]
compat getdents64: Use correct offset for retry [BZ #23972]
timespec_get (posix): Fix copyright header
manual: Document thread/task IDs for Linux
support: Do not require overflow builtin in support/blob_repeat.c
localedata: Remove executable bit from localedata/locales/bi_VU [BZ #23995]
locale: Rewrite locale/gen-translit.pl in Python
malloc: Always call memcpy in _int_realloc [BZ #24027]
nptl/tst-audit-threads: Switch to <support/test-driver.c>
intl: Do not return NULL on asprintf failure in gettext [BZ #24018]
Fix ChangeLog entry
Linux: Improve handling of resource limits in misc/tst-ttyname
manual: Use @code{errno} instead of @var{errno} [BZ #24063]
malloc: Revert fastbins to old-style atomics
resolv: Reformat inet_addr, inet_aton to GNU style
resolv: Do not send queries for non-host-names in nss_dns [BZ #24112]
CVE-2016-10739: getaddrinfo: Fully parse IPv4 address strings [BZ #20018]

Fredrik Noring (1):
MIPS: Use `.set mips2' to emulate LL/SC for the R5900 too

Gabriel F. T. Gomes (11):
Fix typo in the documentation of gcvt
Add tests for argp_error and argp_failure with floating-point parameters
Add test for warn, warnx, vwarn, and vwarnx with floating-point parameters
Add tests with floating-point arguments for err* and verr* functions
Use TEST_COMPARE_STRING in recently added test
Convert tst-efgcvt to the new test framework
Prepare vfscanf to use __strtof128_internal
Remove redirection of _IO_vfprintf
Add *-ldbl.h headers to include/bits
Add tests for the long double version of ecvt and fcvt
Set behavior of sprintf-like functions with overlapping source and destination

H.J. Lu (34):
x86: Rename get_common_indeces to get_common_indices
x86: Cleanup cpu-features-offsets.sym
x86: Don't include <init-arch.h> in assembly codes
x86: Move STATE_SAVE_OFFSET/STATE_SAVE_MASK to sysdep.h
test-container: Use xcopy_file_range for cross-device copy [BZ #23597]
i386: Use ENTRY and END in start.S [BZ #23606]
i386: Use _dl_runtime_[resolve|profile]_shstk for SHSTK [BZ #23716]
x86: Use RTM intrinsics in pthread mutex lock elision
x86: Use _rdtsc intrinsic for HP_TIMING_NOW
x86: Don't include <x86intrin.h>
x86: Support RDTSCP for benchtests
Check multiple NT_GNU_PROPERTY_TYPE_0 notes [BZ #23509]
x86/CET: Add a re-exec test with legacy bitmap
_dl_exception_create_format: Support %x/%lx/%zx
elf/dl-exception.c: Include <_itoa.h> for _itoa prototype
x86: Extend CPUID support in struct cpu_features
Add getcpu
Don't use __typeof__ (getcpu)
x86: Merge i386/x86_64 atomic-machine.h
manual/examples: Remove redundant "if not"
x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c
Regenerate sysdeps/x86_64/fpu/libm-test-ulps
x86-64: Remove s_sincosf-sse2.S
riscv: Use __has_include__ to include <asm/syscalls.h> [BZ #24022]
soft-fp: Properly check _FP_W_TYPE_SIZE [BZ #24066]
Disable lazy binding on tests for minimal signal handler
x86-64 memchr/wmemchr: Properly handle the length parameter [BZ# 24097]
x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ# 24097]
x86-64 memcpy: Properly handle the length parameter [BZ# 24097]
x86-64 memrchr: Properly handle the length parameter [BZ# 24097]
x86-64 memset/wmemset: Properly handle the length parameter [BZ# 24097]
x86-64 strncmp family: Properly handle the length parameter [BZ# 24097]
x86-64 strncpy: Properly handle the length parameter [BZ# 24097]
x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ# 24097]

Ilya Leoshkevich (12):
S390: Use symbolic offsets for stack variables in 32-bit _dl_runtime_resolve
S390: Use symbolic offsets for stack variables in 32-bit _dl_runtime_profile
S390: Use symbolic offsets for stack variables in 64-bit _dl_runtime_resolve
S390: Use symbolic offsets for stack variables in 64-bit _dl_runtime_profile
S390: Do not clobber R0 in 32-bit _dl_runtime_resolve
S390: Do not clobber R0 in 32-bit _dl_runtime_profile
S390: Do not clobber R0 in 64-bit _dl_runtime_resolve
S390: Do not clobber R0 in 64-bit _dl_runtime_profile
S390: Test that lazy binding does not clobber R0
Move __fentry__ version definition to sysdeps/{i386,x86_64}
S390: Implement 64-bit __fentry__
S390: Fix unwind in 32-bit _mcount

Ilya Yu. Malakhov (1):
signal: Use correct type for si_band in siginfo_t [BZ #23562]

Istvan Kurucsai (3):
malloc: Additional checks for unsorted bin integrity I.
malloc: Add more integrity checks to mremap_chunk.
malloc: Check the alignment of mmapped chunks before unmapping.

Jim Wilson (1):
RISC-V: Update LP64D libm-test-ulps.

Joseph Myers (123):
Move SNAN_TESTS_TYPE_CAST out of math-tests.h.
Move SNAN_TESTS_PRESERVE_PAYLOAD out of math-tests.h.
Fix math/test-misc.c for undefined fenv.h macros.
Do not define various fenv.h macros for MIPS soft-float (bug 23479).
Consistently terminate libm-test-*.inc TEST lines with commas.
Move comment from libm-test-nextdown.inc to libm-test-nexttoward.inc.
Replace gen-libm-test.pl with gen-libm-test.py.
Move SNAN_TESTS_* out of math-tests.h.
Use Linux 4.18 in build-many-glibcs.py.
Update install.texi documentation of uses of Perl and Python.
Update syscall-names.list for Linux 4.18.
Add NT_VMCOREDD, AT_MINSIGSTKSZ from Linux 4.18 to elf.h.
Update struct signalfd_siginfo from Linux 4.18.
Update netinet/tcp.h from Linux 4.18.
Move ROUNDING_TESTS_* out of math-tests.h.
Don't redefine ROUNDING_TESTS_* in math/test-*-vlen*.h.
Move EXCEPTION_TESTS_* out of math-tests.h
Move EXCEPTION_ENABLE_SUPPORTED out of math-tests.h.
Update netinet/udp.h from Linux 4.18.
Move EXCEPTION_SET_FORCES_TRAP out of math-tests.h.
Split fenv_private.h out of math_private.h more consistently.
Make gen-libm-test.py treat plus_oflow and minus_oflow as non-finite.
Replace conform/list-header-symbols.pl with a Python script.
Do not include fenv_private.h in math_private.h.
Move fenv.h soft-float inlines from fenv_private.h to include/fenv.h.
Move float128 inlines from sysdeps/generic/math_private.h to include/math.h.
Remove alpha math_private.h.
Add build-many-glibcs.py --enable-obsolete-* configs.
Add build-many-glibcs.py support for building more GCC libraries.
Remove x86_64 math_private.h asms.
Include most of elf/ modules-names in modules-names-tests.
Use floor functions not __floor functions in glibc libm.
Use rint functions not __rint functions in glibc libm.
Fix sys/procfs.h pr_uid, pr_gid type (bug 23649).
Fix MIPS n32 pr_sigpend, pr_sighold, pr_flag type (bug 23656).
Update siginfo constants from Linux kernel (bug 21286).
Use ceil functions not __ceil functions in glibc libm.
Fix ldbl-128ibm ceill, floorl inlining of ceil, floor.
Unify many bits/mman.h headers.
Invert sense of list of i686-class processors in sysdeps/x86/cpu-features.h.
Use trunc functions not __trunc functions in glibc libm.
Unify some sys/procfs.h headers.
Unify more sys/procfs.h headers.
Complete sys/procfs.h unification.
Share MAP_* flags between more architectures.
Use round functions not __round functions in glibc libm.
Use copysign functions not __copysign functions in glibc libm.
Remove unnecessary math_private.h includes.
Move MREMAP_* to bits/mman-shared.h.
Add more fma tests.
Fix libnldbl_nonshared.a references to internal libm symbols (bug 23735).
Use bits/mman-linux.h for hppa.
Use common bits/msq.h for more architectures.
Use common bits/sem.h for more architectures.
Use common bits/shm.h for more architectures.
Use single bits/msq.h for all architectures.
Use single bits/sem.h for all architectures.
Move SHMLBA to its own header.
Use single bits/shm.h for all architectures.
Do not allow divide-by-zero exception for pow(+/- 0, -Inf).
Handle surrogate pairs in c16rtomb (bug 23794, DR#488, C2X).
Stop c32rtomb and mbrtoc32 aliasing wcrtomb and mbrtowc (bug 23793).
Use Linux 4.19 in build-many-glibcs.py.
Update kernel version in syscall-names.list to 4.19.
Use gen-libm-test.py to generate ulps table for manual.
Add new ELF note types from Linux 4.19 to elf.h.
Add IN_MASK_CREATE from Linux 4.19 to sys/inotify.h.
Remove pre-Python-3.4 compatibility from build-many-glibcs.py.
Patch to require Python 3.4 or later to build glibc.
Use tempfile.TemporaryDirectory in conform/glibcconform.py.
Convert linknamespace tests from Perl to Python.
Update and correct SPARC configuration for supported socket syscalls (bug 23848).
Disable -Wformat-overflow= warnings for some printf tests.
Avoid printf ("%s", NULL) in posix/bug-regex22.c.
Correct SH kernel-features.h undefines (bug 23862).
Fix __ASSUME_MLOCK2 for ARM, MicroBlaze (bug 23867).
Remove __ASSUME_SOCKETCALL.
Replace conformtest.pl with conformtest.py.
Update conform/Makefile mkdir commands.
Remove redundant macro definitions from ia64 sfp-machine.h.
Fix i686 build with GCC 9.
Fix armv7 build with GCC 9.
Fix sparc64 build with GCC 9.
Add hidden_tls_def macros, fix powerpc-soft build with GCC 9.
Fix mips build with GCC 9.
Use unique identifiers in conformtest.
Separate conformtest subtest generation and execution.
Combine more conformtest tests into single execution of the compiler.
Fix Arm __ASSUME_COPY_FILE_RANGE (bug 23915).
Touch more glibc source files in build-many-glibcs.py.
Fix Hurd build with read-only source directory.
Do not copy glibc sources in build-many-glibcs.py.
Replace gen-as-const.awk by gen-as-const.py.
Make gen-as-const.py handle '--' consistently with awk script.
Stop test-in-container trying to run other-OS binaries.
Update miscellaneous files from upstream sources.
Update timezone code from tzcode 2018g.
Move tst-signal-numbers to Python.
Use gen-as-const.py to process .pysym files.
Remove x86 mathinline.h hypot inline.
Do not clobber sp in _hurd_stack_setup.
Remove x86 mathinline.h asinh, acosh, atanh inlines.
Add test that MAP_* constants agree with kernel.
Do not clobber r12 for ia64 syscalls.
Remove __ASSUME_ST_INO_64_BIT.
Remove x86 mathinline.h sinh, cosh, tanh inlines.
Remove x86 mathinline.h.
Require GCC 5 or later to build glibc (bug 23993).
Update longlong.h.
Update nios2, sparc32 localplt.data for difftime changes (bug 24023).
Use Linux 4.20 in build-many-glibcs.py.
Update timezone code from tzcode 2018i.
Update copyright dates with scripts/update-copyrights.
Update copyright dates not handled by scripts/update-copyrights.
Update miscellaneous files from upstream sources.
Update syscall-names.list for Linux 4.20.
Add HWCAP_SSBS from Linux 4.20 to AArch64 bits/hwcap.h.
Add PACKET_IGNORE_OUTGOING from Linux 4.20 to netpacket/packet.h.
Add IPV6_MULTICAST_ALL from Linux 4.20 to bits/in.h.
Update MIPS libm-test-ulps.
Update Linux kernel version in tst-mman-consts.py.
Update powerpc-nofpu libm-test-ulps.
Use binutils 2.32 branch in build-many-glibcs.py.

Justus Winter (1):
hurd: Handle "pid" magical lookup retry

Kemi Wang (1):
Mutex: Add pthread mutex tunables

Leonardo Sandoval (5):
benchtests: Set float type on --threshold argument
benchtests: keep comparing even if function timings do not match
benchtests: include --stats parameter
benchtests: send non-consumable data to stderr
x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2

Mao Han (4):
Update config.guess and config.sub to current versions.
C-SKY: Add dynamic relocations to elf.h
Add statx conditionals for wordsize-32 *xstat.c
Add C-SKY port

Martin Jansa (2):
sysdeps/ieee754/soft-fp: ignore maybe-uninitialized with -O [BZ #19444]
sysdeps/ieee754: prevent maybe-uninitialized errors with -O [BZ #19444]

Martin Kuchta (1):
pthread_cond_broadcast: Fix waiters-after-spinning case [BZ #23538]

Martin Sebor (1):
Add support for GCC 9 attribute copy.

Mingli Yu (1):
Linux gethostid: Check for NULL value from gethostbyname_r [BZ #23679]

Moritz Eckert (1):
malloc: Mitigate null-byte overflow attacks

PanderMusubi (1):
bs_BA: Fix a small typo in comment (bug 24011).

Paul Clarke (1):
powerpc: Fix tiny bug in strncmp.c

Paul Eggert (21):
regex: fix memory leak in Gnulib
regex: Gnulib unibyte RRI uses bytes not chars
regex: port Gnulib code to z/OS POSIX environment
regex: fix uninitialized memory access
Fix tzfile low-memory assertion failure
Simplify tzfile fstat failure code
Merge mktime, timegm from upstream Gnulib
Fix mktime localtime offset confusion
mktime fix for Gnulib + coreutils
regex: __builtin_expect → __glibc_unlikely
regex: simplify by using intprops.h
mktime: fix EOVERFLOW bug
mktime: new test for mktime failure
mktime: simplify offset guess
mktime: make more room for overflow
mktime: fix bug with Y2038 DST transition
mktime: fix non-EOVERFLOW errno handling
mktime: DEBUG_MKTIME cleanup
regex: fix storage-exhaustion error
regex: simplify Gnulib port
regex: improve Gnulib port to AIX

Paul Pluzhnikov (4):
Fix BZ#23400 (creating temporary files in source tree), and undefined behavior in test.
[BZ #20271] Add newlines in __libc_fatal calls.
stdlib: assert on NULL function pointer in atexit etc. [BZ #20544]
Fix potential stack overflow [BZ #23490]

Pochang Chen (1):
malloc: Verify size of top chunk.

Rafael Avila de Espindola (1):
Simplify an #if #else #endif

Rafael Ávila de Espíndola (6):
Enable VDSO on x86_64 statically linked programs [BZ #19767]
Enable VDSO on powerpc statically linked programs (bug 19767)
Enable VDSO for static linking on aarch64
Enable VDSO on i386 statically linked programs
Enable VDSO for static linking on arm
Enable VDSO for static linking on mips

Rafal Luzynski (12):
ChangeLog: Fix an obvious typo.
en_IN: Set the correct date format for "%x" (bug 17426).
Indian and similar locales: Set the correct date format (bug 17426).
Italian and Swiss locales: Use the correct separators (bug 10797).
it_CH/it_IT locales: Correct some LC_TIME formats (bug 10425).
kl_GL: Fix spelling of Sunday, should be "sapaat" (bug 20209).
kl_GL: Update the month names and date formats (bug 23740).
NEWS: Fix a minor typo ("incosistent" -> "inconsistent").
NEWS: Fix another typo ("multithread..." -> "multi-threaded...").
sq_AL: Use the correct date and time formats (bug 10496, 23724).
Multiple locales: Use the correct 12-hour time formats (bug 10496).
ChangeLog: Fix an obvious typo in the previous commit.

Rajalakshmi Srinivasaraghavan (3):
powerpc: Rearrange little endian specific files
powerpc: Remove powerpc specific sinf and cosf optimization
Speedup first memmem match

Rogerio Alves (2):
powerpc: Fix VSCR position in ucontext (bug 24088)
powerpc: fix tst-ucontext-ppc64-vscr test for POWER 5/6.

Samuel Thibault (36):
hurd: Add missing symbols for proper libc_get/setspecific
hurd: Avoid PLTs for __pthread_get/setspecific
hurd: XFAIL absence of C11 threads implementation
hurd: set interrupt timeout to 1 minute
hurd: Return EIO on non-responding interrupted servers
hurd: Fix race between calling RPC and handling a signal
hurd: Fix cancellation just before RPC call
hurd: Fix race between calling RPC and handling a signal
hurd: return EIEIO instead of EIO
hurd: Document how to translate EIEIO error message
hurd: Fix build
Merge branch 'master' of git://sourceware.org/git/glibc
hurd: Fix errno* generation
Merge branch 'master' into errno
hurd: Add pci RPC stubs
hurd: Support msync
hurd: Fix last-minute refactoring
Hurd: Implement chdir support in posix_spawn
Hurd: Fix ulinks in fd table reallocation
Hurd: export _hurd_port_move
hurd: Document dtable_cloexec size convention.
hurd: Fix spawni's user_link reallocation
hurd: Fix build with GCC 9
hurd: Fix F_*LK* fcntl with __USE_FILE_OFFSET64
hurd: Support lockf at offset 0 with size 0 or 1.
hurd: Fix returning value for fcntl(F_*LK*)
htl: Fix comparing attr with default values
Fix test-as-const-jmp_buf-ssp.c generation on gnu-i386
hurd: Implement support for posix_spawn_file_actions_addfchdir_np
hurd: Fix linknamespace of spawni
hurd: Fix 64bit fcntl lock implementation
hurd: advertise *_setpshared as not supported
hurd: Check at_flags passed to faccessat
hurd: Support AT_EMPTY_PATH
hurd: Fix initial sigaltstack state
hurd: Fix initial sigaltstack state

Sergi Almacellas Abellana (1):
Currency symbol should not preceed amount for [BZ #23791]

Siddhesh Poyarekar (14):
Rename the glibc.tune namespace to glibc.cpu
Add ChangeLog for the last commit
[benchtests] Fix compare_strings.py for python2
benchtests: Clean up the alloc_bufs
[aarch64] Fix value of MIN_PAGE_SIZE for testing
[benchtests] Add mandatory attributes to workload tests
[benchtests] Add workload test properties to schema
[aarch64] Add an ASIMD variant of strlen for falkor
Print strlen benchmark output in json
Reallocate buffers for every run in strlen
Update libc.pot
Update translations
Prepare for 2.29 release
Tag 2.29 release

Stefan Liebler (63):
Test stdlib/test-bz22786 exits now with unsupported if malloc fails.
Fix segfault in maybe_script_execute.
S390: Regenerate ULPs.
Adjust name of ld.so in test-container.c.
Fix race in pthread_mutex_lock while promoting to PTHREAD_MUTEX_ELISION_NP [BZ #23275]
S390: Regenerate ULPs.
Add missing libnss_testX.so requirement for tst-nss-test3.
S390: Add configure check to detect z10 as mininum architecture level set.
S390: Use hwcap instead of dl_hwcap in ifunc-resolvers.
S390: Unify 31/64bit memset.
S390: Refactor memset ifunc handling.
S390: Implement bzero with memset.
S390: Unify 31/64bit memcmp.
S390: Refactor memcmp ifunc handling.
S390: Unify 31/64bit memcpy.
S390: Refactor memcpy/mempcpy ifunc handling.
S390: Remove s390 specific implementation of bcopy.
S390: Use memcpy for forward cases in memmove.
S390: Add configure check to detect z13 as mininum architecture level set.
S390: Add z13 memmove ifunc variant.
S390: Add z13 strstr ifunc variant.
S390: Add z13 memmem ifunc variant.
S390: Refactor strlen ifunc handling.
S390: Refactor strnlen ifunc handling.
S390: Refactor strcpy ifunc handling.
S390: Refactor stpcpy ifunc handling.
S390: Refactor strncpy ifunc handling.
S390: Refactor stpncpy ifunc handling.
S390: Refactor strcat ifunc handling.
S390: Refactor strncat ifunc handling.
S390: Refactor strcmp ifunc handling.
S390: Refactor strncmp ifunc handling.
S390: Refactor strchr ifunc handling.
S390: Refactor strchrnul ifunc handling.
S390: Refactor strrchr ifunc handling.
S390: Refactor strspn ifunc handling.
S390: Refactor strpbrk ifunc handling.
S390: Refactor strcspn ifunc handling.
S390: Refactor memchr ifunc handling.
S390: Refactor rawmemchr ifunc handling.
S390: Refactor memccpy ifunc handling.
S390: Refactor memrchr ifunc handling.
S390: Refactor wcslen ifunc handling.
S390: Refactor wcsnlen ifunc handling.
S390: Refactor wcscpy ifunc handling.
S390: Refactor wcpcpy ifunc handling.
S390: Refactor wcsncpy ifunc handling.
S390: Refactor wcpncpy ifunc handling.
S390: Refactor wcscat ifunc handling.
S390: Refactor wcsncat ifunc handling.
S390: Refactor wcscmp ifunc handling.
S390: Refactor wcsncmp ifunc handling.
S390: Refactor wcschr ifunc handling.
S390: Refactor wcschrnul ifunc handling.
S390: Refactor wcsrchr ifunc handling.
S390: Refactor wcsspn ifunc handling.
S390: Refactor wcspbrk ifunc handling.
S390: Refactor wcscspn ifunc handling.
S390: Refactor wmemchr ifunc handling.
S390: Refactor wmemset ifunc handling.
S390: Refactor wmemcmp ifunc handling.
S390: Refactor gconv_simple ifunc handling.
S390: Cleanup ifunc-resolve.h.

Steve Ellcey (1):
Remove extra space at end of line.

Szabolcs Nagy (17):
Clean up converttoint handling and document the semantics
Add new exp and exp2 implementations
Missed ChangeLog
Add new log implementation
Add new log2 implementation
Add new pow implementation
Fix the documentation comment of checkint in powf
Increase timeout of libio/tst-readline
Increase timeout of nss/tst-nss-files-hosts-multi
i64: fix missing exp2f, log2f and powf symbols in libm.a [BZ #23822]
Remove the error handling wrapper from exp and exp2
Remove the error handling wrapper from log
Remove the error handling wrapper from log2
Remove the error handling wrapper from pow
Fix powf overflow handling in non-nearest rounding mode [BZ #23961]
AArch64: Update dl-procinfo.c with new HWCAP
Fix the manual for old texinfo

TAMUKI Shoichi (4):
strftime: Consequently use the "L_" macro with character literals
manual: Fix the wording to "alternative" rather than "alternate"
strftime: Set the default width of "%Ey" to 2 [BZ #23758]
strftime: Pass the additional flags from "%EY" to "%Ey" [BZ #24096]

Tobias Klauser (1):
Add PF_XDP, AF_XDP and SOL_XDP from Linux 4.18 to bits/socket.h.

Tulio Magno Quites Machado Filho (4):
Fix _dl_profile_fixup data-dependency issue (Bug 23690)
powerpc: Add missing CFI register information (bug #23614)
Print cache size and geometry auxv types on LD_SHOW_AUXV=1
Add XFAIL_ROUNDING_IBM128_LIBGCC to more fma() tests

Uroš Bizjak (1):
alpha: Fix __remqu corrupting $f3 register

Wilco Dijkstra (13):
Simplify and speedup strstr/strcasestr first match
Improve performance of sincosf
Improve performance of sinf and cosf
Fix spaces in x86_64 ULP file
Use generic sinf/cosf in lgammaf_r
Speedup tanf range reduction
Update NEWS for sinf improvements
Remove unused math files
Fix strstr bug with huge needles (bug 23637)
[AArch64] Adjust writeback in non-zero memset
Refactor string benchtests
Improve bench-strlen
[AArch64] Add ifunc support for Ares

Zack Weinberg (11):
[manual] Job control is no longer optional.
Use STRFMON_LDBL_IS_DBL instead of __ldbl_is_dbl.
Add __vfscanf_internal and __vfwscanf_internal with flags arguments.
Use SCANF_ISOC99_A instead of _IO_FLAGS2_SCANF_STD.
Use SCANF_LDBL_IS_DBL instead of __ldbl_is_dbl.
Add __v*printf_internal with flags arguments
Add __vsyslog_internal, with same flags as __v*printf_internal.
Use PRINTF_FORTIFY instead of _IO_FLAGS2_FORTIFY (bug 11319)
Use PRINTF_LDBL_IS_DBL instead of __ldbl_is_dbl.
Use C99-compliant scanf under _GNU_SOURCE with modern compilers.
Tests for minimal signal handler functionality in MINSIGSTKSZ space.

Zong Li (3):
elf: Fix the ld flags not be applied to tst-execstack-mod.so
soft-fp: Use temporary variable in FP_FRAC_SUB_3/FP_FRAC_SUB_4
soft-fp: Add implementation for 128 bit self-contained

-----------------------------------------------------------------------

Comment 34 John Paul Adrian Glaubitz 2019-04-29 11:39:35 UTC

(In reply to Adhemerval Zanella from comment #32)
> I am not against in reverting back to use SYS_getdents for getdents64,
> although it is a subpar resolution for a kernel issue.  Newer architectures
> with mixed 32 and 64 bits support will continue to be broken without a
> proper kernel fix since they use SYS_getdents64 for getdents.

Could we revert to SYS_getdents for the time being? Currently, we cannot update the glibc package on qemu-emulated buildds because of this particular issue.

Comment 35 Adhemerval Zanella 2019-04-30 17:13:25 UTC

(In reply to John Paul Adrian Glaubitz from comment #34)
> (In reply to Adhemerval Zanella from comment #32)
> > I am not against in reverting back to use SYS_getdents for getdents64,
> > although it is a subpar resolution for a kernel issue.  Newer architectures
> > with mixed 32 and 64 bits support will continue to be broken without a
> > proper kernel fix since they use SYS_getdents64 for getdents.
> 
> Could we revert to SYS_getdents for the time being? Currently, we cannot
> update the glibc package on qemu-emulated buildds because of this particular
> issue.

The issue is that the workaround would probably be permanent, since kernel developers don't seem to be working on a fix, and it would hide a kernel issue in the compat syscalls that might continue to appear in newer 32-bit ABIs, since LFS is not yet enforced as the only supported ABI. It would also require all the internal boilerplate to handle this syscall.

My question is: what is preventing buildds on qemu-emulated architectures from using LFS interfaces instead of non-LFS ones?

Comment 36 John Paul Adrian Glaubitz 2019-04-30 17:15:44 UTC

(In reply to Adhemerval Zanella from comment #35)
> My question is why is preventing buildds on qemu-emulated architectures to
> use LFS interfaces instead of non-LFS one?

We're just using the defaults of qemu-user. Is there an option to force-enable LFS in qemu?

Comment 37 Adhemerval Zanella 2019-04-30 17:31:54 UTC

(In reply to John Paul Adrian Glaubitz from comment #36)
> (In reply to Adhemerval Zanella from comment #35)
> > My question is why is preventing buildds on qemu-emulated architectures to
> > use LFS interfaces instead of non-LFS one?
> 
> We're just using the defaults of qemu-user. Is there an option to
> force-enable LFS in qemu?

In fact, it seems it would fail regardless of whether the process issues an LFS or a non-LFS syscall, as Florian has pointed out [1].  Florian, could you confirm that even 32-bit binaries built with LFS support, running on a qemu-system also built with LFS support, still fail? I am failing to see where exactly it would be broken in this situation.

[1] https://lore.kernel.org/lkml/20181229015453.GA6310@bombadil.infradead.org/T/

Comment 38 Florian Weimer 2019-04-30 17:50:32 UTC

I feel we are running in circles here.  Let me summarize what I think is going on.

The core issue is that neither POSIX nor glibc have an LFS interface for seekdir/telldir, and the d_off value is expected to be compatible with the non-LFS seekdir, telldir functions.

The kernel implements an arbitrary truncation for both getdents and getdents64 (so both LFS and non-LFS interfaces) when running 32-bit compat mode.  It is not possible at all to get the same truncation in 64-bit mode.  qemu-user fakes a *different* truncation when emulating the getdents system call.  Since hardly anyone uses d_off/seekdir/telldir, this does not matter.
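To see which d_off values a given setup actually hands out, the directory can be read with the raw system call. The following is only a minimal sketch, assuming Linux and the record layout documented in getdents(2); `scan_d_off` is a hypothetical helper name, not anything from glibc or qemu:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout written by the getdents64 system call (see getdents(2)). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: returns the number of entries in `path`, or -1 on
   error.  *wide receives the number of entries whose d_off does not fit
   in a signed 32-bit integer.  */
static long scan_d_off(const char *path, long *wide)
{
    char buf[4096];
    long total = 0;
    long n;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    *wide = 0;
    while ((n = syscall(SYS_getdents64, fd, buf, sizeof buf)) > 0) {
        long pos = 0;
        while (pos < n) {
            int64_t d_off;
            unsigned short reclen;

            /* Copy the fields out instead of casting the buffer, to
               stay clear of alignment and aliasing problems.  */
            memcpy(&d_off, buf + pos + offsetof(struct linux_dirent64, d_off),
                   sizeof d_off);
            memcpy(&reclen, buf + pos + offsetof(struct linux_dirent64, d_reclen),
                   sizeof reclen);
            if (reclen == 0) {          /* malformed record, give up */
                close(fd);
                return -1;
            }
            if (d_off < INT32_MIN || d_off > INT32_MAX)
                ++*wide;
            ++total;
            pos += reclen;
        }
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

A non-zero `*wide` would mean the caller sees d_off values that a 32-bit off_t cannot hold. Whether that happens depends on the file system and on whether the process runs natively, in compat mode, or under qemu-user.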

With the range checking for d_off in the current glibc implementation of getdents, we suddenly insist on a 32-bit-compatible d_off value from a getdents64 system call in 32-bit compat mode, like the actual kernel implements this.
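The range check described above can be sketched roughly as follows. This is not the glibc source, only an illustration of the idea under stated assumptions: `compat_off_t` and `narrow_d_off` are hypothetical names, and a signed 32-bit off_t is assumed for the non-LFS ABI.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit off_t of a non-LFS ABI. */
typedef int32_t compat_off_t;

/* Narrow a 64-bit d_off, as returned by getdents64, to the 32-bit type.
   Returns 0 on success; returns -1 and sets errno to EOVERFLOW if the
   value cannot be represented.  */
static int narrow_d_off(int64_t d_off, compat_off_t *out)
{
    if (d_off < INT32_MIN || d_off > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = (compat_off_t)d_off;
    return 0;
}
```

With a check of this kind, any d_off value that was not truncated the way the kernel's compat mode would truncate it ends up as an EOVERFLOW failure instead of a silently wrong offset.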

So I think without new kernel/userspace APIs, the only way to fix this is to change the binaries running under qemu-user (and not qemu-user itself) to LFS interfaces, and stop using telldir/seekdir, in favor of d_off from readdir64/LFS-readdir with lseek64/LFS-lseek.

I hope this summary makes sense.

Comment 39 Adhemerval Zanella 2019-04-30 18:00:38 UTC

(In reply to Florian Weimer from comment #38)
> I feel we are running in circles here.  Let me summarize what I think is
> going on.
> 
> The core issue is that neither POSIX nor glibc have an LFS interface for
> seekdir/telldir, and the d_off value is expected to be compatible with the
> non-LFS seekdir, telldir functions.
> 
> The kernel implements an arbitrary truncation for both getdents and
> getdents64 (so both LFS and non-LFS interfaces) when running 32-bit compat
> mode.  It is not possible at all to get the same truncation in 64-bit mode. 
> qemu-user fakes a *different* truncation when emulating the getdents system
> call.  Since hardly anyone uses d_off/seekdir/telldir, this does not matter.
> 
> With the range checking for d_off in the current glibc implementation of
> getdents, we suddenly insist on a 32-bit-compatible d_off value from a
> getdents64 system call in 32-bit compat mode, like the actual kernel
> implements this.
> 
> So I think without new kernel/userspace APIs, the only way to fix this is to
> change the binaries running under qemu-user (and not qemu-user itself) to
> LFS interfaces, and stop using telldir/seekdir, in favor of d_off from
> readdir64/LFS-readdir with lseek64/LFS-lseek.

Ok, so this is my understanding as well. To make it clear, the truncation and thus the issue only happens when a binary runs with a qemu-system built without LFS, so the question is which build step in the Debian buildd infrastructure is not being built with LFS support. The idea is that once binaries no longer use the non-LFS getdents, this issue won't happen.

> 
> I hope this summary makes sense.

Comment 40 John Paul Adrian Glaubitz 2019-04-30 23:03:12 UTC

(In reply to Florian Weimer from comment #38)
> So I think without new kernel/userspace APIs, the only way to fix this is to
> change the binaries running under qemu-user (and not qemu-user itself) to
> LFS interfaces, and stop using telldir/seekdir, in favor of d_off from
> readdir64/LFS-readdir with lseek64/LFS-lseek.

How do I achieve this? Does this mean I have to patch 12,000 Debian source packages to switch the binaries to LFS mode? If this really involves patching all of these packages, then I don't think we will ever realistically achieve this.

Comment 41 Jessica Clarke 2019-04-30 23:19:52 UTC

(In reply to John Paul Adrian Glaubitz from comment #40)
> (In reply to Florian Weimer from comment #38)
> > So I think without new kernel/userspace APIs, the only way to fix this is to
> > change the binaries running under qemu-user (and not qemu-user itself) to
> > LFS interfaces, and stop using telldir/seekdir, in favor of d_off from
> > readdir64/LFS-readdir with lseek64/LFS-lseek.
> 
> How do I achieve this? Does this mean I have to patch 12.000 Debian source
> packages to switch the binaries to LFS mode? If this really involves
> patching all of these packages, then I don't think we will ever
> realistically achieve this.

Easiest thing would be to patch gcc/config/whatever.h to include:

    builtin_define ("_LARGEFILE_SOURCE=1");         \
    builtin_define ("_LARGEFILE64_SOURCE=1");       \
    builtin_define ("_FILE_OFFSET_BITS=64");        \

though I don't know if that causes problems for compiling glibc itself. Note that this does count as an ABI break for things using off_t in structs exposed in their API.
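For a single package, the same effect can be had by defining the macro before the first system header (or passing -D_FILE_OFFSET_BITS=64 on the command line). A small sketch, assuming glibc on Linux, to check what a translation unit actually gets; the helper names are made up for illustration:

```c
#define _FILE_OFFSET_BITS 64    /* must precede every system header */
#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t in bits for this translation unit. */
static size_t off_t_bits(void)
{
    return sizeof(off_t) * CHAR_BIT;
}

/* Hypothetical helper: width of the d_off member of struct dirent in bits. */
static size_t d_off_bits(void)
{
    return sizeof(((struct dirent *)0)->d_off) * CHAR_BIT;
}
```

If both helpers report 64 on a 32-bit target, readdir in that translation unit is the LFS variant, so its d_off no longer has to fit in 32 bits.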

Comment 42 John Paul Adrian Glaubitz 2019-04-30 23:40:32 UTC

(In reply to Jessica Clarke from comment #41)
> Easiest thing would be to patch gcc/config/whatever.h to include:
> 
>     builtin_define ("_LARGEFILE_SOURCE=1");         \
>     builtin_define ("_LARGEFILE64_SOURCE=1");       \
>     builtin_define ("_FILE_OFFSET_BITS=64");        \
> 
> though I don't know if that causes problems for compiling glibc itself. Note
> that this does count as an ABI break for things using off_t in structs
> exposed in their API.

I just noticed that at least dash seems to have been patched to enable LFS now, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916255. So it could be that at least our original bug is already addressed. But I think there was another important package affected by this regression, although I don't remember which one.

Comment 43 John Paul Adrian Glaubitz 2019-05-01 07:52:57 UTC

(In reply to John Paul Adrian Glaubitz from comment #42)
> (In reply to James Clarke from comment #41)
> > Easiest thing would be to patch gcc/config/whatever.h to include:
> > 
> >     builtin_define ("_LARGEFILE_SOURCE=1");         \
> >     builtin_define ("_LARGEFILE64_SOURCE=1");       \
> >     builtin_define ("_FILE_OFFSET_BITS=64");        \
> > 
> 
> I just noticed that at least dash seems to have been patched to enable LFS
> now, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916255. So, it
> could be that at least our original bug could already be addressed now. But
> I think there was another important package affected by this regression
> although I don't remember.

Okay, so I can confirm that dash is no longer affected after #916255. However, qt5-qmake still misbehaves. Looking at the build log, qt5-qmake is built with _LARGEFILE_SOURCE=1 but not _LARGEFILE64_SOURCE=1 [1] while qt4-x11 is [2].

I will rebuild qtbase-opensource-source with _LARGEFILE64_SOURCE to see if that helps.

> [1] https://buildd.debian.org/status/fetch.php?pkg=qtbase-opensource-src&arch=m68k&ver=5.11.3%2Bdfsg1-1&stamp=1554325976&raw=0
> [2] https://buildd.debian.org/status/fetch.php?pkg=qt4-x11&arch=m68k&ver=4%3A4.8.7%2Bdfsg-18&stamp=1555145133&raw=0

Comment 44 John Paul Adrian Glaubitz 2019-05-01 08:29:26 UTC

Okay, interesting. It's not qmake that's broken (it's actually built with _LARGEFILE64_SOURCE), but qtchooser:

(sid-sh4-sbuild)root@epyc:/# qmake
qmake: could not find a Qt installation of ''
(sid-sh4-sbuild)root@epyc:/# ls -l `which qmake`
lrwxrwxrwx 1 root root 9 Nov 27 20:22 /usr/bin/qmake -> qtchooser
(sid-sh4-sbuild)root@epyc:/# /usr/bin/qtchooser -qt=5 -run-tool=qmake
qtchooser: could not find a Qt installation of '5'
(sid-sh4-sbuild)root@epyc:/#

For comparison, here is qtchooser natively:

root@epyc:~> qtchooser -qt=5 -run-tool=qmake
Usage: /usr/lib/qt5/bin/qmake [mode] [options] [files]

QMake has two modes, one mode for generating project files based on
some heuristics, and the other for generating makefiles. Normally you
shouldn't need to specify a mode, as makefile generation is the default
mode for qmake, but you may use this to test qmake on an existing project
(...)

I have already tried building qtchooser with -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE, but that didn't help so far.

Source of qtchooser is here: https://code.qt.io/cgit/qtsdk/qtchooser.git/

Comment 45 John Paul Adrian Glaubitz 2019-05-02 13:55:31 UTC

(In reply to John Paul Adrian Glaubitz from comment #44)
> I have already tried building qtchooser with  -D_LARGEFILE64_SOURCE
> -D_LARGEFILE_SOURCE, but that didn't help so far.
> 
> Source of qtchoser is here: https://code.qt.io/cgit/qtsdk/qtchooser.git/

Okay, if anyone is reading this: building qtchooser with -D_FILE_OFFSET_BITS=64 fixes the problem for me.

Comment 46 Adhemerval Zanella 2019-05-02 13:58:23 UTC

I think one option would be to simply avoid returning EOVERFLOW, as Joseph suggested in comment #22, and to document clearly the pitfalls of using getdents in non-LFS mode in the scenarios Florian described.

Comment 47 dflo 2019-05-03 16:07:38 UTC

I'd gladly help testing this patch.  Those of us Gentoo ARM enthusiasts that use qemu to build packages have hit this issue as well.  I agree, it would be better if Qemu/kernel properly handled these cases, but until they do it might be easier to revert back to previous incorrect behaviour.

I've noticed that more than just dash is affected: shared-mime-info and Gentoo's libsandbox are hit by this as well.  The latter wasn't as simple as enabling LFS since it is a syscall wrapper itself.  It actually had to be patched where it uses readdir() internally with _FILE_OFFSET_BITS=64.

It's not the packages I now know about that scare me, it's the potentially dozens of unknowns that will mysteriously not work as intended because (again, not glibc's fault) no one bothered to check errno and catch the error at the very least.
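For reference, a minimal sketch of the errno check that many readdir() callers skip. The helper name `count_entries` is hypothetical; the only calls used are the standard opendir(), readdir() and closedir(). Because readdir() returns NULL both at the end of the directory and on failure, errno has to be cleared before each call and inspected afterwards.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Count the entries of `path`.  Returns the count, or -1 with *err set
   to the errno value when opendir() or readdir() fails. */
static long count_entries(const char *path, int *err)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        *err = errno;
        return -1;
    }

    long count = 0;
    for (;;) {
        errno = 0;                 /* NULL means either EOF or an error */
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;
        count++;
    }

    int saved = errno;             /* 0 at EOF, e.g. EOVERFLOW on failure */
    closedir(dir);

    if (saved != 0) {
        *err = saved;
        return -1;
    }
    *err = 0;
    return count;
}
```

A caller that only loops until NULL and never looks at errno would silently see a truncated directory listing in the EOVERFLOW case, which matches the symptoms described in this report.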

Comment 48 dflo 2019-05-28 13:42:23 UTC

Created attachment 11808 [details]
Patch to bypass the return EOVERFLOW condition

Here's a patch I tried as a proof of concept.  I #if'd out the check for overflow in getdents.c, and installed glibc-2.29 in a gentoo chroot.

This worked around any issues with all of the packages that I noticed broken prior to the change.

Comment 49 John Paul Adrian Glaubitz 2019-12-10 11:31:00 UTC

Just as a heads-up: This issue still persists and while it can be fixed in some cases by building the affected packages with large file support enabled [1], there are still many packages which will just crash on qemu-user because of this problem.

So, I think we need some sort of work-around in glibc to address this issue. I understand this isn't a bug in glibc per se, but it's a change in glibc that negatively affects an important usecase which is qemu-user emulation.

> [1] https://salsa.debian.org/qt-kde-team/qt/qtchooser/commit/cc069f62f1c70528c31fb582245b1f2e799609a3

Comment 50 Aladjev Andrew 2019-12-23 22:24:59 UTC

Hello. I am reproducing the issue inside "arm-unknown-linux-gnueabi".

glibc.patch:
sysdeps/unix/sysv/linux/getdents.c
- return INLINE_SYSCALL_ERROR_RETURN_VALUE (EOVERFLOW);
+ return INLINE_SYSCALL_ERROR_RETURN_VALUE (d_off >> 32);

fit.c:
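#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Print errno if the first readdir() on argv[1] fails.  With the glibc
   patch above, errno then carries the upper 32 bits of d_off. */
int main (int argc, char *argv[]) {
  if (argc < 2) {
    return 1;
  }
  DIR *dir = opendir(argv[1]);
  if (dir == NULL) {
    return 1;
  }
  errno = 0;
  struct dirent *entry = readdir(dir);
  if (entry == NULL) {
    printf("%d\n", errno);
  }
  closedir(dir);
  return 0;
}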

Reproducing:
LD_LIBRARY_PATH='/usr/arm-unknown-linux-gnueabi/lib' find /etc -type d -exec /tmp/fit "{}" \;

This reports no offset overflows for the /proc, /dev and /sys directories. It reports a non-zero value (the upper 32 bits of d_off) for any ext4 directory.

Comment 51 Aladjev Andrew 2019-12-24 00:15:28 UTC

This issue is related to the ext4 implementation: ext4 provides junk instead of a real offset. We are looking at the tail end of some ancient ext4 workaround. It is well documented:

fs/ext4/dir.c:
* ext4_dir_llseek() calls generic_file_llseek_size to handle htree
* directories, where the "offset" is in terms of the filename hash
* value instead of the byte offset.

* Because we may return a 64-bit hash that is well beyond offset limits,
* we need to pass the max hash as the maximum allowable offset in
* the htree directory case.

* Upper layer (for example NFS) should specify FMODE_32BITHASH or
* FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
* directly on both 32-bit and 64-bit nodes, under such case, neither
* FMODE_32BITHASH nor FMODE_64BITHASH is specified.

This ancient mechanism is as simple as a stone. The original developers didn't want to add a separate hash field, so they invented the "hash2pos" and "pos2hash" contraption. This "hash" shouldn't actually be visible to glibc. It is visible only because it became a "pos".

if ((filp->f_mode & FMODE_32BITHASH) ||
    (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
  return major >> 1;
else
  return ((__u64)(major >> 1) << 32) | (__u64)minor;

"major >> 1" is the target value that we need to keep. If a "d_off" overflow occurs, we can shift the value ">> 32" and provide the result to the user. The problem is that this value has to be shifted back "<< 32" before any further use.
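The shifting described above can be modelled in user space. This is only a sketch: the function names are hypothetical and the first two functions merely mirror the two return statements of the kernel excerpt, they are not the kernel's code. It shows that folding a 64-bit position with ">> 32" yields the same value the 32-bit hash mode would have produced, while the "minor" half is lost on the way back.

```c
#include <stdint.h>

/* 64-bit mode of the excerpt: (major >> 1) in the upper half, minor below. */
static uint64_t hash2pos_64(uint32_t major, uint32_t minor)
{
    return ((uint64_t) (major >> 1) << 32) | (uint64_t) minor;
}

/* 32-bit mode of the excerpt: only (major >> 1), which fits in 31 bits. */
static uint32_t hash2pos_32(uint32_t major)
{
    return major >> 1;
}

/* What a libc could hand to a 32-bit caller: the upper half only. */
static uint32_t fold_pos(uint64_t pos)
{
    return (uint32_t) (pos >> 32);
}

/* What must happen before the value goes back to the kernel. */
static uint64_t unfold_pos(uint32_t folded)
{
    return (uint64_t) folded << 32;
}
```

Since `unfold_pos(fold_pos(pos))` drops the lower 32 bits, the round trip only returns to the start of a major-hash bucket and is not an exact inverse.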

So I am not sure how to provide a fix. Maybe it would be better to force "FMODE_32BITHASH" in the kernel somehow. Maybe qemu can trigger this mode, maybe not.

Please switch from ext4 to a good filesystem if you want a reliable solution.

Comment 52 Aladjev Andrew 2019-12-24 14:10:21 UTC

I've created kernel bug https://bugzilla.kernel.org/show_bug.cgi?id=205957. I am just a tiny little developer, I am not experienced in kernel development, please help me with this bug.

Comment 53 Aladjev Andrew 2020-01-09 21:21:22 UTC

Please review getdents function once again:

https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/getdents.c#L21

What does "!_DIRENT_MATCHES_DIRENT64" mean?

https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/bits/dirent.h#L54

#if defined __OFF_T_MATCHES_OFF64_T && defined __INO_T_MATCHES_INO64_T
# define _DIRENT_MATCHES_DIRENT64	1
#else
# define _DIRENT_MATCHES_DIRENT64	0
#endif

From an architecture perspective "!_DIRENT_MATCHES_DIRENT64" means "we need to emulate getdents using getdents64".

So this issue can be titled correctly as "getdents emulation using getdents64 is not working in several cases".

/* Pack the dirent64 struct down into 32-bit offset/inode fields, and ensure that no overflow occurs. */

The main assumption is wrong. Packing "dirent64" down to "dirent" is not possible in the general case. This issue is very common. Musl libc has the same issue.
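A minimal sketch of why the packing cannot always succeed. The helper `narrow_d_off` is a hypothetical name and this is not glibc's actual code; it only reproduces the idea of narrowing a 64-bit d_off into a 32-bit field and reporting EOVERFLOW when the value does not fit.

```c
#include <errno.h>
#include <stdint.h>

/* Try to store a 64-bit directory offset in a 32-bit field.
   Returns 0 on success, or EOVERFLOW if the value does not fit. */
static int narrow_d_off(int64_t d_off64, int32_t *d_off32)
{
    if (d_off64 < INT32_MIN || d_off64 > INT32_MAX)
        return EOVERFLOW;          /* information would be lost */
    *d_off32 = (int32_t) d_off64;
    return 0;
}
```

Small offsets pass through unchanged, but a hash-style offset such as the 0x1234567890123 value used later in this report cannot be represented, so the emulation has to either fail or truncate.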

https://github.com/ifduyue/musl/blob/master/src/internal/syscall.h#L131-L134

/* fixup legacy 32-bit-vs-lfs64 junk */
#define SYS_getdents SYS_getdents64

Only uClibc (and uclibc-ng) handles "getdents" almost correctly:
https://github.com/wbx-github/uclibc-ng/blob/master/libc/sysdeps/linux/common/getdents.c#L94-L116

You can enable "ARCH_HAS_DEPRECATED_SYSCALLS" config option and "getdents" won't be emulated with "getdents64", it will use regular "getdents" syscall.

Is it possible to fix "getdents" emulation? Yes, we need to replace "getdents64" syscall with "__X32_SYSCALL_BIT + __NR_getdents64". This syscall guarantees that packing "dirent64" down to "dirent" is possible. This replacement requires "CONFIG_X86_X32=y" in your host kernel config.

If I were a glibc/musl core developer, I would rewrite "getdents" in the following way:

1. If x32 syscall is available - use "__X32_SYSCALL_BIT + __NR_getdents64".
2. Otherwise do not try to emulate "getdents" using "getdents64", use regular "getdents" syscall.

Comment 54 Aladjev Andrew 2020-01-15 05:49:26 UTC

Created attachment 12210 [details]
getdents emulation for glibc

Comment 55 Aladjev Andrew 2020-01-15 05:49:50 UTC

Created attachment 12211 [details]
getdents emulation for qemu

Comment 56 Aladjev Andrew 2020-01-15 05:53:57 UTC

I've created 2 new patches: the first one uses 0xffff (a free 16-bit syscall number) for getdents emulation in glibc, and the second one redirects this syscall to the x86 x32 kernel syscall.

If you have "CONFIG_X86_X32=y" and "qemu_binfmt" registered in your host kernel, then you will be able to build a complete image using qemu-user. I've tested it using "arm-unknown-linux-gnueabi" and "mips-unknown-linux-gnu"; it works perfectly.

Comment 57 Thorsten Glaser 2020-09-17 21:42:40 UTC

Jessica,

>Easiest thing would be to patch gcc/config/whatever.h to include:
>
>    builtin_define ("_LARGEFILE_SOURCE=1");         \
>    builtin_define ("_LARGEFILE64_SOURCE=1");       \
>    builtin_define ("_FILE_OFFSET_BITS=64");        \

this will not work, unfortunately: glibc’s <fts.h> refuses to work with LFS.

But why are *new* ports supporting nōn-LFS at all anyway? Just define off_t as long long int on *all* new architectures, similar to how it’s done with 64-bit time_t for new ILP32 arches… and similar to how the BSDs all operate as well (off_t is a quad there, period; I say this with my MirBSD developer hat on).

Sorry for being late to this discussion, just found this from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916276 via https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970460 and having seen this impact Debian. (Current hat: Debian Developer)

Comment 58 Danny Milosavljevic 2020-10-02 08:54:44 UTC

The title should be changed to be more general because this does not only break qemu-user.  In fact, mentioning qemu would make it seem that it can be fixed in qemu-user--which it can't.

The same happens on aarch64 if running armhf executables (no qemu anywhere).

More details on https://lists.gnu.org/archive/html/guix-patches/2020-10/msg00059.html .

In order to test, patch fuse-2.9.9 like this:

diff -ru orig/fuse-2.9.9/lib/fuse_lowlevel.c fuse-2.9.9/lib/fuse_lowlevel.c
--- orig/fuse-2.9.9/lib/fuse_lowlevel.c 1970-01-01 01:00:01.000000000 +0100
+++ fuse-2.9.9/lib/fuse_lowlevel.c      2020-09-25 17:09:26.744879224 +0200
@@ -257,7 +257,7 @@
        struct fuse_dirent *dirent = (struct fuse_dirent *) buf;
 
        dirent->ino = stbuf->st_ino;
-       dirent->off = off;
+       dirent->off = off | 0x1234567890123;
        dirent->namelen = namelen;
        dirent->type = (stbuf->st_mode & 0170000) >> 12;
        strncpy(dirent->name, name, namelen);

Then run make.

Then create a mount point and mount the example filesystem on it:

   mkdir -p /tmp/foo
   examples/hello_ll /tmp/foo

Then run this test program:

#include <stdio.h>
#include <errno.h>
#include <assert.h>
#include <dirent.h>
#if defined( __ILP32__)
#warning ILP32
#endif

int main() {
        DIR* d;
        struct dirent* ent;
        d = opendir("/tmp/foo");
        if (d == NULL) {
                perror("opendir");
                return 1;
        }
        errno = 0;
        assert(sizeof(ent->d_off) == sizeof(off_t));
        while ((ent = readdir(d)) != NULL) {
                printf("%llX %s\n", (unsigned long long) ent->d_off, ent->d_name);
                if (ent->d_off > 0xffffffff)
                        printf("BIG\n");
        }
        if (errno)
                perror("readdir");
        return sizeof(off_t);
}

Compile once with -D_FILE_OFFSET_BITS=64, once with -D_FILE_OFFSET_BITS=32 and once with no -D_FILE_OFFSET_BITS.

You get this result:

system   _FILE_OFFSET_BITS off_t   d_off-sizeof   d_off-values
---------------------------------------------------------------
x86_64   -                 8 Byte  8 Byte         8 Byte
i686     -                 4 Byte  4 Byte         4 Byte
i686     64                8 Byte  8 Byte         FAIL*
i686     32                4 Byte  4 Byte         FAIL*
armhf    -                 4 Byte  4 Byte         FAIL*
armhf    64                8 Byte  8 Byte         8 Byte
armhf    32                4 Byte  4 Byte         FAIL*
a64armhf -                 4 Byte  4 Byte         FAIL*
a64armhf 64                8 Byte  8 Byte         8 Byte
a64armhf 32                4 Byte  4 Byte         FAIL*
aarch64  -                 8 Byte  8 Byte         8 Byte

*: Using FUSE filesystem with big d_off value.

None of those tests were done with qemu.  They were all native.

"i686" means "i686 on x86_64".

I argue that the only safe way to fix that once and for all is to use _FILE_OFFSET_BITS=64 on 32 bits.

I would implore glibc maintainers to mandate choosing a _FILE_OFFSET_BITS and fail compilation otherwise.  15 years of migration to LFS is more than enough.

Patch to do that:  At the end of dirent.h, add:

#ifndef _LIBC
#if __SIZEOF_LONG__ < 8
#ifndef __USE_FILE_OFFSET64
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 32
#warning \"Using -D_FILE_OFFSET_BITS=32 and using readdir is a bad idea, see <https://bugzilla.kernel.org/show_bug.cgi?id=205957>\"
#else
#undef readdir
#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
#endif
#endif
#endif
#endif

And then in posix/glob.c at the beginning:

  #undef readdir

This makes it much easier for distributions to find the problem.

Otherwise the problem would be hidden in the sense that a lot of programs COMPILE just fine without _FILE_OFFSET_BITS--but they fail at runtime in unexpected ways.
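Independently of any glibc change, an individual project can turn the hidden runtime failure into a build failure on its own. The following is only a sketch of such a project-side guard, assuming a C11 compiler; the helper `lfs_enabled` is a hypothetical name and is not part of the dirent.h patch above.

```c
#include <sys/types.h>

/* Refuse to build if off_t is narrower than 64 bits, i.e. when a
   32-bit target is compiled without -D_FILE_OFFSET_BITS=64. */
_Static_assert(sizeof(off_t) >= 8,
               "off_t is not 64-bit: build with -D_FILE_OFFSET_BITS=64");

/* Runtime view of the same condition, e.g. for a --version dump. */
static int lfs_enabled(void)
{
    return sizeof(off_t) >= 8;
}
```

Placing the assertion in a header that every source file of the project includes means a forgotten flag is reported at compile time instead of at the first strange readdir result.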

Unexpected because they usually don't fail right away at runtime but only on the first strange readdir result.  A strange readdir result is one where the d_off value from the kernel needs more than 32 bits and therefore doesn't fit into the d_off slot.

Worse, a lot of clients do not check errno and just leave all the other files of that directory out once that happens.  When the strange readdir result appears depends on filesystem internals.

Comment 59 Danny Milosavljevic 2020-10-02 09:36:03 UTC

(In reply to Adhemerval Zanella from comment #32)
> I am not against in reverting back to use SYS_getdents for getdents64

SYS_getdents has 32 bit slots, including d_off, in the result and thus the kernel cannot tell you the truth in the result.  Having the kernel paper over this seems unwise--this would be/is basically the kernel lying to you.  As with all lies, the kernel then has to keep some kind of table of which lies it told to whom and be consistent with them.  Why do that?

> although it is a subpar resolution for a kernel issue.  Newer architectures
> with mixed 32 and 64 bits support will continue to be broken without a
> proper kernel fix since they use SYS_getdents64 for getdents.

The kernel is the wrong place to work around this.  glibc should be using 64 bit struct dirent so it can actually handle the truth.

> What I think we should do is:
> 
>   1. *Deprecate* non-LFS usage in a multi-step way as discussed in
> libc-alpha [1]. We will need to take care of the issue brought by Joseph,
> but it will mean eventually the non-LFS interfaces will be just provided as
> compatibility symbols.

I agree.

>   2. Push to distro on 32-bits to *stop* building packages in non-LFS mode
> as default. Some distro already gets this right, but it seems some still
> lacking support.

For that to happen, please make glibc at least emit a warning--although with a problem this bad, I'd prefer an error--if _FILE_OFFSET_BITS != 64 and SIZEOF_LONG < 8 and readdir is used.

I use this in dirent.h:

#ifndef _LIBC
#if __SIZEOF_LONG__ < 8
#ifndef __USE_FILE_OFFSET64
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 32
#warning \"Using -D_FILE_OFFSET_BITS=32 and using readdir is a bad idea, see <https://bugzilla.kernel.org/show_bug.cgi?id=205957>\"
#else
#undef readdir
#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
#endif
#endif
#endif
#endif

It's much better to find problems this way than to have programs fail at random times at runtime depending on file system internals.

Or use gcc's "deprecated" attribute on readdir, with a message, in order to at least warn.  But, really, does this sound like something harmless enough to only warn?  It does not to me.

>   3. Continue to push kernel developers to provide a correct fix for this
> issue. 

We shouldn't do that in the kernel (see beginning of this text).

It's impossible to store a 64 bit result into a 32 bit slot.

Also, if you call SYS_getdents64, you should expect a 64 bit result.  It's in the name.

Please don't use SYS_getdents.  Just please mandate LFS instead.  This should have been done long (decades) ago.

Comment 60 Florian Weimer 2020-10-02 09:46:51 UTC

(In reply to Danny Milosavljevic from comment #59)
> It's impossible to store a 64 bit result into a 32 bit slot.

You can do something like that if you can maintain a translation table. The kernel cannot do it due to the way the getdents64 system call works. glibc can do it for telldir/seekdir (allocating table slots on demand), which are the interfaces that are actually problematic.

> Please don't use SYS_getdents.  Just please mandate LFS instead.  This should have been done long (decades) ago.

LFS does not change the return type of telldir. So it does not fix the issue. We need to maintain a translation table for telldir and seekdir in DIR. It can be filled on demand, so that applications that do not call telldir are pretty much unaffected. The only thing that is a bit tricky is that we have to pre-allocate the slot during readdir because telldir must not fail.
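A rough sketch of such a translation table, under the assumption that it lives inside DIR and maps small `long` keys to 64-bit offsets. All names (`offset_table`, `offset_table_reserve`, `offset_table_tell`, `offset_table_seek`) are hypothetical, and this is not the code of any posted glibc patch. The reserve step is separate so that it can run on the readdir path, leaving the telldir side with nothing that can fail.

```c
#include <stdint.h>
#include <stdlib.h>

struct offset_table {
    int64_t *offsets;   /* slot index -> 64-bit directory offset */
    size_t   used;
    size_t   capacity;
};

/* readdir side: make sure one free slot exists.
   Returns 0 on success, -1 if memory could not be allocated. */
static int offset_table_reserve(struct offset_table *t)
{
    if (t->used < t->capacity)
        return 0;
    size_t new_capacity = t->capacity ? t->capacity * 2 : 8;
    int64_t *p = realloc(t->offsets, new_capacity * sizeof *p);
    if (p == NULL)
        return -1;
    t->offsets = p;
    t->capacity = new_capacity;
    return 0;
}

/* telldir side: reuse an existing slot or take the reserved one.
   Requires a successful offset_table_reserve() beforehand. */
static long offset_table_tell(struct offset_table *t, int64_t off64)
{
    for (size_t i = 0; i < t->used; i++)
        if (t->offsets[i] == off64)
            return (long) i;
    t->offsets[t->used] = off64;
    return (long) t->used++;
}

/* seekdir side: translate a key back.  Returns 0, or -1 for a bad key. */
static int offset_table_seek(const struct offset_table *t, long key,
                             int64_t *off64)
{
    if (key < 0 || (size_t) key >= t->used)
        return -1;
    *off64 = t->offsets[key];
    return 0;
}
```

The linear search keeps the sketch short; applications that never call telldir only pay for the reserve call, which fits the "pretty much unaffected" remark above.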

Comment 61 Danny Milosavljevic 2020-10-02 10:41:42 UTC

(In reply to Florian Weimer from comment #60)
> (In reply to Danny Milosavljevic from comment #59)
> > It's impossible to store a 64 bit result into a 32 bit slot.
> 
> You can do something like that if you can maintain a translation table. The
> kernel cannot do it due to the way the getdents64 system call works. glibc
> can do it for telldir/seekdir (allocating table slots on demand), which are
> the interfaces that are actually problematic.
> LFS does not change the return type of telldir.

>So it does not fix the issue. 

I know what you mean.  Given the seekdir and telldir interface as it is now, mandating LFS does not fix telldir and seekdir, because they use "long", not "off_t".

The correct solution is to change the POSIX standard, too.  Short term, that won't be done.  But it still SHOULD be done.

In the meantime, I agree, you could make a mapping table for the rare cases where telldir and seekdir are actually used.  The question is where to store the latest (64 bit) d_off in the meantime (until telldir is called and you need it)...

However, the current practical problem is much worse:

32 bit apps on 64 bits cannot reliably call *readdir* without using LFS.  So far, Guix distribution has had to patch: gcc's libstdc++-v3, libidn2, fontconfig, libtasn1, openssl, libtool/libltdl, rhash, cmake-bootstrap, cmake, cyrus-sasl in order to even be able to *compile any end user program* as a distribution using cmake.  That had broken the entire distribution on 32 bit, to the point I'm now asking to remove 32 bit support entirely from our homepage.

Just to be clear, that is *without* using qemu.

This problem affects even 32 bit distributions without LFS on 32 bit kernels (that is not a typo)!

And if you enable LFS, it totally works fine in practice.

I estimate that compared to that, seekdir users are few.  And I agree that those  users still have a problem even after enabling LFS.

> We need to maintain a translation table for telldir and seekdir in
> DIR. It can be filled on demand, so that applications that do not call
> telldir are pretty much unaffected. The only thing that is a bit tricky is
> that we have to pre-allocate the slot during readdir because telldir must
> not fail.

I argue the easiest fixing step to do is still to mandate LFS.
Then glibc works in practice.  Without, it *really* does not work--even in theory.

But sure, also put workarounds into telldir and seekdir.  If possible.  Is it possible?

(readdir has been broken the last few glibc releases--I had thought it was my imagination that the compilation of gcc would always be stuck in an endless loop on armhf--but now it all makes sense).

Comment 62 Danny Milosavljevic 2020-10-02 11:03:04 UTC

(In reply to Florian Weimer from comment #60)
> (In reply to Danny Milosavljevic from comment #59)
> > It's impossible to store a 64 bit result into a 32 bit slot.
> 
> You can do something like that if you can maintain a translation table.

Mathematically speaking, no, you can't.  There cannot be a 1:1 mapping between all 64 bit values and all 32 bit values.

I know what you mean--in practice it could be good enough, if the directory doesn't have too many entries (or, depending on implementation, telldir isn't called too often--though first that implementation with only telldir doing the counting has to be possible.  Is it?).

But that's just kicking the can down the road--eventually, someone somewhere will have that many entries.  And then, the mapping will fail.

Comment 63 Jessica Clarke 2020-10-02 13:17:15 UTC

(In reply to Danny Milosavljevic from comment #62)
> (In reply to Florian Weimer from comment #60)
> > (In reply to Danny Milosavljevic from comment #59)
> > > It's impossible to store a 64 bit result into a 32 bit slot.
> > 
> > You can do something like that if you can maintain a translation table.
> 
> Mathematically speaking, no, you can't.  There cannot be a 1:1 mapping
> between all 64 bit values and all 32 bit values.
> 
> I know what you mean--in practice it could be good enough, if the directory
> doesn't have too many entries (or, depending on implementation, telldir
> isn't called too often--though first that implementation with only telldir
> doing the counting has to be possible.  Is it?).
> 
> But that's just kicking the can down the road--eventually, someone somewhere
> will have that many entries.  And then, the mapping will fail.

If you have more than 4 billion files then you really should not be using a 32-bit system. That is a very different (if related) problem from a single >4 GiB file. Even https://lwn.net/Articles/400629/ split the 1 billion files up into 1 thousand directories each with 1 million files.

Comment 64 Adhemerval Zanella 2020-10-02 14:22:00 UTC

> (In reply to Adhemerval Zanella from comment #32)
> > although it is a subpar resolution for a kernel issue.  Newer architectures
> > with mixed 32 and 64 bits support will continue to be broken without a
> > proper kernel fix since they use SYS_getdents64 for getdents.
> 
> The kernel is the wrong place to work around this.  glibc should be using 64
> bit struct dirent so it can actually handle the truth.

I agree, and it was an oversight by us glibc maintainers to allow newer 32-bit ABIs to support the non-LFS interface. Current practice is to enforce 64-bit off_t for all newer ABIs (as done for arc and riscv32, for instance).

However, there are still legacy ABIs which support non-LFS, and even some that support it without having the legacy kernel interface (nios2 and csky, for instance).

> 
> > What I think we should do is:
> > 
> >   1. *Deprecate* non-LFS usage in a multi-step way as discussed in
> > libc-alpha [1]. We will need to take care of the issue brought by Joseph,
> > but it will mean eventually the non-LFS interfaces will be just provided as
> > compatibility symbols.
> 
> I agree.
> 
> >   2. Push to distro on 32-bits to *stop* building packages in non-LFS mode
> > as default. Some distro already gets this right, but it seems some still
> > lacking support.
> 
> For that to happen, please make glibc at least emit a warning--although with
> a problem this bad, I'd prefer an error--if _FILE_OFFSET_BITS != 64 and
> SIZEOF_LONG < 8 and readdir is used.
> 
> I use this in dirent.h:
> 
> #ifndef _LIBC
> #if __SIZEOF_LONG__ < 8
> #ifndef __USE_FILE_OFFSET64
> #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 32
> #warning \"Using -D_FILE_OFFSET_BITS=32 and using readdir is a bad idea, see
> <https://bugzilla.kernel.org/show_bug.cgi?id=205957>\"
> #else
> #undef readdir
> #define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
> #endif
> #endif
> #endif
> #endif
> 
> It's much better to find problems this way than to have programs fail at
> random times at runtime depending on file system internals.
> 
> Or use gcc's "deprecated" attribute on readdir, with a message, in order to
> at least warn.  But, really, does this sound like something harmless enough
> to only warn?  It does not to me.

This is quite disruptive, with potential breakage in a lot of scenarios. As suggested by Joseph [1], we need to make the transition more seamless, over multiple releases. We already have a bug tracking the move to 64-bit LFS as the default (BZ#13047), but we also need to take care of BZ#14106, BZ#15766, and the glibc build and tests themselves.

What I would like to get into 2.33 is to move ld.so and libc.so to use LFS internally first, by using the explicit *64 interfaces.  It will make building glibc itself with default LFS easier.

The point that will require a *lot* of work is to check and adapt the testcases to systematically check for all LFS interfaces in the various modes. 

> 
> >   3. Continue to push kernel developers to provide a correct fix for this
> > issue. 
> 
> We shouldn't do that in the kernel (see beginning of this text).
> 
> It's impossible to store a 64 bit result into a 32 bit slot.
> 
> Also, if you call SYS_getdents64, you should expect a 64 bit result.  It's
> in the name.
> 
> Please don't use SYS_getdents.  Just please mandate LFS instead.  This
> should have been done long (decades) ago.

I am not inclined to keep using non-LFS internally; ideally I would like to remove *all* non-LFS usage, even in non-LFS symbols.  It would be similar to the work done on y2038 support, with the advantage that it won't need to handle ENOSYS and add a non-LFS fallback (we will just need to handle overflow, as some ABIs do with 'generic' internal interfaces).

And I have a working solution for this issue [2]. I did not get much review on my last try [3], but Debian and Gentoo developers told me that it has fixed their issues on both qemu and bootstrap.

The bulk of the change is:

--
It allows to obtain the expected entry offset on telldir and set
it correctly on seekdir on platforms where long int is smaller
than off64_t.

On such cases telldir will maintain an internal list that maps the
DIR object off64_t offsets to the returned long int (the function
return value).  The seekdir will then set the correct offset from
the internal list using the telldir return value as the list key.

It also removes the overflow check on readdir and the returned value
will be truncated by the non-LFS off_t size.  As Joseph has noted
in BZ #23960 comment #22, d_off is an opaque value, and
telldir/seekdir work regardless of the returned dirent d_off value.

Finally it removes the requirement to check for overflow values on
telldir (BZ #24050).
--

And Florian has raised some questions about allowing 'telldir' to fail. The standard does not really allow it, but I think it is a feasible concession for a deprecated interface.

I will commit the first two patches, since they have been acked on previous iterations (they are mainly refactoring and some interface fixes), and send the remaining ones for review. Maybe we can get them into 2.33 and backport them if required.

[1] https://sourceware.org/legacy-ml/libc-alpha/2019-01/msg00124.html
[2] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz23960
[3] https://sourceware.org/pipermail/libc-alpha/2020-April/112866.html

Comment 65 Thorsten Glaser 2020-10-02 23:06:43 UTC

>I agree and this an overlook from we glibc maintainers to allow newer
>32-bit ABIs to support non-LFS interface.

Indeed. (Again, from a BSD PoV, wondering why this is so at all.)

>Current pratice now is to enforce 64-bit
>off_t for all newer ABIs (for instance as done for arc and riscv32).

Good.

>However, there are still legacy ABIs which supports non-LFS and even
>one that support without having the legacy kernel interface (nios2 and
>csky for instance).

Ouch. That’s going to be tricky to fix.

But enabling LFS on architectures that support nōn-LFS
breaks <fts.h> so it’s not a generally usable fix either.

----

>In the mean time, I agree, you could make a mapping table for the rare
>cases where telldir and seekdir are actually used.

Can you make the linker choose a readdir implementation
based on the presence of any of telldir and seekdir?

Perhaps something with putting a readdir that just does
its job, declared weak, into one archive, and a single .o
containing telldir, seekdir, and a readdir that also
maps 64-bit to 32-bit values into another, so that the
latter is only chosen if telldir/seekdir are actually
called? (AIUI it’s even only needed if *seek*dir is
actually called, right?)

Comment 66 Florian Weimer 2020-10-03 13:54:37 UTC

(In reply to Danny Milosavljevic from comment #62)
> (In reply to Florian Weimer from comment #60)
> > (In reply to Danny Milosavljevic from comment #59)
> > > It's impossible to store a 64 bit result into a 32 bit slot.
> > 
> > You can do something like that if you can maintain a translation table.
> 
> Mathematically speaking, no, you can't.  There cannot be a 1:1 mapping
> between all 64 bit values and all 32 bit values.
> 
> I know what you mean--in practice it could be good enough, if the directory
> doesn't have too many entries (or, depending on implementation, telldir
> isn't called too often--though first that implementation with only telldir
> doing the counting has to be possible.  Is it?).
> 
> But that's just kicking the can down the road--eventually, someone somewhere
> will have that many entries.  And then, the mapping will fail.

Most Linux file systems use some hash-based approach, so that they do not have to maintain a separate lookup table for seeking in directories. A simple offset does not work because there are POSIX (and quality-of-implementation) requirements that after seekdir, the same sequence of entries is produced even if unrelated directory entries are created and removed. Because of the hashing involved, directories with tens of millions of entries may run into problems even with a 64-bit hash.

If a file system uses a separate data structure for directory seeking, it will have no problem generating 30-bit offsets, in which case glibc could avoid translation completely (if it reserves the offsets in the range INT_MAX/2 + 1 … INT_MAX for translation, which should be large enough).
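
To make the reserved-range idea concrete, here is a minimal sketch in C. It is not glibc code: the names off_table, off_encode and off_decode, the fixed table size, and the linear search are all assumptions chosen for illustration. Offsets up to INT_MAX/2 pass through unchanged; anything larger is assigned a slot in the range INT_MAX/2 + 1 … INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the glibc implementation. */
#define DIRECT_MAX  (INT_MAX / 2)   /* 0 .. DIRECT_MAX are passed through */
#define TABLE_SLOTS 1024            /* arbitrary size for the example */

struct off_table {
    uint64_t slots[TABLE_SLOTS];    /* 64-bit kernel offsets needing translation */
    size_t used;
};

/* Turn a 64-bit kernel d_off into a cookie that fits a 32-bit long.
   Returns -1 when the table is full (the failure mode discussed above). */
static int32_t off_encode(struct off_table *t, uint64_t koff)
{
    assert(t != NULL);
    if (koff <= (uint64_t) DIRECT_MAX)
        return (int32_t) koff;                    /* small offset: no translation */
    for (size_t i = 0; i < t->used; i++)
        if (t->slots[i] == koff)                  /* already known: reuse the slot */
            return (int32_t) (DIRECT_MAX + 1 + (int32_t) i);
    if (t->used == TABLE_SLOTS)
        return -1;
    t->slots[t->used] = koff;
    return (int32_t) (DIRECT_MAX + 1 + (int32_t) t->used++);
}

/* Recover the 64-bit offset for seekdir. Returns 0 on success, -1 if the
   cookie was never handed out. */
static int off_decode(const struct off_table *t, int32_t cookie, uint64_t *koff)
{
    assert(t != NULL && koff != NULL);
    if (cookie < 0)
        return -1;
    if (cookie <= DIRECT_MAX) {
        *koff = (uint64_t) cookie;
        return 0;
    }
    size_t idx = (size_t) (cookie - DIRECT_MAX - 1);
    if (idx >= t->used)
        return -1;
    *koff = t->slots[idx];
    return 0;
}
```

In this sketch only telldir would need to call off_encode, and only seekdir would call off_decode, so a directory stream that never uses either function never touches the table.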

Comment 67 Florian Weimer 2020-10-03 13:55:39 UTC

(In reply to Thorsten Glaser from comment #65)
> Can you make the linker choose a readdir implementation
> based on the presence of any of telldir and seekdir?

I don't think glibc should add complexity to optimize static linking on 32-bit platforms. There are other libcs that have that as their area of expertise.

Comment 68 John Paul Adrian Glaubitz 2021-08-24 11:48:31 UTC

Just as a heads-up to anyone stumbling over this bug report, the latest set of patches addressing the issue can be found here:

> https://sourceware.org/pipermail/libc-alpha/2020-October/118883.html

I have been using these patches without any issues on Debian for m68k and sh4.

Comment 69 John Paul Adrian Glaubitz 2022-05-15 20:53:26 UTC

For anyone else running into this problem:

The issue can be worked around by putting the target chroot onto a btrfs filesystem.
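
To check whether a given directory is likely to be affected, something like the following sketch may help. It assumes the failure comes from d_off values that do not fit into a signed 32-bit off_t, as discussed earlier in this report; the helper name dir_offsets_fit_32 is made up for this example. It is meant to be built and run natively on the host, against a directory on the filesystem holding the chroot.

```c
#define _FILE_OFFSET_BITS 64   /* make readdir return 64-bit d_off on 32-bit hosts too */
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every d_off in the directory fits into
   a signed 32-bit value, 0 if at least one does not, -1 if opendir fails.
   If max_off is not NULL, the largest d_off seen is stored there. */
static int dir_offsets_fit_32(const char *path, int64_t *max_off)
{
    assert(path != NULL);
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    int fits = 1;
    int64_t max = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        int64_t off = (int64_t) e->d_off;   /* d_off is a Linux/glibc extension */
        if (off > max)
            max = off;
        if (off < 0 || off > INT32_MAX)
            fits = 0;
    }
    closedir(d);

    if (max_off != NULL)
        *max_off = max;
    return fits;
}
```

A result of 0 for a directory inside the chroot would be consistent with the EOVERFLOW path in the non-LFS getdents being hit under qemu-user; a result of 1 suggests the offsets themselves are not the problem for that directory.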

Comment 70 Adhemerval Zanella 2022-05-16 12:06:17 UTC

(In reply to John Paul Adrian Glaubitz from comment #69)
> For anyone else running into this problem:
> 
> The issue can be worked around by putting the target chroot onto a btrfs
> filesystem.

Does it still fail with my patches applied?

Comment 71 John Paul Adrian Glaubitz 2022-05-16 20:11:42 UTC

(In reply to Adhemerval Zanella from comment #70)
> (In reply to John Paul Adrian Glaubitz from comment #69)
> > For anyone else running into this problem:
> > 
> > The issue can be worked around by putting the target chroot onto a btrfs
> > filesystem.
> 
> Does it still fail with my patches applied?

It used to work with your patches, but that no longer seems to be the case which I find strange.

Comment 72 Adhemerval Zanella 2022-05-16 20:15:38 UTC

(In reply to John Paul Adrian Glaubitz from comment #71)
> (In reply to Adhemerval Zanella from comment #70)
> > (In reply to John Paul Adrian Glaubitz from comment #69)
> > > For anyone else running into this problem:
> > > 
> > > The issue can be worked around by putting the target chroot onto a btrfs
> > > filesystem.
> > 
> > Does it still fail with my patches applied?
> 
> It used to work with your patches, but that no longer seems to be the case
> which I find strange.

Do you have any information on what might be failing? It would be good to know if this is another non-LFS interface pitfall.

Comment 73 John Paul Adrian Glaubitz 2022-12-06 15:50:54 UTC

(In reply to Adhemerval Zanella from comment #72)
> (In reply to John Paul Adrian Glaubitz from comment #71)
> > (In reply to Adhemerval Zanella from comment #70)
> > > (In reply to John Paul Adrian Glaubitz from comment #69)
> > > > For anyone else running into this problem:
> > > > 
> > > > The issue can be worked around by putting the target chroot onto a btrfs
> > > > filesystem.
> > > 
> > > Does it still fail with my patches applied?
> > 
> > It used to work with your patches, but that no longer seems to be the case
> > which I find strange.
> 
> Do you have any information of what might be failing? It would be good to
> know if this is another non-LFS interface pitfall.

I can only say at the moment that it occurs with Java applications even with your patches applied.

For example, when running the "ant" command inside an emulated m68k chroot, I am getting the following error:

root@nofan:/# ant -version
Failed to locateorg.apache.tools.ant.Main
ant.home: /usr/share/ant
Classpath: /usr/share/ant/lib/ant-launcher.jar
Launcher JAR: /usr/share/java/ant-launcher-1.10.12.jar
Launcher Directory: /usr/share/java
root@nofan:/#

When working correctly - which is the case on btrfs and xfs, for example - the output looks like this:

(sid-m68k-sbuild)root@z6:/# ant -version
Apache Ant(TM) version 1.10.12 compiled on July 11 2022
(sid-m68k-sbuild)root@z6:/#