Bug 22390

Summary:	localedef triggers ppc64/ppc64le kernel bug in 4.12+ when run with explicit ld.so invocation
Product:	glibc
Reporter:	Florian Weimer <fweimer>
Component:	locale
Assignee:	Florian Weimer <fweimer>
Status:	RESOLVED MOVED
Severity:	normal
CC:	dan, tuliom
Priority:	P2
Flags:	fweimer: security-
Version:	2.26
Target Milestone:	---
Host:
Target:
Build:
Last reconfirmed:
Attachments:	log.4332 core.4332.xz reproducer.c

Description Florian Weimer 2017-11-03 06:31:37 UTC

We occasionally see a crash in localedef during “make install”.  I was able to capture a backtrace from a core file:

#0  sysmalloc (nb=nb@entry=1361968, av=av@entry=0x7fffbe400ed8 <main_arena>)
    at malloc.c:2768
#1  0x00007fffbe2b0f64 in _int_malloc (av=0x7fffbe400ed8 <main_arena>, 
    bytes=1361953) at malloc.c:4134
#2  0x00007fffbe2b6c4c in _int_realloc (nb=1361968, oldsize=680992, 
    oldp=0x7fffffb52690, av=0x7fffbe400ed8 <main_arena>) at malloc.c:4626
#3  __GI___libc_realloc (oldmem=0x7fffffb526a0, bytes=1361952) at malloc.c:3245
#4  0x0000000010049684 in xrealloc (p=<optimized out>, n=<optimized out>)
    at programs/xmalloc.c:102
#5  0x0000000010006938 in idx_table_add (value=105661, wc=<optimized out>, 
    t=0x7fffffb36438) at programs/3level.h:127
#6  find_idx (ctype=ctype@entry=0x7fffffb36420, table=0x7fffffb369a8, 
    table@entry=0xfbf, max=max@entry=0x7fffffb369b0, 
    act=act@entry=0x7fffffb369b8, idx=idx@entry=170244)
    at programs/ld-ctype.c:1234
#7  0x000000001000fe78 in find_idx (idx=170244, act=<optimized out>, 
    max=<optimized out>, table=<optimized out>, ctype=<optimized out>)
    at programs/ld-ctype.c:1526
#8  charclass_ucs4_ellipsis (ldfile=<optimized out>, ldfile=<optimized out>, 
    now=0x7fffffb362a0, step=1, handle_digits=<optimized out>, 
    ignore_content=<optimized out>, class_bit=<optimized out>, 
    class256_bit=<optimized out>, last_wch=170244, repertoire=<optimized out>, 
    charmap=<optimized out>, ctype=<optimized out>) at programs/ld-ctype.c:1542
#9  ctype_read (ldfile=<optimized out>, result=<optimized out>, 
    charmap=0x7ffffe0d0640, repertoire_name=<optimized out>, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2401
#10 0x000000001003c3ec in locfile_read (result=0x7fffffb35ed0, 
    charmap=0x7ffffe0d0640) at programs/locfile.c:173
#11 0x000000001000490c in load_locale (category=<optimized out>, 
    name=0x7fffffb35e90 "i18n", repertoire_name=0x0, charmap=0x7ffffe0d0640, 
    copy_locale=0x0) at programs/localedef.c:621
#12 0x0000000010010c9c in ctype_read (ldfile=0x7fffffb35d10, 
    result=0x7fffffb35980, charmap=0x7ffffe0d0640, repertoire_name=0x0, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2141
#13 0x000000001003c3ec in locfile_read (result=0x7fffffb35980, 
    charmap=0x7ffffe0d0640) at programs/locfile.c:173
#14 0x000000001000490c in load_locale (category=<optimized out>, 
    name=0x7fffffb35940 "nl_NL", repertoire_name=0x0, charmap=0x7ffffe0d0640, 
    copy_locale=0x0) at programs/localedef.c:621
#15 0x0000000010010c9c in ctype_read (ldfile=0x7ffffe0d0520, 
    result=0x7ffff5cbfef0, charmap=0x7ffffe0d0640, repertoire_name=0x0, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2141
#16 0x000000001003c3ec in locfile_read (result=0x7ffff5cbfef0, 
    charmap=0x7ffffe0d0640) at programs/locfile.c:173
#17 0x0000000010003720 in main (argc=<optimized out>, argv=0x7ffff5cc0520)
    at programs/localedef.c:252

2763        {
2764          remainder_size = size - nb;
2765          remainder = chunk_at_offset (p, nb);
2766          av->top = remainder;
2767          set_head (p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
2768          set_head (remainder, remainder_size | PREV_INUSE);
2769          check_malloced_chunk (av, p, nb);
2770          return chunk2mem (p);
2771        }

The fault happens here:

   0x00007fffbe2af090 <+384>:   xor     r8,r30,r26
   0x00007fffbe2af094 <+388>:   subf    r9,r31,r9
   0x00007fffbe2af098 <+392>:   addic   r26,r8,-1
   0x00007fffbe2af09c <+396>:   ori     r10,r31,1
   0x00007fffbe2af0a0 <+400>:   subfe   r26,r26,r8
   0x00007fffbe2af0a4 <+404>:   add     r31,r22,r31
   0x00007fffbe2af0a8 <+408>:   rldicr  r26,r26,2,61
   0x00007fffbe2af0ac <+412>:   ori     r9,r9,1
   0x00007fffbe2af0b0 <+416>:   or      r26,r10,r26
   0x00007fffbe2af0b4 <+420>:   std     r31,88(r30)
   0x00007fffbe2af0b8 <+424>:   addi    r3,r22,16
   0x00007fffbe2af0bc <+428>:   std     r26,8(r22)
=> 0x00007fffbe2af0c0 <+432>:   std     r9,8(r31)

Most variables have been optimized out, but:

remainder = 0x800000004800
r9 = 0x2b801
r31 = 0x800000004800
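
The faulting instruction, std r9,8(r31), is the store for set_head (remainder, remainder_size | PREV_INUSE) at malloc.c:2768: r31 holds remainder, and r9 holds the value written to the chunk's size field, which sits 8 bytes into the chunk header on 64-bit targets. A minimal sketch of decoding those two registers follows. The struct and function names are hypothetical and not part of glibc; only the flag bit values mirror malloc.c.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits kept in the low bits of a chunk's size field
   (same values as in malloc.c).  */
#define PREV_INUSE     0x1u
#define IS_MMAPPED     0x2u
#define NON_MAIN_ARENA 0x4u
#define SIZE_BITS      (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Hypothetical helper type: a decoded view of the faulting store.  */
struct top_store
{
  uint64_t store_addr;  /* address written by "std r9,8(r31)" */
  uint64_t chunk_size;  /* size of the new top chunk, flags masked off */
  uint64_t chunk_end;   /* first byte past the new top chunk */
};

/* Decode the register values captured at the fault.  */
static struct top_store
decode_top_store (uint64_t r31, uint64_t r9)
{
  struct top_store s;
  /* The code at malloc.c:2768 always ORs PREV_INUSE into the value.  */
  assert (r9 & PREV_INUSE);
  s.store_addr = r31 + 8;
  s.chunk_size = r9 & ~(uint64_t) SIZE_BITS;
  s.chunk_end = r31 + s.chunk_size;
  return s;
}
```

With the values above (r31 = 0x800000004800, r9 = 0x2b801), this gives a store address of 0x800000004808, a top chunk size of 0x2b800, and a chunk end of 0x800000030000. That chunk end coincides with the end of the last mapping in the “info file” output.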

“info file” says this:

        0x0000000010000000 - 0x0000000010010000 is load1a
        0x0000000010010000 - 0x0000000010010000 is load1b
        0x0000000010060000 - 0x0000000010070000 is load2
        0x0000000010070000 - 0x0000000010080000 is load3
        0x00007fffba840000 - 0x00007fffbbce0000 is load4
        0x00007fffbcaf0000 - 0x00007fffbdf90000 is load5
        0x00007fffbe1a0000 - 0x00007fffbe1a0000 is load6
        0x00007fffbe1d0000 - 0x00007fffbe1d0000 is load7
        0x00007fffbe1e0000 - 0x00007fffbe1e0000 is load8
        0x00007fffbe1f0000 - 0x00007fffbe200000 is load9a
        0x00007fffbe200000 - 0x00007fffbe200000 is load9b
        0x00007fffbe3e0000 - 0x00007fffbe3e0000 is load10
        0x00007fffbe3f0000 - 0x00007fffbe400000 is load11
        0x00007fffbe400000 - 0x00007fffbe410000 is load12
        0x00007fffbe410000 - 0x00007fffbe430000 is load13
        0x00007fffbe430000 - 0x00007fffbe440000 is load14a
        0x00007fffbe440000 - 0x00007fffbe440000 is load14b
        0x00007fffbe470000 - 0x00007fffbe480000 is load15
        0x00007fffbe480000 - 0x00007fffbe490000 is load16
        0x00007ffff5ca0000 - 0x00007ffff5cd0000 is load17
        0x00007ffffe0d0000 - 0x0000800000030000 is load18

So the address appears to be above the top of the heap.

This glibc is based on 2.26.  We've only seen this on ppc64 so far (not s390x).

Comment 1 Florian Weimer 2017-11-03 08:04:13 UTC

The crashing command is run from the localedata directory in the source tree.  The full command line looks like this:

/builddir/build/BUILD/glibc-2.26-65-ga76376df7c/build-ppc64-redhat-linux/locale/localedef --alias-file=../intl/locale.alias --no-archive -i locales/nl_AW -c -f charmaps/UTF-8 --prefix=/builddir/build/BUILDROOT/glibc-2.26-16.fc27.ppc64 nl_AW

It is executed with the built glibc, not the installed one, although both versions are quite close in this case.

Comment 2 Florian Weimer 2017-11-03 08:06:16 UTC

And it is necessary to set I18NPATH=. before running this command.

Comment 3 Andreas Schwab 2017-11-03 08:46:25 UTC

Also seen on arm and ppc64le.

Comment 4 Florian Weimer 2017-11-03 10:58:05 UTC

Other crashes at the same place, with:

        0x00007ffffca60000 - 0x0000800000030000 is load15
remainder = 0x800000004000
r9 = 0x2c001
r31 = 0x800000004000

        0x00007ffffc910000 - 0x0000800000060000 is load16
remainder = 0x800000032c00
r9 = 0x2d401
r31 = 0x800000032c00

So far, I have only seen this with an explicit loader invocation.  But with that, it also reproduces with the installed glibc.

Comment 5 Florian Weimer 2017-11-03 12:36:25 UTC

Created attachment 10571
log.4332

strace log from crash

Comment 6 Florian Weimer 2017-11-03 12:37:09 UTC

Created attachment 10572
core.4332.xz

coredump for log.4332 run

Comment 7 Florian Weimer 2017-11-03 12:40:10 UTC

Backtrace from core.4332:

#0  sysmalloc (nb=nb@entry=7340064, av=av@entry=0x7fffbe610ee8 <main_arena>)
    at malloc.c:2768
#1  0x00007fffbe4a9984 in _int_malloc (av=0x7fffbe610ee8 <main_arena>, 
    bytes=7340049) at malloc.c:4134
#2  0x00007fffbe4af92c in _int_realloc (nb=7340064, oldsize=3670032, 
    oldp=0x7fffffa35800, av=0x7fffbe610ee8 <main_arena>) at malloc.c:4626
#3  __GI___libc_realloc (oldmem=0x7fffffa35810, bytes=7340048) at malloc.c:3245
#4  0x0000000010049684 in xrealloc (p=<optimized out>, n=<optimized out>)
    at programs/xmalloc.c:102
#5  0x0000000010006938 in idx_table_add (value=142819, wc=<optimized out>, 
    t=0x7fffff267408) at programs/3level.h:127
#6  find_idx (ctype=ctype@entry=0x7fffff2673f0, 
    table=table@entry=0x7fffff267978, max=max@entry=0x7fffff267980, 
    act=act@entry=0x7fffff267988, idx=<optimized out>)
    at programs/ld-ctype.c:1234
#7  0x0000000010010934 in find_idx (idx=<optimized out>, act=<optimized out>, 
    max=<optimized out>, table=<optimized out>, ctype=<optimized out>)
    at programs/ld-ctype.c:1388
#8  ctype_read (ldfile=<optimized out>, result=<optimized out>, 
    charmap=0x7ffffd801610, repertoire_name=<optimized out>, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2312
#9  0x000000001003c3ec in locfile_read (result=0x7fffff266ea0, 
    charmap=0x7ffffd801610) at programs/locfile.c:173
#10 0x000000001000490c in load_locale (category=<optimized out>, 
    name=0x7fffff266e60 "i18n", repertoire_name=0x0, charmap=0x7ffffd801610, 
    copy_locale=0x0) at programs/localedef.c:621
#11 0x0000000010010c9c in ctype_read (ldfile=0x7fffff266ce0, 
    result=0x7fffff266950, charmap=0x7ffffd801610, repertoire_name=0x0, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2141
#12 0x000000001003c3ec in locfile_read (result=0x7fffff266950, 
    charmap=0x7ffffd801610) at programs/locfile.c:173
#13 0x000000001000490c in load_locale (category=<optimized out>, 
    name=0x7fffff266910 "nl_NL", repertoire_name=0x0, charmap=0x7ffffd801610, 
    copy_locale=0x0) at programs/localedef.c:621
#14 0x0000000010010c9c in ctype_read (ldfile=0x7ffffd801510, 
    result=0x7fffe4459b00, charmap=0x7ffffd801610, repertoire_name=0x0, 
    ignore_content=<optimized out>) at programs/ld-ctype.c:2141
#15 0x000000001003c3ec in locfile_read (result=0x7fffe4459b00, 
    charmap=0x7ffffd801610) at programs/locfile.c:173
#16 0x0000000010003720 in main (argc=<optimized out>, argv=0x7fffe445a120)
    at programs/localedef.c:252

remainder = 0x8000004b5830
r9 = 0x2a7d1
r31 = 0x8000004b5830

Memory map:

        0x0000000010000000 - 0x0000000010010000 is load1a
        0x0000000010010000 - 0x0000000010010000 is load1b
        0x0000000010060000 - 0x0000000010070000 is load2
        0x0000000010070000 - 0x0000000010080000 is load3
        0x00007fffb3e60000 - 0x00007fffb5300000 is load4
        0x00007fffb6110000 - 0x00007fffb75b0000 is load5
        0x00007fffb77d0000 - 0x00007fffb77d0000 is load6
        0x00007fffb7800000 - 0x00007fffb7800000 is load7
        0x00007fffb7810000 - 0x00007fffb7810000 is load8
        0x00007fffbe3e0000 - 0x00007fffbe3f0000 is load9a
        0x00007fffbe3f0000 - 0x00007fffbe3f0000 is load9b
        0x00007fffbe5f0000 - 0x00007fffbe5f0000 is load10
        0x00007fffbe600000 - 0x00007fffbe610000 is load11
        0x00007fffbe610000 - 0x00007fffbe620000 is load12
        0x00007fffbe620000 - 0x00007fffbe620000 is load13
        0x00007fffbe630000 - 0x00007fffbe650000 is load14
        0x00007fffbe650000 - 0x00007fffbe660000 is load15a
        0x00007fffbe660000 - 0x00007fffbe660000 is load15b
        0x00007fffbe690000 - 0x00007fffbe6a0000 is load16
        0x00007fffbe6a0000 - 0x00007fffbe6b0000 is load17
        0x00007fffe4430000 - 0x00007fffe4460000 is load18
        0x00007ffffd800000 - 0x00008000004e0000 is load19

If I'm not mistaken, 0x8000004b5830 + 16 is still within the 0x00007ffffd800000 - 0x00008000004e0000 mapping, and brk actually returned 0x00008000004e0000.  So this seems like a kernel bug.
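
The arithmetic behind this can be checked mechanically. The sketch below uses hypothetical helper names. It tests whether the 8-byte store at remainder + 8 falls inside a mapping, and whether that mapping crosses the 128 TB mark (2^47 = 0x800000000000). The second check is an observation derived from the addresses in the memory map, not something the debugger reports.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128 TB boundary: 2^47 = 0x800000000000.  */
#define TB_128 (UINT64_C (1) << 47)

/* Half-open address range [start, end), as printed by "info file".  */
struct mapping
{
  uint64_t start;
  uint64_t end;
};

/* True if the len bytes starting at addr lie entirely inside m.  */
static bool
range_in_mapping (struct mapping m, uint64_t addr, uint64_t len)
{
  return addr >= m.start && addr <= m.end && len <= m.end - addr;
}

/* True if m begins below the 128 TB mark and ends above it.  */
static bool
crosses_128tb (struct mapping m)
{
  return m.start < TB_128 && m.end > TB_128;
}
```

For core.4332, with the mapping 0x00007ffffd800000 - 0x00008000004e0000 and remainder = 0x8000004b5830, range_in_mapping (m, remainder + 8, 8) holds, crosses_128tb (m) holds, and remainder itself lies above TB_128.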

Comments?

Comment 8 Florian Weimer 2017-11-03 12:46:48 UTC

I forgot to mention that I see this with kernel-4.13.9-200.fc26.ppc64 (from Fedora).

Comment 9 Florian Weimer 2017-11-03 17:08:20 UTC

I submitted this to the kernel people:

https://marc.info/?l=linux-mm&m=150972872411797&w=2
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-November/165567.html

Comment 10 Florian Weimer 2017-11-05 13:26:02 UTC

(In reply to Andreas Schwab from comment #3)
> Also seen on arm and ppc64le.

The arm issue is likely different.  ppc64le is the same (I see the crash there as well).  The upstream kernel maintainers identified the default 128 TB memory layout as the likely cause of this regression.

Comment 11 Florian Weimer 2017-11-05 14:52:52 UTC

Created attachment 10574
reproducer.c

I'm attaching a simplified reproducer.  It needs to be run with an explicit loader (ld.so) invocation, so that the heap is placed above ld.so.

The reproducer is still probabilistic.  The increment might be so large that the second sbrk call fails.  But if the sbrk succeeds, the assignment at the end will always segfault when running under a kernel which has the bug.
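
For illustration only, here is a minimal sketch of the pattern described: query the break, grow it with a second sbrk call, then store to the last byte of the new area. This is not the attached reproducer.c. The function names are hypothetical, a 64-bit target is assumed, and the sketch further assumes, based on the memory maps and the 128 TB remark in this thread, that the interesting case is a break that grows past 2^47.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <unistd.h>

/* 128 TB boundary: 2^47 = 0x800000000000.  */
#define TB_128 (UINT64_C (1) << 47)

/* Hypothetical helper: increment that moves a break at current_brk to
   extra bytes past the 128 TB mark.  If the break is already at or
   above the mark, just grow by extra.  */
static intptr_t
increment_past_128tb (uint64_t current_brk, uint64_t extra)
{
  if (current_brk >= TB_128)
    return (intptr_t) extra;
  return (intptr_t) (TB_128 - current_brk + extra);
}

/* Grow the break by increment bytes and store to the last new byte.
   Returns -1 if either sbrk call fails and 0 if the store succeeds.
   On a kernel with the bug, the store is expected to fault instead.  */
static int
grow_and_touch (intptr_t increment)
{
  char *old_end = sbrk (0);           /* first call: current break */
  if (old_end == (void *) -1)
    return -1;
  if (sbrk (increment) == (void *) -1)  /* second call: may fail */
    return -1;
  volatile char *last = old_end + increment - 1;
  *last = 1;
  return 0;
}
```

A caller would pass increment_past_128tb ((uint64_t) (uintptr_t) sbrk (0), extra) to grow_and_touch. When the program is not started through ld.so, the heap typically sits far below the boundary, so the computed increment is huge and the second sbrk call is likely to fail. That matches the probabilistic behaviour described above.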

Comment 12 Florian Weimer 2017-11-05 14:54:38 UTC

Identified as a kernel bug.