Bug 20568 - Segfault with wide characters and setlocale/fgetwc/UTF-8
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: locale
Version: 2.24
Importance: P2 normal
Target Milestone: 2.30
Assignee: Not yet assigned to anyone
Reported: 2016-09-07 20:09 UTC by Tobias Stoeckmann
Modified: 2019-05-16 09:12 UTC
Last reconfirmed: 2016-09-28 00:00:00


Description Tobias Stoeckmann 2016-09-07 20:09:49 UTC
I have spotted a bug which looks rather obscure to me. Please see this C code as a minimal way to reproduce this issue:

---
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int
main(void)
{
        setlocale(LC_ALL, "");
        fgetwc(stdin);
        return 0;
}
---

$ gcc -o poc poc.c
$ python -c 'print 13*"\t"' | LC_CTYPE=en_US.UTF-8 ./poc
Segmentation fault
$ python -c 'print 13*"\t"' | LC_CTYPE=POSIX ./poc
$ _

It means that I have to enter around 13 tabulator characters to trigger the issue, but it won't hurt to add a few more. I was able to reproduce this on other distributions with glibc 2.24, so I don't think that it's specific to one of them.

Also, this issue only happens with an LC_CTYPE of an UTF-8 locale. I have tested en_US and de_DE, which both trigger this issue. With POSIX or C, the segmentation fault is not triggered.

I hope this helps you to track down this bug, as I was unable to figure out the flush mechanisms in glibc in a reasonable time. :)


The stack trace on my system with glibc 2.24 looks like this:

(gdb) bt
#0  __GI__IO_wfile_sync (fp=0xb77295a0 <_IO_2_1_stdin_>) at wfileops.c:534
#1  0xb75e2bc6 in _IO_default_setbuf (fp=0xb77295a0 <_IO_2_1_stdin_>, p=0x0, len=0) at genops.c:523
#2  0xb75df2e2 in _IO_new_file_setbuf (fp=0xb77295a0 <_IO_2_1_stdin_>, p=0x0, len=0) at fileops.c:459
#3  0xb75e3516 in _IO_unbuffer_all () at genops.c:921
#4  _IO_cleanup () at genops.c:966
#5  0xb75a5632 in __run_exit_handlers (status=0, listp=0xb77293dc <__exit_funcs>, run_list_atexit=true, run_dtors=true) at exit.c:96
#6  0xb75a56f1 in __GI_exit (status=0) at exit.c:105
#7  0xb758f1b2 in __libc_start_main (main=0x804846b <main>, argc=1, argv=0xbfef4004, init=0x80484b0 <__libc_csu_init>, fini=0x8048510 <__libc_csu_fini>, 
    rtld_fini=0xb774d7a0 <_dl_fini>, stack_end=0xbfef3ffc) at ../csu/libc-start.c:323
#8  0x08048391 in _start () at ../sysdeps/i386/start.S:115
Comment 1 Florian Weimer 2016-09-28 16:44:12 UTC
This starts happening with this commit:

commit 18d26750dd8fd328a78cf639fd0ec2494680a2a4
Author: Paul Pluzhnikov <ppluzhnikov@google.com>
Date:   Sun Mar 8 09:46:53 2015 -0700

    Cleanup: in preparation for fixing BZ #16734, fix memory leaks exposed by
    switching fopen()ed streams from mmap to malloc.
Comment 2 Florian Weimer 2016-09-28 16:59:05 UTC
Related discussion: https://lists.debian.org/debian-glibc/2016/09/msg00173.html
Comment 3 Fiodor 2018-12-15 18:48:58 UTC
I can confirm this bug.

cat 1.c && gcc 1.c && ./a.out
#include <locale.h>
#include <wchar.h>
#include <stdio.h>

int main()
{
    setlocale(LC_ALL, "ru_RU.UTF-8");
    getwc(stdin);
    return 0;
}
11111111111111111111
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
[faust@archlinux РАзная всячина]$ cat 1.c && clang 1.c && ./a.out
#include <locale.h>
#include <wchar.h>
#include <stdio.h>

int main()
{
    setlocale(LC_ALL, "ru_RU.UTF-8");
    getwc(stdin);
    return 0;
}
222222222222222222
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
[faust@archlinux РАзная всячина]$ gdb ./a.out
GNU gdb (GDB) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/faust/Проекты/C/РАзная всячина/a.out 
22222222222222222222222
*** stack smashing detected ***: <unknown> terminated

Program received signal SIGABRT, Aborted.
0x00007ffff7de7d7f in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7de7d7f in raise () from /usr/lib/libc.so.6
#1  0x00007ffff7dd2672 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff7e2a878 in __libc_message () from /usr/lib/libc.so.6
#3  0x00007ffff7ebd415 in __fortify_fail_abort () from /usr/lib/libc.so.6
#4  0x00007ffff7ebd3c6 in __stack_chk_fail () from /usr/lib/libc.so.6
#5  0x00007ffff7e282dc in do_length () from /usr/lib/libc.so.6
#6  0x00007ffff7e27ca5 in _IO_wfile_sync () from /usr/lib/libc.so.6
#7  0x00007ffff7e2ef26 in _IO_default_setbuf () from /usr/lib/libc.so.6
#8  0x00007ffff7e2babe in __GI__IO_file_setbuf () from /usr/lib/libc.so.6
#9  0x00007ffff7e2f9a1 in _IO_cleanup () from /usr/lib/libc.so.6
#10 0x00007ffff7dea552 in __run_exit_handlers () from /usr/lib/libc.so.6
#11 0x00007ffff7dea58e in exit () from /usr/lib/libc.so.6
#12 0x00007ffff7dd422a in __libc_start_main () from /usr/lib/libc.so.6
#13 0x000055555555507e in _start ()
(gdb) q
A debugging session is active.

        Inferior 1 [process 2703] will be killed.

Quit anyway? (y or n) y
[faust@archlinux РАзная всячина]$ /lib64/libc.so.6 -v
GNU C Library (GNU libc) stable release version 2.28.
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 8.2.1 20180831.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
[faust@archlinux РАзная всячина]$ uname -ar
Linux archlinux 4.19.8-arch1-1-ARCH #1 SMP PREEMPT Sat Dec 8 13:49:11 UTC 2018 x86_64 GNU/Linux
[faust@archlinux РАзная всячина]$
Comment 4 Igor Liferenko 2018-12-20 06:18:58 UTC
Hi,

I have just been bitten by this issue.

Tested on version 2.11.2: the bug is not present.
The next version that I tested is 2.24: the bug is present.

The bug starts to show when 9 characters are input.

This bug does not show if "setlocale" is commented out or if "fclose(stdin);" is
added before "return 0".

Hope this helps.

Regards,
Igor
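
The "fclose(stdin);" variant mentioned above could be sketched as follows. This is only an illustration of that observation, not a fix: the helper name read_first_then_close is hypothetical, and the idea is simply that the stream is closed explicitly instead of being left to the exit-time cleanup seen in the backtrace.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from glibc or the original report): read one
   wide character and close the stream explicitly, so that it is no
   longer open when the process exits.  Returns the character read, or
   WEOF on end of file or error.  */
static wint_t
read_first_then_close (FILE *fp)
{
  assert (fp != NULL);
  wint_t wc = fgetwc (fp);
  fclose (fp);
  return wc;
}
```

In the test program this would be called as read_first_then_close(stdin) right after the setlocale call. Whether it avoids the crash on other affected versions has not been checked here.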
Comment 5 Igor Liferenko 2018-12-20 07:04:01 UTC
I have done additional testing with varying numbers of input bytes.
In the report below, each range is the number of input bytes and the
text after '=' is the result of running the following command:

    printf '%0.s1' $(seq N) | ./a.out

where N is the desired number of input bytes.

0-9 = terminated normally

10-61 = *** stack smashing detected ***: <unknown> terminated Aborted

62-105 = Segmentation fault

106-121 = *** stack smashing detected ***: <unknown> terminated Aborted

122-137 = Segmentation fault

138-??? = terminated normally

etc...
Comment 6 OxFF-Alex 2019-05-14 14:31:44 UTC
I also have this bug.

$ uname -a
Linux serbinov 4.18.0-17-generic #18~18.04.1-Ubuntu SMP Fri Mar 15 15:27:12 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -d
Description:	Ubuntu 18.04.2 LTS

But I get three different crashes depending on the number of input symbols:

Code:
#include <locale.h>
#include <wchar.h>
#include <stdio.h>

int main()
{
    setlocale(LC_ALL, "ru_RU.UTF-8");
    getwc(stdin);
    return 0;
}

Results:
$
$ ./a.out
111111111111111111111111
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
$
$ ./a.out
111111111111111111111111111111111111111111111111111111111111111111111111111111
free(): invalid pointer
Aborted (core dumped)
$
$ ./a.out
1111111111111111111111111111111111111
Segmentation fault (core dumped)
Comment 7 OxFF-Alex 2019-05-14 14:36:15 UTC
In addition:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=ru_RU.UTF-8
LC_TIME=ru_RU.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=ru_RU.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=ru_RU.UTF-8
LC_NAME=ru_RU.UTF-8
LC_ADDRESS=ru_RU.UTF-8
LC_TELEPHONE=ru_RU.UTF-8
LC_MEASUREMENT=ru_RU.UTF-8
LC_IDENTIFICATION=ru_RU.UTF-8
LC_ALL=
$
Comment 8 Andreas Schwab 2019-05-14 14:57:38 UTC
At this point in _IO_wfile_sync, delta is always negative:

	  nread = (*cv->__codecvt_do_length) (cv, &fp->_wide_data->_IO_state,
					      fp->_IO_read_base,
					      fp->_IO_read_end, delta);
Comment 9 Andreas Schwab 2019-05-14 15:32:00 UTC
That should fix it:

diff --git a/libio/wfileops.c b/libio/wfileops.c
index 5bc785b2b6..b30ef81813 100644
--- a/libio/wfileops.c
+++ b/libio/wfileops.c
@@ -508,11 +508,11 @@ _IO_wfile_sync (FILE *fp)
 	     generate the wide characters up to the current reading
 	     position.  */
 	  int nread;
-
+	  size_t wnread = fp->_wide_data->_IO_read_ptr - fp->_wide_data->_IO_read_base;
 	  fp->_wide_data->_IO_state = fp->_wide_data->_IO_last_state;
 	  nread = (*cv->__codecvt_do_length) (cv, &fp->_wide_data->_IO_state,
 					      fp->_IO_read_base,
-					      fp->_IO_read_end, delta);
+					      fp->_IO_read_end, wnread);
 	  fp->_IO_read_ptr = fp->_IO_read_base + nread;
 	  delta = -(fp->_IO_read_end - fp->_IO_read_base - nread);
 	}
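
The difference between the two values can be shown with a simplified model. The struct and helper names below are hypothetical stand-ins for the fp->_wide_data fields, the numbers are made up for illustration, and the assumption is that delta was the distance from the read pointer to the end of the wide buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Simplified stand-in for the three wide-buffer pointers used in the
   patch; the real fields live in fp->_wide_data.  */
struct wide_buf
{
  const wchar_t *read_base;   /* start of the converted wide characters */
  const wchar_t *read_ptr;    /* next wide character to hand out */
  const wchar_t *read_end;    /* end of the converted wide characters */
};

/* Illustrative snapshot: 13 wide characters converted, one of them
   already consumed by a single fgetwc call.  */
static const wchar_t sample[13] = { 0 };

static struct wide_buf
sample_after_one_read (void)
{
  struct wide_buf w = { sample, sample + 1, sample + 13 };
  return w;
}

/* Value passed before the patch (assumed): distance to the end of the
   wide buffer, which is never positive.  */
static ptrdiff_t
distance_to_end (struct wide_buf w)
{
  assert (w.read_ptr <= w.read_end);
  return w.read_ptr - w.read_end;
}

/* Value passed after the patch: wide characters consumed so far.  */
static size_t
consumed_count (struct wide_buf w)
{
  assert (w.read_base <= w.read_ptr);
  return (size_t) (w.read_ptr - w.read_base);
}
```

With this snapshot, distance_to_end gives -12 while consumed_count gives 1, which is the count the patch passes as wnread.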
Comment 10 cvs-commit@gcc.gnu.org 2019-05-15 14:49:07 UTC
The master branch has been updated by Andreas Schwab <schwab@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=32ff397533715988c19cbf3675dcbd727ec13e18

commit 32ff397533715988c19cbf3675dcbd727ec13e18
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue May 14 17:14:59 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
Comment 11 Andreas Schwab 2019-05-15 14:50:57 UTC
Fixed in 2.30.
Comment 12 cvs-commit@gcc.gnu.org 2019-05-15 15:23:56 UTC
The release/2.29/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c6177be4b92d5d7df50a785652d1912db511423e

commit c6177be4b92d5d7df50a785652d1912db511423e
Author: Andreas Schwab <schwab@suse.de>
Date:   Wed May 15 17:09:05 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
    
    (cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)
Comment 13 cvs-commit@gcc.gnu.org 2019-05-15 15:35:10 UTC
The release/2.28/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d948478bc586dec2fe3edd49e8e55f3893b3f854

commit d948478bc586dec2fe3edd49e8e55f3893b3f854
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue May 14 17:14:59 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
    
    (cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)
Comment 14 cvs-commit@gcc.gnu.org 2019-05-15 15:44:34 UTC
The release/2.27/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f9c3c12f3365c3e26aa11a31c6effea7d959f0ba

commit f9c3c12f3365c3e26aa11a31c6effea7d959f0ba
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue May 14 17:14:59 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
    
    (cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)
Comment 15 cvs-commit@gcc.gnu.org 2019-05-16 08:47:06 UTC
The release/2.26/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=4385ec1d8af4203b23dce8c9dc2f1aff5acaf094

commit 4385ec1d8af4203b23dce8c9dc2f1aff5acaf094
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue May 14 17:14:59 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
    
    (cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)
Comment 17 cvs-commit@gcc.gnu.org 2019-05-16 09:12:35 UTC
The release/2.25/master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=60bc81ba47915817fb89bc2b80b0176ac1eeba07

commit 60bc81ba47915817fb89bc2b80b0176ac1eeba07
Author: Andreas Schwab <schwab@suse.de>
Date:   Tue May 14 17:14:59 2019 +0200

    Fix crash in _IO_wfile_sync (bug 20568)
    
    When computing the length of the converted part of the stdio buffer, use
    the number of consumed wide characters, not the (negative) distance to the
    end of the wide buffer.
    
    (cherry picked from commit 32ff397533715988c19cbf3675dcbd727ec13e18)